Project structure: the staging→intermediate→marts pattern
A layout that scales from 10 to 1000 models without becoming spaghetti. The convention every dbt team converges on (eventually).
The 800-model project that nobody could refactor
A team I knew had 800 dbt models, no naming convention, no folder structure. Every analyst put new models in models/ flat. After two years of growth, every refactor terrified everyone — change a base table and you didn't know which 47 of 800 models would break.
The fix wasn't more discipline; it was structure. They migrated to staging→intermediate→marts. After three weeks of relabeling files, they could see the lineage on a single page. Refactors became routine.
Structure isn't aesthetic. It's a load-bearing feature of any project that lives longer than 6 months.
models/
staging/ # 1 model per source table; cleaning + renaming only
stg_oltp__orders.sql
stg_oltp__customers.sql
stg_stripe__payments.sql
schema.yml # tests for staging
intermediate/ # joins, business logic, multi-source
int_orders_with_customer.sql
int_payment_attempts_per_order.sql
marts/ # final star schemas, exposed to BI
finance/
fct_revenue_daily.sql
dim_customer.sql
product/
fct_user_activity_daily.sql
schema.yml
tests/ # custom singular tests (anything more complex than YAML generics)
macros/ # reusable Jinja
seeds/ # small reference CSVs (e.g. country code → region)
analyses/ # one-off queries that aren't models — checked into git
snapshots/ # SCD2 dimensions, dbt's snapshot featureThe same skeleton, drawn as a dependency DAG
sources oltp.*, stripe.*
↓
staging/ stg_oltp__orders, stg_oltp__customers, stg_stripe__payments
↓ (rename + cast only, no joins)
intermediate/ int_orders_with_customer, int_payment_attempts
↓ (joins, business logic, multi-source)
marts/ fct_revenue_daily, dim_customer
(final star schemas, exposed to BI)
Arrows are {{ ref(...) }} calls — that's how dbt builds the lineage graph and decides the run order. Dependencies flow only downward: a staging model ref-ing a marts model is a circular import and dbt errors at compile.
Three rules that make this scale.
- Naming convention is non-negotiable:
stg_<source>__<table>for staging,int_<noun>_<verb>for intermediate,fct_<noun>anddim_<noun>for marts. Read a model name and you know what layer it's in and what it does. {{ ref('model_name') }}everywhere, never raw schema-qualified table names. Lineage breaks the moment you writeFROM marts.fct_ordersdirectly.- Staging models do only renames and type casts — no joins, no business logic. The boundary between layers is enforced by everyone, not by dbt.
Write what a `stg_oltp__orders.sql` would compile to (post-Jinja). The convention: SELECT every source column, rename to snake-case if needed, cast to the right type, no business logic. Output: `order_id` (bigint), `customer_id` (bigint), `status` (text), `created_at` (timestamptz), `total_cents` (integer), ordered by `order_id`.
Takeaway: staging → intermediate → marts is a load-bearing convention. Naming is non-negotiable. {{ ref(...) }} is non-negotiable. With those three in place, a dbt project remains readable at 800 models.