← Deep SQL & Production Data Engineering

Project structure: the staging→intermediate→marts pattern

A layout that scales from 10 to 1000 models without becoming spaghetti. The convention every dbt team converges on (eventually).

The 800-model project that nobody could refactor

A team I knew had 800 dbt models, no naming convention, no folder structure. Every analyst put new models in models/ flat. After two years of growth, every refactor terrified everyone — change a base table and you didn't know which 47 of 800 models would break.

The fix wasn't more discipline; it was structure. They migrated to staging→intermediate→marts. After three weeks of relabeling files, they could see the lineage on a single page. Refactors became routine.

Structure isn't aesthetic. It's a load-bearing feature of any project that lives longer than 6 months.

The dbt project skeleton — convention over invention

models/
  staging/         # 1 model per source table; cleaning + renaming only
    stg_oltp__orders.sql
    stg_oltp__customers.sql
    stg_stripe__payments.sql
    schema.yml     # tests for staging
  intermediate/    # joins, business logic, multi-source
    int_orders_with_customer.sql
    int_payment_attempts_per_order.sql
  marts/           # final star schemas, exposed to BI
    finance/
      fct_revenue_daily.sql
      dim_customer.sql
    product/
      fct_user_activity_daily.sql
      schema.yml
tests/             # custom singular tests (anything more complex than YAML generics)
macros/            # reusable Jinja
seeds/             # small reference CSVs (e.g. country code → region)
analyses/          # one-off queries that aren't models — checked into git
snapshots/         # SCD2 dimensions, dbt's snapshot feature

The same skeleton, drawn as a dependency DAG

  sources         oltp.*, stripe.*
     ↓
  staging/        stg_oltp__orders, stg_oltp__customers, stg_stripe__payments
     ↓            (rename + cast only, no joins)
  intermediate/   int_orders_with_customer, int_payment_attempts
     ↓            (joins, business logic, multi-source)
  marts/          fct_revenue_daily, dim_customer
                  (final star schemas, exposed to BI)

Arrows are {{ ref(...) }} calls — that's how dbt builds the lineage graph and decides the run order. Dependencies flow only downward: a staging model ref-ing a marts model is a circular import and dbt errors at compile.

Three rules that make this scale.

Naming convention is non-negotiable: stg_<source>__<table> for staging, int_<noun>_<verb> for intermediate, fct_<noun> and dim_<noun> for marts. Read a model name and you know what layer it's in and what it does.
{{ ref('model_name') }} everywhere, never raw schema-qualified table names. Lineage breaks the moment you write FROM marts.fct_orders directly.
Staging models do only renames and type casts — no joins, no business logic. The boundary between layers is enforced by everyone, not by dbt.

Write what a `stg_oltp__orders.sql` would compile to (post-Jinja). The convention: SELECT every source column, rename to snake-case if needed, cast to the right type, no business logic. Output: `order_id` (bigint), `customer_id` (bigint), `status` (text), `created_at` (timestamptz), `total_cents` (integer), ordered by `order_id`.

Takeaway: staging → intermediate → marts is a load-bearing convention. Naming is non-negotiable. {{ ref(...) }} is non-negotiable. With those three in place, a dbt project remains readable at 800 models.