← Deep SQL & Production Data Engineering

Tests, incremental models, and the dbt run that didn't break

The four generic tests you'll use 95% of the time, and the incremental pattern that turns nightly full-refresh into a 30-second job.

schema.yml — the four generic tests every model deserves

version: 2
models:
  - name: fct_revenue_daily
    description: One row per (day, country). Grain stated; tested.
    columns:
      - name: day
        tests:
          - not_null
      - name: country
        tests:
          - not_null
          - accepted_values:
              values: ['US', 'GB', 'FI', 'NL', 'DE', 'JP']
      - name: revenue_cents
        tests:
          - not_null
    tests:
      # Composite uniqueness: (day, country) must be unique together.
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [day, country]

An incremental model with a watermark and idempotent MERGE

{{ config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='append_new_columns'
) }}

SELECT *
FROM {{ source('oltp', 'orders') }}
{% if is_incremental() %}
  -- Only pull rows updated since last run, with a 1-hour overlap to catch late writes.
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }}) - INTERVAL '1 hour'
{% endif %}

The incremental trap that doubles your numbers.

Missing unique_key means dbt does an INSERT, not a MERGE. Late-arriving updates create duplicate rows. Your daily revenue silently doubles for the affected window.

Always set unique_key on incremental models. Always test unique_combination_of_columns on the natural grain. The two together prevent the duplication; one without the other is theatre.

Implement the *grain test* manually — what `unique_combination_of_columns` would compile to. Return one row, one column `is_unique`, with value `true` if `(order_id)` is unique across all `orders` (it should be — it's the primary key).

Takeaway: tests are not optional. Generic YAML tests catch 95% of bugs (not_null, unique, relationships, accepted_values). Incremental models need unique_key and a composite-uniqueness test. The combination is what turns a nightly job from "hope it works" to "watched, tested, alerted".