The 15-question SQL review checklist — and the smells it catches
Run this list before merging any SQL. It catches the bugs your tests don't, and it's faster than a one-day production incident.
Why a checklist
A team I knew used to do SQL review by vibes — "looks fine, ship it". The bugs that escaped review were the same five bugs every quarter: SELECT * that broke when the source added a column; a fanout-induced doubled metric; a WHERE col != 'X' that silently dropped NULLs; a magic number nobody remembered the meaning of.
The team adopted a 15-question checklist. PRs took 5 minutes longer; production incidents per quarter dropped by ~70%. The checklist isn't fancy. It's just not forgetting. Below, the version that has earned its keep.
The 15 questions
- Grain — what does one row of the result mean? In one sentence.
- Many-to-many JOIN risk? Did you check row count before and after every JOIN?
SELECT *— any reason? In a model, almost never.- Filters — are they index- and partition-friendly?
WHERE date_trunc(...) = ...blocks pushdown. DISTINCT— papering over a fanout? If yes, fix the model.- NULL handling — what happens if the values are NULL? Are you using
=/!=on a nullable column? - Tests —
unique,not_null,relationships,accepted_valueson the right columns? - Idempotent — safe to re-run? UPSERT, not INSERT?
- Scaling — how does runtime change at 10× and 100× data?
- Lineage — what downstream models / dashboards depend on this? (Run
dbt ls -m '+model_name+'.) - Documentation — is the model's purpose written down? In one sentence.
- Owner — does this table have a named owner in
meta? - Permissions — does the right (smallest) set of roles have access?
- PII — does this query unnecessarily read PII?
- Monitoring — freshness / quality alerts in place?
Three smells worth refactoring on sight.
SELECT *in a model — when the source adds a column, your model silently changes shape. Downstream queries that reference positional columns break.DISTINCTafter a JOIN — almost always masking a fanout. Find which side has duplicates on the join key, fix the model.- A magic number in a
WHERE—WHERE plan_id = 17. Six months from now, who knows what plan 17 is? Use a seed ({{ ref('plan_codes') }}) or at least a comment.
Apply checklist item #1 (grain) and #5 (DISTINCT after JOIN). The query below has *both* problems: SELECT'ing customer details after a JOIN with multiple paid orders per customer, then DISTINCT to dedupe. Rewrite as a single SELECT that aggregates instead of dedupes. Return `customer_id`, `name`, `paid_revenue_cents` (sum), ordered by `customer_id`.
Takeaway: 15 questions, 5 minutes per PR, 70% fewer incidents. Catch the smells (SELECT *, DISTINCT-after-JOIN, magic numbers) on sight. State the grain upfront. Tests + lineage + ownership + monitoring complete the picture.