Codabra

The 15-question SQL review checklist — and the smells it catches

Run this list before merging any SQL. It catches the bugs your tests don't, and it's faster than a one-day production incident.

Why a checklist

A team I knew used to do SQL review by vibes"looks fine, ship it". The bugs that escaped review were the same five bugs every quarter: SELECT * that broke when the source added a column; a fanout-induced doubled metric; a WHERE col != 'X' that silently dropped NULLs; a magic number nobody remembered the meaning of.

The team adopted a 15-question checklist. PRs took 5 minutes longer; production incidents per quarter dropped by ~70%. The checklist isn't fancy. It's just not forgetting. Below, the version that has earned its keep.

The 15 questions

  1. Grain — what does one row of the result mean? In one sentence.
  2. Many-to-many JOIN risk? Did you check row count before and after every JOIN?
  3. SELECT * — any reason? In a model, almost never.
  4. Filters — are they index- and partition-friendly? WHERE date_trunc(...) = ... blocks pushdown.
  5. DISTINCT — papering over a fanout? If yes, fix the model.
  6. NULL handling — what happens if the values are NULL? Are you using =/!= on a nullable column?
  7. Testsunique, not_null, relationships, accepted_values on the right columns?
  8. Idempotent — safe to re-run? UPSERT, not INSERT?
  9. Scaling — how does runtime change at 10× and 100× data?
  10. Lineage — what downstream models / dashboards depend on this? (Run dbt ls -m '+model_name+'.)
  11. Documentation — is the model's purpose written down? In one sentence.
  12. Owner — does this table have a named owner in meta?
  13. Permissions — does the right (smallest) set of roles have access?
  14. PII — does this query unnecessarily read PII?
  15. Monitoring — freshness / quality alerts in place?

Three smells worth refactoring on sight.

  • SELECT * in a model — when the source adds a column, your model silently changes shape. Downstream queries that reference positional columns break.
  • DISTINCT after a JOIN — almost always masking a fanout. Find which side has duplicates on the join key, fix the model.
  • A magic number in a WHEREWHERE plan_id = 17. Six months from now, who knows what plan 17 is? Use a seed ({{ ref('plan_codes') }}) or at least a comment.

Apply checklist item #1 (grain) and #5 (DISTINCT after JOIN). The query below has *both* problems: SELECT'ing customer details after a JOIN with multiple paid orders per customer, then DISTINCT to dedupe. Rewrite as a single SELECT that aggregates instead of dedupes. Return `customer_id`, `name`, `paid_revenue_cents` (sum), ordered by `customer_id`.

Takeaway: 15 questions, 5 minutes per PR, 70% fewer incidents. Catch the smells (SELECT *, DISTINCT-after-JOIN, magic numbers) on sight. State the grain upfront. Tests + lineage + ownership + monitoring complete the picture.