Codabra

The six quality dimensions — vocabulary that ends taste arguments

A short shared vocabulary that lets you debate priority instead of arguing about whether a number 'feels off'.

"The dashboard feels wrong"

A PM I worked with kept saying the conversion dashboard was "off". The DE asked "off how?". The PM didn't know how to articulate it. Three days of investigation later, the issue was that one event — signup_complete — was being emitted with a 4-hour delay because of a bug in the mobile app. The hourly conversion rate looked plausible all morning but settled wrong by evening.

This is a freshness problem, not a correctness problem. Without shared vocabulary the team kept arguing about whether the dashboard was "wrong". Once we framed it as "the freshness SLO is 1 hour, current latency is 4 hours", the debate became "do we fix the producer or build a 4-hour-late dashboard?" — a question you can answer in 15 minutes instead of 3 days.

Six dimensions, in order of how often they break

  1. Freshnessthe data is at most N hours old. Most-broken dimension. Pipelines fail; sources lag. Track per-table SLOs.
  2. Completenessevery required field is present. not_null tests; null-rate alerts.
  3. Uniquenessprimary keys are actually unique. unique tests; composite uniqueness for marts.
  4. Validityvalues respect domain rules. Positive prices, valid emails, country codes from a list.
  5. Consistencysums across partitions agree. Total of daily revenue equals monthly revenue. Reconciliation queries.
  6. Accuracythe data agrees with reality. Hardest. Often only catchable via reconciliation against an external source (the books, a vendor's report, a competitor's number).

Most teams have rough alerting on 1–4. Few have systematic 5. Almost none have 6.

Blocking vs alerting — the policy choice.

  • Blocking checks — fail the build, prevent publication. "Total revenue today is exactly 0" — almost certainly a bug; don't update the dashboard. Reserved for clearly-wrong states.
  • Alerting checks — emit a warning, publish anyway. "Conversion rate is 3 standard deviations below normal" — could be a real drop, could be a bug. Tell the on-call but don't sit on the data.

Mix them up and either (a) every alert blocks publication and your stakeholders never get fresh data, or (b) nothing blocks and the broken numbers hit the CFO's inbox. Spend a meeting deciding which checks are which, before you write them.

Build a *uniqueness* check on `orders.order_id`. Return one row, three columns: `total_rows`, `distinct_keys`, `is_unique` (true when total_rows = distinct_keys).

Takeaway: name the dimension before debating the problem. Freshness, completeness, uniqueness, validity, consistency, accuracy. Blocking checks for clearly-wrong states; alerting for everything else. Vocabulary turns three days of arguing into a 15-minute decision.