Codabra

Capstone brief: what to build, how to defend it

The 14 questions every senior interviewer will ask, the rubric they're scoring on, and the one piece of advice that makes the difference.

Why a capstone

A hiring manager I worked with said the line "I worked on the data team" tells him nothing. The line "I built the incremental ingestion for fct_orders, found a write skew that was double-billing customers in 5% of cases, fixed it with SELECT FOR UPDATE, and the dashboard reconciles to the books to within $0.50 daily" tells him everything.

Your capstone is the project you can tell that second story about. The 22 modules of this course are the materials; the capstone is the artifact. The defense rubric below is what senior reviewers actually score on — including a few they won't tell you.

The 14 defense questions

  1. Why this architecture? What did you reject and why?
  2. Why these databases? Cite numbers (volume, concurrency, latency SLO).
  3. How is data ingested? Push, pull, batch, stream — and why.
  4. How does incremental processing work? Watermark + overlap, or CDC?
  5. How do you avoid duplicates? Show the unique constraint and the UPSERT.
  6. How is late-arriving data handled? It will arrive; what happens when it does?
  7. How is quality verified? Tests, dashboards, reconciliation.
  8. How does orchestration work? What runs when, what retries, what alerts.
  9. Where is the bottleneck? "Nowhere" is the wrong answer. Name it; show how you measured.
  10. How was the most expensive query optimized? Plan before, plan after, savings.
  11. How are accesses configured? Roles, masked views, audit logs.
  12. How does an upstream schema change propagate? Contract? Or fingers crossed?
  13. How do you recover after a failure? Walk the runbook.
  14. How would you scale 10×? What breaks first, what would you change?

Three things senior reviewers score on (that they don't write down).

  1. Specific numbers. "It runs in 90 seconds for 4M rows, scales linearly to 30 minutes at 100M" beats "it's fast" every time. Memorize the numbers.
  2. What you'd do differently. Naming a regret signals self-awareness. "The watermark is wall-clock; if I rebuilt I'd use logical clock from CDC" is a senior answer.
  3. Where the trade-offs live. Every choice has a cost. "Pre-aggregation costs 1.5 hours of pipeline time daily and saves 200 hours of dashboard latency" is the trade-off framing the rubric loves.

Build the *one query* that would answer questions 5 (no duplicates) and 9 (bottleneck) at once: a daily revenue mart, deduplicated, with row count + grain. Return today's-equivalent (any single day) revenue with `day`, `revenue_cents`, `order_count`, ordered by day. (Pick day from the fixture; March 1 has the most orders.)

Takeaway: the capstone is the artifact you can tell a specific story about. Memorize numbers. Name regrets. Frame trade-offs. The 14 questions above are the rubric every senior reviewer carries — and now you have the same one.