Retention cohorts and conversion funnels — pure SQL
The two analytics queries you'll be asked for in every job interview, built from raw events with CTEs and FILTER. No Excel pivots, no Python.
Why cohorts and funnels matter
Product managers care about two questions more than any others:
- "Of users who signed up in March, how many are still active in May?" — that's a retention cohort.
- "Of users who saw a product page, how many added to cart, and how many checked out?" — that's a conversion funnel.
Both are pure SQL once you have raw event data. People reach for Mixpanel and Amplitude because doing this in SQL feels hard the first time. After this 15-minute lesson, you'll see why it's actually one CTE-stack and a FILTER.
The funnel, drawn against the seeded fixture
step users
──────────── ─────
page_view ████████████ 3 (100 %)
product_view ████████████ 3 (100 %) ◄ all viewers reached this
add_to_cart ████████ 2 ( 67 %) ◄ user 2 dropped here
checkout ████████ 2 ( 67 %)
│ │
│ └ retention vs the prior step
└ COUNT(DISTINCT user_id) at this step
Each row's width is proportional to the users still in the funnel. The drop between product_view and add_to_cart is the conversion gap a PM would investigate first — that's the row the SQL points the finger at. The pattern below produces exactly these four numbers from raw events.
-- Pattern: one row per user, columns are step boolean flags or step timestamps.
-- Then COUNT(DISTINCT user_id) FILTER (...) for each step.
WITH user_steps AS (
SELECT user_id,
MAX(occurred_at) FILTER (WHERE event_name = 'page_view') AS first_view,
MAX(occurred_at) FILTER (WHERE event_name = 'product_view') AS first_product,
MAX(occurred_at) FILTER (WHERE event_name = 'add_to_cart') AS first_cart,
MAX(occurred_at) FILTER (WHERE event_name = 'checkout') AS first_checkout
FROM events
GROUP BY user_id
)
SELECT
COUNT(DISTINCT user_id) FILTER (WHERE first_view IS NOT NULL) AS step_view,
COUNT(DISTINCT user_id) FILTER (WHERE first_product IS NOT NULL) AS step_product,
COUNT(DISTINCT user_id) FILTER (WHERE first_cart IS NOT NULL) AS step_cart,
COUNT(DISTINCT user_id) FILTER (WHERE first_checkout IS NOT NULL) AS step_checkout
FROM user_steps;Build a 4-step funnel from the seeded `events` table: `page_view → product_view → add_to_cart → checkout`. Return one row with four columns: `step_view`, `step_product`, `step_cart`, `step_checkout`. Each is the count of distinct users who reached that step *at any point* (no time-ordering between steps required for this exercise — interview classic, simplest version).
Retention cohorts use the same CTE stack. First grouping: signup_month = date_trunc('month', signup_at). Then for each (signup_month, month_offset), count distinct users who had any activity that month. Plot the result as a triangle and you have the classic cohort heatmap. Same FILTER + COUNT DISTINCT trick, applied across two grouping keys instead of four event names.
Takeaway: cohorts and funnels are not advanced SQL — they're a CTE that flattens events to one row per user, then COUNT(DISTINCT user_id) FILTER (...) per step. Internalize this pattern and most product-analytics questions become 20-line queries.