Window functions: ROW_NUMBER, LAG, and the running total that took 9 hours
Compute per-row analytics without collapsing rows. Top-N-per-group, running totals, day-over-day deltas — and the partition trap that made one query 200× slower than it had to be.
OVER changes the game
A window function reads a frame of rows for each row, instead of collapsing rows like GROUP BY. Same input row count → same output row count, but with new analytical columns.
It is how you express:
- the first order each customer ever placed —
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) - running revenue over time —
SUM(total_cents) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - how does today compare to yesterday? —
LAG(value) OVER (ORDER BY day) - top 3 products per category —
RANK() OVER (PARTITION BY category ORDER BY revenue DESC)then filterrank <= 3
Anatomy of OVER (...)
Three parts inside the parentheses:
SUM(total_cents) OVER (
PARTITION BY customer_id -- independent streams
ORDER BY created_at -- order within each stream
ROWS BETWEEN ... AND ... -- frame
)
PARTITION BY is GROUP BY without collapsing. ORDER BY is required for ordered functions (ROW_NUMBER, LAG, LEAD). ROWS BETWEEN (the frame) controls which neighbors each row's aggregate sees — the default for ordered windows is UNBOUNDED PRECEDING AND CURRENT ROW, the running total. The picture below is that running total in motion.
The frame, animated
For SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
customer A customer B ◄ PARTITION BY splits these
┌──────────┐ ┌──────────┐
│ row amt │ │ row amt │
│ 1 100 │ ◄ frame: row 1 running_sum = 100
│ 2 50 │ ◄ frame: rows 1..2 running_sum = 150
│ 3 200 │ ◄ frame: rows 1..3 running_sum = 350
│ 4 25 │ ◄ frame: rows 1..4 running_sum = 375
└──────────┘ ┌──────────┐
│ 1 75 │ ◄ frame restarts = 75
│ 2 125 │ ◄ rows 1..2 of B = 200
│ 3 300 │ ◄ rows 1..3 of B = 500
└──────────┘
Two things this picture makes hard to forget:
- The frame resets at every partition boundary — customer A's running total never bleeds into customer B because PARTITION BY isolates them.
- The frame moves — it is not just "all rows in this partition". For
ROW_NUMBERandRANKthe frame is one row at a time. ForSUM ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWit grows by one each step (the running total). ForAVG ... ROWS BETWEEN 6 PRECEDING AND CURRENT ROWit slides — a 7-day moving average.
The 9-hour running total — a real failure mode I've debugged twice in two different companies.
A query computes a running total across 200M rows: SUM(amount) OVER (ORDER BY created_at). No PARTITION BY. The optimizer must sort all 200M rows on created_at to compute the frame, then keep them all in memory (or spill) for the cumulative sum.
Fix: add PARTITION BY on whatever natural grouping makes sense (customer, account, day) — Postgres now sorts and computes within each partition independently, dropping memory pressure by orders of magnitude. The same query went from 9 hours to 4 minutes.
Rule: every window function should have a PARTITION BY unless you genuinely need the global running total.
For each customer, return their **first paid order**. Output: `customer_id`, `order_id`, `created_at`, ordered by `customer_id`. Customers with zero paid orders should not appear. Use `ROW_NUMBER()` — not `MIN(created_at)` and join — to do this in one pass.
-- daily revenue with day-over-day delta:
SELECT day,
revenue,
LAG(revenue) OVER (ORDER BY day) AS prev_day_revenue,
revenue - LAG(revenue) OVER (ORDER BY day) AS delta
FROM (
SELECT date_trunc('day', created_at)::date AS day,
SUM(total_cents) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1
) d
ORDER BY day;
-- The first day has prev_day_revenue = NULL — there is no day before it.You want exactly one row per category — the highest-revenue product in each. Two products in the same category have identical revenue. Which window function gives you exactly one row per category?
Takeaway: window functions = aggregates that don't collapse. Always set PARTITION BY unless you really want a global window. Always set a tiebreaker in ORDER BY if you're using ROW_NUMBER. Read the frame clause (ROWS BETWEEN ...) — defaults are not always what you want.