← Deep SQL & Production Data Engineering

Window functions: ROW_NUMBER, LAG, and the running total that took 9 hours

Compute per-row analytics without collapsing rows. Top-N-per-group, running totals, day-over-day deltas — and the partition trap that made one query 200× slower than it had to be.

OVER changes the game

A window function reads a frame of rows for each row, instead of collapsing rows like GROUP BY. Same input row count → same output row count, but with new analytical columns.

It is how you express:

the first order each customer ever placed — ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at)
running revenue over time — SUM(total_cents) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
how does today compare to yesterday? — LAG(value) OVER (ORDER BY day)
top 3 products per category — RANK() OVER (PARTITION BY category ORDER BY revenue DESC) then filter rank <= 3

Anatomy of `OVER (...)`

Three parts inside the parentheses:

SUM(total_cents) OVER (
  PARTITION BY customer_id    -- independent streams
  ORDER BY created_at         -- order within each stream
  ROWS BETWEEN ... AND ...    -- frame
)

PARTITION BY is GROUP BY without collapsing. ORDER BY is required for ordered functions (ROW_NUMBER, LAG, LEAD). ROWS BETWEEN (the frame) controls which neighbors each row's aggregate sees — the default for ordered windows is UNBOUNDED PRECEDING AND CURRENT ROW, the running total. The picture below is that running total in motion.

The frame, animated

For SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):

  customer A                customer B          ◄ PARTITION BY splits these
  ┌──────────┐              ┌──────────┐
  │ row  amt │              │ row  amt │
  │  1   100 │ ◄ frame: row 1            running_sum = 100
  │  2    50 │ ◄ frame: rows 1..2        running_sum = 150
  │  3   200 │ ◄ frame: rows 1..3        running_sum = 350
  │  4    25 │ ◄ frame: rows 1..4        running_sum = 375
  └──────────┘              ┌──────────┐
                            │  1    75 │ ◄ frame restarts            =  75
                            │  2   125 │ ◄ rows 1..2 of B            = 200
                            │  3   300 │ ◄ rows 1..3 of B            = 500
                            └──────────┘

Two things this picture makes hard to forget:

The frame resets at every partition boundary — customer A's running total never bleeds into customer B because PARTITION BY isolates them.
The frame moves — it is not just "all rows in this partition". For ROW_NUMBER and RANK the frame is one row at a time. For SUM ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW it grows by one each step (the running total). For AVG ... ROWS BETWEEN 6 PRECEDING AND CURRENT ROW it slides — a 7-day moving average.

The 9-hour running total — a real failure mode I've debugged twice in two different companies.

A query computes a running total across 200M rows: SUM(amount) OVER (ORDER BY created_at). No PARTITION BY. The optimizer must sort all 200M rows on created_at to compute the frame, then keep them all in memory (or spill) for the cumulative sum.

Fix: add PARTITION BY on whatever natural grouping makes sense (customer, account, day) — Postgres now sorts and computes within each partition independently, dropping memory pressure by orders of magnitude. The same query went from 9 hours to 4 minutes.

Rule: every window function should have a PARTITION BY unless you genuinely need the global running total.

For each customer, return their **first paid order**. Output: `customer_id`, `order_id`, `created_at`, ordered by `customer_id`. Customers with zero paid orders should not appear. Use `ROW_NUMBER()` — not `MIN(created_at)` and join — to do this in one pass.

LAG: day-over-day comparison without a self-join

-- daily revenue with day-over-day delta:
SELECT day,
       revenue,
       LAG(revenue) OVER (ORDER BY day) AS prev_day_revenue,
       revenue - LAG(revenue) OVER (ORDER BY day) AS delta
FROM (
  SELECT date_trunc('day', created_at)::date AS day,
         SUM(total_cents) AS revenue
  FROM orders
  WHERE status = 'paid'
  GROUP BY 1
) d
ORDER BY day;
-- The first day has prev_day_revenue = NULL — there is no day before it.

You want exactly one row per category — the highest-revenue product in each. Two products in the same category have identical revenue. Which window function gives you exactly one row per category?

Takeaway: window functions = aggregates that don't collapse. Always set PARTITION BY unless you really want a global window. Always set a tiebreaker in ORDER BY if you're using ROW_NUMBER. Read the frame clause (ROWS BETWEEN ...) — defaults are not always what you want.

Window functions: ROW_NUMBER, LAG, and the running total that took 9 hours

OVER changes the game

Anatomy of OVER (...)

The frame, animated

Anatomy of `OVER (...)`