Codabra

Isolation levels — the bonus that was paid twice

Three levels, three failure modes, picked on purpose. Plus the write-skew anomaly nobody warns you about.

The bonus that was paid twice

A payroll team I knew shipped logic that read like this:

BEGIN;
SELECT bonus_amount FROM payroll WHERE user_id = 42 AND paid = false;
-- 'paid' was false; bonus_amount is $5000
INSERT INTO payouts (user_id, amount) VALUES (42, 5000);
UPDATE payroll SET paid = true WHERE user_id = 42;
COMMIT;

Under the default Postgres isolation level (Read Committed), if two workers ran this code simultaneously for user 42, both could read paid = false before either one had committed its UPDATE. Both would write a payout. The user got $10,000 instead of $5,000. Twelve users, six figures total, before anyone noticed.

The fix is two words: FOR UPDATE on the SELECT. But the diagnosis required understanding what "transaction" actually buys you under concurrent load. Today, that.
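For reference, here is the same transaction with the lock added (a sketch; table and column names as in the snippet above):

```sql
BEGIN;
-- FOR UPDATE locks the matching row until COMMIT. A concurrent worker's
-- identical SELECT ... FOR UPDATE blocks here; once we commit, it re-checks
-- the WHERE clause, finds paid = true, gets zero rows, and skips the payout.
SELECT bonus_amount FROM payroll
WHERE user_id = 42 AND paid = false
FOR UPDATE;

INSERT INTO payouts (user_id, amount) VALUES (42, 5000);
UPDATE payroll SET paid = true WHERE user_id = 42;
COMMIT;
```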

MVCC in 90 seconds

Postgres uses Multi-Version Concurrency Control: every UPDATE creates a new row version with a hidden xmin (transaction that wrote it) and xmax (transaction that deleted/superseded it). Old versions stay around until vacuum removes them.

Why this matters:

  • Readers don't block writers, writers don't block readers. Two transactions can SELECT the same row while a third UPDATEs it; nobody waits.
  • Long-running transactions block vacuum. A transaction held open for an hour means vacuum can't reclaim any row version newer than that transaction's snapshot. Tables bloat. This is one of the most common causes of "why is my Postgres slow" in production.
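The version columns are not hypothetical; xmin and xmax are real system columns you can select from any Postgres table:

```sql
-- xmin: ID of the transaction that created this row version.
-- xmax: 0 for a live row; otherwise the ID of the transaction that
--       deleted or superseded it (including an uncommitted concurrent UPDATE).
SELECT xmin, xmax, user_id, paid
FROM payroll
WHERE user_id = 42;
```

Run an UPDATE on that row from another session without committing, re-run the SELECT, and you'll see xmax flip from 0 to the other transaction's ID while your snapshot still shows the old values.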

Each transaction sees a snapshot — a consistent view of the database at some point in time. The timeline below shows exactly how that snapshot mechanic produces the bonus-paid-twice bug under the default isolation level.

The bonus-paid-twice timeline

  T1 (worker A)                          T2 (worker B)
  ─────────────────────────────────────────────────────────────────
  t=0  BEGIN
  t=1  SELECT paid → false
  t=2                                    BEGIN
  t=3                                    SELECT paid → false
  t=4  INSERT INTO payouts (42, 5000)        ◄ payment #1
  t=5  UPDATE payroll SET paid=true
  t=6  COMMIT ✓
  t=7                                    INSERT INTO payouts (42, 5000)  ◄ #2!
  t=8                                    UPDATE payroll SET paid=true
  t=9                                    COMMIT ✓
                                         ↓ payouts has TWO rows for user 42  ☢

  (both SELECTs read the latest *committed* value of paid)

Both SELECTs read the latest committed paid=false: Read Committed takes a fresh snapshot per statement, not one per transaction (per-transaction snapshots are what Repeatable Read and Serializable give you). Because both saw false, both inserted a payout. The duplicate is in payouts, not payroll; the UPDATE just sets paid=true twice, which is harmless. Fix: two words on T1's SELECT, ... FOR UPDATE, which locks the row. T2 blocks, then sees paid=true, and the application skips the INSERT.

Isolation levels — which anomaly each prevents

                            │ Dirty │ Non-rep │ Phantom │ Write
ANSI level                  │ read  │ read    │ read    │ skew
────────────────────────────┼───────┼─────────┼─────────┼──────
Read Uncommitted            │   ✗   │   ✗     │   ✗     │  ✗
Read Committed (PG default) │   ✓   │   ✗     │   ✗     │  ✗
Repeatable Read             │   ✓   │   ✓     │   ✓ *   │  ✗
Serializable                │   ✓   │   ✓     │   ✓     │  ✓

  ✓ = prevented   ✗ = possible
  * Postgres' RR is snapshot isolation; it is stronger than ANSI RR and
    blocks classical phantoms — but write skew still slips through.
  ◄ The bonus story lives on the Read Committed row.

One sentence each:

  • Read Committed (Postgres default) — re-reads latest committed values per statement. Open to non-repeatable reads, phantoms, write skew.
  • Repeatable Read — Postgres' snapshot isolation: one snapshot for the whole transaction; blocks non-repeatable reads and classical phantoms, not write skew.
  • Serializable — Serializable Snapshot Isolation (SSI). Behaves as if transactions ran one at a time; raises SQLSTATE 40001 on conflict and the app retries.
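Choosing a level is one line; both forms below are standard Postgres syntax:

```sql
-- Per transaction:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... reads and writes ...
COMMIT;  -- can fail with SQLSTATE 40001 (serialization_failure);
         -- the application must retry the whole transaction

-- Or as the first statement inside an already-open transaction,
-- before any query has run:
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```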

Wrong default: assuming Read Committed is enough because you're "using transactions". For multi-row invariants, use SELECT ... FOR UPDATE or Serializable.

Write skew, the anomaly nobody mentions in interviews

Classic example: a hospital must always have at least one on-call doctor. Two doctors are on call. Each opens a transaction, queries SELECT count(*) FROM doctors WHERE on_call = true (gets 2), thinks "OK to go off-call, there's still one left", and updates their own row to on_call = false. Both commit. Now zero doctors are on call.

Neither transaction modified the row the other read. No row-level lock conflict. Read Committed and Repeatable Read both let this through. Only Serializable catches it (or SELECT ... FOR UPDATE on the read).
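The on-call scenario as SQL (a sketch; the `doctors` table, `on_call` column, and `doctor_id` values are assumed names from the story above):

```sql
-- Each doctor runs this concurrently. Under READ COMMITTED or REPEATABLE
-- READ, both counts return 2, the two UPDATEs touch *different* rows, and
-- both commit: write skew, zero doctors on call.
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;       -- both see 2
UPDATE doctors SET on_call = false WHERE doctor_id = 1;  -- the other uses 2
COMMIT;

-- Fix 1: lock every row the decision depends on, so the second reader
-- blocks until the first commits and then sees count = 1:
SELECT * FROM doctors WHERE on_call = true FOR UPDATE;

-- Fix 2: BEGIN ISOLATION LEVEL SERIALIZABLE; the second committer gets
-- SQLSTATE 40001, retries, re-reads count = 1, and stays on call.
```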

The fixture has two `accounts` rows with `balance_cents = 100000` each. Write a query that returns each account with a flag `safe_to_debit`: `true` only if debiting $50,000 (50000 cents) from that account would preserve the invariant that both accounts *together* keep at least $50,000 in total. Output: `account_id`, `balance_cents`, `safe_to_debit`, ordered by `account_id`.

Takeaway: transactions don't automatically prevent every concurrent-write bug. Read Committed is fast and safe for single-row operations. Multi-row invariants need either Serializable (handles 40001 retries) or explicit FOR UPDATE locks. Write skew is the anomaly that gets through Repeatable Read; know it exists.