← Deep SQL & Production Data Engineering

B-tree, GIN, BRIN and the field guide to picking one

A short tour of the index types Postgres ships with — and the times reaching for the wrong one made queries slower, not faster.

The 14 GB index that made writes 6× slower

A team I knew had a 200M-row events table with 11 indexes — one per column anyone had ever filtered on. The reads were fast. The writes were terrible. INSERT throughput had dropped from 60k/sec to 10k/sec over a year.

We ran pg_stat_user_indexes and discovered six of those indexes had zero scans in 90 days. Six indexes nobody used, costing every INSERT six extra B-tree updates and six WAL records, and 14 GB of disk on the primary plus 14 GB on every replica.

We dropped them. INSERT throughput went back to 55k/sec. Reads got no slower because nobody had been using those indexes.

The lesson: every index is a write tax. Pay it only for read patterns you can name.

A field guide

B-tree — the default. Equality and range queries on ordered types. ~95% of all indexes you'll ever create.
GIN — Generalized Inverted Index. Multi-valued data: jsonb keys, arrays, full-text search vectors. Build is slow, lookups are fast.
GiST — Generalized Search Tree. Overlap and proximity queries on geometric types, range types, trigrams. The right answer for &&, <->, @> operators on geo or range data.
BRIN — Block Range INdex. Stores a summary per N pages of physical disk. Tiny on disk (KB instead of GB). Perfect for append-only time-series where physical order tracks logical order. Useless for random access.

(Postgres also ships Hash indexes, but they're equality-only, lose to B-tree on every other axis, and rarely earn their keep — skip them unless you're benchmarking.)

B-tree finds a row; BRIN finds a range

B-TREE lookup for `email = 'ada@example.com'` on a 200M-row table:

  read 1   ROOT     [_, A, F, K, Q, V]              fanout ≈ 250
  read 2   BRANCH   [Aa, Ad, Ag, Al, Ap, …]
  read 3   LEAF     ada@example.com → tid (heap pointer)
  read 4   HEAP     customer_id, name, country, …   (skipped if covering)

  log_250(2×10⁸) ≈ 4 levels. Same speed at any sane scale.

BRIN on append-only events(occurred_at):

  heap pages →  [1..4]        [5..8]        [9..12]   …  (millions)
  summary       [09:00, 09:59][10:00, 10:59][11:00, 11:59]

  WHERE occurred_at BETWEEN '10:30' AND '10:45'
     → skip range 0, read range 1 (4 pages), skip range 2.

  On a 200M-row table, BRIN is kilobytes; B-tree on same column is GB.

B-tree wins when you need one row and any row order is possible. BRIN wins when the column tracks physical insert order (append-only occurred_at, monotonic id) and you query by range. Opposite ends of the spectrum, not interchangeable.

Functions on indexed columns disable the index.

-- index on customers(email) exists
WHERE lower(email) = 'ada@example.com'
  -- the planner cannot use the index — it would have to call lower() on every row

Fix with an expression index:

CREATE INDEX customers_email_lower_idx ON customers (lower(email));

Now WHERE lower(email) = 'ada@example.com' uses the new index. Same trap with WHERE date_trunc('day', created_at) = '2025-01-15' — needs an expression index on date_trunc('day', created_at) or a rewrite to WHERE created_at >= '2025-01-15' AND created_at < '2025-01-16'.

Use `EXPLAIN` to confirm Postgres uses an index for one filter and not for the other. The fixture's `customers` table has a `UNIQUE` index on `email` (a B-tree). Compare these two queries by checking the plan node type. Return one row, one column `index_used_for_lower`, with value `false` (because `lower(email) = ...` cannot use the plain `email` index).

You have a 200M-row `events` table that is ingested in time order and queried mostly with `WHERE occurred_at BETWEEN ... AND ...`. Which index is the most space-efficient choice?

Takeaway: every index is a write tax. Pay it only for query patterns you can name. Audit pg_stat_user_indexes monthly and drop the unused. Match the index type to the query type — B-tree for the 95%, BRIN for append-only time-series, GIN for jsonb / arrays / full-text, GiST for geometry and ranges.