B-tree, GIN, BRIN and the field guide to picking one
A short tour of the index types Postgres ships with — and the times reaching for the wrong one made queries slower, not faster.
The 14 GB index that made writes 6× slower
A team I knew had a 200M-row events table with 11 indexes — one per column anyone had ever filtered on. The reads were fast. The writes were terrible. INSERT throughput had dropped from 60k/sec to 10k/sec over a year.
We ran pg_stat_user_indexes and discovered six of those indexes had zero scans in 90 days. Six indexes nobody used, costing every INSERT six extra B-tree updates and six WAL records, and 14 GB of disk on the primary plus 14 GB on every replica.
We dropped them. INSERT throughput went back to 55k/sec. Reads got no slower because nobody had been using those indexes.
The lesson: every index is a write tax. Pay it only for read patterns you can name.
A field guide
- B-tree — the default. Equality and range queries on ordered types. ~95% of all indexes you'll ever create.
- GIN — Generalized Inverted Index. Multi-valued data:
jsonbkeys, arrays, full-text search vectors. Build is slow, lookups are fast. - GiST — Generalized Search Tree. Overlap and proximity queries on geometric types, range types, trigrams. The right answer for
&&,<->,@>operators on geo or range data. - BRIN — Block Range INdex. Stores a summary per N pages of physical disk. Tiny on disk (KB instead of GB). Perfect for append-only time-series where physical order tracks logical order. Useless for random access.
(Postgres also ships Hash indexes, but they're equality-only, lose to B-tree on every other axis, and rarely earn their keep — skip them unless you're benchmarking.)
B-tree finds a row; BRIN finds a range
B-TREE lookup for `email = 'ada@example.com'` on a 200M-row table:
read 1 ROOT [_, A, F, K, Q, V] fanout ≈ 250
read 2 BRANCH [Aa, Ad, Ag, Al, Ap, …]
read 3 LEAF ada@example.com → tid (heap pointer)
read 4 HEAP customer_id, name, country, … (skipped if covering)
log_250(2×10⁸) ≈ 4 levels. Same speed at any sane scale.
BRIN on append-only events(occurred_at):
heap pages → [1..4] [5..8] [9..12] … (millions)
summary [09:00, 09:59][10:00, 10:59][11:00, 11:59]
WHERE occurred_at BETWEEN '10:30' AND '10:45'
→ skip range 0, read range 1 (4 pages), skip range 2.
On a 200M-row table, BRIN is kilobytes; B-tree on same column is GB.
B-tree wins when you need one row and any row order is possible. BRIN wins when the column tracks physical insert order (append-only occurred_at, monotonic id) and you query by range. Opposite ends of the spectrum, not interchangeable.
Functions on indexed columns disable the index.
-- index on customers(email) exists
WHERE lower(email) = 'ada@example.com'
-- the planner cannot use the index — it would have to call lower() on every row
Fix with an expression index:
CREATE INDEX customers_email_lower_idx ON customers (lower(email));
Now WHERE lower(email) = 'ada@example.com' uses the new index. Same trap with WHERE date_trunc('day', created_at) = '2025-01-15' — needs an expression index on date_trunc('day', created_at) or a rewrite to WHERE created_at >= '2025-01-15' AND created_at < '2025-01-16'.
Use `EXPLAIN` to confirm Postgres uses an index for one filter and not for the other. The fixture's `customers` table has a `UNIQUE` index on `email` (a B-tree). Compare these two queries by checking the plan node type. Return one row, one column `index_used_for_lower`, with value `false` (because `lower(email) = ...` cannot use the plain `email` index).
You have a 200M-row `events` table that is ingested in time order and queried mostly with `WHERE occurred_at BETWEEN ... AND ...`. Which index is the most space-efficient choice?
Takeaway: every index is a write tax. Pay it only for query patterns you can name. Audit pg_stat_user_indexes monthly and drop the unused. Match the index type to the query type — B-tree for the 95%, BRIN for append-only time-series, GIN for jsonb / arrays / full-text, GiST for geometry and ranges.