Three engines, one dataset — and the wrong choice that cost a quarter
Postgres, DuckDB, ClickHouse — when each wins, when each loses, and the questions that actually matter when you pick.
"We picked ClickHouse because it's fast."
A team I knew picked ClickHouse for an OLTP workload because of a benchmark blog post. Three months later they'd shipped a checkout system that took 800ms per single-row insert under load. ClickHouse is a column-store optimized for batch ingest and wide aggregations — point inserts are not its job. They migrated to Postgres in a six-week panic.
The lesson is not "ClickHouse is bad". ClickHouse is excellent for the right workload (analytics over billions of rows, where Postgres would be 100× slower). The lesson is that they picked an engine without first naming the workload. Same dataset, same SQL, totally different runtime depending on the engine.
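For contrast, here is the shape of work ClickHouse is built for. A minimal sketch with hypothetical table and column names: note the MergeTree engine and the batched insert, since ClickHouse wants thousands of rows per INSERT rather than one at a time.

```sql
-- Hypothetical ClickHouse schema: append-mostly events, queried by wide aggregations.
CREATE TABLE events (
    ts           DateTime,
    user_id      UInt64,
    event_type   LowCardinality(String),
    amount_cents Int64
) ENGINE = MergeTree
ORDER BY (event_type, ts);

-- Ingest in large batches. Every INSERT creates a new on-disk part,
-- so one-row-at-a-time checkout writes fall over under load.
INSERT INTO events VALUES
    ('2025-01-01 00:00:00', 1, 'paid', 1299),
    ('2025-01-01 00:00:01', 2, 'paid', 4500);
    -- ...ideally thousands of rows per batch

-- The payoff: wide aggregations that stay fast at billions of rows.
SELECT event_type, count(), sum(amount_cents)
FROM events
GROUP BY event_type;
```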
At a glance
| Engine | Sweet spot | Storage | Notes |
|---|---|---|---|
| Postgres | OLTP + light OLAP | row-store (heap) | The default. Extensions: pgvector, postgis, timescaledb. ACID, transactions, foreign keys. |
| DuckDB | Single-node OLAP, embedded | column-store | Reads Parquet/CSV directly. Runs in-process — the analyst-laptop database. No server to manage. |
| ClickHouse | Distributed OLAP | column-store, MergeTree | Throws billions of rows at aggregations. Weak at point lookups, no transactions, eventually consistent replication. |
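To make the Postgres row concrete, here is a minimal sketch of the checkout path from the opening story, with hypothetical table and column names. This is exactly what a row-store is built for: a transactional single-row insert guarded by constraints.

```sql
-- Hypothetical OLTP schema. Postgres does this in single-digit milliseconds.
CREATE TABLE customers (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL UNIQUE
);

CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),
    total_cents bigint NOT NULL,
    status      text NOT NULL DEFAULT 'pending',
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- The checkout path: one customer, one order, one atomic transaction.
BEGIN;
WITH c AS (
    INSERT INTO customers (email) VALUES ('ada@example.com') RETURNING id
)
INSERT INTO orders (customer_id, total_cents, status)
SELECT id, 1299, 'paid' FROM c;
COMMIT;
```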
The five questions that pick the engine for you
1. Workload shape — point lookups (Postgres) vs wide scans (column-store)?
2. Concurrency — thousands of small queries (Postgres) vs dozens of huge ones (column-store)?
3. Data volume — under 1 TB (Postgres / DuckDB) vs tens to hundreds of TB (ClickHouse, BigQuery, Snowflake)?
4. Update frequency — frequent UPDATEs (Postgres) vs append-mostly (column-store)?
5. Latency tolerance — sub-second (Postgres for OLTP, column-store for cached OLAP) vs minutes OK (Spark, Trino)?
Answer these before you read a vendor blog. Most engine debates are about (1) and (4), and most choices are wrong because nobody asked.
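Question (1) is the one that shows up directly in the SQL. A sketch against the hypothetical orders table above: the first query is the OLTP shape, the second is the OLAP shape.

```sql
-- Point lookup: a B-tree index on the primary key answers this in a few page reads.
-- This is where a row-store shines and a column-store struggles.
SELECT * FROM orders WHERE id = 123456;

-- Wide scan: touches two columns of every row. A column-store reads just those
-- two column files; a row-store has to drag in whole rows to get them.
SELECT status, SUM(total_cents) AS revenue
FROM orders
GROUP BY status;
```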
```sql
-- Inside DuckDB (or via the CLI: `duckdb`)
-- Read every Parquet file in a folder as a table:
CREATE VIEW orders AS
SELECT * FROM read_parquet('/data/lake/orders/year=2025/month=*/*.parquet');
-- Now query it like any SQL table.
SELECT date_trunc('week', created_at) AS week,
       SUM(total_cents) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1
ORDER BY 1;
-- DuckDB is column-store — this aggregates 100M rows in seconds on a laptop.
-- The Parquet files don't have to be loaded; DuckDB reads only the columns and
-- row groups it needs (predicate pushdown, projection pushdown).
```
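You can watch the projection pushdown happen in the query plan. A sketch assuming the orders view above; the exact plan output varies by DuckDB version:

```sql
EXPLAIN
SELECT date_trunc('week', created_at) AS week,
       SUM(total_cents) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1;
-- The Parquet scan node in the plan lists only the columns the query touches
-- (created_at, total_cents, status); everything else in the files is never read.
```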
Takeaway: pick the engine for the workload, not the workload for the engine. Postgres handles 80% of teams' needs. DuckDB is the analyst's secret weapon for ad-hoc Parquet analysis. ClickHouse / BigQuery / Snowflake are for when aggregations over billions of rows are the daily job. Document the why in your README — your future hire will thank you.