BigQuery vs Snowflake — and the SELECT * that filled a quarterly bill
Mental model for picking the right warehouse for the right org, plus the cost-control levers each one gives you.
The SELECT * that filled a quarterly bill
A junior analyst at a company I worked with ran an exploratory query in BigQuery's UI:
SELECT * FROM events.clicks ORDER BY created_at DESC LIMIT 100;
The events.clicks table was partitioned by date, had four years of data, and was 1.2 PB. With no partition filter, the query scanned the entire table. At BigQuery's on-demand rate (~$6/TB scanned in 2025), the bill for that single query was about $7,000. The analyst wanted to see 100 recent rows. They saw them.
The org-level fix: enable require_partition_filter on every partitioned table — queries without a WHERE date >= ... predicate fail before they execute. The personal fix: WHERE created_at >= CURRENT_DATE - 7 becomes a habit you build in week one. The same mistake on a smaller 8 TB table costs only ~$50, but the habit you didn't form there is the one that bites you on the petabyte table.
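For tables that already exist, the org-level fix is a one-line retrofit per table (a sketch using the table from the story; BigQuery supports setting this option via ALTER TABLE):

```sql
-- BigQuery: retrofit the guardrail onto an existing partitioned table.
-- After this, queries without a predicate on the partitioning column
-- fail before they execute — and before they cost anything.
ALTER TABLE events.clicks
SET OPTIONS (require_partition_filter = TRUE);
```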
How they bill you
BigQuery
- On-demand: $5–6.25 per TB scanned (region-dependent). Only pay for what you read. Beautifully simple, ruinously bad without partition discipline.
- Reserved (slots): prepay for compute. Predictable bill, harder to right-size.
- Levers: partition (typically by date), cluster (up to 4 columns, secondary sort within a partition), require_partition_filter.
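The on-demand math is worth internalizing. A minimal sketch of the cost model, using the article's figures (1.2 PB table, four years of daily partitions) and an assumed ~$6.25/TB rate — the real rate varies by region:

```python
# BigQuery on-demand billing: cost is purely a function of bytes scanned.
# Figures from the article; rate is an assumption (region-dependent).

ON_DEMAND_USD_PER_TB = 6.25   # assumed on-demand rate
TABLE_TB = 1200               # 1.2 PB of click events
DAILY_PARTITIONS = 1460       # four years of daily partitions

def scan_cost_usd(tb_scanned: float, rate: float = ON_DEMAND_USD_PER_TB) -> float:
    """Cost of a query that scans `tb_scanned` terabytes."""
    return tb_scanned * rate

full_scan = scan_cost_usd(TABLE_TB)                    # no partition filter
one_day = scan_cost_usd(TABLE_TB / DAILY_PARTITIONS)   # WHERE on one day

print(f"full scan: ${full_scan:,.0f}")   # full scan: $7,500
print(f"one day:   ${one_day:,.2f}")     # one day:   $5.14
```

Same 100 rows either way; the only variable the bill sees is how many partitions the WHERE clause lets the engine skip.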
Snowflake
- Warehouse seconds: pay per second of compute time, with a 60-second minimum each time the warehouse resumes. Bigger warehouse = more parallelism, more $/sec.
- Levers: warehouse size + auto-suspend (default 10 min idle → drop to 60s in production), clustering (automatic on small tables, manual CLUSTER BY on large), result cache (free re-reads of identical queries within 24h).
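The warehouse-seconds model can be sketched the same way. Assumptions here: credits/hour double with each size step (XS=1, S=2, M=4, …), billing is per-second with a 60-second minimum per resume, and $3/credit is purely illustrative — the real $/credit depends on your edition and region:

```python
# Sketch of Snowflake warehouse billing under the stated assumptions.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
USD_PER_CREDIT = 3.00  # illustrative only; check your contract

def run_cost_usd(size: str, seconds: float) -> float:
    """Cost of one warehouse run: per-second, 60-second minimum."""
    billed = max(seconds, 60)
    return CREDITS_PER_HOUR[size] * (billed / 3600) * USD_PER_CREDIT

# A 5-second query on an XL still bills a full minute:
print(round(run_cost_usd("XL", 5), 2))   # 0.8
# Ten minutes on a Medium:
print(round(run_cost_usd("M", 600), 2))  # 2.0
```

This is why auto-suspend matters more than query tuning for many teams: an XL left idle bills the same per second as one doing work.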
Partition by date, cluster by user — what gets read
date partitions → Mar 1 │ Mar 2 │ Mar 3 │ Mar 4 │ … (1460 daily slabs)
↓
inside one day, clustered by user_id:
user 0001..0099 │ 0100..0199 │ 0200..0299 │ …
(contiguous on disk per cluster)
WHERE created_at >= '2025-03-03'
AND created_at < '2025-03-04' ◄ partition prune: 1 of 1460 (~0.07 %)
AND user_id = 47 ◄ cluster prune: 1 cluster within that day
Without the date filter: full 1.2 PB scan. With require_partition_filter = TRUE on the table, the query fails before it costs you anything — that's the org-level fix from the story.
-- BigQuery
CREATE TABLE events.clicks
PARTITION BY DATE(created_at)
CLUSTER BY user_id, country
OPTIONS (
require_partition_filter = TRUE -- queries without WHERE on created_at fail
)
AS SELECT * FROM staging.clicks;
-- Snowflake
CREATE TABLE events.clicks (
...
)
CLUSTER BY (DATE_TRUNC('day', created_at), country);
-- Snowflake doesn't have hard partitions; clustering keys reorganize micro-partitions.
-- The query that uses both:
SELECT user_id, COUNT(*) AS clicks
FROM events.clicks
WHERE created_at >= '2025-03-01' -- partition prune (BQ) or partition skip (SF)
AND country = 'US' -- cluster prune
GROUP BY user_id;
-- Without these predicates: full scan, full bill.
Takeaway: cloud warehouses are simple to operate and easy to overspend. BigQuery: partition + cluster + require_partition_filter. Snowflake: warehouse size + auto-suspend + clustering. Watch the cost dashboard every Monday. The $7k-on-a-petabyte story is real; the prevention is WHERE created_at >= ... as a typing habit.