
BigQuery vs Snowflake — and the SELECT * that filled a quarterly bill

Mental model for picking the right warehouse for the right org, plus the cost-control levers each one gives you.

The SELECT * that filled a quarterly bill

A junior analyst at a company I worked with ran an exploratory query in BigQuery's UI:

SELECT * FROM events.clicks ORDER BY created_at DESC LIMIT 100;

The events.clicks table was partitioned by date, held four years of data, and weighed 1.2 PB. With no partition filter, the query scanned the entire table: LIMIT caps the rows returned, not the bytes read, and the ORDER BY has to sort them all anyway. At BigQuery's on-demand rate (~$6/TB scanned in 2025), the bill for that single query was about $7,000. The analyst wanted to see 100 recent rows. They saw them.

The org-level fix: enable require_partition_filter on every partitioned table, so queries without a WHERE date >= ... predicate fail before they execute (a retrofit sketch follows). The personal fix: WHERE created_at >= CURRENT_DATE - 7 becomes a habit you build in week one. The same mistake on a smaller 8 TB table costs only $50, but the habit you didn't form there is the one that bites you on the petabyte table.
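
Both fixes fit on a few lines. A minimal sketch, using the story's table and BigQuery syntax:

-- Org-level: retrofit the guard onto an existing partitioned table
ALTER TABLE events.clicks
SET OPTIONS (require_partition_filter = TRUE);

-- Personal: the habit version of the original query. Same 100 rows,
-- but 7 days of a 4-year table is ~0.5% of the scan.
SELECT *
FROM events.clicks
WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY created_at DESC
LIMIT 100;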

How they bill you

BigQuery

  • On-demand: $5–6.25 per TB scanned (region-dependent). You pay only for what you read. Beautifully simple, ruinously expensive without partition discipline. The scan estimate is free before you run (dry-run sketch after this list).
  • Reserved (slots): prepay for compute. Predictable bill, harder to right-size.
  • Levers: partition (by date typically), cluster (up to 4 columns, secondary sort within partition), require_partition_filter.
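
On-demand has one built-in guardrail: the scan estimate costs nothing. A sketch with the bq CLI; the console's query validator shows the same number as you type:

# Dry run: validates the query and reports the bytes it would scan, bills nothing
bq query --use_legacy_sql=false --dry_run \
  'SELECT * FROM events.clicks ORDER BY created_at DESC LIMIT 100'
# For the story's table this would have reported ~1.2 PB: the $7,000 warning, for free.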

Snowflake

  • Warehouse seconds: per-second billing for compute, with a 60-second minimum each time a warehouse resumes. Bigger warehouse = more parallelism, more $/sec (each size step up doubles credits per hour).
  • Levers: warehouse size + auto-suspend (default 10 min idle → drop to 60s in production; sketch after this list), clustering (micro-partitioning is automatic; small tables rarely need more, large ones warrant an explicit CLUSTER BY), result cache (free re-reads of identical queries within 24h).
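
Both levers are plain DDL. A sketch; the warehouse name is invented for illustration:

-- Size and suspend policy dominate the bill on most accounts
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'  -- each size step up doubles credits per hour
  AUTO_SUSPEND   = 60        -- seconds of idle before suspending (default 600)
  AUTO_RESUME    = TRUE;     -- wakes transparently on the next query

-- Retrofit the tighter idle window onto an existing warehouse
ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60;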

Partition by date, cluster by user — what gets read

  date partitions →   Mar 1 │ Mar 2 │ Mar 3 │ Mar 4 │ … (1460 daily slabs)
                                       ↓
  inside one day, clustered by user_id:
                      user 0001..0099 │ 0100..0199 │ 0200..0299 │ …
                      (contiguous on disk per cluster)

  WHERE created_at >= '2025-03-03'
    AND created_at <  '2025-03-04'    ◄ partition prune: 1 of 1460 (~0.07 %)
    AND user_id = 47                  ◄ cluster prune: 1 cluster within that day

Without the date filter: full 1.2 PB scan. With require_partition_filter = TRUE on the table, the query fails before it costs you anything — that's the org-level fix from the story.
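
From the analyst's chair, the guard looks like this. The error text is paraphrased from memory and may differ slightly across versions:

-- With the guard on, the story's query is rejected before it runs:
SELECT * FROM events.clicks ORDER BY created_at DESC LIMIT 100;
-- Error: Cannot query over table 'events.clicks' without a filter over
-- column(s) 'created_at' that can be used for partition elimination.
-- Cost: $0. Add a WHERE on created_at and it prunes to the days you asked for.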

Partition and cluster — the cost lever

-- BigQuery
CREATE TABLE events.clicks
PARTITION BY DATE(created_at)
CLUSTER BY user_id, country
OPTIONS (
  require_partition_filter = TRUE  -- queries without WHERE on created_at fail
)
AS SELECT * FROM staging.clicks;

-- Snowflake
CREATE TABLE events.clicks (
  ...
)
CLUSTER BY (DATE_TRUNC('day', created_at), country);
-- Snowflake doesn't have hard partitions; clustering keys reorganize micro-partitions.

-- The query that uses both:
SELECT user_id, COUNT(*) AS clicks
FROM events.clicks
WHERE created_at >= '2025-03-01'      -- partition prune (BQ) / micro-partition prune (SF)
  AND country = 'US'                  -- cluster prune
GROUP BY user_id;
-- Without these predicates: full scan, full bill.
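
Whether the levers are actually engaging is itself queryable. A sketch of one health check per warehouse; both read metadata, not table data:

-- BigQuery: confirm rows are spread across date partitions as expected
SELECT partition_id, total_rows, total_logical_bytes
FROM events.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'clicks'
ORDER BY partition_id DESC
LIMIT 7;

-- Snowflake: clustering health on the table's defined key
-- (returns JSON; a small average_depth means well-clustered)
SELECT SYSTEM$CLUSTERING_INFORMATION('events.clicks');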

Takeaway: cloud warehouses are simple to operate and easy to overspend. BigQuery: partition + cluster + require_partition_filter. Snowflake: warehouse size + auto-suspend + clustering. Watch the cost dashboard every Monday. The $7k-on-a-petabyte story is real; the prevention is WHERE created_at >= ... as a typing habit.
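
The Monday check can itself be a query. A sketch; the region qualifier is an assumption (adjust to where your data lives), and the Snowflake view needs the usual ACCOUNT_USAGE grants:

-- BigQuery: last week's priciest queries and who ran them
SELECT user_email,
       total_bytes_billed / POW(1024, 4) AS tib_billed,
       LEFT(query, 80) AS query_start
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
ORDER BY total_bytes_billed DESC
LIMIT 20;

-- Snowflake: credits burned per warehouse over the same window
SELECT warehouse_name, SUM(credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;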