duckdb · snowflake · cost-reduction · data-warehouse

We Replaced Our $8k/month Snowflake with DuckDB and Parquet

Michael San Martim · 2026-04-23

This is the story of how we went from $8,000/month on Snowflake to effectively $0/month (about $59/month in S3 costs) — without losing anything that mattered to our team of 12 analysts.

The Problem

Our Snowflake bill was growing 20% month-over-month. We had:

  • 2.3 TB of data across 47 tables
  • 12 analysts running queries
  • 90% of queries were simple: aggregations, filters, joins on 3-4 tables
  • Average query scanned less than 50 GB
  • The warehouse was idle 18 hours/day

We were paying enterprise data warehouse prices for workloads that a laptop could handle.

The Discovery

Parquet files on S3 + DuckDB = a free data warehouse for our scale.

DuckDB can:

  • Read Parquet directly from S3 (no COPY INTO, no staging)
  • Execute complex SQL with joins, window functions, CTEs
  • Process 50 GB in seconds on a single machine
  • Cache results for repeated queries
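
To make the S3 piece concrete, here is a minimal sketch in plain DuckDB SQL. It assumes the httpfs extension is available, that credentials come from the environment, and the bucket path is illustrative:

-- Enable reading directly from S3 (no COPY INTO, no staging)
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-east-1';  -- region of the (illustrative) bucket

-- Query Parquet files in place, straight from the bucket
SELECT product_category, SUM(amount) AS revenue
FROM read_parquet('s3://company-lake/curated/finance/revenue/*.parquet')
GROUP BY product_category;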

The missing piece was tooling. We needed:

  • Table discovery (what’s in the bucket?)
  • Schema management (what columns exist?)
  • A query interface for analysts
  • AI integration for business users

DataSpoc Lens provided all of that.

The Migration

Step 1: Export from Snowflake to Parquet

We already had our raw data in S3 (Snowflake was loading from there). For curated tables, we exported:

Terminal window
pip install dataspoc-pipe
# We used Pipe to extract our Snowflake curated tables to Parquet
dataspoc-pipe add snowflake \
--account xy12345.us-east-1 \
--database ANALYTICS \
--schema CURATED \
--tables revenue,customers,products,orders,sessions \
--destination s3://company-lake

One-time migration took 15 minutes for 2.3 TB.
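
If you would rather not add another tool for the export, Snowflake's native Parquet unload does the same job. A sketch, assuming an external stage (here called my_s3_stage, a hypothetical name) already points at the bucket; repeat per table:

-- Unload one curated table to Parquet on the existing S3 stage
COPY INTO @my_s3_stage/curated/finance/revenue/
FROM ANALYTICS.CURATED.REVENUE
FILE_FORMAT = (TYPE = PARQUET)
HEADER = TRUE;  -- keep the real column names in the Parquet output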

Step 2: Point Lens at the Bucket

Terminal window
pip install dataspoc-lens
dataspoc-lens add-bucket s3://company-lake
dataspoc-lens discover

Output:

Discovered 47 tables in s3://company-lake:
raw/postgres/orders (1.2M rows, 340 MB)
raw/postgres/customers (89K rows, 12 MB)
raw/postgres/products (2.4K rows, 1.1 MB)
curated/finance/revenue (4.2M rows, 890 MB)
curated/product/sessions (12M rows, 2.1 GB)
...
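
Discovery like this can also be approximated with ordinary DuckDB if you want to sanity-check a bucket by hand. A rough sketch against one of the discovered tables:

-- List the Parquet files behind one table
SELECT * FROM glob('s3://company-lake/raw/postgres/orders/*.parquet');

-- Inspect that table's columns and types without downloading the data
DESCRIBE SELECT *
FROM read_parquet('s3://company-lake/raw/postgres/orders/*.parquet');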

Step 3: Query Like Snowflake

The SQL is essentially identical — only the fully qualified table names change. Every query we had in Snowflake ran in DuckDB without any other modification:

Snowflake:

-- snowflake worksheet
SELECT
  DATE_TRUNC('month', order_date) AS month,
  product_category,
  SUM(amount) AS revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM analytics.curated.revenue
WHERE order_date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;

DataSpoc Lens (same SQL):

Terminal window
dataspoc-lens query "
SELECT
  DATE_TRUNC('month', order_date) AS month,
  product_category,
  SUM(amount) AS revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM curated.finance.revenue
WHERE order_date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2
"

Same results. Zero cost. 1.2 seconds instead of 3.4 seconds (Snowflake XS warehouse).
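
If you want to reproduce the warehouse-style table names in a bare DuckDB session, a view over the Parquet prefix gets you most of the way there. A sketch using a two-part name instead of the three-part one above:

-- One-time setup: expose the Parquet prefix under a friendly name
CREATE SCHEMA IF NOT EXISTS finance;
CREATE OR REPLACE VIEW finance.revenue AS
  SELECT * FROM read_parquet('s3://company-lake/curated/finance/revenue/*.parquet');

-- After that, the query reads almost exactly like the warehouse version
SELECT
  DATE_TRUNC('month', order_date) AS month,
  product_category,
  SUM(amount) AS revenue
FROM finance.revenue
WHERE order_date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;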

Step 4: Give Analysts an Interactive Shell

Terminal window
dataspoc-lens shell
DataSpoc Lens Shell (DuckDB 0.10.1)
Connected to: s3://company-lake (47 tables)
Type .tables to list, .schema <table> to inspect, .quit to exit
lens> .tables
┌──────────────────────────┬──────┬────────┐
│ table                    │ rows │ size   │
├──────────────────────────┼──────┼────────┤
│ curated.finance.revenue  │ 4.2M │ 890 MB │
│ curated.product.sessions │ 12M  │ 2.1 GB │
│ raw.postgres.orders      │ 1.2M │ 340 MB │
│ raw.postgres.customers   │ 89K  │ 12 MB  │
│ ...                      │      │        │
└──────────────────────────┴──────┴────────┘
lens> SELECT COUNT(*) FROM curated.finance.revenue WHERE order_date >= '2024-01-01';
┌─────────┐
│ count   │
├─────────┤
│ 1847293 │
└─────────┘
(0.4s)

Analysts who knew SQL were productive immediately. No retraining.
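
If an analyst ends up in a plain DuckDB client instead of the Lens shell, the same inspection workflow is available through standard SQL statements. A sketch, assuming the finance.revenue view from the sketch in Step 3:

SHOW ALL TABLES;            -- list tables and views across the catalog
DESCRIBE finance.revenue;   -- column names and types
SUMMARIZE finance.revenue;  -- quick per-column stats (scans the data, so slower over S3)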

Step 5: AI Ask for Business Users

For the CEO and PMs who don’t write SQL:

Terminal window
dataspoc-lens ask "What was our MRR growth last quarter?"
Based on curated.finance.revenue:
MRR grew from $1.23M (Jul 2024) to $1.47M (Sep 2024), a 19.5% increase
over Q3. The strongest month was September (+8.2% MoM), driven by
Enterprise plan upgrades.
SQL used:
SELECT DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS mrr
FROM curated.finance.revenue
WHERE order_date >= '2024-07-01' AND order_date < '2024-10-01'
  AND revenue_type = 'recurring'
GROUP BY 1 ORDER BY 1

The AI shows its work — the exact SQL query — so analysts can verify.

Performance Comparison

We benchmarked our 20 most common queries:

Query Type                   | Snowflake (XS) | DataSpoc Lens (DuckDB)
Simple aggregation (1 table) | 1.8s           | 0.3s
Join 2 tables + group by     | 3.2s           | 1.1s
Window function over 1M rows | 4.1s           | 1.8s
Full table scan 12M rows     | 6.7s           | 4.2s
Complex CTE with 4 joins     | 8.3s           | 3.9s

DuckDB was faster for every single query at our scale. The secret: no network round-trip to a cloud warehouse, Parquet’s columnar format means DuckDB only reads the columns it needs, and S3 reads are parallelized.
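
The column-pruning effect is easy to check yourself: ask DuckDB for the query plan and the Parquet scan lists only the columns the query actually touches. A sketch (exact plan output varies by version):

-- The scan's projection list should contain only order_date and amount;
-- the other columns in the files are never fetched from S3
EXPLAIN
SELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS revenue
FROM read_parquet('s3://company-lake/curated/finance/revenue/*.parquet')
GROUP BY 1;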

The Cost Comparison

Cost Center           | Snowflake | DataSpoc Lens
Compute (warehouse)   | $6,200/mo | $0
Storage (managed)     | $1,800/mo | $0 (already paying for S3)
S3 storage            | $47/mo    | $47/mo (same data)
S3 GET requests       | n/a       | $12/mo
DataSpoc Lens license | n/a       | $0 (open source)
Total                 | $8,047/mo | $59/mo

Annual savings: ($8,047 - $59) × 12 = $95,856.

What We Gained

Beyond cost savings:

  1. No warehouse management. No sizing, no auto-suspend tuning, no credit monitoring.
  2. Instant start. No cold-start warehouse spin-up (15-45 seconds on Snowflake).
  3. Local development. Analysts can query the same data locally with dataspoc-lens shell.
  4. AI-native. MCP server means Claude, GPT, and internal agents query our data directly.
  5. Portable. Not locked into any vendor. Parquet is an open format.

What We Lost (and honest caveats)

This approach does NOT replace Snowflake if you have:

Requirement              | Snowflake                | DataSpoc Lens
50+ concurrent analysts  | Handles it               | Single-user per process
Petabyte-scale data      | Distributed compute      | Single-machine (limit ~100 GB effective)
Fine-grained RBAC        | Native roles + policies  | IAM-based (bucket level)
Time travel / versioning | Built-in                 | You manage Parquet versions
Semi-structured (JSON)   | VARIANT type             | DuckDB JSON functions (works, less elegant)
Governance / lineage     | Native + partners        | DIY
Data sharing             | Secure shares            | Share the bucket (less governed)

Our rule of thumb: if your data fits in a single machine’s memory (up to ~100 GB actively queried), you probably don’t need a cloud warehouse. That covers 80% of startups and mid-market companies.

The Migration Playbook

For teams considering this:

  1. Audit your Snowflake usage. Query ACCOUNT_USAGE.QUERY_HISTORY for data volumes scanned and concurrency (see the example query after the command summary below).
  2. If 90% of queries scan < 50 GB: you’re a candidate.
  3. Export curated tables to Parquet on S3. Use DataSpoc Pipe or COPY INTO with Parquet format.
  4. Set up Lens. pip install dataspoc-lens && dataspoc-lens add-bucket s3://your-bucket
  5. Run your top 20 queries. Validate correctness and performance.
  6. Migrate analysts gradually. Keep Snowflake alive for 1 month as fallback.
  7. Turn off the warehouse.
Terminal window
# The full migration in 4 commands
pip install dataspoc-pipe dataspoc-lens
dataspoc-pipe add snowflake --account YOUR_ACCOUNT --tables ALL --destination s3://your-lake
dataspoc-pipe run
dataspoc-lens add-bucket s3://your-lake
dataspoc-lens shell # You're done
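
For step 1, the audit boils down to one query against Snowflake's own metadata. A hedged sketch; the 90-day window and 50 GB threshold are simply the ones we cared about:

-- Share of recent SELECTs that scanned under 50 GB, plus the p90 scan size
SELECT
  COUNT_IF(BYTES_SCANNED < 50 * 1024 * 1024 * 1024) / COUNT(*) AS pct_under_50gb,
  APPROX_PERCENTILE(BYTES_SCANNED, 0.9) AS p90_bytes_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -90, CURRENT_TIMESTAMP())
  AND QUERY_TYPE = 'SELECT';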

One Year Later

12 months after the migration:

  • Saved $95K in Snowflake costs
  • Query performance improved (no cold starts)
  • Analysts are happier (instant shell, AI Ask)
  • Zero incidents related to the migration
  • Added MCP server so product bots query data autonomously

The $8K/month was buying us things we didn’t need. For teams at our scale, the cloud data warehouse is a solved problem — and the solution is free.
