Quickstart

This guide walks you through setting up DataSpoc Lens and running your first queries against a data lake.

Terminal window
pip install dataspoc-lens[s3]

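Replace [s3] with [gcs] or [azure] to match your cloud provider. In zsh, quote the package spec (pip install 'dataspoc-lens[s3]') so the square brackets are not treated as a glob pattern.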
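If you script your environment setup, the choice of extra can be derived from the bucket URI. The sketch below is only an illustration: the helper name `requirement_for` is hypothetical, and the `gs://` and `az://` schemes are assumptions, since this guide only shows `s3://` and the extra names `s3`, `gcs`, and `azure`.

```python
from urllib.parse import urlparse

# Assumed mapping from bucket URI scheme to pip extra. Only s3:// appears in
# this guide; gs:// and az:// are illustrative guesses.
EXTRA_BY_SCHEME = {"s3": "s3", "gs": "gcs", "az": "azure"}


def requirement_for(bucket_uri: str) -> str:
    """Return the pip requirement string for the given bucket URI."""
    scheme = urlparse(bucket_uri).scheme
    if scheme not in EXTRA_BY_SCHEME:
        raise ValueError(f"unsupported bucket scheme: {scheme!r}")
    return f"dataspoc-lens[{EXTRA_BY_SCHEME[scheme]}]"


print(requirement_for("s3://my-company-data"))  # prints dataspoc-lens[s3]
```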

Terminal window
dataspoc-lens init
Initialized DataSpoc Lens in ~/.dataspoc-lens

This creates the configuration directory at ~/.dataspoc-lens/ with a default config.yaml.

Terminal window
dataspoc-lens add-bucket s3://my-company-data
Bucket added: s3://my-company-data
Discovering tables...
┌──────────────┬─────────┬──────┬────────────┐
│ Table │ Columns │ Rows │ Source │
├──────────────┼─────────┼──────┼────────────┤
│ customers │ 8 │ 5420 │ postgres │
│ orders │ 12 │ 48k │ postgres │
│ products │ 6 │ 312 │ postgres │
└──────────────┴─────────┴──────┴────────────┘
3 table(s) found.

Lens reads the manifest written by DataSpoc Pipe (or scans for .parquet files) and mounts each table as a DuckDB view.
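To make the idea of mounting tables as views concrete, here is a minimal sketch of how discovered .parquet files could be turned into DuckDB view definitions. It is not the Lens implementation: the function name `view_statements` and the one-directory-per-table layout are assumptions. `read_parquet` is DuckDB's own table function, and the code only builds SQL strings without connecting to anything.

```python
from pathlib import PurePosixPath


def view_statements(bucket: str, object_keys: list[str]) -> list[str]:
    """Build one CREATE VIEW statement per table directory containing .parquet files.

    Assumes a layout of <bucket>/<table>/<file>.parquet (hypothetical).
    """
    tables = set()
    for key in object_keys:
        parts = PurePosixPath(key).parts
        # Keep only parquet files that sit inside a table directory.
        if key.endswith(".parquet") and len(parts) >= 2:
            tables.add(parts[0])
    return [
        f"CREATE OR REPLACE VIEW {table} AS "
        f"SELECT * FROM read_parquet('{bucket}/{table}/*.parquet');"
        for table in sorted(tables)
    ]


keys = ["customers/part-0.parquet", "orders/a.parquet", "orders/b.parquet", "README.md"]
for statement in view_statements("s3://my-company-data", keys):
    print(statement)
```

Because a view stores only the query, nothing is copied out of the bucket; DuckDB reads the Parquet files when the view is queried.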

Terminal window
dataspoc-lens catalog
┌──────────────┬─────────┬──────┬────────────┐
│ Table │ Columns │ Rows │ Source │
├──────────────┼─────────┼──────┼────────────┤
│ customers │ 8 │ 5420 │ postgres │
│ orders │ 12 │ 48k │ postgres │
│ products │ 6 │ 312 │ postgres │
└──────────────┴─────────┴──────┴────────────┘

See column details for a specific table:

Terminal window
dataspoc-lens catalog --detail orders
┌─────────────────┬───────────┐
│ Column │ Type │
├─────────────────┼───────────┤
│ order_id │ INTEGER │
│ customer_id │ INTEGER │
│ order_date │ DATE │
│ total │ DOUBLE │
│ status │ VARCHAR │
└─────────────────┴───────────┘

Run a single SQL query from the command line:

Terminal window
dataspoc-lens query "SELECT status, COUNT(*) as cnt FROM orders GROUP BY status"
┌───────────┬───────┐
│ status │ cnt │
├───────────┼───────┤
│ completed │ 32100 │
│ pending │ 8450 │
│ cancelled │ 2130 │
└───────────┴───────┘
(3 row(s), 0.142s)
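If you call the query command from a Python script, passing the SQL as a single argument avoids shell quoting problems. This is a sketch that relies only on the `dataspoc-lens query "<sql>"` form shown above; the helper names `query_command` and `run_query` are hypothetical, and `run_query` assumes the CLI is installed and on your PATH.

```python
import subprocess


def query_command(sql: str) -> list[str]:
    """Build the argument list for `dataspoc-lens query`, with the SQL as one argument."""
    return ["dataspoc-lens", "query", sql]


def run_query(sql: str) -> str:
    """Run the query through the CLI and return whatever it prints to stdout.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        query_command(sql), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_query("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY status")` would return the same table text that the command prints in the terminal.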

Open an interactive SQL shell:

Terminal window
dataspoc-lens shell
DataSpoc Lens Shell (DuckDB)
Type SQL or .help for commands.
lens> SELECT * FROM customers LIMIT 3;
┌─────┬──────────────┬───────────────────────┐
│ id │ name │ email │
├─────┼──────────────┼───────────────────────┤
│ 1 │ Alice Smith │ alice@example.com │
│ 2 │ Bob Johnson │ bob@example.com │
│ 3 │ Carol White │ carol@example.com │
└─────┴──────────────┴───────────────────────┘
(3 row(s), 0.008s)
lens> .quit

To set up free local AI with Ollama:

Terminal window
dataspoc-lens setup-ai

Or configure a cloud LLM provider through environment variables:

Terminal window
export DATASPOC_LLM_PROVIDER=anthropic
export DATASPOC_LLM_API_KEY=sk-ant-...
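A script that wraps the AI features can check these variables up front. The sketch below uses the two variable names shown above; the helper `missing_llm_settings` is hypothetical, and treating both variables as required is an assumption about how Lens behaves.

```python
import os

# Variable names taken from this guide. Treating both as required is an assumption.
REQUIRED_VARS = ("DATASPOC_LLM_PROVIDER", "DATASPOC_LLM_API_KEY")


def missing_llm_settings(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_llm_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("LLM settings found.")
```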

Ask a question in natural language. Lens prints the generated SQL, followed by the results:

Terminal window
dataspoc-lens ask "What are the top 5 customers by total spending?"
SQL: SELECT c.name, SUM(o.total) as total_spent
FROM customers c JOIN orders o ON c.id = o.customer_id
GROUP BY c.name ORDER BY total_spent DESC LIMIT 5
┌──────────────┬─────────────┐
│ name │ total_spent │
├──────────────┼─────────────┤
│ Alice Smith │ 15420.50 │
│ Bob Johnson │ 12300.00 │
│ Carol White │ 9870.25 │
│ Dave Brown │ 8540.00 │
│ Eve Davis │ 7210.75 │
└──────────────┴─────────────┘
(5 row(s), 1.230s)
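Because the SQL is generated, it is worth reading before you rely on the numbers. As a sketch, the generated query above can be replayed on a tiny hypothetical dataset. SQLite from the Python standard library stands in for DuckDB here, and the rows are invented for illustration. The example shows one thing to watch for: grouping by c.name merges customers who share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    -- Hypothetical toy data: two different customers share the name 'Alex'.
    INSERT INTO customers VALUES (1, 'Alex'), (2, 'Alex'), (3, 'Sam');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 5.0), (3, 3, 12.0);
    """
)

# The query exactly as printed by `dataspoc-lens ask` above.
generated_sql = """
SELECT c.name, SUM(o.total) as total_spent
FROM customers c JOIN orders o ON c.id = o.customer_id
GROUP BY c.name ORDER BY total_spent DESC LIMIT 5
"""
by_name = conn.execute(generated_sql).fetchall()
print(by_name)  # [('Alex', 15.0), ('Sam', 12.0)]

# A variant that groups by the customer key keeps the two customers apart.
by_id_sql = """
SELECT c.id, c.name, SUM(o.total) AS total_spent
FROM customers c JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name ORDER BY total_spent DESC LIMIT 5
"""
by_id = conn.execute(by_id_sql).fetchall()
print(by_id)  # [(3, 'Sam', 12.0), (1, 'Alex', 10.0), (2, 'Alex', 5.0)]
```

If names are unique in your data, both queries give the same ranking; otherwise the second form is the safer one to ask for.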