Quickstart
This guide walks you through setting up DataSpoc Lens and running your first queries against a data lake.
1. Install Lens
pip install dataspoc-lens[s3]

Replace [s3] with [gcs] or [azure] depending on your cloud provider.
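To confirm that the install step succeeded, the package metadata can be queried from Python. The following is a minimal sketch that assumes only the distribution name `dataspoc-lens` used in the pip command above; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "dataspoc-lens"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"dataspoc-lens: {found or 'not installed'}")
```

Run it with the same Python interpreter that pip installed into; a different virtual environment will report the package as not installed.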
2. Initialize configuration
dataspoc-lens init
Initialized DataSpoc Lens in ~/.dataspoc-lens

This creates the configuration directory at ~/.dataspoc-lens/ with a default config.yaml.
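A script that depends on Lens may want to check that initialization has already happened. The sketch below assumes only the location stated above (a config.yaml inside ~/.dataspoc-lens/); the function names are hypothetical and the file's contents are not inspected.

```python
from pathlib import Path
from typing import Optional


def config_file(home: Optional[Path] = None) -> Path:
    """Path where this quickstart says `dataspoc-lens init` writes config.yaml."""
    base = home if home is not None else Path.home()
    return base / ".dataspoc-lens" / "config.yaml"


def is_initialized(home: Optional[Path] = None) -> bool:
    """True if the default config file exists under the given home directory."""
    return config_file(home).is_file()


if __name__ == "__main__":
    state = "found" if is_initialized() else "missing"
    print(f"{config_file()}: {state}")
```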
3. Register a bucket
dataspoc-lens add-bucket s3://my-company-data
Bucket added: s3://my-company-data
Discovering tables...
┌──────────────┬─────────┬──────┬────────────┐
│ Table        │ Columns │ Rows │ Source     │
├──────────────┼─────────┼──────┼────────────┤
│ customers    │ 8       │ 5420 │ postgres   │
│ orders       │ 12      │ 48k  │ postgres   │
│ products     │ 6       │ 312  │ postgres   │
└──────────────┴─────────┴──────┴────────────┘

3 table(s) found.

Lens reads the manifest written by DataSpoc Pipe (or scans for .parquet files) and mounts each table as a DuckDB view.
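The idea of mounting Parquet files as DuckDB views can be illustrated with a small sketch. This is not Lens's actual discovery code: it assumes a local directory stands in for the bucket, that each view is named after the file stem, and that no manifest is involved. It only builds the SQL text, using DuckDB's `read_parquet` table function, and does not connect to DuckDB.

```python
import tempfile
from pathlib import Path


def view_statements(root: Path) -> list:
    """Build one CREATE VIEW statement per .parquet file found under root.

    Assumption for this sketch: the view name is the file stem
    (orders.parquet -> orders). Lens may name tables differently.
    """
    statements = []
    for path in sorted(root.rglob("*.parquet")):
        table = path.stem
        statements.append(
            f'CREATE OR REPLACE VIEW "{table}" AS '
            f"SELECT * FROM read_parquet('{path.as_posix()}');"
        )
    return statements


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Empty placeholder files; real Parquet data is not needed to build the SQL.
        for name in ("customers", "orders", "products"):
            (root / f"{name}.parquet").touch()
        for statement in view_statements(root):
            print(statement)
```

Because a view stores only the query, the data stays in the Parquet files and is read when the view is queried, which is why registering a bucket does not need to copy any rows.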
4. Explore the catalog
dataspoc-lens catalog
┌──────────────┬─────────┬──────┬────────────┐
│ Table        │ Columns │ Rows │ Source     │
├──────────────┼─────────┼──────┼────────────┤
│ customers    │ 8       │ 5420 │ postgres   │
│ orders       │ 12      │ 48k  │ postgres   │
│ products     │ 6       │ 312  │ postgres   │
└──────────────┴─────────┴──────┴────────────┘

See column details for a specific table:

dataspoc-lens catalog --detail orders
┌─────────────────┬───────────┐
│ Column          │ Type      │
├─────────────────┼───────────┤
│ order_id        │ INTEGER   │
│ customer_id     │ INTEGER   │
│ order_date      │ DATE      │
│ total           │ DOUBLE    │
│ status          │ VARCHAR   │
└─────────────────┴───────────┘
5. Run a SQL query
dataspoc-lens query "SELECT status, COUNT(*) as cnt FROM orders GROUP BY status"
┌───────────┬───────┐
│ status    │ cnt   │
├───────────┼───────┤
│ completed │ 32100 │
│ pending   │ 8450  │
│ cancelled │ 2130  │
└───────────┴───────┘

(3 row(s), 0.142s)
6. Open the interactive shell
dataspoc-lens shell
DataSpoc Lens Shell (DuckDB)
Type SQL or .help for commands.

lens> SELECT * FROM customers LIMIT 3;
┌─────┬──────────────┬───────────────────────┐
│ id  │ name         │ email                 │
├─────┼──────────────┼───────────────────────┤
│ 1   │ Alice Smith  │ alice@example.com     │
│ 2   │ Bob Johnson  │ bob@example.com       │
│ 3   │ Carol White  │ carol@example.com     │
└─────┴──────────────┴───────────────────────┘

(3 row(s), 0.008s)

lens> .quit
7. Set up AI queries (optional)
For free local AI using Ollama:

dataspoc-lens setup-ai

Or configure a cloud provider:

export DATASPOC_LLM_PROVIDER=anthropic
export DATASPOC_LLM_API_KEY=sk-ant-...
8. Ask questions in natural language
dataspoc-lens ask "What are the top 5 customers by total spending?"
SQL: SELECT c.name, SUM(o.total) as total_spent FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.name ORDER BY total_spent DESC LIMIT 5

┌──────────────┬─────────────┐
│ name         │ total_spent │
├──────────────┼─────────────┤
│ Alice Smith  │ 15420.50    │
│ Bob Johnson  │ 12300.00    │
│ Carol White  │ 9870.25     │
│ Dave Brown   │ 8540.00     │
│ Eve Davis    │ 7210.75     │
└──────────────┴─────────────┘

(5 row(s), 1.230s)
Next steps
- Interactive Shell: learn dot commands and shell features
- AI Ask: configure AI providers and advanced usage
- Notebooks: use Jupyter or Marimo with your data
- Commands Reference: full CLI reference