AI Ask

DataSpoc Lens can translate natural language questions into SQL, execute the query, and return the results. No SQL knowledge required.

Setup

Option 1: Local AI with Ollama (free, no API key)

dataspoc-lens setup-ai

This installs and configures Ollama with a local model. Completely free, no data leaves your machine.

Option 2: Cloud providers

Install the AI extra:

pip install dataspoc-lens[ai]

Configure via environment variables:

# Anthropic (Claude)
export DATASPOC_LLM_PROVIDER=anthropic
export DATASPOC_LLM_API_KEY=sk-ant-...

# OpenAI (GPT)
export DATASPOC_LLM_PROVIDER=openai
export DATASPOC_LLM_API_KEY=sk-...

Or add to ~/.dataspoc-lens/config.yaml:

llm:
  provider: anthropic     # ollama, anthropic, or openai
  api_key: sk-ant-...     # not needed for ollama
  model: ""               # optional, uses provider default

Providers

Provider	Cost	Privacy	Model
`ollama`	Free	Local (no data sent externally)	Configurable (default set by setup-ai)
`anthropic`	Paid (per token)	Data sent to Anthropic API	Claude (latest)
`openai`	Paid (per token)	Data sent to OpenAI API	GPT (latest)

Configuration priority

Environment variables (DATASPOC_LLM_PROVIDER, DATASPOC_LLM_API_KEY, DATASPOC_LLM_MODEL)
~/.dataspoc-lens/config.yaml
Default: ollama

Usage

dataspoc-lens ask "What are the top 5 customers by total spending?"

SQL: SELECT c.name, SUM(o.total) as total_spent
     FROM customers c JOIN orders o ON c.id = o.customer_id
     GROUP BY c.name ORDER BY total_spent DESC LIMIT 5

┌──────────────┬─────────────┐
│ name         │ total_spent │
├──────────────┼─────────────┤
│ Alice Smith  │ 15420.50    │
│ Bob Johnson  │ 12300.00    │
│ Carol White  │ 9870.25     │
│ Dave Brown   │ 8540.00     │
│ Eve Davis    │ 7210.75     │
└──────────────┴─────────────┘

(5 row(s), 1.230s)

Debug mode

Use --debug to see the full prompt sent to the LLM:

dataspoc-lens ask "Monthly revenue trend" --debug

--- Prompt sent to LLM ---
You are a SQL assistant. Given the following table schemas, write a DuckDB SQL
query to answer the user's question.

Tables:
- customers (id INTEGER, name VARCHAR, email VARCHAR, ...)
- orders (order_id INTEGER, customer_id INTEGER, order_date DATE, total DOUBLE, ...)
- products (product_id INTEGER, name VARCHAR, price DOUBLE, ...)

Question: Monthly revenue trend

Return ONLY the SQL query, no explanation.
--- End of prompt ---

SQL: SELECT DATE_TRUNC('month', order_date) as month,
     SUM(total) as revenue
     FROM orders GROUP BY month ORDER BY month

┌────────────┬──────────┐
│ month      │ revenue  │
├────────────┼──────────┤
│ 2024-01-01 │ 125400   │
│ 2024-02-01 │ 138200   │
│ 2024-03-01 │ 152800   │
└────────────┴──────────┘

(3 row(s), 1.450s)

How it works

Lens reads the schema of all mounted tables (column names and types)
A sample of rows is included for context
The schema, sample data, and your question are sent to the configured LLM
The LLM returns a DuckDB-compatible SQL query
Lens executes the SQL against your data lake
Results are displayed (or exported with --export)

Export results

dataspoc-lens ask "Revenue by product category" --export revenue.csv
dataspoc-lens ask "Customer segments" --export segments.parquet

JSON output

dataspoc-lens ask "Total orders" --output json

{
  "sql": "SELECT COUNT(*) as total_orders FROM orders",
  "columns": ["total_orders"],
  "rows": [[48680]],
  "duration": 0.95,
  "error": null
}

Examples

# Aggregations
dataspoc-lens ask "What is the average order value?"

# Joins
dataspoc-lens ask "Which products have never been ordered?"

# Time series
dataspoc-lens ask "Show weekly order counts for the last 3 months"

# Filtering
dataspoc-lens ask "How many orders were cancelled in January?"