
Examples

All examples are available in the dataspoc-lens/examples directory on GitHub.


sdk_usage.py shows how to use Lens programmatically from Python instead of the CLI.

"""Example: Using the DataSpoc Lens Python SDK."""
from dataspoc_lens import LensClient
# Connect to your data lake (reads ~/.dataspoc-lens/config.yaml)
with LensClient() as client:
# 1. Discover tables
print("Tables:", client.tables())
# 2. Inspect schema
schema = client.schema("orders")
for col in schema:
print(f" {col['column_name']}: {col['data_type']}")
# 3. Run SQL
result = client.query("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY 1")
print(f"\nQuery returned {result['row_count']} rows in {result['duration']:.3f}s")
for row in result["rows"]:
print(f" {row}")
# 4. Ask in natural language
answer = client.ask("top 5 customers by total spending")
print(f"\nSQL: {answer['sql']}")
for row in answer["rows"]:
print(f" {row}")
# 5. Cache management
statuses = client.cache_status()
for s in statuses:
print(f" {s['table']}: {s['status']}")
# Refresh stale caches
refreshed = client.cache_refresh_stale()
if refreshed:
print(f"Refreshed {len(refreshed)} table(s)")
Method                         Description
LensClient()                   Create a client (reads ~/.dataspoc-lens/config.yaml); use as a context manager
client.tables()                List all mounted tables
client.schema(table)           Get column names and types for a table
client.query(sql)              Run a SQL query; returns {rows, row_count, duration}
client.ask(question)           Natural language to SQL (requires the AI extra and an API key)
client.cache_status()          Check cache freshness for all tables
client.cache_refresh_stale()   Re-download stale cached tables
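Since client.query returns a {rows, row_count, duration} dict, results are easy to post-process. Below is a minimal sketch that writes a result's rows out as CSV with the standard library; the sample result dict is hand-built to match the documented shape, and the assumption that each row is a column-name-to-value dict is illustrative, not guaranteed by the SDK.

```python
import csv
import io

def rows_to_csv(result, fileobj):
    """Write a Lens-style query result ({rows, row_count, duration}) as CSV.

    Assumes each row is a dict mapping column name to value (an
    assumption for this sketch; adapt if the SDK returns tuples).
    """
    rows = result["rows"]
    if not rows:
        return 0
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return result["row_count"]

# Hand-built sample in the documented return shape:
sample = {
    "rows": [{"status": "shipped", "cnt": 120}, {"status": "pending", "cnt": 7}],
    "row_count": 2,
    "duration": 0.004,
}
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

In real use you would pass `client.query(...)` straight into `rows_to_csv` with an open file handle.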

mcp_config.json configures DataSpoc Lens as an MCP server for Claude Desktop, allowing Claude to query your data lake, inspect schemas, and run SQL via natural language.

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
  1. Install DataSpoc Lens: pip install dataspoc-lens
  2. Make sure you have at least one bucket registered: dataspoc-lens add-bucket s3://my-bucket
  3. Open Claude Desktop settings
  4. Add the configuration above to your MCP servers config
  5. Restart Claude Desktop
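A malformed settings file is a common reason a server never appears in Claude Desktop. Since the config is plain JSON, a quick stdlib check catches syntax errors before you restart; this sketch inlines the snippet above, but in practice you would read your actual settings file.

```python
import json

config_text = """
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
"""

# json.loads raises JSONDecodeError on any syntax mistake,
# e.g. a trailing comma or an unquoted key.
config = json.loads(config_text)
server = config["mcpServers"]["dataspoc-lens"]
print("command:", server["command"], "args:", server["args"])
```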

Once connected, you can ask Claude things like:

  • “What tables are available in my data lake?”
  • “Show me the schema of the orders table”
  • “What are the top 10 customers by revenue?”
  • “Export the monthly sales summary to CSV”

You can register both Pipe and Lens as MCP servers to give Claude full control over the data pipeline:

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"]
    },
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}

After running the End-to-End Demo (which ingests the Iris dataset with Pipe), you can query it with Lens. Here are example queries to explore the data.

# If you haven't already run the E2E demo:
cd dataspoc-pipe
bash examples/e2e-demo.sh

# Or manually set up Lens to point at an existing lake:
dataspoc-lens init
dataspoc-lens add-bucket "file:///path/to/lake"

dataspoc-lens catalog

First 10 rows:

dataspoc-lens query "SELECT * FROM iris LIMIT 10"

Row count:

dataspoc-lens query "SELECT COUNT(*) AS total_rows FROM iris"

Average measurements per species:

dataspoc-lens query "
SELECT
  species,
  ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
  ROUND(AVG(sepal_width), 2) AS avg_sepal_wid,
  ROUND(AVG(petal_length), 2) AS avg_petal_len,
  ROUND(AVG(petal_width), 2) AS avg_petal_wid
FROM iris
GROUP BY species
ORDER BY species
"

Species distribution:

dataspoc-lens query "
SELECT species, COUNT(*) AS n
FROM iris
GROUP BY species
ORDER BY n DESC
"

Top 5 largest petals:

dataspoc-lens query "
SELECT species, petal_length, petal_width
FROM iris
ORDER BY petal_length DESC
LIMIT 5
"

Correlation between sepal and petal dimensions:

dataspoc-lens query "
SELECT
  species,
  ROUND(CORR(sepal_length, petal_length), 3) AS sepal_petal_len_corr,
  ROUND(CORR(sepal_width, petal_width), 3) AS sepal_petal_wid_corr
FROM iris
GROUP BY species
"
# Export full dataset to CSV
dataspoc-lens export "SELECT * FROM iris" --format csv --output iris_export.csv
# Export summary to JSON
dataspoc-lens export "
SELECT
  species,
  ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
  ROUND(AVG(petal_length), 2) AS avg_petal_len
FROM iris
GROUP BY species
ORDER BY species
" --format json --output iris_summary.json
dataspoc-lens shell
lens> .tables
iris
lens> .schema iris
sepal_length  DOUBLE
sepal_width   DOUBLE
petal_length  DOUBLE
petal_width   DOUBLE
species       VARCHAR
lens> SELECT species, COUNT(*) AS n FROM iris GROUP BY species;
┌────────────┬────┐
│ species    │ n  │
├────────────┼────┤
│ setosa     │ 50 │
│ versicolor │ 50 │
│ virginica  │ 50 │
└────────────┴────┘
(3 row(s), 0.005s)

Beyond the example scripts, here are complete scenarios showing Lens in real-world use.

pip install dataspoc-lens[s3]
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-ecommerce-data

dataspoc-lens query "
SELECT region, COUNT(*) AS orders, SUM(total) AS revenue
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY region
ORDER BY revenue DESC
"

Create numbered SQL files in ~/.dataspoc-lens/transforms/:

001_clean_customers.sql:

CREATE OR REPLACE TABLE clean_customers AS
SELECT
  id,
  TRIM(LOWER(email)) AS email,
  COALESCE(name, 'Unknown') AS name,
  created_at,
  CASE
    WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active'
    WHEN status IN ('inactive', 'Inactive') THEN 'inactive'
    ELSE 'unknown'
  END AS status
FROM customers
WHERE email IS NOT NULL
  AND email NOT LIKE '%test%';
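To make the transform's intent concrete, here is a rough Python equivalent for a single record. It is a sketch, not how Lens executes transforms, and the status match is case-insensitive, slightly more permissive than the SQL's explicit IN lists:

```python
def clean_customer(rec):
    """Rough Python equivalent of 001_clean_customers.sql for one record.

    Returns None for rows the WHERE clause would drop.
    """
    email = rec.get("email")
    # WHERE email IS NOT NULL AND email NOT LIKE '%test%'
    if email is None or "test" in email:
        return None
    name = rec.get("name")
    status = (rec.get("status") or "").lower()
    return {
        "id": rec["id"],
        "email": email.strip().lower(),                   # TRIM(LOWER(email))
        "name": name if name is not None else "Unknown",  # COALESCE(name, 'Unknown')
        "created_at": rec.get("created_at"),
        "status": status if status in ("active", "inactive") else "unknown",
    }

print(clean_customer({"id": 1, "email": "  Ann@Example.COM ", "status": "ACTIVE"}))
```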

002_order_metrics.sql:

CREATE OR REPLACE TABLE order_metrics AS
SELECT
  customer_id,
  COUNT(*) AS total_orders,
  SUM(total) AS lifetime_value,
  AVG(total) AS avg_order_value,
  MIN(order_date) AS first_order,
  MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id;
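The same per-customer aggregation can be spelled out in plain Python. The orders below are hypothetical, just to show what each SQL aggregate computes:

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, total, order_date)
orders = [
    (1, 20.0, "2026-01-03"),
    (1, 40.0, "2026-02-10"),
    (2, 15.0, "2026-01-20"),
]

# GROUP BY customer_id:
by_customer = defaultdict(list)
for cust, total, date in orders:
    by_customer[cust].append((total, date))

metrics = {}
for cust, rows in by_customer.items():
    totals = [t for t, _ in rows]
    dates = [d for _, d in rows]
    metrics[cust] = {
        "total_orders": len(rows),                    # COUNT(*)
        "lifetime_value": sum(totals),                # SUM(total)
        "avg_order_value": sum(totals) / len(rows),   # AVG(total)
        "first_order": min(dates),                    # MIN(order_date); ISO dates sort lexicographically
        "last_order": max(dates),                     # MAX(order_date)
    }
print(metrics[1])
```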

Run the transforms:

dataspoc-lens transform run

pip install dataspoc-lens[s3,gcs,jupyter]
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-product-data
dataspoc-lens add-bucket gs://acme-analytics
dataspoc-lens notebook

Tables from both clouds are available in a single Jupyter session, ready for cross-cloud joins and visualizations.