
Examples

All examples are available in the dataspoc-lens/examples directory on GitHub.


sdk_usage.py shows how to use Lens programmatically from Python instead of the CLI.

"""Example: Using the DataSpoc Lens Python SDK."""
from dataspoc_lens import LensClient
# Connect to your data lake (reads ~/.dataspoc-lens/config.yaml)
with LensClient() as client:
# 1. Discover tables
print("Tables:", client.tables())
# 2. Inspect schema
schema = client.schema("orders")
for col in schema:
print(f" {col['column_name']}: {col['data_type']}")
# 3. Run SQL
result = client.query("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY 1")
print(f"\nQuery returned {result['row_count']} rows in {result['duration']:.3f}s")
for row in result["rows"]:
print(f" {row}")
# 4. Ask in natural language
answer = client.ask("top 5 customers by total spending")
print(f"\nSQL: {answer['sql']}")
for row in answer["rows"]:
print(f" {row}")
# 5. Cache management
statuses = client.cache_status()
for s in statuses:
print(f" {s['table']}: {s['status']}")
# Refresh stale caches
refreshed = client.cache_refresh_stale()
if refreshed:
print(f"Refreshed {len(refreshed)} table(s)")
Method                         Description
LensClient()                   Create a client (reads ~/.dataspoc-lens/config.yaml); use as a context manager
client.tables()                List all mounted tables
client.schema(table)           Get column names and types for a table
client.query(sql)              Run a SQL query; returns {rows, row_count, duration}
client.ask(question)           Natural language to SQL (requires the AI extra and an API key)
client.cache_status()          Check cache freshness for all tables
client.cache_refresh_stale()   Re-download stale cached tables
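Since client.query returns a {rows, row_count, duration} dict, results are easy to post-process. Below is a minimal sketch that writes a result's rows out as CSV with the standard library; the sample result dict is hand-built to match the documented shape, and the assumption that each row is a column-name-to-value dict is illustrative, not guaranteed by the SDK.

```python
import csv
import io

def rows_to_csv(result, fileobj):
    """Write a Lens-style query result ({rows, row_count, duration}) as CSV.

    Assumes each row is a dict mapping column name to value (an
    assumption for this sketch; adapt if the SDK returns tuples).
    """
    rows = result["rows"]
    if not rows:
        return 0
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return result["row_count"]

# Hand-built sample in the documented return shape:
sample = {
    "rows": [{"status": "shipped", "cnt": 120}, {"status": "pending", "cnt": 7}],
    "row_count": 2,
    "duration": 0.004,
}
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

In real use you would pass `client.query(...)` straight into `rows_to_csv` with an open file handle.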

mcp_config.json configures DataSpoc Lens as an MCP server for Claude Desktop, allowing Claude to query your data lake, inspect schemas, and run SQL via natural language.

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
  1. Install DataSpoc Lens: pip install dataspoc-lens
  2. Make sure you have at least one bucket registered: dataspoc-lens add-bucket s3://my-bucket
  3. Open Claude Desktop settings
  4. Add the configuration above to your MCP servers config
  5. Restart Claude Desktop
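A malformed settings file is a common reason a server never appears in Claude Desktop. Since the config is plain JSON, a quick stdlib check catches syntax errors before you restart; this sketch inlines the snippet above, but in practice you would read your actual settings file.

```python
import json

config_text = """
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
"""

# json.loads raises JSONDecodeError on any syntax mistake,
# e.g. a trailing comma or an unquoted key.
config = json.loads(config_text)
server = config["mcpServers"]["dataspoc-lens"]
print("command:", server["command"], "args:", server["args"])
```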

Once connected, you can ask Claude things like:

  • “What tables are available in my data lake?”
  • “Show me the schema of the orders table”
  • “What are the top 10 customers by revenue?”
  • “Export the monthly sales summary to CSV”

You can register both Pipe and Lens as MCP servers to give Claude full control over the data pipeline:

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"]
    },
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}

After running the End-to-End Demo (which ingests the Iris dataset with Pipe), you can query it with Lens. Here are example queries to explore the data.

# If you haven't already run the E2E demo:
cd dataspoc-pipe
bash examples/e2e-demo.sh

# Or manually set up Lens to point at an existing lake:
dataspoc-lens init
dataspoc-lens add-bucket "file:///path/to/lake"

dataspoc-lens catalog

First 10 rows:

dataspoc-lens query "SELECT * FROM iris LIMIT 10"

Row count:

dataspoc-lens query "SELECT COUNT(*) AS total_rows FROM iris"

Average measurements per species:

dataspoc-lens query "
SELECT
  species,
  ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
  ROUND(AVG(sepal_width), 2) AS avg_sepal_wid,
  ROUND(AVG(petal_length), 2) AS avg_petal_len,
  ROUND(AVG(petal_width), 2) AS avg_petal_wid
FROM iris
GROUP BY species
ORDER BY species
"

Species distribution:

dataspoc-lens query "
SELECT species, COUNT(*) AS n
FROM iris
GROUP BY species
ORDER BY n DESC
"

Top 5 largest petals:

dataspoc-lens query "
SELECT species, petal_length, petal_width
FROM iris
ORDER BY petal_length DESC
LIMIT 5
"

Correlation between sepal and petal dimensions:

dataspoc-lens query "
SELECT
  species,
  ROUND(CORR(sepal_length, petal_length), 3) AS sepal_petal_len_corr,
  ROUND(CORR(sepal_width, petal_width), 3) AS sepal_petal_wid_corr
FROM iris
GROUP BY species
"
# Export full dataset to CSV
dataspoc-lens export "SELECT * FROM iris" --format csv --output iris_export.csv
# Export summary to JSON
dataspoc-lens export "
SELECT
  species,
  ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
  ROUND(AVG(petal_length), 2) AS avg_petal_len
FROM iris
GROUP BY species
ORDER BY species
" --format json --output iris_summary.json
dataspoc-lens shell
lens> .tables
iris
lens> .schema iris
sepal_length  DOUBLE
sepal_width   DOUBLE
petal_length  DOUBLE
petal_width   DOUBLE
species       VARCHAR
lens> SELECT species, COUNT(*) AS n FROM iris GROUP BY species;
┌────────────┬────┐
│ species    │ n  │
├────────────┼────┤
│ setosa     │ 50 │
│ versicolor │ 50 │
│ virginica  │ 50 │
└────────────┴────┘
(3 row(s), 0.005s)

Beyond the example scripts, here are complete scenarios showing Lens in real-world use.

pip install dataspoc-lens[s3]
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-ecommerce-data

dataspoc-lens query "
SELECT region, COUNT(*) AS orders, SUM(total) AS revenue
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY region
ORDER BY revenue DESC
"

Create numbered SQL files in ~/.dataspoc-lens/transforms/:

001_clean_customers.sql:

CREATE OR REPLACE TABLE clean_customers AS
SELECT
  id,
  TRIM(LOWER(email)) AS email,
  COALESCE(name, 'Unknown') AS name,
  created_at,
  CASE
    WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active'
    WHEN status IN ('inactive', 'Inactive') THEN 'inactive'
    ELSE 'unknown'
  END AS status
FROM customers
WHERE email IS NOT NULL
  AND email NOT LIKE '%test%';
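To make the transform's intent concrete, here is a rough Python equivalent for a single record. It is a sketch, not how Lens executes transforms, and the status match is case-insensitive, slightly more permissive than the SQL's explicit IN lists:

```python
def clean_customer(rec):
    """Rough Python equivalent of 001_clean_customers.sql for one record.

    Returns None for rows the WHERE clause would drop.
    """
    email = rec.get("email")
    # WHERE email IS NOT NULL AND email NOT LIKE '%test%'
    if email is None or "test" in email:
        return None
    name = rec.get("name")
    status = (rec.get("status") or "").lower()
    return {
        "id": rec["id"],
        "email": email.strip().lower(),                   # TRIM(LOWER(email))
        "name": name if name is not None else "Unknown",  # COALESCE(name, 'Unknown')
        "created_at": rec.get("created_at"),
        "status": status if status in ("active", "inactive") else "unknown",
    }

print(clean_customer({"id": 1, "email": "  Ann@Example.COM ", "status": "ACTIVE"}))
```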

002_order_metrics.sql:

CREATE OR REPLACE TABLE order_metrics AS
SELECT
  customer_id,
  COUNT(*) AS total_orders,
  SUM(total) AS lifetime_value,
  AVG(total) AS avg_order_value,
  MIN(order_date) AS first_order,
  MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id;
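The same per-customer aggregation can be spelled out in plain Python. The orders below are hypothetical, just to show what each SQL aggregate computes:

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, total, order_date)
orders = [
    (1, 20.0, "2026-01-03"),
    (1, 40.0, "2026-02-10"),
    (2, 15.0, "2026-01-20"),
]

# GROUP BY customer_id:
by_customer = defaultdict(list)
for cust, total, date in orders:
    by_customer[cust].append((total, date))

metrics = {}
for cust, rows in by_customer.items():
    totals = [t for t, _ in rows]
    dates = [d for _, d in rows]
    metrics[cust] = {
        "total_orders": len(rows),                    # COUNT(*)
        "lifetime_value": sum(totals),                # SUM(total)
        "avg_order_value": sum(totals) / len(rows),   # AVG(total)
        "first_order": min(dates),                    # MIN(order_date); ISO dates sort lexicographically
        "last_order": max(dates),                     # MAX(order_date)
    }
print(metrics[1])
```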

Run the transforms:

dataspoc-lens transform run

pip install dataspoc-lens[s3,gcs,jupyter]
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-product-data
dataspoc-lens add-bucket gs://acme-analytics
dataspoc-lens notebook

Tables from both clouds are available in a single Jupyter session, ready for cross-cloud joins and visualizations.