# Examples
All examples are available in the `dataspoc-lens/examples` directory on GitHub.
## Python SDK
`sdk_usage.py` shows how to use Lens programmatically from Python instead of the CLI.
"""Example: Using the DataSpoc Lens Python SDK."""
from dataspoc_lens import LensClient
# Connect to your data lake (reads ~/.dataspoc-lens/config.yaml)with LensClient() as client: # 1. Discover tables print("Tables:", client.tables())
# 2. Inspect schema schema = client.schema("orders") for col in schema: print(f" {col['column_name']}: {col['data_type']}")
# 3. Run SQL result = client.query("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY 1") print(f"\nQuery returned {result['row_count']} rows in {result['duration']:.3f}s") for row in result["rows"]: print(f" {row}")
# 4. Ask in natural language answer = client.ask("top 5 customers by total spending") print(f"\nSQL: {answer['sql']}") for row in answer["rows"]: print(f" {row}")
# 5. Cache management statuses = client.cache_status() for s in statuses: print(f" {s['table']}: {s['status']}")
# Refresh stale caches refreshed = client.cache_refresh_stale() if refreshed: print(f"Refreshed {len(refreshed)} table(s)")Key SDK methods
| Method | Description |
|---|---|
| `LensClient()` | Create a client (reads `~/.dataspoc-lens/config.yaml`); use it as a context manager |
| `client.tables()` | List all mounted tables |
| `client.schema(table)` | Get column names and types for a table |
| `client.query(sql)` | Run a SQL query; returns `{rows, row_count, duration}` |
| `client.ask(question)` | Natural language to SQL (requires the AI extra and an API key) |
| `client.cache_status()` | Check cache freshness for all tables |
| `client.cache_refresh_stale()` | Re-download stale cached tables |
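These methods compose into small utilities. Below is a hypothetical helper (not part of the SDK) that dumps any query to a CSV file; it duck-types the client and assumes each entry in `result["rows"]` is a dict of column name to value, which the method table above does not pin down:

```python
import csv

def query_to_csv(client, sql, path):
    """Run a query through a Lens-style client and write the rows to CSV.

    `client` is any object with the documented .query(sql) method returning
    {"rows", "row_count", "duration"}. Assumes each row is a dict mapping
    column name to value.
    """
    result = client.query(sql)
    rows = result["rows"]
    if rows:
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return result["row_count"]
```

With the real SDK this would be called inside a `with LensClient() as client:` block, e.g. `query_to_csv(client, "SELECT * FROM orders", "orders.csv")`.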
## MCP Config (Claude Desktop)
`mcp_config.json` configures DataSpoc Lens as an MCP server for Claude Desktop, allowing Claude to query your data lake, inspect schemas, and run SQL via natural language.
{ "mcpServers": { "dataspoc-lens": { "command": "dataspoc-lens", "args": ["mcp"] } }}- Install DataSpoc Lens:
pip install dataspoc-lens - Make sure you have at least one bucket registered:
dataspoc-lens add-bucket s3://my-bucket - Open Claude Desktop settings
- Add the configuration above to your MCP servers config
- Restart Claude Desktop
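Claude Desktop keeps its MCP configuration in a single JSON file, so if you already use other MCP servers, merge the entry above rather than replacing the file. A small hypothetical helper (not part of DataSpoc Lens) that does a non-destructive merge:

```python
import json

def add_mcp_server(config_text, name, command, args):
    """Merge one MCP server entry into an existing config JSON string,
    preserving any servers that are already registered.

    `config_text` is the raw JSON of the config file, or an empty string
    if the file does not exist yet.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    return json.dumps(config, indent=2)
```

Calling `add_mcp_server(existing_json, "dataspoc-lens", "dataspoc-lens", ["mcp"])` produces a config containing both the old servers and the Lens entry.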
Once connected, you can ask Claude things like:
- “What tables are available in my data lake?”
- “Show me the schema of the orders table”
- “What are the top 10 customers by revenue?”
- “Export the monthly sales summary to CSV”
## Using both Pipe and Lens MCP servers
You can register both Pipe and Lens as MCP servers to give Claude full control over the data pipeline:
{ "mcpServers": { "dataspoc-pipe": { "command": "dataspoc-pipe", "args": ["mcp"] }, "dataspoc-lens": { "command": "dataspoc-lens", "args": ["mcp"] } }}Querying the E2E Demo lake
After running the End-to-End Demo (which ingests the Iris dataset with Pipe), you can query it with Lens. Here are example queries to explore the data.
```sh
# If you haven't already run the E2E demo:
cd dataspoc-pipe
bash examples/e2e-demo.sh
```

```sh
# Or manually set up Lens to point at an existing lake:
dataspoc-lens init
dataspoc-lens add-bucket "file:///path/to/lake"
```

### View the catalog
```sh
dataspoc-lens catalog
```

### Sample queries
First 10 rows:

```sh
dataspoc-lens query "SELECT * FROM iris LIMIT 10"
```

Row count:

```sh
dataspoc-lens query "SELECT COUNT(*) AS total_rows FROM iris"
```

Average measurements per species:

```sh
dataspoc-lens query "
  SELECT
    species,
    ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
    ROUND(AVG(sepal_width), 2) AS avg_sepal_wid,
    ROUND(AVG(petal_length), 2) AS avg_petal_len,
    ROUND(AVG(petal_width), 2) AS avg_petal_wid
  FROM iris
  GROUP BY species
  ORDER BY species"
```

Species distribution:

```sh
dataspoc-lens query "
  SELECT species, COUNT(*) AS n
  FROM iris
  GROUP BY species
  ORDER BY n DESC"
```

Top 5 largest petals:

```sh
dataspoc-lens query "
  SELECT species, petal_length, petal_width
  FROM iris
  ORDER BY petal_length DESC
  LIMIT 5"
```

Correlation between sepal and petal dimensions:

```sh
dataspoc-lens query "
  SELECT
    species,
    ROUND(CORR(sepal_length, petal_length), 3) AS sepal_petal_len_corr,
    ROUND(CORR(sepal_width, petal_width), 3) AS sepal_petal_wid_corr
  FROM iris
  GROUP BY species"
```
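For reference, `CORR` is the sample Pearson correlation coefficient, assuming Lens's SQL engine follows the standard definition of that aggregate. A pure-Python sketch of the same statistic:

```python
import math

def corr(xs, ys):
    """Sample Pearson correlation coefficient: covariance of the two
    columns divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

The result is always in [-1, 1]: perfectly linear columns give ±1.0, and values near 0 indicate no linear relationship.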
### Export results

```sh
# Export full dataset to CSV
dataspoc-lens export "SELECT * FROM iris" --format csv --output iris_export.csv

# Export summary to JSON
dataspoc-lens export "
  SELECT
    species,
    ROUND(AVG(sepal_length), 2) AS avg_sepal_len,
    ROUND(AVG(petal_length), 2) AS avg_petal_len
  FROM iris
  GROUP BY species
  ORDER BY species" --format json --output iris_summary.json
```

### Interactive shell
```
dataspoc-lens shell
lens> .tables
  iris

lens> .schema iris
  sepal_length   DOUBLE
  sepal_width    DOUBLE
  petal_length   DOUBLE
  petal_width    DOUBLE
  species        VARCHAR

lens> SELECT species, COUNT(*) AS n FROM iris GROUP BY species;
┌────────────┬────┐
│ species    │ n  │
├────────────┼────┤
│ setosa     │ 50 │
│ versicolor │ 50 │
│ virginica  │ 50 │
└────────────┴────┘
(3 row(s), 0.005s)
```

## Scenario examples
Beyond the example scripts, here are complete scenarios showing Lens in real-world use.
### Explore an e-commerce data lake
```sh
pip install 'dataspoc-lens[s3]'
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-ecommerce-data
dataspoc-lens query "
  SELECT region, COUNT(*) AS orders, SUM(total) AS revenue
  FROM orders
  WHERE order_date >= '2026-01-01'
  GROUP BY region
  ORDER BY revenue DESC"
```

### Build a curated layer with transforms
Create numbered SQL files in `~/.dataspoc-lens/transforms/`:
`001_clean_customers.sql`:
```sql
CREATE OR REPLACE TABLE clean_customers AS
SELECT
    id,
    TRIM(LOWER(email)) AS email,
    COALESCE(name, 'Unknown') AS name,
    created_at,
    CASE
        WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active'
        WHEN status IN ('inactive', 'Inactive') THEN 'inactive'
        ELSE 'unknown'
    END AS status
FROM customers
WHERE email IS NOT NULL
  AND email NOT LIKE '%test%';
```

`002_order_metrics.sql`:
```sql
CREATE OR REPLACE TABLE order_metrics AS
SELECT
    customer_id,
    COUNT(*) AS total_orders,
    SUM(total) AS lifetime_value,
    AVG(total) AS avg_order_value,
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id;
```

Run the transforms:
```sh
dataspoc-lens transform run
```
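The zero-padded numeric prefixes (`001_`, `002_`, …) matter because a plain lexicographic sort of the filenames then matches the intended execution order; that Lens sorts exactly this way is an assumption here, but the naming scheme keeps the order unambiguous regardless. A sketch of the ordering:

```python
from pathlib import Path

def transform_order(transform_dir):
    """Return the .sql transform filenames in the order their zero-padded
    numeric prefixes imply: a plain lexicographic sort."""
    return [p.name for p in sorted(Path(transform_dir).glob("*.sql"),
                                   key=lambda p: p.name)]
```

Without zero-padding, `10_report.sql` would sort before `2_metrics.sql`; with it, `002_` reliably precedes `010_`.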
### Multi-cloud analysis with Jupyter

```sh
pip install 'dataspoc-lens[s3,gcs,jupyter]'
dataspoc-lens init
dataspoc-lens add-bucket s3://acme-product-data
dataspoc-lens add-bucket gs://acme-analytics
dataspoc-lens notebook
```

Tables from both clouds are available in a single Jupyter session, ready for cross-cloud joins and visualizations.
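Inside the notebook (or any script) a cross-cloud join is then just SQL over the mounted tables. The table and column names below are invented for illustration: `products` stands in for a table mounted from the S3 bucket and `page_views` for one mounted from GCS.

```python
# Hypothetical cross-cloud join; real table names depend on your buckets.
CROSS_CLOUD_SQL = """
SELECT p.product_id,
       SUM(v.views) AS total_views
FROM products AS p           -- mounted from s3://acme-product-data
JOIN page_views AS v         -- mounted from gs://acme-analytics
  ON v.product_id = p.product_id
GROUP BY p.product_id
ORDER BY total_views DESC
LIMIT 10
"""

def top_viewed_products(client):
    """Run the join through any object exposing the documented .query(sql)
    method and return the rows."""
    return client.query(CROSS_CLOUD_SQL)["rows"]
```

With the SDK this would run as `top_viewed_products(client)` inside a `with LensClient() as client:` block.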