Open-Source · Apache 2.0

Your warehouse is a bucket. You just don't know it yet.

You have Parquet files in S3. You have DuckDB. You don't need Snowflake. Lens mounts your cloud bucket as SQL tables and lets you query from the terminal, from Jupyter, or from plain English.

$ pip install dataspoc-lens
$ dataspoc-lens add-bucket s3://my-data
$ dataspoc-lens shell

Your DA Agent is ready.

Lens ships with AGENT.md — a skill file that turns any AI agent into your data analyst. It discovers tables, writes SQL, answers business questions, refreshes cache, and exports reports. In plain English.

The agent runs real SQL on real data — not embeddings, not RAG, not approximations. Every answer is grounded in your actual Parquet files. It checks freshness before querying, so you never get stale answers.

Explores schemas · Writes SQL · Answers in English · Refreshes cache · Exports CSV/JSON

Claude via MCP → Lens

You: "Build me a weekly revenue
      report by product line."

[MCP] cache_refresh_stale()
[MCP] list_tables()
[MCP] describe_table("orders")
[MCP] query("SELECT ...")

Agent: "Revenue this week: $312k.
Electronics leads at $98k (+15%),
followed by Software at $87k.
Here's the CSV export."

Stop paying for queries

$8k/month

Your Snowflake bill is $8k/month and 90% of queries are SELECT COUNT(*).

Blocked

The analyst needs to wait for the data team to write a SQL query. Every. Single. Time.

40% wrong

You built a RAG pipeline so your AI agent can answer questions about your data. It hallucinates 40% of the time.

Before and after

What changes when you switch to Lens.

Before: $8k/month Snowflake
After:  $0 — DuckDB is free

Before: "Can someone run this query for me?"
After:  dataspoc-lens ask "monthly revenue"

Before: Custom RAG pipeline for AI
After:  dataspoc-lens mcp — real SQL, not hallucinations

Six superpowers

Every way you could want to query your data lake — for humans and machines.

SQL Shell

Interactive SQL shell powered by DuckDB. Tab completion, history, dot commands. Your Parquet files feel like Postgres tables.

lens> SELECT customer, SUM(total)
  FROM orders
  GROUP BY customer
  ORDER BY 2 DESC LIMIT 5;

┌────────────┬──────────┐
│ customer   │ sum      │
├────────────┼──────────┤
│ Acme Corp  │ $247,891 │
│ Initech    │ $189,234 │
└────────────┴──────────┘

AI Ask

Ask questions in plain English. Lens generates real SQL, runs it against real data, and shows the results. No embeddings. No vector stores. No guessing.

$ dataspoc-lens ask "top customers by revenue"

SQL: SELECT customer, SUM(total) ...

Acme Corp leads with $247k, followed
by Initech at $189k (+12% QoQ).
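Conceptually, ask boils down to three steps: generate SQL from the question, execute it, and summarize the rows. A minimal sketch of that flow — generate_sql and run_sql below are illustrative stand-ins, not real dataspoc-lens APIs:

```python
def generate_sql(question: str) -> str:
    # In Lens this step is handled by an LLM; a canned query stands in here.
    return ("SELECT customer, SUM(total) AS revenue "
            "FROM orders GROUP BY customer ORDER BY revenue DESC LIMIT 5")

def run_sql(sql: str) -> list[tuple]:
    # Stand-in for DuckDB execution against your mounted Parquet files.
    return [("Acme Corp", 247891), ("Initech", 189234)]

def ask(question: str) -> str:
    # The summary is derived from real query results, never guessed.
    rows = run_sql(generate_sql(question))
    (top_name, top_rev), (next_name, next_rev) = rows[0], rows[1]
    return (f"{top_name} leads with ${top_rev:,}, "
            f"followed by {next_name} at ${next_rev:,}.")
```

The key property is that the natural-language answer is computed from the SQL result set, so every number in it traces back to a query.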

Jupyter & Marimo

Launch notebooks pre-connected to your data lake. All tables are already mounted. Just write SQL or Python.

$ dataspoc-lens notebook
  3 buckets, 47 tables mounted
  http://localhost:8888

$ dataspoc-lens notebook --marimo

Local Cache

Cache Parquet files locally. Repeated queries are instant. Work offline on a plane. Reduce cloud egress costs to near zero.

$ dataspoc-lens cache orders
$ dataspoc-lens cache --list
  orders  | 24 MB | fresh
  users   |  3 MB | stale
$ dataspoc-lens cache users --refresh
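The fresh/stale distinction above comes down to a timestamp comparison: a cached file is stale once the remote object is newer. A rough sketch of that check, assuming the remote last-modified time comes from something like an S3 HEAD request (is_stale is a hypothetical helper, not the Lens internals):

```python
from pathlib import Path

def is_stale(local: Path, remote_mtime: float) -> bool:
    """Return True when the local cache copy needs a refresh."""
    if not local.exists():
        return True  # nothing cached yet
    # Cached copy is stale if the remote object was modified more recently.
    return local.stat().st_mtime < remote_mtime
```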

MCP Server

Turn your data lake into an API for AI agents. Claude, Cursor, Windsurf — any MCP client gets read-only SQL access. Real queries, real answers, zero hallucinations.

$ dataspoc-lens mcp

# Your agent can now:
#   list_tables → discover data
#   query(sql)  → run real SQL
#   ask(question) → NL to SQL
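At its core, the MCP pattern is a named-tool dispatch: the client sends a tool name plus arguments, the server routes to a handler. An illustrative-only sketch — the tool names mirror the list above, but the handlers are stand-ins, not the real dataspoc-lens implementation:

```python
def list_tables() -> list[str]:
    return ["orders", "users"]  # stand-in for bucket discovery

def describe_table(name: str) -> dict:
    schemas = {"orders": {"customer": "VARCHAR", "total": "DOUBLE"}}
    return schemas.get(name, {})

def query(sql: str) -> list[tuple]:
    return [("Acme Corp", 247891)]  # stand-in for DuckDB execution

# Registry of tools the MCP client is allowed to call.
TOOLS = {
    "list_tables": list_tables,
    "describe_table": describe_table,
    "query": query,
}

def handle(tool: str, *args):
    # Route an incoming tool call to its handler.
    return TOOLS[tool](*args)
```

Because the agent can only reach the registered tools, its access surface is exactly what the server exposes — nothing more.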

Python SDK

Feed your AI agents with structured, governed data. LensClient gives programmatic access to your entire lake — tables, schemas, queries, and cache — with the same access controls as humans. No scraping. No CSV exports. No shadow pipelines.

from dataspoc_lens import LensClient

with LensClient() as client:
    tables = client.tables()
    result = client.query("SELECT ...")
    answer = client.ask("monthly revenue")
    client.cache_refresh_stale()

Serve data to agents the right way.

Your AI agents need data. But letting them scrape databases, parse CSV exports, or build custom RAG pipelines is a security and quality nightmare. Lens gives agents a governed, read-only interface to your data lake — the same tables, the same schemas, the same IAM permissions that your human analysts use.

No new credentials to manage. No shadow data pipelines. No unaudited access. Agents query through MCP or the Python SDK, and every query runs as real SQL on real Parquet — not embeddings, not approximations. Your data governance stays intact. Your agents get the truth.
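"Read-only by default" can be pictured as a gate in front of every query: SELECT-style statements pass, anything that mutates state is rejected. A minimal sketch of that policy — the actual enforcement in Lens may work differently; check_read_only is a hypothetical helper for illustration:

```python
import re

# Keywords that indicate a write or DDL statement.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|COPY|ATTACH)\b",
    re.IGNORECASE,
)

def check_read_only(sql: str) -> None:
    """Raise if the statement could mutate data; otherwise do nothing."""
    if FORBIDDEN.search(sql):
        raise PermissionError(f"write statement rejected: {sql!r}")
```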

Read-only by default · Cloud IAM enforced · No hallucinations · Same data as humans · Audit-friendly

When your AI agent queries through Lens, it tells the truth

Real SQL on real data. No embeddings. No vector stores. No hallucinations.

You: "How did sales perform last week
      compared to the week before?"

[MCP] list_tables()
[MCP] describe_table("raw.postgres.orders")
[MCP] query("SELECT ... GROUP BY week")

Agent: "Sales were $247k last week,
up 8% from $228k the week before.
Electronics had the biggest jump at 15%."

Every number comes from a real SQL query against your actual data. The agent can show its work — because there is work to show.

Query your lake in 5 minutes. Not 5 months.

$ pip install dataspoc-lens
$ dataspoc-lens add-bucket s3://my-data
$ dataspoc-lens shell

Or connect your AI agent: dataspoc-lens mcp