DataSpoc Lens
DataSpoc Lens is a virtual warehouse over cloud Parquet. It mounts your data lake as DuckDB views and lets you query with SQL, an interactive shell, Jupyter notebooks, Marimo, natural language (AI), or programmatically via the Python SDK.
What Lens does
- Mounts cloud Parquet as DuckDB views — no data warehouse infrastructure needed
- SQL shell with syntax highlighting, autocomplete, and dot commands
- Jupyter and Marimo notebooks with tables pre-mounted
- AI queries — ask questions in natural language, get SQL + results
- Local cache — work offline and reduce cloud egress costs
- SQL transforms — build curated datasets with numbered SQL files
- MCP server — connect AI agents to your data lake via Model Context Protocol
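To make the "numbered SQL files" idea concrete, here is a hedged sketch of how transforms could be applied in filename order. The file names and SQL are invented for illustration, and sqlite3 stands in for DuckDB so the sketch runs with only the Python standard library:

```python
# Hypothetical sketch: numbered SQL transform files applied in sorted order.
# Lens runs transforms against DuckDB; sqlite3 is used here only so the
# example is runnable with the standard library.
import sqlite3

# Stand-ins for files like transforms/01_clean.sql, transforms/02_agg.sql
transforms = {
    "01_clean.sql": "CREATE TABLE clean AS SELECT 1 AS id, 'a' AS v;",
    "02_agg.sql":   "CREATE TABLE agg AS SELECT COUNT(*) AS n FROM clean;",
}

con = sqlite3.connect(":memory:")
for name in sorted(transforms):  # the numeric prefix defines the run order
    con.executescript(transforms[name])

print(con.execute("SELECT n FROM agg").fetchone()[0])  # 1
```

Sorting on the numeric prefix is what makes the pipeline deterministic: each curated table can build on the output of earlier files.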
Interfaces
| Interface | Description |
|---|---|
| CLI | dataspoc-lens commands for all operations |
| Python SDK | from dataspoc_lens import LensClient |
| MCP Server | dataspoc-lens mcp for AI agent integration |
| Jupyter | dataspoc-lens notebook with %%sql magic |
| Marimo | dataspoc-lens notebook --marimo |
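The table above gives only the SDK import. As an illustration of what client usage might look like, here is a hypothetical stand-in: the bucket URL and the tables()/sql() methods are assumptions for this sketch, not documented LensClient API, and an in-memory registry fakes the data so the code runs anywhere:

```python
# Hypothetical sketch only: the real client is `from dataspoc_lens import LensClient`.
# This stand-in mimics the assumed call shape with an in-memory table registry.
class LensClient:
    def __init__(self, bucket: str):
        self.bucket = bucket
        # In the real SDK, tables would come from the manifest or a bucket scan.
        self._tables = {"events": [("alice", 3), ("bob", 1)]}

    def tables(self) -> list[str]:
        """List discovered table names."""
        return sorted(self._tables)

    def sql(self, query: str) -> list[tuple]:
        # The real client would run the query against DuckDB views; here we
        # just return the demo rows.
        return self._tables["events"]

client = LensClient("s3://my-bucket")  # bucket URL is illustrative
print(client.tables())                 # ['events']
print(client.sql("SELECT * FROM events"))
```

The same operations are exposed through the CLI, notebooks, and the MCP server; the SDK is simply the programmatic entry point.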
How it works
```
Cloud Bucket → Catalog Discovery → DuckDB Views → Query / Shell / Notebook / AI
                     │                   │
                     │                   ├── read via DuckDB httpfs (remote Parquet, no download needed)
                     │                   └── or local cache (~/.dataspoc-lens/cache/) for offline work
                     ├── manifest.json (from Pipe)
                     └── or scan-based (glob *.parquet)
```

Lens reads the manifest written by DataSpoc Pipe for table discovery. If no manifest is found, it scans the bucket for .parquet files and groups them by directory.
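The scan-based fallback can be sketched in a few lines. This is a simplified local model of the behavior described above (group every .parquet file by its parent directory), not the actual Lens implementation, and it uses a throwaway directory in place of a cloud bucket:

```python
# Sketch of scan-based discovery: when no manifest.json exists, find
# *.parquet files and group them by directory (directory name = table name).
from collections import defaultdict
from pathlib import Path
import tempfile

def discover_tables(root: Path) -> dict[str, list[str]]:
    """Group every *.parquet under `root` into one table per directory."""
    tables = defaultdict(list)
    for f in sorted(root.rglob("*.parquet")):
        tables[f.parent.name].append(f.name)
    return dict(tables)

# Demo on a throwaway layout mimicking a bucket
root = Path(tempfile.mkdtemp())
for rel in ["events/part-0.parquet", "events/part-1.parquet", "users/part-0.parquet"]:
    p = root / rel
    p.parent.mkdir(parents=True, exist_ok=True)
    p.touch()

print(discover_tables(root))
# {'events': ['part-0.parquet', 'part-1.parquet'], 'users': ['part-0.parquet']}
```

Grouping by directory matches how partitioned Parquet datasets are usually laid out: one directory per logical table, many part files inside it.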
Install
```
pip install dataspoc-lens
```

Open source
DataSpoc Lens is licensed under Apache 2.0 — free to use, modify, and distribute.