DataSpoc Lens
DataSpoc Lens is a virtual warehouse over cloud Parquet. It mounts your data lake as DuckDB views and lets you query with SQL, an interactive shell, Jupyter notebooks, Marimo, natural language (AI), or programmatically via the Python SDK.
What Lens does
- Mounts cloud Parquet as DuckDB views — no data warehouse infrastructure needed
- SQL shell with syntax highlighting, autocomplete, and dot commands
- Jupyter and Marimo notebooks with tables pre-mounted
- AI queries — ask questions in natural language, get SQL + results
- Local cache — work offline and reduce cloud egress costs
- SQL transforms — build curated datasets with numbered SQL files
- MCP server — connect AI agents to your data lake via Model Context Protocol
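To make the "numbered SQL files" idea concrete, here is a hedged sketch of how transforms could be applied in filename order. The file names and SQL are invented for illustration, and sqlite3 stands in for DuckDB so the sketch runs with only the Python standard library:

```python
# Hypothetical sketch: numbered SQL transform files applied in sorted order.
# Lens runs transforms against DuckDB; sqlite3 is used here only so the
# example is runnable with the standard library.
import sqlite3

# Stand-ins for files like transforms/01_clean.sql, transforms/02_agg.sql
transforms = {
    "01_clean.sql": "CREATE TABLE clean AS SELECT 1 AS id, 'a' AS v;",
    "02_agg.sql":   "CREATE TABLE agg AS SELECT COUNT(*) AS n FROM clean;",
}

con = sqlite3.connect(":memory:")
for name in sorted(transforms):  # the numeric prefix defines the run order
    con.executescript(transforms[name])

print(con.execute("SELECT n FROM agg").fetchone()[0])  # 1
```

Sorting on the numeric prefix is what makes the pipeline deterministic: each curated table can build on the output of earlier files.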
Interfaces
| Interface | Description |
|---|---|
| CLI | dataspoc-lens commands for all operations |
| Python SDK | from dataspoc_lens import LensClient |
| MCP Server | dataspoc-lens mcp for AI agent integration |
| Jupyter | dataspoc-lens notebook with %%sql magic |
| Marimo | dataspoc-lens notebook --marimo |
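The table above gives only the SDK import. As an illustration of what client usage might look like, here is a hypothetical stand-in: the bucket URL and the tables()/sql() methods are assumptions for this sketch, not documented LensClient API, and an in-memory registry fakes the data so the code runs anywhere:

```python
# Hypothetical sketch only: the real client is `from dataspoc_lens import LensClient`.
# This stand-in mimics the assumed call shape with an in-memory table registry.
class LensClient:
    def __init__(self, bucket: str):
        self.bucket = bucket
        # In the real SDK, tables would come from the manifest or a bucket scan.
        self._tables = {"events": [("alice", 3), ("bob", 1)]}

    def tables(self) -> list[str]:
        """List discovered table names."""
        return sorted(self._tables)

    def sql(self, query: str) -> list[tuple]:
        # The real client would run the query against DuckDB views; here we
        # just return the demo rows.
        return self._tables["events"]

client = LensClient("s3://my-bucket")  # bucket URL is illustrative
print(client.tables())                 # ['events']
print(client.sql("SELECT * FROM events"))
```

The same operations are exposed through the CLI, notebooks, and the MCP server; the SDK is simply the programmatic entry point.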
How it works
```
Cloud Bucket → Catalog Discovery → DuckDB Views → Query / Shell / Notebook / AI
                     │                   │
                     │                   ├── read via DuckDB httpfs (remote Parquet, no download needed)
                     │                   └── or local cache (~/.dataspoc-lens/cache/) for offline work
                     ├── manifest.json (from Pipe)
                     └── or scan-based (glob *.parquet)
```

Lens reads the manifest written by DataSpoc Pipe for table discovery. If no manifest is found, it scans the bucket for .parquet files and groups them by directory.
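The scan-based fallback can be sketched in a few lines. This is a simplified local model of the behavior described above (group every .parquet file by its parent directory), not the actual Lens implementation, and it uses a throwaway directory in place of a cloud bucket:

```python
# Sketch of scan-based discovery: when no manifest.json exists, find
# *.parquet files and group them by directory (directory name = table name).
from collections import defaultdict
from pathlib import Path
import tempfile

def discover_tables(root: Path) -> dict[str, list[str]]:
    """Group every *.parquet under `root` into one table per directory."""
    tables = defaultdict(list)
    for f in sorted(root.rglob("*.parquet")):
        tables[f.parent.name].append(f.name)
    return dict(tables)

# Demo on a throwaway layout mimicking a bucket
root = Path(tempfile.mkdtemp())
for rel in ["events/part-0.parquet", "events/part-1.parquet", "users/part-0.parquet"]:
    p = root / rel
    p.parent.mkdir(parents=True, exist_ok=True)
    p.touch()

print(discover_tables(root))
# {'events': ['part-0.parquet', 'part-1.parquet'], 'users': ['part-0.parquet']}
```

Grouping by directory matches how partitioned Parquet datasets are usually laid out: one directory per logical table, many part files inside it.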
Install
```
pip install dataspoc-lens
```

Open source
DataSpoc Lens is licensed under Apache 2.0 — free to use, modify, and distribute.