
DataSpoc Lens

DataSpoc Lens is a virtual warehouse over cloud Parquet. It mounts your data lake as DuckDB views that you can query through SQL, an interactive shell, Jupyter or Marimo notebooks, natural-language (AI) prompts, or the Python SDK.

  • Mounts cloud Parquet as DuckDB views — no data warehouse infrastructure needed
  • SQL shell with syntax highlighting, autocomplete, and dot commands
  • Jupyter and Marimo notebooks with tables pre-mounted
  • AI queries — ask questions in natural language, get SQL + results
  • Local cache — work offline and reduce cloud egress costs
  • SQL transforms — build curated datasets with numbered SQL files
  • MCP server — connect AI agents to your data lake via Model Context Protocol

| Interface  | Description                                 |
|------------|---------------------------------------------|
| CLI        | dataspoc-lens commands for all operations   |
| Python SDK | from dataspoc_lens import LensClient        |
| MCP Server | dataspoc-lens mcp for AI agent integration  |
| Jupyter    | dataspoc-lens notebook with %%sql magic     |
| Marimo     | dataspoc-lens notebook --marimo             |

Cloud Bucket → Catalog Discovery → DuckDB Views → Query / Shell / Notebook / AI
     │                │
     │          manifest.json (from Pipe)
     │          or scan-based (glob *.parquet)
     │
     ├── read via DuckDB httpfs (remote Parquet, no download needed)
     └── or local cache (~/.dataspoc-lens/cache/) for offline work
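
The "DuckDB Views" step in the flow above can be approximated with plain DuckDB SQL. The sketch below only builds the statements and is not Lens's actual implementation: `build_mount_statements`, the `example-bucket` URL, and the `orders` view name are hypothetical. `INSTALL httpfs`, `LOAD httpfs`, and `read_parquet` are standard DuckDB features.

```python
def build_mount_statements(tables: dict[str, str]) -> list[str]:
    """Return DuckDB SQL that exposes each Parquet glob as a view.

    `tables` maps a view name to a Parquet glob. Both the mapping and
    the naming are assumptions made for this sketch.
    """
    # httpfs lets DuckDB read remote files (s3://, https://) in place.
    statements = ["INSTALL httpfs;", "LOAD httpfs;"]
    for name, glob in sorted(tables.items()):
        safe_name = name.replace('"', '""')  # escape quotes in the identifier
        safe_glob = glob.replace("'", "''")  # escape quotes in the string literal
        statements.append(
            f'CREATE OR REPLACE VIEW "{safe_name}" AS '
            f"SELECT * FROM read_parquet('{safe_glob}');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical bucket and table, for illustration only.
    for stmt in build_mount_statements(
        {"orders": "s3://example-bucket/orders/*.parquet"}
    ):
        print(stmt)
```

With the `duckdb` Python package installed, each statement could be passed to `duckdb.connect().execute(...)`. A view stores only the query, so the Parquet files are read when the view is queried rather than copied up front.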

Lens reads the manifest written by DataSpoc Pipe for table discovery. If no manifest is found, it scans the bucket for .parquet files and groups them by directory.
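
The scan-based fallback can be pictured as grouping object keys by their parent directory. This is a minimal sketch of that idea, not the library's code: `group_parquet_by_directory` is a hypothetical helper, and deriving the table name from the directory path (with `/` replaced by `_`) is an assumption.

```python
from collections import defaultdict
from posixpath import dirname


def group_parquet_by_directory(keys: list[str]) -> dict[str, list[str]]:
    """Group object keys ending in .parquet by their parent directory.

    Assumption for this sketch: the table name is the directory path
    with "/" replaced by "_", and "root" for files at the bucket root.
    """
    tables: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        if not key.lower().endswith(".parquet"):
            continue  # skip manifests, markers, and other non-Parquet objects
        parent = dirname(key)
        table = parent.replace("/", "_") if parent else "root"
        tables[table].append(key)
    # Sort for deterministic output regardless of listing order.
    return {name: sorted(files) for name, files in sorted(tables.items())}


if __name__ == "__main__":
    # Hypothetical bucket listing, for illustration only.
    listing = [
        "sales/orders/part-1.parquet",
        "sales/orders/part-0.parquet",
        "sales/customers/data.parquet",
        "sales/manifest.json",
    ]
    for table, files in group_parquet_by_directory(listing).items():
        print(table, files)
```

Each resulting group corresponds to one candidate table whose files could be read together with a single `*.parquet` glob.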

pip install dataspoc-lens

DataSpoc Lens is licensed under Apache 2.0 — free to use, modify, and distribute.