DataSpoc Platform
DataSpoc is a data platform built for both humans and AI agents. It turns any data source into a queryable data lake using three CLI tools connected by Parquet files in your cloud bucket.
Three Products, One Platform
Pipe --- Ingestion (Open-Source)
Pipe connects to 400+ data sources and writes Parquet files to your bucket. It handles incremental extraction, schema detection, and partitioning out of the box.
- Apache 2.0 license
- github.com/dataspoclab/dataspoc-pipe
Lens --- Query (Open-Source)
Lens mounts your bucket as a SQL database. Query with SQL, explore in Jupyter or Marimo notebooks, or ask questions in natural language with AI.
- Apache 2.0 license
- github.com/dataspoclab/dataspoc-lens
ML --- AutoML (Commercial)
ML reads Parquet from the bucket, runs automated feature engineering, trains models, and writes predictions back as Parquet for Lens to query.
How They Connect
Source ──► [Pipe] ──► Parquet in Bucket ──► [Lens] ──► SQL / Jupyter / AI
                              │
                            [ML] ──► train / predict
                              │
                            [MCP] ──► Claude / Cursor / Windsurf

All communication between products happens through Parquet files in a shared bucket. Pipe writes, Lens reads, ML reads and writes. No product imports code from another.
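The file-based contract can be illustrated with a minimal sketch in which a writer and a reader share nothing except a directory. This is not DataSpoc code: `write_batch` and `query` are hypothetical helpers, a temporary directory stands in for the cloud bucket, and CSV plus the standard-library `sqlite3` module stand in for Parquet and for the Lens query engine so that the example runs without extra dependencies.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_batch(bucket: Path, table: str, rows: list[dict]) -> Path:
    """Writer stage (the role Pipe plays): drop one file per batch under the table's prefix."""
    table_dir = bucket / table
    table_dir.mkdir(parents=True, exist_ok=True)
    # Number files by how many batches already exist, so new batches never overwrite old ones.
    index = len(list(table_dir.glob("part-*.csv")))
    out = table_dir / f"part-{index:05d}.csv"
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out


def query(bucket: Path, table: str, sql: str) -> list[tuple]:
    """Reader stage (the role Lens plays): load every file under the prefix, then run SQL."""
    files = sorted((bucket / table).glob("part-*.csv"))
    with files[0].open(newline="") as f:
        columns = next(csv.reader(f))  # column names come from the first file's header

    conn = sqlite3.connect(":memory:")
    column_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    for path in files:
        with path.open(newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# The two stages never call each other; the directory is the only interface.
bucket = Path(tempfile.mkdtemp())  # stand-in for the shared cloud bucket
write_batch(bucket, "orders", [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 4.5}])
write_batch(bucket, "orders", [{"id": 3, "amount": 5.0}])
total = query(bucket, "orders", "SELECT COUNT(*), SUM(CAST(amount AS REAL)) FROM orders")
print(total)
```

Because the reader only depends on the file layout, either stage could be replaced or upgraded independently, which is the property the shared-bucket design is aiming for.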
Key Metrics
| Metric | Value |
|---|---|
| Data sources supported | 400+ |
| Time to first query | 15 minutes |
| Cost to start | $0 |
Three Ways to Use It
- Terminal --- `dataspoc-pipe run` and `dataspoc-lens shell` from any shell
- Python --- Import `LensClient` or `PipeClient` in your scripts and agents
- MCP for AI agents --- Connect Claude Desktop, Claude Code, Cursor, or Windsurf directly to your data lake
GitHub
- dataspoc-pipe --- Ingestion CLI
- dataspoc-lens --- Query CLI