DataSpoc Platform

DataSpoc is a data platform built for both humans and AI agents. It turns any data source into a queryable data lake using three CLI tools connected by Parquet files in your cloud bucket.

Pipe connects to 400+ data sources and writes Parquet files to your bucket. It handles incremental extraction, schema detection, and partitioning out of the box.
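As a rough illustration of what incremental extraction and partitioning mean in general, here is a minimal sketch using only the Python standard library. It is not Pipe's implementation: the `extract_incremental` and `partition_path` helpers, the `updated_at` bookmark column, and the Hive-style `year=/month=/day=` layout are all assumptions made for the example.

```python
from datetime import date


def extract_incremental(rows, bookmark):
    """Return only rows newer than the bookmark, plus the advanced bookmark.

    Hypothetical helper: timestamps are ISO 8601 strings, which sort
    correctly when compared as plain strings.
    """
    new_rows = [r for r in rows if r["updated_at"] > bookmark]
    new_bookmark = max((r["updated_at"] for r in new_rows), default=bookmark)
    return new_rows, new_bookmark


def partition_path(table, day):
    """Build a Hive-style partition path (an assumed layout, not Pipe's)."""
    return (
        f"{table}/year={day.year}/month={day.month:02d}/"
        f"day={day.day:02d}/part-0000.parquet"
    )


source = [
    {"id": 1, "updated_at": "2024-01-01T10:00:00"},
    {"id": 2, "updated_at": "2024-01-02T09:30:00"},
    {"id": 3, "updated_at": "2024-01-03T08:15:00"},
]

# First run: empty bookmark, so every row is extracted.
first_batch, bookmark = extract_incremental(source, bookmark="")

# Second run: nothing changed at the source, so nothing is re-extracted.
second_batch, bookmark = extract_incremental(source, bookmark)

target = partition_path("orders", date(2024, 1, 3))
```

The point of the bookmark is that a repeated run only pays for rows that changed since the previous run, and the partitioned path lets a reader skip folders outside the date range it needs.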

Lens mounts your bucket as a SQL database. Query with SQL, explore in Jupyter or Marimo notebooks, or ask questions in natural language with AI.

ML reads Parquet from the bucket, runs automated feature engineering, trains models, and writes predictions back as Parquet for Lens to query.

Source ──► [Pipe] ──► Parquet in Bucket ──► [Lens] ──► SQL / Jupyter / AI
                                            [ML]   ──► train / predict
                                            [MCP]  ──► Claude / Cursor / Windsurf

All communication between products happens through Parquet files in a shared bucket. Pipe writes, Lens reads, ML reads and writes. No product imports code from another.
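The file-based contract described above can be sketched with the standard library alone. This is an illustration of the pattern, not DataSpoc code: a temporary directory stands in for the cloud bucket, JSON files stand in for Parquet, and `producer_write` and `consumer_read` are hypothetical names. The two functions share nothing except the folder convention.

```python
import json
import tempfile
from pathlib import Path


def producer_write(bucket: Path, table: str, rows: list) -> Path:
    """Writer role: drop a new part file under <bucket>/<table>/."""
    table_dir = bucket / table
    table_dir.mkdir(parents=True, exist_ok=True)
    part_number = len(list(table_dir.iterdir()))
    out = table_dir / f"part-{part_number:04d}.json"
    out.write_text(json.dumps(rows))
    return out


def consumer_read(bucket: Path, table: str) -> list:
    """Reader role: discover part files by listing the table folder."""
    rows = []
    for part in sorted((bucket / table).glob("part-*.json")):
        rows.extend(json.loads(part.read_text()))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)

    # An extractor writes two batches of raw data.
    producer_write(bucket, "orders", [{"id": 1, "total": 20.0}])
    producer_write(bucket, "orders", [{"id": 2, "total": 35.5}])

    # A query tool reads them back knowing only the folder convention.
    orders = consumer_read(bucket, "orders")

    # A third tool reads and writes: it derives a new table from the first.
    scores = [{"id": r["id"], "score": r["total"] / 100} for r in orders]
    producer_write(bucket, "predictions", scores)
    predictions = consumer_read(bucket, "predictions")
```

Because the only shared interface is the file layout, any one tool can be replaced or upgraded without changes to the others, as long as it keeps reading and writing the same files.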

| Metric | Value |
|---|---|
| Data sources supported | 400+ |
| Time to first query | 15 minutes |
| Cost to start | $0 |

DataSpoc can be used in three ways:

  1. Terminal: run dataspoc-pipe run and dataspoc-lens shell from any shell
  2. Python: import LensClient or PipeClient in your scripts and agents
  3. MCP for AI agents: connect Claude Desktop, Claude Code, Cursor, or Windsurf directly to your data lake