The data platform for humans and AI agents.
Every data team starts the same way: 3 months setting up Airflow, dbt, and a warehouse before anyone runs a query. DataSpoc is the shortcut. Three CLI tools. One pip install. Your data stays in your bucket. Your AI agent queries it via MCP.
pip install dataspoc-pipe dataspoc-lens
An AI agent for every role.
DataSpoc ships with AGENT.md — a skill file that teaches AI agents how to use your data platform. Drop it into Claude, Cursor, or any MCP client and watch your team accelerate.
DE Agent
Data Engineer Agent
Ingests data from any source. Monitors pipelines. Detects failures and retries. Adds new sources when you ask. Your always-on data engineer that never takes PTO.
# Agent reads AGENT.md, connects via MCP
"Add our Stripe API as a source and schedule it every 6 hours"
→ dataspoc-pipe add stripe
→ dataspoc-pipe run stripe
→ dataspoc-pipe schedule install
DA Agent
Data Analyst Agent
Explores your data lake. Answers business questions in plain English. Builds reports. Refreshes cache before querying. Your analyst that works at 3am without complaining.
# Agent reads AGENT.md, connects via MCP
"Which customers are at risk of churning? Export the list as CSV"
→ cache_refresh_stale()
→ ask("customers with churn risk")
→ query("SELECT ...")
→ export
ML Agent
ML Engineer Agent
Trains models on your lake data. Generates predictions. Explains results. Monitors drift. Your ML engineer that turns "can we predict X?" into a model in minutes.
# Agent reads AGENT.md, connects via MCP
"Train a churn model on our customer data and explain it"
→ ml train --target churn --from customers
→ ml explain --model churn
→ ml predict --model churn --from new
Every DataSpoc repo ships with an AGENT.md — a skill file that documents every function, pattern, and constraint. AI agents read it and know exactly what to do. No custom integration code. No prompt engineering. Just drop the file and go.
Sound familiar?
These are the stories we hear every week from data teams.
"2 months just to move CSVs"
You spent 2 months setting up Airflow, debugging Docker containers, and writing DAGs — just to move CSV files to S3. The business still has no dashboard.
"The warehouse costs more than the insights"
Your data warehouse bill hit $4k/month. The CFO asks what it produces. You look at the dashboards. Three people use them.
"Every AI tool needs a custom wrapper"
You want Claude to query your data. So you build a custom API, a vector store, a retrieval pipeline... just to answer "what were last month's sales?"
"Analysts wait days for a query"
Your analyst has a question. They file a ticket. The data engineer writes a query. Three days later, the answer is "42." The moment has passed.
What if your data platform were just a pip install?
The old way is expensive, slow, and fragile. There is a simpler path.
Airflow
+ dbt
+ Snowflake
+ Looker
+ custom API for AI agents
6 months + $50k/year
pip install dataspoc-pipe
pip install dataspoc-lens
Ingest, query, AI — done.
15 minutes + $0
How it works
Three steps. No infrastructure to provision, no accounts to create, no YAML to debug.
Pipe ingests
Connect any source. Data lands as Parquet in your bucket.
$ dataspoc-pipe add my-postgres
$ dataspoc-pipe run my-postgres
# → Parquet files in s3://bucket/raw/
Lens queries
Ask questions in SQL or plain English. Instant results.
$ dataspoc-lens ask "top 10 customers by revenue"
# → SQL generated, results displayed
Agents connect
One command turns your data lake into an MCP server for AI.
$ dataspoc-lens mcp
# → Claude, Cursor, any agent queries your data
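To make the "agents connect" step concrete, here is a minimal sketch of registering the server with an MCP client. It uses the `mcpServers` config layout that Claude Desktop and Cursor read. It assumes that `dataspoc-lens mcp` speaks MCP over stdio so the client can launch it as a subprocess; that is not stated above, so check AGENT.md before relying on it. The server label `dataspoc` and the `CONFIG_PATH` variable are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output path. This writes an example file instead of touching a
# real client config, so nothing you already have gets overwritten.
CONFIG_PATH="${CONFIG_PATH:-./mcp-config.example.json}"

# Assumption: `dataspoc-lens mcp` serves MCP over stdio, so the client only
# needs the command and its arguments.
cat > "$CONFIG_PATH" <<'JSON'
{
  "mcpServers": {
    "dataspoc": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
JSON

echo "Wrote $CONFIG_PATH"
```

Merge the `dataspoc` entry into your client's existing config by hand rather than copying the file over it, since other servers may already be registered there.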
Three tools. One bucket.
Each tool does one job well. They connect through Parquet files in your cloud storage.
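Because the hand-off between tools is plain Parquet, the files should in principle be readable by any Parquet-aware engine, not only by Lens. The sketch below builds a DuckDB row-count query over everything under a bucket prefix. The helper name `parquet_count_sql` is hypothetical, and the `s3://bucket/raw/` prefix and nested file layout are assumptions based on the example path used on this page; your bucket may be organized differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a DuckDB query that counts rows across every
# Parquet file under the given prefix (a trailing slash is tolerated).
parquet_count_sql() {
  local prefix="${1%/}"
  printf "SELECT count(*) AS row_count FROM read_parquet('%s/**/*.parquet');\n" "$prefix"
}

# Assumed prefix, taken from the example output path on this page.
parquet_count_sql "s3://bucket/raw/"

# To run it for real you would need the duckdb CLI and S3 credentials
# configured for DuckDB, for example:
#   duckdb -c "INSTALL httpfs; LOAD httpfs; $(parquet_count_sql s3://bucket/raw)"
```

The helper only prints SQL, so it is safe to run without a bucket; the commented `duckdb` line shows where the query would go.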
Pipe
Data Ingestion
"When I need data from a source, I want it in my bucket as Parquet — without managing infrastructure."
400+ Singer sources. Streaming and incremental. Auto-catalog. S3, GCS, Azure.
$ pip install dataspoc-pipe
Lens
Data Query Engine
"When I have a question about my data, I want to ask it in SQL or plain English — without spinning up a warehouse."
DuckDB-powered. Interactive shell, Jupyter, Marimo. AI queries via natural language. MCP server.
$ pip install dataspoc-lens
ML
AutoML
"When I need predictions, I want to train a model on my lake data — without being a data scientist."
Automated feature engineering, model selection, training, and prediction on Parquet data.
$ dataspoc-lens ml train
Built for your team
From the data engineer who builds pipelines to the CTO who signs off on the budget.
Data Engineer
Stop writing Airflow DAGs
One command to ingest from any source. No containers, no schedulers, no YAML. Just dataspoc-pipe run.
Data Analyst
Ask questions in English
Type your question. Get SQL + results. No ticket, no waiting, no context switching. Just dataspoc-lens ask.
Platform Team
One tool for humans and AI
Same CLI, same data, for analysts and AI agents. MCP-native. No infrastructure to manage, no API layer to build.
Founder / CTO
Data platform in 15 minutes
$0 to start. Open source. No vendor lock-in. Your data stays in your bucket. Scale when ready.
400+
Singer data sources
Apache 2.0
Open source license
MCP
Native for AI agents
PyPI
pip install & go
Start in 5 minutes.
Not 5 months.
Four commands. That is it. Your data goes from source to queryable lake — for humans and AI agents — in the time it takes to make coffee.
$ pip install dataspoc-pipe dataspoc-lens
$ dataspoc-pipe add my-postgres
$ dataspoc-pipe run my-postgres
$ dataspoc-lens ask "top customers by revenue"
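The four commands above can be wrapped in a small script so a teammate can preview or repeat the setup. This is a sketch rather than an official installer: the `run` and `quickstart` functions and the `SOURCE_NAME` and `DRY_RUN` variables are hypothetical names. It defaults to a dry run that only prints each command, on the assumption that `dataspoc-pipe add` may prompt for connection details interactively.

```shell
#!/usr/bin/env bash
set -euo pipefail

SOURCE_NAME="${SOURCE_NAME:-my-postgres}"  # hypothetical: name of the source to add
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# The same four steps as above, in order; set -e stops at the first failure.
quickstart() {
  run pip install dataspoc-pipe dataspoc-lens
  run dataspoc-pipe add "$SOURCE_NAME"
  run dataspoc-pipe run "$SOURCE_NAME"
  run dataspoc-lens ask "top customers by revenue"
}

quickstart
```

Run it as is to see the plan, then rerun with `DRY_RUN=0` once the printed commands look right.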