DataSpoc ML
AutoML on your data lake. Train, predict, and explain machine learning models directly on Parquet data in your cloud bucket. Automated feature engineering, model selection, and drift monitoring.
Your ML Agent is ready.
DataSpoc ML ships with AGENT.md — a skill file that lets AI agents train models, generate predictions, and explain results on your data lake. Ask in plain English: "Can we predict which customers will churn?" and the agent handles feature engineering, training, and evaluation.
Predictions land as Parquet in your bucket — queryable via Lens. No notebook handoffs. No model deployment pipelines. From question to production prediction in one conversation.
Claude via MCP → ML
You: "Train a model to predict which customers will churn"
[MCP] list_tables()
[MCP] ml train --target churn --from customers
[MCP] ml explain --model churn
Agent: "Model trained. AUC: 0.87. Top predictors: days_since_order, support_tickets, contract_type. Predictions saved to ml/predictions/. Query with: SELECT * FROM churn"
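The agent reports AUC, and `ml metrics` (later on this page) also reports accuracy, precision, recall, and F1. These are standard binary-classification metrics. The sketch below shows their textbook definitions in plain Python. It is not DataSpoc ML's code. The 0.5 decision threshold and the toy labels and scores are assumptions made for illustration.

```python
# Textbook definitions of common binary-classification metrics (not DataSpoc ML's code).
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 after labelling scores >= threshold as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))  # (actual, predicted)
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels (1 = churned) and model scores, invented for illustration.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    p = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
    print(roc_auc(y, p))            # 0.9375
    print(threshold_metrics(y, p))  # all four metrics are 0.75 here
```

The pairwise loop is O(P·N). Rank-based formulas compute the same AUC in O(n log n). As a check on the F1 formula, the precision (0.91) and recall (0.87) in the `ml metrics` output give 2 × 0.91 × 0.87 / (0.91 + 0.87) ≈ 0.89, which matches the reported F1.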
Three commands to production ML
From raw data to deployed predictions without leaving your terminal.
Train
Automated feature engineering, model selection, and hyperparameter tuning. Reads Parquet from your bucket, writes model artifacts back.
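The page doesn't say how DataSpoc ML selects or tunes models. To illustrate what hyperparameter tuning means in general, here is a stdlib-only holdout-validation sketch. It picks the neighbour count k for a k-nearest-neighbours classifier. The classifier, `select_k`, the 70/30 seeded split, and the feature names in the sample rows are all assumptions. None of them describes the product's method.

```python
# Generic holdout-validation tuning sketch; not DataSpoc ML's actual method.
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, label); 1 = churned


def knn_predict(train: List[Row], x: Sequence[float], k: int) -> int:
    """Majority vote among the k nearest training rows by Euclidean distance."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def holdout_accuracy(train: List[Row], valid: List[Row], k: int) -> float:
    """Fraction of validation rows the k-NN model classifies correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def select_k(rows: List[Row], candidates: Sequence[int],
             valid_fraction: float = 0.3, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Split rows once into train/validation, score each candidate k, return the best."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * (1 - valid_fraction))
    train, valid = shuffled[:cut], shuffled[cut:]
    scores = {k: holdout_accuracy(train, valid, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])  # ties go to the earliest candidate
    return best, scores


if __name__ == "__main__":
    # Hypothetical rows: (days_since_order, support_tickets) -> churned?
    # Features are left unscaled for brevity, so days_since_order dominates the distance.
    data = [((5, 0), 0), ((8, 1), 0), ((12, 0), 0), ((3, 1), 0), ((10, 2), 0),
            ((90, 4), 1), ((120, 3), 1), ((75, 5), 1), ((150, 2), 1), ((60, 6), 1)]
    best_k, all_scores = select_k(data, candidates=[1, 3, 5])
    print("best k:", best_k, "validation accuracy by k:", all_scores)
```

A production AutoML system would more typically use k-fold cross-validation and search over model families as well as hyperparameters. A single seeded split keeps the sketch short and reproducible.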
$ dataspoc-lens ml train \
--table curated.finance.customers \
--target churn \
--output s3://my-lake/ml/models/churn
Predict
Run predictions on new data using trained models. Reads the model from your bucket, writes predictions as Parquet.
$ dataspoc-lens ml predict \
--model s3://my-lake/ml/models/churn \
--input curated.finance.new_customers \
--output s3://my-lake/ml/predictions/churn
Explain
Understand why the model makes each prediction. Feature importance, SHAP values, and drift detection out of the box.
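SHAP values split one prediction into per-feature contributions. For a linear model with independent features they have a closed form, `phi_i = w_i * (x_i - E[x_i])`. They are measured on the model's raw output, which is the log-odds for logistic regression, and they sum to the gap between this prediction and the average prediction. The sketch below implements only that linear special case. Non-linear models need algorithms such as KernelSHAP or TreeSHAP, and nothing here describes what `ml explain` runs internally. The weights and background rows are hypothetical.

```python
# Linear-SHAP sketch: exact attributions for a linear model's raw output,
# assuming independent features. Weights and data below are hypothetical.
from statistics import fmean
from typing import Dict, List


def linear_shap(weights: Dict[str, float], background: List[Dict[str, float]],
                x: Dict[str, float]) -> Dict[str, float]:
    """phi_i = w_i * (x_i - mean of feature i over the background rows)."""
    means = {name: fmean(row[name] for row in background) for name in weights}
    return {name: w * (x[name] - means[name]) for name, w in weights.items()}


def margin(weights: Dict[str, float], bias: float, x: Dict[str, float]) -> float:
    """Linear model output before the sigmoid (log-odds for logistic regression)."""
    return bias + sum(w * x[name] for name, w in weights.items())


if __name__ == "__main__":
    weights = {"days_since_order": 0.03, "support_tickets": 0.4}
    background = [  # e.g. a sample of training rows
        {"days_since_order": 10.0, "support_tickets": 0.0},
        {"days_since_order": 30.0, "support_tickets": 2.0},
    ]
    customer = {"days_since_order": 120.0, "support_tickets": 3.0}
    phi = linear_shap(weights, background, customer)
    for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {value:+.2f}")
    # prints "days_since_order: +3.00" then "support_tickets: +0.80"
```

Additivity offers a quick check: the contributions sum to `margin(customer)` minus the mean margin over the background rows. In this example that is 1.8 − (−2.0) = 3.8.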
$ dataspoc-lens ml explain \
--model s3://my-lake/ml/models/churn \
--format html
How ML connects to your data lake
ML reads curated data, trains models, and writes predictions back to the bucket. Lens can query everything.
bucket/
  curated/finance/customers/    ← ML reads training data
    *.parquet
  ml/models/churn/              ← ML writes model artifacts
    model.pkl
    features.json
    metrics.json
  ml/predictions/churn/         ← ML writes predictions
    *.parquet                   ← Lens can query these
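The page doesn't specify what goes inside the three model artifact files. The code below is a local, stdlib-only sketch of the `ml/models/churn/` layout. It writes the files to a temporary directory that stands in for `s3://my-lake/`. It assumes `model.pkl` holds any picklable model object, `features.json` holds a list of feature names, and `metrics.json` holds a flat dictionary of scores. `write_artifacts` and `read_metrics` are hypothetical helpers.

```python
# Local stand-in for the bucket's ml/models/<name>/ layout; file schemas are assumptions.
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_artifacts(root: Path, model_name: str, model: Any,
                    features: List[str], metrics: Dict[str, float]) -> Path:
    """Create root/ml/models/<model_name>/ holding model.pkl, features.json, metrics.json."""
    out = root / "ml" / "models" / model_name
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.pkl").write_bytes(pickle.dumps(model))
    (out / "features.json").write_text(json.dumps(features, indent=2))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return out


def read_metrics(model_dir: Path) -> Dict[str, float]:
    """Load metrics.json back into a dict."""
    return json.loads((model_dir / "metrics.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = write_artifacts(
            Path(tmp), "churn",
            model={"bias": -3.0, "weights": {"support_tickets": 0.4}},  # placeholder model object
            features=["days_since_order", "support_tickets", "contract_type"],
            metrics={"auc": 0.96, "f1": 0.89},
        )
        print(sorted(p.name for p in model_dir.iterdir()))
        # ['features.json', 'metrics.json', 'model.pkl']
        print(read_metrics(model_dir))
```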
──────────────────────────────────────────────
[Pipe] ──→ raw/ ──→ curated/ ──→ [ML Train]
                                     │
                                     ▼
                              ml/models/churn/
                                     │
                                [ML Predict]
                                     │
                                     ▼
                           ml/predictions/churn/
                                     │
                                     ▼
                                   [Lens] ──→ SQL / Notebooks / AI
Query predictions with Lens
Once ML writes predictions to your bucket, Lens discovers them automatically. Query with SQL, notebooks, or your AI agent.
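The page doesn't describe how Lens discovers tables. The names it uses (`curated.finance.customers`, `ml.predictions.churn`, `raw.postgres.customers`) suggest that a table's dotted name mirrors its prefix in the bucket. The sketch below assumes exactly that convention. It treats any prefix that holds `.parquet` files as a table, so `ml/models/churn/`, which holds only pickle and JSON files, is skipped. `discover_tables` and the `part-0000.parquet` file names are hypothetical.

```python
# Map Parquet object keys to dotted table names, assuming name == bucket prefix.
from pathlib import PurePosixPath
from typing import Iterable, List


def discover_tables(keys: Iterable[str]) -> List[str]:
    """Return sorted table names for every prefix that holds .parquet files."""
    tables = set()
    for key in keys:
        path = PurePosixPath(key)
        if path.suffix == ".parquet" and len(path.parts) > 1:
            tables.add(".".join(path.parent.parts))  # curated/finance/customers -> curated.finance.customers
    return sorted(tables)


if __name__ == "__main__":
    # Hypothetical keys, as a bucket listing might return them.
    keys = [
        "curated/finance/customers/part-0000.parquet",
        "ml/models/churn/model.pkl",
        "ml/models/churn/metrics.json",
        "ml/predictions/churn/part-0000.parquet",
    ]
    print(discover_tables(keys))  # ['curated.finance.customers', 'ml.predictions.churn']
```

In a real bucket, the keys would come from an object listing, for example boto3's `list_objects_v2` paginator for S3.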
SQL query on predictions
dataspoc-lens> SELECT customer_id, name, churn_probability
FROM ml.predictions.churn
WHERE churn_probability > 0.8
ORDER BY churn_probability DESC
LIMIT 10;
AI Ask on predictions
$ dataspoc-lens ask "Which enterprise customers have the highest churn risk?"
Querying ml.predictions.churn joined with raw.postgres.customers...
Found 5 enterprise customers with churn probability above 80%.
Model metrics
$ dataspoc-lens ml metrics --model s3://my-lake/ml/models/churn
Model: churn_classifier_v2
Accuracy: 0.94
Precision: 0.91
Recall: 0.87
F1: 0.89
AUC: 0.96
Trained: 2024-12-15T10:30:00Z
Samples: 45,230
Bring ML to your data lake
DataSpoc ML is a commercial product. Contact us for pricing, demos, and pilot programs.