Commercial

DataSpoc ML

AutoML on your data lake. Train models, generate predictions, and explain results directly on Parquet data in your cloud bucket, with automated feature engineering, model selection, and drift monitoring.

Your ML Agent is ready.

DataSpoc ML ships with AGENT.md — a skill file that lets AI agents train models, generate predictions, and explain results on your data lake. Ask in plain English: "Can we predict which customers will churn?" and the agent handles feature engineering, training, and evaluation.

Predictions land as Parquet in your bucket — queryable via Lens. No notebook handoffs. No model deployment pipelines. From question to production prediction in one conversation.

Trains models · Generates predictions · Explains results · Monitors drift

Claude via MCP → ML

You: "Train a model to predict which customers will churn"

[MCP] list_tables()
[MCP] ml train --target churn --from customers
[MCP] ml explain --model churn

Agent: "Model trained. AUC: 0.87. Top predictors: days_since_order, support_tickets, contract_type. Predictions saved to ml/predictions/. Query with: SELECT * FROM churn"

Three commands to production ML

From raw data to deployed predictions without leaving your terminal.

Train

Automated feature engineering, model selection, and hyperparameter tuning. Reads Parquet from your bucket, writes model artifacts back.

$ dataspoc-lens ml train \
    --table curated.finance.customers \
    --target churn \
    --output s3://my-lake/ml/models/churn

Predict

Run predictions on new data using trained models. Reads the model from your bucket, writes predictions as Parquet.

$ dataspoc-lens ml predict \
    --model s3://my-lake/ml/models/churn \
    --input curated.finance.new_customers \
    --output s3://my-lake/ml/predictions/churn
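
For scheduled or scripted runs, the train and predict steps above can be chained from Python. The following is a minimal sketch, not an official client: it only assembles the flags shown on this page and passes them to `subprocess.run`. The helper names (`train_command`, `predict_command`, `run_pipeline`) are hypothetical, and it assumes `dataspoc-lens` is installed and on your PATH.

```python
import shlex
import subprocess


def train_command(table: str, target: str, output: str) -> list[str]:
    """Build argv for `dataspoc-lens ml train` using only the flags shown above."""
    return ["dataspoc-lens", "ml", "train",
            "--table", table, "--target", target, "--output", output]


def predict_command(model: str, input_table: str, output: str) -> list[str]:
    """Build argv for `dataspoc-lens ml predict` using only the flags shown above."""
    return ["dataspoc-lens", "ml", "predict",
            "--model", model, "--input", input_table, "--output", output]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    model_uri = "s3://my-lake/ml/models/churn"
    run_pipeline([
        train_command("curated.finance.customers", "churn", model_uri),
        predict_command(model_uri, "curated.finance.new_customers",
                        "s3://my-lake/ml/predictions/churn"),
    ])
```

The default `dry_run=True` only prints the commands, so you can review them before anything runs against your bucket.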

Explain

Understand why the model makes each prediction. Get feature importance, SHAP values, and drift detection out of the box.

$ dataspoc-lens ml explain \
    --model s3://my-lake/ml/models/churn \
    --format html
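
To make "drift detection" concrete, here is a small sketch of one common drift measure, the Population Stability Index (PSI), which compares how a feature is distributed in training data versus newer data. This page does not say which drift metric DataSpoc ML uses, so treat PSI as an illustrative assumption; the function name and the sample values are hypothetical.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample and a newer sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = list(range(100))           # hypothetical days_since_order values
    recent = [v + 50 for v in training]   # same feature, shifted upward
    print(population_stability_index(training, training))
    print(population_stability_index(training, recent))
```

A widely used rule of thumb reads a PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift worth investigating before trusting new predictions.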

How ML connects to your data lake

ML reads curated data, trains models, and writes model artifacts and predictions back to the bucket. Lens can then query the predictions.

bucket/
  curated/finance/customers/       ← ML reads training data
    *.parquet

  ml/models/churn/                 ← ML writes model artifacts
    model.pkl
    features.json
    metrics.json

  ml/predictions/churn/            ← ML writes predictions
    *.parquet                      ← Lens can query these

──────────────────────────────────────────────

[Pipe] ──→ raw/ ──→ curated/ ──→ [ML Train]
                                      │
                                      ▼
                               ml/models/churn/
                                      │
                               [ML Predict]
                                      │
                                      ▼
                            ml/predictions/churn/
                                      │
                                      ▼
                                   [Lens] ──→ SQL / Notebooks / AI
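
The layout above is regular enough to derive from a bucket URI and a model name. The sketch below encodes that convention in a small helper so scripts do not hard-code paths; the `ModelLayout` class is hypothetical and simply mirrors the directory tree shown on this page.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLayout:
    bucket: str  # e.g. "s3://my-lake"
    name: str    # e.g. "churn"

    @property
    def model_dir(self) -> str:
        """Where model artifacts are written."""
        return posixpath.join(self.bucket, "ml/models", self.name)

    @property
    def predictions_dir(self) -> str:
        """Where prediction Parquet files are written."""
        return posixpath.join(self.bucket, "ml/predictions", self.name)

    def artifacts(self) -> list[str]:
        """Artifact files listed in the layout above."""
        files = ("model.pkl", "features.json", "metrics.json")
        return [posixpath.join(self.model_dir, f) for f in files]


if __name__ == "__main__":
    layout = ModelLayout("s3://my-lake", "churn")
    print(layout.model_dir)
    print(layout.predictions_dir)
    for path in layout.artifacts():
        print(path)
```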

Query predictions with Lens

Once ML writes predictions to your bucket, Lens discovers them automatically. Query with SQL, notebooks, or your AI agent.

SQL query on predictions

dataspoc-lens> SELECT customer_id, name, churn_probability
  FROM ml.predictions.churn
  WHERE churn_probability > 0.8
  ORDER BY churn_probability DESC
  LIMIT 10;

AI Ask on predictions

$ dataspoc-lens ask "Which enterprise customers have the highest churn risk?"

Querying ml.predictions.churn joined with raw.postgres.customers...

Found 5 enterprise customers with churn probability above 80%.

Model metrics

$ dataspoc-lens ml metrics --model s3://my-lake/ml/models/churn

Model: churn_classifier_v2
Accuracy:  0.94
Precision: 0.91
Recall:    0.87
F1:        0.89
AUC:       0.96
Trained:   2024-12-15T10:30:00Z
Samples:   45,230
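
As a quick reading aid for the report above: F1 is the harmonic mean of precision and recall, so the reported values can be cross-checked by hand. The sketch below shows the standard formulas; the function names are illustrative and not part of the product.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Cross-check the report above: precision 0.91 and recall 0.87.
    print(round(f1_from(0.91, 0.87), 2))  # 0.89
```

Accuracy and AUC cannot be recomputed from precision and recall alone; they depend on the full confusion matrix and the ranked prediction scores.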

Bring ML to your data lake

DataSpoc ML is a commercial product. Contact us for pricing, demos, and pilot programs.