Generating Predictions

Score new data against a trained model to generate predictions.

Usage

dataspoc-lens ml predict --model <model> --from <table>

| Flag | Description |
|------|-------------|
| `--model` | Name of a previously trained model |
| `--from` | The source table containing new data to score |
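For scripted runs, the command line can be assembled programmatically. The sketch below is a hypothetical Python helper (`build_predict_command` is not part of dataspoc-lens) that uses only the two flags listed above and leaves execution to the caller.

```python
import shlex
from typing import List


def build_predict_command(model: str, table: str) -> List[str]:
    """Return the argv list for `dataspoc-lens ml predict`.

    Hypothetical helper, not part of dataspoc-lens. It only uses the
    --model and --from flags documented above.
    """
    if not model or not table:
        raise ValueError("both model and table are required")
    return ["dataspoc-lens", "ml", "predict", "--model", model, "--from", table]


cmd = build_predict_command("churned_activity", "curated/customers/activity")
# Print a shell-safe rendering of the command for logging or review.
print(shlex.join(cmd))
```

On a machine where the CLI is installed, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with table paths.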

What happens

1. Loads the model: reads model.pkl and features.json from bucket/ml/models/<model>/.
2. Reads new data: loads the source table from your bucket.
3. Applies feature engineering: transforms the input data using the same pipeline used during training.
4. Generates predictions: scores every row and produces prediction columns.
5. Saves to bucket: writes Parquet files to bucket/ml/predictions/<model>/.
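The load, transform, and score steps above can be illustrated with a small, self-contained sketch. It is not the dataspoc-lens implementation, and everything in it is an assumption made for illustration: that features.json holds a JSON list of column names, that model.pkl holds a dict of toy logistic weights (a real file would more likely hold a fitted estimator), the column names, and the definition of confidence as the probability of the predicted class. Reading the source table and writing Parquet are left out because they need a Parquet library.

```python
import json
import math
import pickle
import tempfile
from pathlib import Path


def score_rows(model_dir, rows):
    """Load model artifacts, order features as in training, score each row."""
    # Load the model: only unpickle files from a source you trust.
    with open(model_dir / "model.pkl", "rb") as fh:
        model = pickle.load(fh)
    # Assumed format: a JSON list of feature column names.
    features = json.loads((model_dir / "features.json").read_text())

    scored = []
    for row in rows:
        # Apply feature engineering: build the vector in training-time order.
        x = [float(row[name]) for name in features]
        # Generate predictions: toy logistic score standing in for a real model.
        z = model["bias"] + sum(w * v for w, v in zip(model["weights"], x))
        p = 1.0 / (1.0 + math.exp(-z))
        scored.append({
            "customer_id": row["customer_id"],
            "prediction": int(p >= 0.5),
            "confidence": max(p, 1.0 - p),  # assumed definition of confidence
        })
    return scored


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Hypothetical artifacts written locally so the sketch runs end to end.
    (model_dir / "model.pkl").write_bytes(
        pickle.dumps({"bias": 0.0, "weights": [-1.0, 2.0]})
    )
    (model_dir / "features.json").write_text(
        json.dumps(["logins_30d", "tickets_30d"])
    )
    rows = [
        {"customer_id": 1, "logins_30d": 4, "tickets_30d": 0},
        {"customer_id": 2, "logins_30d": 0, "tickets_30d": 2},
    ]
    results = score_rows(model_dir, rows)

for r in results:
    print(r["customer_id"], r["prediction"], round(r["confidence"], 3))
```

Storing the feature list next to the model is what lets scoring rebuild inputs in the same column order as training, which is the point of applying the same pipeline at prediction time.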

Output

Predictions are saved as Parquet files at:

bucket/
  ml/
    predictions/
      <model>/
        predictions_20260415_120000.parquet

Each prediction file includes the original key columns plus the prediction output and confidence scores.
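If a model accumulates several prediction files, the timestamp in the file name can be used to pick the most recent one. The sketch below assumes the `predictions_YYYYMMDD_HHMMSS.parquet` layout, which is inferred from the single example above rather than from a documented naming contract. The helper names and the extra file names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional

PREFIX = "predictions_"
SUFFIX = ".parquet"


def parse_run_time(filename: str) -> Optional[datetime]:
    """Parse the timestamp from a name like predictions_20260415_120000.parquet.

    Returns None for names that do not match the assumed layout.
    """
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        return None
    stamp = filename[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, "%Y%m%d_%H%M%S")
    except ValueError:
        return None


def latest_prediction_file(filenames: Iterable[str]) -> Optional[str]:
    """Return the name with the most recent parseable timestamp, if any."""
    dated = [(parse_run_time(name), name) for name in filenames]
    dated = [(ts, name) for ts, name in dated if ts is not None]
    return max(dated)[1] if dated else None


names = [
    "predictions_20260414_093000.parquet",  # hypothetical earlier run
    "predictions_20260415_120000.parquet",
    "README.txt",  # ignored: does not match the assumed pattern
]
print(latest_prediction_file(names))
```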

Predictions in Lens

Once predictions are written to the bucket, they become queryable as SQL tables in Lens:

SELECT customer_id, prediction, confidence
FROM ml_predictions.churned_activity
WHERE confidence > 0.8
ORDER BY confidence DESC

No additional configuration is needed — Lens discovers prediction Parquet files automatically.
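To try the query shape without a bucket, the sketch below runs the same SQL against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in here, not the engine Lens uses. The column types and the four rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no schemas, so an attached in-memory database named
# ml_predictions lets the qualified table name from the query resolve.
conn.execute("ATTACH DATABASE ':memory:' AS ml_predictions")
conn.execute(
    "CREATE TABLE ml_predictions.churned_activity "
    "(customer_id INTEGER, prediction INTEGER, confidence REAL)"
)
conn.executemany(
    "INSERT INTO ml_predictions.churned_activity VALUES (?, ?, ?)",
    [(101, 1, 0.93), (102, 0, 0.55), (103, 1, 0.81), (104, 0, 0.80)],  # made-up rows
)

query = """
SELECT customer_id, prediction, confidence
FROM ml_predictions.churned_activity
WHERE confidence > 0.8
ORDER BY confidence DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The filter is strict, so a row with confidence exactly 0.8 is excluded. Use `>=` if the threshold itself should qualify.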

Example

Score new customer data against a trained churn model:

dataspoc-lens ml predict --model churned_activity --from curated/customers/activity

Output:

[ML] Loading model churned_activity...
[ML] Loading table curated/customers/activity...
[ML] 12,045 rows to score
[ML] Generating predictions...
[ML] 3,218 predicted to churn (26.7%)
[ML] Saved to ml/predictions/churned_activity/
[ML] Done. Query with: SELECT * FROM ml_predictions.churned_activity
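The percentage in the log is the number of rows predicted to churn divided by the number of rows scored. A quick check of that arithmetic, using the two counts from the output above:

```python
rows_scored = 12_045      # from "[ML] 12,045 rows to score"
predicted_churn = 3_218   # from "[ML] 3,218 predicted to churn"

churn_rate = predicted_churn / rows_scored
print(f"{churn_rate:.1%}")  # prints 26.7%
```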