# Commands Reference

Complete reference for every `dataspoc-pipe` CLI command.

## init

Initialize the configuration directory structure.

```sh
dataspoc-pipe init
```

Creates the following structure under `~/.dataspoc-pipe/`:

| Path | Purpose |
| --- | --- |
| `config.yaml` | Global defaults (compression, partition) |
| `sources/` | Tap configuration JSON files |
| `pipelines/` | Pipeline YAML definitions |
| `transforms/` | Optional Python transform scripts |

If the structure already exists, the command is a no-op.


## add

Create a new pipeline via an interactive wizard.

```sh
dataspoc-pipe add <name>
```

| Argument | Description |
| --- | --- |
| `name` | Pipeline name (used as filename and identifier) |

The wizard prompts for:

  1. Singer tap — the tap command to use (e.g., tap-csv, tap-postgres)
  2. Destination bucket — URI like s3://my-bucket, gs://my-bucket, or file:///tmp/lake
  3. Base path — subdirectory in the bucket (default: raw)
  5. Compression — zstd (default), snappy, gzip, or none
  5. Incremental extraction — enable Singer bookmark-based incremental
  6. Cron expression — schedule for automated runs (optional)
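Incremental extraction (step 5) relies on Singer's bookmark mechanism: each run remembers the highest replication-key value it has seen and only extracts newer records on the next run. The sketch below illustrates the concept only — it is not dataspoc-pipe's internal code, and the `updated_at` field name is an illustrative assumption:

```python
# Illustrative sketch of Singer-style bookmark filtering (not dataspoc-pipe
# internals): only records newer than the saved bookmark are extracted, and
# the bookmark advances to the newest value seen.
def extract_incremental(records, bookmark):
    """Return records with replication key > bookmark, plus the advanced bookmark."""
    new_bookmark = bookmark
    emitted = []
    for rec in records:
        if bookmark is None or rec["updated_at"] > bookmark:
            emitted.append(rec)
            if new_bookmark is None or rec["updated_at"] > new_bookmark:
                new_bookmark = rec["updated_at"]
    return emitted, new_bookmark

rows = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-02-01T00:00:00Z"},
]
first, bm = extract_incremental(rows, None)   # first run: full extraction
second, bm = extract_incremental(rows, bm)    # second run: nothing new
```

Running with `--full` (see `run` below) corresponds to ignoring the saved bookmark, i.e. passing `None` again.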

Outputs:

  - `~/.dataspoc-pipe/sources/<name>.json` — tap configuration (from template or auto-discovery)
  - `~/.dataspoc-pipe/pipelines/<name>.yaml` — pipeline definition
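As a rough orientation, a generated pipeline definition could look something like the fragment below. This is a hypothetical sketch assembled from the wizard prompts above — the key names are illustrative, not the actual schema:

```yaml
# Hypothetical sketch of pipelines/orders.yaml — key names are illustrative
name: orders
tap: tap-postgres
destination:
  bucket: s3://my-bucket
  base_path: raw
compression: zstd
incremental: true
schedule:
  cron: "0 */2 * * *"
```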

Example:

```sh
dataspoc-pipe add orders
```

## run

Run an extraction pipeline.

```sh
dataspoc-pipe run <name> [--full] [--all]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name |
| `--full` | Force full extraction, ignoring incremental state |
| `--all` | Run all configured pipelines sequentially |

When `--all` is used, the `name` argument is ignored and all pipelines in `~/.dataspoc-pipe/pipelines/` are executed. A summary is printed at the end.

Examples:

```sh
# Run a single pipeline
dataspoc-pipe run orders

# Force full re-extraction
dataspoc-pipe run orders --full

# Run all pipelines
dataspoc-pipe run _ --all
```

## status

Show the status of all configured pipelines.

```sh
dataspoc-pipe status [--output table|json]
```

| Option | Description |
| --- | --- |
| `--output` | Output format: table (default) or json |

Displays a table with columns: Pipeline, Last Run, Status, Duration, Records. Status is read from the latest execution log in each pipeline’s bucket.
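The JSON format is meant for scripting. A minimal consumer might look like the sketch below — note that the field names in the sample payload are assumptions mirroring the table columns, not a documented schema:

```python
import json

# Hypothetical example of consuming `status --output json` in a script.
# The field names below are assumptions mirroring the table columns
# (Pipeline, Last Run, Status, Duration, Records).
sample = json.loads("""
[
  {"pipeline": "orders",    "last_run": "2024-05-01T02:00:00Z",
   "status": "success", "duration_s": 42, "records": 18250},
  {"pipeline": "customers", "last_run": "2024-05-01T02:05:00Z",
   "status": "failed",  "duration_s": 3,  "records": 0}
]
""")

# Collect pipelines whose last run did not succeed
failed = [p["pipeline"] for p in sample if p["status"] != "success"]
print(failed)
```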

Examples:

```sh
# Table output (default)
dataspoc-pipe status

# Machine-readable JSON
dataspoc-pipe status --output json
```

## logs

Show logs from the last pipeline execution.

```sh
dataspoc-pipe logs <name> [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name |
| `--output` | Output format: table (default) or json |

Reads the most recent log file from `<bucket>/.dataspoc/logs/<name>/`.

Example:

```sh
dataspoc-pipe logs orders
dataspoc-pipe logs orders --output json
```

## validate

Test connections to sources and buckets.

```sh
dataspoc-pipe validate [<name>] [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name (omit to validate all) |
| `--output` | Output format: table (default) or json |

Checks:

  1. Bucket — writes a test file, verifies it exists, deletes it
  2. Tap — checks if the tap command is available in PATH

Examples:

```sh
# Validate one pipeline
dataspoc-pipe validate orders

# Validate all pipelines
dataspoc-pipe validate

# JSON output for scripting
dataspoc-pipe validate --output json
```

## manifest

Show the manifest (catalog) of a bucket.

```sh
dataspoc-pipe manifest <bucket> [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `bucket` | Bucket URI (e.g., s3://my-bucket, file:///tmp/lake) |
| `--output` | Output format: table (default) or json |

The manifest is the JSON catalog at `<bucket>/.dataspoc/manifest.json` that tracks all tables, schemas, timestamps, and row counts.
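Since the manifest is plain JSON, it can be consumed directly by scripts. The sketch below shows the idea; the key names in the sample are assumptions based on what the manifest is documented to track (tables, timestamps, row counts), not the actual schema:

```python
import json

# Hypothetical sketch of summarizing .dataspoc/manifest.json — the key
# names here are assumptions, not the documented manifest schema.
manifest = json.loads("""
{
  "tables": {
    "orders":    {"updated_at": "2024-05-01T02:00:00Z", "row_count": 18250},
    "customers": {"updated_at": "2024-04-30T02:00:00Z", "row_count": 5120}
  }
}
""")

# Total rows tracked across all tables in the bucket
total_rows = sum(t["row_count"] for t in manifest["tables"].values())
print(total_rows)
```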

Example:

```sh
dataspoc-pipe manifest s3://my-datalake
dataspoc-pipe manifest file:///tmp/lake --output json
```

## schedule install

Install cron schedules for all pipelines that have `schedule.cron` configured.

```sh
dataspoc-pipe schedule install
```

For each pipeline with a cron expression, the command creates a crontab entry wrapped in `flock` to prevent overlapping runs. Existing entries for the same pipeline are replaced.

Example crontab entry created:

```sh
# dataspoc-pipe:orders
0 */2 * * * flock -n /tmp/dataspoc-pipe-orders.lock /usr/local/bin/dataspoc-pipe run orders
```
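The `flock -n` wrapper acquires an exclusive lock on the lock file without blocking: if a previous run still holds it, the new run exits immediately instead of piling up. The same pattern can be sketched with Python's `fcntl` module:

```python
import fcntl

# The same non-blocking lock pattern that `flock -n` applies in the crontab
# entry: the second acquisition fails immediately instead of waiting, so
# an overlapping run is skipped rather than queued.
lock_path = "/tmp/dataspoc-pipe-orders.lock"

holder = open(lock_path, "w")
fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first run: lock acquired

contender = open(lock_path, "w")
try:
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    overlapped = True   # would mean two runs proceeded at once
except BlockingIOError:
    overlapped = False  # second run is skipped, as flock -n intends
```

The lock is released automatically when the holding process exits, so a crashed run never blocks the next scheduled one.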

## schedule remove

Remove all dataspoc-pipe schedules from the user’s crontab.

```sh
dataspoc-pipe schedule remove
```

Removes any crontab entry whose marker comment starts with `dataspoc-pipe:`.


## mcp

Start the MCP (Model Context Protocol) server for AI agent integration.

```sh
dataspoc-pipe mcp
```

Requires the `mcp` extra: `pip install dataspoc-pipe[mcp]`. See the MCP Server page for configuration details.


## --version

Show the installed version.

```sh
$ dataspoc-pipe --version
dataspoc-pipe 0.2.0
```

Also available as `-v`:

```sh
dataspoc-pipe -v
```