# Commands Reference

Complete reference for every `dataspoc-pipe` CLI command.

## init

Initialize the configuration directory structure.

```sh
dataspoc-pipe init
```

Creates the following structure under `~/.dataspoc-pipe/`:

| Path | Purpose |
| --- | --- |
| `config.yaml` | Global defaults (compression, partition) |
| `sources/` | Tap configuration JSON files |
| `pipelines/` | Pipeline YAML definitions |
| `transforms/` | Optional Python transform scripts |

If the structure already exists, the command is a no-op.


## add

Create a new pipeline via an interactive wizard.

```sh
dataspoc-pipe add <name>
```

| Argument | Description |
| --- | --- |
| `name` | Pipeline name (used as filename and identifier) |

The wizard prompts for:

  1. Singer tap — the tap command to use (e.g., tap-csv, tap-postgres)
  2. Destination bucket — URI like s3://my-bucket, gs://my-bucket, or file:///tmp/lake
  3. Base path — subdirectory in the bucket (default: raw)
  5. Compression — zstd (default), snappy, gzip, or none
  5. Incremental extraction — enable Singer bookmark-based incremental
  6. Cron expression — schedule for automated runs (optional)
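Incremental extraction (step 5) relies on Singer's bookmark mechanism: each run remembers the highest replication-key value it has seen and only extracts newer records on the next run. The sketch below illustrates the concept only — it is not dataspoc-pipe's internal code, and the `updated_at` field name is an illustrative assumption:

```python
# Illustrative sketch of Singer-style bookmark filtering (not dataspoc-pipe
# internals): only records newer than the saved bookmark are extracted, and
# the bookmark advances to the newest value seen.
def extract_incremental(records, bookmark):
    """Return records with replication key > bookmark, plus the advanced bookmark."""
    new_bookmark = bookmark
    emitted = []
    for rec in records:
        if bookmark is None or rec["updated_at"] > bookmark:
            emitted.append(rec)
            if new_bookmark is None or rec["updated_at"] > new_bookmark:
                new_bookmark = rec["updated_at"]
    return emitted, new_bookmark

rows = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-02-01T00:00:00Z"},
]
first, bm = extract_incremental(rows, None)   # first run: full extraction
second, bm = extract_incremental(rows, bm)    # second run: nothing new
```

Running with `--full` (see `run` below) corresponds to ignoring the saved bookmark, i.e. passing `None` again.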

Outputs:

  - `~/.dataspoc-pipe/sources/<name>.json` — tap configuration (from template or auto-discovery)
  - `~/.dataspoc-pipe/pipelines/<name>.yaml` — pipeline definition
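As a rough orientation, a generated pipeline definition could look something like the fragment below. This is a hypothetical sketch assembled from the wizard prompts above — the key names are illustrative, not the actual schema:

```yaml
# Hypothetical sketch of pipelines/orders.yaml — key names are illustrative
name: orders
tap: tap-postgres
destination:
  bucket: s3://my-bucket
  base_path: raw
compression: zstd
incremental: true
schedule:
  cron: "0 */2 * * *"
```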

Example:

```sh
dataspoc-pipe add orders
```

## run

Run an extraction pipeline.

```sh
dataspoc-pipe run <name> [--full] [--all]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name |
| `--full` | Force full extraction, ignoring incremental state |
| `--all` | Run all configured pipelines sequentially |

When `--all` is used, the `name` argument is ignored and all pipelines in `~/.dataspoc-pipe/pipelines/` are executed. A summary is printed at the end.

Examples:

```sh
# Run a single pipeline
dataspoc-pipe run orders

# Force full re-extraction
dataspoc-pipe run orders --full

# Run all pipelines
dataspoc-pipe run _ --all
```

## status

Show the status of all configured pipelines.

```sh
dataspoc-pipe status [--output table|json]
```

| Option | Description |
| --- | --- |
| `--output` | Output format: table (default) or json |

Displays a table with columns: Pipeline, Last Run, Status, Duration, Records. Status is read from the latest execution log in each pipeline’s bucket.
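The JSON format is meant for scripting. A minimal consumer might look like the sketch below — note that the field names in the sample payload are assumptions mirroring the table columns, not a documented schema:

```python
import json

# Hypothetical example of consuming `status --output json` in a script.
# The field names below are assumptions mirroring the table columns
# (Pipeline, Last Run, Status, Duration, Records).
sample = json.loads("""
[
  {"pipeline": "orders",    "last_run": "2024-05-01T02:00:00Z",
   "status": "success", "duration_s": 42, "records": 18250},
  {"pipeline": "customers", "last_run": "2024-05-01T02:05:00Z",
   "status": "failed",  "duration_s": 3,  "records": 0}
]
""")

# Collect pipelines whose last run did not succeed
failed = [p["pipeline"] for p in sample if p["status"] != "success"]
print(failed)
```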

Examples:

```sh
# Table output (default)
dataspoc-pipe status

# Machine-readable JSON
dataspoc-pipe status --output json
```

## logs

Show logs from the last pipeline execution.

```sh
dataspoc-pipe logs <name> [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name |
| `--output` | Output format: table (default) or json |

Reads the most recent log file from `<bucket>/.dataspoc/logs/<name>/`.

Example:

```sh
dataspoc-pipe logs orders
dataspoc-pipe logs orders --output json
```

## validate

Test connections to sources and buckets.

```sh
dataspoc-pipe validate [<name>] [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `name` | Pipeline name (omit to validate all) |
| `--output` | Output format: table (default) or json |

Checks:

  1. Bucket — writes a test file, verifies it exists, deletes it
  2. Tap — checks if the tap command is available in PATH

Examples:

```sh
# Validate one pipeline
dataspoc-pipe validate orders

# Validate all pipelines
dataspoc-pipe validate

# JSON output for scripting
dataspoc-pipe validate --output json
```

## manifest

Show the manifest (catalog) of a bucket.

```sh
dataspoc-pipe manifest <bucket> [--output table|json]
```

| Argument/Option | Description |
| --- | --- |
| `bucket` | Bucket URI (e.g., s3://my-bucket, file:///tmp/lake) |
| `--output` | Output format: table (default) or json |

The manifest is the JSON catalog at `<bucket>/.dataspoc/manifest.json` that tracks all tables, schemas, timestamps, and row counts.
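Since the manifest is plain JSON, it can be consumed directly by scripts. The sketch below shows the idea; the key names in the sample are assumptions based on what the manifest is documented to track (tables, timestamps, row counts), not the actual schema:

```python
import json

# Hypothetical sketch of summarizing .dataspoc/manifest.json — the key
# names here are assumptions, not the documented manifest schema.
manifest = json.loads("""
{
  "tables": {
    "orders":    {"updated_at": "2024-05-01T02:00:00Z", "row_count": 18250},
    "customers": {"updated_at": "2024-04-30T02:00:00Z", "row_count": 5120}
  }
}
""")

# Total rows tracked across all tables in the bucket
total_rows = sum(t["row_count"] for t in manifest["tables"].values())
print(total_rows)
```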

Example:

```sh
dataspoc-pipe manifest s3://my-datalake
dataspoc-pipe manifest file:///tmp/lake --output json
```

## schedule install

Install cron schedules for all pipelines that have `schedule.cron` configured.

```sh
dataspoc-pipe schedule install
```

For each pipeline with a cron expression, the command creates a crontab entry wrapped in `flock` to prevent overlapping runs. Existing entries for the same pipeline are replaced.

Example crontab entry created:

```sh
# dataspoc-pipe:orders
0 */2 * * * flock -n /tmp/dataspoc-pipe-orders.lock /usr/local/bin/dataspoc-pipe run orders
```
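The `flock -n` wrapper acquires an exclusive lock on the lock file without blocking: if a previous run still holds it, the new run exits immediately instead of piling up. The same pattern can be sketched with Python's `fcntl` module:

```python
import fcntl

# The same non-blocking lock pattern that `flock -n` applies in the crontab
# entry: the second acquisition fails immediately instead of waiting, so
# an overlapping run is skipped rather than queued.
lock_path = "/tmp/dataspoc-pipe-orders.lock"

holder = open(lock_path, "w")
fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first run: lock acquired

contender = open(lock_path, "w")
try:
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    overlapped = True   # would mean two runs proceeded at once
except BlockingIOError:
    overlapped = False  # second run is skipped, as flock -n intends
```

The lock is released automatically when the holding process exits, so a crashed run never blocks the next scheduled one.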

## schedule remove

Remove all dataspoc-pipe schedules from the user’s crontab.

```sh
dataspoc-pipe schedule remove
```

Removes any crontab entry whose marker comment starts with `dataspoc-pipe:`.


## mcp

Start the MCP (Model Context Protocol) server for AI agent integration.

```sh
dataspoc-pipe mcp
```

Requires the `mcp` extra: `pip install dataspoc-pipe[mcp]`. See the MCP Server page for configuration details.


## --version

Show the installed version.

```sh
$ dataspoc-pipe --version
dataspoc-pipe 0.2.0
```

Also available as `-v`:

```sh
dataspoc-pipe -v
```