# Commands Reference

Complete reference for every dataspoc-pipe CLI command.
## dataspoc-pipe init

Initialize the configuration directory structure.

```sh
dataspoc-pipe init
```

Creates the following at `~/.dataspoc-pipe/`:

| Path | Purpose |
|---|---|
| `config.yaml` | Global defaults (compression, partition) |
| `sources/` | Tap configuration JSON files |
| `pipelines/` | Pipeline YAML definitions |
| `transforms/` | Optional Python transform scripts |
If the structure already exists, the command is a no-op.
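For orientation, a minimal `config.yaml` might look like the sketch below. The key names are assumptions inferred from the table above (compression and partition defaults), not documented fields; inspect the generated file for the real schema.

```yaml
# Illustrative sketch only; key names are assumptions, not documented fields.
compression: zstd   # default compression for new pipelines
partition: daily    # default partitioning scheme
```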
## dataspoc-pipe add

Create a new pipeline via an interactive wizard.

```sh
dataspoc-pipe add <name>
```

| Argument | Description |
|---|---|
| `name` | Pipeline name (used as filename and identifier) |
The wizard prompts for:

- Singer tap — the tap command to use (e.g., `tap-csv`, `tap-postgres`)
- Destination bucket — URI like `s3://my-bucket`, `gs://my-bucket`, or `file:///tmp/lake`
- Base path — subdirectory in the bucket (default: `raw`)
- Compression — `zstd` (default), `snappy`, `gzip`, or `none`
- Incremental extraction — enable Singer bookmark-based incremental extraction
- Cron expression — schedule for automated runs (optional)
Outputs:

- `~/.dataspoc-pipe/sources/<name>.json` — tap configuration (from template or auto-discovery)
- `~/.dataspoc-pipe/pipelines/<name>.yaml` — pipeline definition
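For orientation, a pipeline definition generated from the wizard answers might plausibly look like the following sketch. Only `schedule.cron` is a key documented elsewhere in this reference (see `dataspoc-pipe schedule install`); the other field names are assumptions based on the wizard prompts, so inspect a generated file for the real schema.

```yaml
# Hypothetical pipelines/orders.yaml; field names other than schedule.cron
# are assumptions based on the wizard prompts.
name: orders
tap: tap-postgres
bucket: s3://my-bucket
base_path: raw
compression: zstd
incremental: true
schedule:
  cron: "0 */2 * * *"
```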
Example:

```sh
dataspoc-pipe add orders
```

## dataspoc-pipe run

Run an extraction pipeline.
```sh
dataspoc-pipe run <name> [--full] [--all]
```

| Argument/Option | Description |
|---|---|
| `name` | Pipeline name |
| `--full` | Force full extraction, ignoring incremental state |
| `--all` | Run all configured pipelines sequentially |
When `--all` is used, the `name` argument is ignored and all pipelines in `~/.dataspoc-pipe/pipelines/` are executed. A summary is printed at the end.
Examples:

```sh
# Run a single pipeline
dataspoc-pipe run orders

# Force full re-extraction
dataspoc-pipe run orders --full

# Run all pipelines
dataspoc-pipe run _ --all
```

## dataspoc-pipe status

Show the status of all configured pipelines.
```sh
dataspoc-pipe status [--output table|json]
```

| Option | Description |
|---|---|
| `--output` | Output format: `table` (default) or `json` |
Displays a table with columns: Pipeline, Last Run, Status, Duration, Records. Status is read from the latest execution log in each pipeline’s bucket.
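The JSON output is convenient for scripting. As a sketch, assuming each entry carries keys mirroring the table columns (`pipeline`, `status`, and so on; verify against the real output before relying on this), you could pipe `dataspoc-pipe status --output json` into a small filter for failed runs:

```python
import json

def failed_pipelines(status_json):
    """Return names of pipelines whose last run did not succeed.

    Assumes each entry has "pipeline" and "status" keys mirroring the
    table columns; this shape is an assumption, not documented.
    """
    entries = json.loads(status_json)
    return [e["pipeline"] for e in entries if e.get("status") != "success"]

# Hand-written sample shaped like the assumed output:
sample = json.dumps([
    {"pipeline": "orders", "status": "success", "records": 1200},
    {"pipeline": "users", "status": "failed", "records": 0},
])
print(failed_pipelines(sample))  # ['users']
```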
Examples:

```sh
# Table output (default)
dataspoc-pipe status

# Machine-readable JSON
dataspoc-pipe status --output json
```

## dataspoc-pipe logs

Show logs from the last pipeline execution.
```sh
dataspoc-pipe logs <name> [--output table|json]
```

| Argument/Option | Description |
|---|---|
| `name` | Pipeline name |
| `--output` | Output format: `table` (default) or `json` |
Reads the most recent log file from `<bucket>/.dataspoc/logs/<name>/`.
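For a `file://` bucket you can reproduce the "most recent" selection locally. A minimal sketch, assuming log files are ordinary files whose modification time reflects execution order (the actual log file naming convention is not documented here):

```python
from pathlib import Path

def latest_log(log_dir):
    """Return the most recently modified file in a logs directory, or None."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)

# e.g. latest_log("/tmp/lake/.dataspoc/logs/orders")
```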
Examples:

```sh
dataspoc-pipe logs orders
dataspoc-pipe logs orders --output json
```

## dataspoc-pipe validate

Test connections to sources and buckets.
```sh
dataspoc-pipe validate [<name>] [--output table|json]
```

| Argument/Option | Description |
|---|---|
| `name` | Pipeline name (omit to validate all) |
| `--output` | Output format: `table` (default) or `json` |

Checks:

- Bucket — writes a test file, verifies it exists, deletes it
- Tap — checks if the tap command is available in `PATH`
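The two checks can be illustrated for a local `file://` bucket. This is a sketch of the round trip that `validate` performs, not the tool's actual implementation; the probe filename is made up:

```python
import shutil
from pathlib import Path

def check_bucket_writable(bucket_path):
    """Write a probe file, verify it exists, then delete it."""
    probe = Path(bucket_path) / ".dataspoc-validate-probe"  # hypothetical name
    try:
        probe.write_text("probe")
        ok = probe.is_file()
    except OSError:
        ok = False
    finally:
        probe.unlink(missing_ok=True)
    return ok

def check_tap_on_path(tap):
    """Check whether the tap command resolves on PATH."""
    return shutil.which(tap) is not None
```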
Examples:

```sh
# Validate one pipeline
dataspoc-pipe validate orders

# Validate all pipelines
dataspoc-pipe validate

# JSON output for scripting
dataspoc-pipe validate --output json
```

## dataspoc-pipe manifest

Show the manifest (catalog) of a bucket.
```sh
dataspoc-pipe manifest <bucket> [--output table|json]
```

| Argument/Option | Description |
|---|---|
| `bucket` | Bucket URI (e.g., `s3://my-bucket`, `file:///tmp/lake`) |
| `--output` | Output format: `table` (default) or `json` |
The manifest is the JSON catalog at `<bucket>/.dataspoc/manifest.json` that tracks all tables, schemas, timestamps, and row counts.
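For a `file://` bucket the catalog can be read directly. A sketch, assuming the manifest maps table names to per-table metadata under a top-level key (the schema beyond the documented path is an assumption):

```python
import json
from pathlib import Path

def read_manifest(bucket_root):
    """Load <bucket>/.dataspoc/manifest.json from a local bucket."""
    path = Path(bucket_root) / ".dataspoc" / "manifest.json"
    return json.loads(path.read_text())

def table_row_counts(manifest):
    """(table, row_count) pairs; "tables" and "row_count" keys are assumptions."""
    return [(name, meta.get("row_count", 0))
            for name, meta in manifest.get("tables", {}).items()]
```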
Examples:

```sh
dataspoc-pipe manifest s3://my-datalake
dataspoc-pipe manifest file:///tmp/lake --output json
```

## dataspoc-pipe schedule install

Install cron schedules for all pipelines that have `schedule.cron` configured.
```sh
dataspoc-pipe schedule install
```

For each pipeline with a cron expression, it creates a crontab entry using `flock` to prevent overlapping runs. Previous entries for the same pipeline are replaced.
Example crontab entry created:

```
# dataspoc-pipe:orders
0 */2 * * * flock -n /tmp/dataspoc-pipe-orders.lock /usr/local/bin/dataspoc-pipe run orders
```

## dataspoc-pipe schedule remove

Remove all dataspoc-pipe schedules from the user's crontab.
```sh
dataspoc-pipe schedule remove
```

Removes any crontab entry whose comment starts with `dataspoc-pipe:`.
## dataspoc-pipe mcp

Start the MCP (Model Context Protocol) server for AI agent integration.

```sh
dataspoc-pipe mcp
```

Requires the `mcp` extra: `pip install dataspoc-pipe[mcp]`. See the MCP Server page for configuration details.
## dataspoc-pipe --version

Show the installed version.

```sh
dataspoc-pipe --version
```

```
dataspoc-pipe 0.2.0
```

Also available as `-v`:

```sh
dataspoc-pipe -v
```