MCP Server

Pipe includes an MCP (Model Context Protocol) server that lets AI agents inspect, run, and monitor data pipelines. Any MCP-compatible client, such as Claude Desktop, can then list pipelines, run them, check their status and logs, and validate their setup in response to natural-language requests.

Install

pip install "dataspoc-pipe[mcp]"

The quotes stop shells such as zsh from treating the square brackets as a glob pattern.

Start the server

dataspoc-pipe mcp

The server communicates over standard input and output (stdio), one of the standard MCP transports. It is meant to be launched by an MCP client, not run manually in a terminal.
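For illustration, here is a minimal Python sketch of the first messages an MCP client writes to the server's stdin over this transport: JSON-RPC 2.0 messages, one per line. It only builds the messages and does not launch dataspoc-pipe. The protocol version string and client name are assumptions; a real client such as Claude Desktop performs this handshake itself.

```python
import json

# Assumed protocol revision; a real client negotiates this with the server.
PROTOCOL_VERSION = "2024-11-05"


def frame(message: dict) -> str:
    """Serialize one JSON-RPC message as a single line, as the stdio transport expects."""
    # json.dumps escapes newlines inside strings, so a message never spans lines.
    return json.dumps(message, separators=(",", ":")) + "\n"


def handshake_messages(client_name: str = "example-client") -> list[str]:
    """Messages a client sends to open a session and discover the server's tools.

    In a live session the client waits for the server's reply to "initialize"
    before sending the remaining messages.
    """
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1"},
        },
    }
    # Notifications carry no "id" and expect no response.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
    return [frame(m) for m in (initialize, initialized, list_tools)]


if __name__ == "__main__":
    for line in handshake_messages():
        print(line, end="")
```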

Claude Desktop configuration

Add the following to your Claude Desktop MCP configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"]
    }
  }
}

If dataspoc-pipe is installed in a virtual environment, or anywhere else that is not on Claude Desktop's PATH, use the full path to the executable:

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "/home/you/.venv/bin/dataspoc-pipe",
      "args": ["mcp"]
    }
  }
}
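If you prefer to script this step, the following Python sketch merges the dataspoc-pipe entry into an existing configuration without touching any other servers already listed. It uses the standard library's shutil.which to look up the executable on the PATH of the Python process running the script, so run it with the virtual environment activated to get the venv path. The helper names add_pipe_server and update_config_file are illustrative, not part of Pipe.

```python
import json
import shutil
from pathlib import Path


def add_pipe_server(config: dict, command: str) -> dict:
    """Return a copy of a Claude Desktop config with a dataspoc-pipe MCP entry added."""
    merged = dict(config)
    # Copy the existing server table so other configured servers are preserved.
    servers = dict(merged.get("mcpServers", {}))
    servers["dataspoc-pipe"] = {"command": command, "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Resolve dataspoc-pipe on PATH and write the merged config back to disk."""
    command = shutil.which("dataspoc-pipe")
    if command is None:
        raise FileNotFoundError("dataspoc-pipe not found on PATH; activate its environment first")
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_pipe_server(config, command), indent=2) + "\n")
```

For example, on Linux: update_config_file(Path.home() / ".config/Claude/claude_desktop_config.json"). The function rewrites the whole file, so keep a backup of the original.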

Restart Claude Desktop after updating the configuration.

Available tools

The MCP server exposes the following tools:

`list_pipelines`

List all configured pipeline names.

Returns: One pipeline name per line, or “No pipelines configured.”
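Because this result is plain text rather than JSON, a client script has to special-case the empty message. A minimal Python sketch, assuming the tool's text output has already been retrieved; whether the message ends with a period is treated as uncertain, so both forms are accepted:

```python
EMPTY_MESSAGE = "No pipelines configured"


def parse_pipeline_list(text: str) -> list[str]:
    """Turn list_pipelines output into a list of names (empty if none are configured)."""
    stripped = text.strip()
    # Accept the empty-state message with or without a trailing period.
    if not stripped or stripped.rstrip(".") == EMPTY_MESSAGE:
        return []
    # One name per line; skip blank lines and surrounding whitespace.
    return [line.strip() for line in stripped.splitlines() if line.strip()]
```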

`pipeline_config`

Return the full configuration of a pipeline as JSON.

Parameters:

name (string, required): Pipeline name

Returns: JSON with source, destination, incremental, and schedule configuration.

`run_pipeline`

Run an extraction pipeline.

Parameters:

name (string, required): Pipeline name
full (boolean, optional): Force full extraction, ignoring incremental state. Default: false

Returns: JSON with success, streams (record counts per stream), and error.
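The per-stream record counts are enough to build a one-line summary of a run. This Python sketch assumes that streams is an object mapping each stream name to an integer record count and that error holds a message when success is false; check the actual output of your version before relying on that shape.

```python
import json


def summarize_run(result_json: str, name: str) -> str:
    """Build a one-line, human-readable summary from run_pipeline's JSON result."""
    result = json.loads(result_json)
    if not result.get("success"):
        return f"The {name} pipeline failed: {result.get('error') or 'unknown error'}"
    # Assumed shape of "streams": {"stream_name": record_count, ...}
    streams = result.get("streams") or {}
    total = sum(streams.values())
    record_word = "record" if total == 1 else "records"
    stream_word = "stream" if len(streams) == 1 else "streams"
    return f"The {name} pipeline extracted {total:,} {record_word} across {len(streams)} {stream_word}."
```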

`pipeline_status`

Return status for all configured pipelines.

Returns: JSON array, each entry with name, last_run, status, duration, records.
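A client can render this array as a compact text table. The Python sketch below makes no assumptions about the status vocabulary or the units of duration: it prints each field as returned and shows - for missing or null values.

```python
import json


def status_table(status_json: str) -> str:
    """Format pipeline_status output as a fixed-width text table."""
    columns = ["name", "last_run", "status", "duration", "records"]
    rows = [
        ["-" if entry.get(col) is None else str(entry.get(col)) for col in columns]
        for entry in json.loads(status_json)
    ]
    table = [columns, *rows]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in table) for i in range(len(columns))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip() for row in table]
    return "\n".join(lines)
```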

`pipeline_logs`

Return the latest execution log for a pipeline.

Parameters:

name (string, required): Pipeline name

Returns: the full execution log as JSON, or “No logs found” if no log exists for the pipeline.
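This tool returns either JSON or a plain-text message, so a script should check for the message before parsing. A minimal Python sketch that returns None when there is no log; the log's internal structure is not documented here, so it is returned as parsed JSON without further interpretation.

```python
import json
from typing import Any


def parse_log(text: str) -> Any:
    """Return the parsed execution log, or None if pipeline_logs reported no logs."""
    stripped = text.strip()
    if stripped.startswith("No logs found"):
        return None
    return json.loads(stripped)
```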

`show_manifest`

Return the manifest (catalog) of a bucket.

Parameters:

bucket (string, required): Bucket URI (e.g., s3://my-bucket, file:///tmp/lake)

Returns: JSON with table catalog including schemas, timestamps, and row counts.
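The URI scheme selects the storage backend. Before calling show_manifest, a client can catch obvious typos with Python's urllib.parse. The set of accepted schemes below is an assumption: only s3 and file appear on this page, so extend it for whatever backends your installation supports.

```python
from urllib.parse import urlparse

# Assumption: only the schemes shown on this page; other backends may be supported.
KNOWN_SCHEMES = {"s3", "file"}


def check_bucket_uri(uri: str) -> str:
    """Return the URI unchanged if it looks like a valid bucket URI, else raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported or missing scheme in {uri!r}")
    if parsed.scheme == "file":
        # file:///tmp/lake (three slashes) has an empty host and an absolute path.
        if parsed.netloc or not parsed.path.startswith("/"):
            raise ValueError(f"file URI needs the form file:///absolute/path: {uri!r}")
    elif not parsed.netloc:
        # s3://my-bucket puts the bucket name in the network-location part.
        raise ValueError(f"missing bucket name in {uri!r}")
    return uri
```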

`validate_pipeline`

Validate bucket connectivity and tap availability for a pipeline.

Parameters:

name (string, required): Pipeline name

Returns: JSON with pipeline, bucket_ok, tap_ok, and errors.
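Each tool above is invoked through an MCP tools/call request whose arguments object carries the parameters listed for that tool. The Python sketch below encodes those parameter lists and rejects calls with missing or unknown arguments before anything is sent. Note that full travels as a JSON boolean (true), even though the example interactions on this page write it in Python style (full=True). The helper name tool_call is illustrative.

```python
import json

# (required, optional) parameters per tool, as documented above.
TOOL_PARAMS = {
    "list_pipelines": (set(), set()),
    "pipeline_config": ({"name"}, set()),
    "run_pipeline": ({"name"}, {"full"}),
    "pipeline_status": (set(), set()),
    "pipeline_logs": ({"name"}, set()),
    "show_manifest": ({"bucket"}, set()),
    "validate_pipeline": ({"name"}, set()),
}


def tool_call(request_id: int, tool: str, **arguments) -> str:
    """Build a newline-terminated JSON-RPC tools/call request for one Pipe tool."""
    if tool not in TOOL_PARAMS:
        raise ValueError(f"unknown tool: {tool}")
    required, optional = TOOL_PARAMS[tool]
    provided = set(arguments)
    missing = required - provided
    unknown = provided - required - optional
    if missing or unknown:
        raise ValueError(f"{tool}: missing {sorted(missing)}, unknown {sorted(unknown)}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request) + "\n"
```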

Resources

`pipe://pipelines`

An MCP resource that lists all pipeline names with their tap and bucket:

[
  {"name": "orders", "tap": "tap-csv", "bucket": "s3://my-lake"},
  {"name": "customers", "tap": "tap-postgres", "bucket": "s3://my-lake"}
]
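Clients fetch this resource with an MCP resources/read request for the URI pipe://pipelines. Because each entry carries its bucket, the payload is enough to see which pipelines write to the same lake. A short Python sketch over the example payload above (pipelines_by_bucket is an illustrative helper, not part of Pipe):

```python
import json
from collections import defaultdict


def pipelines_by_bucket(resource_json: str) -> dict[str, list[str]]:
    """Group pipeline names from the pipe://pipelines resource by destination bucket."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in json.loads(resource_json):
        groups[entry["bucket"]].append(entry["name"])
    return dict(groups)


example = """[
  {"name": "orders", "tap": "tap-csv", "bucket": "s3://my-lake"},
  {"name": "customers", "tap": "tap-postgres", "bucket": "s3://my-lake"}
]"""
print(pipelines_by_bucket(example))  # {'s3://my-lake': ['orders', 'customers']}
```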

Example agent interactions

Once configured, you can interact with Pipe through Claude using natural language:

“What pipelines are configured?”

The agent calls list_pipelines and returns the list.

“Run the orders pipeline”

The agent calls run_pipeline(name="orders") and reports: “The orders pipeline completed successfully. 5,200 records were extracted across 1 stream.”

“Show me the status of all pipelines”

The agent calls pipeline_status and presents a formatted summary of each pipeline’s last run, status, duration, and record count.

“Do a full re-extraction of customers”

The agent calls run_pipeline(name="customers", full=True) to ignore incremental state and re-extract everything.

“Is the orders pipeline healthy? Check the bucket and tap.”

The agent calls validate_pipeline(name="orders") and reports whether the bucket is reachable and the tap is available, listing any errors.

“What tables are in the s3://my-lake bucket?”

The agent calls show_manifest(bucket="s3://my-lake") and lists the available tables with their schemas and record counts.