MCP Server

Pipe includes an MCP (Model Context Protocol) server that lets AI agents manage and run data pipelines. Any MCP-compatible client — such as Claude Desktop — can list, run, and monitor pipelines through natural language.

Install Pipe with the MCP extra:

pip install dataspoc-pipe[mcp]

Start the server:

dataspoc-pipe mcp

The server runs on stdio, following the MCP transport protocol. It is designed to be launched by an MCP client, not run manually in a terminal.
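
To make the stdio transport concrete, here is a minimal Python sketch of the first message an MCP client writes to the server's stdin. It assumes the MCP stdio framing of one JSON-RPC 2.0 message per line. The helper names, the protocol version string, and the client name are illustrative assumptions, not values taken from Pipe.

```python
import json


def frame_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int = 1) -> dict:
    """Build the MCP initialize request that a client sends first."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: one published MCP revision
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }


frame = frame_message(initialize_request())
```

A client would start `dataspoc-pipe mcp` as a child process with piped stdin and stdout (for example via `subprocess.Popen`), write this frame, and read the reply line by line. Clients such as Claude Desktop do this for you.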

Add the following to your Claude Desktop MCP configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"]
    }
  }
}

If dataspoc-pipe is installed in a virtual environment, use the full path:

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "/home/you/.venv/bin/dataspoc-pipe",
      "args": ["mcp"]
    }
  }
}

Restart Claude Desktop after updating the configuration.
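
If you would rather script the configuration change than edit the file by hand, the following Python sketch merges the `dataspoc-pipe` entry into an existing configuration file without touching other servers. The helper names `add_pipe_server` and `resolve_command` are hypothetical. The sketch assumes the file is plain JSON with a top-level `mcpServers` object, as in the examples above.

```python
import json
import shutil
from pathlib import Path


def add_pipe_server(config_path: Path, command: str = "dataspoc-pipe") -> dict:
    """Add or update the dataspoc-pipe entry, keeping other servers intact."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file as an empty configuration.
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dataspoc-pipe"] = {"command": command, "args": ["mcp"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def resolve_command() -> str:
    """Prefer the absolute path so a virtual-environment install is found."""
    return shutil.which("dataspoc-pipe") or "dataspoc-pipe"
```

On Linux, for example, you could call `add_pipe_server(Path.home() / ".config/Claude/claude_desktop_config.json", resolve_command())` from the environment where Pipe is installed, then restart Claude Desktop.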

The MCP server exposes the following tools:

list_pipelines

List all configured pipeline names.

Returns: One pipeline name per line, or “No pipelines configured.”

Return the full configuration of a pipeline as JSON.

Parameters:

  • name (string, required): Pipeline name

Returns: JSON with source, destination, incremental, and schedule configuration.

run_pipeline

Run an extraction pipeline.

Parameters:

  • name (string, required): Pipeline name
  • full (boolean, optional): Force full extraction, ignoring incremental state. Default: false

Returns: JSON with success, streams (record counts per stream), and error.
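
As an illustration of how a caller might consume this result, here is a small Python sketch. It assumes `streams` is an object mapping each stream name to its record count and that `error` is null on success. The sample payload is hypothetical, not captured from a real run.

```python
import json


def summarize_run(result_json: str) -> str:
    """Turn a run result into a one-line summary."""
    result = json.loads(result_json)
    if not result.get("success"):
        return f"failed: {result.get('error')}"
    # Assumption: streams maps stream name -> record count.
    streams = result.get("streams") or {}
    total = sum(streams.values())
    return f"ok: {total} records across {len(streams)} stream(s)"


# Hypothetical payload, shaped after the fields listed above.
sample = '{"success": true, "streams": {"orders": 5200}, "error": null}'
summary = summarize_run(sample)
print(summary)  # ok: 5200 records across 1 stream(s)
```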

pipeline_status

Return status for all configured pipelines.

Returns: JSON array, each entry with name, last_run, status, duration, records.

Return the latest execution log for a pipeline.

Parameters:

  • name (string, required): Pipeline name

Returns: JSON with full execution log, or “No logs found”.

show_manifest

Return the manifest (catalog) of a bucket.

Parameters:

  • bucket (string, required): Bucket URI (e.g., s3://my-bucket, file:///tmp/lake)

Returns: JSON with table catalog including schemas, timestamps, and row counts.

validate_pipeline

Validate bucket connectivity and tap availability for a pipeline.

Parameters:

  • name (string, required): Pipeline name

Returns: JSON with pipeline, bucket_ok, tap_ok, and errors.
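
The validation report can be reduced to a list of actionable problems. The Python sketch below assumes `bucket_ok` and `tap_ok` are booleans and `errors` is a list of strings. The sample report and its error message are hypothetical.

```python
import json


def validation_problems(report_json: str) -> list[str]:
    """Collect human-readable problems from a validation report."""
    report = json.loads(report_json)
    problems = []
    if not report.get("bucket_ok"):
        problems.append("bucket is not reachable")
    if not report.get("tap_ok"):
        problems.append("tap is not available")
    # Assumption: errors is a list of strings (possibly empty).
    problems.extend(report.get("errors") or [])
    return problems


# Hypothetical report, shaped after the fields listed above.
sample = json.dumps(
    {"pipeline": "orders", "bucket_ok": True, "tap_ok": False, "errors": ["tap-csv not found"]}
)
problems = validation_problems(sample)
```

An empty list means the pipeline passed both checks.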

An MCP resource that lists all pipeline names with their tap and bucket:

[
  {"name": "orders", "tap": "tap-csv", "bucket": "s3://my-lake"},
  {"name": "customers", "tap": "tap-postgres", "bucket": "s3://my-lake"}
]
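
Because the resource is plain JSON, a client can reshape it freely. As a small sketch, the following Python groups the example entries above by bucket. The function name `pipelines_by_bucket` is hypothetical.

```python
import json
from collections import defaultdict

# The example resource content shown above.
resource_text = """[
  {"name": "orders", "tap": "tap-csv", "bucket": "s3://my-lake"},
  {"name": "customers", "tap": "tap-postgres", "bucket": "s3://my-lake"}
]"""


def pipelines_by_bucket(text: str) -> dict[str, list[str]]:
    """Map each bucket URI to the pipeline names that write to it."""
    grouped = defaultdict(list)
    for entry in json.loads(text):
        grouped[entry["bucket"]].append(entry["name"])
    return dict(grouped)


grouped = pipelines_by_bucket(resource_text)
print(grouped)  # {'s3://my-lake': ['orders', 'customers']}
```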

Once configured, you can interact with Pipe through Claude using natural language:

“What pipelines are configured?”

The agent calls list_pipelines and returns the list.

“Run the orders pipeline”

The agent calls run_pipeline(name="orders") and reports: “The orders pipeline completed successfully. 5,200 records were extracted across 1 stream.”

“Show me the status of all pipelines”

The agent calls pipeline_status and presents a formatted summary of each pipeline’s last run, status, duration, and record count.

“Do a full re-extraction of customers”

The agent calls run_pipeline(name="customers", full=True) to ignore incremental state and re-extract everything.
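
Under the hood, a tool call like this travels as an MCP `tools/call` JSON-RPC request. The Python sketch below builds such a request for the full re-extraction above. The helper name `tool_call_request` and the request id are illustrative assumptions; in practice the MCP client constructs this message.

```python
import json


def tool_call_request(request_id: int, tool: str, **arguments) -> str:
    """Build an MCP tools/call request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The tool name and its keyword arguments go under params.
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


request = tool_call_request(2, "run_pipeline", name="customers", full=True)
```

Note that Python's `True` is serialized as JSON `true`, matching the boolean `full` parameter described earlier.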

“Is the orders pipeline healthy? Check the bucket and tap.”

The agent calls validate_pipeline(name="orders") and reports whether the bucket is writable and the tap is available.

“What tables are in the s3://my-lake bucket?”

The agent calls show_manifest(bucket="s3://my-lake") and lists the available tables with their schemas and record counts.