Agent Quickstart

This guide gets you from zero to a working AI agent connected to your data lake in 5 minutes. By the end, you will have Claude Desktop (or any MCP-compatible client) managing your pipelines and querying your data through natural conversation.

Install both DataSpoc products with MCP support:

pip install "dataspoc-pipe[mcp]" "dataspoc-lens[mcp]"

This gives you two MCP servers: one for pipeline management (Pipe) and one for data querying (Lens).

Open your Claude Desktop MCP configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Paste this configuration with both servers:

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"],
      "env": {
        "DATASPOC_BUCKET": "s3://my-data"
      }
    },
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": {
        "DATASPOC_BUCKET": "s3://my-data"
      }
    }
  }
}

Replace s3://my-data with your actual bucket URI. For local testing, use file:///tmp/lake.
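If the configuration file already lists other MCP servers, pasting over it would drop them. The following is a minimal sketch of merging the two DataSpoc entries into an existing file instead. The helper name `merge_mcp_servers` is hypothetical and not part of either product; the server entries mirror the configuration above.

```python
import json
from pathlib import Path


def merge_mcp_servers(config_path: Path, bucket: str) -> dict:
    """Add the two DataSpoc servers to an MCP config file, keeping other entries.

    Hypothetical helper for illustration; it is not shipped with DataSpoc.
    """
    # Start from the existing file if it has content, otherwise from an empty config.
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    for name in ("dataspoc-pipe", "dataspoc-lens"):
        servers[name] = {
            "command": name,
            "args": ["mcp"],
            "env": {"DATASPOC_BUCKET": bucket},
        }

    # Write the merged result back, creating the parent directory if needed.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `merge_mcp_servers(Path("scratch.json"), "file:///tmp/lake")` writes a local-testing configuration to a scratch file, which you can inspect before pointing the helper at the real Claude Desktop path.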

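If dataspoc-pipe and dataspoc-lens are installed in a virtual environment, Claude Desktop may not find them on its PATH. In that case, set command to the full path of each executable (the example below shows only the changed fields; keep the env block from the configuration above):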

{
  "mcpServers": {
    "dataspoc-pipe": {
      "command": "/home/you/.venv/bin/dataspoc-pipe",
      "args": ["mcp"]
    },
    "dataspoc-lens": {
      "command": "/home/you/.venv/bin/dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
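To find the full paths for your own environment, one option is to run a short script with the virtual environment activated. This is a sketch using only the Python standard library; `resolve_command` is a hypothetical helper name, and it falls back to the bare name when the executable is not on PATH.

```python
import os
import shutil


def resolve_command(name: str) -> str:
    """Return the absolute path of an executable found on PATH, or the bare name."""
    found = shutil.which(name)
    # shutil.which returns None when nothing matches; keep the bare name in that case.
    return os.path.abspath(found) if found is not None else name


# Print the value to use for "command" in each server entry.
for name in ("dataspoc-pipe", "dataspoc-lens"):
    print(f"{name}: {resolve_command(name)}")
```

If a line prints only the bare name, the executable was not found, which usually means the virtual environment is not active in that shell.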

Restart Claude Desktop after saving the file.

Open Claude Desktop and ask:

You: What tables do I have?

Claude calls the Lens list_tables tool and responds:

Claude: You have 3 tables in your data lake:

  • customers (4 columns: id, name, email, created_at)
  • orders (5 columns: order_id, customer_id, order_date, total, status)
  • products (6 columns: product_id, name, category, price, sku, active)

Try a follow-up:

You: How many orders were placed this week?

Claude (runs a SQL query): There were 2,480 orders this week with a total revenue of $225,120.
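To make the "runs a SQL query" step concrete, here is a sketch of the kind of aggregate an agent might generate for that question. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the data lake. The query Lens actually runs, its SQL dialect, and the sample rows are all assumptions; only the column names come from the orders table listed above.

```python
import sqlite3

# In-memory stand-in for the lake's orders table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, customer_id INTEGER, order_date TEXT, total REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "2024-06-03", 120.00, "paid"),
        (2, 11, "2024-06-05", 80.50, "paid"),
        (3, 10, "2024-05-20", 45.00, "paid"),  # falls outside the queried week
    ],
)


def weekly_summary(conn, week_start, week_end):
    """Count orders and sum revenue for week_start <= order_date < week_end."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM orders "
        "WHERE order_date >= ? AND order_date < ?",
        (week_start, week_end),
    ).fetchone()


count, revenue = weekly_summary(conn, "2024-06-03", "2024-06-10")
print(f"{count} orders, revenue {revenue:.2f}")  # prints "2 orders, revenue 200.50"
```

The half-open date range avoids double-counting orders that fall on a week boundary, and ISO-formatted dates compare correctly as plain strings.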

Ask Claude to manage your pipelines:

You: What pipelines are configured?

Claude calls the Pipe list_pipelines tool:

Claude: You have 3 pipelines: orders, customers, and events.

You: Run the orders pipeline.

Claude (calls run_pipeline): The orders pipeline completed successfully. 5,200 records were extracted.

You: Check if any pipelines failed recently.

Claude (calls pipeline_status): All 3 pipelines are healthy. The last run was today at 02:00.

The two servers are most useful together, because a single request can chain a pipeline run, a cache refresh, and a query:

You: Run the orders pipeline to get fresh data, refresh the cache, then tell me today’s revenue.

Claude: I will run the pipeline, refresh the Lens cache, and query the results.

  1. Pipeline orders completed: 5,200 records extracted.
  2. Cache refreshed for orders table.
  3. Today’s revenue: $17,023.50 across 187 orders.

Now that your agent is connected, explore the detailed guides for each product:

  • Pipe Agent Integration — Full MCP tool reference, example conversations, CrewAI and LangGraph examples, and best practices for pipeline automation.
  • Lens Agent Integration — Full MCP tool reference, example conversations, CrewAI, LangGraph, and AutoGen examples, and best practices for data analysis.
  • MCP Server Setup — Detailed setup for Claude Desktop, Cursor, Windsurf, and Claude Code.
  • Python SDK — Build custom agents with the PipeClient and LensClient Python classes.
  • JSON Output — Use --output json with shell scripts and subprocess calls.