Analyze Your Data Lake from Cursor IDE with DataSpoc MCP
Cursor is an AI-powered code editor. DataSpoc Lens exposes a data lake as an MCP server. Connect them and Cursor becomes a data analysis environment — explore schemas, write SQL, generate charts, and build reports without leaving your editor.
Setup
1. Install DataSpoc Lens
```
pip install "dataspoc-lens[mcp]"
```

2. Configure Cursor’s MCP Settings
Open Cursor’s settings and navigate to the MCP configuration. Add the DataSpoc Lens server.
Create or edit .cursor/mcp.json in your project root:
```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": { "DATASPOC_BUCKET": "s3://my-company-data" }
    }
  }
}
```

For GCS:
```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": { "DATASPOC_BUCKET": "gs://my-company-data" }
    }
  }
}
```

3. Restart Cursor
After saving the config, restart Cursor. The MCP server will start automatically, and Cursor’s AI will discover the available tools.
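If you prefer to script your setup, the config file can also be generated programmatically. A minimal sketch using only the standard library — the bucket URI is illustrative; substitute your own:

```python
import json
from pathlib import Path

# Illustrative values -- swap in your own bucket URI (s3:// or gs://)
config = {
    "mcpServers": {
        "dataspoc-lens": {
            "command": "dataspoc-lens",
            "args": ["mcp"],
            "env": {"DATASPOC_BUCKET": "s3://my-company-data"},
        }
    }
}

# Write .cursor/mcp.json in the project root
path = Path(".cursor/mcp.json")
path.parent.mkdir(exist_ok=True)
path.write_text(json.dumps(config, indent=2) + "\n")
print(f"Wrote {path}")
```

This is handy when rolling the same configuration out across several repositories.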
What Cursor Can Do Now
Discover Tables
In Cursor’s chat, type:
```
What tables are available in my data lake?
```

Cursor calls lens.tables() via MCP:
```
Your data lake contains 8 tables:

raw_postgres_orders     - 245,000 rows
raw_postgres_customers  - 18,200 rows
raw_stripe_payments     - 892,000 rows
raw_hubspot_contacts    - 34,500 rows
curated_sales           - 245,000 rows
curated_customer_360    - 18,200 rows
gold_revenue_by_region  - 48 rows
gold_monthly_metrics    - 36 rows
```

Explore Schemas
```
What columns does curated_sales have? Show me sample data.
```

Cursor calls lens.schema() and lens.query():
```
curated_sales schema:
  order_id        VARCHAR
  customer_id     VARCHAR
  product         VARCHAR
  category        VARCHAR
  amount          DOUBLE
  quantity        INTEGER
  region          VARCHAR
  sale_date       DATE
  payment_method  VARCHAR
```

Sample data (5 rows):

| order_id | customer_id | product      | amount | region        | sale_date  |
|----------|-------------|--------------|--------|---------------|------------|
| ORD-001  | CUST-042    | Widget Pro   | 299.99 | North America | 2026-03-15 |
| ORD-002  | CUST-108    | Widget Basic | 49.99  | Europe        | 2026-03-15 |
| ORD-003  | CUST-042    | Gadget Plus  | 599.00 | North America | 2026-03-16 |

Write Analytical Queries
```
Show me revenue trends by month for the last 6 months, broken down by region.
```

Cursor generates and executes:
```sql
SELECT
    DATE_TRUNC('month', sale_date) AS month,
    region,
    SUM(amount) AS revenue,
    COUNT(*) AS orders,
    ROUND(AVG(amount), 2) AS avg_order_value
FROM curated_sales
WHERE sale_date >= '2025-10-01'
GROUP BY month, region
ORDER BY month, revenue DESC
```

And it displays the results right in the chat.
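If you want to slice the result further in Python, the same rollup can be reproduced in pandas. A sketch on toy data — the column names follow the query above, but the rows are invented for illustration:

```python
import pandas as pd

# Toy stand-in for curated_sales
sales = pd.DataFrame({
    "sale_date": pd.to_datetime(
        ["2025-10-03", "2025-10-20", "2025-11-05", "2025-11-07"]
    ),
    "region": ["Europe", "Europe", "Europe", "North America"],
    "amount": [100.0, 50.0, 200.0, 300.0],
})

monthly = (
    sales
    .assign(month=sales["sale_date"].dt.to_period("M"))  # DATE_TRUNC('month', ...)
    .groupby(["month", "region"])
    .agg(revenue=("amount", "sum"), orders=("amount", "count"))
    .assign(avg_order_value=lambda d: (d["revenue"] / d["orders"]).round(2))
    .reset_index()
)
print(monthly)
```

The `to_period("M")` call plays the role of SQL's DATE_TRUNC, and the named aggregations mirror the SELECT list.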
Generate Analysis Code
Ask Cursor to create a Python analysis script based on the data it just discovered:
```
Write a Python script that analyzes customer cohorts by signup month
and tracks their monthly spending over time. Use dataspoc_lens to query.
```

Cursor writes directly in your editor:
```python
"""Customer cohort analysis using DataSpoc Lens."""
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dataspoc_lens import LensClient

lens = LensClient()

# Get customer cohorts (signup month) with their spending over time
df = lens.query("""
    WITH cohorts AS (
        SELECT customer_id, DATE_TRUNC('month', MIN(sale_date)) AS cohort_month
        FROM curated_sales
        GROUP BY customer_id
    ),
    monthly_spend AS (
        SELECT
            s.customer_id,
            DATE_TRUNC('month', s.sale_date) AS spend_month,
            SUM(s.amount) AS monthly_total
        FROM curated_sales s
        GROUP BY s.customer_id, spend_month
    )
    SELECT
        c.cohort_month,
        m.spend_month,
        DATEDIFF('month', c.cohort_month, m.spend_month) AS months_since_signup,
        COUNT(DISTINCT m.customer_id) AS active_customers,
        SUM(m.monthly_total) AS total_revenue,
        AVG(m.monthly_total) AS avg_spend_per_customer
    FROM cohorts c
    JOIN monthly_spend m ON c.customer_id = m.customer_id
    WHERE c.cohort_month >= '2025-07-01'
    GROUP BY c.cohort_month, m.spend_month, months_since_signup
    ORDER BY c.cohort_month, months_since_signup
""")

# Pivot for cohort retention chart
pivot = df.pivot_table(
    index="cohort_month",
    columns="months_since_signup",
    values="active_customers",
    aggfunc="sum",
)

# Calculate retention rates relative to month 0
retention = pivot.div(pivot[0], axis=0) * 100

# Plot
fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(
    retention,
    annot=True,
    fmt=".0f",
    cmap="YlOrRd_r",
    ax=ax,
    vmin=0,
    vmax=100,
)
ax.set_title("Customer Cohort Retention (%)")
ax.set_xlabel("Months Since Signup")
ax.set_ylabel("Cohort Month")
plt.tight_layout()
plt.savefig("cohort_retention.png", dpi=150)
plt.show()
print("Chart saved to cohort_retention.png")
```

You can run this directly from Cursor’s terminal.
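The retention arithmetic in the script — dividing each cohort row by its month-0 value — can be sanity-checked on a toy pivot with made-up counts:

```python
import pandas as pd

# Toy cohort pivot: rows are cohort months, columns are months since signup
pivot = pd.DataFrame(
    {0: [100, 80], 1: [60, 40], 2: [30, 20]},
    index=["2025-07", "2025-08"],
)

# Divide every column by the month-0 column, row by row
retention = pivot.div(pivot[0], axis=0) * 100
print(retention)
# Month 0 is 100% by construction; later columns show the drop-off
```

Passing `axis=0` is what makes pandas align the month-0 Series against the rows rather than the columns.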
Build Charts Interactively
```
Create a bar chart showing the top 10 products by revenue this quarter.
```

Cursor queries the data and generates the visualization code:
```python
from dataspoc_lens import LensClient
import matplotlib.pyplot as plt

lens = LensClient()

df = lens.query("""
    SELECT product, SUM(amount) AS revenue
    FROM curated_sales
    WHERE sale_date >= '2026-01-01'
    GROUP BY product
    ORDER BY revenue DESC
    LIMIT 10
""")

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(df["product"][::-1], df["revenue"][::-1])
ax.set_xlabel("Revenue ($)")
ax.set_title("Top 10 Products by Revenue — Q1 2026")

# Label each bar with its dollar value
for i, (val, name) in enumerate(zip(df["revenue"][::-1], df["product"][::-1])):
    ax.text(val + 1000, i, f"${val:,.0f}", va="center", fontsize=9)

plt.tight_layout()
plt.savefig("top_products.png", dpi=150)
plt.show()
```

Generate Markdown Reports
```
Write a weekly metrics report for the last 7 days. Include revenue,
order count, top products, and regional breakdown. Format as markdown.
```

Cursor queries the lake multiple times and assembles a report:
```markdown
# Weekly Metrics Report: April 8-14, 2026

## Summary
- **Total Revenue**: $987,000 (+8.3% vs prior week)
- **Orders**: 3,420 (+5.1% vs prior week)
- **Average Order Value**: $288.60 (+3.0% vs prior week)

## Daily Revenue
| Date   | Revenue  | Orders |
|--------|----------|--------|
| Apr 8  | $142,000 | 490    |
| Apr 9  | $138,000 | 478    |
| Apr 10 | $145,000 | 502    |
| Apr 11 | $151,000 | 523    |
| Apr 12 | $133,000 | 461    |
| Apr 13 | $128,000 | 443    |
| Apr 14 | $150,000 | 523    |

## Top 5 Products
1. Widget Pro — $124,000 (12.6%)
2. Gadget Plus — $98,000 (9.9%)
...
```

The Workflow
The typical Cursor + DataSpoc workflow:
- Open Cursor in your project directory
- Ask about data — Cursor discovers tables and schemas via MCP
- Iterate on queries — refine SQL based on results, all in chat
- Generate code — ask Cursor to write Python scripts that use LensClient
- Run and refine — execute in Cursor’s terminal, review output, iterate
- Save artifacts — charts, reports, and analysis scripts stay in your project
Comparison: Traditional BI vs Jupyter vs CLI vs Cursor
| Task | BI Tool (Metabase) | Jupyter | CLI (dataspoc-lens) | Cursor + MCP |
|---|---|---|---|---|
| Explore tables | Click through UI | lens.tables() | dataspoc-lens tables | "What tables exist?" |
| Write SQL | SQL editor | Cell with lens.query() | Pipe to command | "Show me revenue by month" |
| Generate charts | Drag-and-drop | Matplotlib code | Export + separate tool | "Create a bar chart" |
| Write reports | Dashboard | Markdown cells | Manual | "Write a weekly report" |
| Iterate | Refresh dashboard | Re-run cells | Re-run command | Continue conversation |
| Save work | Dashboard link | .ipynb file | Shell history | .py files in project |
| Learning curve | Medium | Medium | Low | Low |
Cursor with MCP combines the natural language interface of a BI tool with the code generation of Jupyter and the speed of a CLI. You describe what you want, Cursor writes the code, and you run it.
Tips for Effective Data Analysis in Cursor
Start with exploration:

```
What tables are available? Show me the schema of the largest table.
```

Be specific about time ranges:

```
Revenue by product category for March 2026 only, excluding returns.
```

Ask for comparisons:

```
Compare this month vs last month for all key metrics.
Show percentage changes.
```

Request reusable code:

```
Write a function I can reuse to generate monthly reports.
Take the month as a parameter.
```

Chain analyses:

```
First show me which customers churned last month.
Then analyze what they had in common (plan, region, usage).
Then suggest retention strategies based on the patterns.
```

Security
Cursor’s MCP integration inherits all of DataSpoc’s security properties:
- Read-only: The MCP server rejects any SQL that modifies data (INSERT, UPDATE, DELETE, DDL)
- Cloud IAM: Uses your existing credentials (AWS SSO, gcloud, Azure CLI)
- Scoped: Only accesses the bucket you configure
- Auditable: Every query is SQL that gets logged
Your data stays in your cloud bucket. Cursor’s AI generates SQL, Lens executes it locally via DuckDB, and results stay on your machine. No data leaves your environment unless you explicitly export it.
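The read-only guarantee boils down to a statement allowlist. A simplified sketch of the idea — this is illustrative only, not DataSpoc's actual implementation, which would parse the SQL rather than inspect keywords:

```python
# Keywords that indicate a statement writes or mutates state
WRITE_KEYWORDS = {
    "insert", "update", "delete", "drop", "create", "alter",
    "truncate", "merge", "grant", "copy", "attach",
}

def is_read_only(sql: str) -> bool:
    """Crude guard: allow only statements that start with SELECT or WITH
    and contain no write keywords. A real server would parse the SQL."""
    tokens = sql.lower().split()
    if not tokens or tokens[0] not in ("select", "with"):
        return False
    return not WRITE_KEYWORDS.intersection(tokens)

print(is_read_only("SELECT * FROM curated_sales"))   # allowed
print(is_read_only("DELETE FROM curated_sales"))     # rejected
```

Keyword matching like this is easy to fool, which is why production guards work on the parsed statement tree instead.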