Analyze Your Data Lake from Cursor IDE with DataSpoc MCP
Cursor is an AI-powered code editor. DataSpoc Lens exposes a data lake as an MCP server. Connect them and Cursor becomes a data analysis environment — explore schemas, write SQL, generate charts, and build reports without leaving your editor.
Setup
1. Install DataSpoc Lens
```
pip install "dataspoc-lens[mcp]"
```

2. Configure Cursor’s MCP Settings
Open Cursor’s settings and navigate to the MCP configuration. Add the DataSpoc Lens server.
Create or edit .cursor/mcp.json in your project root:
```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": { "DATASPOC_BUCKET": "s3://my-company-data" }
    }
  }
}
```

For GCS:
```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": { "DATASPOC_BUCKET": "gs://my-company-data" }
    }
  }
}
```

3. Restart Cursor
After saving the config, restart Cursor. The MCP server will start automatically, and Cursor’s AI will discover the available tools.
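If you prefer to script your setup, the config file can also be generated programmatically. A minimal sketch using only the standard library — the bucket URI is illustrative; substitute your own:

```python
import json
from pathlib import Path

# Illustrative values -- swap in your own bucket URI (s3:// or gs://)
config = {
    "mcpServers": {
        "dataspoc-lens": {
            "command": "dataspoc-lens",
            "args": ["mcp"],
            "env": {"DATASPOC_BUCKET": "s3://my-company-data"},
        }
    }
}

# Write .cursor/mcp.json in the project root
path = Path(".cursor/mcp.json")
path.parent.mkdir(exist_ok=True)
path.write_text(json.dumps(config, indent=2) + "\n")
print(f"Wrote {path}")
```

This is handy when rolling the same configuration out across several repositories.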
What Cursor Can Do Now
Discover Tables
In Cursor’s chat, type:
```
What tables are available in my data lake?
```

Cursor calls lens.tables() via MCP:
```
Your data lake contains 8 tables:

raw_postgres_orders     - 245,000 rows
raw_postgres_customers  - 18,200 rows
raw_stripe_payments     - 892,000 rows
raw_hubspot_contacts    - 34,500 rows
curated_sales           - 245,000 rows
curated_customer_360    - 18,200 rows
gold_revenue_by_region  - 48 rows
gold_monthly_metrics    - 36 rows
```

Explore Schemas
```
What columns does curated_sales have? Show me sample data.
```

Cursor calls lens.schema() and lens.query():
```
curated_sales schema:
  order_id        VARCHAR
  customer_id     VARCHAR
  product         VARCHAR
  category        VARCHAR
  amount          DOUBLE
  quantity        INTEGER
  region          VARCHAR
  sale_date       DATE
  payment_method  VARCHAR
```

Sample data (5 rows):

| order_id | customer_id | product      | amount | region        | sale_date  |
|----------|-------------|--------------|--------|---------------|------------|
| ORD-001  | CUST-042    | Widget Pro   | 299.99 | North America | 2026-03-15 |
| ORD-002  | CUST-108    | Widget Basic | 49.99  | Europe        | 2026-03-15 |
| ORD-003  | CUST-042    | Gadget Plus  | 599.00 | North America | 2026-03-16 |

Write Analytical Queries
```
Show me revenue trends by month for the last 6 months, broken down by region.
```

Cursor generates and executes:
```sql
SELECT
    DATE_TRUNC('month', sale_date) AS month,
    region,
    SUM(amount) AS revenue,
    COUNT(*) AS orders,
    ROUND(AVG(amount), 2) AS avg_order_value
FROM curated_sales
WHERE sale_date >= '2025-10-01'
GROUP BY month, region
ORDER BY month, revenue DESC
```

And it displays the results right in the chat.
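If you want to slice the result further in Python, the same rollup can be reproduced in pandas. A sketch on toy data — the column names follow the query above, but the rows are invented for illustration:

```python
import pandas as pd

# Toy stand-in for curated_sales
sales = pd.DataFrame({
    "sale_date": pd.to_datetime(
        ["2025-10-03", "2025-10-20", "2025-11-05", "2025-11-07"]
    ),
    "region": ["Europe", "Europe", "Europe", "North America"],
    "amount": [100.0, 50.0, 200.0, 300.0],
})

monthly = (
    sales
    .assign(month=sales["sale_date"].dt.to_period("M"))  # DATE_TRUNC('month', ...)
    .groupby(["month", "region"])
    .agg(revenue=("amount", "sum"), orders=("amount", "count"))
    .assign(avg_order_value=lambda d: (d["revenue"] / d["orders"]).round(2))
    .reset_index()
)
print(monthly)
```

The `to_period("M")` call plays the role of SQL's DATE_TRUNC, and the named aggregations mirror the SELECT list.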
Generate Analysis Code
Ask Cursor to create a Python analysis script based on the data it just discovered:
```
Write a Python script that analyzes customer cohorts by signup month
and tracks their monthly spending over time. Use dataspoc_lens to query.
```

Cursor writes directly in your editor:
```python
"""Customer cohort analysis using DataSpoc Lens."""
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dataspoc_lens import LensClient

lens = LensClient()

# Get customer cohorts (signup month) with their spending over time
df = lens.query("""
    WITH cohorts AS (
        SELECT customer_id, DATE_TRUNC('month', MIN(sale_date)) AS cohort_month
        FROM curated_sales
        GROUP BY customer_id
    ),
    monthly_spend AS (
        SELECT
            s.customer_id,
            DATE_TRUNC('month', s.sale_date) AS spend_month,
            SUM(s.amount) AS monthly_total
        FROM curated_sales s
        GROUP BY s.customer_id, spend_month
    )
    SELECT
        c.cohort_month,
        m.spend_month,
        DATEDIFF('month', c.cohort_month, m.spend_month) AS months_since_signup,
        COUNT(DISTINCT m.customer_id) AS active_customers,
        SUM(m.monthly_total) AS total_revenue,
        AVG(m.monthly_total) AS avg_spend_per_customer
    FROM cohorts c
    JOIN monthly_spend m ON c.customer_id = m.customer_id
    WHERE c.cohort_month >= '2025-07-01'
    GROUP BY c.cohort_month, m.spend_month, months_since_signup
    ORDER BY c.cohort_month, months_since_signup
""")

# Pivot for cohort retention chart
pivot = df.pivot_table(
    index="cohort_month",
    columns="months_since_signup",
    values="active_customers",
    aggfunc="sum",
)

# Calculate retention rates relative to month 0
retention = pivot.div(pivot[0], axis=0) * 100

# Plot
fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(
    retention,
    annot=True,
    fmt=".0f",
    cmap="YlOrRd_r",
    ax=ax,
    vmin=0,
    vmax=100,
)
ax.set_title("Customer Cohort Retention (%)")
ax.set_xlabel("Months Since Signup")
ax.set_ylabel("Cohort Month")
plt.tight_layout()
plt.savefig("cohort_retention.png", dpi=150)
plt.show()
print("Chart saved to cohort_retention.png")
```

You can run this directly from Cursor’s terminal.
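The retention arithmetic in the script — dividing each cohort row by its month-0 value — can be sanity-checked on a toy pivot with made-up counts:

```python
import pandas as pd

# Toy cohort pivot: rows are cohort months, columns are months since signup
pivot = pd.DataFrame(
    {0: [100, 80], 1: [60, 40], 2: [30, 20]},
    index=["2025-07", "2025-08"],
)

# Divide every column by the month-0 column, row by row
retention = pivot.div(pivot[0], axis=0) * 100
print(retention)
# Month 0 is 100% by construction; later columns show the drop-off
```

Passing `axis=0` is what makes pandas align the month-0 Series against the rows rather than the columns.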
Build Charts Interactively
```
Create a bar chart showing the top 10 products by revenue this quarter.
```

Cursor queries the data and generates the visualization code:
```python
from dataspoc_lens import LensClient
import matplotlib.pyplot as plt

lens = LensClient()

df = lens.query("""
    SELECT product, SUM(amount) AS revenue
    FROM curated_sales
    WHERE sale_date >= '2026-01-01'
    GROUP BY product
    ORDER BY revenue DESC
    LIMIT 10
""")

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(df["product"][::-1], df["revenue"][::-1])
ax.set_xlabel("Revenue ($)")
ax.set_title("Top 10 Products by Revenue — Q1 2026")

# Label each bar with its dollar value
for i, (val, name) in enumerate(zip(df["revenue"][::-1], df["product"][::-1])):
    ax.text(val + 1000, i, f"${val:,.0f}", va="center", fontsize=9)

plt.tight_layout()
plt.savefig("top_products.png", dpi=150)
plt.show()
```

Generate Markdown Reports
```
Write a weekly metrics report for the last 7 days. Include revenue,
order count, top products, and regional breakdown. Format as markdown.
```

Cursor queries the lake multiple times and assembles a report:
```markdown
# Weekly Metrics Report: April 8-14, 2026

## Summary
- **Total Revenue**: $987,000 (+8.3% vs prior week)
- **Orders**: 3,420 (+5.1% vs prior week)
- **Average Order Value**: $288.60 (+3.0% vs prior week)

## Daily Revenue
| Date   | Revenue  | Orders |
|--------|----------|--------|
| Apr 8  | $142,000 | 490    |
| Apr 9  | $138,000 | 478    |
| Apr 10 | $145,000 | 502    |
| Apr 11 | $151,000 | 523    |
| Apr 12 | $133,000 | 461    |
| Apr 13 | $128,000 | 443    |
| Apr 14 | $150,000 | 523    |

## Top 5 Products
1. Widget Pro — $124,000 (12.6%)
2. Gadget Plus — $98,000 (9.9%)
...
```

The Workflow
The typical Cursor + DataSpoc workflow:
- Open Cursor in your project directory
- Ask about data — Cursor discovers tables and schemas via MCP
- Iterate on queries — refine SQL based on results, all in chat
- Generate code — ask Cursor to write Python scripts that use LensClient
- Run and refine — execute in Cursor’s terminal, review output, iterate
- Save artifacts — charts, reports, and analysis scripts stay in your project
Comparison: Traditional BI vs Jupyter vs CLI vs Cursor
| Task | BI Tool (Metabase) | Jupyter | CLI (dataspoc-lens) | Cursor + MCP |
|---|---|---|---|---|
| Explore tables | Click through UI | lens.tables() | dataspoc-lens tables | "What tables exist?" |
| Write SQL | SQL editor | Cell with lens.query() | Pipe to command | "Show me revenue by month" |
| Generate charts | Drag-and-drop | Matplotlib code | Export + separate tool | "Create a bar chart" |
| Write reports | Dashboard | Markdown cells | Manual | "Write a weekly report" |
| Iterate | Refresh dashboard | Re-run cells | Re-run command | Continue conversation |
| Save work | Dashboard link | .ipynb file | Shell history | .py files in project |
| Learning curve | Medium | Medium | Low | Low |
Cursor with MCP combines the natural language interface of a BI tool with the code generation of Jupyter and the speed of a CLI. You describe what you want, Cursor writes the code, and you run it.
Tips for Effective Data Analysis in Cursor
Start with exploration:

```
What tables are available? Show me the schema of the largest table.
```

Be specific about time ranges:

```
Revenue by product category for March 2026 only, excluding returns.
```

Ask for comparisons:

```
Compare this month vs last month for all key metrics.
Show percentage changes.
```

Request reusable code:

```
Write a function I can reuse to generate monthly reports.
Take the month as a parameter.
```

Chain analyses:

```
First show me which customers churned last month.
Then analyze what they had in common (plan, region, usage).
Then suggest retention strategies based on the patterns.
```

Security
Cursor’s MCP integration inherits all of DataSpoc’s security properties:
- Read-only: The MCP server rejects any SQL that modifies data (INSERT, UPDATE, DELETE, DDL)
- Cloud IAM: Uses your existing credentials (AWS SSO, gcloud, Azure CLI)
- Scoped: Only accesses the bucket you configure
- Auditable: Every query is SQL that gets logged
Your data stays in your cloud bucket. Cursor’s AI generates SQL, Lens executes it locally via DuckDB, and results stay on your machine. No data leaves your environment unless you explicitly export it.
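The read-only guarantee boils down to a statement allowlist. A simplified sketch of the idea — this is illustrative only, not DataSpoc's actual implementation, which would parse the SQL rather than inspect keywords:

```python
# Keywords that indicate a statement writes or mutates state
WRITE_KEYWORDS = {
    "insert", "update", "delete", "drop", "create", "alter",
    "truncate", "merge", "grant", "copy", "attach",
}

def is_read_only(sql: str) -> bool:
    """Crude guard: allow only statements that start with SELECT or WITH
    and contain no write keywords. A real server would parse the SQL."""
    tokens = sql.lower().split()
    if not tokens or tokens[0] not in ("select", "with"):
        return False
    return not WRITE_KEYWORDS.intersection(tokens)

print(is_read_only("SELECT * FROM curated_sales"))   # allowed
print(is_read_only("DELETE FROM curated_sales"))     # rejected
```

Keyword matching like this is easy to fool, which is why production guards work on the parsed statement tree instead.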