
Data Analysis from Windsurf IDE with DataSpoc MCP

Michael San Martim · 2026-04-28

Windsurf is an AI-native IDE built by Codeium. Like Cursor, it can connect to MCP (Model Context Protocol) servers. Connect DataSpoc Lens as an MCP server and Windsurf’s AI assistant gains direct access to your data lake — it can list tables, read schemas, run SQL queries, and answer questions about your data. All without leaving your editor.

What You Get

After setup, you can type natural language in Windsurf’s AI panel:

  • “What tables do we have in the data lake?”
  • “Show me the schema for the orders table”
  • “What was revenue last month?”
  • “Find users who signed up but never purchased”
  • “Create a Python script that generates a monthly revenue report”

Windsurf calls DataSpoc Lens through MCP, gets the real data, and uses it to answer your questions or generate code.
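Under the hood, MCP is JSON-RPC 2.0 spoken over the server process's stdin/stdout. As a rough sketch of what "Windsurf calls DataSpoc Lens through MCP" means on the wire, here is the request payload for a tool call. The tool name `list_tables` comes from later in this article; the envelope follows the MCP `tools/call` shape:

```python
# Build the JSON-RPC 2.0 request Windsurf sends to an MCP server
# when it invokes a tool. The transport (writing this line to the
# server's stdin) is omitted; this only shows the payload shape.
import json

def mcp_tool_call(request_id, tool_name, arguments=None):
    """Return a JSON-RPC 2.0 `tools/call` request as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    })

print(mcp_tool_call(1, "list_tables"))
```

The server replies with a matching `id` and a `result` containing the tool output, which the IDE then feeds back to the model.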

Step 1: Install DataSpoc Lens

pip install dataspoc-lens

Configure your bucket:

dataspoc-lens add-bucket s3://my-company-data --name production
dataspoc-lens tables # verify connection

Step 2: Configure MCP in Windsurf

Open Windsurf’s MCP configuration. Navigate to Settings, then MCP Servers, or edit the config file directly:

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": {
        "AWS_PROFILE": "default"
      }
    }
  }
}

On macOS/Linux, the config file is typically at ~/.codeium/windsurf/mcp_config.json.

If you also want data ingestion capabilities, add the Pipe MCP server:

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": {
        "AWS_PROFILE": "default"
      }
    },
    "dataspoc-pipe": {
      "command": "dataspoc-pipe",
      "args": ["mcp"],
      "env": {
        "AWS_PROFILE": "default"
      }
    }
  }
}

Restart Windsurf after editing the config.

Step 3: Verify the Connection

In Windsurf’s AI panel, type:

What tables are available in my data lake?

Windsurf calls the list_tables tool via MCP and responds with something like:

You have 8 tables available:
- raw.customers (12,450 rows)
- raw.orders (89,230 rows)
- raw.products (340 rows)
- raw.events (1,245,600 rows)
- curated.dim_customers (12,340 rows)
- curated.fct_orders (87,650 rows)
- gold.daily_revenue (480 rows)
- gold.customer_dashboard (11,890 rows)

If you see this, MCP is connected and working.

Workflow 1: Analyze Revenue Trends

Type in Windsurf’s AI panel:

Analyze revenue trends for Q1 2026. Show monthly breakdown and growth rate.

Behind the scenes, Windsurf:

  1. Calls get_schema for relevant tables
  2. Writes a SQL query with monthly aggregation
  3. Calls run_query to execute it
  4. Formats the results with analysis

Response:

Q1 2026 Revenue Analysis:
| Month | Revenue | Orders | Avg Order | MoM Growth |
|----------|-----------|--------|-----------|------------|
| Jan 2026 | $245,600 | 3,120 | $78.72 | -- |
| Feb 2026 | $268,400 | 3,450 | $77.80 | +9.3% |
| Mar 2026 | $312,100 | 4,010 | $77.83 | +16.3% |
Q1 Total: $826,100
Average monthly growth: 12.8%
Revenue is accelerating. March saw 16.3% growth, nearly double
January's baseline. Order count is growing faster than average
order value, suggesting new customer acquisition is driving growth.

All numbers come from real SQL queries on your actual data.
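The monthly aggregation in step 2 can be sketched in plain Python. This is an illustration of the calculation, not DataSpoc's API; the `(date, amount)` row shape is an assumption standing in for the real `raw.orders` columns:

```python
# Roll individual orders up into monthly revenue, order count,
# and month-over-month growth -- the same shape as the Q1 table above.
from collections import defaultdict

def monthly_revenue(orders):
    """orders: iterable of (iso_date_string, amount) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [revenue, orders]
    for day, amount in orders:
        month = day[:7]  # "YYYY-MM"
        totals[month][0] += amount
        totals[month][1] += 1

    report, prev = [], None
    for month in sorted(totals):
        revenue, count = totals[month]
        growth = None if prev is None else round((revenue - prev) / prev * 100, 1)
        report.append((month, round(revenue, 2), count, growth))
        prev = revenue
    return report

orders = [("2026-01-05", 80.0), ("2026-01-20", 120.0),
          ("2026-02-03", 150.0), ("2026-02-14", 70.0),
          ("2026-03-09", 260.0)]
print(monthly_revenue(orders))
# → [('2026-01', 200.0, 2, None), ('2026-02', 220.0, 2, 10.0), ('2026-03', 260.0, 1, 18.2)]
```

In practice Windsurf expresses this as a single SQL `GROUP BY` over the orders table; the window-style growth calculation is what the assistant adds on top.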

Workflow 2: Find Anomalies in User Signups

Find anomalies in user signups over the past 30 days. Flag any days
with unusual signup counts.

Windsurf queries the data and responds:

I analyzed daily signups for the past 30 days. The average is 42
signups/day with a standard deviation of 8.
Anomalies detected:
| Date | Signups | Z-Score | Notes |
|------------|---------|---------|---------------------|
| 2026-04-12 | 127 | +10.6 | 3x normal volume |
| 2026-04-13 | 89 | +5.9 | 2x normal, tail |
| 2026-04-03 | 18 | -3.0 | Weekend + holiday |
April 12-13 saw a massive spike, likely tied to an external event such as a Product Hunt launch. The April 3 dip aligns with a weekend and holiday.
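The z-score test behind this response is simple to reproduce. A minimal sketch, with illustrative data and a lowered threshold so the toy example flags something; the real analysis runs as SQL against the signups table:

```python
# Flag days whose signup count is an outlier relative to the
# population mean, using z-scores as in the response above.
from statistics import mean, pstdev

def flag_anomalies(daily_counts, threshold=3.0):
    """daily_counts: list of (date, signups); returns days with |z| >= threshold."""
    values = [n for _, n in daily_counts]
    mu, sigma = mean(values), pstdev(values)  # assumes sigma > 0
    return [(day, n, round((n - mu) / sigma, 1))
            for day, n in daily_counts
            if abs(n - mu) >= threshold * sigma]

daily = [("2026-04-01", 40), ("2026-04-02", 40), ("2026-04-03", 40),
         ("2026-04-04", 40), ("2026-04-05", 140)]
print(flag_anomalies(daily, threshold=1.5))
# → [('2026-04-05', 140, 2.0)]
```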

Workflow 3: Create a Churn Report

Create a churn analysis report. Define churned as no order in 90 days.
Show churn by plan tier and signup cohort.

Windsurf builds the SQL, runs it, and presents:

Churn Analysis (as of 2026-04-28):
By Plan Tier:
| Tier | Total | Churned | Churn Rate |
|------------|-------|---------|------------|
| free | 5,990 | 3,240 | 54.1% |
| pro | 4,120 | 824 | 20.0% |
| enterprise | 2,340 | 117 | 5.0% |
By Signup Cohort (monthly):
| Cohort | Signups | Retained | 90-Day Retention |
|----------|---------|----------|-----------------|
| Jan 2026 | 1,245 | 892 | 71.6% |
| Feb 2026 | 1,380 | 1,048 | 75.9% |
| Mar 2026 | 1,520 | 1,230 | 80.9% |
Retention is improving month over month. Enterprise churn is
extremely low at 5%. Focus retention efforts on free tier users.
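The "no order in 90 days" definition translates directly into a cutoff-date comparison. A sketch of the per-tier calculation, with hypothetical record fields (`tier`, `last_order`) standing in for the real customer schema:

```python
# Compute churn rate by plan tier under the 90-day definition:
# a customer is churned if their last order predates the cutoff.
from datetime import date, timedelta

def churn_by_tier(customers, as_of, window_days=90):
    """customers: dicts with 'tier' and 'last_order' (a date, or None)."""
    cutoff = as_of - timedelta(days=window_days)
    stats = {}
    for c in customers:
        entry = stats.setdefault(c["tier"], [0, 0])  # [total, churned]
        entry[0] += 1
        if c["last_order"] is None or c["last_order"] < cutoff:
            entry[1] += 1
    return {tier: (total, churned, round(churned / total * 100, 1))
            for tier, (total, churned) in stats.items()}

customers = [
    {"tier": "pro", "last_order": date(2026, 3, 1)},
    {"tier": "pro", "last_order": date(2025, 12, 1)},
    {"tier": "free", "last_order": None},
]
print(churn_by_tier(customers, as_of=date(2026, 4, 28)))
# → {'pro': (2, 1, 50.0), 'free': (1, 1, 100.0)}
```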

Workflow 4: Generate Code from Data

The real power of an IDE integration: Windsurf can write code that uses your actual data schema.

Create a Python script that generates a weekly revenue report as a
PDF using the gold.daily_revenue table.

Windsurf inspects the schema via MCP, then generates a Python script with correct column names, proper imports, and working queries — all in a new file in your editor. The code references real table and column names from your data lake because Windsurf looked them up through MCP before writing the code.
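The core of such a generated script is the weekly rollup. A hand-written sketch of that piece, assuming `gold.daily_revenue` yields `(date, revenue)` rows; the PDF rendering step (matplotlib, reportlab, or similar) is omitted and library-dependent:

```python
# Roll daily revenue rows up into ISO weeks -- the aggregation a
# weekly report script would run before rendering output.
from collections import defaultdict
from datetime import date

def weekly_revenue(daily_rows):
    """daily_rows: iterable of (date, revenue) tuples."""
    weeks = defaultdict(float)
    for day, revenue in daily_rows:
        year, week, _ = day.isocalendar()  # ISO year and week number
        weeks[(year, week)] += revenue
    return sorted(weeks.items())

rows = [(date(2026, 4, 20), 10_200.0),  # Monday, ISO week 17
        (date(2026, 4, 21), 11_050.0),
        (date(2026, 4, 27), 9_800.0)]   # Monday, ISO week 18
print(weekly_revenue(rows))
# → [((2026, 17), 21250.0), ((2026, 18), 9800.0)]
```

The point of the MCP round trip is that the generated version of this script uses the table's actual column names rather than guessed ones like these.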

Tips for Effective Data Prompts in Windsurf

Be specific about the time range:

# Good
"Show revenue for March 2026"
# Vague
"Show recent revenue"

Name the tables if you know them:

# Good
"Query gold.customer_dashboard for churned enterprise customers"
# Slower (Windsurf has to discover tables first)
"Find churned enterprise customers"

Ask for the SQL too:

"What's our MRR? Show me the SQL you used."

This lets you verify the query and reuse it later.

Chain requests for complex analysis:

1. "List all tables and their row counts"
2. "Show the schema for raw.orders and raw.customers"
3. "Now build a cohort retention analysis using those tables"

Breaking complex analysis into steps gives Windsurf context to build better queries.

Ask for code generation after analysis:

1. "What does our daily revenue look like this month?"
2. "Create a Python function that queries this data and returns a pandas DataFrame"
3. "Add a Matplotlib chart to visualize the trend"

Windsurf vs. Cursor for Data Analysis

Both IDEs support MCP, and both work with DataSpoc. Here is how they compare:

| Feature | Windsurf | Cursor |
|---------|----------|--------|
| MCP support | Yes | Yes |
| AI model | Codeium + configurable | Configurable (Claude, GPT) |
| Data analysis flow | AI panel + inline | Chat + Composer |
| Multi-file editing | Cascade flow | Composer |
| Free tier | Generous | Limited |
| MCP config location | ~/.codeium/windsurf/mcp_config.json | ~/.cursor/mcp.json |

The MCP configuration is nearly identical. If you already use DataSpoc MCP with Cursor, switching to Windsurf takes 30 seconds — just copy the server config.

Troubleshooting

MCP server not connecting:

# Test the MCP server manually
dataspoc-lens mcp --test
# Check that the command is in your PATH
which dataspoc-lens

Queries timing out:

# Cache frequently queried tables
dataspoc-lens cache refresh raw.orders
dataspoc-lens cache refresh raw.customers

Wrong AWS credentials:

Make sure the AWS_PROFILE in your MCP config matches a valid profile in ~/.aws/credentials, or set explicit credentials:

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": {
        "AWS_ACCESS_KEY_ID": "AKIA...",
        "AWS_SECRET_ACCESS_KEY": "...",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

Connect Windsurf to your data lake once, and every AI conversation in your IDE has access to real data. No context switching, no separate BI tool, no copy-pasting query results. Just ask questions and get answers grounded in your actual data.
