Data Analysis from Windsurf IDE with DataSpoc MCP
Windsurf is an AI-native IDE built by Codeium. Like Cursor, it can connect to MCP (Model Context Protocol) servers. Connect DataSpoc Lens as an MCP server and Windsurf’s AI assistant gains direct access to your data lake — it can list tables, read schemas, run SQL queries, and answer questions about your data. All without leaving your editor.
What You Get
After setup, you can type natural language in Windsurf’s AI panel:
- “What tables do we have in the data lake?”
- “Show me the schema for the orders table”
- “What was revenue last month?”
- “Find users who signed up but never purchased”
- “Create a Python script that generates a monthly revenue report”
Windsurf calls DataSpoc Lens through MCP, gets the real data, and uses it to answer your questions or generate code.
Step 1: Install DataSpoc Lens
pip install dataspoc-lensConfigure your bucket:
dataspoc-lens add-bucket s3://my-company-data --name productiondataspoc-lens tables # verify connectionStep 2: Configure MCP in Windsurf
Open Windsurf’s MCP configuration. Navigate to Settings, then MCP Servers, or edit the config file directly:
{ "mcpServers": { "dataspoc-lens": { "command": "dataspoc-lens", "args": ["mcp"], "env": { "AWS_PROFILE": "default" } } }}On macOS/Linux, the config file is typically at ~/.codeium/windsurf/mcp_config.json.
If you also want data ingestion capabilities, add the Pipe MCP server:
{ "mcpServers": { "dataspoc-lens": { "command": "dataspoc-lens", "args": ["mcp"], "env": { "AWS_PROFILE": "default" } }, "dataspoc-pipe": { "command": "dataspoc-pipe", "args": ["mcp"], "env": { "AWS_PROFILE": "default" } } }}Restart Windsurf after editing the config.
Step 3: Verify the Connection
In Windsurf’s AI panel, type:
What tables are available in my data lake?Windsurf calls the list_tables tool via MCP and responds with something like:
You have 8 tables available:
- raw.customers (12,450 rows)- raw.orders (89,230 rows)- raw.products (340 rows)- raw.events (1,245,600 rows)- curated.dim_customers (12,340 rows)- curated.fct_orders (87,650 rows)- gold.daily_revenue (480 rows)- gold.customer_dashboard (11,890 rows)If you see this, MCP is connected and working.
Workflow 1: Analyze Revenue Trends
Type in Windsurf’s AI panel:
Analyze revenue trends for Q1 2026. Show monthly breakdown and growth rate.Behind the scenes, Windsurf:
- Calls
get_schemafor relevant tables - Writes a SQL query with monthly aggregation
- Calls
run_queryto execute it - Formats the results with analysis
Response:
Q1 2026 Revenue Analysis:
| Month | Revenue | Orders | Avg Order | MoM Growth ||----------|-----------|--------|-----------|------------|| Jan 2026 | $245,600 | 3,120 | $78.72 | -- || Feb 2026 | $268,400 | 3,450 | $77.80 | +9.3% || Mar 2026 | $312,100 | 4,010 | $77.83 | +16.3% |
Q1 Total: $826,100Average monthly growth: 12.8%
Revenue is accelerating. March saw 16.3% growth, nearly doubleJanuary's baseline. Order count is growing faster than averageorder value, suggesting new customer acquisition is driving growth.All numbers come from real SQL queries on your actual data.
Workflow 2: Find Anomalies in User Signups
Find anomalies in user signups over the past 30 days. Flag any dayswith unusual signup counts.Windsurf queries the data and responds:
I analyzed daily signups for the past 30 days. The average is 42signups/day with a standard deviation of 8.
Anomalies detected:
| Date | Signups | Z-Score | Notes ||------------|---------|---------|---------------------|| 2026-04-12 | 127 | +10.6 | 3x normal volume || 2026-04-13 | 89 | +5.9 | 2x normal, tail || 2026-04-03 | 18 | -3.0 | Weekend + holiday |
April 12-13 saw a massive spike. This correlates with a Product Huntlaunch (if relevant). The April 3 dip aligns with a weekend.Workflow 3: Create a Churn Report
Create a churn analysis report. Define churned as no order in 90 days.Show churn by plan tier and signup cohort.Windsurf builds the SQL, runs it, and presents:
Churn Analysis (as of 2026-04-28):
By Plan Tier:| Tier | Total | Churned | Churn Rate ||------------|-------|---------|------------|| free | 5,990 | 3,240 | 54.1% || pro | 4,120 | 824 | 20.0% || enterprise | 2,340 | 117 | 5.0% |
By Signup Cohort (monthly):| Cohort | Signups | Retained | 90-Day Retention ||----------|---------|----------|-----------------|| Jan 2026 | 1,245 | 892 | 71.6% || Feb 2026 | 1,380 | 1,048 | 75.9% || Mar 2026 | 1,520 | 1,230 | 80.9% |
Retention is improving month over month. Enterprise churn isextremely low at 5%. Focus retention efforts on free tier users.Workflow 4: Generate Code from Data
The real power of an IDE integration: Windsurf can write code that uses your actual data schema.
Create a Python script that generates a weekly revenue report as aPDF using the gold.daily_revenue table.Windsurf inspects the schema via MCP, then generates a Python script with correct column names, proper imports, and working queries — all in a new file in your editor. The code references real table and column names from your data lake because Windsurf looked them up through MCP before writing the code.
Tips for Effective Data Prompts in Windsurf
Be specific about the time range:
# Good"Show revenue for March 2026"
# Vague"Show recent revenue"Name the tables if you know them:
# Good"Query gold.customer_dashboard for churned enterprise customers"
# Slower (Windsurf has to discover tables first)"Find churned enterprise customers"Ask for the SQL too:
"What's our MRR? Show me the SQL you used."This lets you verify the query and reuse it later.
Chain requests for complex analysis:
1. "List all tables and their row counts"2. "Show the schema for raw.orders and raw.customers"3. "Now build a cohort retention analysis using those tables"Breaking complex analysis into steps gives Windsurf context to build better queries.
Ask for code generation after analysis:
1. "What does our daily revenue look like this month?"2. "Create a Python function that queries this data and returns a pandas DataFrame"3. "Add a Matplotlib chart to visualize the trend"Windsurf vs. Cursor for Data Analysis
Both IDEs support MCP, and both work with DataSpoc. Here is how they compare:
| Feature | Windsurf | Cursor |
|---|---|---|
| MCP support | Yes | Yes |
| AI model | Codeium + configurable | Configurable (Claude, GPT) |
| Data analysis flow | AI panel + inline | Chat + Composer |
| Multi-file editing | Cascade flow | Composer |
| Free tier | Generous | Limited |
| MCP config location | ~/.codeium/windsurf/mcp_config.json | ~/.cursor/mcp.json |
The MCP configuration is nearly identical. If you already use DataSpoc MCP with Cursor, switching to Windsurf takes 30 seconds — just copy the server config.
Troubleshooting
MCP server not connecting:
# Test the MCP server manuallydataspoc-lens mcp --test
# Check that the command is in your PATHwhich dataspoc-lensQueries timing out:
# Cache frequently queried tablesdataspoc-lens cache refresh raw.ordersdataspoc-lens cache refresh raw.customersWrong AWS credentials:
Make sure the AWS_PROFILE in your MCP config matches a valid profile in ~/.aws/credentials, or set explicit credentials:
{ "mcpServers": { "dataspoc-lens": { "command": "dataspoc-lens", "args": ["mcp"], "env": { "AWS_ACCESS_KEY_ID": "AKIA...", "AWS_SECRET_ACCESS_KEY": "...", "AWS_REGION": "us-east-1" } } }}Connect Windsurf to your data lake once, and every AI conversation in your IDE has access to real data. No context switching, no separate BI tool, no copy-pasting query results. Just ask questions and get answers grounded in your actual data.