
RAG vs SQL: Why Your AI Agent Should Query, Not Embed

Michael San Martim · 2026-04-22

Every week, another team announces they’re “building RAG for their data warehouse.” They chunk their tables into vectors, store them in Pinecone, and ask an LLM to answer business questions. It works for documents. It fails catastrophically for structured data.

The Problem: RAG on Tables

Here’s what happens when you use RAG for structured data:

# The RAG approach to answering "What was revenue in Q3?"

# Step 1: Chunk your tables into text
chunks = []
for row in orders_table:
    chunks.append(f"Order {row.id}: customer {row.customer}, "
                  f"amount ${row.amount}, date {row.date}")

# Step 2: Embed chunks
embeddings = openai.embed(chunks)  # $0.0001 per chunk, 1M rows = $100

# Step 3: Store in vector DB
pinecone_index.upsert(vectors=embeddings)  # $70/month for 1M vectors

# Step 4: At query time, retrieve "relevant" chunks
query_embedding = openai.embed("What was revenue in Q3?")
results = pinecone_index.query(query_embedding, top_k=20)

# Step 5: Ask LLM to synthesize
answer = openai.chat(
    f"Based on these records: {results}\n"
    f"Answer: What was revenue in Q3?"
)
# Returns: "Based on the retrieved records, Q3 revenue appears to be
# approximately $2.3M" (actual answer: $4.7M)

Why it fails:

  1. Retrieval misses rows. Vector similarity finds “relevant” chunks, not ALL matching rows. If Q3 contains 250,000 orders, you retrieve 20 of them.
  2. Aggregation is impossible. You cannot SUM, AVG, or COUNT from 20 random samples (see the sketch after this list).
  3. Data goes stale. You need to re-embed every time data changes.
  4. It’s expensive. Embedding millions of rows + vector DB hosting + LLM tokens for context.
  5. It hallucinates with confidence. The LLM sees 20 rows and extrapolates.
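
A toy simulation makes points 1 and 2 concrete. The amounts below are synthetic (no real data; the numbers only illustrate scale): the 20 "most relevant" rows are a sliver of the population, and no amount of prompting recovers the true sum.

# Toy illustration (synthetic amounts, not real data): why top-k retrieval
# cannot support aggregation.
import random

random.seed(0)
q3_orders = [random.uniform(10, 28) for _ in range(250_000)]  # 250,000 Q3 amounts

true_total = sum(q3_orders)               # what SQL computes: roughly $4.75M
retrieved = random.sample(q3_orders, 20)  # the top_k=20 "relevant" chunks
llm_sees = sum(retrieved)                 # the only numbers in the LLM's context

print(f"true Q3 total:    ${true_total:,.0f}")
print(f"sum of 20 chunks: ${llm_sees:,.0f}")  # wrong by four orders of magnitude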

The Correct Approach: SQL

The same question answered with SQL:

SELECT SUM(amount) AS revenue
FROM orders
WHERE created_at >= '2024-07-01'
  AND created_at < '2024-10-01'

Result: $4,712,843.00 — exact, instant, free.
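
If your orders already sit in Parquet, you can reproduce this with stock DuckDB. A minimal sketch; the file path is a placeholder, not a real dataset:

# Minimal sketch: the same aggregation with plain DuckDB over Parquet.
# 'data/orders/*.parquet' is a placeholder path; point it at your own files.
import duckdb

revenue = duckdb.sql("""
    SELECT SUM(amount) AS revenue
    FROM read_parquet('data/orders/*.parquet')
    WHERE created_at >= '2024-07-01'
      AND created_at < '2024-10-01'
""").fetchone()[0]

print(f"Q3 revenue: ${revenue:,.2f}")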

MCP + SQL with DataSpoc Lens

DataSpoc Lens exposes your data lake as an MCP server. AI agents execute real SQL queries instead of guessing from embeddings.

Setup

pip install dataspoc-lens[mcp]
dataspoc-lens add-bucket s3://company-lake
dataspoc-lens mcp
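
Once the server is running, any MCP-capable client can attach to it. A sketch using the official `mcp` Python SDK, assuming the server speaks MCP over stdio; the tools it prints are whatever DataSpoc Lens actually advertises:

# Sketch: connect an MCP client to the `dataspoc-lens mcp` server over stdio
# and list the tools it exposes. Assumes stdio transport.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="dataspoc-lens", args=["mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())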

How the Agent Works

from dataspoc_lens import LensClient

client = LensClient()

# Agent discovers what data exists
tables = client.list_tables()
# ['raw.postgres.orders', 'raw.postgres.customers', 'raw.postgres.products']

# Agent inspects schema
schema = client.get_schema("raw.postgres.orders")
# [{"name": "id", "type": "INTEGER"},
#  {"name": "customer_id", "type": "INTEGER"},
#  {"name": "amount", "type": "DECIMAL"},
#  {"name": "created_at", "type": "TIMESTAMP"}, ...]

# Agent writes and executes SQL
result = client.query("""
    SELECT SUM(amount) AS revenue
    FROM raw.postgres.orders
    WHERE created_at >= '2024-07-01'
      AND created_at < '2024-10-01'
""")
# [{"revenue": 4712843.00}]

No embeddings. No vector DB. No hallucination. Real data, real answers.

Side-by-Side Code Comparison

RAG Pipeline (what teams are building)

# ingest.py — runs daily, takes 45 minutes for 1M rows
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

df = pd.read_parquet("s3://company-lake/raw/postgres/orders/")

# Convert rows to text chunks
documents = []
for _, row in df.iterrows():
    text = f"Order {row['id']}: {row['customer_name']} purchased "
    text += f"${row['amount']:.2f} on {row['created_at']}"
    documents.append(text)

# Split and embed
splitter = RecursiveCharacterTextSplitter(chunk_size=500)
chunks = splitter.create_documents(documents)
embeddings = OpenAIEmbeddings()
vectorstore = PineconeVectorStore.from_documents(chunks, embeddings, index_name="orders")

# query.py — at query time
retriever = vectorstore.as_retriever(search_kwargs={"k": 50})
docs = retriever.get_relevant_documents("What was Q3 revenue?")
# Gets 50 random-ish order records out of 250,000 Q3 orders

SQL Pipeline (what you should build)

# query.py — that's it, no ingest step needed
from dataspoc_lens import LensClient

client = LensClient()
result = client.query("""
    SELECT SUM(amount) FROM raw.postgres.orders
    WHERE created_at >= '2024-07-01' AND created_at < '2024-10-01'
""")
# Scans all 250,000 rows in 0.3 seconds, returns exact answer

The Comparison

| Dimension      | RAG on Tables                              | MCP + SQL                        |
|----------------|--------------------------------------------|----------------------------------|
| Accuracy       | ~60% (misses rows, can’t aggregate)        | 99%+ (exact computation)         |
| Speed          | 3-8 seconds (embed query + retrieve + LLM) | 0.1-2 seconds (SQL execution)    |
| Freshness      | Stale until re-embedded (hours/days)       | Real-time (queries live data)    |
| Cost per query | $0.02-0.10 (embedding + LLM tokens)        | $0.001 (DuckDB compute only)     |
| Monthly infra  | $200+ (vector DB + embedding API)          | $0 (DuckDB is free)              |
| Auditability   | “Based on retrieved context…”              | Exact SQL query, reproducible    |
| Aggregations   | Impossible (SUM of 20 rows != SUM of 250K) | Native (that’s what SQL does)    |
| Joins          | Cannot join across embeddings              | Native multi-table joins         |
| Updates        | Re-embed entire dataset                    | Automatic (reads latest Parquet) |

When RAG IS the Right Choice

RAG is excellent for unstructured documents:

  • PDF reports, legal contracts, support tickets
  • Documentation search
  • Knowledge bases with prose content
  • Any data where keyword/semantic search is the access pattern

The rule is simple:

Tables and numbers = SQL. Documents and prose = RAG.

A Hybrid Architecture

For most companies, the best architecture combines both:

┌────────────────────────────────────────────────────┐
│                      AI Agent                      │
├────────────────────────────────────────────────────┤
│                                                    │
│  Structured data?  ──►  DataSpoc Lens (MCP + SQL)  │
│                                                    │
│  Unstructured docs?  ──►  RAG (vector search)      │
│                                                    │
└────────────────────────────────────────────────────┘
A minimal keyword router makes the split concrete:

from dataspoc_lens import LensClient
from langchain_pinecone import PineconeVectorStore

lens = LensClient()
vectorstore = PineconeVectorStore(index_name="docs")

def route_question(question: str) -> str:
    """Route to SQL for data questions, RAG for document questions."""
    data_keywords = ["revenue", "count", "average", "total", "growth",
                     "last month", "quarterly", "how many", "top 10"]
    if any(kw in question.lower() for kw in data_keywords):
        # SQL path — exact answers from structured data
        result = lens.ask(question)
        return result["answer"]
    else:
        # RAG path — semantic search over documents
        docs = vectorstore.similarity_search(question, k=5)
        return synthesize_from_docs(docs, question)  # doc-answer helper, defined elsewhere
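
Two illustrative calls (the questions are made up, and `synthesize_from_docs` must exist for the RAG path to run):

route_question("What was total revenue last quarter?")   # contains "revenue": SQL path
route_question("What does the onboarding guide cover?")  # no data keyword: RAG path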

The Cost of Getting This Wrong

A real example from a Series B startup:

| Metric         | Their RAG setup                         | After switching to SQL |
|----------------|-----------------------------------------|------------------------|
| Monthly cost   | $2,400 (Pinecone + OpenAI embeddings)   | $0 (DuckDB on Parquet) |
| Query accuracy | 58% (validated against actual numbers)  | 99.7%                  |
| Data freshness | 6-hour lag (daily re-embedding)         | Real-time              |
| Maintenance    | 2 engineers part-time                   | Zero (self-service)    |
| Time to answer | 4.2 seconds average                     | 0.8 seconds average    |

They were spending $2,400/month to get wrong answers slowly.

Getting Started

# If your data is already in Parquet/S3:
pip install dataspoc-lens[mcp]
dataspoc-lens add-bucket s3://your-bucket
dataspoc-lens mcp # Start MCP server for AI agents
# If your data is in Postgres/MySQL:
pip install dataspoc-pipe
dataspoc-pipe init && dataspoc-pipe add postgres --tables orders
dataspoc-pipe run # Extracts to Parquet
# Then query with SQL
dataspoc-lens query "SELECT SUM(amount) FROM raw.postgres.orders WHERE created_at > '2024-01-01'"

Stop embedding your tables. Start querying them.
