Tags: langchain · sql · python · data-engineering · comparison

LangChain SQLDatabaseChain vs DataSpoc Lens: Which Is Better for Data Queries?

Michael San Martim · 2026-04-16

LangChain’s SQLDatabaseChain was one of the first tools that let LLMs query databases. DataSpoc Lens takes a different approach: a purpose-built query engine over cloud Parquet. Both let you ask natural language questions about data. They solve the problem very differently.

This post puts them side by side with real code, then gives you a verdict.

The Setup: LangChain SQLDatabaseChain

LangChain needs a database connection, an LLM, and a chain that ties them together:

pip install langchain langchain-openai langchain-community sqlalchemy psycopg2-binary
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langchain.chains import create_sql_query_chain
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool
# 1. Connect to your database
db = SQLDatabase.from_uri(
    "postgresql://user:password@host:5432/analytics",
    include_tables=["orders", "customers", "products"],  # limit scope
    sample_rows_in_table_info=3,  # include sample rows in schema prompt
)
# 2. Create LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# 3. Create the chain
query_chain = create_sql_query_chain(llm, db)
execute_tool = QuerySQLDataBaseTool(db=db)
# 4. Ask a question
sql = query_chain.invoke({"question": "What were total sales by region last quarter?"})
print(f"Generated SQL: {sql}")
result = execute_tool.invoke(sql)
print(f"Result: {result}")

This works, but notice what you need: a running PostgreSQL instance, credentials in your code, network access to the database, and the right SQLAlchemy driver installed.

The Setup: DataSpoc Lens

Lens connects directly to Parquet files in your cloud bucket:

pip install dataspoc-lens
from dataspoc_lens import LensClient
lens = LensClient() # reads config from ~/.dataspoc/config.yaml or env vars
# List what's available
print(lens.tables())
# ['raw_orders', 'raw_customers', 'curated_sales', 'gold_revenue']
# Ask a question (natural language → SQL → execute → result)
answer = lens.ask("What were total sales by region last quarter?")
print(answer)
# Or write SQL directly
df = lens.query("""
    SELECT region, SUM(amount) AS total_sales
    FROM curated_sales
    WHERE sale_date >= '2026-01-01'
    GROUP BY region
    ORDER BY total_sales DESC
""")
print(df)

No database server. No credentials in code. No SQLAlchemy. The data lives as Parquet files in S3/GCS/Azure, and Lens reads them with DuckDB.
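
To make the "reads them with DuckDB" point concrete, here is roughly what an in-process engine does when pointed at a bucket. This is plain DuckDB, not Lens internals, and the bucket path is made up:

import duckdb

# Scan Parquet straight from object storage — no server, the engine runs in-process
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables s3:// (and gcs://) paths
df = con.execute("""
    SELECT region, SUM(amount) AS total_sales
    FROM read_parquet('s3://example-lake/curated_sales/*.parquet')  -- path is illustrative
    GROUP BY region
""").df()  # assumes cloud credentials are already available in the environment
print(df)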

Side-by-Side: The Same 5 Tasks

Task 1: List Available Tables

# LangChain
print(db.get_usable_table_names())
# Requires active DB connection
# DataSpoc Lens
print(lens.tables())
# Reads from bucket manifest — no database needed

Task 2: Explore a Schema

# LangChain
print(db.get_table_info(table_names=["orders"]))
# Returns CREATE TABLE + sample rows (sent to LLM as context)
# DataSpoc Lens
print(lens.schema("raw_orders"))
# Returns column names and types as a dict

Task 3: Natural Language Query

# LangChain
chain = create_sql_query_chain(llm, db)
sql = chain.invoke({"question": "Top 5 customers by lifetime value"})
result = execute_tool.invoke(sql)
# DataSpoc Lens
answer = lens.ask("Top 5 customers by lifetime value")

Task 4: Direct SQL

# LangChain
result = db.run("SELECT customer_id, SUM(amount) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5")
# DataSpoc Lens
df = lens.query("SELECT customer_id, SUM(amount) FROM raw_orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5")
# Returns a pandas DataFrame, not a string
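
Because lens.query already hands back a DataFrame, downstream steps stay in pandas. The rename below assumes the two columns the query above produces; adjust names to your query:

# Work with the result like any other DataFrame (column names are illustrative)
df.columns = ["customer_id", "lifetime_value"]
print(df.head())
df.to_csv("top_customers.csv", index=False)  # hand off to notebooks, BI tools, etc.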

Task 5: Use with an AI Agent

# LangChain — define as a tool
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain.tools import Tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# The agent prompt needs an agent_scratchpad slot for intermediate tool calls
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a data analyst with read-only SQL tools."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])
tools = [
    Tool(name="query_db", func=execute_tool.invoke, description="Run SQL"),
    Tool(name="list_tables", func=lambda _: db.get_usable_table_names(), description="List tables"),
]
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "Analyze sales trends"})
# DataSpoc Lens — native MCP server
# No code needed. Just configure the MCP server:
# dataspoc-lens mcp
# Any MCP-compatible agent (Claude, Cursor, etc.) connects directly.
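
For reference, wiring that server into an MCP client is usually a single config entry. The exact file location depends on the client, and the "dataspoc-lens" key below is just a label, not a required name:

{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}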

Comparison Matrix

Criteria | LangChain SQLDatabaseChain | DataSpoc Lens
Data source | Any SQL database (Postgres, MySQL, etc.) | Parquet in S3/GCS/Azure
Infrastructure | Running database + network access | Cloud bucket only
Credentials | Connection string with user/password | Cloud IAM (no secrets)
Query engine | Database’s own engine | DuckDB (in-process)
Natural language | LLM generates SQL via chain | lens.ask() built-in
Agent integration | LangChain agents/tools | MCP server (any client)
Return type | String | pandas DataFrame
Caching | None built-in | lens.cache_status(), lens.cache_refresh()
Setup time | 10-30 minutes | 2 minutes
Cost | Database compute + LLM tokens | LLM tokens only
Write safety | Can execute INSERT/UPDATE/DELETE | Read-only by design
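
The caching row deserves a quick illustration. Both calls are named in the matrix, but their return shapes are not documented here, so treat the output handling as a sketch:

# Inspect the local cache, then force a refresh before a heavy run (behavior assumed)
print(lens.cache_status())
lens.cache_refresh()
df = lens.query("SELECT COUNT(*) AS n FROM curated_sales")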

Where LangChain Wins

LangChain’s SQL chain is the right choice when:

  1. Your data lives in a traditional database. If it is in Postgres, MySQL, or Snowflake and you do not want to move it, LangChain connects directly.

  2. You need to work across multiple databases. LangChain can connect to several databases through separate chains, and an agent can route each question to the right one.

  3. You are already in the LangChain ecosystem. If your app uses LangChain for other things (document retrieval, chat memory, tool orchestration), adding SQLDatabaseChain is natural.

# LangChain excels at ad-hoc DB access
from langchain_community.utilities import SQLDatabase
# Connect to your production read replica
db = SQLDatabase.from_uri("postgresql://readonly:pass@replica:5432/prod")

Where DataSpoc Lens Wins

Lens is the right choice when:

  1. Your data is in a lake (Parquet in S3/GCS/Azure). Lens is purpose-built for this. No database to manage.

  2. You want zero infrastructure. No database server, no connection pooling, no read replicas. DuckDB runs in-process.

  3. You need agent-native access. The MCP server means any MCP client (Claude Code, Cursor, custom agents) connects without writing wrapper code.

  4. Security matters. Cloud IAM controls access. No database credentials to manage or leak. Read-only by design.

# Lens excels at data lake access
from dataspoc_lens import LensClient
lens = LensClient()
# Data comes from Parquet in your bucket — no DB needed
df = lens.query("SELECT * FROM curated_sales WHERE region = 'EMEA' LIMIT 100")

Hybrid: Use Both

If you have data in both a database and a lake, you can use both in the same agent:

import json
from openai import OpenAI
from langchain_community.utilities import SQLDatabase
from dataspoc_lens import LensClient
openai_client = OpenAI()
db = SQLDatabase.from_uri("postgresql://readonly:pass@host:5432/app")
lens = LensClient()
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "query_app_db",
            "description": "Query the application database (users, sessions, settings).",
            "parameters": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "query_data_lake",
            "description": "Query the data lake (sales, events, analytics).",
            "parameters": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        },
    },
]

def dispatch(name, args):
    if name == "query_app_db":
        return db.run(args["sql"])
    elif name == "query_data_lake":
        df = lens.query(args["sql"])
        return df.to_json(orient="records")
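
The snippet above only defines the tools. A minimal dispatch loop looks roughly like this; the model name and the example question are placeholders:

# Let the model choose a tool, run it via dispatch(), and feed the result back
messages = [{"role": "user", "content": "Compare Q1 signups in the app DB to Q1 revenue in the lake."}]
response = openai_client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
msg = response.choices[0].message
while msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        result = dispatch(call.function.name, json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    response = openai_client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = response.choices[0].message
print(msg.content)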

The Verdict

Use LangChain SQLDatabaseChain when your data is in a traditional database and you are building a LangChain application.

Use DataSpoc Lens when your data is in a cloud data lake (Parquet files), you want zero infrastructure, or you need agent-native MCP access.

Use both when your data spans databases and lakes. Let the LLM decide which tool to call based on the question.

The real insight: the tool matters less than the data architecture. If your data is already in Parquet in a cloud bucket (and it should be for analytics), Lens gives you the simplest path from question to answer.
