anthropicclaudepythonsdkdata-analysis

Construindo um Agente de Consulta de Dados com Anthropic Claude SDK e DataSpoc

Michael San Martim · 2026-04-25

O recurso tool_use do Claude transforma o modelo em um agente que pode chamar suas funções, inspecionar resultados e decidir o que fazer em seguida. Combine isso com o DataSpoc Lens e você obtém uma IA que consulta seu data lake com SQL real — sem adivinhação, sem números alucinados.

Este post constrói um agente de consulta de dados completo usando o SDK Python da Anthropic. Você vai definir ferramentas, implementar o loop de mensagens e lidar com conversas multi-etapa onde o Claude descobre schemas, escreve queries e responde perguntas de acompanhamento.

Pré-requisitos

Terminal window
pip install anthropic dataspoc-lens
export ANTHROPIC_API_KEY="sk-ant-..."

Configure seu data lake:

Terminal window
dataspoc-lens add-bucket s3://my-company-data --name production

Passo 1: Definir Ferramentas como JSON Schema

O tool_use do Claude requer ferramentas definidas como JSON schemas. Cada ferramenta mapeia para um método do LensClient:

TOOLS = [
{
"name": "list_tables",
"description": (
"List all available tables in the data lake. "
"Call this first to discover what data exists."
),
"input_schema": {
"type": "object",
"properties": {},
"required": [],
},
},
{
"name": "get_schema",
"description": (
"Get column names and types for a table. "
"Always call this before writing SQL to know exact column names."
),
"input_schema": {
"type": "object",
"properties": {
"table_name": {
"type": "string",
"description": "Full table name, e.g. 'raw.orders'",
}
},
"required": ["table_name"],
},
},
{
"name": "run_query",
"description": (
"Execute a SQL query against the data lake using DuckDB syntax. "
"Returns up to 50 rows. Always include LIMIT unless aggregating."
),
"input_schema": {
"type": "object",
"properties": {
"sql": {
"type": "string",
"description": "DuckDB-compatible SQL query",
}
},
"required": ["sql"],
},
},
{
"name": "ask_ai",
"description": (
"Ask a natural language question about the data. "
"Uses the built-in AI query engine. Good for quick exploration."
),
"input_schema": {
"type": "object",
"properties": {
"question": {
"type": "string",
"description": "Natural language question about the data",
}
},
"required": ["question"],
},
},
]

Passo 2: Implementar a Execução das Ferramentas

Cada nome de ferramenta mapeia para uma chamada do LensClient:

from dataspoc_lens import LensClient
lens = LensClient()
def execute_tool(name: str, input: dict) -> str:
"""Route tool calls to LensClient methods."""
if name == "list_tables":
tables = lens.tables()
return "\n".join(tables)
elif name == "get_schema":
schema = lens.schema(input["table_name"])
lines = [f" {col['name']} ({col['type']})" for col in schema]
return f"Table: {input['table_name']}\n" + "\n".join(lines)
elif name == "run_query":
result = lens.query(input["sql"])
return result.to_string(max_rows=50)
elif name == "ask_ai":
result = lens.ask(input["question"])
return result
else:
return f"Unknown tool: {name}"

Passo 3: O Loop do Agente

O padrão central para o tool_use do Claude é um loop: enviar mensagens, verificar chamadas de ferramenta, executá-las, enviar resultados de volta, repetir até que o Claude responda com texto.

import anthropic
client = anthropic.Anthropic()
SYSTEM_PROMPT = (
"You are a data analyst with access to a company data lake stored as "
"Parquet files in cloud storage. You query it using DuckDB SQL via tools.\n\n"
"Workflow:\n"
"1. List tables to see what's available\n"
"2. Check schemas before writing SQL\n"
"3. Write precise SQL queries\n"
"4. Present results with business context\n\n"
"Use DuckDB syntax. For dates, use DATE '2026-01-01' format."
)
def chat(messages: list[dict]) -> tuple[str, list[dict]]:
"""Send messages to Claude and handle tool calls in a loop."""
while True:
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=SYSTEM_PROMPT,
tools=TOOLS,
messages=messages,
)
# Collect the assistant's response
messages.append({"role": "assistant", "content": response.content})
# If the model is done (no more tool calls), return the text
if response.stop_reason == "end_turn":
text = "".join(
block.text for block in response.content
if block.type == "text"
)
return text, messages
# Process tool calls
tool_results = []
for block in response.content:
if block.type == "tool_use":
print(f" [tool] {block.name}({block.input})")
result = execute_tool(block.name, block.input)
tool_results.append(
{
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
}
)
# Send tool results back to Claude
messages.append({"role": "user", "content": tool_results})

Este loop lida com o ciclo de vida completo: o Claude pede para usar uma ferramenta, nós a executamos, enviamos o resultado, e o Claude pede outra ferramenta ou responde com sua resposta final.

Passo 4: Sessão Interativa

def main():
messages = []
print("Data Lake Agent (Claude) -- type 'quit' to exit")
print("-" * 50)
while True:
question = input("\nYou: ")
if question.lower() in ("quit", "exit", "q"):
break
messages.append({"role": "user", "content": question})
answer, messages = chat(messages)
print(f"\nAgent: {answer}")
if __name__ == "__main__":
main()

Exemplo de Sessão

You: What data do we have?
[tool] list_tables({})
Agent: Here are the tables in your data lake:
- **raw.customers** -- Customer records
- **raw.orders** -- Order transactions
- **raw.products** -- Product catalog
- **curated.daily_revenue** -- Aggregated daily revenue
- **curated.customer_segments** -- Customer segmentation
You: Who are our top 5 customers by total spend?
[tool] get_schema({"table_name": "raw.orders"})
[tool] get_schema({"table_name": "raw.customers"})
[tool] run_query({"sql": "SELECT c.name, SUM(o.amount) as total_spend FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.customer_id GROUP BY c.name ORDER BY total_spend DESC LIMIT 5"})
Agent: Here are your top 5 customers by total spend:
| Customer | Total Spend |
|-------------------|-------------|
| Acme Corporation | $1,245,800 |
| TechFlow Inc | $892,340 |
| Global Systems | $756,120 |
| DataVentures | $623,450 |
| CloudFirst Ltd | $589,200 |
Acme Corporation leads significantly, spending 40% more than the second
highest customer.
You: Show me Acme's orders over time, monthly
[tool] run_query({"sql": "SELECT DATE_TRUNC('month', o.order_date) as month, COUNT(*) as order_count, SUM(o.amount) as monthly_spend FROM raw.orders o JOIN raw.customers c ON o.customer_id = c.customer_id WHERE c.name = 'Acme Corporation' GROUP BY 1 ORDER BY 1"})
Agent: Acme Corporation's monthly orders show a clear growth trend...

Comparação: Anthropic SDK vs. OpenAI Function Calling

O padrão é quase idêntico. Aqui estão as diferenças principais:

# === Anthropic (Claude) ===
response = client.messages.create(
model="claude-sonnet-4-20250514",
tools=TOOLS, # tools at top level
messages=messages,
)
# Tool calls are in response.content as tool_use blocks
# Results go back as tool_result in a user message
# === OpenAI ===
response = openai.chat.completions.create(
model="gpt-4o",
tools=[{"type": "function", "function": t} for t in TOOLS],
messages=messages,
)
# Tool calls are in response.choices[0].message.tool_calls
# Results go back as role="tool" messages
AspectoAnthropic SDKOpenAI SDK
Definição de ferramentainput_schema (JSON Schema)parameters (JSON Schema)
Localização da chamadacontent[] blocks com type: "tool_use"tool_calls[] na mensagem
Formato do resultadotool_result em mensagem do usuárioMensagem separada role: "tool"
Ferramentas paralelasMúltiplos blocos tool_use em uma respostaMúltiplas entradas em tool_calls
Razão de paradastop_reason: "tool_use"finish_reason: "tool_calls"

Script Completo Funcional

"""claude_data_agent.py - Data lake agent with Anthropic Claude SDK."""
import anthropic
from dataspoc_lens import LensClient
lens = LensClient()
client = anthropic.Anthropic()
TOOLS = [
{
"name": "list_tables",
"description": "List all tables in the data lake.",
"input_schema": {
"type": "object",
"properties": {},
"required": [],
},
},
{
"name": "get_schema",
"description": "Get column names and types for a table.",
"input_schema": {
"type": "object",
"properties": {
"table_name": {"type": "string", "description": "Table name"}
},
"required": ["table_name"],
},
},
{
"name": "run_query",
"description": "Execute a DuckDB SQL query. Include LIMIT unless aggregating.",
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SQL query"}
},
"required": ["sql"],
},
},
]
SYSTEM = (
"You are a data analyst. List tables first, check schemas, then query. "
"Use DuckDB SQL. Present results with business context."
)
def execute_tool(name: str, inp: dict) -> str:
if name == "list_tables":
return "\n".join(lens.tables())
elif name == "get_schema":
schema = lens.schema(inp["table_name"])
return "\n".join(f"{c['name']} ({c['type']})" for c in schema)
elif name == "run_query":
return lens.query(inp["sql"]).to_string(max_rows=50)
return f"Unknown tool: {name}"
def chat(messages: list[dict]) -> tuple[str, list[dict]]:
while True:
resp = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system=SYSTEM,
tools=TOOLS,
messages=messages,
)
messages.append({"role": "assistant", "content": resp.content})
if resp.stop_reason == "end_turn":
return "".join(b.text for b in resp.content if b.type == "text"), messages
results = []
for block in resp.content:
if block.type == "tool_use":
print(f" [tool] {block.name}({block.input})")
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": execute_tool(block.name, block.input),
})
messages.append({"role": "user", "content": results})
def main():
messages = []
print("Data Lake Agent (Claude) -- type 'quit' to exit\n")
while True:
q = input("You: ")
if q.lower() in ("quit", "exit", "q"):
break
messages.append({"role": "user", "content": q})
answer, messages = chat(messages)
print(f"\nAgent: {answer}\n")
if __name__ == "__main__":
main()

Execute:

Terminal window
python claude_data_agent.py

O Claude descobre suas tabelas, entende schemas, escreve SQL e explica resultados — tudo fundamentado em dados reais dos seus arquivos Parquet na nuvem. Sem números alucinados, sem recuperação aproximada, apenas consultas precisas no seu data lake real.

Recomendados