The AGENT.md Pattern: Teaching AI Agents to Use Your Tools
You give an AI agent access to your data platform. It has the CLI installed. It has the Python SDK available. What does it do?
It guesses. It hallucinates flags. It invents API methods that do not exist. It spends 4 tool calls figuring out the basic workflow, burning tokens and your patience.
There is a better way. A single Markdown file that teaches any AI agent exactly how to use your tools. We call it AGENT.md.
## The Problem
Here is what happens when an AI agent tries to use a tool without documentation:
```text
Agent: I'll use the dataspoc CLI to ingest data.
Agent: Running: dataspoc ingest --from postgres --to s3://bucket
Error: No such command 'ingest'

Agent: Let me try: dataspoc pipe create --source postgres
Error: No such command 'pipe'

Agent: Maybe: dataspoc-pipe --help
[reads 50 lines of help text, burns 2000 tokens]

Agent: OK, I think it's: dataspoc-pipe run my-pipeline
Error: Pipeline 'my-pipeline' not found. Run 'dataspoc-pipe add' first.
```

Four attempts, four failures. The agent eventually gets there, but it took 8 tool calls and 10,000 tokens of fumbling.
## The Solution: AGENT.md
AGENT.md is a Markdown file designed specifically for AI agents to read. Not for humans (they have the README), not for code (that is what docstrings are for). For agents.
Here is DataSpoc’s AGENT.md:
# AGENT.md — DataSpoc Platform
## Capabilities
| Action | Tool | Command |
|--------|------|---------|
| Ingest data | Pipe | `dataspoc-pipe run <pipeline>` |
| Add new source | Pipe | `dataspoc-pipe add <name> --source <type>` |
| Query data | Lens | `dataspoc-lens shell` then SQL |
| Ask questions | Lens | `dataspoc-lens ask "<question>"` |
| Train model | Lens | `dataspoc-lens ml train --table <t> --target <col>` |
| Predict | Lens | `dataspoc-lens ml predict --model <name> --input <table>` |
## Python SDK
```python
from dataspoc_pipe import PipeClient
from dataspoc_lens import LensClient

pipe = PipeClient(project_path="./my-project")
lens = LensClient(bucket_alias="company")

# Ingest
pipe.run("my-pipeline")

# Query
result = lens.sql("SELECT * FROM raw__postgres__orders LIMIT 10")

# AI Ask
answer = lens.ask("What is the total revenue this month?")

# Train
lens.ml_train(table="curated__sales__customers", target="churned", model_name="churn-v1")
```

## Patterns
### Ingest then Query
pipe.run("pipeline-name")lens.sql("SELECT COUNT(*) FROM raw__source__table")
### Explore then Model
1. `lens.tables()` — see what is available
2. `lens.sql("SELECT * FROM table LIMIT 5")` — understand the schema
3. `lens.ml_train(table=..., target=..., model_name=...)` — train
### Ask then Verify
lens.ask("business question")— get SQL + resultlens.sql(answer.sql)— re-run to verify
## Constraints
- Never modify bucket structure directly. Always use Pipe or Lens.
- Table names follow `layer__source__tablename` (double underscore).
- Pipe writes `raw/` and `curated/`. Lens writes `gold/`.
- ML models go to `ml/models/`. Predictions go to `ml/predictions/`.
- All credentials come from environment variables. Never hardcode.
## Error Recovery
| Error | Cause | Fix |
|---|---|---|
| "Pipeline not found" | Wrong name | Run `pipe.list_pipelines()` |
| "Table not found" | Data not ingested | Run the pipeline first |
| "No manifest" | Bucket not initialized | Run `dataspoc-pipe init` |
| "Access denied" | Missing AWS creds | Check `AWS_*` env vars |
## Why This Works
AI agents (Claude, GPT-4, Gemini, local models) all share one behavior: they read context and follow instructions. When they encounter AGENT.md, they:
1. **Scan the capabilities table** — know exactly what actions are possible
2. **Copy the Python SDK examples** — use correct syntax on the first try
3. **Follow the patterns** — execute multi-step workflows correctly
4. **Respect the constraints** — avoid common mistakes
5. **Use error recovery** — self-correct without trial and error
The result: one tool call instead of eight. Correct on the first attempt.
## Comparison: Three Approaches
### No Documentation

Agent behavior:
- Guesses command names (wrong 80% of the time)
- Invents flags that don’t exist
- Takes 5-10 attempts to complete a task
- Burns 10,000+ tokens on exploration
- May give up entirely
### README.md (Designed for Humans)

Agent behavior:
- Finds installation instructions (not useful, already installed)
- Reads feature list (marketing language, not actionable)
- Finds some code examples (buried in prose)
- Takes 2-4 attempts to complete a task
- Burns 5,000+ tokens parsing human-friendly text
### AGENT.md (Designed for Machines)

Agent behavior:
- Finds exact command in capabilities table (1 lookup)
- Copies Python SDK example (correct syntax)
- Follows pattern for multi-step task (no guessing)
- Completes task on first attempt
- Burns <1,000 tokens on the documentation
## How to Write AGENT.md for Your Own Tools
### Step 1: Capabilities Table
List every action your tool can do. One row per action. Include the exact command or function call.
```markdown
## Capabilities

| Action | Command/Function |
|--------|-----------------|
| Create user | `api.create_user(email, name)` |
| List users | `api.list_users(filters={})` |
| Delete user | `api.delete_user(user_id)` |
```

### Step 2: Code Examples
Show the exact import and initialization. Show the most common operations with real, runnable code:
## SDK

```python
import os

from myservice import Client

client = Client(api_key=os.environ["MY_API_KEY"])

# Create
user = client.create_user(email="test@example.com", name="Test")

# Read
users = client.list_users(filters={"active": True})

# Update
client.update_user(user.id, name="New Name")
```

### Step 3: Patterns (Multi-step Workflows)
Agents need to know the order of operations. Show common workflows as numbered steps:
```markdown
## Patterns

### Onboard a New Customer
1. `client.create_org(name="Acme")`
2. `client.create_user(email="admin@acme.com", org_id=org.id, role="admin")`
3. `client.send_invite(user.id)`

### Generate Monthly Report
1. `data = client.get_metrics(period="last_month")`
2. `report = client.generate_report(data, format="pdf")`
3. `client.email_report(report, recipients=["cfo@company.com"])`
```

### Step 4: Constraints
What should the agent NEVER do? What are the hard rules?
```markdown
## Constraints

- Never delete production data without confirmation
- Always use pagination for list endpoints (max 100 per page)
- Rate limit: 60 requests per minute
- IDs are UUIDs, never integers
- Dates are always ISO 8601 with timezone
```

### Step 5: Error Recovery Table
Agents encounter errors. Tell them what each error means and how to fix it:
```markdown
## Error Recovery

| Error | Fix |
|-------|-----|
| 401 Unauthorized | Re-check API_KEY env var |
| 404 Not Found | Verify resource ID with list endpoint |
| 429 Rate Limited | Wait 60 seconds, retry |
| 500 Server Error | Retry up to 3 times with backoff |
```

## Full Template
Here is a copy-paste template for your own tools:
# AGENT.md — [Your Tool Name]
## Capabilities
| Action | Command/Function | Returns |
|--------|-----------------|---------|
| ... | ... | ... |
## Setup
```python
from your_tool import Client

client = Client()  # or however initialization works
```

## Common Operations

```python
# Operation 1
result = client.do_thing(param1, param2)

# Operation 2
items = client.list_things(filter="active")
```

## Patterns
### Pattern Name
- Step one
- Step two
- Step three
## Constraints
- Rule 1
- Rule 2
- Rule 3
## Error Recovery
| Error | Cause | Fix |
|---|---|---|
| … | … | … |
## Where to Put AGENT.md
Place it where AI agents will find it:
- **Repository root** — agents exploring your codebase will find it immediately
- **Package directory** — agents using your installed package can locate it
- **Documentation site** — agents with web access can fetch it
- **System prompt** — if you control the agent, inject it directly (see the sketch below)
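For the system-prompt option, here is a minimal sketch; the file path, the prompt wording, and the surrounding agent framework are assumptions, not part of DataSpoc:

```python
from pathlib import Path

# Hypothetical example: read AGENT.md from the repository root (path is an assumption)
agent_instructions = Path("AGENT.md").read_text(encoding="utf-8")

# Prepend the instructions to whatever system prompt your agent framework accepts
system_prompt = (
    "You are a data assistant with access to the DataSpoc CLI and Python SDK.\n\n"
    + agent_instructions
)
```

Pass `system_prompt` to your framework's system or instructions parameter so the agent sees the documentation before its first tool call.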
DataSpoc includes AGENT.md in the installed package:
```python
import importlib.resources

agent_instructions = importlib.resources.read_text("dataspoc_lens", "AGENT.md")
```

MCP servers can expose it as a resource:
```python
@server.resource("dataspoc://agent-instructions")
def get_agent_instructions():
    return read_file("AGENT.md")
```

## Real-World Impact
We tested the same task (“ingest data from Postgres and train a churn model”) with and without AGENT.md across three AI agents:
| Agent | Without AGENT.md | With AGENT.md |
|---|---|---|
| Claude | 7 tool calls, 2 errors | 3 tool calls, 0 errors |
| GPT-4 | 9 tool calls, 4 errors | 3 tool calls, 0 errors |
| Gemini | 11 tool calls, 5 errors | 4 tool calls, 1 error |
The pattern works because it matches how agents process information: structured, scannable, copy-paste ready. No prose to parse, no ambiguity to resolve, no human context to infer.
Write an AGENT.md for your tools. Your AI agents will thank you by actually working on the first try.