Data Governance for AI Agents: How DataSpoc Keeps Your Lake Secure
The moment you give an AI agent access to your data, someone in security will ask: “What stops it from deleting everything?” Fair question. Most AI-to-data integrations have no good answer. DataSpoc does.
This post covers the security model that lets AI agents query your data lake without introducing new risks.
The Fear
Teams hesitate to connect AI agents to data for valid reasons:
- Write access: What if the agent runs `DROP TABLE` or `DELETE FROM`?
- Credential sprawl: Another set of database passwords to manage and rotate.
- Data exfiltration: Can the agent send data to unauthorized destinations?
- No audit trail: How do you know what data the agent accessed?
- Scope creep: The agent can see everything, including data it should not.
These fears are justified when you give agents direct database access. DataSpoc eliminates each one.
Security Layer 1: Read-Only by Design
The Lens MCP server is read-only. It does not expose write operations. Period.
```python
from dataspoc_lens import LensClient

lens = LensClient()

# This works — read query
df = lens.query("SELECT * FROM curated_sales LIMIT 10")

# This is rejected — write query
try:
    lens.query("DROP TABLE curated_sales")
except Exception as e:
    print(e)  # "Write operations are not permitted. Lens is read-only."

# These are also rejected
lens.query("INSERT INTO curated_sales VALUES (...)")  # rejected
lens.query("UPDATE curated_sales SET amount = 0")     # rejected
lens.query("DELETE FROM curated_sales")               # rejected
lens.query("CREATE TABLE test (id INT)")              # rejected
```

This is enforced at the engine level, not just the prompt level. Even if an LLM generates a write query, Lens will not execute it. The SQL parser checks every statement before execution.
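To make the idea of an engine-level check concrete, here is a toy read-only guard. This is illustrative only, not DataSpoc's actual parser: a real engine inspects the full parse tree, while this sketch only checks the leading keyword.

```python
import re

# Toy read-only guard (illustrative only, not DataSpoc's parser):
# accept a statement only if it starts with a read keyword.
READ_ONLY = {"SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN"}

def guard(sql: str) -> str:
    """Return sql if it is a read statement; raise otherwise."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if not match or match.group(1).upper() not in READ_ONLY:
        raise PermissionError(
            "Write operations are not permitted. Lens is read-only."
        )
    return sql

guard("SELECT * FROM curated_sales LIMIT 10")  # accepted
try:
    guard("DROP TABLE curated_sales")          # rejected before execution
except PermissionError as e:
    print(e)
```

The key property is that the guard runs before execution: a write statement never reaches the query engine, no matter what the LLM generated.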
When using the MCP server, the same protection applies:
```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"]
    }
  }
}
```

The MCP server exposes tools like `query`, `tables`, `schema`, and `ask`. None of them accept write operations. An AI agent connected via MCP physically cannot modify your data.
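Why can the agent not write, even in principle? Because the tool surface never contains a write tool to call. A toy MCP-style dispatcher (a sketch, not the real dataspoc-lens server) shows the shape of that guarantee:

```python
# Toy MCP-style tool dispatcher (a sketch, not the real dataspoc-lens
# server): only read tools are registered, so a write request has
# nothing to invoke in the first place.
def run_query(sql: str) -> str:
    return f"rows for: {sql}"  # stand-in for real query execution

def list_tables() -> list:
    return ["curated_sales", "curated_customers"]

TOOLS = {"query": run_query, "tables": list_tables}

def dispatch(tool: str, *args):
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return TOOLS[tool](*args)

print(dispatch("tables"))       # read tools work as usual
# dispatch("drop_table", "t")   # raises ValueError: no such tool exists
```

An agent can only call tools the server registers, so "read-only" here is a property of the protocol surface, not of a prompt instruction.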
Security Layer 2: Cloud IAM (No New Credentials)
DataSpoc never manages credentials. It uses your existing cloud IAM:
AWS
```bash
# Lens uses your existing AWS credentials

# Option 1: AWS SSO (recommended)
aws sso login --profile data-team

# Option 2: IAM role (for EC2/ECS/Lambda)
# Automatically uses the instance/task role

# Option 3: Environment variables
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```

The IAM policy controls what the agent can see:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::company-analytics",
        "arn:aws:s3:::company-analytics/*"
      ]
    },
    {
      "Effect": "Deny",
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": "*"
    }
  ]
}
```

Notice: the IAM policy explicitly denies write access to S3. Even if Lens had a bug that allowed write SQL (it does not), the cloud layer would block the write.
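The Deny statement is not redundant: in IAM's evaluation model, an explicit deny beats any allow. A toy evaluator (illustrative only; real IAM also matches Resource ARNs, wildcards, and conditions) captures the rule:

```python
# Toy model of IAM policy evaluation (illustrative; real IAM also
# matches Resource ARNs, wildcards, and conditions): explicit Deny wins.
def is_allowed(statements: list, action: str) -> bool:
    matching = [s for s in statements if action in s["Action"]]
    if any(s["Effect"] == "Deny" for s in matching):
        return False  # an explicit deny overrides every allow
    return any(s["Effect"] == "Allow" for s in matching)

policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"]},
    {"Effect": "Deny", "Action": ["s3:PutObject", "s3:DeleteObject"]},
]

print(is_allowed(policy, "s3:GetObject"))  # True: reads pass
print(is_allowed(policy, "s3:PutObject"))  # False: writes blocked
```

Because deny always wins, later attaching a broader allow policy to the same identity still cannot re-enable writes.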
GCP
```bash
# Use application default credentials
gcloud auth application-default login

# Or service account
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/sa-key.json"
```

```yaml
# IAM binding — read-only access to the bucket
- members:
    - serviceAccount:dataspoc-reader@project.iam.gserviceaccount.com
  role: roles/storage.objectViewer
```

Azure
```bash
# Use Azure CLI credentials
az login

# Or managed identity (recommended for production)
export AZURE_STORAGE_ACCOUNT="companylake"
```

```json
{
  "roleDefinitionName": "Storage Blob Data Reader",
  "scope": "/subscriptions/.../resourceGroups/.../providers/Microsoft.Storage/storageAccounts/companylake"
}
```

The key insight: DataSpoc adds zero new credentials. The AI agent has exactly the same access as the human who configured it. If your cloud IAM says “this identity can only read from the analytics bucket,” that is all the agent can do.
Security Layer 3: Bucket-Level Access Control
Different teams see different data because they have access to different buckets:
```
s3://company-finance   → Finance team only
s3://company-hr        → HR team only
s3://company-product   → Product team only
s3://company-analytics → Everyone (aggregated, non-sensitive)
```

Configure Lens to point to the appropriate bucket:

```bash
# Finance team's agent
export DATASPOC_BUCKET="s3://company-finance"
dataspoc-lens mcp  # This agent sees finance data only

# Product team's agent
export DATASPOC_BUCKET="s3://company-product"
dataspoc-lens mcp  # This agent sees product data only
```

An agent configured with `s3://company-product` literally cannot access `s3://company-finance`. It does not know that bucket exists. The isolation is at the cloud infrastructure level, not application logic.
Security Layer 4: Audit Trail
Every query executed through Lens is SQL. SQL is text. Text is loggable.
```python
from dataspoc_lens import LensClient

lens = LensClient()

# Every call to query() or ask() produces a SQL statement
# that can be logged, reviewed, and audited

# ask() returns both the answer and the SQL it generated
answer = lens.ask("How many customers do we have?")
# Internally executes: SELECT COUNT(*) FROM curated_customers
# This SQL is logged to .dataspoc/logs/
```

Lens logs every query to the bucket:

```
bucket/
  .dataspoc/
    logs/
      lens/
        2026-04-15T14:30:00Z.json
        2026-04-15T14:31:15Z.json
```

Each log entry contains:

```json
{
  "timestamp": "2026-04-15T14:30:00Z",
  "query": "SELECT COUNT(*) FROM curated_customers",
  "source": "mcp",
  "tables_accessed": ["curated_customers"],
  "rows_returned": 1,
  "duration_ms": 45,
  "status": "success"
}
```

You can review exactly what data the agent accessed, when, and how much. Compare this with RAG, where the retrieval step is opaque — you cannot easily see which chunks were sent to the LLM.
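Because each entry is plain JSON, review can be automated with a short script. A sketch, assuming entries shaped like the example above have been synced to a local directory (the `lens_logs/` path and the sync step are hypothetical):

```python
import json
from collections import Counter
from pathlib import Path

# Sketch: count how often each table was touched, assuming log entries
# shaped like the example above, synced to a local "lens_logs/" dir
# (hypothetical path; in practice you would pull them from the bucket).
def summarize_access(log_dir: str) -> Counter:
    tables = Counter()
    for path in sorted(Path(log_dir).glob("*.json")):
        entry = json.loads(path.read_text())
        for table in entry.get("tables_accessed", []):
            tables[table] += 1
    return tables

# Demo with one locally written entry:
Path("lens_logs").mkdir(exist_ok=True)
Path("lens_logs/entry-0001.json").write_text(json.dumps({
    "timestamp": "2026-04-15T14:30:00Z",
    "query": "SELECT COUNT(*) FROM curated_customers",
    "tables_accessed": ["curated_customers"],
}))
print(summarize_access("lens_logs"))
```

The same loop extends naturally to flagging queries against sensitive tables or unusually large `rows_returned` values.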
Comparison: Three Approaches to AI Data Access
Approach 1: Direct Database Access (Dangerous)
```python
# The agent gets a database connection string
import psycopg2

conn = psycopg2.connect("postgresql://admin:password@prod-db:5432/main")
cursor = conn.cursor()

# Nothing stops the agent from running:
cursor.execute("DROP TABLE customers")                  # disaster
cursor.execute("SELECT * FROM hr.salaries")             # data leak
cursor.execute("UPDATE orders SET status = 'shipped'")  # data corruption
```

Problems:
- Credentials in code
- Full read/write access
- No scope limitation
- One mistake destroys production data
Approach 2: RAG with Vector Store (Unauditable)
```python
# The agent retrieves chunks from a vector store
results = vector_store.similarity_search("customer salary data", k=20)
# Which 20 chunks were returned? Hard to audit.
# Did they include sensitive HR data? Maybe.
# Can you prove what the LLM saw? Not easily.
```

Problems:
- Opaque retrieval (what chunks were actually returned?)
- Embeddings can encode sensitive data
- No row-level access control
- Cannot prove compliance
Approach 3: DataSpoc Lens (Governed)
```python
from dataspoc_lens import LensClient

lens = LensClient()  # uses cloud IAM, read-only, scoped to one bucket

# Every action is SQL — auditable, reviewable, explainable
df = lens.query("SELECT region, COUNT(*) FROM curated_sales GROUP BY region")

# Write operations are rejected at engine level
# Access scope is determined by cloud IAM
# Every query is logged with timestamp and tables accessed
```

Advantages:
- No credentials to manage
- Read-only by design
- Scoped by cloud IAM
- Full audit trail
- Every answer traces to a SQL query
Configuration Checklist for Production
Here is a step-by-step checklist for deploying DataSpoc with AI agents in a governed environment:
1. Create a Dedicated IAM Identity
```bash
# AWS: Create a role for the agent
aws iam create-role --role-name dataspoc-agent-reader \
  --assume-role-policy-document file://trust-policy.json

aws iam attach-role-policy --role-name dataspoc-agent-reader \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```

2. Restrict to Specific Buckets

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::company-analytics",
        "arn:aws:s3:::company-analytics/*"
      ]
    }
  ]
}
```

3. Configure the MCP Server

```json
{
  "mcpServers": {
    "dataspoc-lens": {
      "command": "dataspoc-lens",
      "args": ["mcp"],
      "env": {
        "DATASPOC_BUCKET": "s3://company-analytics",
        "AWS_PROFILE": "dataspoc-agent-reader"
      }
    }
  }
}
```

4. Enable Query Logging

```yaml
# dataspoc config
logging:
  enabled: true
  destination: "s3://company-analytics/.dataspoc/logs/lens/"
  level: "all"  # logs every query
```

5. Review Logs Regularly
```python
from dataspoc_lens import LensClient

lens = LensClient()

# Query the agent's own audit logs
df = lens.query("""
    SELECT timestamp, query, tables_accessed, rows_returned
    FROM lens_audit_log
    WHERE timestamp >= CURRENT_DATE - INTERVAL '7 days'
    ORDER BY timestamp DESC
""")
print(df)
```

The Bottom Line
Giving AI agents data access does not have to be scary. DataSpoc’s security model is simple:
- Read-only engine — writes are impossible at the SQL parser level
- Cloud IAM — no new credentials, same permissions as humans
- Bucket isolation — each team/agent sees only their data
- SQL audit trail — every query is logged and reviewable
The result: your security team gets the governance they need, and your data team gets AI agents that actually work. No compromise required.