Blog

Tutorials, guides, and deep dives on data engineering, AI agents, and the DataSpoc platform.

data-mesh · architecture · data-engineering

Data Mesh Without the Complexity: One Bucket Per Team

Implement data mesh with DataSpoc. Each team gets their own bucket, their own pipelines, their own AI agent. No central bottleneck. No platform team required.

2026-04-29 · Michael San Martim
medallion · data-lake · architecture

Building a Medallion Architecture Data Lake with DataSpoc

Bronze, Silver, Gold — implement the Databricks medallion pattern using DataSpoc Pipe and Lens. No Spark, no cluster, no cost.

2026-04-29 · Michael San Martim
ai-agents · agent-md · mcp

The AGENT.md Pattern: Teaching AI Agents to Use Your Tools

How a single Markdown file turns any AI agent into a power user of your data platform.

2026-04-28 · Michael San Martim
startup · data-lake · cost-reduction

The $0 Data Lake for Startups: DataSpoc + S3 in 30 Minutes

A complete data platform for early-stage startups. Ingest from Postgres and Stripe, query with AI, connect agents.

2026-04-28 · Michael San Martim
airflow · etl · yaml

10 Lines of YAML vs 200 Lines of Airflow DAG

Side-by-side comparison of building the same ETL pipeline with Airflow and DataSpoc Pipe.

2026-04-28 · Michael San Martim
multi-cloud · s3 · gcs

One Data Lake, Three Clouds: Multi-Cloud Analytics with DataSpoc

Register S3, GCS, and Azure buckets in DataSpoc Lens and query across all of them with one SQL statement.

2026-04-28 · Michael San Martim
ollama · local-ai · privacy

Query Your Data Lake with AI for Free: Ollama + DataSpoc Lens

No API key needed. Run AI queries on your data lake using a local LLM with Ollama.

2026-04-28 · Michael San Martim
windsurf · mcp · ai-agents

Data Analysis from Windsurf IDE with DataSpoc MCP

Connect Windsurf to your data lake. Query, analyze, and build reports without leaving your editor.

2026-04-28 · Michael San Martim
jupyter · notebook · duckdb

Jupyter Notebook on Your Data Lake in 60 Seconds

Launch JupyterLab with all your cloud Parquet tables pre-mounted. Zero configuration.

2026-04-27 · Michael San Martim
incremental · extraction · singer

Incremental Extraction Patterns: Never Re-Extract Old Data Again

How DataSpoc Pipe uses Singer bookmarks to fetch only new data. Save time, bandwidth, and cloud costs.

2026-04-27 · Michael San Martim
sql · transforms · data-modeling

Building a Curated Data Layer with SQL Transforms (No dbt Required)

Numbered SQL files in DataSpoc Lens replace dbt for simple data transformation workflows.

2026-04-27 · Michael San Martim
google-sheets · no-code · data-lake

From Google Sheets to a Queryable Data Lake in 3 Commands

Turn any public Google Sheets spreadsheet into Parquet files you can query with SQL.

2026-04-26 · Michael San Martim
marimo · notebook · reactive

Reactive Data Analysis with Marimo and DataSpoc Lens

Marimo notebooks + DuckDB on your data lake. Cells update automatically when you change a query.

2026-04-26 · Michael San Martim
mongodb · migration · parquet

MongoDB to Parquet: Build a Data Lake from Your MongoDB in 15 Minutes

Use DataSpoc Pipe with tap-mongodb to extract collections and convert them to queryable Parquet files.

2026-04-26 · Michael San Martim
anthropic · claude · python

Building a Data Query Agent with Anthropic Claude SDK and DataSpoc

Use Claude's tool_use feature to let the model query your data lake with real SQL.

2026-04-25 · Michael San Martim
crewai · multi-agent · ai-agents

Building a Multi-Agent Data Team with CrewAI and DataSpoc

A DE agent ingests data, a DA agent analyzes it, an ML agent trains models — all powered by DataSpoc.

2026-04-25 · Michael San Martim
llamaindex · ai-agents · python

LlamaIndex + DataSpoc: Query Your Data Lake Without Embeddings

Use LlamaIndex's tool-calling agents with DataSpoc Lens SDK for accurate, grounded data queries.

2026-04-25 · Michael San Martim
duckdb · spark · comparison

DuckDB vs Spark for Data Lake Queries: When Each Wins

DuckDB powers DataSpoc Lens. Spark powers Databricks. Here's when to use each — with benchmarks.

2026-04-24 · Michael San Martim
mcp · claude · ai-agents

How to Build an MCP Server for Your Data Lake

Turn your cloud Parquet files into an API for AI agents with DataSpoc Lens MCP.

2026-04-24 · Michael San Martim
cursor · mcp · data-analysis

Analyze Your Data Lake from Cursor IDE with DataSpoc MCP

Connect Cursor to your data lake via MCP. Query tables, explore schemas, and build reports without leaving your editor.

2026-04-23 · Michael San Martim
duckdb · snowflake · cost-reduction

We Replaced Our $8k/month Snowflake with DuckDB and Parquet

How DataSpoc Lens gives you a virtual warehouse over cloud Parquet — at zero cost.

2026-04-23 · Michael San Martim
fivetran · migration · data-engineering

Migrating from Fivetran to DataSpoc Pipe: A Step-by-Step Guide

Save $2k/month by replacing Fivetran with open-source DataSpoc Pipe. Same sources, same destinations, zero cost.

2026-04-22 · Michael San Martim
rag · mcp · ai-agents

RAG vs SQL: Why Your AI Agent Should Query, Not Embed

RAG hallucinates on structured data. Here's why MCP + SQL is more accurate, faster, and cheaper.

2026-04-22 · Michael San Martim
data-governance · security · ai-agents

Data Governance for AI Agents: How DataSpoc Keeps Your Lake Secure

Read-only access, cloud IAM enforcement, and audit trails — how to let AI agents query without risk.

2026-04-21 · Michael San Martim
ai-agents · langgraph · python

Building a Data Analyst Agent with LangGraph and DataSpoc Lens

A LangGraph agent that discovers tables, writes SQL, and answers business questions using DataSpoc Lens SDK.

2026-04-21 · Michael San Martim
data-engineering · etl · postgresql

Postgres to S3 in 5 Minutes with DataSpoc Pipe

Replace your Airflow DAGs with one CLI command. Incremental extraction from PostgreSQL to Parquet in S3.

2026-04-20 · Michael San Martim
rest-api · etl · pipeline

Ingest Any REST API to Parquet in 10 Minutes

Use DataSpoc Pipe with tap-rest-api to pull data from any API endpoint into your data lake.

2026-04-20 · Michael San Martim
autogen · multi-agent · analytics

Multi-Agent Analytics with AutoGen and DataSpoc

Build an AutoGen team where agents collaborate to analyze your data lake — one queries, one visualizes, one reports.

2026-04-19 · Michael San Martim
claude-code · mcp · data-engineering

Using Claude Code as Your Data Engineer: MCP + DataSpoc

Configure Claude Code to ingest data, query your lake, and debug pipelines — all from the terminal.

2026-04-18 · Michael San Martim
rag · parquet · duckdb

RAG on Parquet: How to Build Retrieval Over Your Data Lake Without Embeddings

Skip the vector store. Use DataSpoc Lens to give your LLM direct SQL access to Parquet files.

2026-04-17 · Michael San Martim
langchain · sql · python

LangChain SQLDatabaseChain vs DataSpoc Lens: Which Is Better for Data Queries?

Side-by-side comparison of querying databases with LangChain's SQL chain versus DataSpoc Lens SDK and MCP.

2026-04-16 · Michael San Martim
openai · ai-agents · python

Building a Data Lake Agent with OpenAI Function Calling and DataSpoc

Use OpenAI's function calling to let GPT-4 query your data lake with real SQL via DataSpoc Lens.

2026-04-15 · Michael San Martim