Quickstart
This guide takes you from zero to querying your data lake in five minutes.
1. Install
pip install dataspoc-pipe dataspoc-lens
2. Initialize a Pipe project
dataspoc-pipe init
This creates a .dataspoc/ directory in your bucket with the manifest and state tracking.
3. Add a data source
dataspoc-pipe add my-source
The interactive wizard walks you through:
- Source type (database, API, file, etc.)
- Connection details (host, credentials via env vars)
- Tables or endpoints to extract
- Destination bucket path
- Sync mode (full or incremental)
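Since the wizard expects credentials to come from environment variables, it can help to confirm they are exported before starting it. The following is a minimal sketch, not part of DataSpoc: the helper `require_env` and the variable names `MY_SOURCE_DB_USER` and `MY_SOURCE_DB_PASSWORD` are hypothetical, so substitute whatever names your source configuration actually uses.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, i.e. what a child process inherits.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical variable names and placeholder values, for illustration only.
export MY_SOURCE_DB_USER="readonly_user"
export MY_SOURCE_DB_PASSWORD="change-me"

require_env MY_SOURCE_DB_USER MY_SOURCE_DB_PASSWORD && echo "credentials present"
```

Checking with `printenv` rather than a plain shell test catches variables that were assigned but never exported, which a command started from this shell would not see.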
4. Run the pipeline
dataspoc-pipe run my-source
Pipe extracts data from your source, converts it to Parquet, and writes it to your bucket under raw/my-source/<table>/.
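To make the layout concrete, the sketch below builds the bucket prefix for a table and the name the SQL examples in this guide use for it. Both functions are hypothetical helpers, not DataSpoc commands. The hyphen-to-underscore mapping is an assumption inferred only from this guide's examples (my-source is queried as raw.my_source.orders), and `orders` is just the table name those examples use.

```shell
# Bucket prefix for a table, following the raw/<source>/<table>/ layout.
table_prefix() {
  printf 'raw/%s/%s/\n' "$1" "$2"
}

# SQL name for the same table. Assumption inferred from this guide's examples:
# hyphens in the source name become underscores (my-source -> my_source).
sql_table_name() {
  printf 'raw.%s.%s\n' "$(printf '%s' "$1" | tr '-' '_')" "$2"
}

table_prefix my-source orders     # prints raw/my-source/orders/
sql_table_name my-source orders   # prints raw.my_source.orders
```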
5. Connect Lens to the bucket
dataspoc-lens add-bucket s3://my-data
Lens reads the manifest and discovers all tables Pipe has written.
6. Query with SQL
dataspoc-lens shell
SELECT customer_name, SUM(revenue) AS total
FROM raw.my_source.orders
GROUP BY customer_name
ORDER BY total DESC
LIMIT 10;
7. Ask in natural language
dataspoc-lens ask "top customers by revenue"
Lens translates your question into SQL, runs it, and returns the result.
Local Testing with CSV
You do not need a cloud bucket to get started. Use local files with file:// URIs:
# Create a sample CSV
mkdir -p /tmp/my-lake
cat > /tmp/sales.csv << 'EOF'
date,customer,product,revenue
2025-01-15,Acme Corp,Widget Pro,15000
2025-01-15,Globex Inc,Widget Basic,8500
2025-01-16,Acme Corp,Widget Basic,4200
2025-01-16,Initech,Widget Pro,12000
2025-01-17,Globex Inc,Widget Pro,19500
EOF
# Initialize and ingest
dataspoc-pipe init --bucket file:///tmp/my-lake
dataspoc-pipe add local-csv --source-type file --path /tmp/sales.csv
dataspoc-pipe run local-csv
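As an optional sanity check that does not involve DataSpoc, the per-customer totals that the query in this section should return can be computed straight from the CSV. This is a sketch using standard awk and sort; the function name `expected_totals` is made up for illustration.

```shell
# Sum the revenue column (field 4) per customer (field 2), skipping the header,
# then sort by total in descending numeric order.
expected_totals() {
  awk -F, 'NR > 1 { total[$2] += $4 }
           END { for (c in total) printf "%s,%d\n", c, total[c] }' "$1" |
    sort -t, -k2,2nr
}

# Run against the sample file if it exists.
if [ -f /tmp/sales.csv ]; then
  expected_totals /tmp/sales.csv
fi
```

For the sample data above this gives Globex Inc 28000, Acme Corp 19200, and Initech 12000, so the SQL result should list the customers in that order.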
# Query
dataspoc-lens add-bucket file:///tmp/my-lake
dataspoc-lens shell
SELECT customer, SUM(revenue) AS total_revenue
FROM raw.local_csv.sales
GROUP BY customer
ORDER BY total_revenue DESC;
Next Steps
- Architecture: Understand the bucket contract
- AI Agent Integration: Connect your AI agent to the data lake
- MCP Setup: Use DataSpoc from Claude or Cursor