Building a Curated Data Layer with SQL Transforms (No dbt Required)
dbt is the standard for SQL-based data transformation. It is also, in a typical setup, a 50-file project with profiles, sources, models, tests, macros, packages, and a YAML file for every model. For a team with 5 tables that needs a star schema, that is overkill.
DataSpoc Lens has a simpler approach: numbered SQL files that run in order. Each file creates a table from a query. No YAML, no Jinja, no dependency graph to debug. Just SQL.
The Problem with dbt for Small Teams
To build a star schema from 3 source tables in dbt, you need:
```
my_dbt_project/
  dbt_project.yml
  profiles.yml
  models/
    staging/
      stg_users.sql
      stg_users.yml
      stg_orders.sql
      stg_orders.yml
      stg_products.sql
      stg_products.yml
    marts/
      dim_customers.sql
      dim_customers.yml
      fct_orders.sql
      fct_orders.yml
      mart_revenue.sql
      mart_revenue.yml
  tests/
    ...
  macros/
    ...
```

That is 12+ files before you write a single business query. dbt’s power comes from dependency management, testing, and documentation — all valuable at scale. But for a startup with one analyst, it is a week of setup before any value is delivered.
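Most of that scaffolding feeds dbt's dependency graph: models reference each other through ref() calls in Jinja, and dbt compiles the execution order from those calls. An illustrative mart model, using the staging models from the tree above (the column names are made up for the example):

```sql
-- models/marts/fct_orders.sql
-- dbt infers that this model depends on stg_orders and stg_users
-- from the ref() calls below, and runs the staging models first.
SELECT
    o.order_id,
    o.amount,
    u.plan
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_users') }} AS u
    ON o.user_id = u.user_id
```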
Lens Transforms: Numbered SQL Files
In DataSpoc Lens, transforms are .sql files in a directory. They are numbered to define execution order:
```
transforms/
  01_clean_users.sql
  02_clean_orders.sql
  03_join_order_details.sql
  04_daily_revenue.sql
  05_customer_dashboard.sql
```

Each file uses the CTAS (CREATE TABLE AS SELECT) pattern:

```sql
-- Creates a table in the curated layer
CREATE OR REPLACE TABLE curated.clean_users AS
SELECT ...
```

Run all transforms:

```bash
dataspoc-lens transform run
```

Lens executes them in numeric order: 01, 02, 03, 04, 05. Each transform can reference tables created by earlier transforms.
Building a Star Schema: 5 Transforms
Let us build a complete analytical layer from raw data. We start with three source tables from Pipe:
- raw.users — user accounts
- raw.orders — order transactions
- raw.products — product catalog
Transform 1: Clean Users
```sql
-- transforms/01_clean_users.sql
-- Clean and standardize user data

CREATE OR REPLACE TABLE curated.dim_customers AS
SELECT
    user_id,
    LOWER(TRIM(email)) AS email,
    COALESCE(name, 'Unknown') AS name,
    CASE
        WHEN plan IN ('free', 'starter') THEN 'free'
        WHEN plan IN ('pro', 'professional') THEN 'pro'
        WHEN plan IN ('business', 'enterprise') THEN 'enterprise'
        ELSE 'unknown'
    END AS plan_tier,
    created_at::DATE AS signup_date,
    DATE_DIFF('day', created_at, CURRENT_DATE) AS days_since_signup,
    CASE
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 30 THEN 'new'
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 365 THEN 'active'
        ELSE 'veteran'
    END AS customer_cohort
FROM raw.users
WHERE email IS NOT NULL
  AND email LIKE '%@%.%';
```

This handles the messy reality of raw data: inconsistent plan names, missing names, invalid emails.
Transform 2: Clean Orders
```sql
-- transforms/02_clean_orders.sql
-- Standardize orders, remove test data, add computed fields

CREATE OR REPLACE TABLE curated.fct_orders AS
SELECT
    order_id,
    user_id,
    product_id,
    amount,
    quantity,
    amount * quantity AS line_total,
    discount_pct,
    amount * quantity * (1 - COALESCE(discount_pct, 0) / 100.0) AS net_revenue,
    status,
    order_date,
    DATE_TRUNC('month', order_date) AS order_month,
    DATE_TRUNC('quarter', order_date) AS order_quarter,
    DAYOFWEEK(order_date) AS day_of_week,
    CASE
        WHEN HOUR(created_at) BETWEEN 6 AND 11 THEN 'morning'
        WHEN HOUR(created_at) BETWEEN 12 AND 17 THEN 'afternoon'
        WHEN HOUR(created_at) BETWEEN 18 AND 22 THEN 'evening'
        ELSE 'night'
    END AS time_of_day
FROM raw.orders
WHERE status != 'test'
  AND amount > 0
  AND order_date >= DATE '2024-01-01';
```

Transform 3: Join Order Details
```sql
-- transforms/03_join_order_details.sql
-- Denormalized order fact with customer and product dimensions

CREATE OR REPLACE TABLE curated.order_details AS
SELECT
    o.order_id,
    o.order_date,
    o.order_month,
    o.net_revenue,
    o.status,
    o.time_of_day,

    -- Customer dimensions
    c.name AS customer_name,
    c.email AS customer_email,
    c.plan_tier,
    c.customer_cohort,
    c.signup_date,

    -- Product dimensions
    p.name AS product_name,
    p.category AS product_category,
    p.sku,

    -- Calculated
    DATE_DIFF('day', c.signup_date, o.order_date) AS days_to_purchase
FROM curated.fct_orders o
JOIN curated.dim_customers c ON o.user_id = c.user_id
JOIN raw.products p ON o.product_id = p.product_id
WHERE o.status = 'completed';
```

Transform 4: Daily Revenue Aggregation
```sql
-- transforms/04_daily_revenue.sql
-- Daily revenue metrics for dashboards

CREATE OR REPLACE TABLE gold.daily_revenue AS
SELECT
    order_date,
    COUNT(DISTINCT order_id) AS total_orders,
    COUNT(DISTINCT customer_email) AS unique_customers,
    SUM(net_revenue) AS revenue,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    ROUND(MEDIAN(net_revenue), 2) AS median_order_value,

    -- By plan tier
    SUM(CASE WHEN plan_tier = 'free' THEN net_revenue ELSE 0 END) AS free_revenue,
    SUM(CASE WHEN plan_tier = 'pro' THEN net_revenue ELSE 0 END) AS pro_revenue,
    SUM(CASE WHEN plan_tier = 'enterprise' THEN net_revenue ELSE 0 END) AS enterprise_revenue,

    -- By category
    SUM(CASE WHEN product_category = 'software' THEN net_revenue ELSE 0 END) AS software_revenue,
    SUM(CASE WHEN product_category = 'services' THEN net_revenue ELSE 0 END) AS services_revenue,

    -- 7-day rolling revenue via a window over the daily rows
    SUM(SUM(net_revenue)) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS revenue_7d_rolling
FROM curated.order_details
GROUP BY order_date
ORDER BY order_date;
```
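One subtlety with the rolling sum: ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so revenue_7d_rolling is only a true 7-day window when every date in the range has at least one order. If gaps are possible, an interval-based RANGE frame is safer. A drop-in sketch, assuming the engine behind Lens supports interval RANGE frames (DuckDB and PostgreSQL 11+ both do):

```sql
-- Same rolling sum, measured in calendar days rather than rows,
-- so missing dates cannot silently widen the window.
-- Assumes interval RANGE frame support in the underlying engine.
SUM(SUM(net_revenue)) OVER (
    ORDER BY order_date
    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS revenue_7d_rolling
```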
Transform 5: Customer Dashboard View

```sql
-- transforms/05_customer_dashboard.sql
-- Per-customer summary for the analytics dashboard

CREATE OR REPLACE TABLE gold.customer_dashboard AS
SELECT
    customer_email,
    customer_name,
    plan_tier,
    customer_cohort,
    signup_date,

    COUNT(*) AS total_orders,
    SUM(net_revenue) AS lifetime_value,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order,
    DATE_DIFF('day', MAX(order_date), CURRENT_DATE) AS days_since_last_order,

    CASE
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 30 THEN 'active'
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 90 THEN 'at_risk'
        ELSE 'churned'
    END AS activity_status,

    -- Most purchased category
    MODE(product_category) AS favorite_category
FROM curated.order_details
GROUP BY 1, 2, 3, 4, 5;
```

Running the Transforms
```bash
# Run all transforms in order
dataspoc-lens transform run

# Run a specific transform
dataspoc-lens transform run --file 04_daily_revenue.sql

# Dry run: show the SQL without executing
dataspoc-lens transform run --dry-run
```

Output:

```
[10:15:01] Running 5 transforms...
[10:15:01] 01_clean_users.sql → curated.dim_customers (12,340 rows)
[10:15:02] 02_clean_orders.sql → curated.fct_orders (87,650 rows)
[10:15:03] 03_join_order_details.sql → curated.order_details (85,200 rows)
[10:15:04] 04_daily_revenue.sql → gold.daily_revenue (480 rows)
[10:15:04] 05_customer_dashboard.sql → gold.customer_dashboard (11,890 rows)
[10:15:04] All transforms complete (3.2s)
```

Query the Results
```bash
# Revenue trends
dataspoc-lens query "
  SELECT order_date, revenue, revenue_7d_rolling
  FROM gold.daily_revenue
  WHERE order_date >= DATE '2026-04-01'
  ORDER BY order_date"

# At-risk customers
dataspoc-lens query "
  SELECT customer_name, lifetime_value, days_since_last_order
  FROM gold.customer_dashboard
  WHERE activity_status = 'at_risk'
  ORDER BY lifetime_value DESC
  LIMIT 20"

# Or use AI
dataspoc-lens ask "Which product categories are growing fastest this quarter?"
```

Lens Transforms vs. dbt: When to Use Each
| Aspect | Lens Transforms | dbt |
|---|---|---|
| Setup time | Minutes (create .sql files) | Hours (project, profiles, packages) |
| Dependency management | Numeric ordering | DAG from ref() |
| Testing | Manual SQL checks (sketch below) | Built-in tests |
| Documentation | Comments in SQL | YAML + auto-generated docs |
| Incremental models | Full refresh | Incremental materialization |
| CI/CD | Run in any script | dbt Cloud, GitHub Actions |
| Learning curve | Know SQL = done | SQL + Jinja + YAML + dbt concepts |
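"Manual SQL checks" can still be systematic: one lightweight pattern is a final numbered transform that rolls basic assertions into a table you inspect after each run. A minimal sketch; the file name, the check names, and the gold.data_checks table are illustrative, not Lens features:

```sql
-- transforms/99_data_checks.sql (illustrative; not a built-in Lens feature)
-- Each row is one assertion; failures > 0 means something needs a look.
CREATE OR REPLACE TABLE gold.data_checks AS
SELECT 'dim_customers: duplicate user_id' AS check_name,
       COUNT(*) - COUNT(DISTINCT user_id) AS failures
FROM curated.dim_customers
UNION ALL
SELECT 'fct_orders: negative net_revenue' AS check_name,
       COUNT(*) AS failures
FROM curated.fct_orders
WHERE net_revenue < 0
UNION ALL
SELECT 'order_details: completed orders lost in joins' AS check_name,
       (SELECT COUNT(*) FROM curated.fct_orders WHERE status = 'completed')
           - COUNT(*) AS failures
FROM curated.order_details;
```

After a run, dataspoc-lens query "SELECT * FROM gold.data_checks WHERE failures > 0" surfaces anything that slipped through.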
When Lens Transforms Are Better
- You have fewer than 20 transforms
- One or two people maintain the pipeline
- Dependencies are simple and linear
- You want to start producing value today, not next week
- You do not need incremental materialization (your data fits in memory; a plain-SQL fallback is sketched below)
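On that last point, a full refresh is not the only option in plain SQL: if one table eventually outgrows it, an append-only load can buy time before a move to dbt. A hypothetical sketch against an illustrative raw.events table (not a Lens feature; it assumes the source is append-only with no late-arriving rows):

```sql
-- Hypothetical incremental append; table and column names are illustrative.
-- Assumes curated.events_clean already exists and raw.events rows never
-- arrive late or get updated in place.
INSERT INTO curated.events_clean
SELECT event_id, user_id, event_type, event_date
FROM raw.events
WHERE event_date > (
    SELECT COALESCE(MAX(event_date), DATE '1970-01-01')
    FROM curated.events_clean
);
```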
When dbt Is Better
- You have 100+ models with complex dependencies
- Multiple teams contribute transforms
- You need incremental models (billions of rows)
- You need built-in testing and documentation
- You have a dedicated analytics engineering team
Scheduling Transforms
Pair transforms with Pipe extraction in a cron job:
```
# Run every hour: extract then transform
0 * * * * dataspoc-pipe run postgres-production && dataspoc-lens transform run
```

Or in a shell script with error handling:
```bash
#!/bin/bash
# etl.sh - Extract and transform

set -e

echo "[$(date)] Starting ETL..."

# Extract
dataspoc-pipe run postgres-production
echo "[$(date)] Extraction complete"

# Transform
dataspoc-lens transform run
echo "[$(date)] Transforms complete"

# Refresh cache for fast queries
dataspoc-lens cache refresh-stale
echo "[$(date)] Cache refreshed"

echo "[$(date)] ETL complete"
```

Five SQL files. No Jinja. No YAML. No dependency graph. Just numbered transforms that build a clean analytical layer from raw data. When your needs outgrow this, graduate to dbt. Until then, ship value today.