Building a Curated Data Layer with SQL Transforms (No dbt Required)
dbt is the standard for SQL-based data transformation. It is also, in a typical setup, a 50-file project with profiles, sources, models, tests, macros, packages, and a YAML file for every model. For a team with 5 tables that needs a star schema, that is overkill.
DataSpoc Lens has a simpler approach: numbered SQL files that run in order. Each file creates a table from a query. No YAML, no Jinja, no dependency graph to debug. Just SQL.
The Problem with dbt for Small Teams
To build a star schema from 3 source tables in dbt, you need:
```
my_dbt_project/
  dbt_project.yml
  profiles.yml
  models/
    staging/
      stg_users.sql
      stg_users.yml
      stg_orders.sql
      stg_orders.yml
      stg_products.sql
      stg_products.yml
    marts/
      dim_customers.sql
      dim_customers.yml
      fct_orders.sql
      fct_orders.yml
      mart_revenue.sql
      mart_revenue.yml
  tests/
    ...
  macros/
    ...
```

That is 12+ files before you write a single business query. dbt’s power comes from dependency management, testing, and documentation — all valuable at scale. But for a startup with one analyst, it is a week of setup before any value is delivered.
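Most of that scaffolding feeds dbt's dependency graph: models reference each other through ref() calls in Jinja, and dbt compiles the execution order from those calls. An illustrative mart model, using the staging models from the tree above (the column names are made up for the example):

```sql
-- models/marts/fct_orders.sql
-- dbt infers that this model depends on stg_orders and stg_users
-- from the ref() calls below, and runs the staging models first.
SELECT
    o.order_id,
    o.amount,
    u.plan
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_users') }} AS u
    ON o.user_id = u.user_id
```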
Lens Transforms: Numbered SQL Files
In DataSpoc Lens, transforms are .sql files in a directory. They are numbered to define execution order:
```
transforms/
  01_clean_users.sql
  02_clean_orders.sql
  03_join_order_details.sql
  04_daily_revenue.sql
  05_customer_dashboard.sql
```

Each file uses the CTAS (CREATE TABLE AS SELECT) pattern:

```sql
-- Creates a table in the curated layer
CREATE OR REPLACE TABLE curated.clean_users AS
SELECT ...
```

Run all transforms:

```bash
dataspoc-lens transform run
```

Lens executes them in numeric order: 01, 02, 03, 04, 05. Each transform can reference tables created by earlier transforms.
Building a Star Schema: 5 Transforms
Let us build a complete analytical layer from raw data. We start with three source tables from Pipe:
- raw.users — user accounts
- raw.orders — order transactions
- raw.products — product catalog
Transform 1: Clean Users
```sql
-- transforms/01_clean_users.sql
-- Clean and standardize user data

CREATE OR REPLACE TABLE curated.dim_customers AS
SELECT
    user_id,
    LOWER(TRIM(email)) AS email,
    COALESCE(name, 'Unknown') AS name,
    CASE
        WHEN plan IN ('free', 'starter') THEN 'free'
        WHEN plan IN ('pro', 'professional') THEN 'pro'
        WHEN plan IN ('business', 'enterprise') THEN 'enterprise'
        ELSE 'unknown'
    END AS plan_tier,
    created_at::DATE AS signup_date,
    DATE_DIFF('day', created_at, CURRENT_DATE) AS days_since_signup,
    CASE
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 30 THEN 'new'
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 365 THEN 'active'
        ELSE 'veteran'
    END AS customer_cohort
FROM raw.users
WHERE email IS NOT NULL
  AND email LIKE '%@%.%';
```

This handles the messy reality of raw data: inconsistent plan names, missing names, invalid emails.
Transform 2: Clean Orders
```sql
-- transforms/02_clean_orders.sql
-- Standardize orders, remove test data, add computed fields

CREATE OR REPLACE TABLE curated.fct_orders AS
SELECT
    order_id,
    user_id,
    product_id,
    amount,
    quantity,
    amount * quantity AS line_total,
    discount_pct,
    amount * quantity * (1 - COALESCE(discount_pct, 0) / 100.0) AS net_revenue,
    status,
    order_date,
    DATE_TRUNC('month', order_date) AS order_month,
    DATE_TRUNC('quarter', order_date) AS order_quarter,
    DAYOFWEEK(order_date) AS day_of_week,
    CASE
        WHEN HOUR(created_at) BETWEEN 6 AND 11 THEN 'morning'
        WHEN HOUR(created_at) BETWEEN 12 AND 17 THEN 'afternoon'
        WHEN HOUR(created_at) BETWEEN 18 AND 22 THEN 'evening'
        ELSE 'night'
    END AS time_of_day
FROM raw.orders
WHERE status != 'test'
  AND amount > 0
  AND order_date >= DATE '2024-01-01';
```

Transform 3: Join Order Details
```sql
-- transforms/03_join_order_details.sql
-- Denormalized order fact with customer and product dimensions

CREATE OR REPLACE TABLE curated.order_details AS
SELECT
    o.order_id,
    o.order_date,
    o.order_month,
    o.net_revenue,
    o.status,
    o.time_of_day,

    -- Customer dimensions
    c.name AS customer_name,
    c.email AS customer_email,
    c.plan_tier,
    c.customer_cohort,
    c.signup_date,

    -- Product dimensions
    p.name AS product_name,
    p.category AS product_category,
    p.sku,

    -- Calculated
    DATE_DIFF('day', c.signup_date, o.order_date) AS days_to_purchase
FROM curated.fct_orders o
JOIN curated.dim_customers c ON o.user_id = c.user_id
JOIN raw.products p ON o.product_id = p.product_id
WHERE o.status = 'completed';
```

Transform 4: Daily Revenue Aggregation
```sql
-- transforms/04_daily_revenue.sql
-- Daily revenue metrics for dashboards

CREATE OR REPLACE TABLE gold.daily_revenue AS
SELECT
    order_date,
    COUNT(DISTINCT order_id) AS total_orders,
    COUNT(DISTINCT customer_email) AS unique_customers,
    SUM(net_revenue) AS revenue,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    ROUND(MEDIAN(net_revenue), 2) AS median_order_value,

    -- By plan tier
    SUM(CASE WHEN plan_tier = 'free' THEN net_revenue ELSE 0 END) AS free_revenue,
    SUM(CASE WHEN plan_tier = 'pro' THEN net_revenue ELSE 0 END) AS pro_revenue,
    SUM(CASE WHEN plan_tier = 'enterprise' THEN net_revenue ELSE 0 END) AS enterprise_revenue,

    -- By category
    SUM(CASE WHEN product_category = 'software' THEN net_revenue ELSE 0 END) AS software_revenue,
    SUM(CASE WHEN product_category = 'services' THEN net_revenue ELSE 0 END) AS services_revenue,

    -- 7-day rolling revenue via a window over the daily rows
    SUM(SUM(net_revenue)) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS revenue_7d_rolling
FROM curated.order_details
GROUP BY order_date
ORDER BY order_date;
```
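One subtlety with the rolling sum: ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so revenue_7d_rolling is only a true 7-day window when every date in the range has at least one order. If gaps are possible, an interval-based RANGE frame is safer. A drop-in sketch, assuming the engine behind Lens supports interval RANGE frames (DuckDB and PostgreSQL 11+ both do):

```sql
-- Same rolling sum, measured in calendar days rather than rows,
-- so missing dates cannot silently widen the window.
-- Assumes interval RANGE frame support in the underlying engine.
SUM(SUM(net_revenue)) OVER (
    ORDER BY order_date
    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS revenue_7d_rolling
```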
Transform 5: Customer Dashboard View

```sql
-- transforms/05_customer_dashboard.sql
-- Per-customer summary for the analytics dashboard

CREATE OR REPLACE TABLE gold.customer_dashboard AS
SELECT
    customer_email,
    customer_name,
    plan_tier,
    customer_cohort,
    signup_date,

    COUNT(*) AS total_orders,
    SUM(net_revenue) AS lifetime_value,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order,
    DATE_DIFF('day', MAX(order_date), CURRENT_DATE) AS days_since_last_order,

    CASE
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 30 THEN 'active'
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 90 THEN 'at_risk'
        ELSE 'churned'
    END AS activity_status,

    -- Most purchased category
    MODE(product_category) AS favorite_category
FROM curated.order_details
GROUP BY 1, 2, 3, 4, 5;
```

Running the Transforms
```bash
# Run all transforms in order
dataspoc-lens transform run

# Run a specific transform
dataspoc-lens transform run --file 04_daily_revenue.sql

# Dry run: show the SQL without executing
dataspoc-lens transform run --dry-run
```

Output:

```
[10:15:01] Running 5 transforms...
[10:15:01] 01_clean_users.sql → curated.dim_customers (12,340 rows)
[10:15:02] 02_clean_orders.sql → curated.fct_orders (87,650 rows)
[10:15:03] 03_join_order_details.sql → curated.order_details (85,200 rows)
[10:15:04] 04_daily_revenue.sql → gold.daily_revenue (480 rows)
[10:15:04] 05_customer_dashboard.sql → gold.customer_dashboard (11,890 rows)
[10:15:04] All transforms complete (3.2s)
```

Query the Results
```bash
# Revenue trends
dataspoc-lens query "
  SELECT order_date, revenue, revenue_7d_rolling
  FROM gold.daily_revenue
  WHERE order_date >= DATE '2026-04-01'
  ORDER BY order_date"

# At-risk customers
dataspoc-lens query "
  SELECT customer_name, lifetime_value, days_since_last_order
  FROM gold.customer_dashboard
  WHERE activity_status = 'at_risk'
  ORDER BY lifetime_value DESC
  LIMIT 20"

# Or use AI
dataspoc-lens ask "Which product categories are growing fastest this quarter?"
```

Lens Transforms vs. dbt: When to Use Each
| Aspect | Lens Transforms | dbt |
|---|---|---|
| Setup time | Minutes (create .sql files) | Hours (project, profiles, packages) |
| Dependency management | Numeric ordering | DAG from ref() |
| Testing | Manual SQL checks (sketch below) | Built-in tests |
| Documentation | Comments in SQL | YAML + auto-generated docs |
| Incremental models | Full refresh | Incremental materialization |
| CI/CD | Run in any script | dbt Cloud, GitHub Actions |
| Learning curve | Know SQL = done | SQL + Jinja + YAML + dbt concepts |
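"Manual SQL checks" can still be systematic: one lightweight pattern is a final numbered transform that rolls basic assertions into a table you inspect after each run. A minimal sketch; the file name, the check names, and the gold.data_checks table are illustrative, not Lens features:

```sql
-- transforms/99_data_checks.sql (illustrative; not a built-in Lens feature)
-- Each row is one assertion; failures > 0 means something needs a look.
CREATE OR REPLACE TABLE gold.data_checks AS
SELECT 'dim_customers: duplicate user_id' AS check_name,
       COUNT(*) - COUNT(DISTINCT user_id) AS failures
FROM curated.dim_customers
UNION ALL
SELECT 'fct_orders: negative net_revenue' AS check_name,
       COUNT(*) AS failures
FROM curated.fct_orders
WHERE net_revenue < 0
UNION ALL
SELECT 'order_details: completed orders lost in joins' AS check_name,
       (SELECT COUNT(*) FROM curated.fct_orders WHERE status = 'completed')
           - COUNT(*) AS failures
FROM curated.order_details;
```

After a run, dataspoc-lens query "SELECT * FROM gold.data_checks WHERE failures > 0" surfaces anything that slipped through.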
When Lens Transforms Are Better
- You have fewer than 20 transforms
- One or two people maintain the pipeline
- Dependencies are simple and linear
- You want to start producing value today, not next week
- You do not need incremental materialization (your data fits in memory; a plain-SQL fallback is sketched below)
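On that last point, a full refresh is not the only option in plain SQL: if one table eventually outgrows it, an append-only load can buy time before a move to dbt. A hypothetical sketch against an illustrative raw.events table (not a Lens feature; it assumes the source is append-only with no late-arriving rows):

```sql
-- Hypothetical incremental append; table and column names are illustrative.
-- Assumes curated.events_clean already exists and raw.events rows never
-- arrive late or get updated in place.
INSERT INTO curated.events_clean
SELECT event_id, user_id, event_type, event_date
FROM raw.events
WHERE event_date > (
    SELECT COALESCE(MAX(event_date), DATE '1970-01-01')
    FROM curated.events_clean
);
```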
When dbt Is Better
- You have 100+ models with complex dependencies
- Multiple teams contribute transforms
- You need incremental models (billions of rows)
- You need built-in testing and documentation
- You have a dedicated analytics engineering team
Scheduling Transforms
Pair transforms with Pipe extraction in a cron job:
```
# Run every hour: extract then transform
0 * * * * dataspoc-pipe run postgres-production && dataspoc-lens transform run
```

Or in a shell script with error handling:
```bash
#!/bin/bash
# etl.sh - Extract and transform

set -e

echo "[$(date)] Starting ETL..."

# Extract
dataspoc-pipe run postgres-production
echo "[$(date)] Extraction complete"

# Transform
dataspoc-lens transform run
echo "[$(date)] Transforms complete"

# Refresh cache for fast queries
dataspoc-lens cache refresh-stale
echo "[$(date)] Cache refreshed"

echo "[$(date)] ETL complete"
```

Five SQL files. No Jinja. No YAML. No dependency graph. Just numbered transforms that build a clean analytical layer from raw data. When your needs outgrow this, graduate to dbt. Until then, ship value today.