sql · transforms · data-modeling · dbt-alternative · analytics

Building a Curated Data Layer with SQL Transforms (No dbt Required)

Michael San Martim · 2026-04-27

dbt is the standard for SQL-based data transformation. It is also a 50-file project with profiles, sources, models, tests, macros, packages, and a YAML file for every model. For a team with 5 tables that needs a star schema, that is overkill.

DataSpoc Lens has a simpler approach: numbered SQL files that run in order. Each file creates a table from a query. No YAML, no Jinja, no dependency graph to debug. Just SQL.

The Problem with dbt for Small Teams

To build a star schema from 3 source tables in dbt, you need:

my_dbt_project/
  dbt_project.yml
  profiles.yml
  models/
    staging/
      stg_users.sql
      stg_users.yml
      stg_orders.sql
      stg_orders.yml
      stg_products.sql
      stg_products.yml
    marts/
      dim_customers.sql
      dim_customers.yml
      fct_orders.sql
      fct_orders.yml
      mart_revenue.sql
      mart_revenue.yml
  tests/
    ...
  macros/
    ...

That is 14 files, plus whatever lands in tests/ and macros/, before you write a single business query. dbt’s power comes from dependency management, testing, and documentation — all valuable at scale. But for a startup with one analyst, it is a week of setup before any value is delivered.

Lens Transforms: Numbered SQL Files

In DataSpoc Lens, transforms are .sql files in a directory. They are numbered to define execution order:

transforms/
  01_clean_users.sql
  02_clean_orders.sql
  03_join_order_details.sql
  04_daily_revenue.sql
  05_customer_dashboard.sql

Each file uses the CTAS (CREATE TABLE AS SELECT) pattern:

-- Creates a table in the curated layer
CREATE OR REPLACE TABLE curated.clean_users AS
SELECT ...

Run all transforms:

dataspoc-lens transform run

Lens executes them in numeric order: 01, 02, 03, 04, 05. Each transform can reference tables created by earlier transforms.

Building a Star Schema: 5 Transforms

Let us build a complete analytical layer from raw data. We start with three source tables from Pipe:

  • raw.users — user accounts
  • raw.orders — order transactions
  • raw.products — product catalog

Transform 1: Clean Users

-- transforms/01_clean_users.sql
-- Clean and standardize user data
CREATE OR REPLACE TABLE curated.dim_customers AS
SELECT
    user_id,
    LOWER(TRIM(email)) AS email,
    COALESCE(name, 'Unknown') AS name,
    CASE
        WHEN plan IN ('free', 'starter') THEN 'free'
        WHEN plan IN ('pro', 'professional') THEN 'pro'
        WHEN plan IN ('business', 'enterprise') THEN 'enterprise'
        ELSE 'unknown'
    END AS plan_tier,
    created_at::DATE AS signup_date,
    DATE_DIFF('day', created_at, CURRENT_DATE) AS days_since_signup,
    CASE
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 30 THEN 'new'
        WHEN DATE_DIFF('day', created_at, CURRENT_DATE) <= 365 THEN 'active'
        ELSE 'veteran'
    END AS customer_cohort
FROM raw.users
WHERE email IS NOT NULL
  AND email LIKE '%@%.%';

This handles the messy reality of raw data: inconsistent plan names, missing names, invalid emails.
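A quick spot check after running the transform catches mapping gaps early. This sketch simply counts customers per tier; unexpected volume in the 'unknown' bucket means a plan name the CASE expression does not yet cover:

-- Spot-check the plan mapping: how big is each tier,
-- and did anything fall through to 'unknown'?
SELECT plan_tier, COUNT(*) AS customers
FROM curated.dim_customers
GROUP BY plan_tier
ORDER BY customers DESC;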

Transform 2: Clean Orders

-- transforms/02_clean_orders.sql
-- Standardize orders, remove test data, add computed fields
CREATE OR REPLACE TABLE curated.fct_orders AS
SELECT
    order_id,
    user_id,
    product_id,
    amount,
    quantity,
    amount * quantity AS line_total,
    discount_pct,
    amount * quantity * (1 - COALESCE(discount_pct, 0) / 100.0) AS net_revenue,
    status,
    order_date,
    DATE_TRUNC('month', order_date) AS order_month,
    DATE_TRUNC('quarter', order_date) AS order_quarter,
    DAYOFWEEK(order_date) AS day_of_week,
    CASE
        WHEN HOUR(created_at) BETWEEN 6 AND 11 THEN 'morning'
        WHEN HOUR(created_at) BETWEEN 12 AND 17 THEN 'afternoon'
        WHEN HOUR(created_at) BETWEEN 18 AND 22 THEN 'evening'
        ELSE 'night'
    END AS time_of_day
FROM raw.orders
WHERE status != 'test'
  AND amount > 0
  AND order_date >= DATE '2024-01-01';
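It is also worth knowing what the WHERE clause threw away. A rough audit against the raw table; COUNT(*) FILTER is DuckDB/Postgres syntax, so adjust for your engine:

-- How many raw orders were excluded, and why?
SELECT
    COUNT(*) AS raw_rows,
    COUNT(*) FILTER (WHERE status = 'test') AS test_rows,
    COUNT(*) FILTER (WHERE amount <= 0) AS non_positive_amounts,
    COUNT(*) FILTER (WHERE order_date < DATE '2024-01-01') AS before_cutoff
FROM raw.orders;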

Transform 3: Join Order Details

-- transforms/03_join_order_details.sql
-- Denormalized order fact with customer and product dimensions
CREATE OR REPLACE TABLE curated.order_details AS
SELECT
    o.order_id,
    o.order_date,
    o.order_month,
    o.net_revenue,
    o.status,
    o.time_of_day,
    -- Customer dimensions
    c.name AS customer_name,
    c.email AS customer_email,
    c.plan_tier,
    c.customer_cohort,
    c.signup_date,
    -- Product dimensions
    p.name AS product_name,
    p.category AS product_category,
    p.sku,
    -- Calculated
    DATE_DIFF('day', c.signup_date, o.order_date) AS days_to_purchase
FROM curated.fct_orders o
JOIN curated.dim_customers c ON o.user_id = c.user_id
JOIN raw.products p ON o.product_id = p.product_id
WHERE o.status = 'completed';
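One caution: the inner joins silently drop any completed order whose user was filtered out in 01_clean_users.sql (for example, an invalid email) or whose product is missing from the catalog. An anti-join makes that loss visible. This is a diagnostic sketch, not part of the pipeline:

-- Completed orders that will not appear in order_details
-- because their user failed the cleaning filters
SELECT o.order_id, o.user_id, o.net_revenue
FROM curated.fct_orders o
LEFT JOIN curated.dim_customers c ON o.user_id = c.user_id
WHERE o.status = 'completed'
  AND c.user_id IS NULL;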

Transform 4: Daily Revenue Aggregation

-- transforms/04_daily_revenue.sql
-- Daily revenue metrics for dashboards
CREATE OR REPLACE TABLE gold.daily_revenue AS
SELECT
    order_date,
    COUNT(DISTINCT order_id) AS total_orders,
    COUNT(DISTINCT customer_email) AS unique_customers,
    SUM(net_revenue) AS revenue,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    ROUND(MEDIAN(net_revenue), 2) AS median_order_value,
    -- By plan tier
    SUM(CASE WHEN plan_tier = 'free' THEN net_revenue ELSE 0 END) AS free_revenue,
    SUM(CASE WHEN plan_tier = 'pro' THEN net_revenue ELSE 0 END) AS pro_revenue,
    SUM(CASE WHEN plan_tier = 'enterprise' THEN net_revenue ELSE 0 END) AS enterprise_revenue,
    -- By category
    SUM(CASE WHEN product_category = 'software' THEN net_revenue ELSE 0 END) AS software_revenue,
    SUM(CASE WHEN product_category = 'services' THEN net_revenue ELSE 0 END) AS services_revenue,
    -- 7-day rolling revenue (window over the daily sums)
    SUM(SUM(net_revenue)) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS revenue_7d_rolling
FROM curated.order_details
GROUP BY order_date
ORDER BY order_date;
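Note that ROWS BETWEEN 6 PRECEDING counts rows, not days, so the window quietly stretches past seven calendar days whenever a date has no orders. If your engine supports interval frames (DuckDB does), a RANGE frame is calendar-accurate. A sketch of that variant:

-- Calendar-accurate 7-day rolling revenue: the frame covers the
-- six days before each order_date plus the date itself, gaps included
SELECT
    order_date,
    SUM(net_revenue) AS revenue,
    SUM(SUM(net_revenue)) OVER (
        ORDER BY order_date
        RANGE BETWEEN INTERVAL 6 DAYS PRECEDING AND CURRENT ROW
    ) AS revenue_7d_rolling
FROM curated.order_details
GROUP BY order_date
ORDER BY order_date;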

Transform 5: Customer Dashboard View

-- transforms/05_customer_dashboard.sql
-- Per-customer summary for the analytics dashboard
CREATE OR REPLACE TABLE gold.customer_dashboard AS
SELECT
    customer_email,
    customer_name,
    plan_tier,
    customer_cohort,
    signup_date,
    COUNT(*) AS total_orders,
    SUM(net_revenue) AS lifetime_value,
    ROUND(AVG(net_revenue), 2) AS avg_order_value,
    MIN(order_date) AS first_order,
    MAX(order_date) AS last_order,
    DATE_DIFF('day', MAX(order_date), CURRENT_DATE) AS days_since_last_order,
    CASE
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 30 THEN 'active'
        WHEN DATE_DIFF('day', MAX(order_date), CURRENT_DATE) <= 90 THEN 'at_risk'
        ELSE 'churned'
    END AS activity_status,
    -- Most purchased category
    MODE(product_category) AS favorite_category
FROM curated.order_details
GROUP BY 1, 2, 3, 4, 5;
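With this table in place, segment queries become one-liners. For example, a win-back list built from the columns defined above:

-- High-value churned customers: candidates for a win-back campaign
SELECT customer_name, customer_email, plan_tier,
       lifetime_value, days_since_last_order
FROM gold.customer_dashboard
WHERE activity_status = 'churned'
ORDER BY lifetime_value DESC
LIMIT 25;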

Running the Transforms

# Run all transforms in order
dataspoc-lens transform run
# Run a specific transform
dataspoc-lens transform run --file 04_daily_revenue.sql
# Dry run: show the SQL without executing
dataspoc-lens transform run --dry-run

Output:

[10:15:01] Running 5 transforms...
[10:15:01] 01_clean_users.sql → curated.dim_customers (12,340 rows)
[10:15:02] 02_clean_orders.sql → curated.fct_orders (87,650 rows)
[10:15:03] 03_join_order_details.sql → curated.order_details (85,200 rows)
[10:15:04] 04_daily_revenue.sql → gold.daily_revenue (480 rows)
[10:15:04] 05_customer_dashboard.sql → gold.customer_dashboard (11,890 rows)
[10:15:04] All transforms complete (3.2s)

Query the Results

# Revenue trends
dataspoc-lens query "
SELECT order_date, revenue, revenue_7d_rolling
FROM gold.daily_revenue
WHERE order_date >= DATE '2026-04-01'
ORDER BY order_date
"
# At-risk customers
dataspoc-lens query "
SELECT customer_name, lifetime_value, days_since_last_order
FROM gold.customer_dashboard
WHERE activity_status = 'at_risk'
ORDER BY lifetime_value DESC
LIMIT 20
"
# Or use AI
dataspoc-lens ask "Which product categories are growing fastest this quarter?"

Lens Transforms vs. dbt: When to Use Each

Aspect                | Lens Transforms        | dbt
----------------------|------------------------|-------------------------------------
Setup time            | 0 (create .sql files)  | Hours (project, profiles, packages)
Dependency management | Numeric ordering       | DAG from ref()
Testing               | Manual SQL checks      | Built-in tests
Documentation         | Comments in SQL        | YAML + auto-generated docs
Incremental models    | Full refresh           | Incremental materialization
CI/CD                 | Run in any script      | dbt Cloud, GitHub Actions
Learning curve        | Know SQL = done        | SQL + Jinja + YAML + dbt concepts
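The "manual SQL checks" row deserves an example. One lightweight pattern is a final checks file in which every query encodes an invariant and should return zero rows; a wrapper script can fail the run when any of them does not. The file below is hypothetical, and whether Lens executes non-CTAS statements is an assumption; the same queries can always be run through dataspoc-lens query instead:

-- transforms/99_checks.sql (hypothetical): each query returns
-- rows only when an invariant is violated

-- No duplicate customers by email
SELECT email, COUNT(*) AS dupes
FROM curated.dim_customers
GROUP BY email
HAVING COUNT(*) > 1;

-- No negative net revenue
SELECT order_id FROM curated.fct_orders WHERE net_revenue < 0;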

When Lens Transforms Are Better

  • You have fewer than 20 transforms
  • One or two people maintain the pipeline
  • Dependencies are simple and linear
  • You want to start producing value today, not next week
  • You do not need incremental materialization (your data fits in memory)

When dbt Is Better

  • You have 100+ models with complex dependencies
  • Multiple teams contribute transforms
  • You need incremental models (billions of rows)
  • You need built-in testing and documentation
  • You have a dedicated analytics engineering team

Scheduling Transforms

Pair transforms with Pipe extraction in a cron job:

# Run every hour: extract then transform
0 * * * * dataspoc-pipe run postgres-production && dataspoc-lens transform run

Or in a shell script with error handling:

#!/bin/bash
# etl.sh - Extract and transform
set -e
echo "[$(date)] Starting ETL..."
# Extract
dataspoc-pipe run postgres-production
echo "[$(date)] Extraction complete"
# Transform
dataspoc-lens transform run
echo "[$(date)] Transforms complete"
# Refresh cache for fast queries
dataspoc-lens cache refresh-stale
echo "[$(date)] Cache refreshed"
echo "[$(date)] ETL complete"

Five SQL files. No Jinja. No YAML. No dependency graph. Just numbered transforms that build a clean analytical layer from raw data. When your needs outgrow this, graduate to dbt. Until then, ship value today.
