Skip to content

SQL Transforms

SQL transforms let you build curated and aggregated datasets from your raw data lake tables using simple numbered SQL files.

Place .sql files in ~/.dataspoc-lens/transforms/. Each file contains a CREATE OR REPLACE TABLE ... AS SELECT ... (CTAS) statement. Files are executed in alphabetical order, so number them to control the sequence.

~/.dataspoc-lens/
transforms/
001_clean_users.sql
002_aggregate_orders.sql
003_customer_360.sql

Each transform file should contain a single CTAS statement. The created table becomes available to subsequent transforms and queries.

001_clean_users.sql:

CREATE OR REPLACE TABLE clean_users AS
SELECT
id,
TRIM(LOWER(email)) AS email,
COALESCE(name, 'Unknown') AS name,
created_at,
CASE
WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active'
WHEN status IN ('inactive', 'Inactive') THEN 'inactive'
ELSE 'unknown'
END AS status
FROM customers
WHERE email IS NOT NULL;

002_aggregate_orders.sql:

CREATE OR REPLACE TABLE order_summary AS
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(total) AS total_spent,
AVG(total) AS avg_order_value,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;

003_customer_360.sql:

CREATE OR REPLACE TABLE customer_360 AS
SELECT
u.id,
u.name,
u.email,
u.status AS user_status,
o.total_orders,
o.total_spent,
o.avg_order_value,
o.first_order,
o.last_order,
DATEDIFF('day', o.last_order, CURRENT_DATE) AS days_since_last_order
FROM clean_users u
LEFT JOIN order_summary o ON u.id = o.customer_id;
Terminal window
dataspoc-lens transform list
┌───┬───────────────────────────┐
│ # │ File │
├───┼───────────────────────────┤
│ 1 │ 001_clean_users.sql │
│ 2 │ 002_aggregate_orders.sql │
│ 3 │ 003_customer_360.sql │
└───┴───────────────────────────┘
Terminal window
dataspoc-lens transform run
Running 001_clean_users.sql... OK (0.3s)
Running 002_aggregate_orders.sql... OK (1.2s)
Running 003_customer_360.sql... OK (0.5s)
3 transform(s) completed successfully.

After running, you can query the created tables:

Terminal window
dataspoc-lens query "SELECT * FROM customer_360 WHERE total_spent > 1000 ORDER BY total_spent DESC LIMIT 10"
  • Number your files — use three-digit prefixes (001_, 002_, etc.) to control execution order
  • One CTAS per file — keep transforms focused and debuggable
  • Reference previous transforms — later transforms can use tables created by earlier ones
  • Use descriptive names — the filename should describe what the transform does
  • Lens writes to gold/ — transform outputs are analyst aggregations in the gold layer of the data lake