Ir al contenido

SQL Transforms

Esta página aún no está disponible en tu idioma.

SQL transforms let you build curated and aggregated datasets from your raw data lake tables using simple numbered SQL files.

Place .sql files in ~/.dataspoc-lens/transforms/. Each file contains a CREATE OR REPLACE TABLE ... AS SELECT ... (CTAS) statement. Files are executed in alphabetical order, so number them to control the sequence.

~/.dataspoc-lens/
transforms/
001_clean_users.sql
002_aggregate_orders.sql
003_customer_360.sql

Each transform file should contain a single CTAS statement. The created table becomes available to subsequent transforms and queries.

001_clean_users.sql:

CREATE OR REPLACE TABLE clean_users AS
SELECT
id,
TRIM(LOWER(email)) AS email,
COALESCE(name, 'Unknown') AS name,
created_at,
CASE
WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active'
WHEN status IN ('inactive', 'Inactive') THEN 'inactive'
ELSE 'unknown'
END AS status
FROM customers
WHERE email IS NOT NULL;

002_aggregate_orders.sql:

CREATE OR REPLACE TABLE order_summary AS
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(total) AS total_spent,
AVG(total) AS avg_order_value,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;

003_customer_360.sql:

CREATE OR REPLACE TABLE customer_360 AS
SELECT
u.id,
u.name,
u.email,
u.status AS user_status,
o.total_orders,
o.total_spent,
o.avg_order_value,
o.first_order,
o.last_order,
DATEDIFF('day', o.last_order, CURRENT_DATE) AS days_since_last_order
FROM clean_users u
LEFT JOIN order_summary o ON u.id = o.customer_id;
Terminal window
dataspoc-lens transform list
┌───┬───────────────────────────┐
│ # │ File │
├───┼───────────────────────────┤
│ 1 │ 001_clean_users.sql │
│ 2 │ 002_aggregate_orders.sql │
│ 3 │ 003_customer_360.sql │
└───┴───────────────────────────┘
Terminal window
dataspoc-lens transform run
Running 001_clean_users.sql... OK (0.3s)
Running 002_aggregate_orders.sql... OK (1.2s)
Running 003_customer_360.sql... OK (0.5s)
3 transform(s) completed successfully.

After running, you can query the created tables:

Terminal window
dataspoc-lens query "SELECT * FROM customer_360 WHERE total_spent > 1000 ORDER BY total_spent DESC LIMIT 10"
  • Number your files — use three-digit prefixes (001_, 002_, etc.) to control execution order
  • One CTAS per file — keep transforms focused and debuggable
  • Reference previous transforms — later transforms can use tables created by earlier ones
  • Use descriptive names — the filename should describe what the transform does
  • Lens writes to gold/ — transform outputs are analyst aggregations in the gold layer of the data lake