SQL Transforms
Esta página aún no está disponible en tu idioma.
SQL transforms let you build curated and aggregated datasets from your raw data lake tables using simple numbered SQL files.
How it works
Section titled “How it works”Place .sql files in ~/.dataspoc-lens/transforms/. Each file contains a CREATE OR REPLACE TABLE ... AS SELECT ... (CTAS) statement. Files are executed in alphabetical order, so number them to control the sequence.
Transform directory
Section titled “Transform directory”~/.dataspoc-lens/ transforms/ 001_clean_users.sql 002_aggregate_orders.sql 003_customer_360.sqlWriting transforms
Section titled “Writing transforms”Each transform file should contain a single CTAS statement. The created table becomes available to subsequent transforms and queries.
Example: Clean users
Section titled “Example: Clean users”001_clean_users.sql:
CREATE OR REPLACE TABLE clean_users ASSELECT id, TRIM(LOWER(email)) AS email, COALESCE(name, 'Unknown') AS name, created_at, CASE WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active' WHEN status IN ('inactive', 'Inactive') THEN 'inactive' ELSE 'unknown' END AS statusFROM customersWHERE email IS NOT NULL;Example: Aggregate orders
Section titled “Example: Aggregate orders”002_aggregate_orders.sql:
CREATE OR REPLACE TABLE order_summary ASSELECT customer_id, COUNT(*) AS total_orders, SUM(total) AS total_spent, AVG(total) AS avg_order_value, MIN(order_date) AS first_order, MAX(order_date) AS last_orderFROM ordersWHERE status = 'completed'GROUP BY customer_id;Example: Customer 360
Section titled “Example: Customer 360”003_customer_360.sql:
CREATE OR REPLACE TABLE customer_360 ASSELECT u.id, u.name, u.email, u.status AS user_status, o.total_orders, o.total_spent, o.avg_order_value, o.first_order, o.last_order, DATEDIFF('day', o.last_order, CURRENT_DATE) AS days_since_last_orderFROM clean_users uLEFT JOIN order_summary o ON u.id = o.customer_id;List transforms
Section titled “List transforms”dataspoc-lens transform list┌───┬───────────────────────────┐│ # │ File │├───┼───────────────────────────┤│ 1 │ 001_clean_users.sql ││ 2 │ 002_aggregate_orders.sql ││ 3 │ 003_customer_360.sql │└───┴───────────────────────────┘Run transforms
Section titled “Run transforms”dataspoc-lens transform runRunning 001_clean_users.sql... OK (0.3s)Running 002_aggregate_orders.sql... OK (1.2s)Running 003_customer_360.sql... OK (0.5s)
3 transform(s) completed successfully.After running, you can query the created tables:
dataspoc-lens query "SELECT * FROM customer_360 WHERE total_spent > 1000 ORDER BY total_spent DESC LIMIT 10"Best practices
Section titled “Best practices”- Number your files — use three-digit prefixes (
001_,002_, etc.) to control execution order - One CTAS per file — keep transforms focused and debuggable
- Reference previous transforms — later transforms can use tables created by earlier ones
- Use descriptive names — the filename should describe what the transform does
- Lens writes to gold/ — transform outputs are analyst aggregations in the gold layer of the data lake