Exemplos
Três exemplos completos mostrando o DataSpoc Lens em cenários do mundo real.
Exemplo 1: Explorar um data lake de e-commerce
Seção intitulada “Exemplo 1: Explorar um data lake de e-commerce”Descubra e analise dados do data lake de uma empresa de e-commerce no S3.
Configuração
Seção intitulada “Configuração”pip install dataspoc-lens[s3]dataspoc-lens initdataspoc-lens add-bucket s3://acme-ecommerce-dataBucket added: s3://acme-ecommerce-dataDiscovering tables...┌──────────────┬─────────┬────────┬──────────┐│ Table │ Columns │ Rows │ Source │├──────────────┼─────────┼────────┼──────────┤│ customers │ 8 │ 54200 │ postgres ││ orders │ 12 │ 486000 │ postgres ││ products │ 9 │ 3120 │ postgres ││ events │ 6 │ 2.1M │ segment │└──────────────┴─────────┴────────┴──────────┘
4 table(s) found.Explorar o catálogo
Seção intitulada “Explorar o catálogo”dataspoc-lens catalog --detail orders┌─────────────────┬───────────┐│ Column │ Type │├─────────────────┼───────────┤│ order_id │ INTEGER ││ customer_id │ INTEGER ││ order_date │ DATE ││ ship_date │ DATE ││ total │ DOUBLE ││ discount │ DOUBLE ││ status │ VARCHAR ││ payment_method │ VARCHAR ││ channel │ VARCHAR ││ region │ VARCHAR ││ items_count │ INTEGER ││ created_at │ TIMESTAMP │└─────────────────┴───────────┘Executar queries
Seção intitulada “Executar queries”dataspoc-lens query " SELECT region, COUNT(*) as orders, SUM(total) as revenue FROM orders WHERE order_date >= '2026-01-01' GROUP BY region ORDER BY revenue DESC"┌──────────────┬────────┬────────────┐│ region │ orders │ revenue │├──────────────┼────────┼────────────┤│ North America│ 45200 │ 3,842,000 ││ Europe │ 28300 │ 2,156,000 ││ Asia Pacific │ 18900 │ 1,420,000 ││ Latin America│ 7600 │ 580,000 │└──────────────┴────────┴────────────┘
(4 row(s), 0.342s)Usar o shell interativo
Seção intitulada “Usar o shell interativo”dataspoc-lens shelllens> .tables customers orders products events
lens> SELECT payment_method, COUNT(*) as cnt ...> FROM orders GROUP BY payment_method ORDER BY cnt DESC;┌────────────────┬────────┐│ payment_method │ cnt │├────────────────┼────────┤│ credit_card │ 245000 ││ debit_card │ 128000 ││ paypal │ 82000 ││ bank_transfer │ 31000 │└────────────────┴────────┘
(4 row(s), 0.215s)
lens> .export csv /tmp/payment_methods.csvExported 4 row(s) to /tmp/payment_methods.csvExportar para relatório
Seção intitulada “Exportar para relatório”dataspoc-lens query " SELECT DATE_TRUNC('month', order_date) as month, COUNT(*) as orders, SUM(total) as revenue FROM orders WHERE order_date >= '2025-01-01' GROUP BY month ORDER BY month" --export monthly_report.csv(16 row(s), 0.456s)Exported 16 row(s) to monthly_report.csvExemplo 2: Construir uma camada curada com transformações
Seção intitulada “Exemplo 2: Construir uma camada curada com transformações”Crie datasets limpos e agregados a partir de dados brutos usando transformações SQL.
Criar arquivos de transformação
Seção intitulada “Criar arquivos de transformação”dataspoc-lens initCrie os seguintes arquivos em ~/.dataspoc-lens/transforms/:
001_clean_customers.sql:
CREATE OR REPLACE TABLE clean_customers ASSELECT id, TRIM(LOWER(email)) AS email, COALESCE(name, 'Unknown') AS name, created_at, CASE WHEN status IN ('active', 'Active', 'ACTIVE') THEN 'active' WHEN status IN ('inactive', 'Inactive') THEN 'inactive' ELSE 'unknown' END AS statusFROM customersWHERE email IS NOT NULL AND email NOT LIKE '%test%';002_order_metrics.sql:
CREATE OR REPLACE TABLE order_metrics ASSELECT customer_id, COUNT(*) AS total_orders, SUM(total) AS lifetime_value, AVG(total) AS avg_order_value, MIN(order_date) AS first_order, MAX(order_date) AS last_order, DATEDIFF('day', MAX(order_date), CURRENT_DATE) AS days_since_last_order, COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_ordersFROM ordersGROUP BY customer_id;003_customer_segments.sql:
CREATE OR REPLACE TABLE customer_segments ASSELECT c.id, c.name, c.email, m.total_orders, m.lifetime_value, m.avg_order_value, m.days_since_last_order, CASE WHEN m.lifetime_value > 5000 AND m.days_since_last_order < 30 THEN 'VIP Active' WHEN m.lifetime_value > 5000 AND m.days_since_last_order >= 30 THEN 'VIP At Risk' WHEN m.lifetime_value > 1000 THEN 'Regular' WHEN m.total_orders = 1 THEN 'One-time' ELSE 'Low Value' END AS segmentFROM clean_customers cLEFT JOIN order_metrics m ON c.id = m.customer_id;Executar transformações
Seção intitulada “Executar transformações”dataspoc-lens transform list┌───┬─────────────────────────────┐│ # │ File │├───┼─────────────────────────────┤│ 1 │ 001_clean_customers.sql ││ 2 │ 002_order_metrics.sql ││ 3 │ 003_customer_segments.sql │└───┴─────────────────────────────┘dataspoc-lens transform runRunning 001_clean_customers.sql... OK (0.8s)Running 002_order_metrics.sql... OK (2.1s)Running 003_customer_segments.sql... OK (0.6s)
3 transform(s) completed successfully.Consultar os resultados
Seção intitulada “Consultar os resultados”dataspoc-lens query " SELECT segment, COUNT(*) as customers, AVG(lifetime_value) as avg_ltv FROM customer_segments GROUP BY segment ORDER BY avg_ltv DESC"┌──────────────┬───────────┬────────────┐│ segment │ customers │ avg_ltv │├──────────────┼───────────┼────────────┤│ VIP Active │ 420 │ 12,450.00 ││ VIP At Risk │ 185 │ 8,200.00 ││ Regular │ 3,200 │ 2,150.00 ││ One-time │ 8,400 │ 85.00 ││ Low Value │ 2,100 │ 320.00 │└──────────────┴───────────┴────────────┘
(5 row(s), 0.052s)Exemplo 3: Análise multi-cloud com Jupyter
Seção intitulada “Exemplo 3: Análise multi-cloud com Jupyter”Conecte a buckets em múltiplas nuvens e analise tudo junto em um notebook Jupyter.
Configuração
Seção intitulada “Configuração”pip install dataspoc-lens[s3,gcs,jupyter]dataspoc-lens init
# Add S3 bucket (product data)dataspoc-lens add-bucket s3://acme-product-data
# Add GCS bucket (analytics data)dataspoc-lens add-bucket gs://acme-analyticsBucket added: s3://acme-product-dataDiscovering tables...2 table(s) found.
Bucket added: gs://acme-analyticsDiscovering tables...3 table(s) found.Cachear para performance
Seção intitulada “Cachear para performance”dataspoc-lens cache productsdataspoc-lens cache page_viewsdataspoc-lens cache conversionsAbrir o Jupyter
Seção intitulada “Abrir o Jupyter”dataspoc-lens notebookCélulas do notebook
Seção intitulada “Células do notebook”# Cell 1: Check available tables from both clouds%%sqlSHOW TABLES┌──────────────────┐│ name │├──────────────────┤│ products ││ inventory ││ page_views ││ conversions ││ sessions │└──────────────────┘# Cell 2: Cross-cloud join — product data (S3) with analytics (GCS)%%sqlSELECT p.name AS product, p.category, COUNT(DISTINCT pv.session_id) AS views, COUNT(DISTINCT c.conversion_id) AS purchases, ROUND(COUNT(DISTINCT c.conversion_id)::FLOAT / NULLIF(COUNT(DISTINCT pv.session_id), 0) * 100, 2) AS conversion_rateFROM products pLEFT JOIN page_views pv ON p.product_id = pv.product_idLEFT JOIN conversions c ON p.product_id = c.product_idGROUP BY p.name, p.categoryORDER BY conversion_rate DESCLIMIT 10# Cell 3: Visualize conversion funnelimport pandas as pd
df = conn.execute(""" SELECT category, SUM(views) as total_views, SUM(purchases) as total_purchases FROM ( SELECT p.category, COUNT(DISTINCT pv.session_id) AS views, COUNT(DISTINCT c.conversion_id) AS purchases FROM products p LEFT JOIN page_views pv ON p.product_id = pv.product_id LEFT JOIN conversions c ON p.product_id = c.product_id GROUP BY p.category, p.product_id ) GROUP BY category ORDER BY total_views DESC""").fetchdf()
df.plot(x='category', y=['total_views', 'total_purchases'], kind='bar', title='Views vs Purchases by Category')# Cell 4: Check cache statusfrom dataspoc_lens import LensClient
with LensClient() as client: for entry in client.cache_status(): size_mb = entry["size_bytes"] / (1024 * 1024) print(f"{entry['table']}: {entry['status']} ({size_mb:.1f} MB)")products: fresh (0.8 MB)page_views: fresh (45.2 MB)conversions: fresh (12.1 MB)