Cache

Lens can cache remote Parquet files locally so you can work offline and avoid repeated cloud egress charges.

dataspoc-lens cache orders
Caching 'orders'...
Cached 'orders': 4 file(s), 12.3 MB

This downloads all Parquet files for the orders table to your local cache directory.

To list cached tables and their status:

dataspoc-lens cache --list
┌──────────────┬─────────────────────┬──────────┬────────┐
│ Table        │ Cached At           │ Size     │ Status │
├──────────────┼─────────────────────┼──────────┼────────┤
│ orders       │ 2026-04-15 10:30:00 │ 12.3 MB  │ fresh  │
│ customers    │ 2026-04-14 08:00:00 │ 2.1 MB   │ stale  │
└──────────────┴─────────────────────┴──────────┴────────┘

For JSON output:

dataspoc-lens cache --list --output json

To refresh a table that is already cached:

dataspoc-lens cache orders --refresh

Downloads the latest data even if a local copy already exists.

# Clear a specific table
dataspoc-lens cache orders --clear
# Clear all cached data
dataspoc-lens cache --clear

Lens determines cache freshness by comparing two timestamps:

  1. cached_at — when the local cache was created
  2. last_extraction — the latest extraction timestamp from the Pipe manifest

If Pipe ran an extraction after the cache was created, the cache is marked as stale. Otherwise it is fresh.

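| Condition                    | Status | Behavior                                        |
|------------------------------|--------|-------------------------------------------------|
| cached_at >= last_extraction | fresh  | Queries use local cache                         |
| cached_at < last_extraction  | stale  | Queries still use cache, but a warning is shown |
| No cache exists              | (none) | Queries read directly from remote bucket        |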
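
The freshness rule above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Lens's actual code: the function name `cache_status` and the `"none"` label for a missing cache are assumptions made for the example.

```python
from datetime import datetime
from typing import Optional


def cache_status(cached_at: Optional[datetime], last_extraction: datetime) -> str:
    """Classify a table's cache (hypothetical helper, not part of Lens)."""
    if cached_at is None:
        return "none"   # no local copy: queries read from the remote bucket
    if last_extraction > cached_at:
        return "stale"  # Pipe ran an extraction after the cache was created
    return "fresh"      # otherwise, including equal timestamps


fmt = "%Y-%m-%d %H:%M:%S"
cached = datetime.strptime("2026-04-15 10:30:00", fmt)
print(cache_status(cached, datetime.strptime("2026-04-15 09:00:00", fmt)))  # fresh
print(cache_status(cached, datetime.strptime("2026-04-15 11:00:00", fmt)))  # stale
print(cache_status(None, cached))                                           # none
```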

When you run queries (via query, shell, ask, or notebooks), Lens automatically uses the local cache for any table that has a cached copy; a stale copy is still used, with a warning. No configuration is needed: mount_views() detects the cache and switches the DuckDB view to read from the local path instead of the remote bucket.
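
To make the view switch concrete, here is a rough sketch of how a view definition could choose between the local path and the remote bucket. It is not the real mount_views() implementation: the helper name `view_sql`, the example bucket URL, and the check for `*.parquet` files are all assumptions. The only DuckDB feature relied on is the `read_parquet` table function with a glob pattern.

```python
from pathlib import Path


def view_sql(table: str, remote_glob: str, cache_dir: Path) -> str:
    """Build a CREATE VIEW statement, preferring cached Parquet files.

    Hypothetical helper; table names are assumed to be trusted identifiers.
    """
    local_dir = cache_dir / table
    has_cache = local_dir.is_dir() and any(local_dir.glob("*.parquet"))
    # Read from the local cache when it exists, else from the remote bucket.
    source = (local_dir / "*.parquet").as_posix() if has_cache else remote_glob
    return f"CREATE OR REPLACE VIEW \"{table}\" AS SELECT * FROM read_parquet('{source}')"


# The bucket URL below is a made-up placeholder.
cache_root = Path.home() / ".dataspoc-lens" / "cache"
print(view_sql("orders", "s3://example-bucket/orders/*.parquet", cache_root))
```

The printed statement points at the local directory only if cached Parquet files for the table are present; otherwise it keeps the remote glob.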

Cached files are stored under ~/.dataspoc-lens/cache/:

~/.dataspoc-lens/
  cache/
    orders/
      part-0001.parquet
      part-0002.parquet
      part-0003.parquet
      part-0004.parquet
    customers/
      part-0001.parquet
    cache_meta.json    # Metadata: cached_at, size, freshness per table
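
Given this layout, a short script can inspect the cache directly from disk. The sketch below walks the table directories and totals the Parquet files in each one. The helper name `summarize_cache` is hypothetical, it does not read cache_meta.json (its schema is not documented here), and reporting sizes in decimal megabytes is an assumption.

```python
from pathlib import Path


def summarize_cache(cache_root: Path) -> dict[str, tuple[int, int]]:
    """Map each table directory to (Parquet file count, total bytes)."""
    summary: dict[str, tuple[int, int]] = {}
    if not cache_root.is_dir():
        return summary  # nothing cached yet
    for table_dir in sorted(p for p in cache_root.iterdir() if p.is_dir()):
        files = list(table_dir.glob("*.parquet"))
        summary[table_dir.name] = (len(files), sum(f.stat().st_size for f in files))
    return summary


cache_root = Path.home() / ".dataspoc-lens" / "cache"
for table, (count, size) in summarize_cache(cache_root).items():
    print(f"{table}: {count} file(s), {size / 1_000_000:.1f} MB")
```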

A typical offline workflow:

# 1. Cache the tables you need while online
dataspoc-lens cache orders
dataspoc-lens cache customers
dataspoc-lens cache products
# 2. Verify cache
dataspoc-lens cache --list
# 3. Go offline and query normally
dataspoc-lens query "SELECT * FROM orders JOIN customers USING (customer_id)"
dataspoc-lens shell
dataspoc-lens ask "Top customers by revenue"

All queries will read from the local cache transparently.
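
If you cache the same set of tables regularly, the first two steps can be scripted. The sketch below only assembles the command lines, using the `cache <table>` and `cache --list` forms shown on this page. The helper name `prefetch_commands` is hypothetical.

```python
import shlex


def prefetch_commands(tables: list[str]) -> list[list[str]]:
    """Return argv lists that cache each table, then list the cache."""
    commands = [["dataspoc-lens", "cache", table] for table in tables]
    commands.append(["dataspoc-lens", "cache", "--list"])  # verification step
    return commands


for command in prefetch_commands(["orders", "customers", "products"]):
    print(shlex.join(command))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(command, check=True)` while you are still online.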