
Databricks Delta Lake

## Quick Summary
You are a Delta Lake expert who designs and manages ACID-compliant lakehouse tables. You understand Delta format internals, time travel, schema evolution, compaction, Z-ordering, liquid clustering, and the performance implications of table layout decisions. You build tables that are fast to query, efficient to maintain, and reliable under concurrent writes.

## Key Points

- **Partition by date for time-series tables**: Partition only on low-cardinality columns (fewer than ~1,000 distinct values)
- **Use liquid clustering for new tables**: Simpler than partitioning plus ZORDER, and self-tuning
- **Enable auto-optimize**: Set `delta.autoOptimize.optimizeWrite` and `delta.autoOptimize.autoCompact` on write-heavy tables
- **VACUUM regularly**: Reclaim storage from deleted files; set retention based on your time travel needs
- **ANALYZE TABLE after bulk loads**: Refresh column statistics so the optimizer can build better query plans
- **Enable Change Data Feed**: Lets downstream consumers process changes incrementally instead of rescanning the table
- **Target file sizes of 128–256 MB**: Set `delta.targetFileSize` for optimal read performance
- **Use MERGE for idempotent writes**: The upsert pattern prevents duplicates when retries replay the same batch
## Common Pitfalls

- **Over-partitioning**: Partitioning by a high-cardinality column creates millions of tiny files
- **Forgetting VACUUM**: Table storage grows indefinitely as files from old versions are retained
- **VACUUMing too aggressively**: Retention below the 7-day default can delete files still needed by concurrent readers or time travel queries
- **No ZORDER on filter columns**: Queries scan every file instead of skipping irrelevant ones via data-skipping statistics
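The table-layout recommendations above can be sketched as a single DDL statement. This is a minimal sketch, not a definitive schema: the `events` table and its columns are hypothetical, and the property values follow the guidance in the list (liquid clustering, auto-optimize, 128 MB target files, Change Data Feed):

```sql
-- Hypothetical table; CLUSTER BY enables liquid clustering in place of
-- partitioning + ZORDER on a new table.
CREATE TABLE events (
  event_id   STRING,
  event_date DATE,
  user_id    STRING,
  payload    STRING
)
USING DELTA
CLUSTER BY (event_date, user_id)
TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true',
  'delta.targetFileSize'             = '134217728', -- 128 MB
  'delta.enableChangeDataFeed'       = 'true'
);
```

With Change Data Feed enabled, downstream consumers can read row-level changes incrementally via the `table_changes` function, e.g. `SELECT * FROM table_changes('events', <starting_version>)`, where `<starting_version>` is the last version they processed.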
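The MERGE-based idempotent write pattern might look like the following sketch. It assumes the hypothetical `events` table above and a staging source named `updates` with the same columns; both names are illustrative:

```sql
-- Upsert: retries that replay the same batch update existing rows to the
-- same values instead of inserting duplicates.
MERGE INTO events AS t
USING updates AS s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Because the `ON` condition matches on the business key, re-running the statement with an identical batch is a no-op apart from rewriting matched rows, which is what makes the write safe under retries.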
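For existing tables that still use partitioning rather than liquid clustering, the maintenance routine described above might be scheduled as follows. The table name `events_legacy` and the `user_id` filter column are hypothetical; note that ZORDER applies to partitioned tables, not liquid-clustered ones:

```sql
-- Co-locate data for a commonly filtered column so file skipping works.
OPTIMIZE events_legacy ZORDER BY (user_id);

-- Refresh column statistics after bulk loads for better query plans.
ANALYZE TABLE events_legacy COMPUTE STATISTICS FOR ALL COLUMNS;

-- Reclaim storage from files no longer referenced; 168 hours = the 7-day
-- default, the minimum safe retention for concurrent readers.
VACUUM events_legacy RETAIN 168 HOURS;
```

Lowering the `RETAIN` window below 7 days also shortens the time travel horizon, so set it to the longest lookback your consumers actually need.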
Full skill (203 lines): `skilldb get databricks-skills/databricks-delta-lake`

Install this skill directly: `skilldb add databricks-skills`
