# Databricks Delta Lake
## Quick Summary
You are a Delta Lake expert who designs and manages ACID-compliant lakehouse tables. You understand Delta format internals, time travel, schema evolution, compaction, Z-ordering, liquid clustering, and the performance implications of table layout decisions. You build tables that are fast to query, efficient to maintain, and reliable under concurrent writes.

## Key Points

- **Partition by date for time-series**: Only partition on low-cardinality columns (< 1000 distinct values)
- **Use liquid clustering for new tables**: Simpler than partitioning + ZORDER, and self-tuning
- **Enable auto-optimize**: `delta.autoOptimize.optimizeWrite` and `autoCompact` for write-heavy tables
- **VACUUM regularly**: Reclaim storage from deleted files; set retention based on time travel needs
- **ANALYZE TABLE after bulk loads**: Update column statistics for better query plans
- **Enable Change Data Feed**: For downstream consumers that need incremental processing
- **Target file size 128MB-256MB**: Use `delta.targetFileSize` for optimal read performance
- **Use MERGE for idempotent writes**: The upsert pattern prevents duplicates from retries

## Common Mistakes

- **Over-partitioning**: Partitioning by a high-cardinality column creates millions of tiny files
- **Forgetting VACUUM**: Table size grows indefinitely as old versions are retained
- **VACUUM too aggressively**: Setting retention below 7 days breaks concurrent readers and recent time travel
- **No ZORDER on filter columns**: Queries scan all files instead of skipping irrelevant ones
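The table-creation recommendations above (liquid clustering, auto-optimize, Change Data Feed, target file size) can be sketched as Delta SQL DDL. The table and column names are illustrative, not from the skill:

```sql
-- Hypothetical time-series table; CLUSTER BY replaces PARTITIONED BY + ZORDER
CREATE TABLE events (
  event_id   STRING,
  user_id    STRING,
  event_date DATE,
  payload    STRING
)
USING DELTA
CLUSTER BY (event_date, user_id)  -- liquid clustering: self-tuning layout
TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',   -- bin-pack files at write time
  'delta.autoOptimize.autoCompact'   = 'true',   -- compact small files after writes
  'delta.enableChangeDataFeed'       = 'true',   -- incremental downstream reads
  'delta.targetFileSize'             = '134217728'  -- ~128MB target file size
);
```

Clustering on `event_date` plus a common filter column gives date-bounded queries file skipping without the tiny-file risk of partitioning on a high-cardinality key.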
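The write and maintenance advice above (MERGE for idempotency, OPTIMIZE, ANALYZE, VACUUM with a safe retention window) might look like the following sketch; `events`, `staged_events`, and the key column are assumed names:

```sql
-- Idempotent upsert: retrying the same batch produces no duplicate rows
MERGE INTO events AS t
USING staged_events AS s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Periodic maintenance
OPTIMIZE events;  -- on a liquid-clustered table this also reclusters;
                  -- legacy partitioned tables would instead use
                  -- OPTIMIZE events ZORDER BY (user_id);
ANALYZE TABLE events COMPUTE STATISTICS FOR ALL COLUMNS;

-- Reclaim old files; 168 hours = 7 days keeps concurrent readers and
-- recent time travel safe (avoid going below the 7-day default)
VACUUM events RETAIN 168 HOURS;
```

The `RETAIN 168 HOURS` clause makes the retention explicit; shortening it below 7 days is the "VACUUM too aggressively" mistake listed above.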
Full skill: 203 lines.

```
skilldb get databricks-skills/databricks-delta-lake
```

Install this skill directly:

```
skilldb add databricks-skills
```