Databricks Delta Live Tables (DLT)
## Quick Summary
You are a DLT pipeline architect who builds production-grade medallion architecture pipelines with data quality expectations, CDC processing, streaming tables, and materialized views. You design pipelines that are declarative, testable, and observable.

## Key Points

- **Medallion architecture**: Bronze (raw), Silver (cleaned), Gold (business-ready)
- **Expectations at the Silver layer**: Validate data quality before it reaches Gold
- **Use streaming for incremental loads**: `readStream` instead of `read` for incremental processing
- **SCD Type 2 for dimensions**: Track history with `apply_changes` and `stored_as_scd_type=2`
- **Materialized views for aggregations**: Self-refreshing aggregated data
- **Auto Loader for file ingestion**: The `cloudFiles` format handles new-file discovery
- **Separate dev and prod pipelines**: Development mode skips retries and uses smaller clusters

## Common Pitfalls

- **`expect_or_fail` in production**: A single bad record stops the entire pipeline; prefer `expect_or_drop`
- **No expectations at all**: Bad data flows silently through to Gold tables
- **Streaming without checkpointing**: Risks data loss on pipeline restart
- **Over-complex single pipeline**: 50 tables in one pipeline; break it into per-domain pipelines
- **Imperative logic in DLT**: Writing loops and conditional processing; DLT is declarative
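The points above can be sketched as one small pipeline definition. This is a minimal sketch, not runnable outside a Databricks DLT pipeline (the `dlt` module and the `spark` session are only provided by that runtime), and every table name, column, and path here (`orders_bronze`, `order_id`, `/Volumes/raw/orders/`, `customers_cdc_silver`, etc.) is a hypothetical placeholder:

```python
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Bronze: raw orders ingested incrementally via Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader discovers new files
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/orders/")              # hypothetical landing path
    )


@dlt.table(comment="Silver: validated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
@dlt.expect("positive_amount", "amount > 0")                   # warn-only: violations are logged, rows kept
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")           # streaming read keeps processing incremental
        .withColumn("ingested_at", F.current_timestamp())
    )


# SCD Type 2 dimension built from a CDC feed with apply_changes:
dlt.create_streaming_table("dim_customer")
dlt.apply_changes(
    target="dim_customer",
    source="customers_cdc_silver",   # hypothetical CDC source table
    keys=["customer_id"],
    sequence_by=F.col("event_ts"),   # ordering column for out-of-order events
    stored_as_scd_type=2,            # keep full history with __START_AT/__END_AT
)
```

Note the split between `expect_or_drop` (quarantines bad rows) and plain `expect` (records violations in pipeline metrics without dropping anything); `expect_or_fail` exists too, but as the pitfalls list says, it halts the whole update on one bad record.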
Full skill: 210 lines. Fetch it with `skilldb get databricks-skills/databricks-pipelines`, or install the skill collection directly: `skilldb add databricks-skills`