# Databricks Workflows & Jobs
## Quick Summary
You are a Databricks Workflows expert who orchestrates multi-task jobs with dependencies, retry policies, parameters, monitoring, and alerting. You design job workflows that are reliable, observable, and cost-efficient.

## Key Points

- **Job clusters over all-purpose**: Job clusters start fresh, cost less, and auto-terminate
- **Spot instances with fallback**: Use spot instances for workers, on-demand for the driver
- **Idempotent tasks**: Every task should be safely re-runnable
- **Parameterize dates**: Never hardcode processing dates; pass them as job parameters
- **Retry with delay**: 2-3 retries with 60-second delays handle transient failures
- **Validate output**: Post-pipeline validation catches data quality issues before consumers see them
- **Tag everything**: Team, environment, and SLA tags enable cost tracking and alerting
- **Max concurrent runs = 1**: Prevent overlapping runs of the same pipeline

## Common Mistakes

- **No retries**: Transient cloud failures cause unnecessary on-call pages
- **All-purpose clusters for jobs**: 10x more expensive than job clusters
- **No timeout**: Hung jobs run indefinitely, consuming resources
- **Manual backfills**: Re-running jobs by manually changing dates; parameterize instead
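As a rough illustration, the points above map onto fields of a Databricks Jobs API 2.1 job definition. The sketch below builds such a payload as a plain Python dict; the job name, notebook paths, cluster sizing, and Spark version are hypothetical placeholders, and field names should be checked against the Jobs API reference for your workspace version.

```python
import json

# Hypothetical job definition applying the key points above
# (Jobs API 2.1-style fields; names and paths are illustrative only).
job_config = {
    "name": "daily-sales-pipeline",
    "max_concurrent_runs": 1,           # prevent overlapping runs of the same pipeline
    "timeout_seconds": 7200,            # kill hung runs instead of letting them run forever
    "tags": {"team": "data-eng", "env": "prod", "sla": "tier1"},  # cost tracking / alerting
    "parameters": [
        # pass the processing date in; never hardcode it in the task code
        {"name": "process_date", "default": ""},
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/pipelines/ingest"},
            "max_retries": 2,                      # 2-3 retries for transient failures
            "min_retry_interval_millis": 60_000,   # 60-second delay between retries
            "job_cluster_key": "etl_cluster",
        },
        {
            # post-pipeline validation runs only after ingest succeeds
            "task_key": "validate",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/pipelines/validate"},
            "job_cluster_key": "etl_cluster",
        },
    ],
    "job_clusters": [
        {
            # ephemeral job cluster shared by the tasks, not an all-purpose cluster
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 4,
                "aws_attributes": {
                    # spot workers with on-demand fallback;
                    # first_on_demand keeps the driver on-demand
                    "availability": "SPOT_WITH_FALLBACK",
                    "first_on_demand": 1,
                },
            },
        }
    ],
}

print(json.dumps(job_config, indent=2))
```

In practice this payload would be sent to the `POST /api/2.1/jobs/create` endpoint (or managed via the Databricks SDK or Terraform); building it as data first makes the retry, timeout, and concurrency settings easy to review and diff.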
Full skill: 293 lines. Get it with `skilldb get databricks-skills/databricks-jobs`, or install the skill pack directly with `skilldb add databricks-skills`.