
Buildkite

Buildkite pipelines including dynamic pipeline generation, agent targeting, plugins, and hybrid cloud CI/CD

Quick Summary
You are an expert in Buildkite for continuous integration and deployment.

## Quick Example

```yaml
# .buildkite/pipeline.yml
steps:
  - label: ":pipeline: Generate Pipeline"
    command: .buildkite/generate-pipeline.sh | buildkite-agent pipeline upload
```

Buildkite — CI/CD

Overview

Buildkite is a hybrid CI/CD platform where the orchestration layer runs in Buildkite's cloud while agents run on your own infrastructure. Pipelines are defined in .buildkite/pipeline.yml or generated dynamically. This architecture means your code and secrets never leave your network. Buildkite uses a step-based model with command, wait, block, trigger, and group steps.

Setup & Configuration

Pipeline configuration lives in .buildkite/pipeline.yml. Agents run on your own servers, VMs, or containers, or on Buildkite's hosted agents if you prefer not to manage them yourself.

Basic pipeline:

```yaml
# .buildkite/pipeline.yml
steps:
  - label: ":npm: Install & Build"
    command:
      - npm ci
      - npm run build
    artifact_paths:
      - "dist/**/*"

  - wait

  - label: ":test_tube: Unit Tests"
    command: npm test
    artifact_paths:
      - "test-results/**/*"

  - label: ":lint: Lint"
    command: npm run lint

  - wait

  - label: ":rocket: Deploy"
    command: ./deploy.sh
    branches: main
    concurrency: 1
    concurrency_group: deploy/production
```

Core Patterns

Dynamic Pipeline Generation

Upload pipeline steps dynamically from a script:

```yaml
# .buildkite/pipeline.yml
steps:
  - label: ":pipeline: Generate Pipeline"
    command: .buildkite/generate-pipeline.sh | buildkite-agent pipeline upload
```

```bash
#!/bin/bash
# .buildkite/generate-pipeline.sh
# Emit test steps only for top-level directories touched by the last commit.
set -euo pipefail

# Needs HEAD~1 to exist; a shallow clone with a single commit fails here.
CHANGED_DIRS=$(git diff --name-only HEAD~1 | cut -d/ -f1 | sort -u)

echo "steps:"
for dir in $CHANGED_DIRS; do
  # Only directories that contain a Makefile get a test step
  if [ -f "$dir/Makefile" ]; then
    cat <<YAML
  - label: ":test_tube: Test $dir"
    command: cd "$dir" && make test
    agents:
      queue: default
YAML
  fi
done

# Always finish with a deploy step that only runs on main
cat <<YAML
  - wait
  - label: ":rocket: Deploy"
    command: make deploy
    branches: main
YAML
```

Agent Targeting

Route jobs to specific agents by queue or tags:

```yaml
steps:
  - label: ":docker: Build Image"
    command: docker build -t myapp .
    agents:
      queue: docker-builders
      os: linux
      arch: amd64

  - label: ":apple: macOS Tests"
    command: xcodebuild test
    agents:
      queue: macos
      xcode: "15.2"

  - label: ":gpu: ML Training"
    command: python train.py
    agents:
      queue: gpu
      gpu: a100
```

Plugins

Extend steps with reusable plugins:

```yaml
steps:
  - label: ":docker: Build & Push"
    plugins:
      - docker-compose#v5.3.0:
          build: app
          push:
            - app:registry.example.com/app:${BUILDKITE_BUILD_NUMBER}

  - label: ":jest: Tests"
    plugins:
      - docker#v5.11.0:
          image: node:20-alpine
          workdir: /app
          volumes:
            - "./:/app"
          command: ["npm", "test"]

  - label: ":s3: Upload Artifacts"
    plugins:
      - artifacts#v1.9.4:
          # The plugin's from/to form renames a file before upload; it does not
          # select a bucket. To store artifacts in your own S3 bucket, configure
          # the artifact upload destination on the agent side.
          upload: "dist/**/*"
```

Block Steps and Input

Manual approval and input collection:

```yaml
steps:
  - label: ":test_tube: Tests"
    command: npm test

  - wait

  - block: ":rocket: Deploy to Production?"
    prompt: "Review the test results before deploying."
    fields:
      - text: "Release Notes"
        key: "release-notes"
        required: true
      - select: "Region"
        key: "deploy-region"
        options:
          - label: "US East"
            value: "us-east-1"
          - label: "EU West"
            value: "eu-west-1"

  # Field values are stored as build meta-data under each field's key
  - label: ":rocket: Deploy"
    command: |
      echo "Deploying to $(buildkite-agent meta-data get deploy-region)"
      ./deploy.sh "$(buildkite-agent meta-data get deploy-region)"
```

Group Steps

Organize related steps visually:

```yaml
steps:
  - group: ":test_tube: Tests"
    steps:
      - label: "Unit Tests"
        command: npm run test:unit
      - label: "Integration Tests"
        command: npm run test:integration
      - label: "E2E Tests"
        command: npm run test:e2e

  - wait

  - group: ":rocket: Deploy"
    steps:
      - label: "Deploy API"
        command: ./deploy-api.sh
      - label: "Deploy Frontend"
        command: ./deploy-frontend.sh
```

Trigger Steps

Trigger pipelines in other projects:

```yaml
steps:
  - label: ":construction: Build"
    command: make build

  - wait

  - trigger: deploy-pipeline
    label: ":rocket: Trigger Deploy"
    build:
      message: "Deploying ${BUILDKITE_COMMIT}"
      branch: main
      env:
        UPSTREAM_BUILD: "${BUILDKITE_BUILD_NUMBER}"
        DEPLOY_SHA: "${BUILDKITE_COMMIT}"
```

Retry and Timeout Configuration

```yaml
steps:
  - label: ":test_tube: Flaky Integration Tests"
    command: npm run test:integration
    retry:
      automatic:
        - exit_status: 1    # command failed: retry up to twice
          limit: 2
        - exit_status: -1   # agent was lost: retry once
          limit: 1
      manual:
        allowed: true       # `reason` is only shown when allowed is false, so it is omitted
    timeout_in_minutes: 15
    cancel_on_build_failing: true
```

Core Philosophy

Buildkite's hybrid architecture embodies a principle that matters deeply in CI/CD: the orchestration layer can be managed, but your code and secrets should never leave your network. By running agents on your own infrastructure while Buildkite's cloud handles scheduling and UI, you get the convenience of a hosted service without surrendering control over your build environment. This separation of concerns means you can scale agents independently, run them on specialized hardware (GPUs, ARM, macOS), and enforce network boundaries that a fully hosted CI system cannot provide.

Dynamic pipeline generation is Buildkite's most powerful pattern and reflects a broader CI philosophy: pipelines should describe what needs to happen for this specific change, not enumerate every possible job. A monorepo with fifty packages should not run fifty test suites on every commit. Instead, a generation step inspects the diff, determines which packages changed, and uploads only the relevant steps. This keeps build times proportional to the scope of the change, not the size of the repository.

Concurrency control and agent targeting transform Buildkite from a generic task runner into an infrastructure-aware deployment system. By tagging agents with capabilities (OS, architecture, available tooling) and using concurrency groups to prevent conflicting operations, you encode operational constraints directly into the pipeline definition. This means the pipeline itself becomes a runnable document of your deployment topology and safety invariants.

Anti-Patterns

  • Static pipelines for dynamic repositories. Defining every job in a fixed pipeline.yml for a monorepo means every commit runs every test suite regardless of what changed. Use dynamic pipeline generation to upload only the steps relevant to the current diff.

  • Ignoring agent queue routing. Running all jobs on a single default queue means GPU-hungry ML jobs compete with lightweight linting for the same agents. Tag agents with meaningful capabilities and route jobs to appropriate queues so resources match workload requirements.

  • Passing state through environment variables across steps. Environment variables set in one step are not available in subsequent steps. Using workarounds like writing to shared files without the meta-data API leads to fragile, hard-to-debug pipelines. Use buildkite-agent meta-data set/get for key-value data and artifacts for files.

  • Automatic retries without investigation. Setting retry.automatic with a high limit masks real failures behind a wall of retries. A test that passes on retry three is a test with a real problem. Use retries sparingly and always review retry frequency to find and fix the root cause.

  • Blocking the entire pipeline with wait steps. Using wait to serialize every stage means independent jobs cannot run in parallel. Use depends_on for granular dependencies so jobs start as soon as their actual prerequisites complete, not when every job in the previous stage finishes.
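
The last anti-pattern is easier to see side by side with its fix. The following is a minimal sketch of granular dependencies using key and depends_on; the step keys and npm script names are illustrative assumptions, not taken from a real project.

```yaml
# Sketch: depends_on instead of wait (keys and scripts are hypothetical)
steps:
  - label: ":npm: Build"
    key: build
    command: npm run build

  - label: ":lint: Lint"
    key: lint
    command: npm run lint        # no depends_on, so it starts immediately

  - label: ":test_tube: Unit Tests"
    key: unit-tests
    command: npm test
    depends_on: build            # starts as soon as build passes

  - label: ":rocket: Deploy"
    command: ./deploy.sh
    depends_on:                  # waits only for these two steps
      - unit-tests
      - lint
```

Here lint never waits for the build, and the unit tests start the moment the build finishes rather than when every step in a stage is done.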

Best Practices

  • Use dynamic pipelines to generate steps based on what changed; this keeps monorepo builds fast.
  • Tag agents with meaningful metadata (queue, os, arch, capability) for precise job routing.
  • Use concurrency and concurrency_group to prevent concurrent deploys to the same environment.
  • Leverage plugins from the Buildkite plugins directory instead of scripting common patterns.
  • Use artifact_paths and buildkite-agent artifact for passing files between steps.
  • Use meta-data set/get for passing small key-value data between steps.
  • Run agents in ephemeral containers or VMs (autoscaling stacks) for clean environments per build.
  • Use block steps with fields to collect deployment parameters from the person triggering the deploy.
  • Set cancel_on_build_failing: true to abort long-running steps when a sibling fails.
  • Use Buildkite's Elastic CI Stack for AWS for autoscaling agents that scale to zero when idle.
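
As a rough illustration of the artifact and meta-data practices above, the sketch below passes a file set and one small value from a build step to a deploy step. The meta-data key build-sha and the deploy.sh argument are assumptions made up for this example.

```yaml
# Sketch: passing files and small values between steps
# (the "build-sha" key and the deploy.sh argument are hypothetical)
steps:
  - label: ":npm: Build"
    key: build
    command: |
      npm ci
      npm run build
      # Store a small key-value pair for later steps
      buildkite-agent meta-data set "build-sha" "$(git rev-parse --short HEAD)"
    artifact_paths:
      - "dist/**/*"              # uploaded after the command finishes

  - label: ":rocket: Deploy"
    depends_on: build
    command: |
      # Fetch the files uploaded by the build step into the working directory
      buildkite-agent artifact download "dist/*" .
      ./deploy.sh "$(buildkite-agent meta-data get build-sha)"
```

The deploy step may run on a different agent than the build step, which is why neither the dist directory nor a shell variable from the first step can be assumed to exist there.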

Common Pitfalls

  • Agents run on your infrastructure; if no agents are available for a queue, jobs sit pending indefinitely.
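  • Dynamic pipelines (pipeline upload) must output valid YAML; a syntax error fails the upload step. Because the generator is piped into the agent, a script that exits partway through can still upload a truncated pipeline unless the shell runs with pipefail.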
  • wait steps block the entire pipeline; use depends_on for more granular step dependencies.
  • Agent tags (formerly called agent meta-data) are set at agent startup; changing them requires restarting the agent.
  • Artifacts are stored in Buildkite-managed storage by default, so large artifacts can be slow to upload and download. For large files, configure the agents to use your own S3 bucket as the artifact destination.
  • Environment variables set in one step are not available in subsequent steps unless passed via meta-data or artifacts.
  • Plugin ordering matters; plugins hook into lifecycle events and run in the order declared.
  • retry.automatic can mask real failures; always check retry counts in build output.
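
One way to guard against the dynamic pipeline pitfall is to write the generated YAML to a file before uploading it, so a failing generator stops the step instead of feeding partial output to the agent. This is a sketch under that assumption; the file name generated-pipeline.yml is hypothetical.

```yaml
# Sketch: fail fast if the generator script errors
# (generated-pipeline.yml is a hypothetical file name)
steps:
  - label: ":pipeline: Generate Pipeline"
    command: |
      set -euo pipefail
      # If the script exits non-zero, the step stops here and nothing is uploaded
      .buildkite/generate-pipeline.sh > generated-pipeline.yml
      buildkite-agent pipeline upload generated-pipeline.yml
```

Keeping the generated file around also makes it easy to attach it as an artifact when debugging an unexpected set of steps.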
