
Deployment Strategies

Blue/green, canary, rolling, and feature-flag deployment strategies with platform-specific implementation patterns


Deployment Strategies — CI/CD

You are an expert in deployment strategies for continuous delivery and zero-downtime releases.

Overview

Deployment strategies determine how new versions of software are released to users. The choice of strategy affects downtime, risk, rollback speed, and resource costs. Key strategies include rolling updates, blue/green deployments, canary releases, and feature flags. Each has trade-offs suited to different risk tolerances and infrastructure capabilities.

Setup & Configuration

Deployment strategy choice depends on infrastructure (Kubernetes, cloud VMs, serverless), traffic management capabilities (load balancers, service mesh), and observability maturity (metrics, alerting).

Decision framework:

| Strategy   | Downtime | Rollback Speed | Resource Cost    | Complexity |
|------------|----------|----------------|------------------|------------|
| Rolling    | None     | Medium         | Low              | Low        |
| Blue/Green | None     | Instant        | 2x during deploy | Medium     |
| Canary     | None     | Fast           | Low-Medium       | High       |
| Recreate   | Yes      | Slow           | Low              | Low        |

Core Patterns

Rolling Update

Gradually replace old instances with new ones; this is the default strategy for Kubernetes Deployments.

# Kubernetes Deployment with rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:2.0.0
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
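With six replicas, maxSurge: 2 and maxUnavailable: 1 let Kubernetes run up to eight pods while keeping at least five ready during the transition. Progress and rollback are driven from the CLI; a short sketch (the deployment name matches the manifest above):

```shell
# Wait for the rolling update to complete (exits non-zero on timeout)
kubectl rollout status deployment/myapp --timeout=5m

# Inspect past revisions if something looks wrong
kubectl rollout history deployment/myapp

# Revert to the previous ReplicaSet in one step
kubectl rollout undo deployment/myapp
```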

Blue/Green Deployment

Run two identical environments; switch traffic atomically between them.

Kubernetes with service selector swap:

# blue deployment (currently live)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0
---
# green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myapp:2.0.0
---
# Service points to active color
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' to cut over
  ports:
    - port: 80
      targetPort: 8080
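Cutting over is a single selector change on the Service; sketched here with kubectl patch (names match the manifests above):

```shell
# Point the Service at the green Deployment (instant cutover)
kubectl patch service myapp \
  -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'

# Confirm the Service now targets the green pods
kubectl get endpoints myapp

# Rollback is the same operation in reverse
kubectl patch service myapp \
  -p '{"spec":{"selector":{"app":"myapp","version":"blue"}}}'
```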

AWS ALB blue/green with CodeDeploy:

# appspec.yml for ECS blue/green
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "myapp"
          ContainerPort: 8080
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToValidateBeforeTraffic"
  - AfterAllowTraffic: "LambdaFunctionToValidateAfterTraffic"
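The hook Lambdas must report a result back to CodeDeploy before the deployment proceeds; the CLI equivalent of that callback looks roughly like this (the two IDs are placeholders taken from the Lambda's event payload):

```shell
# Signal CodeDeploy that the BeforeAllowTraffic validation passed.
# DEPLOYMENT_ID and HOOK_EXECUTION_ID come from the hook invocation event.
aws deploy put-lifecycle-event-hook-execution-status \
  --deployment-id "$DEPLOYMENT_ID" \
  --lifecycle-event-hook-execution-id "$HOOK_EXECUTION_ID" \
  --status Succeeded
```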

Canary Deployment

Route a small percentage of traffic to the new version, then gradually increase.

Using Istio VirtualService:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 95
        - destination:
            host: myapp
            subset: canary
          weight: 5
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: v1
    - name: canary
      labels:
        version: v2
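Promoting the canary is then just rewriting the weights. A hedged sketch using kubectl patch against the VirtualService above (a merge patch replaces the whole http array, so both routes are restated):

```shell
# Shift traffic from 95/5 to 75/25 in one atomic update
kubectl patch virtualservice myapp --type merge -p '{
  "spec": {
    "http": [{
      "route": [
        {"destination": {"host": "myapp", "subset": "stable"}, "weight": 75},
        {"destination": {"host": "myapp", "subset": "canary"}, "weight": 25}
      ]
    }]
  }
}'
```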

Argo Rollouts canary with automated analysis:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: myapp
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }
        - setWeight: 100
      canaryService: myapp-canary
      stableService: myapp-stable
      trafficRouting:
        istio:
          virtualService:
            name: myapp-vsvc
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
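The analysis gate boils down to a ratio check against the 0.99 threshold. The same decision in plain shell, with request counts as inputs (illustrative only, not Argo's implementation; treating zero traffic as a pass is an assumption):

```shell
# good = 2xx request count over the window, total = all requests.
# Mirrors successCondition: result[0] >= 0.99.
good=995
total=1000
decision=$(awk -v good="$good" -v total="$total" \
  'BEGIN { if (total == 0 || good / total >= 0.99) print "promote"; else print "rollback" }')
echo "$decision"   # → promote (995/1000 = 0.995)
```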

Feature Flag Deployment

Deploy code to all instances but control activation via feature flags:

# GitHub Actions deploy with feature flag
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy new code
        run: ./deploy.sh
      - name: Enable feature for 5% of users
        run: |
          curl -X PATCH "https://api.launchdarkly.com/api/v2/flags/default/new-checkout" \
            -H "Authorization: ${{ secrets.LD_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{
              "patch": [{
                "op": "replace",
                "path": "/environments/production/rules/0/rollout/variations",
                "value": [
                  {"variation": 0, "weight": 5000},
                  {"variation": 1, "weight": 95000}
                ]
              }]
            }'
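Server-side, percentage rollouts bucket users deterministically so the same user always sees the same variation. A toy sketch of the idea (CRC via cksum; LaunchDarkly's actual hashing differs, and the user key is illustrative):

```shell
# Hash the user key into a stable bucket 0-99, then compare to the rollout percentage.
user="user-42"    # hypothetical key
rollout_pct=5     # matches the 5% rollout above
bucket=$(( $(printf '%s' "$user" | cksum | cut -d' ' -f1) % 100 ))
if [ "$bucket" -lt "$rollout_pct" ]; then
  echo "new-checkout enabled for $user"
else
  echo "new-checkout disabled for $user"
fi
```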

Database Migration Strategies

Pair deployment strategies with safe schema migrations:

# Expand-and-contract migration pipeline
steps:
  - label: "Expand: Add new column"
    command: |
      # Step 1: Add new column (non-breaking)
      migrate up add_new_column

  - label: "Deploy: Code reads both old and new"
    command: deploy.sh --version dual-read

  - label: "Backfill: Populate new column"
    command: rake db:backfill_new_column

  - label: "Deploy: Code reads only new"
    command: deploy.sh --version new-only

  - label: "Contract: Remove old column"
    command: migrate up drop_old_column
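A guard between the backfill and contract steps catches incomplete migrations before they become outages; a sketch assuming Postgres, with hypothetical table and column names:

```shell
# Refuse to run the contract step while any rows still lack the new value.
remaining=$(psql -tAc "SELECT count(*) FROM orders WHERE new_column IS NULL")
if [ "$remaining" -ne 0 ]; then
  echo "backfill incomplete: $remaining rows still NULL" >&2
  exit 1
fi
echo "backfill complete, safe to drop old column"
```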

Core Philosophy

The fundamental insight behind deployment strategies is that deploying code and releasing features are two separate acts. A deployment puts new binaries onto infrastructure. A release exposes new behavior to users. Conflating the two means every deployment is a high-stakes event with blast radius equal to your entire user base. The most mature organizations deploy continuously but release deliberately — using canary analysis, feature flags, and traffic shaping to control exactly who sees what, and rolling back in seconds when something goes wrong.

Every deployment strategy is a trade-off between four variables: risk (blast radius and rollback speed), cost (infrastructure overhead during transition), complexity (operational burden), and speed (time from merge to production). There is no universally correct strategy. A rolling update is simple and cheap but offers limited blast radius control. A canary deployment gives precise risk management but requires sophisticated traffic routing and automated metric analysis. The right choice depends on the service's criticality, your observability maturity, and your infrastructure capabilities. Choosing a strategy more complex than your team can operate reliably is worse than choosing a simpler one.

Database schema changes are the hidden constraint that every deployment strategy must address. When old and new application versions run simultaneously — as they do in rolling, blue/green, and canary deployments — both versions must work with the same database schema. This means every schema migration must be backward-compatible, and the expand-and-contract pattern is not optional but essential. A deployment strategy that ignores the database is a deployment strategy that will cause an outage.

Anti-Patterns

  • Big-bang deployments. Replacing all instances simultaneously with a "recreate" strategy in production means any bug affects 100% of users with no gradual rollout or automatic rollback. Reserve recreate for development environments; use rolling or blue/green for production.

  • Canary without automated analysis. Running a canary deployment but relying on a human to watch dashboards and decide whether to promote defeats the purpose. Without automated metric comparison (error rate, latency, business KPIs), canary deployments are just slow rolling updates with extra steps.

  • Ignoring database compatibility. Deploying a new application version that requires a schema change (renamed column, dropped table) while old instances are still running breaks those old instances. Always use the expand-and-contract pattern: add the new schema, deploy code that handles both, migrate data, then remove the old schema.

  • Orphaned blue environments. Running a blue/green deployment but forgetting to tear down the idle environment after verification doubles your infrastructure cost permanently. Automate the teardown of the old environment after a configurable stabilization window.

  • Feature flags without lifecycle management. Using feature flags to decouple deploy from release but never removing them turns every conditional into permanent branching logic. Establish a flag lifecycle: create, activate, fully roll out, remove code path, delete flag.

Best Practices

  • Always have health checks and readiness probes to prevent routing traffic to unhealthy instances.
  • Automate rollback triggers based on error rate, latency, and business metrics.
  • Use canary analysis (automated metric comparison) rather than manual observation when possible.
  • For blue/green, keep the old environment running long enough to verify the new one, then tear it down to save costs.
  • Make deployments idempotent; re-running a deploy should produce the same result.
  • Database migrations must be backward-compatible; never break the running version during a deploy.
  • Use the expand-and-contract pattern for schema changes across deployment boundaries.
  • Practice rollbacks regularly; an untested rollback procedure is not a rollback procedure.
  • Separate deploy (push code) from release (enable for users) using feature flags.
  • Monitor golden signals (latency, traffic, errors, saturation) during every deployment.

Common Pitfalls

  • Blue/green doubles infrastructure cost during the transition; forgetting to tear down the old environment wastes resources permanently.
  • Canary deployments without automated metric analysis rely on humans noticing problems, which defeats the purpose.
  • Rolling updates with maxUnavailable: 0 and maxSurge: 1 are very slow with many replicas; tune both values.
  • Not testing rollback procedures means discovering they do not work during an incident.
  • Database migrations that are not backward-compatible break rolling and blue/green deployments where old and new code run simultaneously.
  • Session affinity (sticky sessions) can pin a user to the canary for their entire session, so affected users experience a 100% blast radius rather than the configured percentage.
  • Feature flags left enabled permanently become technical debt; maintain a lifecycle and cleanup process.
  • Canary weight percentages apply to new connections; long-lived connections (WebSockets, gRPC streams) are not rebalanced.
