
Terraform State Management

Remote state backends, state locking, import, migration, and state surgery techniques

Quick Summary
You are an expert in Terraform state management for infrastructure as code.

## Key Points

- **Enable versioning on state storage.** S3 bucket versioning or GCS object versioning lets you recover from a corrupted state by rolling back to a previous version.
- **Encrypt state at rest and in transit.** State contains sensitive data (database passwords, private keys). Use server-side encryption and TLS.
- **Split state by blast radius.** Separate networking, compute, and data layers into independent state files. A bad apply to the compute layer should not touch the database.
- **Use a consistent key naming convention.** For example: `{environment}/{component}/terraform.tfstate`.
- **Restrict state bucket access** using IAM policies. Only CI pipelines and designated operators should have write access.
- **Never store state in version control.** The `.gitignore` should include `*.tfstate` and `*.tfstate.*`.
- **Use `terraform_remote_state` sparingly.** It couples stacks together. For large organizations, prefer a data source lookup (such as the `aws_vpc` data source filtered by tags) instead.
- **Forgetting to configure locking for the S3 backend.** Without a lock, concurrent applies can corrupt state. Always configure `dynamodb_table`, or on Terraform 1.10 and later the S3-native `use_lockfile = true`.
- **Running `terraform state mv` without a backup.** Always pull a copy of the state first: `terraform state pull > backup.json`.
- **Force-unlocking without verifying the lock is stale.** If another process is genuinely running, force-unlocking causes concurrent writes and potential corruption.
- **Migrating state while another apply is in progress.** Coordinate with your team before running `terraform init -migrate-state`.
- **Storing different environments in the same state file.** This means a change to staging can accidentally affect production. Use separate state files per environment.

## Quick Example

```bash
# Force-unlock a stuck lock (use with extreme caution)
terraform force-unlock LOCK_ID
```

State Management — Terraform

You are an expert in Terraform state management for infrastructure as code.

Overview

Terraform state is the source of truth that maps your configuration to real-world resources. It tracks resource metadata, dependency ordering, and attribute values. By default, state is stored locally in terraform.tfstate, but production workflows require remote state with locking to enable team collaboration and prevent concurrent modifications.

Core Concepts

What State Contains

State records every resource Terraform manages: its type, provider, attributes, and dependencies. It also stores the root module's output values, which are what other configurations can read through terraform_remote_state. Terraform uses this record to compute diffs during plan and to determine the correct order of operations during apply.

Remote Backends

Remote backends store state in a shared location and support locking to prevent concurrent writes.

```hcl
# S3 backend with DynamoDB locking (most common on AWS)
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "services/api/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}
```

```hcl
# GCS backend (Google Cloud)
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "services/api"
  }
}
```

```hcl
# Azure Storage backend
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstatemycompany"
    container_name       = "tfstate"
    key                  = "services/api/terraform.tfstate"
  }
}
```

```hcl
# Terraform Cloud / HCP Terraform backend
terraform {
  cloud {
    organization = "mycompany"

    workspaces {
      name = "api-production"
    }
  }
}
```

State Locking

Locking prevents two people or CI jobs from writing to the same state simultaneously. Most remote backends support locking natively. If a lock is already held, Terraform fails with an error by default; pass -lock-timeout to make it keep retrying for a set duration before giving up.

```bash
# Force-unlock a stuck lock (use with extreme caution)
terraform force-unlock LOCK_ID
```
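
Before reaching for force-unlock, it is often enough to wait for the other run to finish. The following is a minimal sketch, assuming a hypothetical helper name (`plan_waiting_for_lock`) and an arbitrary default of five minutes; `-lock-timeout` itself is a standard flag on plan and apply.

```bash
# Hypothetical helper: run a plan, retrying lock acquisition for up to
# the given duration (default 5m) instead of failing immediately.
plan_waiting_for_lock() {
  timeout="${1:-5m}"
  terraform plan -lock-timeout="$timeout"
}

# Usage:
#   plan_waiting_for_lock        # wait up to 5 minutes
#   plan_waiting_for_lock 30s    # wait up to 30 seconds
```

If the lock is still held after the timeout, that is a reasonable point to check whether the holder is really gone before considering force-unlock.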

Implementation Patterns

Bootstrap Pattern for State Backend

The state backend itself needs to exist before Terraform can use it. A common approach is a bootstrap configuration.

```hcl
# bootstrap/main.tf: run once to create the state bucket and lock table
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Cross-Stack Data Sharing with Remote State

```hcl
# In the networking stack, expose outputs
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
```

```hcl
# In the application stack, read the networking state
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# data.aws_ami.ubuntu is assumed to be declared elsewhere in this stack
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}
```
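
Where the coupling of terraform_remote_state is a concern, values can instead be passed explicitly, for example from a CI pipeline. The following is a rough sketch of that idea, not a prescribed workflow: the helper name `export_stack_output`, the `../network` directory, and the assumption that the consuming stack declares a variable with the same name as the output are all illustrative. Note that `terraform output -raw` only supports string, number, and boolean outputs, so a list such as `private_subnet_ids` would need `-json` instead.

```bash
# Hypothetical helper: read one output from the stack in directory $1
# and export it as a Terraform input variable (TF_VAR_<name>).
# Assumes the consuming stack declares a variable named like the output.
export_stack_output() {
  stack_dir="$1"
  name="$2"
  value="$(terraform -chdir="$stack_dir" output -raw "$name")" || return 1
  export "TF_VAR_${name}=${value}"
}

# Usage (directory and output names are illustrative):
#   export_stack_output ../network vpc_id
#   terraform plan   # the application stack now sees var.vpc_id
```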

Importing Existing Resources

```bash
# Import a resource into state
terraform import aws_instance.web i-0abc123def456

# Import into a module resource
terraform import module.vpc.aws_vpc.this vpc-0abc123def

# Import into an indexed resource
terraform import 'aws_subnet.public["us-east-1a"]' subnet-0abc123
```

Starting with Terraform 1.5, use import blocks for declarative imports:

```hcl
import {
  to = aws_instance.web
  id = "i-0abc123def456"
}

# Run plan to preview the import, then apply
# terraform plan -generate-config-out=generated.tf
```

State Migration

```bash
# Move a resource to a different address (rename)
terraform state mv aws_instance.old aws_instance.new

# Move a resource into a module
terraform state mv aws_instance.web module.compute.aws_instance.web

# Remove a resource from state without destroying it
terraform state rm aws_instance.legacy

# Pull remote state to a local file for inspection
terraform state pull > state.json

# Push a corrected local state file (dangerous)
terraform state push state.json
```

Migrating Between Backends

```bash
# 1. Update the backend block in your configuration
# 2. Run init; Terraform detects the change and offers to migrate
terraform init -migrate-state

# To reconfigure without migrating (start fresh)
terraform init -reconfigure
```
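
A migration can be bracketed with a backup and a simple sanity check. The following is a minimal sketch under stated assumptions: the helper names `snapshot_state` and `migrate_and_verify` are hypothetical, and comparing resource counts is only a coarse check, not proof that the states are identical. The snapshot has to be taken before the backend block is edited, because Terraform refuses state commands once the configured backend no longer matches the initialized one.

```bash
# Step 0 (before editing the backend block): save a copy of the state
# to file $1 and print how many resources it tracks.
snapshot_state() {
  terraform state pull > "$1" || return 1
  terraform state list | wc -l | tr -d ' '
}

# Steps 1-2 (after editing the backend block): migrate, then confirm
# the new backend tracks the expected number of resources ($1).
migrate_and_verify() {
  expected="$1"
  terraform init -migrate-state || return 1
  actual="$(terraform state list | wc -l | tr -d ' ')"
  [ "$actual" -eq "$expected" ]
}

# Usage:
#   count="$(snapshot_state pre-migration.tfstate)"
#   ...edit the backend block...
#   migrate_and_verify "$count" || echo "resource count differs" >&2
```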

Best Practices

  • Enable versioning on state storage. S3 bucket versioning or GCS object versioning lets you recover from a corrupted state by rolling back to a previous version.
  • Encrypt state at rest and in transit. State contains sensitive data (database passwords, private keys). Use server-side encryption and TLS.
  • Split state by blast radius. Separate networking, compute, and data layers into independent state files. A bad apply to the compute layer should not touch the database.
  • Use a consistent key naming convention. For example: {environment}/{component}/terraform.tfstate.
  • Restrict state bucket access using IAM policies. Only CI pipelines and designated operators should have write access.
  • Never store state in version control. The .gitignore should include *.tfstate and *.tfstate.*.
  • Use terraform_remote_state sparingly. It couples stacks together. For large organizations, prefer a data source lookup (such as the aws_vpc data source filtered by tags) instead.
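
The key naming convention in the list above can be enforced by generating keys rather than typing them. The following is a small sketch, assuming a hypothetical helper name (`state_key`) and a backend block that omits `key` so it can be supplied at init time through the standard `-backend-config` flag.

```bash
# Hypothetical helper: build a state key following the
# {environment}/{component}/terraform.tfstate convention.
state_key() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: state_key ENVIRONMENT COMPONENT" >&2
    return 1
  fi
  printf '%s/%s/terraform.tfstate\n' "$1" "$2"
}

# Usage with a partial backend configuration:
#   terraform init -backend-config="key=$(state_key prod network)"
```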

Core Philosophy

State is Terraform's memory. Without it, Terraform has no idea what infrastructure exists, what it manages, or what needs to change. Treating state with the same care you would give a production database is not an exaggeration; losing or corrupting the state file is functionally equivalent to losing track of your entire infrastructure. Remote backends with locking and versioning are not optional best practices; they are prerequisites for any team or CI-driven workflow.

The principle of minimal blast radius should guide every state architecture decision. Each state file is a unit of risk: a bad apply affects everything in that state. Splitting infrastructure into focused state files (networking, compute, data, monitoring) means that a mistake in the compute layer cannot accidentally destroy the database. The overhead of managing multiple state files is far less than the cost of a single cross-cutting incident.

State surgery (moving, removing, importing resources) is a powerful but dangerous capability. Every state operation should be preceded by a backup (terraform state pull > backup.json) and followed by a plan to verify the result. The goal of any state manipulation is to bring the state back into alignment with reality without modifying actual infrastructure. If a terraform plan after your state surgery shows unexpected changes, stop and investigate before applying.
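
The backup, operate, verify sequence described above can be wrapped so that no step is skipped. The following is one possible sketch, assuming a hypothetical wrapper name (`safe_state_mv`) and a timestamped backup file name chosen for illustration. It relies on `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 1 on error, and 2 when changes are pending.

```bash
# Hypothetical wrapper around "terraform state mv": back up first,
# move, then plan to confirm no infrastructure changes are pending.
safe_state_mv() {
  src="$1"
  dst="$2"
  backup="backup-$(date +%Y%m%d%H%M%S).json"
  terraform state pull > "$backup" || return 1
  terraform state mv "$src" "$dst" || return 1
  status=0
  terraform plan -detailed-exitcode > /dev/null || status=$?
  if [ "$status" -ne 0 ]; then
    echo "plan exited with $status after the move; investigate before applying (backup: $backup)" >&2
    return 1
  fi
}

# Usage:
#   safe_state_mv aws_instance.old aws_instance.new
```

A non-zero result does not undo the move; it only signals that the plan should be reviewed, with the backup available if the state needs to be restored.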

Anti-Patterns

  • Storing state in Git. Committing terraform.tfstate to version control exposes sensitive data (passwords, private keys, API tokens) to everyone with repository access. State also changes on every apply, creating constant merge conflicts. Always use a remote backend.

  • Single state file for everything. Putting an entire organization's infrastructure into one state file means every plan takes minutes, every apply risks everything, and every team blocks every other team. Split state by component and environment.

  • Using terraform state push -force to resolve conflicts. Force-pushing a local state file overwrites whatever is in the remote backend, potentially losing recent changes from other team members or CI runs. Investigate the conflict, reconcile manually, and only force-push as an absolute last resort with team coordination.

  • Cross-stack coupling via terraform_remote_state. Reading another stack's state creates a hidden dependency that breaks when the upstream stack refactors its outputs. Prefer data source lookups (e.g., querying by tags) or explicit parameter passing through a shared variable file or CI pipeline.

  • Skipping state locking to "speed things up." Disabling locking or not configuring a lock table saves seconds per operation but risks state corruption that can take hours to repair. Concurrent writes to unlocked state can produce a file that is internally inconsistent and irrecoverable without manual surgery.
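
The first anti-pattern above is easy to guard against mechanically. The following is a minimal sketch of a check that could run in CI or a pre-commit hook; the function name `gitignore_covers_state` is hypothetical, and it only verifies that both patterns appear as exact lines in the ignore file.

```bash
# Hypothetical check: succeed only if the given ignore file (default
# .gitignore) lists both state patterns as exact lines.
gitignore_covers_state() {
  file="${1:-.gitignore}"
  grep -qxF '*.tfstate' "$file" && grep -qxF '*.tfstate.*' "$file"
}

# Usage:
#   gitignore_covers_state || echo "add *.tfstate and *.tfstate.* to .gitignore" >&2
```

This does not detect state files that were committed before the patterns were added; `git ls-files '*.tfstate'` can be used to look for those.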

Common Pitfalls

  • Forgetting to configure locking for the S3 backend. Without a lock, concurrent applies can corrupt state. Always configure dynamodb_table, or on Terraform 1.10 and later the S3-native use_lockfile = true.
  • Running terraform state mv without a backup. Always pull a copy of the state first: terraform state pull > backup.json.
  • Importing resources that do not match the configuration. After import, run terraform plan immediately. If there is drift between the imported resource and your config, Terraform will try to modify or recreate it.
  • Force-unlocking without verifying the lock is stale. If another process is genuinely running, force-unlocking causes concurrent writes and potential corruption.
  • Migrating state while another apply is in progress. Coordinate with your team before running terraform init -migrate-state.
  • Using terraform state push carelessly. This overwrites the remote state with your local file. If your local state is outdated, you lose recent changes. Use the -force flag only as a last resort.
  • Storing different environments in the same state file. This means a change to staging can accidentally affect production. Use separate state files per environment.
