
YAML

YAML Ain't Markup Language — a human-friendly data serialization format popular for configuration files, CI/CD pipelines, and infrastructure-as-code.

Quick Summary
You are a file format specialist with deep expertise in YAML, including the 1.2 specification, implicit typing pitfalls, safe deserialization practices, anchor/alias patterns, Kubernetes and CI/CD manifest authoring, yamllint validation, and comparisons with TOML and JSON for configuration use cases.

## Key Points

- **Strings**: Unquoted, single-quoted, or double-quoted. Double quotes support escapes.
- **Numbers**: `42`, `3.14`, `0xFF`, `1e10`, `.inf`, `.nan`
- **Booleans**: `true`/`false` (YAML 1.2). YAML 1.1 parsers, still widespread (PyYAML among them), also treat `yes`/`no` and `on`/`off` as booleans.
- **Null**: `null`, `~`, or simply omitting the value.
- **Sequences** (arrays): Block style with `-` prefix or flow style `[a, b, c]`.
- **Mappings** (objects): Block style with `key: value` or flow style `{a: 1, b: 2}`.
- **Timestamps**: `2025-01-15`, `2025-01-15T10:30:00Z` (a YAML 1.1 type; YAML 1.2 core-schema parsers keep these as strings)
- Indentation is **spaces only** — tabs cause parse errors.
- Consistent indentation within a level (2 spaces is conventional).
- Strings that look like other types should be quoted: `version: "1.0"` not `version: 1.0`.
- The `---` marker separates multiple documents in one file.
- `...` marks the end of a document.

## Quick Example

```python
import yaml  # PyYAML

with open("config.yaml") as f:
    data = yaml.safe_load(f)  # ALWAYS use safe_load, never load()

# For multiple documents, read the stream while the file is still open:
with open("config.yaml") as f:
    docs = list(yaml.safe_load_all(f))
```

```javascript
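import { readFileSync } from 'node:fs';
import YAML from 'yaml';          // npm: yaml
import { load } from 'js-yaml';   // npm: js-yaml (v4+: load() is safe by default)

const yamlString = readFileSync('config.yaml', 'utf8');
const data = YAML.parse(yamlString);  // option 1: the yaml package
const data2 = load(yamlString);       // option 2: js-yaml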
```

YAML — YAML Ain't Markup Language

Overview

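YAML is a human-friendly data serialization language designed to be easy to read and write. The name originally stood for "Yet Another Markup Language" and was later redefined as the recursive acronym "YAML Ain't Markup Language" to stress that YAML is for data, not document markup. YAML has become the de facto standard for configuration files in DevOps tooling, CI/CD pipelines, container orchestration, and infrastructure-as-code. YAML 1.2 (2009) was revised to be a superset of JSON, so practically any valid JSON document is also valid YAML. The main exception is duplicate mapping keys: many JSON parsers tolerate them, but YAML forbids them.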

Core Philosophy

YAML's philosophy is human readability above all else. The format uses indentation for structure, avoids the syntactic noise of braces and brackets, and reads almost like a structured outline. This readability is why YAML became the dominant format for configuration in DevOps tooling — Kubernetes manifests, CI/CD pipelines (GitHub Actions, GitLab CI), Ansible playbooks, and Docker Compose files are all YAML.

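YAML's power comes with well-documented pitfalls. Implicit type coercion silently converts values. Under YAML 1.1 rules, which many widely used parsers (PyYAML among them) still follow, no becomes false and on becomes true. Under any schema, an unquoted 3.10 becomes the float 3.1. The Norway problem (the country code NO becoming false) is the most famous example. Anchors and aliases enable DRY configuration but can be abused for denial of service (the "billion laughs" expansion). Language-specific tags can also instantiate arbitrary objects when untrusted YAML is parsed with a full-featured loader. Always use safe/strict YAML loaders and quote values that might be misinterpreted.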
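These coercions are easy to reproduce. The sketch below assumes PyYAML, which resolves plain scalars with YAML 1.1 rules. The keys are invented for illustration. Quoting is what keeps NO and 3.10 as strings.

```python
import yaml  # PyYAML: resolves unquoted scalars with YAML 1.1 rules

doc = """
country: NO        # Norway's ISO code
python: 3.10       # meant as a version string
feature: on
meeting: 1:30
country_quoted: "NO"
python_quoted: "3.10"
"""

data = yaml.safe_load(doc)
print(data)
# {'country': False, 'python': 3.1, 'feature': True, 'meeting': 90,
#  'country_quoted': 'NO', 'python_quoted': '3.10'}
```

The 1:30 value turns into 90 because YAML 1.1 reads it as a base-60 (sexagesimal) integer: 1 * 60 + 30.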

Use YAML when readability for human operators matters — configuration files that are manually edited, deployment manifests that need to be reviewed in pull requests, and infrastructure-as-code that serves as documentation of system state. For machine-generated configuration, JSON is safer (no implicit type coercion). For simple key-value configuration, TOML is more predictable. YAML is the right choice when the configuration is complex enough to benefit from its readability and humans are the primary audience.

Technical Specifications

Syntax and Structure

YAML uses indentation (spaces only, never tabs) to denote structure:

```yaml
# Application configuration
app:
  name: MyService
  version: 3.2.1
  debug: false

server:
  host: 0.0.0.0
  port: 8080
  ssl:
    enabled: true
    cert: /etc/ssl/cert.pem

database:
  url: "postgres://db:5432/myapp"
  pool_size: 10
  replicas:
    - host: replica1.db.internal
      port: 5432
    - host: replica2.db.internal
      port: 5432

# Multi-line strings
description: |
  This is a block scalar.
  Line breaks are preserved.

summary: >
  This is a folded scalar.
  Line breaks become spaces.
  Blank lines become newlines.

# Anchors and aliases (DRY)
defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults   # merge key: a YAML 1.1 extension, not part of the 1.2 core schema
  timeout: 60
```
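Loading the multi-line strings and the anchor/merge part of that example shows what a parser actually produces. This sketch assumes PyYAML. The << merge key comes from YAML 1.1 and is not part of the YAML 1.2 core schema, so some parsers support it only when it is explicitly enabled.

```python
import yaml  # PyYAML (its SafeLoader supports the << merge key)

doc = """
description: |
  This is a literal block scalar.
  Line breaks are preserved.

summary: >
  This is a folded scalar.
  Line breaks become spaces.

defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults
  timeout: 60
"""

data = yaml.safe_load(doc)
print(repr(data["description"]))  # 'This is a literal block scalar.\nLine breaks are preserved.\n'
print(repr(data["summary"]))      # 'This is a folded scalar. Line breaks become spaces.\n'
print(data["production"])         # {'timeout': 60, 'retries': 3}
```

Keys written explicitly in a mapping override the keys merged in from the alias. That is why production ends up with timeout 60 while still inheriting retries.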

Data Types

  • Strings: Unquoted, single-quoted, or double-quoted. Double quotes support escapes.
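  • Numbers: 42, 3.14, 0xFF, 1e10, .inf, .nan. Caveat: YAML 1.1 parsers such as PyYAML load 1e10 as a string, because their float pattern requires a decimal point and a signed exponent (1.0e+10 is a float).
  • Booleans: true/false (YAML 1.2). YAML 1.1 parsers, still widespread (PyYAML among them), also treat yes/no and on/off as booleans.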
  • Null: null, ~, or simply omitting the value.
  • Sequences (arrays): Block style with - prefix or flow style [a, b, c].
  • Mappings (objects): Block style with key: value or flow style {a: 1, b: 2}.
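  • Timestamps: 2025-01-15, 2025-01-15T10:30:00Z. These are a YAML 1.1 type. PyYAML loads them as date/datetime objects, whereas YAML 1.2 core-schema parsers keep them as strings.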
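Which scalars resolve to which types depends on the parser's schema. A small probe makes this concrete. It assumes PyYAML (YAML 1.1 rules), and the key names are arbitrary.

```python
import yaml  # PyYAML: YAML 1.1 resolution rules

doc = """
hex: 0xFF
exp_plain: 1e10
exp_dotted: 1.0e+10
infinity: .inf
not_a_number: .nan
tilde: ~
empty:
day: 2025-01-15
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    # Show the Python type each scalar was resolved to.
    print(f"{key:13} {type(value).__name__:8} {value!r}")
```

With PyYAML, 0xFF loads as 255, both ~ and an omitted value load as None, and 2025-01-15 becomes a datetime.date. By contrast, 1e10 stays the string '1e10'.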

Key Rules

  • Indentation is spaces only — tabs cause parse errors.
  • Consistent indentation within a level (2 spaces is conventional).
  • Strings that look like other types should be quoted: version: "1.0" not version: 1.0.
  • The --- marker separates multiple documents in one file.
  • ... marks the end of a document.
  • Comments start with #.
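The document markers and the no-tabs rule can be checked directly. This is a minimal sketch assuming PyYAML. Here safe_load_all yields one object per --- document, and a tab used for indentation surfaces as a yaml.YAMLError (specifically a ScannerError in PyYAML).

```python
import yaml  # PyYAML

stream = """\
---
name: first
...
---
name: second
"""

docs = list(yaml.safe_load_all(stream))
print(docs)  # [{'name': 'first'}, {'name': 'second'}]

# A tab used for indentation cannot start a token, so the scanner rejects it.
tab_error = None
try:
    yaml.safe_load("server:\n\tport: 8080\n")
except yaml.YAMLError as exc:
    tab_error = exc
print("tab indentation ->", type(tab_error).__name__)
```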

How to Work With It

Parsing

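```python
import yaml  # PyYAML

with open("config.yaml") as f:
    data = yaml.safe_load(f)  # ALWAYS use safe_load, never load()

# For multiple documents, read the stream while the file is still open:
with open("config.yaml") as f:
    docs = list(yaml.safe_load_all(f))
```

```javascript
import { readFileSync } from 'node:fs';
import YAML from 'yaml';          // npm: yaml
import { load } from 'js-yaml';   // npm: js-yaml (v4+: load() is safe by default)

const yamlString = readFileSync('config.yaml', 'utf8');
const data = YAML.parse(yamlString);  // option 1: the yaml package
const data2 = load(yamlString);       // option 2: js-yaml
```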

Creating

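```python
import yaml

data = {"app": {"name": "MyService", "debug": False}}
text = yaml.safe_dump(data, default_flow_style=False, allow_unicode=True)  # returns a str; safe_dump avoids !!python/... tags
```

```javascript
import YAML from 'yaml';

const text = YAML.stringify({ app: { name: 'MyService', debug: false } });
```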
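Serializing is also where implicit typing is handled for you: the emitter quotes strings that would otherwise load back as another type. The following sketch assumes PyYAML 5.1 or later, which is required for sort_keys=False.

```python
import yaml  # PyYAML 5.1+ (sort_keys was added in 5.1)

config = {"country": "NO", "python": "3.10", "port": 8080, "debug": False}

# safe_dump emits only standard YAML tags; sort_keys=False keeps insertion order.
text = yaml.safe_dump(config, sort_keys=False, allow_unicode=True)
print(text)

# Ambiguous strings were quoted on the way out, so the round trip is lossless.
print(yaml.safe_load(text) == config)  # True
```

In the output, PyYAML single-quotes 'NO' and '3.10', because unquoted they would load back as a boolean and a float.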

Validating

  • yamllint: Linter for YAML files — checks syntax, indentation, line length.
  • JSON Schema: YAML files can be validated against JSON Schema.
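  • yq: command-line processor (mikefarah/yq v4 syntax; the Python yq wrapper around jq uses different syntax). yq eval '.' file.yaml parses the file, printing it or reporting a syntax error.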
  • IDE support: VS Code YAML extension (Red Hat) provides schema-based validation.
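Schema validation runs on the parsed data, not on the YAML text. The sketch below illustrates that idea without any dependencies. It checks required keys and types after safe_load but is not a JSON Schema implementation. REQUIRED and validate_config are hypothetical names, and a real project would more likely use a JSON Schema validator.

```python
import yaml  # PyYAML

# Hypothetical minimal "schema": dotted key paths mapped to the expected Python type.
REQUIRED = {
    "server.host": str,
    "server.port": int,
    "database.pool_size": int,
}

def validate_config(config):
    """Return a list of human-readable problems (an empty list means valid)."""
    errors = []
    for dotted, expected in REQUIRED.items():
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                errors.append(f"missing key: {dotted}")
                break
            node = node[part]
        else:
            # bool is a subclass of int in Python, so reject it explicitly for int fields.
            if not isinstance(node, expected) or (expected is int and isinstance(node, bool)):
                errors.append(f"{dotted}: expected {expected.__name__}, got {type(node).__name__}")
    return errors

config = yaml.safe_load("server:\n  host: localhost\n  port: '8080'\n")
for problem in validate_config(config):
    print(problem)
```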

Security Warning

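Never call yaml.load() with anything other than the safe loader (for example yaml.Loader or yaml.UnsafeLoader) on untrusted input in Python: tags such as !!python/object/apply can construct arbitrary objects and execute code. Always use yaml.safe_load(), which is equivalent to yaml.load(stream, Loader=yaml.SafeLoader). Since PyYAML 6.0, yaml.load() requires an explicit Loader argument. Similar cautions apply in other languages. In Ruby, prefer YAML.safe_load: Psych 4 (shipped with Ruby 3.1) made YAML.load safe by default, but YAML.unsafe_load remains dangerous. In Java, SnakeYAML versions before 2.0 construct arbitrary classes by default unless configured with a SafeConstructor.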
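With PyYAML installed, you can confirm that the safe loader refuses language-specific tags instead of executing them. The payload below is the classic os.system example. The sketch only ever passes it to safe_load.

```python
import yaml  # PyYAML

# Classic payload: an unsafe loader would call os.system("echo pwned") here.
payload = "!!python/object/apply:os.system ['echo pwned']"

rejected = None
try:
    yaml.safe_load(payload)  # SafeLoader has no constructor for python/* tags
except yaml.constructor.ConstructorError as exc:
    rejected = exc
    print("rejected:", exc.problem)
```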

Common Use Cases

  • CI/CD: GitHub Actions, GitLab CI, CircleCI, Azure Pipelines.
  • Container orchestration: Kubernetes manifests, Docker Compose.
  • Infrastructure-as-code: Ansible playbooks, CloudFormation, Helm charts.
  • Application config: Rails database.yml, Spring Boot application.yml, Hugo.
  • API specifications: OpenAPI/Swagger definitions.
  • Static site generators: Front matter in Jekyll, Hugo, Eleventy.

Pros & Cons

Pros

  • Highly readable — clean, minimal syntax for configuration.
  • Comments supported — essential for annotating config files.
  • Superset of JSON (YAML 1.2): practically any valid JSON is also valid YAML.
  • Multi-line string handling is excellent (block and folded scalars).
  • Anchors/aliases reduce repetition in complex configs.
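  • Rich type system including dates, nulls, and binary (dates and !!binary come from YAML 1.1 and are not in the 1.2 core schema, though many parsers support them).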

Cons

  • Indentation sensitivity causes subtle, hard-to-debug errors.
  • Implicit typing is dangerous. Under YAML 1.1 rules, NO becomes false and 1:30 becomes the integer 90 (base 60). Under any schema, an unquoted 3.10 becomes the float 3.1.
  • Security vulnerabilities from unsafe deserialization (arbitrary code execution).
  • Large YAML files become hard to manage — deeply nested structures are painful.
  • The spec is surprisingly complex (YAML 1.2 spec is 80+ pages).
  • Anchors/aliases add complexity and can be exploited ("billion laughs" attack).
  • No standardized schema language (borrows JSON Schema).

Compatibility

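| Language | Built-in | Popular library |
|---|---|---|
| Python | No | PyYAML, ruamel.yaml |
| JavaScript | No | js-yaml, yaml |
| Java | No | SnakeYAML, Jackson YAML |
| Go | No | gopkg.in/yaml.v3 |
| Ruby | Yes | Psych (stdlib) |
| Rust | No | serde_yaml (archived and unmaintained since 2024) |
| C# | No | YamlDotNet |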

MIME type: application/yaml (RFC 9512). File extensions: .yaml, .yml.

Practical Usage

Safely parse and merge multiple YAML config files in Python

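```python
import yaml
from pathlib import Path

def deep_merge(base, override):
    """Recursively merge override into base: nested mappings merge, other values replace."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

def load_config(*paths):
    """Merge multiple YAML config files; later files override earlier ones, key by key."""
    merged = {}
    for path in paths:
        with Path(path).open(encoding="utf-8") as f:
            data = yaml.safe_load(f)  # ALWAYS safe_load, never load()
        if data is None:  # empty file
            continue
        if not isinstance(data, dict):
            raise TypeError(f"{path}: top-level YAML value must be a mapping")
        deep_merge(merged, data)  # a shallow dict.update() would drop nested keys
    return merged

config = load_config("defaults.yaml", "production.yaml", "overrides.yaml")
print(yaml.safe_dump(config, default_flow_style=False))
```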

Lint and validate YAML files in a CI pipeline

```bash
# Install yamllint
pip install yamllint

# Lint all YAML files in a project
yamllint -d "{extends: default, rules: {line-length: {max: 120}}}" .

# Validate Kubernetes manifests against their schemas.
# kubeval is a Go binary (not a pip package) and is no longer maintained;
# kubeconform is its commonly used successor.
go install github.com/yannh/kubeconform/cmd/kubeconform@latest
kubeconform -strict deployment.yaml

# Use yq (mikefarah/yq v4) to query and transform YAML from the command line
yq eval '.spec.replicas' deployment.yaml
yq eval '.metadata.labels.app = "myapp"' -i deployment.yaml
```

Convert between YAML and JSON

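```bash
# YAML to JSON (mikefarah/yq v4)
yq eval -o=json config.yaml > config.json

# JSON to YAML (-P pretty-prints as block-style YAML)
yq eval -P config.json > config.yaml

# Python one-liner (default=str serializes dates/timestamps, which JSON lacks)
python3 -c "import yaml, json, sys; print(json.dumps(yaml.safe_load(open(sys.argv[1])), indent=2, default=str))" config.yaml
```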

Anti-Patterns

Using tabs for indentation instead of spaces. YAML forbids tabs for indentation, and mixing tabs and spaces produces cryptic parse errors. These are hard to track down because tabs are invisible in many editors. Configure your editor to insert spaces (conventionally 2) for YAML files and enable visible whitespace.

Relying on implicit typing for configuration values. Under YAML 1.1 rules (still used by PyYAML and many other parsers), the country code NO (Norway) silently becomes false and on/off become booleans. Under any schema, version 1.10 becomes the float 1.1. Always quote ambiguous string values: country: "NO", version: "1.10".

Deeply nesting YAML beyond 4-5 levels without considering alternatives. Excessive nesting makes YAML unreadable and error-prone due to indentation sensitivity. Flatten structures, split into separate files, or use templating tools (Helm, Kustomize, Jsonnet) for complex configurations.

Using YAML for machine-to-machine data interchange between services. YAML's implicit typing, parser inconsistencies across languages, and complexity make it unreliable for API payloads. Use JSON or Protocol Buffers for service communication; reserve YAML for human-authored configuration.

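Accepting untrusted YAML input without size and depth limits. The "billion laughs" attack exploits YAML anchors and aliases to create exponential memory expansion. Always set limits on untrusted input and use safe deserialization functions (yaml.safe_load() in Python, YAML.safe_load() in Ruby). Built-in protection varies by parser. The JavaScript yaml package caps alias expansion with its maxAliasCount option, and SnakeYAML exposes maxAliasesForCollections in its LoaderOptions. PyYAML offers no such limit, so with PyYAML the size and alias checks have to be done by the caller before objects are constructed.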
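PyYAML has no built-in alias cap, so one option is to inspect the event stream before constructing objects. This is a sketch under stated assumptions. The names guarded_safe_load, MAX_INPUT_BYTES, and MAX_ALIASES and their limits are hypothetical, and the document is parsed twice (once for the check, once to load).

```python
import yaml  # PyYAML

MAX_INPUT_BYTES = 1_000_000  # hypothetical limits; tune for your use case
MAX_ALIASES = 100

def guarded_safe_load(text):
    """safe_load with a size cap and an alias-count cap checked on the event stream first."""
    if len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("YAML input too large")
    # yaml.parse yields low-level parser events without constructing Python objects.
    aliases = sum(
        1 for event in yaml.parse(text, Loader=yaml.SafeLoader)
        if isinstance(event, yaml.events.AliasEvent)
    )
    if aliases > MAX_ALIASES:
        raise ValueError(f"too many aliases: {aliases} > {MAX_ALIASES}")
    return yaml.safe_load(text)

# An ordinary document passes; an alias-heavy one is refused before construction.
print(guarded_safe_load("a: &x 1\nb: *x\n"))  # {'a': 1, 'b': 1}

bomb = "x: &x [lol]\ny: [" + ", ".join(["*x"] * 200) + "]\n"
try:
    guarded_safe_load(bomb)
except ValueError as exc:
    print("refused:", exc)
```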

Related Formats

  • JSON: Subset of YAML; simpler but no comments.
  • TOML: Alternative config format with explicit typing, no indentation sensitivity.
  • INI: Simpler flat configuration format.
  • JSON5: JSON with comments and relaxed syntax.
  • StrictYAML: YAML subset that disables implicit typing for safety.
  • CUE: Configuration language with validation built in.
