YAML
YAML Ain't Markup Language — a human-friendly data serialization format popular for configuration files, CI/CD pipelines, and infrastructure-as-code.
You are a file format specialist with deep expertise in YAML, including the 1.2 specification, implicit typing pitfalls, safe deserialization practices, anchor/alias patterns, Kubernetes and CI/CD manifest authoring, yamllint validation, and comparisons with TOML and JSON for configuration use cases.
## Key Points
- **Strings**: Unquoted, single-quoted, or double-quoted. Double quotes support escapes.
- **Numbers**: `42`, `3.14`, `0xFF`, `1e10`, `.inf`, `.nan`
- **Booleans**: `true`/`false` (YAML 1.2). YAML 1.1 parsers, including PyYAML, also accept `yes`/`no`, `on`/`off`.
- **Null**: `null`, `~`, or simply omitting the value.
- **Sequences** (arrays): Block style with `-` prefix or flow style `[a, b, c]`.
- **Mappings** (objects): Block style with `key: value` or flow style `{a: 1, b: 2}`.
- **Timestamps**: `2025-01-15`, `2025-01-15T10:30:00Z`
- Indentation is **spaces only** — tabs cause parse errors.
- Consistent indentation within a level (2 spaces is conventional).
- Strings that look like other types should be quoted: `version: "1.0"` not `version: 1.0`.
- The `---` marker separates multiple documents in one file.
- `...` marks the end of a document.
## Quick Example
```python
import yaml  # PyYAML

with open("config.yaml") as f:
    data = yaml.safe_load(f)  # ALWAYS use safe_load, never load()

# For multiple documents, reopen the file: the stream above is already consumed
with open("config.yaml") as f:
    docs = list(yaml.safe_load_all(f))
```
```javascript
// yamlString holds the YAML text to parse
import YAML from 'yaml'; // npm: yaml
const data = YAML.parse(yamlString);

// or
import { load } from 'js-yaml'; // npm: js-yaml
const dataFromJsYaml = load(yamlString);
```
YAML — YAML Ain't Markup Language
Overview
YAML is a human-friendly data serialization language designed to be easy to read and write. Originally standing for "Yet Another Markup Language" (later backronymed to the recursive "YAML Ain't Markup Language"), YAML has become the de facto standard for configuration files in DevOps tooling, CI/CD pipelines, container orchestration, and infrastructure-as-code. YAML 1.2 (2009) was designed as a superset of JSON, so apart from rare edge cases (such as duplicate keys, which YAML forbids), any valid JSON document is also valid YAML.
Core Philosophy
YAML's philosophy is human readability above all else. The format uses indentation for structure, avoids the syntactic noise of braces and brackets, and reads almost like a structured outline. This readability is why YAML became the dominant format for configuration in DevOps tooling — Kubernetes manifests, CI/CD pipelines (GitHub Actions, GitLab CI), Ansible playbooks, and Docker Compose files are all YAML.
YAML's power comes with well-documented pitfalls. Implicit type coercion silently converts `3.10` to `3.1`, and YAML 1.1 parsers (still common, PyYAML among them) also turn `no` into `false` and `on` into `true`. The Norway problem (the country code `NO` becoming `false`) is the most famous example. Anchors and aliases enable DRY configuration but can be abused for memory-exhaustion attacks, and full-featured loaders that honor language-specific tags can execute arbitrary code when fed untrusted YAML. Always use safe/strict YAML loaders and quote values that might be misinterpreted.
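A quick way to see these coercions is to load a few unquoted values with PyYAML, which follows YAML 1.1 resolution rules. The sketch assumes PyYAML is installed (`pip install pyyaml`); a YAML 1.2 core-schema parser would keep `NO`, `on`, and `1:30` as strings but would still read `3.10` as a float.

```python
import yaml  # PyYAML: pip install pyyaml

SNIPPET = """\
country: NO
version: 3.10
switch: on
duration: 1:30
quoted_country: "NO"
quoted_version: "3.10"
"""

data = yaml.safe_load(SNIPPET)

# Unquoted values are coerced by PyYAML's YAML 1.1 resolver.
print(data["country"])   # False: the Norway problem
print(data["version"])   # 3.1: the trailing zero is lost in the float
print(data["switch"])    # True
print(data["duration"])  # 90: read as a base-60 (sexagesimal) integer
# Quoted values stay strings.
print(data["quoted_country"], data["quoted_version"])  # NO 3.10
```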
Use YAML when readability for human operators matters — configuration files that are manually edited, deployment manifests that need to be reviewed in pull requests, and infrastructure-as-code that serves as documentation of system state. For machine-generated configuration, JSON is safer (no implicit type coercion). For simple key-value configuration, TOML is more predictable. YAML is the right choice when the configuration is complex enough to benefit from its readability and humans are the primary audience.
Technical Specifications
Syntax and Structure
YAML uses indentation (spaces only, never tabs) to denote structure:
```yaml
# Application configuration
app:
  name: MyService
  version: 3.2.1
  debug: false
server:
  host: 0.0.0.0
  port: 8080
  ssl:
    enabled: true
    cert: /etc/ssl/cert.pem
database:
  url: "postgres://db:5432/myapp"
  pool_size: 10
  replicas:
    - host: replica1.db.internal
      port: 5432
    - host: replica2.db.internal
      port: 5432
# Multi-line strings
description: |
  This is a block scalar.
  Line breaks are preserved.
summary: >
  This is a folded scalar.
  Line breaks become spaces.
  Blank lines become newlines.
# Anchors and aliases (DRY)
defaults: &defaults
  timeout: 30
  retries: 3
production:
  <<: *defaults
  timeout: 60
```
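Loading a trimmed copy of the multi-line and anchor sections with PyYAML shows what each construct produces. This is a sketch assuming PyYAML is installed. Note that the `<<` merge key comes from YAML 1.1 and is not part of the YAML 1.2 core schema, so some strict 1.2 parsers may need an option to enable it or may not support it at all.

```python
import yaml  # PyYAML: pip install pyyaml

DOC = """\
description: |
  This is a block scalar.
  Line breaks are preserved.
summary: >
  This is a folded scalar.
  Line breaks become spaces.
defaults: &defaults
  timeout: 30
  retries: 3
production:
  <<: *defaults
  timeout: 60
"""

data = yaml.safe_load(DOC)

# Literal block (|) keeps every line break.
print(repr(data["description"]))  # 'This is a block scalar.\nLine breaks are preserved.\n'
# Folded block (>) joins adjacent lines with a space.
print(repr(data["summary"]))      # 'This is a folded scalar. Line breaks become spaces.\n'
# The merge key copies the anchored mapping; explicitly written keys win.
print(data["production"])         # {'timeout': 60, 'retries': 3}
```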
Data Types
- Strings: Unquoted, single-quoted, or double-quoted. Double quotes support escapes.
- Numbers: `42`, `3.14`, `0xFF`, `1e10`, `.inf`, `.nan`
- Booleans: `true`/`false` (YAML 1.2). YAML 1.1 parsers, including PyYAML, also accept `yes`/`no`, `on`/`off`.
- Null: `null`, `~`, or simply omitting the value.
- Sequences (arrays): Block style with `-` prefix or flow style `[a, b, c]`.
- Mappings (objects): Block style with `key: value` or flow style `{a: 1, b: 2}`.
- Timestamps: `2025-01-15`, `2025-01-15T10:30:00Z`
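Loading some of these literals with PyYAML (a YAML 1.1 implementation, assumed installed) shows the Python types they map to, plus one PyYAML-specific surprise: its float pattern requires a decimal point and a signed exponent, so a bare `1e10` stays a string even though YAML 1.2 treats it as a float.

```python
import datetime
import math

import yaml  # PyYAML: pip install pyyaml

SNIPPET = """\
hex: 0xFF
big: 1.0e+10
bare_exponent: 1e10
infinity: .inf
not_a_number: .nan
nothing: ~
released: 2025-01-15
"""

data = yaml.safe_load(SNIPPET)

print(data["hex"])                  # 255
print(data["big"])                  # 10000000000.0
print(repr(data["bare_exponent"]))  # '1e10': a string under PyYAML's 1.1 rules
print(math.isinf(data["infinity"]), math.isnan(data["not_a_number"]))  # True True
print(data["nothing"])              # None
print(data["released"] == datetime.date(2025, 1, 15))  # True: a date object, not a string
```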
Key Rules
- Indentation is spaces only — tabs cause parse errors.
- Consistent indentation within a level (2 spaces is conventional).
- Strings that look like other types should be quoted: `version: "1.0"` not `version: 1.0`.
- The `---` marker separates multiple documents in one file.
- `...` marks the end of a document.
- Comments start with `#`.
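The document markers and the quoting rule can be checked together in a short sketch, assuming PyYAML is installed: `yaml.safe_load_all` yields one object per document, and only the quoted version survives as a string.

```python
import yaml  # PyYAML: pip install pyyaml

STREAM = """\
# first document
version: 1.0
---
# second document
version: "1.0"
...
"""

docs = list(yaml.safe_load_all(STREAM))

print(len(docs))                 # 2
print(repr(docs[0]["version"]))  # 1.0: unquoted, so a float
print(repr(docs[1]["version"]))  # '1.0': quoted, so a string
```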
How to Work With It
Parsing
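```python
import yaml  # PyYAML

with open("config.yaml") as f:
    data = yaml.safe_load(f)  # ALWAYS use safe_load, never load()

# For multiple documents, reopen the file: the stream above is already consumed
with open("config.yaml") as f:
    docs = list(yaml.safe_load_all(f))
```

```javascript
// yamlString holds the YAML text to parse
import YAML from 'yaml'; // npm: yaml
const data = YAML.parse(yamlString);

// or
import { load } from 'js-yaml'; // npm: js-yaml
const dataFromJsYaml = load(yamlString);
```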
Creating
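```python
import yaml

# data: the dict/list to serialize; without a stream argument, dump returns a str
text = yaml.dump(data, default_flow_style=False, allow_unicode=True)
```

```javascript
import YAML from 'yaml';

const text = YAML.stringify(data); // returns a string
```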
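One reason to prefer a library emitter over string templating is that PyYAML's dumper quotes strings that would otherwise be read back as another type. A sketch assuming PyYAML; `safe_dump` is used because it refuses to emit Python-specific tags, and `sort_keys=False` keeps insertion order (PyYAML sorts keys by default).

```python
import yaml  # PyYAML: pip install pyyaml

config = {"service": "MyService", "country": "NO", "version": "1.10", "port": 8080}

text = yaml.safe_dump(config, sort_keys=False)
print(text)

# "NO" and "1.10" are emitted quoted because, unquoted, they would load as
# a boolean and a float; loading the output returns exactly the same data.
assert yaml.safe_load(text) == config
```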
Validating
- yamllint: Linter for YAML files — checks syntax, indentation, line length.
- JSON Schema: YAML files can be validated against JSON Schema.
- yq: CLI processor; `yq eval '.' file.yaml` validates and outputs.
- IDE support: VS Code YAML extension (Red Hat) provides schema-based validation.
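Before reaching for a linter, the cheapest validity check is to parse the text and report where parsing failed. PyYAML errors usually carry a `problem_mark` with a zero-based line and column; the helper name `yaml_error` below is hypothetical, and it checks syntax only, not structure, which is what yamllint rules or a JSON Schema add on top.

```python
import yaml  # PyYAML: pip install pyyaml


def yaml_error(text):
    """Return None if text parses, otherwise a 'line:column: problem' message."""
    try:
        # safe_load_all also checks every document in a multi-document stream
        for _ in yaml.safe_load_all(text):
            pass
    except yaml.YAMLError as exc:
        mark = getattr(exc, "problem_mark", None)
        if mark is None:
            return str(exc)
        return f"{mark.line + 1}:{mark.column + 1}: {exc.problem}"
    return None


print(yaml_error("app:\n  name: MyService\n"))  # None
print(yaml_error("app:\n\tname: MyService\n"))  # a tab-related error on line 2
```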
Security Warning
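Never call `yaml.load()` in Python with `yaml.Loader` or `yaml.UnsafeLoader` (the unsafe loader was also what a bare `yaml.load()` used before PyYAML 5.1): YAML tags like `!!python/object` can construct arbitrary objects and execute code. Always use `yaml.safe_load()`. Similar cautions apply in Ruby and other languages.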
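The difference is easy to confirm: `yaml.safe_load` only knows the standard YAML tags, so a document using a Python-specific tag fails with a `ConstructorError` instead of running anything. A sketch assuming PyYAML; the payload names the harmless `os.getcwd`, but with an unsafe loader the same tag form can call any importable function.

```python
import yaml  # PyYAML: pip install pyyaml
from yaml.constructor import ConstructorError

# A document that asks the loader to call a Python function.
PAYLOAD = "!!python/object/apply:os.getcwd []"


def is_rejected(text):
    """True if safe_load refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except ConstructorError:
        return True
    return False


print(is_rejected(PAYLOAD))        # True: SafeLoader has no constructor for this tag
print(is_rejected("name: plain"))  # False
```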
Common Use Cases
- CI/CD: GitHub Actions, GitLab CI, CircleCI, Azure Pipelines.
- Container orchestration: Kubernetes manifests, Docker Compose.
- Infrastructure-as-code: Ansible playbooks, CloudFormation, Helm charts.
- Application config: Rails `database.yml`, Spring Boot `application.yml`, Hugo.
- API specifications: OpenAPI/Swagger definitions.
- Static site generators: Front matter in Jekyll, Hugo, Eleventy.
Pros & Cons
Pros
- Highly readable — clean, minimal syntax for configuration.
- Comments supported — essential for annotating config files.
- Superset of JSON (YAML 1.2) — valid JSON is valid YAML.
- Multi-line string handling is excellent (block and folded scalars).
- Anchors/aliases reduce repetition in complex configs.
- Rich type system including dates, nulls, and binary.
Cons
- Indentation sensitivity causes subtle, hard-to-debug errors.
- Implicit typing is dangerous: `NO` becomes `false`, `3.0` becomes a float, `1:30` can become `90`.
- Security vulnerabilities from unsafe deserialization (arbitrary code execution).
- Large YAML files become hard to manage — deeply nested structures are painful.
- The spec is surprisingly complex (YAML 1.2 spec is 80+ pages).
- Anchors/aliases add complexity and can be exploited ("billion laughs" attack).
- No standardized schema language (borrows JSON Schema).
Compatibility
| Language | Built-in | Popular Library |
|---|---|---|
| Python | No | PyYAML, ruamel.yaml |
| JavaScript | No | js-yaml, yaml |
| Java | No | SnakeYAML, Jackson YAML |
| Go | No | gopkg.in/yaml.v3 |
| Ruby | Yes | Psych (stdlib) |
| Rust | No | serde_yaml |
| C# | No | YamlDotNet |
MIME type: application/yaml (RFC 9512). File extensions: .yaml, .yml.
Practical Usage
Safely parse and merge multiple YAML config files in Python
```python
import yaml
from pathlib import Path

def load_config(*paths):
    """Merge multiple YAML config files; later files override earlier ones.

    The merge is shallow: a top-level key in a later file replaces the
    whole value from an earlier file, including nested mappings.
    """
    merged = {}
    for path in paths:
        with Path(path).open(encoding="utf-8") as f:
            data = yaml.safe_load(f)  # ALWAYS safe_load, never load()
        if data:  # skip empty files, which load as None
            merged.update(data)
    return merged

config = load_config("defaults.yaml", "production.yaml", "overrides.yaml")
print(yaml.dump(config, default_flow_style=False))
```
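Because `dict.update` is shallow, a later file that sets only `server.port` wipes out an earlier `server.host`. If nested overrides are wanted, a small recursive merge can stand in for `merged.update(data)`. The `deep_merge` helper below is hypothetical, and replacing lists wholesale (rather than concatenating them) is one design choice among several.

```python
def deep_merge(base, override):
    """Return a new dict: nested dicts are merged recursively; any other
    value in `override` (scalars, lists) replaces the one in `base`."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


defaults = {"server": {"host": "0.0.0.0", "port": 8080}, "debug": False}
production = {"server": {"port": 443}}

print({**defaults, **production})        # shallow: {'server': {'port': 443}, 'debug': False}
print(deep_merge(defaults, production))  # deep: server.host is kept
```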
Lint and validate YAML files in a CI pipeline
```bash
# Install yamllint
pip install yamllint

# Lint all YAML files in a project
yamllint -d "{extends: default, rules: {line-length: {max: 120}}}" .

# Validate Kubernetes manifests against their schema
# (kubeval is a standalone Go binary, not a pip package; install it from its
# GitHub releases or a package manager)
kubeval deployment.yaml --strict

# Use yq to query and transform YAML from the command line
# (mikefarah/yq v4 syntax; the pip package named "yq" is a different, jq-based tool)
yq eval '.spec.replicas' deployment.yaml
yq eval '.metadata.labels.app = "myapp"' -i deployment.yaml
```
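When the same edit has to happen inside a Python script, the yq commands above map onto a load-modify-dump cycle. A sketch assuming PyYAML: it suits generated files, but PyYAML discards comments and original formatting, so for hand-maintained manifests a round-trip library such as ruamel.yaml, or yq itself, is the better fit.

```python
import yaml  # PyYAML: pip install pyyaml

MANIFEST = """\
# replicas tuned for staging
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
"""

doc = yaml.safe_load(MANIFEST)
print(doc["spec"]["replicas"])  # 2, like: yq eval '.spec.replicas'

# Equivalent of: yq eval '.metadata.labels.app = "myapp"'
doc.setdefault("metadata", {}).setdefault("labels", {})["app"] = "myapp"
updated = yaml.safe_dump(doc, sort_keys=False)
print(updated)  # the comment line is gone: PyYAML does not keep comments
```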
Convert between YAML and JSON
```bash
# YAML to JSON (using yq)
yq eval -o=json config.yaml > config.json

# JSON to YAML
yq eval -P config.json > config.yaml

# Python one-liner (default=str serializes YAML dates, which json cannot handle natively)
python3 -c "import yaml, json, sys; print(json.dumps(yaml.safe_load(open(sys.argv[1])), indent=2, default=str))" config.yaml
```
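The same conversions can be wrapped as reusable Python functions; this is a sketch assuming PyYAML, and the function names are hypothetical. Two details matter: YAML timestamps load as `date`/`datetime` objects that `json.dumps` cannot serialize on its own, and a multi-document YAML stream has no single JSON equivalent, so this version chooses to wrap all documents in one JSON array.

```python
import json

import yaml  # PyYAML: pip install pyyaml


def yaml_to_json(yaml_text):
    """Convert every document in a YAML stream into one JSON array."""
    docs = list(yaml.safe_load_all(yaml_text))
    # default=str turns date/datetime values into ISO-format strings;
    # without it json.dumps raises TypeError on them.
    return json.dumps(docs, indent=2, default=str)


def json_to_yaml(json_text):
    """Convert a JSON document to block-style YAML."""
    return yaml.safe_dump(json.loads(json_text), sort_keys=False)


print(yaml_to_json("released: 2025-01-15\n---\ncount: 3\n"))
print(json_to_yaml('{"name": "MyService", "ports": [80, 443]}'))
```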
Anti-Patterns
Using tabs for indentation instead of spaces. YAML strictly forbids tabs, and mixing tabs and spaces produces cryptic parse errors that are invisible in many editors. Configure your editor to insert spaces (conventionally 2) for YAML files and enable visible whitespace.
Relying on implicit boolean typing for configuration values. Country code NO (Norway) silently becomes false, version 1.10 becomes float 1.1, and on/off become booleans. Always quote ambiguous string values: country: "NO", version: "1.10".
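Quoting is easy to forget, so it helps to scan for the risky cases automatically (yamllint's `truthy` rule serves this purpose). The sketch below, assuming PyYAML, walks the node graph returned by `yaml.compose` and reports plain (unquoted) scalars that resolve to booleans without being spelled `true`/`false`; the function name is hypothetical.

```python
import yaml  # PyYAML: pip install pyyaml

BOOL_TAG = "tag:yaml.org,2002:bool"


def ambiguous_booleans(text):
    """List (line, value) for plain scalars that resolve to booleans
    but are not spelled true/false (e.g. NO, on, Yes)."""
    found = []
    stack = [yaml.compose(text, Loader=yaml.SafeLoader)]
    seen = set()  # aliases can point at the same node more than once
    while stack:
        node = stack.pop()
        if node is None or id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node, yaml.ScalarNode):
            is_plain_bool = node.tag == BOOL_TAG and node.style is None
            if is_plain_bool and node.value.lower() not in ("true", "false"):
                found.append((node.start_mark.line + 1, node.value))
        elif isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                stack.extend((key_node, value_node))
        elif isinstance(node, yaml.SequenceNode):
            stack.extend(node.value)
    return sorted(found)


print(ambiguous_booleans('country: NO\nenabled: true\nlight: on\nquoted: "no"\n'))
# [(1, 'NO'), (3, 'on')]
```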
Deeply nesting YAML beyond 4-5 levels without considering alternatives. Excessive nesting makes YAML unreadable and error-prone due to indentation sensitivity. Flatten structures, split into separate files, or use templating tools (Helm, Kustomize, Jsonnet) for complex configurations.
Using YAML for machine-to-machine data interchange between services. YAML's implicit typing, parser inconsistencies across languages, and complexity make it unreliable for API payloads. Use JSON or Protocol Buffers for service communication; reserve YAML for human-authored configuration.
Accepting untrusted YAML input without size and depth limits. The "billion laughs" attack exploits YAML anchors and aliases to create exponential memory expansion. Always set parse limits on untrusted input and use safe deserialization functions (yaml.safe_load() in Python, YAML.safe_load() in Ruby).
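One way to enforce such limits with PyYAML is to scan the low-level event stream from `yaml.parse` before constructing anything. This is a sketch under assumptions: the function name and default limits are hypothetical and should be tuned to real inputs. In PyYAML, aliases are constructed as shared references, so the exponential blow-up of a "billion laughs" document typically surfaces later, for example when the result is serialized to JSON.

```python
import yaml  # PyYAML: pip install pyyaml

START_EVENTS = (yaml.MappingStartEvent, yaml.SequenceStartEvent)
END_EVENTS = (yaml.MappingEndEvent, yaml.SequenceEndEvent)


def guarded_safe_load(text, max_bytes=64_000, max_aliases=100, max_depth=50):
    """safe_load with illustrative limits on size, alias count and nesting.

    The event scan runs before any Python objects are constructed.
    """
    if len(text.encode("utf-8")) > max_bytes:
        raise ValueError("YAML input too large")
    aliases = depth = 0
    for event in yaml.parse(text, Loader=yaml.SafeLoader):
        if isinstance(event, yaml.AliasEvent):
            aliases += 1
            if aliases > max_aliases:
                raise ValueError("too many aliases")
        elif isinstance(event, START_EVENTS):
            depth += 1
            if depth > max_depth:
                raise ValueError("nesting too deep")
        elif isinstance(event, END_EVENTS):
            depth -= 1
    return yaml.safe_load(text)


# A small "billion laughs" style document: each level references the previous
# one nine times, so a full expansion grows as 9 ** levels (72 aliases here).
LAUGHS = 'a: &a ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]\n'
for prev, cur in zip("abcdefgh", "bcdefghi"):
    LAUGHS += f"{cur}: &{cur} [" + ",".join([f"*{prev}"] * 9) + "]\n"

try:
    guarded_safe_load(LAUGHS, max_aliases=50)
except ValueError as exc:
    print("rejected:", exc)  # rejected: too many aliases
```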
Related Formats
- JSON: Subset of YAML; simpler but no comments.
- TOML: Alternative config format with explicit typing, no indentation sensitivity.
- INI: Simpler flat configuration format.
- JSON5: JSON with comments and relaxed syntax.
- StrictYAML: YAML subset that disables implicit typing for safety.
- CUE: Configuration language with validation built in.