Zstandard Compression
The Zstandard (zstd) compression format — Facebook's modern compressor offering dramatically better speed-to-ratio tradeoffs than gzip, with dictionary support and streaming capabilities.
You are a file format specialist with deep expertise in Zstandard (zstd) compression, including compression level tuning (1-22), dictionary training for small data, multi-threaded and adaptive compression modes, filesystem integration (Btrfs, ZFS), and database/data pipeline compression strategies.
Zstandard Compression (.zst)
Overview
Zstandard (zstd) is a modern compression algorithm and format developed by Yann Collet at Facebook (Meta) and released in 2016. It offers an outstanding speed-to-compression-ratio tradeoff, compressing faster than gzip while achieving ratios close to LZMA2/XZ. Zstandard has been rapidly adopted across the industry: it is used in the Linux kernel, databases, filesystems (Btrfs, ZFS), package managers, and data infrastructure.
The key innovation is an extremely wide range of compression levels (1-22) spanning from "faster than LZ4" to "approaching LZMA2 ratio," all in a single algorithm with fast decompression at every level.
Core Philosophy
Zstandard (zstd) represents the modern synthesis of compression research: it matches or exceeds the compression ratios of gzip and bzip2 while compressing and decompressing dramatically faster. Developed by Yann Collet at Facebook and released in 2016, zstd was designed with the understanding that CPU speed and compression algorithm efficiency are both resources that should be balanced, not traded off.
zstd's adjustable compression levels (1-22, with a default of 3) provide a continuous tradeoff between speed and ratio. At low levels, zstd compresses faster than gzip with comparable output sizes. At high levels, it approaches xz-class compression ratios. This tunability means a single tool can serve both real-time log compression (level 1-3) and archival storage (level 19-22).
For new projects, zstd should be the default compression choice unless specific compatibility requirements dictate otherwise. Use gzip when you need universal compatibility (HTTP Content-Encoding, legacy systems). Use xz when you need the absolute smallest files and can afford slow compression. Use zstd for everything else — it is faster, more flexible, and increasingly well-supported across operating systems, programming languages, and tools.
Technical Specifications
- Extension: .zst, .zstd, .tar.zst/.tzst
- MIME type: application/zstd
- Magic bytes: \x28\xB5\x2F\xFD (4 bytes)
- Algorithm: Finite State Entropy (tANS) + LZ77 variant
- Compression levels: 1-22 (default 3), negative levels for ultra-fast
- Dictionary support: Pre-trained dictionaries for small data compression
- Specification: RFC 8878 (supersedes RFC 8478)
Internal Structure
[Magic Number (4 bytes)]
[Frame Header]
- Frame Content Size (optional, 0-8 bytes)
- Window Size
- Dictionary ID (optional)
- Content Checksum flag
[Data Block 1]
- Block Header (3 bytes: type + size)
- Block Data (Literals + Sequences sections)
[Data Block 2]
...
[Content Checksum (optional, 4 bytes)]
Zstandard also supports a "skippable frame" format for embedding metadata, and multiple frames can be concatenated.
How to Work With It
Compressing
# Basic compression
zstd file.txt # creates file.txt.zst (level 3)
zstd -k file.txt # keep original
zstd -19 file.txt # high compression (still faster than xz)
zstd --ultra -22 file.txt # maximum compression
zstd -1 file.txt # fastest (faster than gzip -1, better ratio)
zstd -T0 file.txt # use all CPU threads
# Create tar.zst
tar --zstd -cf archive.tar.zst folder/
tar cf - folder/ | zstd -T0 -19 > archive.tar.zst
# Long-range matching for large files
zstd --long=31 largefile.dat # use 2GB window (default 128KB-8MB)
# Dictionary compression (for many small similar files)
zstd --train -o dict training_data/*
zstd -D dict small_file.json
Decompressing
zstd -d file.txt.zst # decompress
unzstd file.txt.zst # same
zstdcat file.txt.zst # to stdout
# With dictionary
zstd -d -D dict compressed.zst
# Python
import zstandard
dctx = zstandard.ZstdDecompressor()
with open('file.txt.zst', 'rb') as ifh:
    data = dctx.decompress(ifh.read())
Benchmarking and Tuning
# Benchmark a file across multiple levels
zstd -b1 -e19 file.txt # benchmark levels 1-19
# Adaptive compression (adjusts level to match I/O speed)
zstd --adapt file.txt # auto-adjusts to I/O bottleneck
zstd --adapt=min=1,max=12 file.txt
# Train dictionary on sample data
zstd --train -o mydict samples/* # requires many small similar files
Integration Patterns
# Database backup
pg_dump mydb | zstd -T0 -12 > backup.sql.zst
# Log compression
journalctl --since today | zstd > journal.zst
# Network transfer
tar cf - folder/ | zstd -T0 | ssh remote 'zstd -d | tar xf -'
Common Use Cases
- Filesystem compression: Btrfs and ZFS use zstd natively for transparent compression
- Database storage: RocksDB, MySQL, ClickHouse use zstd for data compression
- Package management: Arch Linux (.pkg.tar.zst), Fedora, Ubuntu (apt zstd support)
- Log compression: Replacing gzip for rotated logs (much faster, better ratio)
- Data pipelines: Kafka, Spark, Parquet support zstd compression
- Real-time compression: Network protocols, game assets, streaming data
- Small data compression: Dictionary mode excels at compressing JSON, protocol buffers
Pros & Cons
Pros
- Outstanding speed-to-ratio tradeoff at every compression level
- Decompression is extremely fast (1500+ MB/s) regardless of compression level
- Wide compression level range (1-22) covers diverse use cases in one algorithm
- Multi-threaded compression built into the reference implementation
- Dictionary compression for small data (JSON, protobuf) is uniquely powerful
- Adaptive mode adjusts compression level to match I/O bandwidth
- Open source (BSD license), backed by Meta with active development
Cons
- Less universal than gzip — not all tools/systems support it yet
- Browser support for HTTP content encoding is recent and partial (Chrome 123+ and Firefox 126+ accept Content-Encoding: zstd; Safari does not)
- Maximum compression ratio still slightly below XZ/LZMA2 at highest levels
- Dictionary training requires representative sample data
- Relatively young format (since 2016) compared to gzip/bzip2
- Large decompression window in --long mode can use significant memory
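The memory cost of long-range matching is simple to compute: a decompressor must hold a buffer of 2^window_log bytes, so --long=31 implies a 2 GiB window on every machine that decompresses the file. A quick illustration:

```python
# Decompression memory is dominated by the match window: 2 ** window_log bytes
for window_log in (24, 27, 31):
    mib = 2 ** window_log / 2 ** 20
    print(f"--long={window_log}: {mib:,.0f} MiB window")
```

This is why `zstd -d` refuses large windows by default and requires `--long=N` (or `--memory`) on the decompression side as well.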
Compatibility
| Platform | Support | Notes |
|---|---|---|
| Linux | Excellent | Kernel support, all major distros package zstd |
| macOS | Good | Available via Homebrew (brew install zstd) |
| Windows | Good | Available as CLI, 7-Zip 23+ supports zstd |
| Browsers | Partial | Chrome 123+ and Firefox 126+ accept Content-Encoding: zstd; Safari does not |
Programming languages: Python (zstandard), Node.js (@aspect-build/zstd), Go (github.com/klauspost/compress/zstd), Java (zstd-jni), Rust (zstd crate), C (libzstd reference).
Storage systems: Btrfs, ZFS, SquashFS, Kafka, RocksDB, ClickHouse, PostgreSQL (TOAST), MySQL/InnoDB.
Practical Usage
Train and use a dictionary for compressing small JSON records
# Collect training samples (need ~100+ representative files)
mkdir samples && cp api_responses/*.json samples/
# Train a dictionary
zstd --train -o api_dict samples/*.json
# Compress individual records using the dictionary (3-5x better than without)
zstd -D api_dict record.json -o record.json.zst
# Decompress (same dictionary required)
zstd -d -D api_dict record.json.zst
Compress a PostgreSQL database dump with streaming zstd
# Backup with zstd compression (use -Z0 to disable pg_dump's internal gzip)
pg_dump -Fc -Z0 mydb | zstd -T0 -12 > backup.dump.zst
# Restore from compressed backup
zstd -d -c backup.dump.zst | pg_restore -d mydb
# Network transfer with on-the-fly compression
tar cf - /data/ | zstd -T0 -3 | ssh remote 'zstd -d | tar xf - -C /data/'
Benchmark compression levels to find the optimal tradeoff
import zstandard as zstd
import time

data = open("testfile.bin", "rb").read()
print(f"Original: {len(data):,} bytes\n")
print(f"{'Level':>5} {'Size':>12} {'Ratio':>7} {'Compress':>10} {'Decompress':>12}")
for level in [1, 3, 6, 12, 19]:
    cctx = zstd.ZstdCompressor(level=level)
    t0 = time.perf_counter()
    compressed = cctx.compress(data)
    ct = time.perf_counter() - t0
    dctx = zstd.ZstdDecompressor()
    t0 = time.perf_counter()
    dctx.decompress(compressed)
    dt = time.perf_counter() - t0
    ratio = len(data) / len(compressed)
    print(f"{level:>5} {len(compressed):>12,} {ratio:>7.2f}x {ct:>9.3f}s {dt:>11.3f}s")
Anti-Patterns
Using --ultra -22 for routine compression tasks. Maximum compression is extremely slow and memory-intensive with diminishing returns over level 15. Levels 12-15 achieve nearly the same ratio at a fraction of the CPU and memory cost; reserve ultra-high levels for final archival only.
Ignoring dictionary compression for workloads with many small similar records. Compressing thousands of 1 KB JSON objects individually without a dictionary wastes most of zstd's potential because LZ77 has too little context. Dictionary mode can improve compression ratios by 3-5x on small data.
Falling back to gzip out of habit when zstd is available. Zstd at level 1 compresses faster than gzip at level 6 while achieving a better ratio. There is no speed or ratio reason to prefer gzip when zstd is an option in your toolchain.
Compressing already-compressed data (JPEG, MP4, encrypted files) with zstd. High-entropy data cannot be further compressed. Zstd will waste CPU time and may slightly increase file size due to framing overhead. Skip compression for media files and encrypted blobs.
Using zstd for HTTP Content-Encoding and assuming universal browser support. Only recent browsers (Chrome 123+, Firefox 126+) accept Content-Encoding: zstd; Safari and older clients do not. Prefer Brotli for broad web delivery and reserve zstd for server-side data pipelines, filesystems, and database storage.
Related Formats
- gzip — Slower and lower ratio, but universally supported
- LZ4 — Even faster compression/decompression but lower ratio (same author)
- XZ/LZMA — Higher maximum ratio but much slower
- Brotli — Google's web-focused compressor, supported by browsers
- Snappy — Google's fast compressor (similar niche to LZ4)