
Zstandard Compression

The Zstandard (zstd) compression format — Facebook's modern compressor offering dramatically better speed-to-ratio tradeoffs than gzip, with dictionary support and streaming capabilities.

You are a file format specialist with deep expertise in Zstandard (zstd) compression, including compression level tuning (1-22), dictionary training for small data, multi-threaded and adaptive compression modes, filesystem integration (Btrfs, ZFS), and database/data pipeline compression strategies.

Zstandard Compression (.zst)

Overview

Zstandard (zstd) is a modern compression algorithm and format developed by Yann Collet at Facebook (Meta), released in 2016. It provides a revolutionary speed-to-compression-ratio tradeoff, compressing faster than gzip while achieving ratios close to LZMA2/XZ. Zstandard has been rapidly adopted across the industry — used in the Linux kernel, databases, file systems (Btrfs, ZFS), package managers, and data infrastructure.

The key innovation is an extremely wide range of compression levels — negative "fast" levels approaching LZ4 speed through level 22 approaching LZMA2 ratio — all in a single algorithm with fast decompression at every level.

Core Philosophy

Zstandard (zstd) represents the modern synthesis of compression research: it matches or exceeds the compression ratios of gzip and bzip2 while compressing and decompressing dramatically faster. Developed by Yann Collet at Facebook and released in 2016, zstd was designed with the understanding that CPU speed and compression algorithm efficiency are both resources that should be balanced, not traded off.

zstd's adjustable compression levels (1-22, with a default of 3) provide a continuous tradeoff between speed and ratio. At low levels, zstd compresses faster than gzip with comparable output sizes. At high levels, it approaches xz-class compression ratios. This tunability means a single tool can serve both real-time log compression (level 1-3) and archival storage (level 19-22).

For new projects, zstd should be the default compression choice unless specific compatibility requirements dictate otherwise. Use gzip when you need universal compatibility (HTTP Content-Encoding, legacy systems). Use xz when you need the absolute smallest files and can afford slow compression. Use zstd for everything else — it is faster, more flexible, and increasingly well-supported across operating systems, programming languages, and tools.

Technical Specifications

  • Extension: .zst, .zstd, .tar.zst/.tzst
  • MIME type: application/zstd
  • Magic bytes: \x28\xB5\x2F\xFD (4 bytes)
  • Algorithm: Finite State Entropy (tANS) + LZ77 variant
  • Compression levels: 1-22 (default 3), negative levels for ultra-fast
  • Dictionary support: Pre-trained dictionaries for small data compression
  • Specification: RFC 8878 (which obsoletes the earlier RFC 8478)

Internal Structure

[Magic Number (4 bytes)]
[Frame Header]
  - Frame Content Size (optional, 0-8 bytes)
  - Window Size
  - Dictionary ID (optional)
  - Content Checksum flag
[Data Block 1]
  - Block Header (3 bytes: type + size)
  - Block Data (Literals + Sequences sections)
[Data Block 2]
...
[Content Checksum (optional, 4 bytes)]

Zstandard also supports a "skippable frame" format for embedding metadata, and multiple frames can be concatenated.

How to Work With It

Compressing

# Basic compression
zstd file.txt                      # creates file.txt.zst (level 3)
zstd -k file.txt                   # keep original
zstd -19 file.txt                  # high compression (still faster than xz)
zstd --ultra -22 file.txt          # maximum compression
zstd -1 file.txt                   # fastest (faster than gzip -1, better ratio)
zstd -T0 file.txt                  # use all CPU threads

# Create tar.zst
tar --zstd -cf archive.tar.zst folder/
tar cf - folder/ | zstd -T0 -19 > archive.tar.zst

# Long-range matching for large files
zstd --long=31 largefile.dat         # 2 GiB match window (standard levels use at most 8 MiB)
zstd -d --long=31 largefile.dat.zst  # a matching flag is needed to decompress large windows

# Dictionary compression (for many small similar files)
zstd --train -o dict training_data/*
zstd -D dict small_file.json

Decompressing

zstd -d file.txt.zst              # decompress
unzstd file.txt.zst               # same
zstdcat file.txt.zst              # to stdout

# With dictionary
zstd -d -D dict compressed.zst

# Python
import zstandard
dctx = zstandard.ZstdDecompressor()
with open('file.txt.zst', 'rb') as ifh:
    data = dctx.decompress(ifh.read())

Benchmarking and Tuning

# Benchmark a file across multiple levels
zstd -b1 -e19 file.txt            # benchmark levels 1-19

# Adaptive compression (adjusts level to match I/O speed)
zstd --adapt file.txt              # auto-adjusts to I/O bottleneck
zstd --adapt=min=1,max=12 file.txt

# Train dictionary on sample data
zstd --train -o mydict samples/*   # requires many small similar files

Integration Patterns

# Database backup
pg_dump mydb | zstd -T0 -12 > backup.sql.zst

# Log compression
journalctl --since today | zstd > journal.zst

# Network transfer
tar cf - folder/ | zstd -T0 | ssh remote 'zstd -d | tar xf -'

Common Use Cases

  • Filesystem compression: Btrfs and ZFS use zstd natively for transparent compression
  • Database storage: RocksDB, MySQL, ClickHouse use zstd for data compression
  • Package management: Arch Linux (.pkg.tar.zst), Fedora, Ubuntu (apt zstd support)
  • Log compression: Replacing gzip for rotated logs (much faster, better ratio)
  • Data pipelines: Kafka, Spark, Parquet support zstd compression
  • Real-time compression: Network protocols, game assets, streaming data
  • Small data compression: Dictionary mode excels at compressing JSON, protocol buffers

Pros & Cons

Pros

  • Outstanding speed-to-ratio tradeoff at every compression level
  • Decompression is extremely fast (1500+ MB/s) regardless of compression level
  • Wide compression level range (1-22) covers diverse use cases in one algorithm
  • Multi-threaded compression built into the reference implementation
  • Dictionary compression for small data (JSON, protobuf) is uniquely powerful
  • Adaptive mode adjusts compression level to match I/O bandwidth
  • Open source (BSD license), backed by Meta with active development

Cons

  • Less universal than gzip — not all tools/systems support it yet
  • Browser support for HTTP content encoding is recent (Chrome 123+, Firefox 126+) and absent in Safari; Brotli remains more widely supported
  • Maximum compression ratio still slightly below XZ/LZMA2 at highest levels
  • Dictionary training requires representative sample data
  • Relatively young format (since 2016) compared to gzip/bzip2
  • Large decompression window in --long mode can use significant memory

Compatibility

Platform   Support    Notes
Linux      Excellent  Kernel support; all major distros package zstd
macOS      Good       Available via Homebrew (brew install zstd)
Windows    Good       Available as CLI; 7-Zip 23+ supports zstd
Browsers   Partial    Chrome 123+ and Firefox 126+ accept zstd Content-Encoding; Safari does not

Programming languages: Python (zstandard), Node.js (@aspect-build/zstd), Go (github.com/klauspost/compress/zstd), Java (zstd-jni), Rust (zstd crate), C (libzstd reference).

Storage systems: Btrfs, ZFS, SquashFS, Kafka, RocksDB, ClickHouse, PostgreSQL (wal_compression, pg_dump), MySQL/InnoDB.

Practical Usage

Train and use a dictionary for compressing small JSON records

# Collect training samples (need ~100+ representative files)
mkdir samples && cp api_responses/*.json samples/

# Train a dictionary
zstd --train -o api_dict samples/*.json

# Compress individual records using the dictionary (3-5x better than without)
zstd -D api_dict record.json -o record.json.zst

# Decompress (same dictionary required)
zstd -d -D api_dict record.json.zst

Compress a PostgreSQL database dump with streaming zstd

# Backup in custom format; -Z0 disables pg_dump's internal compression so zstd does the work
pg_dump -Fc -Z0 mydb | zstd -T0 -12 > backup.dump.zst

# Restore from compressed backup
zstd -d -c backup.dump.zst | pg_restore -d mydb

# Network transfer with on-the-fly compression
tar cf - /data/ | zstd -T0 -3 | ssh remote 'zstd -d | tar xf - -C /data/'

Benchmark compression levels to find the optimal tradeoff

import zstandard as zstd
import time

data = open("testfile.bin", "rb").read()
print(f"Original: {len(data):,} bytes\n")
print(f"{'Level':>5} {'Size':>12} {'Ratio':>7} {'Compress':>10} {'Decompress':>12}")

for level in [1, 3, 6, 12, 19]:
    cctx = zstd.ZstdCompressor(level=level)
    t0 = time.perf_counter()
    compressed = cctx.compress(data)
    ct = time.perf_counter() - t0

    dctx = zstd.ZstdDecompressor()
    t0 = time.perf_counter()
    dctx.decompress(compressed)
    dt = time.perf_counter() - t0

    ratio = len(data) / len(compressed)
    print(f"{level:>5} {len(compressed):>12,} {ratio:>7.2f}x {ct:>9.3f}s {dt:>11.3f}s")

Anti-Patterns

Using --ultra -22 for routine compression tasks. Maximum compression is extremely slow and memory-intensive with diminishing returns over level 15. Levels 12-15 achieve nearly the same ratio at a fraction of the CPU and memory cost; reserve ultra-high levels for final archival only.

Ignoring dictionary compression for workloads with many small similar records. Compressing thousands of 1 KB JSON objects individually without a dictionary wastes most of zstd's potential because LZ77 has too little context. Dictionary mode can improve compression ratios by 3-5x on small data.

Falling back to gzip out of habit when zstd is available. Zstd at level 1 compresses faster than gzip at level 6 while achieving a better ratio. There is no speed or ratio reason to prefer gzip when zstd is an option in your toolchain.

Compressing already-compressed data (JPEG, MP4, encrypted files) with zstd. High-entropy data cannot be further compressed. Zstd will waste CPU time and may slightly increase file size due to framing overhead. Skip compression for media files and encrypted blobs.

Using zstd for HTTP Content-Encoding without checking client support. Browser support is recent: Chrome 123+ and Firefox 126+ accept zstd Content-Encoding, but Safari and older clients do not. Brotli remains the safer default for web delivery; reserve zstd for server-side data pipelines, filesystems, and database storage unless you negotiate encoding per client.

Related Formats

  • gzip — Slower and lower ratio, but universally supported
  • LZ4 — Even faster compression/decompression but lower ratio (same author)
  • XZ/LZMA — Higher maximum ratio but much slower
  • Brotli — Google's web-focused compressor, supported by browsers
  • Snappy — Google's fast compressor (similar niche to LZ4)
