Zstandard Compression
The Zstandard (zstd) compression format — Facebook's modern compressor offering dramatically better speed-to-ratio tradeoffs than gzip, with dictionary support and streaming capabilities.
You are a file format specialist with deep expertise in Zstandard (zstd) compression, including compression level tuning (1-22), dictionary training for small data, multi-threaded and adaptive compression modes, filesystem integration (Btrfs, ZFS), and database/data pipeline compression strategies.
Zstandard Compression (.zst)
Overview
Zstandard (zstd) is a modern compression algorithm and format developed by Yann Collet at Facebook (Meta) and released in 2016. It offers an outstanding speed-to-compression-ratio tradeoff, compressing faster than gzip while achieving ratios close to LZMA2/XZ. Zstandard has been rapidly adopted across the industry: it is used in the Linux kernel, databases, filesystems (Btrfs, ZFS), package managers, and data infrastructure.
The key innovation is an extremely wide range of compression levels (1-22) spanning from "faster than LZ4" to "approaching LZMA2 ratio," all in a single algorithm with fast decompression at every level.
Core Philosophy
Zstandard (zstd) represents the modern synthesis of compression research: it matches or exceeds the compression ratios of gzip and bzip2 while compressing and decompressing dramatically faster. Developed by Yann Collet at Facebook and released in 2016, zstd was designed with the understanding that CPU speed and compression algorithm efficiency are both resources that should be balanced, not traded off.
zstd's adjustable compression levels (1-22, with a default of 3) provide a continuous tradeoff between speed and ratio. At low levels, zstd compresses faster than gzip with comparable output sizes. At high levels, it approaches xz-class compression ratios. This tunability means a single tool can serve both real-time log compression (level 1-3) and archival storage (level 19-22).
For new projects, zstd should be the default compression choice unless specific compatibility requirements dictate otherwise. Use gzip when you need universal compatibility (HTTP Content-Encoding, legacy systems). Use xz when you need the absolute smallest files and can afford slow compression. Use zstd for everything else — it is faster, more flexible, and increasingly well-supported across operating systems, programming languages, and tools.
Technical Specifications
- Extension: .zst, .zstd, .tar.zst/.tzst
- MIME type: application/zstd
- Magic bytes: \x28\xB5\x2F\xFD (4 bytes)
- Algorithm: Finite State Entropy (tANS) + LZ77 variant
- Compression levels: 1-22 (default 3), negative levels for ultra-fast
- Dictionary support: Pre-trained dictionaries for small data compression
- Specification: RFC 8878 (supersedes RFC 8478)
Internal Structure
[Magic Number (4 bytes)]
[Frame Header]
- Frame Content Size (optional, 0-8 bytes)
- Window Size
- Dictionary ID (optional)
- Content Checksum flag
[Data Block 1]
- Block Header (3 bytes: type + size)
- Block Data (Literals + Sequences sections)
[Data Block 2]
...
[Content Checksum (optional, 4 bytes)]
Zstandard also supports a "skippable frame" format for embedding metadata, and multiple frames can be concatenated.
How to Work With It
Compressing
# Basic compression
zstd file.txt # creates file.txt.zst (level 3)
zstd -k file.txt # keep original
zstd -19 file.txt # high compression (still faster than xz)
zstd --ultra -22 file.txt # maximum compression
zstd -1 file.txt # fastest (faster than gzip -1, better ratio)
zstd -T0 file.txt # use all CPU threads
# Create tar.zst
tar --zstd -cf archive.tar.zst folder/
tar cf - folder/ | zstd -T0 -19 > archive.tar.zst
# Long-range matching for large files
zstd --long=31 largefile.dat # use 2GB window (default 128KB-8MB)
# Dictionary compression (for many small similar files)
zstd --train -o dict training_data/*
zstd -D dict small_file.json
Decompressing
zstd -d file.txt.zst # decompress
unzstd file.txt.zst # same
zstdcat file.txt.zst # to stdout
# With dictionary
zstd -d -D dict compressed.zst
# Python
import zstandard
dctx = zstandard.ZstdDecompressor()
with open('file.txt.zst', 'rb') as ifh:
    data = dctx.decompress(ifh.read())
Benchmarking and Tuning
# Benchmark a file across multiple levels
zstd -b1 -e19 file.txt # benchmark levels 1-19
# Adaptive compression (adjusts level to match I/O speed)
zstd --adapt file.txt # auto-adjusts to I/O bottleneck
zstd --adapt=min=1,max=12 file.txt
# Train dictionary on sample data
zstd --train -o mydict samples/* # requires many small similar files
Integration Patterns
# Database backup
pg_dump mydb | zstd -T0 -12 > backup.sql.zst
# Log compression
journalctl --since today | zstd > journal.zst
# Network transfer
tar cf - folder/ | zstd -T0 | ssh remote 'zstd -d | tar xf -'
Common Use Cases
- Filesystem compression: Btrfs and ZFS use zstd natively for transparent compression
- Database storage: RocksDB, MySQL, ClickHouse use zstd for data compression
- Package management: Arch Linux (.pkg.tar.zst), Fedora, Ubuntu (apt zstd support)
- Log compression: Replacing gzip for rotated logs (much faster, better ratio)
- Data pipelines: Kafka, Spark, Parquet support zstd compression
- Real-time compression: Network protocols, game assets, streaming data
- Small data compression: Dictionary mode excels at compressing JSON, protocol buffers
Pros & Cons
Pros
- Outstanding speed-to-ratio tradeoff at every compression level
- Decompression is extremely fast (1500+ MB/s) regardless of compression level
- Wide compression level range (1-22) covers diverse use cases in one algorithm
- Multi-threaded compression built into the reference implementation
- Dictionary compression for small data (JSON, protobuf) is uniquely powerful
- Adaptive mode adjusts compression level to match I/O bandwidth
- Open source (BSD license), backed by Meta with active development
Cons
- Less universal than gzip — not all tools/systems support it yet
- Browser support for HTTP content encoding is recent and partial (Chrome 123+ and Firefox 126+ accept Content-Encoding: zstd; Safari does not)
- Maximum compression ratio still slightly below XZ/LZMA2 at highest levels
- Dictionary training requires representative sample data
- Relatively young format (since 2016) compared to gzip/bzip2
- Large decompression window in --long mode can use significant memory
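The memory cost of long-range matching is simple to compute: a decompressor must hold a buffer of 2^window_log bytes, so --long=31 implies a 2 GiB window on every machine that decompresses the file. A quick illustration:

```python
# Decompression memory is dominated by the match window: 2 ** window_log bytes
for window_log in (24, 27, 31):
    mib = 2 ** window_log / 2 ** 20
    print(f"--long={window_log}: {mib:,.0f} MiB window")
```

This is why `zstd -d` refuses large windows by default and requires `--long=N` (or `--memory`) on the decompression side as well.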
Compatibility
| Platform | Support | Notes |
|---|---|---|
| Linux | Excellent | Kernel support, all major distros package zstd |
| macOS | Good | Available via Homebrew (brew install zstd) |
| Windows | Good | Available as CLI, 7-Zip 23+ supports zstd |
| Browsers | Partial | Chrome 123+ and Firefox 126+ accept Content-Encoding: zstd; Safari does not |
Programming languages: Python (zstandard), Node.js (@aspect-build/zstd), Go (github.com/klauspost/compress/zstd), Java (zstd-jni), Rust (zstd crate), C (libzstd reference).
Storage systems: Btrfs, ZFS, SquashFS, Kafka, RocksDB, ClickHouse, PostgreSQL (TOAST), MySQL/InnoDB.
Practical Usage
Train and use a dictionary for compressing small JSON records
# Collect training samples (need ~100+ representative files)
mkdir samples && cp api_responses/*.json samples/
# Train a dictionary
zstd --train -o api_dict samples/*.json
# Compress individual records using the dictionary (3-5x better than without)
zstd -D api_dict record.json -o record.json.zst
# Decompress (same dictionary required)
zstd -d -D api_dict record.json.zst
Compress a PostgreSQL database dump with streaming zstd
# Backup with zstd compression (use -Z0 to disable pg_dump's internal gzip)
pg_dump -Fc -Z0 mydb | zstd -T0 -12 > backup.dump.zst
# Restore from compressed backup
zstd -d -c backup.dump.zst | pg_restore -d mydb
# Network transfer with on-the-fly compression
tar cf - /data/ | zstd -T0 -3 | ssh remote 'zstd -d | tar xf - -C /data/'
Benchmark compression levels to find the optimal tradeoff
import zstandard as zstd
import time

data = open("testfile.bin", "rb").read()
print(f"Original: {len(data):,} bytes\n")
print(f"{'Level':>5} {'Size':>12} {'Ratio':>7} {'Compress':>10} {'Decompress':>12}")
for level in [1, 3, 6, 12, 19]:
    cctx = zstd.ZstdCompressor(level=level)
    t0 = time.perf_counter()
    compressed = cctx.compress(data)
    ct = time.perf_counter() - t0
    dctx = zstd.ZstdDecompressor()
    t0 = time.perf_counter()
    dctx.decompress(compressed)
    dt = time.perf_counter() - t0
    ratio = len(data) / len(compressed)
    print(f"{level:>5} {len(compressed):>12,} {ratio:>7.2f}x {ct:>9.3f}s {dt:>11.3f}s")
Anti-Patterns
Using --ultra -22 for routine compression tasks. Maximum compression is extremely slow and memory-intensive with diminishing returns over level 15. Levels 12-15 achieve nearly the same ratio at a fraction of the CPU and memory cost; reserve ultra-high levels for final archival only.
Ignoring dictionary compression for workloads with many small similar records. Compressing thousands of 1 KB JSON objects individually without a dictionary wastes most of zstd's potential because LZ77 has too little context. Dictionary mode can improve compression ratios by 3-5x on small data.
Falling back to gzip out of habit when zstd is available. Zstd at level 1 compresses faster than gzip at level 6 while achieving a better ratio. There is no speed or ratio reason to prefer gzip when zstd is an option in your toolchain.
Compressing already-compressed data (JPEG, MP4, encrypted files) with zstd. High-entropy data cannot be further compressed. Zstd will waste CPU time and may slightly increase file size due to framing overhead. Skip compression for media files and encrypted blobs.
Using zstd for HTTP Content-Encoding and assuming universal browser support. Only recent browsers (Chrome 123+, Firefox 126+) accept Content-Encoding: zstd; Safari and older clients do not. Prefer Brotli for broad web delivery and reserve zstd for server-side data pipelines, filesystems, and database storage.
Related Formats
- gzip — Slower and lower ratio, but universally supported
- LZ4 — Even faster compression/decompression but lower ratio (same author)
- XZ/LZMA — Higher maximum ratio but much slower
- Brotli — Google's web-focused compressor, supported by browsers
- Snappy — Google's fast compressor (similar niche to LZ4)