XZ/LZMA Compression
The XZ compression format — high-ratio single-stream compression using LZMA2, the modern standard for software distribution as tar.xz on Linux.
You are a file format specialist with deep expertise in XZ/LZMA2 compression, including compression level tuning, memory-speed-ratio tradeoffs, multi-threaded compression with pixz, tar.xz creation for Linux software distribution, and comparison with gzip, bzip2, and Zstandard.
XZ/LZMA Compression (.xz)
Overview
XZ is a compression format and tool that uses the LZMA2 algorithm to achieve excellent compression ratios. XZ Utils was developed by Lasse Collin and builds on the LZMA algorithm created by Igor Pavlov for 7-Zip. XZ has become a standard compression format for Linux software distribution, displacing gzip and bzip2 for many source tarballs and package archives.
XZ provides the best compression ratio among commonly used formats for text, source code, and binaries, at the cost of slower compression speed. Decompression is reasonably fast and low-memory.
Core Philosophy
xz is a compression format that prioritizes maximum compression ratio, using the LZMA2 algorithm to achieve the smallest possible output at the cost of significantly higher CPU and memory usage during compression. In the hierarchy of common Unix compression tools, xz produces the smallest files (followed by bzip2, then gzip), but is also the slowest to compress.
xz's compression advantage makes it the standard choice for distributing large, infrequently-updated files where download bandwidth matters more than compression time: Linux kernel tarballs, distribution package repositories, and software release archives. The compression happens once; the decompression happens thousands or millions of times by downloaders, so optimizing for small file size is the right tradeoff.
For workloads where compression speed matters (log rotation, real-time data pipelines, build systems), use zstd, which approaches xz's compression ratios at dramatically faster speeds. xz remains the right choice when you need the absolute smallest file size and can afford to wait for compression, or when your distribution channel expects .tar.xz format.
Technical Specifications
- Extension: .xz, .lzma (legacy), .txz (tar.xz shorthand)
- MIME type: application/x-xz
- Magic bytes: \xFD7zXZ\x00 (6 bytes)
- Algorithm: LZMA2 (LZ77-style dictionary matching + range coding), optionally preceded by delta or BCJ filters in the filter chain
- Compression levels: 0-9, default 6; also the -e extreme flag
- Integrity checks: none, CRC-32, CRC-64 (default), SHA-256
- Specification: Published specification at tukaani.org
Internal Structure
[Stream Header (12 bytes)]
- Magic bytes + Stream Flags + CRC-32
[Block 1]
- Block Header (compressed size, uncompressed size, filters)
- Compressed Data (LZMA2 stream)
- Block Padding + Check (CRC/SHA-256)
[Block 2]
...
[Index]
- Records each block's unpadded size and uncompressed size, from which block offsets are derived
- Enables random access when used with multiple blocks
[Stream Footer]
- CRC-32 + Backward Size + Stream Flags + Footer Magic
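Multiple streams can be concatenated, and a decoder treats them as one continuous output. The Index enables seeking to specific blocks without decompressing the entire file, but only when the file actually contains more than one block: single-threaded xz writes a single block by default, so random access requires multi-threaded mode, --block-size, or a tool such as pixz.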
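The Stream Header layout above can be checked by hand. The following is a minimal sketch, not a full parser: it validates the 6 magic bytes, reads the 2 Stream Flags bytes, verifies their little-endian CRC-32, and reports the check type. The helper name parse_stream_header is hypothetical; the sample data is produced with Python's lzma module.

```python
import lzma
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"
# Check IDs defined by the .xz format (low 4 bits of the second flag byte)
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def parse_stream_header(header: bytes) -> str:
    """Validate a 12-byte Stream Header and return the name of its check type."""
    if len(header) < 12 or header[:6] != XZ_MAGIC:
        raise ValueError("not an xz stream")
    flags = header[6:8]
    (stored_crc,) = struct.unpack("<I", header[8:12])  # little-endian CRC-32
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("Stream Header CRC-32 mismatch")
    return CHECK_NAMES.get(flags[1] & 0x0F, "unknown")


# Two independent streams with different integrity checks
stream_a = lzma.compress(b"first ", check=lzma.CHECK_CRC32)
stream_b = lzma.compress(b"second", check=lzma.CHECK_CRC64)
print(parse_stream_header(stream_a[:12]))
print(parse_stream_header(stream_b[:12]))

# Concatenated streams decode as one continuous output
print(lzma.decompress(stream_a + stream_b))
```

Only the header is inspected here; Block Headers, the Index, and the Stream Footer use variable-length fields and are better left to liblzma.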
How to Work With It
Compressing
# Compress a file
xz file.txt # creates file.txt.xz, removes original
xz -k file.txt # keep original
xz -9 file.txt # maximum compression
xz -9e file.txt # extreme (slower, marginally better)
xz -0 file.txt # fastest
xz -T 0 file.txt # use all CPU threads (multi-threaded LZMA2)
# Create tar.xz
tar cJf archive.tar.xz folder/
tar cf - folder/ | xz -9 -T 0 > archive.tar.xz # parallel compression
# Parallel xz (alternative)
pixz -9 file.txt # parallel, indexable xz
# Control memory usage
xz -6 --memlimit=512MiB file.txt
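The same preset choices are available from Python's lzma module. This is a small sketch on synthetic, highly repetitive data, so the sizes it prints are not representative of real files. The extreme flag is shown on preset 6 rather than 9 to keep memory use modest.

```python
import lzma

data = b"The quick brown fox jumps over the lazy dog.\n" * 20_000

# Preset 0 (fastest), 6 (the xz default), and 6 with the extreme flag.
# Preset 9 works the same way but needs far more memory to compress.
presets = {"-0": 0, "-6": 6, "-6e": 6 | lzma.PRESET_EXTREME}

sizes = {}
for label, preset in presets.items():
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    assert lzma.decompress(compressed) == data  # round trip must be lossless
    sizes[label] = len(compressed)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")
```

Python's lzma module compresses on a single thread, so it has no counterpart to the -T option.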
Decompressing
xz -d file.txt.xz # decompress
unxz file.txt.xz # same
xzcat file.txt.xz # decompress to stdout
# Python
import lzma
with lzma.open('file.txt.xz', 'rt') as f:
    content = f.read()
Inspecting
xz -l file.txt.xz # show compression info
xz -lv file.txt.xz # verbose info (blocks, checks, ratio)
xz -t file.txt.xz # test integrity
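An integrity test similar in spirit to xz -t can be approximated in Python by decompressing the whole file and discarding the output. This is a sketch under that assumption; the function name xz_is_intact and the temporary file names are hypothetical.

```python
import lzma
import os
import tempfile


def xz_is_intact(path, chunk_size=1 << 20):
    """Return True if the file decompresses fully without errors."""
    try:
        with lzma.open(path, "rb") as f:
            while f.read(chunk_size):  # discard output, keep memory use flat
                pass
    except (lzma.LZMAError, EOFError):
        # LZMAError: corrupt data or failed check; EOFError: truncated file
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.xz")
    bad = os.path.join(tmp, "truncated.xz")
    payload = lzma.compress(b"some log line\n" * 1000)
    with open(good, "wb") as f:
        f.write(payload)
    with open(bad, "wb") as f:
        f.write(payload[:-8])  # cut into the Stream Footer
    good_ok = xz_is_intact(good)
    bad_ok = xz_is_intact(bad)

print(good_ok, bad_ok)
```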
Memory Considerations
# Show installed RAM and the current memory usage limits
xz --info-memory
# Approximate single-threaded requirements per preset (from the xz man page):
# Level 9: ~674 MiB compression, ~65 MiB decompression
# Level 6: ~94 MiB compression, ~9 MiB decompression
# Level 3: ~32 MiB compression, ~5 MiB decompression
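Decompression memory can also be capped from Python through the memlimit argument of lzma.decompress. The sketch below assumes a payload compressed at preset 6, whose 8 MiB dictionary does not fit under a 1 MiB limit; the helper try_decompress is hypothetical.

```python
import lzma

MiB = 1024 * 1024
# Preset 6 records an 8 MiB dictionary in the stream, regardless of input size
payload = lzma.compress(b"x" * 100_000, preset=6)


def try_decompress(blob, limit):
    """Return the decompressed bytes, or None if the memory limit is too low."""
    try:
        return lzma.decompress(blob, memlimit=limit)
    except lzma.LZMAError:
        return None


generous = try_decompress(payload, 64 * MiB)
tight = try_decompress(payload, 1 * MiB)
print("64 MiB limit:", "ok" if generous is not None else "refused")
print("1 MiB limit:", "ok" if tight is not None else "refused")
```

A limit like this is useful when decompressing untrusted input, since the required memory is dictated by the stream rather than by the reader.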
Common Use Cases
- Linux source distribution: .tar.xz is the standard for kernel, GNU tools, and most projects
- Linux package managers: Arch Linux packages (.pkg.tar.xz, the format used before Arch switched to zstd), Debian/Ubuntu (.deb can use xz internally)
- Firmware images: Compressed firmware and initramfs images
- Archival: Long-term storage where compression time is less important than size
- Man pages: Compressed with xz on some distributions
- Data distribution: Scientific datasets, database dumps where size matters
Pros & Cons
Pros
- Excellent compression ratio — typically 10-30% smaller than gzip, 5-15% smaller than bzip2
- Fast decompression relative to compression time
- Low memory usage for decompression (important for embedded/constrained systems)
- Multi-threaded compression available (-T flag in xz 5.2+)
- Built-in integrity checking (CRC-64 default, optional SHA-256)
- Block-based format with index enables random access
- Standard on modern Linux distributions
Cons
- Very slow compression, especially at high levels (-9e can be 10-20x slower than gzip)
- High memory usage during compression (674 MB at level 9)
- Not suitable for real-time or streaming compression
- No encryption support
- Decompression still slower than gzip or Zstandard
- 2024 supply-chain attack on xz-utils (CVE-2024-3094): a backdoor shipped in releases 5.6.0 and 5.6.1 temporarily damaged trust in the project
Compatibility
| Platform | Native Support | Notes |
|---|---|---|
| Linux | Yes | xz-utils pre-installed on virtually all distributions |
| macOS | Yes | Available via Homebrew, pre-installed on some versions |
| Windows | Via tools | 7-Zip (native LZMA support), Git Bash, WSL |
| FreeBSD | Yes | Pre-installed |
Programming languages: Python (lzma in stdlib since 3.3), Node.js (lzma-native), Go (github.com/ulikunitz/xz), Java (Apache Commons Compress, xz-java), Rust (xz2/liblzma), C (liblzma).
Practical Usage
Create a highly compressed tar.xz archive with multi-threading
# Compress a source tree with maximum compression and all CPU cores
tar cf - linux-6.8/ | xz -9e -T0 --memlimit=4GiB > linux-6.8.tar.xz
# Verify the archive integrity
xz -t linux-6.8.tar.xz && echo "Integrity OK"
# Show compression statistics
xz -lv linux-6.8.tar.xz
Decompress and process XZ data in Python
import lzma
import tarfile

# Read a compressed text file line by line
with lzma.open('logfile.txt.xz', 'rt', encoding='utf-8') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.strip())

# Extract a tar.xz archive programmatically
# (for untrusted archives on Python 3.12+, pass filter='data' to extractall)
with tarfile.open('package.tar.xz', 'r:xz') as tar:
    tar.extractall(path='./extracted/')
    print(f"Extracted {len(tar.getnames())} files")
Compare compression ratios across formats
# Benchmark a file across gzip, bzip2, xz, and zstd
FILE="data.tar"
echo "Original: $(wc -c < "$FILE") bytes"
for tool in "gzip -9" "bzip2 -9" "xz -9" "zstd -19"; do
  name=${tool%% *}
  # -c writes to stdout, so the input is untouched and no temp files are left behind
  size=$($tool -c "$FILE" 2>/dev/null | wc -c)
  echo "$name: $size bytes"
done
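A comparable check can be sketched in Python with the standard-library gzip, bz2, and lzma modules. Zstandard is left out on the assumption that the Python version in use has no built-in module for it. The sample data is synthetic log text, so the figures only illustrate the method.

```python
import bz2
import gzip
import lzma

# Synthetic log-like sample; real data will give different ratios
sample = b"".join(
    b"2024-01-01 12:00:%02d INFO request id=%d served\n" % (i % 60, i)
    for i in range(50_000)
)

codecs = {
    "gzip -9": lambda d: gzip.compress(d, compresslevel=9),
    "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
    "xz -6": lambda d: lzma.compress(d, preset=6),
}

sizes = {name: len(compress(sample)) for name, compress in codecs.items()}
print(f"original: {len(sample)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```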
Anti-Patterns
Using xz -9e for compressing data that will be frequently recompressed or updated. The extreme variant is considerably slower than the default -6 and typically saves only a few extra percent, sometimes nothing at all. Use -6 (default) or -3 for iterative workflows; reserve -9e for final release artifacts that will be compressed once and decompressed many times.
Compressing already-compressed files (JPEG, MP4, ZIP) with xz. XZ cannot meaningfully compress data that is already compressed and may actually increase the file size due to framing overhead. Only use xz on compressible content like text, source code, binaries, and uncompressed data.
Running xz -9 on a system with limited RAM without setting a memory limit. Level 9 compression requires approximately 674 MB of RAM. On constrained systems (containers, CI runners, embedded), this can trigger OOM kills. Always use --memlimit or choose a lower compression level appropriate for available memory.
Using xz for real-time or streaming compression pipelines. XZ's compression is inherently slow and latency-heavy, making it unsuitable for real-time data streams or interactive applications. Use Zstandard or LZ4 for low-latency streaming; use xz only for batch archival tasks.
Deploying xz-compressed assets for web delivery instead of Brotli or Zstandard. Browsers have no native support for xz in HTTP Content-Encoding. Use Brotli for web content compression or Zstandard where supported; xz is only appropriate for downloadable archive files.
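The point about already-compressed input can be demonstrated with a short sketch. Random bytes stand in for JPEG, MP4, or ZIP payloads here, which is an assumption: real media files are not perfectly random, but they behave similarly under a general-purpose compressor.

```python
import lzma
import os

random_blob = os.urandom(1 << 20)  # stand-in for already-compressed data
text_blob = b"int main(void) { return 0; }\n" * 40_000

results = {}
for label, blob in (("random", random_blob), ("text", text_blob)):
    out = lzma.compress(blob)
    results[label] = (len(blob), len(out))
    print(f"{label}: {len(blob)} -> {len(out)} bytes ({len(out) / len(blob):.1%})")
```

The random input comes out slightly larger than it went in because of container and chunk framing, while the repetitive text shrinks dramatically.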
Related Formats
- LZMA: The predecessor format (.lzma), lacks the container features of XZ
- 7z: Uses the same LZMA2 algorithm in an archive container
- gzip: Much faster but significantly worse ratio
- Zstandard: Modern alternative with much faster compression at comparable ratios
- Brotli: Google's format, focused on web content compression
- bzip2: Older alternative, XZ compresses better and decompresses faster