
XZ/LZMA Compression

The XZ compression format — high-ratio single-stream compression using LZMA2, the modern standard for software distribution as tar.xz on Linux.

Quick Summary

You are a file format specialist with deep expertise in XZ/LZMA2 compression, including compression level tuning, memory-speed-ratio tradeoffs, multi-threaded compression with pixz, tar.xz creation for Linux software distribution, and comparison with gzip, bzip2, and Zstandard.

## Key Points

- **Extension:** `.xz`, `.lzma` (legacy), `.txz` (tar.xz shorthand)
- **MIME type:** `application/x-xz`
- **Magic bytes:** `\xFD7zXZ\x00` (6 bytes)
- **Algorithm:** LZMA2 (LZ77-style dictionary matching + range coding), optionally preceded by BCJ or delta filters
- **Compression levels:** 0-9, default 6; also `-e` extreme flag
- **Integrity checks:** CRC-32, CRC-64 (default), SHA-256
- **Specification:** Published specification at tukaani.org
- **Stream Header:** Magic bytes + Stream Flags + CRC-32
- **Block Header:** compressed size, uncompressed size, filters
- **Compressed Data:** LZMA2 stream
- **Block trailer:** Block Padding + Check (CRC/SHA-256)
- **Index:** Records mapping blocks to uncompressed offsets

## Quick Example

```bash
xz -l file.txt.xz                 # show compression info
xz -lv file.txt.xz                # verbose info (blocks, checks, ratio)
xz -t file.txt.xz                 # test integrity
```

```bash
# Show installed RAM and the current memory usage limits
xz --info-memory
# Approximate single-threaded requirements per preset (per the xz man page):
# Level 9: ~674 MiB compression, ~65 MiB decompression
# Level 6: ~94 MiB compression, ~9 MiB decompression
# Level 3: ~32 MiB compression, ~5 MiB decompression
```


# XZ/LZMA Compression (.xz)

## Overview

XZ is a compression format and command-line tool that uses the LZMA2 algorithm to achieve excellent compression ratios. Developed by Lasse Collin and Igor Pavlov (creator of LZMA/7-Zip), XZ has become a standard compression format for Linux software distribution, largely replacing gzip and bzip2 for source tarballs and package archives.

XZ provides the best compression ratio among commonly used formats for text, source code, and binaries, at the cost of slower compression speed. Decompression is reasonably fast and low-memory.

## Core Philosophy

xz is a compression format that prioritizes maximum compression ratio, using the LZMA2 algorithm to achieve the smallest possible output at the cost of significantly higher CPU and memory usage during compression. In the hierarchy of common Unix compression tools, xz produces the smallest files (followed by bzip2, then gzip), but is also the slowest to compress.

xz's compression advantage makes it the standard choice for distributing large, infrequently-updated files where download bandwidth matters more than compression time: Linux kernel tarballs, distribution package repositories, and software release archives. The compression happens once; the decompression happens thousands or millions of times by downloaders, so optimizing for small file size is the right tradeoff.

For workloads where compression speed matters (log rotation, real-time data pipelines, build systems), use zstd, which approaches xz's compression ratios at dramatically faster speeds. xz remains the right choice when you need the absolute smallest file size and can afford to wait for compression, or when your distribution channel expects .tar.xz format.

## Technical Specifications

- **Extension:** `.xz`, `.lzma` (legacy), `.txz` (tar.xz shorthand)
- **MIME type:** `application/x-xz`
- **Magic bytes:** `\xFD7zXZ\x00` (6 bytes)
- **Algorithm:** LZMA2 (LZ77-style dictionary matching + range coding), optionally preceded by BCJ or delta filters
- **Compression levels:** 0-9, default 6; also `-e` extreme flag
- **Integrity checks:** CRC-32, CRC-64 (default), SHA-256
- **Specification:** Published specification at tukaani.org
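The magic bytes make format detection cheap. The following is a minimal sketch that relies only on the 6-byte signature listed above; `looks_like_xz` is a hypothetical helper name, and a matching signature does not prove that the rest of the file is valid.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # the 6-byte signature from the list above


def looks_like_xz(data: bytes) -> bool:
    """Return True if the buffer starts with the XZ stream signature."""
    return data[:6] == XZ_MAGIC


# Demo on in-memory data so the sketch does not depend on any file.
sample = lzma.compress(b"hello xz")    # the default container is FORMAT_XZ
print(looks_like_xz(sample))           # True
print(looks_like_xz(b"\x1f\x8b\x08"))  # False (this is a gzip signature)
```

For files on disk, reading only the first 6 bytes is enough for this check.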

## Internal Structure

```
[Stream Header (12 bytes)]
  - Magic bytes + Stream Flags + CRC-32
[Block 1]
  - Block Header (compressed size, uncompressed size, filters)
  - Compressed Data (LZMA2 stream)
  - Block Padding + Check (CRC/SHA-256)
[Block 2]
...
[Index]
  - Records mapping blocks to uncompressed offsets
  - Enables random access when used with multiple blocks
[Stream Footer]
  - CRC-32 + Backward Size + Stream Flags + Footer Magic
```
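The header and footer fields in this layout can be checked by hand. The sketch below is an illustration rather than a full parser: it assumes a single stream with no trailing stream padding (true for the output of `lzma.compress`), and the helper names `parse_stream_header` and `parse_stream_footer` are hypothetical.

```python
import lzma
import struct
import zlib

# Check IDs stored in the low 4 bits of the second Stream Flags byte.
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def parse_stream_header(data: bytes) -> str:
    """Validate the 12-byte Stream Header and return the check type name."""
    magic, flags, crc = data[:6], data[6:8], data[8:12]
    if magic != b"\xfd7zXZ\x00":
        raise ValueError("not an XZ stream")
    # The header CRC-32 covers only the two Stream Flags bytes (little-endian).
    if zlib.crc32(flags) != struct.unpack("<I", crc)[0]:
        raise ValueError("stream header CRC mismatch")
    return CHECK_NAMES.get(flags[1] & 0x0F, "unknown")


def parse_stream_footer(data: bytes) -> int:
    """Validate the 12-byte Stream Footer and return the Index size in bytes."""
    footer = data[-12:]
    crc, stored_backward_size = struct.unpack("<II", footer[:8])
    if footer[10:12] != b"YZ":
        raise ValueError("bad footer magic")
    # The footer CRC-32 covers Backward Size plus Stream Flags.
    if zlib.crc32(footer[4:10]) != crc:
        raise ValueError("stream footer CRC mismatch")
    # Backward Size is stored as (real size / 4) - 1.
    return (stored_backward_size + 1) * 4


blob = lzma.compress(b"example payload " * 100)
print(parse_stream_header(blob))   # check type used by this stream
print(parse_stream_footer(blob))   # size of the Index field in bytes
```

The Backward Size value is what lets a reader locate the Index by walking back from the end of the file.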

Multiple streams can be concatenated. The Index enables seeking to specific blocks without decompressing the entire file.
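Stream concatenation can be seen with Python's standard `lzma` module. This is a small sketch on in-memory data; it behaves much like `cat a.xz b.xz | xz -d` on the command line.

```python
import lzma

part_a = lzma.compress(b"first stream\n")
part_b = lzma.compress(b"second stream\n")

# Byte-wise concatenation of two complete .xz streams is itself valid .xz data.
combined = part_a + part_b

# lzma.decompress() processes every stream in the input and joins the results.
print(lzma.decompress(combined))   # b'first stream\nsecond stream\n'
```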

## How to Work With It

### Compressing

```bash
# Compress a file
xz file.txt                       # creates file.txt.xz, removes original
xz -k file.txt                    # keep original
xz -9 file.txt                    # maximum compression
xz -9e file.txt                   # extreme (slower, marginally better)
xz -0 file.txt                    # fastest
xz -T0 file.txt                   # use all CPU threads (multi-threaded LZMA2, xz 5.2+)

# Create tar.xz
tar cJf archive.tar.xz folder/
tar cf - folder/ | xz -9 -T0 > archive.tar.xz   # parallel compression

# Parallel xz (alternative)
pixz -9 file.txt                  # parallel, indexable xz

# Control memory usage
xz -6 --memlimit=512MiB file.txt
```
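The same preset choices are available from Python's standard `lzma` module. A minimal sketch follows, using made-up sample data; note that the stdlib module compresses on a single thread, so there is no counterpart to `-T` here.

```python
import lzma

data = b"The quick brown fox jumps over the lazy dog.\n" * 1000

# Rough counterparts of `xz -6`, `xz -9e`, and `xz -0` from the commands above.
default_out = lzma.compress(data)                                  # preset 6
extreme_out = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)  # like -9e
fastest_out = lzma.compress(data, preset=0)                        # like -0

for label, out in [("default", default_out), ("9e", extreme_out), ("0", fastest_out)]:
    print(f"{label}: {len(data)} -> {len(out)} bytes")

# A stronger integrity check, similar in spirit to `xz --check=sha256`.
# Support depends on how liblzma was built, so test for it first.
if lzma.is_check_supported(lzma.CHECK_SHA256):
    sha_out = lzma.compress(data, check=lzma.CHECK_SHA256)
    assert lzma.decompress(sha_out) == data
```

On data this repetitive all three presets shrink the input drastically; the differences between presets show up more clearly on large, varied inputs.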

### Decompressing

```bash
xz -d file.txt.xz                 # decompress
unxz file.txt.xz                  # same
xzcat file.txt.xz                 # decompress to stdout
```

```python
# Python
import lzma
with lzma.open('file.txt.xz', 'rt') as f:
    content = f.read()
```
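Reading the whole file with `f.read()` is fine for small inputs. For large files, a streaming copy keeps memory use flat. The sketch below is one way to do that, roughly comparable to `xz -dk`; `decompress_file` is a hypothetical helper, and the demo uses a temporary directory so that it is self-contained.

```python
import lzma
import os
import shutil
import tempfile


def decompress_file(src: str, dst: str) -> None:
    """Stream-decompress src (.xz) into dst without loading it all into memory."""
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)   # copies in fixed-size chunks


# Self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt.xz")
    dst = os.path.join(tmp, "file.txt")
    with lzma.open(src, "wb") as f:
        f.write(b"line\n" * 10_000)
    decompress_file(src, dst)
    restored_size = os.path.getsize(dst)

print(restored_size)   # 50000
```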

### Inspecting

```bash
xz -l file.txt.xz                 # show compression info
xz -lv file.txt.xz                # verbose info (blocks, checks, ratio)
xz -t file.txt.xz                 # test integrity
```
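An integrity test can be approximated in Python by decompressing the data and discarding the output. This is only a sketch of the idea behind `xz -t`, not a replacement for it: `is_intact` is a hypothetical helper, and it holds the decompressed result in memory, which is acceptable only for small inputs.

```python
import lzma


def is_intact(blob: bytes) -> bool:
    """Decompress fully and report whether the data and its checks are valid."""
    try:
        lzma.decompress(blob)
        return True
    except lzma.LZMAError:
        return False


good = lzma.compress(b"payload " * 1000)
truncated = good[: len(good) // 2]          # simulate an incomplete download
corrupted = bytearray(good)
corrupted[len(good) // 2] ^= 0xFF           # flip bits in the compressed data

print(is_intact(good))
print(is_intact(truncated))
print(is_intact(bytes(corrupted)))
```

Truncation is reported because the end-of-stream marker is never reached; bit flips are caught either by the decoder or by the stream's integrity check.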

### Memory Considerations

```bash
# Show installed RAM and the current memory usage limits
xz --info-memory
# Approximate single-threaded requirements per preset (per the xz man page):
# Level 9: ~674 MiB compression, ~65 MiB decompression
# Level 6: ~94 MiB compression, ~9 MiB decompression
# Level 3: ~32 MiB compression, ~5 MiB decompression
```
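Decompression memory is driven by the dictionary size chosen at compression time, not by the size of the payload. The sketch below illustrates this with the `memlimit` parameter of `lzma.LZMADecompressor`; the 16 MiB budget and the helper name `fits_in_limit` are assumptions made for the example.

```python
import lzma

payload = b"memory test " * 1000
big_dict = lzma.compress(payload, preset=9)    # preset 9 uses a 64 MiB dictionary
small_dict = lzma.compress(payload, preset=0)  # preset 0 uses a 256 KiB dictionary

LIMIT = 16 * 1024 * 1024   # hypothetical 16 MiB budget for the decoder


def fits_in_limit(blob: bytes, limit: int) -> bool:
    """Return True if blob can be decoded within `limit` bytes of memory."""
    try:
        lzma.LZMADecompressor(memlimit=limit).decompress(blob)
        return True
    except lzma.LZMAError:
        return False


print(fits_in_limit(small_dict, LIMIT))
print(fits_in_limit(big_dict, LIMIT))
```

Even though both inputs hold the same 12 KB of text, the preset 9 stream should be rejected under this budget because its header declares the larger dictionary. This is why files meant for constrained devices are often compressed at a lower preset.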

## Common Use Cases

- **Linux source distribution:** `.tar.xz` is the standard for the kernel, GNU tools, and most projects
- **Linux package managers:** Debian `.deb` packages use xz internally; Arch Linux shipped `.pkg.tar.xz` packages until it moved to Zstandard (`.pkg.tar.zst`) in 2020, and recent Ubuntu releases have also moved to Zstandard
- **Firmware images:** Compressed firmware and initramfs images
- **Archival:** Long-term storage where compression time is less important than size
- **Man pages:** Compressed with xz on many distributions
- **Data distribution:** Scientific datasets and database dumps where size matters

## Pros & Cons

### Pros

- Excellent compression ratio, typically 10-30% smaller than gzip and 5-15% smaller than bzip2
- Fast decompression relative to compression time
- Low memory usage for decompression (important for embedded/constrained systems)
- Multi-threaded compression available (`-T` flag in xz 5.2+)
- Built-in integrity checking (CRC-64 default, optional SHA-256)
- Block-based format whose index enables random access when a file contains multiple blocks
- Standard on modern Linux distributions

### Cons

- Very slow compression, especially at high levels (`-9e` can be 10-20x slower than gzip)
- High memory usage during compression (about 674 MiB at level 9)
- Not suitable for real-time or streaming compression
- No encryption support
- Decompression still slower than gzip or Zstandard
- The 2024 supply-chain attack on xz-utils (CVE-2024-3094, a backdoor shipped in releases 5.6.0 and 5.6.1) temporarily damaged trust

## Compatibility

| Platform | Native Support | Notes |
|----------|----------------|-------|
| Linux | Yes | `xz-utils` pre-installed on virtually all distributions |
| macOS | Yes | Available via Homebrew, pre-installed on some versions |
| Windows | Via tools | 7-Zip (native LZMA support), Git Bash, WSL |
| FreeBSD | Yes | Pre-installed |

Programming languages: Python (`lzma` in stdlib since 3.3), Node.js (`lzma-native`), Go (`github.com/ulikunitz/xz`), Java (Apache Commons Compress, `xz-java`), Rust (`xz2`/`liblzma`), C (`liblzma`).

## Practical Usage

### Create a highly compressed tar.xz archive with multi-threading

```bash
# Compress a source tree with maximum compression and all CPU cores
tar cf - linux-6.8/ | xz -9e -T0 --memlimit=4GiB > linux-6.8.tar.xz

# Verify the archive integrity
xz -t linux-6.8.tar.xz && echo "Integrity OK"

# Show compression statistics
xz -lv linux-6.8.tar.xz
```
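A tar.xz archive can also be produced from Python with the standard `tarfile` module, which accepts a `preset` keyword for the `w:xz` mode. The sketch below is single-threaded, so it does not replace `xz -T0` for large trees; `make_tar_xz` and the `project` directory are illustrative names.

```python
import os
import tarfile
import tempfile


def make_tar_xz(source_dir: str, archive_path: str, preset: int = 6) -> None:
    """Pack source_dir into a .tar.xz archive at the given xz preset (0-9)."""
    with tarfile.open(archive_path, "w:xz", preset=preset) as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(source_dir, arcname=os.path.basename(source_dir))


# Self-contained demo: build a tiny tree, archive it, list the members.
with tempfile.TemporaryDirectory() as tmp:
    tree = os.path.join(tmp, "project")
    os.makedirs(tree)
    with open(os.path.join(tree, "README"), "w") as f:
        f.write("hello\n")
    archive = os.path.join(tmp, "project.tar.xz")
    make_tar_xz(tree, archive, preset=9)
    with tarfile.open(archive, "r:xz") as tar:
        members = tar.getnames()

print(members)   # ['project', 'project/README']
```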

### Decompress and process XZ data in Python

```python
import lzma
import tarfile

# Read a compressed text file line by line
with lzma.open('logfile.txt.xz', 'rt', encoding='utf-8') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.strip())

# Extract a tar.xz archive programmatically
with tarfile.open('package.tar.xz', 'r:xz') as tar:
    # filter='data' (Python 3.12+, also backported to earlier security
    # releases) rejects unsafe members such as absolute paths
    tar.extractall(path='./extracted/', filter='data')
    print(f"Extracted {len(tar.getnames())} members")
```

### Compare compression ratios across formats

```bash
# Benchmark a file across gzip, bzip2, xz, and zstd
FILE="data.tar"
echo "Original: $(wc -c < "$FILE") bytes"
for tool in "gzip -9" "bzip2 -9" "xz -9" "zstd -19"; do
  name=${tool%% *}
  # $tool is left unquoted on purpose so the level flag becomes its own argument;
  # -c writes to stdout, so no temporary files need to be created or cleaned up
  size=$($tool -c "$FILE" 2>/dev/null | wc -c)
  echo "$name: $size bytes"
done
```
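A similar comparison can be sketched with Python's standard library alone. The sample below is synthetic log-like text, so the numbers it prints say little about real workloads, and Zstandard is left out because this sketch sticks to modules that ship with every recent Python.

```python
import bz2
import gzip
import lzma

# Synthetic, highly repetitive sample; real data will give different numbers.
sample = b"".join(
    b"2024-01-01 INFO request id=%d status=200\n" % i for i in range(20_000)
)

codecs = {
    "gzip -9": lambda d: gzip.compress(d, compresslevel=9),
    "bzip2 -9": lambda d: bz2.compress(d, 9),
    "xz -9": lambda d: lzma.compress(d, preset=9),
}

sizes = {name: len(fn(sample)) for name, fn in codecs.items()}

print(f"original: {len(sample)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```

Running the comparison on a representative slice of your own data is more informative than relying on general ratio claims.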

## Anti-Patterns

Using xz -9e for compressing data that will be frequently recompressed or updated. Extreme presets are substantially slower than the default and usually save only a few extra percent. Use -6 (default) or -3 for iterative workflows; reserve -9e for final release artifacts that are compressed once and decompressed many times.

Compressing already-compressed files (JPEG, MP4, ZIP) with xz. XZ cannot meaningfully compress data that is already compressed and may actually increase the file size due to framing overhead. Only use xz on compressible content like text, source code, binaries, and uncompressed data.
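The size increase on incompressible input is easy to observe. In the sketch below, random bytes from `os.urandom` stand in for already-compressed data such as JPEG or MP4 content, which is an assumption made to keep the example self-contained.

```python
import lzma
import os

random_blob = os.urandom(1 << 20)              # stand-in for already-compressed data
text_blob = b"compressible text line\n" * 50_000

random_out = lzma.compress(random_blob)
text_out = lzma.compress(text_blob)

print(f"random: {len(random_blob)} -> {len(random_out)} bytes")
print(f"text:   {len(text_blob)} -> {len(text_out)} bytes")

# Container headers, chunk headers, and the integrity check are pure overhead
# when the payload itself cannot be reduced.
grew = len(random_out) > len(random_blob)
print(grew)   # True
```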

Running xz -9 on a system with limited RAM without setting a memory limit. Level 9 compression requires approximately 674 MiB of RAM in single-threaded mode, and multi-threaded compression (-T) needs considerably more. On constrained systems (containers, CI runners, embedded), this can trigger OOM kills. Always use --memlimit or choose a lower compression level appropriate for the available memory.

Using xz for real-time or streaming compression pipelines. XZ's compression is inherently slow and latency-heavy, making it unsuitable for real-time data streams or interactive applications. Use Zstandard or LZ4 for low-latency streaming; use xz only for batch archival tasks.

Deploying xz-compressed assets for web delivery instead of Brotli or Zstandard. Browsers have no native support for xz in HTTP Content-Encoding. Use Brotli for web content compression or Zstandard where supported; xz is only appropriate for downloadable archive files.

## Related Formats

- **LZMA:** The predecessor format (`.lzma`), lacks the container features of XZ
- **7z:** Uses the same LZMA2 algorithm in an archive container
- **gzip:** Much faster but significantly worse ratio
- **Zstandard:** Modern alternative with much faster compression at comparable ratios
- **Brotli:** Google's format, focused on web content compression
- **bzip2:** Older alternative; XZ compresses better and decompresses faster

Details

Pack: file-formats-skills
File: xz.md
Lines: 216
Category: Technology & Engineering
