
XZ/LZMA Compression

The XZ compression format — high-ratio single-stream compression using LZMA2, the modern standard for software distribution as tar.xz on Linux.


You are a file format specialist with deep expertise in XZ/LZMA2 compression, including compression level tuning, memory-speed-ratio tradeoffs, multi-threaded compression with pixz, tar.xz creation for Linux software distribution, and comparison with gzip, bzip2, and Zstandard.

XZ/LZMA Compression (.xz)

Overview

XZ is a compression format and tool that uses the LZMA2 algorithm to achieve excellent compression ratios. Designed by Lasse Collin with Igor Pavlov (creator of LZMA and 7-Zip), XZ became a standard compression format for Linux software distribution, largely displacing gzip and bzip2 for source tarballs and many package archives.

XZ provides the best compression ratio among commonly used formats for text, source code, and binaries, at the cost of slower compression speed. Decompression is reasonably fast and low-memory.

Core Philosophy

xz is a compression format that prioritizes maximum compression ratio, using the LZMA2 algorithm to achieve the smallest possible output at the cost of significantly higher CPU and memory usage during compression. In the hierarchy of common Unix compression tools, xz produces the smallest files (followed by bzip2, then gzip), but is also the slowest to compress.

xz's compression advantage makes it the standard choice for distributing large, infrequently-updated files where download bandwidth matters more than compression time: Linux kernel tarballs, distribution package repositories, and software release archives. The compression happens once; the decompression happens thousands or millions of times by downloaders, so optimizing for small file size is the right tradeoff.

For workloads where compression speed matters (log rotation, real-time data pipelines, build systems), use zstd, which approaches xz's compression ratios at dramatically faster speeds. xz remains the right choice when you need the absolute smallest file size and can afford to wait for compression, or when your distribution channel expects .tar.xz format.

Technical Specifications

  • Extension: .xz, .lzma (legacy), .txz (tar.xz shorthand)
  • MIME type: application/x-xz
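  • Magic bytes: FD 37 7A 58 5A 00, i.e. \xFD7zXZ\x00 (6 bytes)
  • Algorithm: LZMA2 (LZ77-style dictionary matching + range coding); optional delta and BCJ filters can precede it in the filter chain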
  • Compression levels: 0-9, default 6; also -e extreme flag
  • Integrity checks: CRC-32, CRC-64 (default), SHA-256
  • Specification: Published specification at tukaani.org
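
The magic bytes listed above can be checked in a few lines of Python. This is a minimal sketch, assuming the data is already in memory; the helper name `looks_like_xz` is made up for illustration and is not part of any library.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # FD 37 7A 58 5A 00


def looks_like_xz(data: bytes) -> bool:
    """Return True if the buffer starts with the 6-byte XZ magic."""
    return data[:6] == XZ_MAGIC


compressed = lzma.compress(b"hello world")  # .xz container is the default format
print(looks_like_xz(compressed))            # True
print(looks_like_xz(b"\x1f\x8b\x08\x00"))   # False (this is the gzip magic)
```

A magic-byte match only identifies the container; it says nothing about whether the rest of the file is intact.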

Internal Structure

[Stream Header (12 bytes)]
  - Magic bytes + Stream Flags + CRC-32
[Block 1]
  - Block Header (compressed size, uncompressed size, filters)
  - Compressed Data (LZMA2 stream)
  - Block Padding + Check (CRC/SHA-256)
[Block 2]
...
[Index]
  - Records each block's unpadded (compressed) size and uncompressed size, from which block offsets are computed
  - Enables random access when used with multiple blocks
[Stream Footer]
  - CRC-32 + Backward Size + Stream Flags + Footer Magic

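Multiple streams can be concatenated. The Index enables seeking to specific blocks without decompressing the entire file, but only when the file actually contains more than one block: single-threaded xz writes a single block unless --block-size is given, while multi-threaded mode (-T) and pixz split the input into blocks.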
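
As a rough illustration of the layout above, the following Python sketch validates the 12-byte Stream Header and Stream Footer of an in-memory .xz stream. It assumes a single stream with no trailing padding; the function names and the `CHECK_NAMES` table are illustrative, not a library API.

```python
import lzma
import struct
import zlib

# Check IDs stored in the low four bits of the second Stream Flags byte
CHECK_NAMES = {0x00: "None", 0x01: "CRC-32", 0x04: "CRC-64", 0x0A: "SHA-256"}


def parse_stream_header(data: bytes) -> str:
    """Validate the Stream Header and return the name of the integrity check."""
    magic, flags, crc = struct.unpack("<6s2sI", data[:12])
    if magic != b"\xfd7zXZ\x00":
        raise ValueError("not an XZ stream")
    if zlib.crc32(flags) != crc:  # CRC-32 covers only the two flag bytes
        raise ValueError("Stream Header CRC-32 mismatch")
    return CHECK_NAMES.get(flags[1] & 0x0F, "reserved")


def parse_stream_footer(data: bytes) -> int:
    """Validate the Stream Footer and return the Index size in bytes."""
    crc, backward_size, flags, magic = struct.unpack("<II2s2s", data[-12:])
    if magic != b"YZ":
        raise ValueError("missing footer magic")
    if zlib.crc32(data[-8:-2]) != crc:  # covers Backward Size + Stream Flags
        raise ValueError("Stream Footer CRC-32 mismatch")
    return (backward_size + 1) * 4  # stored value is (real size / 4) - 1


blob = lzma.compress(b"example payload")
print(parse_stream_header(blob))  # CRC-64, the default check
print(parse_stream_footer(blob))  # size of the Index that precedes the footer
```

Reading the footer first and stepping back by the Index size is how a reader can locate blocks without scanning the compressed data.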

How to Work With It

Compressing

```bash
# Compress a file
xz file.txt                       # creates file.txt.xz, removes original
xz -k file.txt                    # keep original
xz -9 file.txt                    # maximum compression
xz -9e file.txt                   # extreme (slower, marginally better)
xz -0 file.txt                    # fastest
xz -T 0 file.txt                  # use all CPU threads (multi-threaded LZMA2)

# Create tar.xz
tar cJf archive.tar.xz folder/
tar cf - folder/ | xz -9 -T 0 > archive.tar.xz  # parallel compression

# Parallel xz (alternative)
pixz -9 file.txt                   # parallel, indexable xz

# Control memory usage
xz -6 --memlimit=512MiB file.txt
```
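
The same presets are available from Python's standard `lzma` module. The sketch below is one possible equivalent of `xz -k` with an optional extreme flag; `xz_compress_file` is a hypothetical helper name, and the file names in the usage lines are placeholders.

```python
import lzma
import shutil


def xz_compress_file(src: str, dst: str, preset: int = 6, extreme: bool = False) -> None:
    """Compress src into dst and leave src in place (similar to `xz -k`)."""
    if extreme:
        preset |= lzma.PRESET_EXTREME  # corresponds to the CLI -e flag
    with open(src, "rb") as fin, lzma.open(dst, "wb", preset=preset) as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine


if __name__ == "__main__":
    with open("file.txt", "wb") as f:          # placeholder input
        f.write(b"sample line\n" * 10_000)
    xz_compress_file("file.txt", "file.txt.xz", preset=6, extreme=True)
```

The standard library module is single-threaded, so this does not reproduce `-T 0`; for parallel compression the CLI remains the simpler route.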

Decompressing

```bash
xz -d file.txt.xz                 # decompress
unxz file.txt.xz                  # same
xzcat file.txt.xz                 # decompress to stdout
```

```python
# Python
import lzma
with lzma.open('file.txt.xz', 'rt') as f:
    content = f.read()
```

Inspecting

```bash
xz -l file.txt.xz                 # show compression info
xz -lv file.txt.xz                # verbose info (blocks, checks, ratio)
xz -t file.txt.xz                 # test integrity
```
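
Where the CLI is not available, an integrity test along the lines of `xz -t` can be approximated in Python by decompressing the whole file and discarding the output. This is a sketch under that assumption; `xz_is_intact` is an illustrative name, and unlike `xz -t` it reports no diagnostics.

```python
import lzma


def xz_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file, discarding output; return False on any error."""
    try:
        with lzma.open(path, "rb") as f:
            while f.read(chunk_size):  # read in chunks to keep memory flat
                pass
    except (lzma.LZMAError, EOFError):
        # LZMAError: corrupt data or failed check; EOFError: truncated file
        return False
    return True
```

Typical use would be `xz_is_intact("file.txt.xz")` before unpacking a download.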

Memory Considerations

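```bash
# Show installed RAM and the current memory usage limits (not per-preset needs)
xz --info-memory
# Per-preset requirements from the xz man page, single-threaded:
# Level 9: ~674 MiB compression, ~65 MiB decompression
# Level 6: ~94 MiB compression, ~9 MiB decompression
# Level 3: ~32 MiB compression, ~5 MiB decompression
# With -T, compression memory grows with the number of threads
```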
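
On the decompression side, Python's `lzma.LZMADecompressor` accepts a `memlimit` argument that behaves much like a decompression memory limit on the CLI. The sketch below assumes a single in-memory stream; `decompress_within` is a hypothetical helper, and the limits are arbitrary illustrative values.

```python
import lzma
from typing import Optional

MiB = 1024 * 1024

# Preset 6 declares an 8 MiB dictionary, so the decoder needs roughly 9 MiB
# no matter how small the payload is.
packed = lzma.compress(b"example payload " * 1000, preset=6)


def decompress_within(data: bytes, limit_bytes: int) -> Optional[bytes]:
    """Decompress one XZ stream, or return None if it cannot be done within the limit."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=limit_bytes)
    try:
        return decoder.decompress(data)
    except lzma.LZMAError:
        return None  # also returned for corrupt input, not only for the limit


print(decompress_within(packed, 1 * MiB) is None)       # limit below the decoder's needs
print(decompress_within(packed, 64 * MiB) is not None)  # generous limit
```

This is why the preset chosen at compression time matters on constrained targets: the dictionary size is fixed in the file, and the decoder has to allocate it.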

Common Use Cases

  • Linux source distribution: .tar.xz is the standard for kernel, GNU tools, and most projects
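  • Linux package managers: Debian .deb packages use xz internally by default (Ubuntu did the same before moving to Zstandard); Arch Linux shipped .pkg.tar.xz until it switched to Zstandard (.pkg.tar.zst) in early 2020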
  • Firmware images: Compressed firmware and initramfs images
  • Archival: Long-term storage where compression time is less important than size
  • Man pages: Compressed with xz on some distributions (gzip remains more common)
  • Data distribution: Scientific datasets, database dumps where size matters

Pros & Cons

Pros

  • Excellent compression ratio — typically 10-30% smaller than gzip, 5-15% smaller than bzip2
  • Fast decompression relative to compression time
  • Low memory usage for decompression (important for embedded/constrained systems)
  • Multi-threaded compression available (-T flag in xz 5.2+)
  • Built-in integrity checking (CRC-64 default, optional SHA-256)
  • Block-based format with index enables random access
  • Standard on modern Linux distributions

Cons

  • Very slow compression, especially at high levels (-9e can be 10-20x slower than gzip)
  • High memory usage during compression (about 674 MiB at level 9, more with multiple threads)
  • Not suitable for real-time or streaming compression
  • No encryption support
  • Decompression still slower than gzip or Zstandard
  • 2024 supply-chain attack on xz-utils (CVE-2024-3094, a backdoor in releases 5.6.0 and 5.6.1) damaged trust temporarily

Compatibility

| Platform | Native Support | Notes |
|----------|----------------|-------|
| Linux | Yes | xz-utils pre-installed on virtually all distributions |
| macOS | Yes | Available via Homebrew, pre-installed on some versions |
| Windows | Via tools | 7-Zip (native LZMA support), Git Bash, WSL |
| FreeBSD | Yes | Pre-installed |

Programming languages: Python (lzma in stdlib since 3.3), Node.js (lzma-native), Go (github.com/ulikunitz/xz), Java (Apache Commons Compress, xz-java), Rust (xz2/liblzma), C (liblzma).

Practical Usage

Create a highly compressed tar.xz archive with multi-threading

```bash
# Compress a source tree with maximum compression and all CPU cores
tar cf - linux-6.8/ | xz -9e -T0 --memlimit=4GiB > linux-6.8.tar.xz

# Verify the archive integrity
xz -t linux-6.8.tar.xz && echo "Integrity OK"

# Show compression statistics
xz -lv linux-6.8.tar.xz
```

Decompress and process XZ data in Python

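```python
import lzma
import tarfile

# Read a compressed text file line by line without loading it all into memory
with lzma.open('logfile.txt.xz', 'rt', encoding='utf-8') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.strip())

# Extract a tar.xz archive programmatically
with tarfile.open('package.tar.xz', 'r:xz') as tar:
    # filter='data' (Python 3.12+, also in recent security releases of older
    # versions) rejects absolute paths and entries that escape the target dir
    tar.extractall(path='./extracted/', filter='data')
    print(f"Extracted {len(tar.getnames())} entries")
```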

Compare compression ratios across formats

```bash
# Benchmark a file across gzip, bzip2, xz, and zstd
FILE="data.tar"
echo "Original: $(wc -c < "$FILE") bytes"
for tool in "gzip -9" "bzip2 -9" "xz -9" "zstd -19"; do
  name=${tool%% *}
  # -c writes to stdout, so no output files are created or need cleaning up;
  # $tool is left unquoted on purpose so the level flag is split off
  size=$($tool -c "$FILE" 2>/dev/null | wc -c)
  echo "$name: $size bytes"
done
```
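
A similar comparison can be sketched with Python's standard library alone. Zstandard is left out here because it is not assumed to be available in the standard library; `compare_sizes` is an illustrative helper name and the sample data is synthetic.

```python
import bz2
import gzip
import lzma


def compare_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "original": len(data),
        "gzip -9": len(gzip.compress(data, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
        "xz -6": len(lzma.compress(data, preset=6)),
    }


if __name__ == "__main__":
    sample = b"2024-01-01 12:00:00 INFO request served in 12 ms\n" * 20_000
    for name, size in compare_sizes(sample).items():
        print(f"{name:>9}: {size} bytes")
```

Results vary widely with the input, so it is worth running this on a representative file rather than relying on synthetic data.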

Anti-Patterns

Using xz -9e for compressing data that will be frequently recompressed or updated. Extreme compression is several times slower than the default and typically saves only a few extra percent. Use -6 (default) or -3 for iterative workflows; reserve -9e for final release artifacts that will be compressed once and decompressed many times.

Compressing already-compressed files (JPEG, MP4, ZIP) with xz. XZ cannot meaningfully compress data that is already compressed and may actually increase the file size due to framing overhead. Only use xz on compressible content like text, source code, binaries, and uncompressed data.
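
The effect is easy to observe with Python's `lzma` module. In this sketch random bytes stand in for already-compressed payloads such as JPEG or ZIP data, which is an approximation; `xz_size_change` is an illustrative helper name.

```python
import lzma
import os


def xz_size_change(blob: bytes) -> int:
    """Return compressed size minus original size (positive means it grew)."""
    return len(lzma.compress(blob)) - len(blob)


random_like = os.urandom(1_000_000)                # stand-in for JPEG/MP4/ZIP bytes
text_like = b"GET /index.html HTTP/1.1\n" * 40_000

print("already compressed:", xz_size_change(random_like), "bytes")  # positive: grew
print("plain text:", xz_size_change(text_like), "bytes")            # negative: shrank
```

The growth on incompressible input is small, but the CPU time spent producing it is wasted.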

Running xz -9 on a system with limited RAM without setting a memory limit. Level 9 compression requires approximately 674 MiB of RAM in single-threaded mode, and more when -T is used. On constrained systems (containers, CI runners, embedded), this can trigger OOM kills. Always use --memlimit or choose a lower compression level appropriate for available memory.

Using xz for real-time or streaming compression pipelines. XZ's compression is inherently slow and latency-heavy, making it unsuitable for real-time data streams or interactive applications. Use Zstandard or LZ4 for low-latency streaming; use xz only for batch archival tasks.

Deploying xz-compressed assets for web delivery instead of Brotli or Zstandard. Browsers have no native support for xz in HTTP Content-Encoding. Use Brotli for web content compression or Zstandard where supported; xz is only appropriate for downloadable archive files.

Related Formats

  • LZMA — The predecessor format (.lzma), lacks the container features of XZ
  • 7z — Uses the same LZMA2 algorithm in an archive container
  • gzip — Much faster but significantly worse ratio
  • Zstandard — Modern alternative with much faster compression at comparable ratios
  • Brotli — Google's format, focused on web content compression
  • bzip2 — Older alternative, XZ compresses better and decompresses faster
