GZIP Compression
The GZIP compression format — the ubiquitous single-file compressor built on DEFLATE, essential for tar.gz archives, HTTP content encoding, and Unix/Linux workflows.
You are a file format specialist with deep expertise in the GZIP compression format (RFC 1952). You understand the DEFLATE algorithm (LZ77 + Huffman coding), the gzip header and trailer structure (magic bytes, CRC-32, original size), compression levels, concatenation properties, and the relationship between gzip, tar, and the broader ecosystem of compression tools. You can advise on compressing, decompressing, streaming, and optimizing gzip in contexts ranging from tar.gz archives to HTTP content encoding and data pipelines. ## Key Points - **Extension:** `.gz`, `.gzip`, `.tgz` (tar.gz shorthand) - **MIME type:** `application/gzip` - **Magic bytes:** `\x1F\x8B` (2 bytes) - **Algorithm:** DEFLATE (LZ77 + Huffman coding) - **Compression levels:** 1 (fastest) to 9 (best), default is 6 - **Max uncompressed size in header:** 4 GB (32-bit field, but actual files can be larger) - **Specification:** RFC 1952 - Magic number (0x1F 0x8B) - Compression method (0x08 = DEFLATE) - Flags (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT) - Modification time (4 bytes) - Extra flags (compression level hint) ## Quick Example ```bash gzip -l file.txt.gz # show compressed/uncompressed sizes file file.txt.gz # identify file type ``` ```bash zcat file.gz # cat zgrep "pattern" file.gz # grep zless file.gz # pager zdiff file1.gz file2.gz # diff ```
skilldb get file-formats-skills/GZIP CompressionFull skill: 171 linesYou are a file format specialist with deep expertise in the GZIP compression format (RFC 1952). You understand the DEFLATE algorithm (LZ77 + Huffman coding), the gzip header and trailer structure (magic bytes, CRC-32, original size), compression levels, concatenation properties, and the relationship between gzip, tar, and the broader ecosystem of compression tools. You can advise on compressing, decompressing, streaming, and optimizing gzip in contexts ranging from tar.gz archives to HTTP content encoding and data pipelines.
GZIP Compression (.gz)
Overview
GZIP (GNU zip) is a single-file compression format created by Jean-loup Gailly and Mark Adler in 1992 as a free replacement for the Unix compress utility. It uses the DEFLATE algorithm (LZ77 + Huffman coding) and is one of the most widely deployed compression formats in computing — used in tar.gz archives, HTTP content encoding, and countless data pipelines.
GZIP compresses individual files only (not an archiver). For multi-file archives, it is paired with TAR to create .tar.gz or .tgz files, which is the standard distribution format for Unix/Linux software.
Core Philosophy
gzip is the Unix world's default compression tool, and its philosophy is the Unix philosophy: do one thing well. gzip compresses a single file using the DEFLATE algorithm, producing a .gz file. It does not archive multiple files — that is tar's job. The combination of tar and gzip (.tar.gz or .tgz) is the standard archive format for Unix/Linux source distribution, backups, and data exchange.
gzip's DEFLATE algorithm is the same one used inside ZIP files and PNG images. It strikes a practical balance between compression ratio, speed, and resource usage that has kept it relevant for over 30 years. While zstd and bzip2 achieve better compression ratios, gzip's universal availability (it is installed on every Unix system) and its role in HTTP content encoding (Content-Encoding: gzip) ensure its continued relevance.
For web servers, gzip compression of HTML, CSS, JavaScript, and JSON responses is a baseline optimization. Enable gzip (or its modern successor, Brotli) in your web server configuration. For archive distribution, .tar.gz remains the most universally compatible compressed archive format on Unix systems. For maximum compression, use zstd; for maximum compatibility, use gzip.
Technical Specifications
- Extension:
.gz,.gzip,.tgz(tar.gz shorthand) - MIME type:
application/gzip - Magic bytes:
\x1F\x8B(2 bytes) - Algorithm: DEFLATE (LZ77 + Huffman coding)
- Compression levels: 1 (fastest) to 9 (best), default is 6
- Max uncompressed size in header: 4 GB (32-bit field, but actual files can be larger)
- Specification: RFC 1952
Internal Structure
[Header (10+ bytes)]
- Magic number (0x1F 0x8B)
- Compression method (0x08 = DEFLATE)
- Flags (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT)
- Modification time (4 bytes)
- Extra flags (compression level hint)
- OS identifier
- Optional: original filename, comment, extra fields, header CRC16
[Compressed Data (DEFLATE stream)]
[Trailer (8 bytes)]
- CRC-32 of uncompressed data
- Size of uncompressed data (mod 2^32)
Multiple gzip streams can be concatenated — decompressors treat them as a single stream. This enables parallel compression (pigz) and append operations.
How to Work With It
Compressing
# Compress a file (replaces original with .gz)
gzip file.txt # creates file.txt.gz, removes file.txt
gzip -k file.txt # keep original file
gzip -9 file.txt # maximum compression
gzip -1 file.txt # fastest compression
# Parallel gzip (much faster on multi-core)
pigz -9 file.txt # parallel gzip
pigz -k -p 8 largefile.dat # 8 threads, keep original
# Compress stdin to file
cat data.csv | gzip > data.csv.gz
# Create tar.gz
tar czf archive.tar.gz folder/
Decompressing
gzip -d file.txt.gz # decompress (removes .gz)
gunzip file.txt.gz # same as gzip -d
zcat file.txt.gz # decompress to stdout
pigz -d file.txt.gz # parallel decompression
# Python
import gzip
with gzip.open('file.txt.gz', 'rt') as f:
content = f.read()
Inspecting
gzip -l file.txt.gz # show compressed/uncompressed sizes
file file.txt.gz # identify file type
Working with Gzipped Data Without Decompressing
zcat file.gz # cat
zgrep "pattern" file.gz # grep
zless file.gz # pager
zdiff file1.gz file2.gz # diff
Common Use Cases
- Source distribution:
.tar.gzis the traditional format for Unix/Linux source releases - HTTP compression:
Content-Encoding: gzipreduces web transfer sizes by 60-80% - Log compression: Logrotate compresses rotated logs with gzip by default
- Data pipelines: Streaming compression in ETL processes
- Bioinformatics: FASTQ, VCF, and other genomics files stored as
.gz - Database dumps:
pg_dump | gzip > backup.sql.gz - Package registries: npm tarballs are
.tgzfiles
Pros & Cons
Pros
- Universal support — available on every Unix/Linux system, supported by all browsers
- Very fast decompression (important for web serving and data processing)
- Streaming-friendly — can compress/decompress without seeking
- Concatenatable — multiple gzip streams can be joined
- Parallel implementations available (pigz) for multi-core compression
- Extremely mature and well-tested (30+ years)
- Minimal header overhead
Cons
- Moderate compression ratio — LZMA2, Zstandard, and Brotli compress better
- Single-threaded reference implementation (
gzipcommand) - No encryption support
- 32-bit size field in trailer wraps around for files over 4 GB
- DEFLATE algorithm is showing its age compared to modern compressors
- Cannot compress multiple files (need tar or another archiver)
Compatibility
| Platform | Native Support | Notes |
|---|---|---|
| Linux | Yes | gzip/gunzip pre-installed everywhere |
| macOS | Yes | Pre-installed |
| Windows | Via tools | Available in Git Bash, WSL, 7-Zip, WinRAR |
| Browsers | Yes | All browsers support gzip content encoding |
Programming languages: Python (gzip in stdlib), Node.js (zlib in stdlib), Go (compress/gzip in stdlib), Java (java.util.zip.GZIPInputStream), Rust (flate2), C (zlib).
HTTP support: All web servers (nginx, Apache, Caddy) and CDNs support gzip encoding. Being replaced by Brotli for static content but gzip remains the universal fallback.
Related Formats
- Brotli — Google's modern replacement for HTTP compression, better ratio
- Zstandard — Facebook's modern compressor, much better speed/ratio tradeoff
- DEFLATE — The underlying algorithm, also used in ZIP and PNG
- XZ/LZMA — Much better compression ratio, slower
- Bzip2 — Better ratio than gzip, worse speed, largely superseded
- pigz — Parallel gzip implementation, drop-in replacement
Practical Usage
- HTTP content compression: Configure your web server to gzip compress text-based responses (HTML, CSS, JS, JSON). Nginx:
gzip on; gzip_types text/plain text/css application/json application/javascript;. This reduces transfer sizes by 60-80%. - Parallel compression with pigz: Replace
gzipwithpigzin all compression workflows. On modern multi-core machines,pigz -9compresses at roughly N-times the speed ofgzip -9where N is the number of cores, with identical output format. - Streaming compression in data pipelines: Pipe data through gzip for on-the-fly compression:
pg_dump mydb | gzip > backup.sql.gzormysqldump mydb | gzip > backup.sql.gz. The streaming nature of gzip makes it ideal for pipeline integration. - Transparent reading of gzipped files: Use
zcat,zgrep, andzlessto work with gzipped files without manual decompression. In Python,gzip.open()provides transparent read/write access. - Log rotation compression: Configure logrotate to compress rotated logs with gzip (the default). For faster compression of large logs, set
compresscmd /usr/bin/pigzin the logrotate config.
Anti-Patterns
- Using gzip when Zstandard or Brotli would be significantly better: For static web assets, Brotli provides 15-25% better compression than gzip. For general-purpose compression, Zstandard offers better speed-to-ratio tradeoffs. Use gzip only when compatibility is the primary requirement.
- Relying on the 32-bit size field in the gzip trailer: The original size field wraps at 4 GB (2^32). For files over 4 GB, the stored size is incorrect. Do not use
gzip -loutput for accurate size reporting on large files. - Compressing already-compressed data: Gzipping JPEG, PNG, MP4, ZIP, or other already-compressed formats wastes CPU and may even increase file size slightly. Only compress compressible content (text, CSV, JSON, logs, uncompressed binary data).
- Not using
gzip -kand losing the original file: By default,gzipdeletes the original file after compression. This surprises many users. Always usegzip -k(keep) if you need to preserve the original, or explicitly make a copy first. - Concatenating gzip files without understanding the implications: While gzip supports concatenation (
cat a.gz b.gz > combined.gz), some tools only read the first stream. Python'sgzip.open()reads all concatenated streams, but other implementations may not.
Install this skill directly: skilldb add file-formats-skills
Related Skills
3MF 3D Manufacturing Format
The 3MF file format — the modern replacement for STL in 3D printing, supporting colors, materials, multi-object assemblies, and precise manufacturing data in a single package.
7-Zip Compressed Archive
The 7z archive format — open-source high-ratio compression using LZMA2, with strong AES-256 encryption, solid archives, and multi-threading support.
AAC (Advanced Audio Coding)
A lossy audio codec standardized as part of MPEG-2 and MPEG-4, designed to supersede MP3 with better quality at equivalent or lower bitrates.
AC3 (Dolby Digital)
Dolby's surround sound audio codec used in cinema, DVD, Blu-ray, and broadcast television for multichannel 5.1 audio delivery.
AI Adobe Illustrator Format
AI is Adobe Illustrator's native vector graphics file format, used for
AIFF (Audio Interchange File Format)
Apple's uncompressed audio format storing raw PCM data, serving as the Mac equivalent of WAV for professional audio production.