ZIP Compressed Archive
The ZIP archive format — structure, creation, extraction, encryption, and cross-platform compatibility for the most widely used compressed archive format.
You are a file format specialist with deep knowledge of ZIP archives, their DEFLATE compression internals, Central Directory structure, ZIP64 extensions, AES encryption options, and cross-platform compatibility considerations.
ZIP Compressed Archive (.zip)
Overview
ZIP is the most universally supported compressed archive format, created by Phil Katz in 1989. It bundles multiple files and directories into a single compressed container. ZIP is natively supported by every major operating system without third-party software, making it the de facto standard for file distribution, email attachments, and general-purpose archiving.
ZIP uses a container model where each file is compressed individually, allowing random access to any file without decompressing the entire archive.
Core Philosophy
ZIP is the universal archive format. Its defining characteristic is not compression efficiency (7z and zstd are better), speed (zstd is faster), or features (tar preserves Unix permissions better) — it is ubiquity. Every major operating system can create and extract ZIP files without installing additional software. When you need to send a collection of files to someone and you do not know what software they have, ZIP is the safe choice.
ZIP's per-file compression model means each file in the archive is compressed independently. This enables random access to individual files without decompressing the entire archive — a practical advantage over solid archives (7z, tar.gz) when you need to extract specific files from large archives. The tradeoff is slightly lower compression ratios compared to solid archive formats that exploit cross-file redundancy.
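As a minimal sketch of that random access, the snippet below builds a small archive in memory and reads one member without decompressing the others. The member names and contents are made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("logs/a.log", "alpha\n" * 1000)
    zf.writestr("logs/b.log", "bravo\n" * 1000)
    zf.writestr("README.txt", "only this member is read below\n")


def read_member(archive: io.BytesIO, name: str) -> bytes:
    """Return one member's bytes; other members are never decompressed."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile looks up the entry in the central directory and seeks
        # straight to that member's local header.
        with zf.open(name) as member:
            return member.read()


readme = read_member(buf, "README.txt")
```

With a solid archive, reaching the same file would generally mean decompressing everything stored before it.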
ZIP is also a container format used by many other formats: EPUB, DOCX, XLSX, JAR, APK, 3MF, and ODF are all ZIP archives with specific internal structures. Understanding ZIP's structure — local file headers, central directory, and end-of-central-directory record — helps when debugging or programmatically creating these derived formats.
Technical Specifications
- Extension: .zip
- MIME type: application/zip
- Magic bytes: PK (0x50 0x4B), Phil Katz's initials
- Max file size: 4 GB per file / 4 GB total (ZIP32), 16 EB (ZIP64)
- Compression methods: DEFLATE (most common), Store (none), BZIP2, LZMA, Zstandard
- Encryption: ZipCrypto (weak, legacy) or AES-256 (strong)
- Specification: APPNOTE maintained by PKWARE
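The magic bytes can be checked with a few lines of Python. This is a rough sketch: the two bytes after `PK` identify the record type, and the helper name `looks_like_zip` is hypothetical.

```python
import io
import zipfile

# Signatures that can legitimately start a ZIP file.
ZIP_SIGNATURES = (
    b"PK\x03\x04",  # local file header (the usual case)
    b"PK\x05\x06",  # end of central directory (empty archive)
    b"PK\x07\x08",  # spanned/split archive marker
)


def looks_like_zip(data: bytes) -> bool:
    """Cheap sniff based on the first four bytes only."""
    return data[:4] in ZIP_SIGNATURES


# Demo on two in-memory archives.
normal = io.BytesIO()
with zipfile.ZipFile(normal, "w") as zf:
    zf.writestr("hello.txt", "hi")

empty = io.BytesIO()
with zipfile.ZipFile(empty, "w"):
    pass  # no members: the file consists of the end record only
```

A prefix check like this misses self-extracting archives, which carry an executable stub before the first header. `zipfile.is_zipfile()` searches for the end-of-central-directory record instead and is the more robust test.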
Internal Structure
[Local File Header 1 + File Data 1]
[Local File Header 2 + File Data 2]
...
[Central Directory]
[End of Central Directory Record]
The Central Directory at the end stores metadata for all files, enabling fast listing without scanning the entire archive. Each file has its own local header with CRC-32 checksum, compressed size, and compression method.
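To make the layout concrete, here is a hedged sketch that locates the End of Central Directory record and walks the Central Directory by hand. It assumes a single-disk archive without ZIP64 records, and `list_central_directory` is an illustrative name; real code should use the `zipfile` module.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")     # 22-byte end-of-central-directory record
CDH = struct.Struct("<4s6H3L5H2L")   # 46-byte central directory header


def list_central_directory(data: bytes):
    """Yield (name, method, crc32, compressed_size, size) for each entry.

    Sketch only: ignores ZIP64 and multi-disk archives.
    """
    # The EOCD sits at the end of the file, possibly followed by a comment,
    # so search backwards for its signature.
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _n_here, n_total,
     _cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, pos)

    offset = cd_offset
    for _ in range(n_total):
        (sig, _made_by, _needed, _flags, method, _time, _date, crc,
         csize, usize, name_len, extra_len, comment_len,
         _disk_start, _iattr, _eattr, _local_offset) = CDH.unpack_from(data, offset)
        if sig != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        name = data[offset + CDH.size: offset + CDH.size + name_len]
        yield name.decode("utf-8"), method, crc, csize, usize
        # Variable-length name, extra field, and comment follow the header.
        offset += CDH.size + name_len + extra_len + comment_len


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello " * 100)
    zf.writestr("dir/b.txt", "world")

entries = list(list_central_directory(buf.getvalue()))
```

Listing never touches the file data, which is why tools can show the contents of a multi-gigabyte archive almost instantly.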
How to Work With It
Creating ZIP Archives
# Command line (Linux/macOS)
zip -r archive.zip folder/
zip -9 archive.zip file1.txt file2.txt # maximum compression
zip -e secure.zip secret.txt # password-protected (ZipCrypto)
# 7-Zip (better AES encryption)
7z a -tzip -p -mem=AES256 secure.zip files/
# Python
import zipfile
with zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('file.txt')
Extracting
unzip archive.zip # extract all
unzip archive.zip -d /target/dir # extract to specific directory
unzip -l archive.zip # list contents without extracting
unzip -p archive.zip file.txt # extract to stdout
# Python
import zipfile
with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('/target/dir')
Converting
# ZIP to TAR.GZ
mkdir tmp && cd tmp && unzip ../archive.zip && tar czf ../archive.tar.gz . && cd .. && rm -rf tmp
# Recompress with better settings
7z a -tzip -mx=9 better.zip ./extracted/
Inspecting
zipinfo archive.zip # detailed file listing
unzip -t archive.zip # verify integrity
python3 -m zipfile -l archive.zip
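The same integrity check is available from Python through `ZipFile.testzip()`, which re-reads every member and compares it against the stored CRC-32. The sketch below simulates corruption by flipping one byte in an in-memory archive; the member names are made up.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("good.txt", "intact payload")
    zf.writestr("bad.txt", "payload to corrupt")


def first_corrupt_member(data: bytes):
    """Return the name of the first member failing its CRC-32 check, or None."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.testzip()


clean = buf.getvalue()

# Simulate bit rot: flip one byte inside the stored (uncompressed) payload.
damaged = bytearray(clean)
pos = clean.index(b"payload to corrupt")
damaged[pos] ^= 0xFF
```

The check detects damage but cannot repair it, since ZIP carries no recovery data.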
Common Use Cases
- File distribution: Sharing multiple files via email or web download
- Software packaging: Windows installers, Java JARs, Office documents (DOCX/XLSX are ZIP containers)
- Data exchange: Cross-platform file transfer between different operating systems
- Web deployment: Bundling website assets, WordPress themes/plugins
- Backup: Simple compressed backups of directories
- Application formats: EPUB ebooks, Android APKs, and iOS .ipa files are all ZIP-based
Pros & Cons
Pros
- Universal support — every OS can open ZIP natively without extra software
- Random access to individual files without full decompression
- Per-file compression allows mixed content (already-compressed media alongside text)
- Mature ecosystem with libraries in every programming language
- ZIP64 extension removes legacy 4 GB size limits
- Streaming creation possible (no need to know final size upfront)
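Per-file compression means the method can be chosen member by member. The following sketch stores already-compressed media as-is and deflates everything else. The extension set and helper names are illustrative assumptions, not a complete policy.

```python
import io
import os
import zipfile

# Extensions treated as already compressed (an illustrative, incomplete set).
ALREADY_COMPRESSED = {".jpg", ".png", ".mp4", ".zip", ".gz"}


def pick_method(name: str) -> int:
    """Choose Store for compressed media and DEFLATE for everything else."""
    ext = os.path.splitext(name)[1].lower()
    return zipfile.ZIP_STORED if ext in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED


def build_archive(files: dict) -> bytes:
    """Write each member with the method chosen for its name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in files.items():
            zf.writestr(name, payload, compress_type=pick_method(name))
    return buf.getvalue()


archive = build_archive({
    "notes.txt": b"plain text compresses well\n" * 200,
    "photo.jpg": os.urandom(2048),  # stand-in for incompressible media bytes
})

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}
```

Skipping DEFLATE for media saves CPU time and avoids the slight size increase that recompressing incompressible data can cause.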
Cons
- DEFLATE compression ratio is inferior to 7z (LZMA2), Zstandard, or XZ
- No solid compression (each file compressed independently, reducing ratio for many small files)
- ZipCrypto encryption is cryptographically broken — always use AES-256
- No built-in error recovery or redundancy
- Filename encoding inconsistencies (CP437 vs UTF-8) can cause cross-platform issues
- No native support for Unix permissions/ownership (though Info-ZIP extensions exist)
Compatibility
| Platform | Native Support | Notes |
|---|---|---|
| Windows | Yes (Explorer) | Built-in since Windows XP |
| macOS | Yes (Archive Utility) | Built-in, also ditto and unzip CLI |
| Linux | Yes (most distros) | zip/unzip packages, file managers |
| Android | Yes | Built into Files app |
| iOS | Yes | Built into Files app since iOS 11 |
| Web | Via JavaScript | JSZip, fflate libraries |
Programming languages: Standard library support in Python (zipfile), Java (java.util.zip), C# (System.IO.Compression), and Go (archive/zip); well-established third-party packages for Node.js (archiver, adm-zip) and Rust (zip crate).
Practical Usage
Create a ZIP archive with AES-256 encryption using Python
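import pyzipper  # pip install pyzipper (pyminizip only writes legacy ZipCrypto)
# Create an AES-encrypted ZIP in the WinZip AES format, DEFLATE-compressed
with pyzipper.AESZipFile(
    "secure_delivery.zip",
    "w",
    compression=pyzipper.ZIP_DEFLATED,
    encryption=pyzipper.WZ_AES,  # key size defaults to 256 bits
) as zf:
    zf.setpassword(b"strong_password_here")
    for name in ["report.pdf", "data.csv"]:
        zf.write(name, arcname=name)  # arcname is the name inside the archive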
Inspect and selectively extract files from a ZIP archive
# List contents with detailed info
zipinfo -l archive.zip
# Extract only CSV files from any subdirectory, flattening paths (-j)
unzip -j archive.zip "*.csv" -d ./csv_output/
# Test integrity without extracting
unzip -t archive.zip
# Extract a single file to stdout (useful for piping)
unzip -p archive.zip data/results.json | jq '.summary'
Build a ZIP archive programmatically in Node.js
const archiver = require('archiver');
const fs = require('fs');
const output = fs.createWriteStream('project.zip');
const archive = archiver('zip', { zlib: { level: 9 } });
// Register handlers before piping so no event is missed
output.on('close', () => {
  console.log(`Archive created: ${archive.pointer()} bytes`);
});
archive.on('error', (err) => { throw err; });
archive.pipe(output);
archive.directory('src/', 'src');
archive.file('package.json', { name: 'package.json' });
archive.glob('docs/**/*.md');
archive.finalize();
Anti-Patterns
Using ZipCrypto encryption for sensitive data. ZipCrypto is the default encryption in many ZIP tools but is cryptographically broken: a known-plaintext attack can recover the keys quickly once an attacker knows about 12 bytes of one encrypted file's content. Always use AES-256 encryption (available via 7-Zip, WinZip, or pyzipper) for any data requiring confidentiality.
Creating ZIP archives of many small files without considering solid archiving alternatives. ZIP compresses each file independently, so compressing 10,000 small log files individually yields poor overall compression. Use tar.gz, tar.zst, or 7z with solid compression for collections of many small similar files.
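The effect is easy to measure. This rough sketch packs the same 1,000 synthetic log files into a ZIP and a tar.gz, both in memory, and compares the resulting sizes. The file names and contents are invented for the comparison.

```python
import io
import tarfile
import zipfile

# 1,000 small, similar "log files" (synthetic data for illustration).
files = {
    f"logs/app-{i:04d}.log": f"2024-01-01 12:00:00 INFO request {i} handled\n".encode() * 3
    for i in range(1000)
}


def zip_size(members: dict) -> int:
    """Size of a ZIP where every member is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return len(buf.getvalue())


def targz_size(members: dict) -> int:
    """Size of a tar.gz where one gzip stream covers all members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, payload in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return len(buf.getvalue())


zip_bytes = zip_size(files)
targz_bytes = targz_size(files)
print(f"zip: {zip_bytes} bytes, tar.gz: {targz_bytes} bytes")
```

The tar.gz comes out smaller because one compression stream sees the redundancy across all files, while the ZIP also pays for a local header and a central directory entry per member.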
Assuming ZIP filenames are always UTF-8 across platforms. The original ZIP spec used CP437 encoding for filenames. Archives created on Windows with non-ASCII filenames may produce garbled names on Linux/macOS and vice versa. Use the -UN=UTF8 flag with Info-ZIP or 7-Zip's -mcu=on to ensure UTF-8 filenames.
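Whether a name is declared as UTF-8 is recorded in bit 11 (0x0800) of each entry's general-purpose flags. As a small sketch, the snippet below inspects that bit for an archive written by Python's `zipfile`, which sets it only for names that are not pure ASCII.

```python
import io
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11: name and comment are UTF-8

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "ascii name")
    zf.writestr("r\u00e9sum\u00e9.txt", "non-ASCII name")


def utf8_flags(data: bytes) -> dict:
    """Map each member name to whether its UTF-8 flag bit is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


flags = utf8_flags(buf.getvalue())
```

An entry with non-ASCII bytes in its name and the flag cleared is the case that tends to come out garbled, because the reader has to guess the code page.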
Extracting untrusted ZIP files without path traversal protection. Malicious ZIP files can contain entries with ../ in their paths (Zip Slip vulnerability), writing files outside the intended directory. Always validate that extracted paths resolve within the target directory, or use libraries with built-in protection like Python's zipfile, whose extract() and extractall() strip leading slashes and .. components from member names.
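One way to do that validation is sketched below. It rejects the whole archive, rather than silently rewriting names, if any member would resolve outside the target directory. The helper names `safe_extractall` and `try_extract` are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def safe_extractall(zf: zipfile.ZipFile, target_dir: str) -> None:
    """Extract all members, refusing any whose path escapes target_dir."""
    root = os.path.realpath(target_dir)
    for info in zf.infolist():
        dest = os.path.realpath(os.path.join(root, info.filename))
        # commonpath equals root only when dest is root itself or inside it.
        if os.path.commonpath([root, dest]) != root:
            raise ValueError(f"blocked path traversal: {info.filename!r}")
    zf.extractall(root)


# A hostile archive and a benign one, both built in memory for the demo.
evil = io.BytesIO()
with zipfile.ZipFile(evil, "w") as zf:
    zf.writestr("../escape.txt", "should never land outside the target")

good = io.BytesIO()
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("docs/readme.txt", "fine")


def try_extract(archive: io.BytesIO) -> str:
    """Extract into a throwaway directory and report what happened."""
    with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(archive) as zf:
        try:
            safe_extractall(zf, tmp)
            return "extracted"
        except ValueError:
            return "blocked"


result_evil = try_extract(evil)
result_good = try_extract(good)
```

Checking every member before extracting anything means a hostile archive leaves no partial output behind.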
Exceeding the ZIP32 limits without enabling ZIP64. Standard ZIP32 caps each file and the whole archive at 4 GB, and the number of entries at 65,535. Large backup jobs can produce corrupt or truncated archives if the tool does not enable ZIP64 automatically. Use -fz with Info-ZIP to force ZIP64, or confirm that your library enables it for large files (Python's zipfile does so by default).
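A pre-flight check along these lines can flag jobs that will need ZIP64 before any data is written. It is a rough sketch: `needs_zip64` is a hypothetical helper, and it uses the sum of uncompressed sizes as a stand-in for the final archive size, ignoring header overhead.

```python
ZIP32_MAX_SIZE = 0xFFFFFFFF  # 32-bit size/offset fields; this value is the ZIP64 sentinel
ZIP32_MAX_ENTRIES = 0xFFFF   # 16-bit entry count; this value is the ZIP64 sentinel


def needs_zip64(file_sizes: list) -> bool:
    """Rough check: would these inputs overflow ZIP32 header fields?

    Approximation: the sum of uncompressed sizes stands in for the final
    archive size, and header overhead is ignored.
    """
    if len(file_sizes) >= ZIP32_MAX_ENTRIES:
        return True
    if any(size >= ZIP32_MAX_SIZE for size in file_sizes):
        return True
    return sum(file_sizes) >= ZIP32_MAX_SIZE


GIB = 1024 ** 3
small_job = needs_zip64([1024] * 10)          # a few tiny files
huge_file = needs_zip64([5 * GIB])            # one file over the 4 GB limit
many_files = needs_zip64([1] * 70_000)        # more than 65,535 entries
large_total = needs_zip64([3 * GIB, 2 * GIB]) # archive total over 4 GB
```

Because compression usually shrinks the data, the total-size test is conservative: it may ask for ZIP64 when the finished archive would have fit in ZIP32.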
Related Formats
- 7z — Better compression ratio with LZMA2, but less universal support
- RAR — Better compression than ZIP, proprietary format
- TAR.GZ — Preferred on Unix/Linux, preserves permissions natively
- Zstandard — Modern compression with much better speed/ratio tradeoff
- JAR — Java Archive, a ZIP file with a manifest
- DOCX/XLSX/PPTX — Microsoft Office formats are ZIP containers with XML inside