
ZIP Compressed Archive

The ZIP archive format — structure, creation, extraction, encryption, and cross-platform compatibility for the most widely used compressed archive format.


You are a file format specialist with deep knowledge of ZIP archives, their DEFLATE compression internals, Central Directory structure, ZIP64 extensions, AES encryption options, and cross-platform compatibility considerations.

ZIP Compressed Archive (.zip)

Overview

ZIP is the most universally supported compressed archive format, created by Phil Katz in 1989. It bundles multiple files and directories into a single compressed container. ZIP is natively supported by every major operating system without third-party software, making it the de facto standard for file distribution, email attachments, and general-purpose archiving.

ZIP uses a container model where each file is compressed individually, allowing random access to any file without decompressing the entire archive.

Core Philosophy

ZIP is the universal archive format. Its defining characteristic is not compression efficiency (7z and zstd are better), speed (zstd is faster), or features (tar preserves Unix permissions better) — it is ubiquity. Every major operating system can create and extract ZIP files without installing additional software. When you need to send a collection of files to someone and you do not know what software they have, ZIP is the safe choice.

ZIP's per-file compression model means each file in the archive is compressed independently. This enables random access to individual files without decompressing the entire archive — a practical advantage over solid archives (7z, tar.gz) when you need to extract specific files from large archives. The tradeoff is slightly lower compression ratios compared to solid archive formats that exploit cross-file redundancy.
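As a minimal sketch of the random access described above, the following Python snippet builds a small archive in memory and then reads a single member by name. The member names and the `read_member` helper are hypothetical, chosen only for illustration.

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("logs/a.txt", "alpha\n" * 1000)
    zf.writestr("logs/b.txt", "bravo\n" * 1000)
    zf.writestr("notes/c.txt", "charlie\n" * 1000)


def read_member(archive, name):
    """Return one member's bytes; the other members are not decompressed."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


data = read_member(buf, "logs/b.txt")
print(len(data))  # 6000
```

Because each member is compressed on its own, the library only needs the central directory entry and that member's data to satisfy the read.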

ZIP is also a container format used by many other formats: EPUB, DOCX, XLSX, JAR, APK, 3MF, and ODF are all ZIP archives with specific internal structures. Understanding ZIP's structure — local file headers, central directory, and end-of-central-directory record — helps when debugging or programmatically creating these derived formats.

Technical Specifications

  • Extension: .zip
  • MIME type: application/zip
  • Magic bytes: PK (0x50 0x4B), Phil Katz's initials; a local file header starts with 50 4B 03 04
  • Max file size: just under 4 GiB per file and per archive, with at most 65,535 entries (ZIP32); about 16 EiB (ZIP64)
  • Compression methods: DEFLATE (most common), Store (none), BZIP2, LZMA, Zstandard
  • Encryption: ZipCrypto (weak, legacy) or AES-256 (strong)
  • Specification: APPNOTE maintained by PKWARE

Internal Structure

[Local File Header 1 + File Data 1]
[Local File Header 2 + File Data 2]
...
[Central Directory]
[End of Central Directory Record]

The Central Directory at the end stores metadata for all files, enabling fast listing without scanning the entire archive. Each file has its own local header with CRC-32 checksum, compressed size, and compression method.
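The layout above can be explored with a short Python sketch that locates the End of Central Directory record by its signature and reads the entry count and central directory offset. This is an illustration under simplifying assumptions: it ignores ZIP64 records and multi-disk archives, and the `read_eocd` helper is a hypothetical name, not part of any library.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"      # End of Central Directory signature
EOCD_FMT = "<4s4H2LH"         # fixed 22-byte part of the record
EOCD_SIZE = struct.calcsize(EOCD_FMT)


def read_eocd(data: bytes) -> dict:
    """Locate the EOCD record and summarize the central directory."""
    # Only an optional comment (at most 65,535 bytes) may follow the record,
    # so searching the tail of the file is enough.
    start = max(0, len(data) - EOCD_SIZE - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0:
        raise ValueError("End of Central Directory record not found")
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    _sig, _disk, _cd_disk, _entries_here, entries, cd_size, cd_offset, _comment_len = fields
    return {"entries": entries, "cd_size": cd_size, "cd_offset": cd_offset}


# Build a two-file archive in memory to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "first file")
    zf.writestr("b.txt", "second file")
raw = buf.getvalue()

eocd = read_eocd(raw)
first_cd = raw[eocd["cd_offset"]:eocd["cd_offset"] + 4]
print(eocd["entries"])             # 2
print(raw[:4] == b"PK\x03\x04")    # True: the archive starts with a local file header
print(first_cd == b"PK\x01\x02")   # True: the offset points at a central directory header
```

Reading the archive from the end like this is what lets tools list contents without scanning every local header.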

How to Work With It

Creating ZIP Archives

# Command line (Linux/macOS)
zip -r archive.zip folder/
zip -9 archive.zip file1.txt file2.txt    # maximum compression
zip -e secure.zip secret.txt              # password-protected (ZipCrypto)

# 7-Zip (better AES encryption)
7z a -tzip -p -mem=AES256 secure.zip files/

# Python
import zipfile
with zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('file.txt')

Extracting

unzip archive.zip                    # extract all
unzip archive.zip -d /target/dir     # extract to specific directory
unzip -l archive.zip                 # list contents without extracting
unzip -p archive.zip file.txt        # extract to stdout

# Python
import zipfile
with zipfile.ZipFile('archive.zip', 'r') as zf:
    zf.extractall('/target/dir')

Converting

# ZIP to TAR.GZ (extract into a fresh temporary directory, then re-archive)
tmp=$(mktemp -d) && unzip -q archive.zip -d "$tmp" && tar czf archive.tar.gz -C "$tmp" . && rm -rf "$tmp"

# Recompress with better settings
7z a -tzip -mx=9 better.zip ./extracted/

Inspecting

zipinfo archive.zip          # detailed file listing
unzip -t archive.zip         # verify integrity
python3 -m zipfile -l archive.zip
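
The same kind of integrity check can be run from Python. The sketch below uses `ZipFile.testzip()`, which reads every member and returns the name of the first one whose CRC-32 does not match, or `None` if all pass. The archive is built in memory and one byte is deliberately corrupted; the `first_corrupt_member` helper and the file name are hypothetical.

```python
import io
import zipfile


def first_corrupt_member(archive):
    """Return the name of the first member failing its CRC check, or None."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()


# Build a stored (uncompressed) archive so the payload is easy to find.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello world")
good = buf.getvalue()

# Flip one byte of the payload to simulate corruption.
bad = bytearray(good)
bad[bad.find(b"hello world")] ^= 0xFF

print(first_corrupt_member(io.BytesIO(good)))        # None
print(first_corrupt_member(io.BytesIO(bytes(bad))))  # a.txt
```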

Common Use Cases

  • File distribution: Sharing multiple files via email or web download
  • Software packaging: Windows installers, Java JARs, Office documents (DOCX/XLSX are ZIP containers)
  • Data exchange: Cross-platform file transfer between different operating systems
  • Web deployment: Bundling website assets, WordPress themes/plugins
  • Backup: Simple compressed backups of directories
  • Application formats: EPUB ebooks, Android APKs, iOS .ipa files are all ZIP-based

Pros & Cons

Pros

  • Universal support — every OS can open ZIP natively without extra software
  • Random access to individual files without full decompression
  • Per-file compression allows mixed content (already-compressed media alongside text)
  • Mature ecosystem with libraries in every programming language
  • ZIP64 extension removes legacy 4 GB size limits
  • Streaming creation possible (no need to know final size upfront)
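
To illustrate the mixed-content point from the list above, here is a small Python sketch that picks a compression method per member: DEFLATE for text and Store for data that is already compressed. Random bytes stand in for a real media file, and the member names are made up for the demo.

```python
import io
import os
import zipfile

# Random bytes stand in for already-compressed media (an assumption for the demo).
fake_jpeg = os.urandom(8192)
text = ("the quick brown fox\n" * 500).encode()

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Text compresses well, so DEFLATE is worth the CPU time.
    zf.writestr("docs/notes.txt", text, compress_type=zipfile.ZIP_DEFLATED)
    # Incompressible data is stored as-is to avoid wasted work.
    zf.writestr("media/photo.jpg", fake_jpeg, compress_type=zipfile.ZIP_STORED)

with zipfile.ZipFile(buf) as zf:
    report = {
        info.filename: (info.compress_type, info.file_size, info.compress_size)
        for info in zf.infolist()
    }

for name, (method, size, packed) in report.items():
    print(name, method, size, packed)
```

A solid archive format cannot make this per-file choice, since all members share one compressed stream.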

Cons

  • DEFLATE compression ratio is inferior to 7z (LZMA2), Zstandard, or XZ
  • No solid compression (each file compressed independently, reducing ratio for many small files)
  • ZipCrypto encryption is cryptographically broken — always use AES-256
  • No built-in error recovery or redundancy
  • Filename encoding inconsistencies (CP437 vs UTF-8) can cause cross-platform issues
  • No native support for Unix permissions/ownership (though Info-ZIP extensions exist)

Compatibility

| Platform | Native Support | Notes |
|----------|----------------|-------|
| Windows | Yes (Explorer) | Built-in since Windows XP |
| macOS | Yes (Archive Utility) | Built-in, also ditto and unzip CLI |
| Linux | Yes (most distros) | zip/unzip packages, file managers |
| Android | Yes | Built into Files app |
| iOS | Yes | Built into Files app since iOS 11 |
| Web | Via JavaScript | JSZip, fflate libraries |

Programming languages: Standard library support in Python (zipfile), Java (java.util.zip), C# (System.IO.Compression), and Go (archive/zip); well-established third-party packages for Node.js (archiver, adm-zip) and Rust (zip crate).

Practical Usage

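Create a ZIP archive with AES-256 encryption using Python

import pyzipper  # pip install pyzipper (third-party; pyminizip writes legacy ZipCrypto, not AES)

# Create a WinZip-AES encrypted ZIP; the password must be bytes
with pyzipper.AESZipFile(
    "secure_delivery.zip",
    "w",
    compression=pyzipper.ZIP_DEFLATED,
    encryption=pyzipper.WZ_AES,
) as zf:
    zf.setpassword(b"strong_password_here")
    zf.setencryption(pyzipper.WZ_AES, nbits=256)  # request 256-bit keys explicitly
    for name in ["report.pdf", "data.csv"]:
        zf.write(name)  # stored under the same name inside the archive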

Inspect and selectively extract files from a ZIP archive

# List contents with detailed info
zipinfo -l archive.zip

# Extract only CSV files, discarding their directory paths (-j)
unzip -j archive.zip "*.csv" -d ./csv_output/

# Test integrity without extracting
unzip -t archive.zip

# Extract a single file to stdout (useful for piping)
unzip -p archive.zip data/results.json | jq '.summary'

Build a ZIP archive programmatically in Node.js

const archiver = require('archiver');  // npm install archiver
const fs = require('fs');

const output = fs.createWriteStream('project.zip');
const archive = archiver('zip', { zlib: { level: 9 } });

// Register handlers before finalizing so no event is missed
output.on('close', () => {
  console.log(`Archive created: ${archive.pointer()} bytes`);
});
archive.on('error', (err) => {
  throw err;
});

archive.pipe(output);
archive.directory('src/', 'src');
archive.file('package.json', { name: 'package.json' });
archive.glob('docs/**/*.md');
archive.finalize();

Anti-Patterns

Using ZipCrypto encryption for sensitive data. ZipCrypto is the default encryption in many ZIP tools (including zip -e) but is cryptographically broken: with a small amount of known plaintext, a known-plaintext attack can often recover the keys quickly on ordinary hardware. Always use AES-256 encryption (available via 7-Zip, WinZip, or the pyzipper library for Python) for any data requiring confidentiality. Note that pyminizip writes ZipCrypto, not AES.

Creating ZIP archives of many small files without considering solid archiving alternatives. ZIP compresses each file independently, so compressing 10,000 small log files individually yields poor overall compression. Use tar.gz, tar.zst, or 7z with solid compression for collections of many small similar files.
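
The effect can be seen with a rough Python comparison that packs the same set of small, similar files once as a ZIP and once as a gzip-compressed tar. The synthetic log lines and file names are invented for the demo, and exact sizes will vary with the data; the sketch only shows the direction of the difference.

```python
import io
import tarfile
import zipfile

# 500 small, similar files (synthetic log content, assumed for illustration).
files = {
    f"logs/app-{i:04d}.log": (
        f"2024-01-01 INFO request id={i} status=200 path=/api/items\n" * 3
    ).encode()
    for i in range(500)
}


def zip_size(members):
    """Size of a ZIP where each member is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def targz_size(members):
    """Size of a tar.gz where one gzip stream covers all members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


print("zip:   ", zip_size(files), "bytes")
print("tar.gz:", targz_size(files), "bytes")
```

The single gzip stream can reuse patterns seen in earlier files, while ZIP restarts compression and adds header overhead for every member.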

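Assuming ZIP filenames are always UTF-8 across platforms. The original ZIP spec used CP437 for filenames, and UTF-8 names are only unambiguous when the archive sets general-purpose flag bit 11 (the language encoding flag). Archives created on Windows with non-ASCII filenames may show garbled names on Linux/macOS and vice versa, because some tools write the system code page without setting that flag. Use the -UN=UTF8 flag with Info-ZIP or 7-Zip's -mcu=on to store UTF-8 filenames.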
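
One way to check how an archive declares its filenames is to look at general-purpose flag bit 11 (0x800), which marks a name as UTF-8. The Python sketch below reports this per member; the `filename_encodings` helper and the sample names are hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename and comment are UTF-8


def filename_encodings(archive):
    """Map each member name to the encoding its header declares."""
    with zipfile.ZipFile(archive) as zf:
        return {
            info.filename: "utf-8" if info.flag_bits & UTF8_FLAG else "cp437 or unspecified"
            for info in zf.infolist()
        }


# Python's zipfile sets the flag only when a name is not pure ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "ascii name")
    zf.writestr("résumé.txt", "non-ascii name")

for name, enc in filename_encodings(io.BytesIO(buf.getvalue())).items():
    print(name, "->", enc)
```

Members reported as "cp437 or unspecified" with non-ASCII bytes in their names are the ones likely to display incorrectly on another platform.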

Extracting untrusted ZIP files without path traversal protection. Malicious ZIP files can contain entries with ../ in their paths (the Zip Slip vulnerability), which lets a naive extractor write files outside the intended directory. Always validate that each extracted path resolves within the target directory, or use a library with built-in protection such as Python's zipfile, whose extract() and extractall() strip leading slashes and .. components from member names.
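
A minimal sketch of the validation step, assuming a policy of rejecting the whole archive when any member would land outside the target directory. The `safe_extract` helper is a hypothetical name; Python's own extraction already sanitizes names, so this check mainly serves to surface suspicious archives instead of silently rewriting their paths.

```python
import io
import os
import tempfile
import zipfile


def safe_extract(archive, target_dir):
    """Extract all members, refusing archives with paths that escape target_dir."""
    root = os.path.realpath(target_dir)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            dest = os.path.realpath(os.path.join(root, info.filename))
            # commonpath equals root only when dest is root or lies beneath it.
            if os.path.commonpath([root, dest]) != root:
                raise ValueError(f"unsafe path in archive: {info.filename!r}")
        zf.extractall(root)


# Demo: an archive with a traversal entry is rejected before anything is written.
evil = io.BytesIO()
with zipfile.ZipFile(evil, "w") as zf:
    zf.writestr("../evil.txt", "escaped")

with tempfile.TemporaryDirectory() as tmp:
    try:
        safe_extract(evil, tmp)
        rejected = False
    except ValueError:
        rejected = True

print(rejected)  # True
```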

Exceeding the 4 GB limits of standard ZIP without enabling ZIP64. Standard ZIP32 stores sizes and offsets in 32-bit fields, which caps each file and the whole archive at 4 GB (and the entry count at 65,535). Large backup jobs can fail or produce unreadable archives if the tool does not switch to ZIP64 automatically. Use -fz with Info-ZIP to force ZIP64, or make sure your library enables ZIP64 for large files.
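
In Python, for example, `zipfile` allows ZIP64 by default (`allowZip64=True`), but when a member is streamed through `ZipFile.open(..., "w")` the final size is not known up front, so `force_zip64=True` is the documented way to reserve ZIP64 fields in advance. The sketch below uses a small payload and a made-up member name purely to show the calls; a real backup would stream far more data.

```python
import io
import zipfile

CHUNK = b"\0" * 65536

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
    # force_zip64 writes a ZIP64-capable header before the size is known.
    with zf.open("backup/big.bin", "w", force_zip64=True) as member:
        for _ in range(16):
            member.write(CHUNK)

with zipfile.ZipFile(buf) as zf:
    size = zf.getinfo("backup/big.bin").file_size

print(size)  # 1048576
```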

Related Formats

  • 7z — Better compression ratio with LZMA2, but less universal support
  • RAR — Better compression than ZIP, proprietary format
  • TAR.GZ — Preferred on Unix/Linux, preserves permissions natively
  • Zstandard — Modern compression with much better speed/ratio tradeoff
  • JAR — Java Archive, a ZIP file with a manifest
  • DOCX/XLSX/PPTX — Microsoft Office formats are ZIP containers with XML inside
