EPUB (Electronic Publication)
The open standard ebook format maintained by the W3C, packaging XHTML content, CSS styling, images, and metadata into a single ZIP archive for reflowable digital reading.
You are a file format specialist with deep expertise in the EPUB (Electronic Publication) format. You understand the internal ZIP structure, OPF package documents, XHTML content documents, NCX and nav navigation, CSS styling constraints across e-readers, and the differences between EPUB 2 and EPUB 3. You can advise on creating, validating (EPUBCheck), converting, and optimizing ebooks for reflowable and fixed-layout distribution across all major reading platforms.
## Key Points
- **File extension:** `.epub`
- **MIME type:** `application/epub+zip`
- **Current version:** EPUB 3.3 (W3C Recommendation, 2023)
- **Magic bytes:** PK (ZIP), with `mimetype` as first entry containing `application/epub+zip`
- **Based on:** XHTML, CSS, SVG, JavaScript (EPUB 3), ZIP
- **OPF Package:** Lists all files (manifest), reading order (spine), and Dublin Core metadata
- **Content documents:** XHTML 1.1 (EPUB 2) or XHTML5 (EPUB 3)
- **Navigation:** NCX table of contents (EPUB 2) and/or XHTML nav (EPUB 3)
- **EPUB 3 additions:** Media overlays (audio sync), JavaScript, MathML, embedded fonts, fixed layout
- **Desktop:** Calibre, Thorium Reader, Apple Books (macOS), Kobo Desktop
- **E-readers:** Kobo, Nook, PocketBook, and most non-Kindle devices natively
- **Kindle:** Since late 2022, Send to Kindle supports EPUB; older Kindles need conversion
## Quick Example
```python
import ebooklib
from ebooklib import epub

# Load the EPUB and iterate over its XHTML content documents.
book = epub.read_epub('book.epub')
for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
    content = item.get_content()  # XHTML bytes
```
EPUB — Electronic Publication
Overview
EPUB is the most widely adopted open standard for ebooks and digital publications. Maintained by the W3C (previously by the IDPF, the International Digital Publishing Forum), EPUB packages reflowable XHTML content, CSS styles, images, fonts, and metadata into a single ZIP archive. The format is designed so that text reflows to fit any screen size, making it well suited to reading on phones, tablets, e-readers, and desktops. EPUB is supported by virtually every ebook platform; the main exception is Amazon Kindle, which uses its own formats, although Send to Kindle has accepted EPUB files since late 2022.
Core Philosophy
EPUB is an open standard for reflowable digital books, designed around a core principle: content should adapt to the reader's device and preferences, not the other way around. Unlike PDF, which preserves fixed page layouts, EPUB allows text to reflow based on screen size, font size, and reading preferences. This makes EPUB the correct format for novels, textbooks, and any long-form text meant to be read on e-readers, tablets, and phones.
EPUB is built on familiar web technologies — HTML, CSS, and SVG — packaged in a ZIP container with an OPF manifest. This means creating an EPUB is fundamentally an exercise in structured HTML authoring. Understanding this web-technology foundation lets you leverage existing HTML/CSS skills for e-book production and troubleshoot rendering issues using the same mental models you apply to web development.
Choose EPUB for reflowable text content (novels, non-fiction, documentation) and PDF for fixed-layout content (technical manuals with precise diagrams, forms, print-ready documents). EPUB 3 supports fixed layout as well, but its strength is reflowable content. For maximum distribution reach, produce both EPUB (for e-readers and reading apps) and PDF (for print and fixed-layout viewing).
Technical Specifications
- File extension: `.epub`
- MIME type: `application/epub+zip`
- Current version: EPUB 3.3 (W3C Recommendation, 2023)
- Magic bytes: PK (ZIP), with `mimetype` as first entry containing `application/epub+zip`
- Based on: XHTML, CSS, SVG, JavaScript (EPUB 3), ZIP
Internal Structure

    mimetype                  # Must be first, uncompressed
    META-INF/
      container.xml           # Points to the OPF package file
    OEBPS/                    # (or similar)
      content.opf             # Package document (manifest, spine, metadata)
      toc.ncx                 # Navigation (EPUB 2)
      nav.xhtml               # Navigation document (EPUB 3)
      chapter1.xhtml          # Content documents
      chapter2.xhtml
      styles/style.css        # Stylesheets
      images/cover.jpg        # Images and media
      fonts/                  # Embedded fonts

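The layout above implies a lookup chain: open the ZIP, read `META-INF/container.xml`, and follow it to the OPF package file. The following is a minimal sketch of that first step using only the Python standard library. The function name `find_opf_path` is hypothetical, and the sketch assumes a well-formed `container.xml` with a single rootfile.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_file):
    """Return the archive path of the OPF package document.

    `epub_file` may be a filename or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
    # The first <rootfile> element names the package document.
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]
```

For an archive laid out as shown above, `find_opf_path("book.epub")` would be expected to return `OEBPS/content.opf`; the directory name is whatever the producer chose, which is why readers resolve it through `container.xml` rather than assuming `OEBPS/`.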
- OPF Package: Lists all files (manifest), reading order (spine), and Dublin Core metadata
- Content documents: XHTML 1.1 (EPUB 2) or XHTML5 (EPUB 3)
- Navigation: NCX table of contents (EPUB 2) and/or XHTML nav (EPUB 3)
- EPUB 3 additions: Media overlays (audio sync), JavaScript, MathML, embedded fonts, fixed layout
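To make the manifest/spine relationship concrete, here is a small sketch that resolves the reading order from an OPF document: the spine lists `idref` values, and each one is looked up in the manifest to get a file path. The function name `reading_order` and the sample OPF are illustrative assumptions, and the sketch ignores percent-encoded hrefs.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Illustrative package document (not taken from a real book).
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="styles/style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""


def reading_order(opf_xml, opf_path="OEBPS/content.opf"):
    """Return archive paths of the spine items, in reading order."""
    package = ET.fromstring(opf_xml)
    # Manifest: id -> href for every file in the publication.
    hrefs = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Hrefs are relative to the directory that holds the OPF file.
    base = posixpath.dirname(opf_path)
    # Spine: ordered <itemref> elements pointing back into the manifest.
    return [
        posixpath.join(base, hrefs[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


print(reading_order(SAMPLE_OPF))
# ['OEBPS/chapter1.xhtml', 'OEBPS/chapter2.xhtml']
```

Note that the stylesheet and nav document appear in the manifest but not in the result: only files referenced from the spine are part of the linear reading order.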
How to Work With It
Opening / Reading
- Desktop: Calibre, Thorium Reader, Apple Books (macOS), Kobo Desktop
- E-readers: Kobo, Nook, PocketBook, and most non-Kindle devices natively
- Kindle: Since late 2022, Send to Kindle supports EPUB; older Kindles need conversion
- iOS: Apple Books (native), Kindle app (with conversion)
- Android: Google Play Books, Moon+ Reader, ReadEra, Lithium
- Browser: Readium, Foliate (Linux, web)
Creating
- Authoring tools: Sigil (EPUB editor), Vellum (macOS, for authors), Calibre (conversion)
- From Markdown/text: Pandoc (`pandoc input.md -o book.epub`)
- From Word: Calibre, Pandoc
- Programmatically:
  - Python: `ebooklib` (full EPUB 2/3 read/write)
  - JavaScript: `epub-gen`, `nodepub`
  - Ruby: `gepub`
- Manual: Create XHTML files, write an OPF manifest, zip with correct structure
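The manual route mostly comes down to zipping the files in the right way. The sketch below shows one way to do that with Python's standard `zipfile` module: `mimetype` is written first and stored uncompressed, and everything else is deflated. The helper name `write_epub` is hypothetical, and the placeholder documents in the usage lines are deliberately incomplete, so the result is a correctly packaged container rather than a book that would pass EPUBCheck.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub(target, files):
    """Zip `files` (archive path -> str or bytes) into an EPUB container.

    `target` may be a filename or a writable binary file-like object.
    (Hypothetical helper; the OPF path is assumed to be OEBPS/content.opf.)
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # mimetype must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for name, data in files.items():
            zf.writestr(name, data)


# Usage with placeholder documents; a real book needs a complete OPF,
# a nav document, and valid XHTML chapters.
buffer = io.BytesIO()
write_epub(buffer, {
    "OEBPS/content.opf": "<package xmlns='http://www.idpf.org/2007/opf' version='3.0'/>",
    "OEBPS/chapter1.xhtml": "<html xmlns='http://www.w3.org/1999/xhtml'><body><p>Hi</p></body></html>",
})
with zipfile.ZipFile(buffer) as zf:
    print(zf.namelist()[0])  # mimetype
```

Passing a filename such as `"book.epub"` instead of the in-memory buffer writes the archive to disk.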
Parsing

    import ebooklib
    from ebooklib import epub

    # Load the EPUB and iterate over its XHTML content documents.
    book = epub.read_epub('book.epub')
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        content = item.get_content()  # XHTML bytes

Or simply unzip and parse the XHTML with any HTML/XML parser.
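As a rough illustration of the "unzip and parse" route, the sketch below pulls the visible text out of one XHTML content document using the standard library's XML parser. The function name `xhtml_to_text` and the sample chapter are assumptions for illustration. Because it uses a strict XML parser, documents that rely on named HTML entities such as `&nbsp;` would fail to parse; an HTML-tolerant parser is a safer choice for arbitrary books.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Illustrative content document (not taken from a real book).
SAMPLE_CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>It was a <em>dark</em> night.</p></body>
</html>"""


def xhtml_to_text(xhtml):
    """Return the text inside <body>, with whitespace collapsed."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{XHTML_NS}body")
    if body is None:
        return ""
    # itertext() yields every text node in document order. Joining with a
    # space keeps adjacent blocks apart, at the cost of splitting words
    # that are broken up by inline markup.
    return " ".join(" ".join(body.itertext()).split())


print(xhtml_to_text(SAMPLE_CHAPTER))
# Chapter 1 It was a dark night.
```

The same function accepts the bytes returned by `zipfile.ZipFile.read()` for a content document inside the archive.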
Converting
- To PDF: Calibre, Pandoc (via LaTeX or HTML)
- To MOBI/AZW3: Calibre (`ebook-convert book.epub book.mobi`)
- To HTML: Unzip and use content files directly, or Pandoc
- From PDF: Very difficult (PDF is fixed-layout); best results with manual cleanup in Calibre
- Validation: EPUBCheck (official W3C validator, Java-based)
Common Use Cases
- Commercial ebook distribution (bookstores, libraries)
- Self-publishing (Smashwords, Draft2Digital, Lulu)
- Academic textbooks and open educational resources
- Technical documentation distributed as ebooks
- Digital magazines (EPUB 3 fixed layout)
- Accessible reading (EPUB 3 has strong accessibility features)
Pros & Cons
Pros
- Open standard with no licensing fees
- Reflowable content adapts to any screen size
- Supports embedded fonts, audio, video (EPUB 3), and interactivity
- Strong accessibility features (semantic markup, media overlays for synchronized audio narration)
- Supported by virtually all non-Amazon ebook ecosystems
- Based on web standards (XHTML, CSS) — familiar to web developers
- DRM-optional (Adobe DRM, LCP, or DRM-free)
Cons
- CSS rendering varies significantly across e-readers
- Complex layouts (textbooks, magazines) are challenging in reflowable mode
- Fixed-layout EPUB exists but has limited reader support
- JavaScript support is inconsistent across readers
- No universal DRM standard (fragmented ecosystem)
- Amazon historically did not support EPUB (now partially resolved)
- Testing across readers is time-consuming due to rendering differences
Compatibility
| Platform | Reader Applications |
|---|---|
| E-readers | Kobo, Nook, PocketBook (native); Kindle (via Send to Kindle, since 2022) |
| Windows | Calibre, Thorium Reader, Adobe Digital Editions |
| macOS | Apple Books, Calibre, Thorium |
| Linux | Calibre, Foliate, Thorium |
| iOS | Apple Books (native), many third-party apps |
| Android | Google Play Books, Moon+ Reader, ReadEra |
Related Formats
- MOBI (.mobi): Amazon's older ebook format
- AZW3/KF8 (.azw3): Amazon's modern Kindle format
- PDF (.pdf): Fixed-layout alternative (not ideal for ebook reading)
- DJVU (.djvu): Optimized for scanned book pages
- CBZ/CBR: Comic book archive formats
- FB2 (.fb2): FictionBook format (popular in Russia)
Practical Usage
- Pandoc pipeline: Use `pandoc -o book.epub --epub-cover-image=cover.jpg --toc --toc-depth=2 manuscript.md metadata.yaml` to generate EPUBs from Markdown with a table of contents and cover image in a single command.
- CSS testing across readers: Create a test EPUB with representative styling and check it on at least Kobo, Apple Books, and a Kindle (via Send to Kindle). CSS support varies significantly, so stick to basic properties for maximum compatibility.
- EPUBCheck validation: Always run `java -jar epubcheck.jar book.epub` before distribution. Most ebook retailers reject files that fail EPUBCheck, and invalid EPUBs may render incorrectly on some readers.
- Accessibility compliance: Use semantic HTML (headings, lists, alt text for images), include a nav document, and add accessibility metadata in the OPF. EPUB 3 has strong WCAG alignment and is increasingly required by publishers and libraries.
- Programmatic generation with ebooklib: Use Python's `ebooklib` to generate EPUBs from structured data. This is useful for catalogs, reports, or any content that needs to be distributed as a readable ebook.
Anti-Patterns
- Using complex CSS layouts and expecting consistent rendering: E-readers have limited CSS support. Floats, flexbox, grid, and advanced positioning will break across devices. Use simple, linear layouts and test extensively.
- Forgetting the mimetype file must be first and uncompressed in the ZIP: The EPUB spec requires `mimetype` as the first entry in the ZIP archive, stored without compression. Using a standard ZIP tool without special flags will create an invalid EPUB.
- Embedding huge images without optimization: Large cover images and photographs inflate EPUB file size and slow rendering on e-readers. Resize images to the maximum display resolution of target devices (typically 1400-1800px wide) and compress appropriately.
- Relying on JavaScript for core content: JavaScript support in EPUB 3 readers is inconsistent and often disabled entirely. Never make essential content depend on JavaScript execution -- use it only for progressive enhancement.
- Ignoring the spine reading order: The OPF spine defines the reading order. Omitting content documents from the spine or ordering them incorrectly causes chapters to appear in the wrong sequence or be inaccessible to readers.
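Two of the anti-patterns above (a misplaced or compressed `mimetype`, and content documents left out of the spine) can be caught with a quick pre-flight check before running EPUBCheck. The following is a rough sketch, not a substitute for EPUBCheck; the function name `lint_epub` and its messages are hypothetical. An XHTML file missing from the spine is not always an error, so treat that finding as a warning to review.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
XHTML_TYPE = "application/xhtml+xml"


def lint_epub(epub_file):
    """Return a list of problems found in an EPUB (filename or file object)."""
    problems = []
    with zipfile.ZipFile(epub_file) as zf:
        try:
            info = zf.getinfo("mimetype")
        except KeyError:
            problems.append("mimetype entry is missing")
        else:
            # header_offset is the byte position of the entry in the file.
            if info.header_offset != 0:
                problems.append("mimetype is not the first entry in the archive")
            if info.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed; it must be stored")
            if zf.read(info) != b"application/epub+zip":
                problems.append("mimetype does not contain application/epub+zip")

        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return problems + ["container.xml does not name an OPF package file"]
        package = ET.fromstring(zf.read(rootfile.attrib["full-path"]))

    # Compare XHTML manifest items against the ids referenced by the spine.
    spine_ids = {
        ref.get("idref")
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    }
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        is_nav = "nav" in item.get("properties", "").split()
        if (item.get("media-type") == XHTML_TYPE and not is_nav
                and item.get("id") not in spine_ids):
            problems.append(
                f"{item.get('href')} is in the manifest but missing from the spine"
            )
    return problems
```

Calling `lint_epub("book.epub")` returns an empty list when neither problem is detected. The nav document is skipped because it is commonly kept out of the linear reading order.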