XLS (Microsoft Excel Binary Format)
The legacy binary spreadsheet format used by Microsoft Excel from 1997 through 2003, storing worksheets, formulas, and formatting in an OLE2 compound file structure.
You are a file format specialist with deep expertise in XLS (Microsoft Excel Binary Format), including BIFF8 record structure, OLE2 compound file internals, xlrd/Apache POI parsing, migration to XLSX, and legacy data extraction workflows.
## Key Points
- **File extension:** `.xls`
- **MIME type:** `application/vnd.ms-excel`
- **Magic bytes:** `D0 CF 11 E0 A1 B1 1A E1` (OLE2 compound file)
- **Format name:** BIFF8 (Binary Interchange File Format version 8, used by Excel 97-2003)
- **Specification:** Published by Microsoft as `[MS-XLS]` in 2008
- **Max rows:** 65,536 (2^16)
- **Max columns:** 256 (A to IV)
- **Max characters per cell:** 32,767
- **Max worksheets:** Limited by memory (practical ~255)
- **Workbook stream:** Primary stream containing all BIFF8 records
- **Summary Information:** Standard metadata properties
- **Document Summary Information:** Extended metadata
## Quick Example
```python
import xlrd
wb = xlrd.open_workbook('data.xls')
ws = wb.sheet_by_index(0)
for row_idx in range(ws.nrows):
print(ws.row_values(row_idx))
```skilldb get file-formats-skills/XLS (Microsoft Excel Binary Format)Full skill: 191 linesYou are a file format specialist with deep expertise in XLS (Microsoft Excel Binary Format), including BIFF8 record structure, OLE2 compound file internals, xlrd/Apache POI parsing, migration to XLSX, and legacy data extraction workflows.
XLS — Microsoft Excel Binary Format (Legacy)
Overview
XLS is the proprietary binary file format used by Microsoft Excel versions 97 through 2003. Like DOC, it uses the OLE2 (Object Linking and Embedding) compound file structure — a mini filesystem within a single file. The format stores worksheets, cell data, formulas, charts, formatting, and macros in a complex binary encoding known as BIFF (Binary Interchange File Format). While superseded by XLSX in 2007, XLS files remain common in legacy systems, financial archives, and older enterprise environments.
Core Philosophy
XLS is Microsoft Excel's legacy binary spreadsheet format, used from Excel 97 through Excel 2003. Built on the BIFF (Binary Interchange File Format) structure within an OLE2 compound document, XLS files store worksheets, formulas, formatting, charts, and VBA macros in a proprietary binary format that is difficult to parse outside of Microsoft Excel.
XLS is a legacy format that should not be used for new work. XLSX (Office Open XML) replaced XLS in 2007, offering better compression, XML-based structure, and broader third-party tool support. The only reason to produce XLS today is compatibility with very old systems or software that cannot handle XLSX — a scenario that is increasingly rare.
When you encounter XLS files, convert them to XLSX for continued use or to CSV/Parquet for data processing. Be aware that XLS has specific limitations — 65,536 rows, 256 columns, 30 MB practical file size limit — that XLSX relaxes significantly. VBA macros in XLS files may need adjustment when converting to XLSX/XLSM due to differences in the object model across Excel versions.
Technical Specifications
- File extension:
.xls - MIME type:
application/vnd.ms-excel - Magic bytes:
D0 CF 11 E0 A1 B1 1A E1(OLE2 compound file) - Format name: BIFF8 (Binary Interchange File Format version 8, used by Excel 97-2003)
- Specification: Published by Microsoft as
[MS-XLS]in 2008 - Max rows: 65,536 (2^16)
- Max columns: 256 (A to IV)
- Max characters per cell: 32,767
- Max worksheets: Limited by memory (practical ~255)
Internal Structure
The OLE2 container holds several streams:
- Workbook stream: Primary stream containing all BIFF8 records
- Summary Information: Standard metadata properties
- Document Summary Information: Extended metadata
- VBA macros (optional): VBA project storage
BIFF8 records are variable-length binary structures, each with a 4-byte header (2-byte record type + 2-byte data length). Key record types include BOF (beginning of file/sheet), DIMENSION, ROW, LABEL (string cells), NUMBER, FORMULA, FORMAT, FONT, XF (cell formatting), and EOF.
BIFF History
| Version | Excel Version | Year |
|---|---|---|
| BIFF2 | Excel 2.0 | 1987 |
| BIFF3 | Excel 3.0 | 1990 |
| BIFF4 | Excel 4.0 | 1992 |
| BIFF5 | Excel 5.0 | 1993 |
| BIFF8 | Excel 97-2003 | 1997 |
How to Work With It
Opening
- Microsoft Excel: All versions; 2007+ opens in Compatibility Mode
- LibreOffice Calc: Good support for BIFF8
- Google Sheets: Import and convert
- WPS Office, OnlyOffice: Both support XLS
Creating
Modern applications default to XLSX. To save as XLS:
- Excel: File > Save As > Excel 97-2003 Workbook (*.xls)
- LibreOffice: File > Save As > Microsoft Excel 97-2003 (.xls)
Parsing
- Python:
xlrd(read-only, BIFF5-8),openpyxldoes NOT support XLS - Java: Apache POI (
HSSFWorkbook) - .NET: NPOI, ExcelDataReader
- Node.js: SheetJS (
xlsxpackage) reads XLS - Pandas:
pd.read_excel('file.xls', engine='xlrd')
import xlrd
wb = xlrd.open_workbook('data.xls')
ws = wb.sheet_by_index(0)
for row_idx in range(ws.nrows):
print(ws.row_values(row_idx))
Converting
- To XLSX: Open in Excel or LibreOffice and resave;
libreoffice --convert-to xlsx - To CSV: Excel, LibreOffice,
pandas,in2csv(csvkit) - To PDF:
libreoffice --convert-to pdf - Batch conversion:
libreoffice --headless --convert-to xlsx *.xls
Common Use Cases
- Legacy financial and accounting systems
- Historical data archives from the 1997-2007 era
- Government and regulatory filings in older formats
- Data exchange with systems that cannot handle XLSX
- VBA macro workbooks in older enterprise environments
- Embedded systems and industrial tools that export XLS
Pros & Cons
Pros
- Extremely wide legacy support
- Compact binary format — fast to read/write for simple data
- Specification published by Microsoft, enabling third-party implementations
- Mature tooling in Apache POI, xlrd, and other libraries
- Well-understood format after decades of use
Cons
- Proprietary binary format — not human-readable or inspectable
- Severe row/column limits (65,536 rows, 256 columns)
- Security risk from embedded VBA macros
- No longer the default format — XLSX is standard since 2007
- Complex internal structure makes custom parsing difficult
- Does not support modern Excel features (structured tables, slicers, Power Query)
- OLE2 compound file structure adds overhead and complexity
Compatibility
| Platform | Support |
|---|---|
| Windows | Excel, LibreOffice, WPS Office |
| macOS | Excel, LibreOffice, Numbers (import) |
| Linux | LibreOffice, OnlyOffice, xlrd/SheetJS |
| Web | Google Sheets (import/convert), Microsoft 365 (converts to XLSX) |
| Mobile | Excel, Google Sheets |
All modern applications will encourage conversion to XLSX upon opening.
Practical Usage
Extract data from XLS files with Python and pandas
import pandas as pd
# Read specific sheet with xlrd engine (required for .xls)
df = pd.read_excel('legacy_report.xls', sheet_name='Sales', engine='xlrd')
print(df.head())
# Read all sheets into a dictionary of DataFrames
all_sheets = pd.read_excel('legacy_report.xls', sheet_name=None, engine='xlrd')
for name, sheet_df in all_sheets.items():
print(f"Sheet '{name}': {len(sheet_df)} rows, {len(sheet_df.columns)} columns")
Batch convert XLS files to XLSX with LibreOffice
# Convert all XLS files in a directory to XLSX using headless LibreOffice
libreoffice --headless --convert-to xlsx --outdir ./converted/ *.xls
# Verify conversion succeeded
for f in ./converted/*.xlsx; do
python3 -c "import openpyxl; wb=openpyxl.load_workbook('$f'); print(f'OK: $f ({len(wb.sheetnames)} sheets)')"
done
Inspect XLS structure with xlrd
import xlrd
wb = xlrd.open_workbook('data.xls')
print(f"Sheets: {wb.sheet_names()}")
for sheet in wb.sheets():
print(f" '{sheet.name}': {sheet.nrows} rows x {sheet.ncols} cols")
# Check for date cells (stored as floats in XLS)
for row_idx in range(min(5, sheet.nrows)):
for col_idx in range(sheet.ncols):
cell = sheet.cell(row_idx, col_idx)
if cell.ctype == xlrd.XL_CELL_DATE:
print(f" Date at ({row_idx},{col_idx}): {xlrd.xldate_as_datetime(cell.value, wb.datemode)}")
Anti-Patterns
Using openpyxl to read XLS files. openpyxl only supports XLSX (Open XML) format. Attempting to open a .xls file with openpyxl will raise an exception. Use xlrd for reading XLS files, or convert to XLSX first.
Creating new XLS files for modern data workflows. The 65,536-row and 256-column limits are severely restrictive for modern datasets. Always create XLSX files for new work; the only reason to produce XLS is compatibility with legacy systems that cannot read XLSX.
Opening XLS files with macros from untrusted sources without disabling macros. XLS macro viruses were a major attack vector in the 1990s-2000s and remain a threat. Always open untrusted XLS files with macros disabled, or use a library like xlrd (which ignores macros entirely) for data extraction.
Assuming date values in XLS are stored as formatted dates. XLS stores dates as floating-point numbers (days since a serial date epoch), with separate formatting applied. When parsing programmatically, you must use xlrd.xldate_as_datetime() or equivalent to convert the float to an actual date, or you will get meaningless numbers.
Keeping critical business data exclusively in XLS format without migration planning. Microsoft and third-party libraries are progressively dropping XLS support. xlrd removed XLSX support to focus on legacy XLS, but even XLS reading libraries receive minimal maintenance. Migrate to XLSX or a database for long-term data preservation.
Related Formats
- XLSX (.xlsx): Modern XML-based replacement
- XLSB (.xlsb): Binary XLSX (modern binary format, larger limits)
- XLT (.xlt): XLS template format
- XLA (.xla): Excel add-in format
- ODS (.ods): OpenDocument Spreadsheet alternative
- CSV (.csv): Plain text tabular data
Install this skill directly: skilldb add file-formats-skills
Related Skills
3MF 3D Manufacturing Format
The 3MF file format — the modern replacement for STL in 3D printing, supporting colors, materials, multi-object assemblies, and precise manufacturing data in a single package.
7-Zip Compressed Archive
The 7z archive format — open-source high-ratio compression using LZMA2, with strong AES-256 encryption, solid archives, and multi-threading support.
AAC (Advanced Audio Coding)
A lossy audio codec standardized as part of MPEG-2 and MPEG-4, designed to supersede MP3 with better quality at equivalent or lower bitrates.
AC3 (Dolby Digital)
Dolby's surround sound audio codec used in cinema, DVD, Blu-ray, and broadcast television for multichannel 5.1 audio delivery.
AI Adobe Illustrator Format
AI is Adobe Illustrator's native vector graphics file format, used for
AIFF (Audio Interchange File Format)
Apple's uncompressed audio format storing raw PCM data, serving as the Mac equivalent of WAV for professional audio production.