XLS (Microsoft Excel Binary Format)

The legacy binary spreadsheet format used by Microsoft Excel from 1997 through 2003, storing worksheets, formulas, and formatting in an OLE2 compound file structure.

Quick Summary
You are a file format specialist with deep expertise in XLS (Microsoft Excel Binary Format), including BIFF8 record structure, OLE2 compound file internals, xlrd/Apache POI parsing, migration to XLSX, and legacy data extraction workflows.

## Key Points

- **File extension:** `.xls`
- **MIME type:** `application/vnd.ms-excel`
- **Magic bytes:** `D0 CF 11 E0 A1 B1 1A E1` (OLE2 compound file)
- **Format name:** BIFF8 (Binary Interchange File Format version 8, used by Excel 97-2003)
- **Specification:** Published by Microsoft as `[MS-XLS]` in 2008
- **Max rows:** 65,536 (2^16)
- **Max columns:** 256 (A to IV)
- **Max characters per cell:** 32,767
- **Max worksheets:** Limited by memory (practical ~255)
- **Workbook stream:** Primary stream containing all BIFF8 records
- **Summary Information:** Standard metadata properties
- **Document Summary Information:** Extended metadata

## Quick Example

```python
import xlrd
wb = xlrd.open_workbook('data.xls')
ws = wb.sheet_by_index(0)
for row_idx in range(ws.nrows):
    print(ws.row_values(row_idx))
```

XLS — Microsoft Excel Binary Format (Legacy)

Overview

XLS is the proprietary binary file format used by Microsoft Excel 97 through 2003. Like DOC, it uses the OLE2 (Object Linking and Embedding) compound file structure, a mini filesystem within a single file. Worksheets, cell data, formulas, charts, and formatting are encoded as a sequence of BIFF (Binary Interchange File Format) records. VBA macros, when present, live in a separate storage inside the same container. XLSX replaced XLS as Excel's default format in 2007, but XLS files remain common in legacy systems, financial archives, and older enterprise environments.

Core Philosophy

XLS is Microsoft Excel's legacy binary spreadsheet format, used from Excel 97 through Excel 2003. XLS files are built on the BIFF (Binary Interchange File Format) structure within an OLE2 compound document. They store worksheets, formulas, formatting, charts, and VBA macros in a proprietary binary encoding that is hard to parse without a dedicated library.

XLS is a legacy format that should not be used for new work. XLSX (Office Open XML) replaced XLS in 2007, offering better compression, XML-based structure, and broader third-party tool support. The only reason to produce XLS today is compatibility with very old systems or software that cannot handle XLSX — a scenario that is increasingly rare.

When you encounter XLS files, convert them to XLSX for continued use or to CSV/Parquet for data processing. XLS has specific limitations that XLSX relaxes significantly: 65,536 rows, 256 columns, and a practical file size limit of about 30 MB. Saving a macro-enabled workbook as XLSX discards its VBA project, so save as XLSM to keep the macros. Even then, the VBA code may need adjustment because the object model differs across Excel versions.

Technical Specifications

  • File extension: .xls
  • MIME type: application/vnd.ms-excel
  • Magic bytes: D0 CF 11 E0 A1 B1 1A E1 (OLE2 compound file)
  • Format name: BIFF8 (Binary Interchange File Format version 8, used by Excel 97-2003)
  • Specification: Published by Microsoft as [MS-XLS] in 2008
  • Max rows: 65,536 (2^16)
  • Max columns: 256 (A to IV)
  • Max characters per cell: 32,767
  • Max worksheets: Limited by memory (practical ~255)
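The row, column, and cell-length limits above are easy to check before writing data out as XLS. The sketch below uses hypothetical helper names. It assumes a table held as a list of row lists. It also shows why the last column is IV: column letters are a bijective base-26 count.

```python
MAX_ROWS = 65_536        # 2**16 rows per BIFF8 worksheet
MAX_COLS = 256           # columns A through IV
MAX_CELL_CHARS = 32_767  # characters per cell


def column_letter(index: int) -> str:
    """Convert a 0-based column index to A1-style letters (0 -> 'A', 255 -> 'IV')."""
    if not 0 <= index < MAX_COLS:
        raise ValueError(f"column index {index} is outside the XLS range 0-{MAX_COLS - 1}")
    letters = ""
    n = index + 1  # bijective base 26: there is no zero digit
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def xls_limit_problems(rows):
    """List the ways a table (a list of row lists) would not fit in one BIFF8 worksheet."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"{len(rows)} rows exceeds the {MAX_ROWS:,}-row limit")
    widest = max((len(row) for row in rows), default=0)
    if widest > MAX_COLS:
        problems.append(f"{widest} columns exceeds the {MAX_COLS}-column limit")
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            # len() counts code points; Excel counts UTF-16 units, so this is approximate outside the BMP
            if isinstance(value, str) and len(value) > MAX_CELL_CHARS:
                problems.append(f"cell at row {r + 1}, column {c + 1} has {len(value)} characters")
    return problems
```

An empty result means the table fits on one sheet. Otherwise, target XLSX or split the data.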

Internal Structure

The OLE2 container holds several streams:

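  • Workbook stream (named Workbook; BIFF5 files use Book): Primary stream containing all BIFF8 records
  • Summary Information (\x05SummaryInformation): Standard metadata properties such as title and author
  • Document Summary Information (\x05DocumentSummaryInformation): Extended metadata
  • VBA macros (optional, _VBA_PROJECT_CUR storage): VBA project storage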
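A listing of the OLE2 container's streams is enough for two checks. It distinguishes a BIFF8 workbook (stream `Workbook`) from a BIFF5 one (stream `Book`). It also reveals a VBA project (storage `_VBA_PROJECT_CUR`) before anyone opens the file in Excel.

The sketch assumes the listing comes from the third-party olefile package, whose `OleFileIO(path).listdir()` returns each stream path as a list of name components. `describe_ole_streams` is a hypothetical helper. Excel 4.0 (XLM) macros live in macro sheets inside the workbook stream rather than in the VBA storage, so `has_vba_project: False` does not rule out macros.

```python
from typing import Dict, Sequence


def describe_ole_streams(paths: Sequence[Sequence[str]]) -> Dict[str, object]:
    """Summarise an OLE2 stream listing, e.g. [["Workbook"], ["_VBA_PROJECT_CUR", "VBA", "dir"]]."""
    top_level = {path[0] for path in paths if path}
    if "Workbook" in top_level:      # Excel 97-2003 (BIFF8)
        workbook = "Workbook (BIFF8)"
    elif "Book" in top_level:        # Excel 5.0/95 (BIFF5)
        workbook = "Book (BIFF5)"
    else:
        workbook = None              # probably not an Excel file at all
    return {
        "workbook_stream": workbook,
        "has_summary_info": "\x05SummaryInformation" in top_level,
        "has_doc_summary_info": "\x05DocumentSummaryInformation" in top_level,
        "has_vba_project": "_VBA_PROJECT_CUR" in top_level,
    }
```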

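BIFF8 records are variable-length binary structures. Each one starts with a 4-byte little-endian header: a 2-byte record type followed by a 2-byte length of the data that follows. A BIFF8 record carries at most 8,224 bytes of data, and longer content continues in CONTINUE records. Key record types include:

  • BOF: beginning of the workbook globals or of a sheet
  • DIMENSION, ROW
  • LABELSST: string cells, which index into the shared string table (SST); the older LABEL record is still defined
  • NUMBER and RK: numeric cells
  • FORMULA, FORMAT, FONT
  • XF: cell formatting
  • EOF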
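The header layout makes a raw record walker short. The sketch below splits a workbook stream into `(record_type, payload)` pairs. It is not a full parser: it ignores CONTINUE stitching and does not decode any record. `iter_biff_records` and the small `RECORD_NAMES` table are illustrative names.

To get a real stream, the third-party olefile package can read it, for example `olefile.OleFileIO("data.xls").openstream("Workbook").read()`. Such a stream may carry trailing padding after the final EOF record.

```python
import struct
from typing import Iterator, Tuple

# A few common BIFF8 record type IDs
RECORD_NAMES = {
    0x0809: "BOF",
    0x000A: "EOF",
    0x0200: "DIMENSION",
    0x0208: "ROW",
    0x00FC: "SST",
    0x00FD: "LABELSST",
    0x0203: "NUMBER",
    0x027E: "RK",
    0x0006: "FORMULA",
}


def iter_biff_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_type, payload) pairs from the raw bytes of a workbook stream."""
    offset = 0
    while offset + 4 <= len(stream):  # fewer than 4 leftover bytes cannot hold a header
        rec_type, length = struct.unpack_from("<HH", stream, offset)
        offset += 4
        payload = stream[offset:offset + length]
        if len(payload) < length:
            raise ValueError(f"truncated record 0x{rec_type:04X} at offset {offset - 4}")
        yield rec_type, payload
        offset += length
```

Printing `RECORD_NAMES.get(rec_type, hex(rec_type))` for each pair gives a quick outline of a file's structure.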

BIFF History

| BIFF version | Excel version | Year |
|---|---|---|
| BIFF2 | Excel 2.0 | 1987 |
| BIFF3 | Excel 3.0 | 1990 |
| BIFF4 | Excel 4.0 | 1992 |
| BIFF5 | Excel 5.0 | 1993 |
| BIFF8 | Excel 97-2003 | 1997 |
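Each BIFF generation in the table uses its own BOF record, so the first record tells you which version you have. For BIFF2-4, which are plain binary files rather than OLE2 containers, that is the first record of the file. For BIFF5 and BIFF8, it is the first record of the workbook stream.

This is a minimal sketch. It assumes the BOF IDs used by xlrd and [MS-XLS]: 0x0009, 0x0209, and 0x0409 for BIFF2-4, and 0x0809 with a version field of 0x0500 or 0x0600 for BIFF5 and BIFF8. `biff_version` is a hypothetical name. Excel 95 files (sometimes labelled BIFF7) also report 0x0500, so they come back as BIFF5.

```python
import struct

# BOF record type IDs that identify the older generations directly
OLD_BOF_IDS = {0x0009: "BIFF2", 0x0209: "BIFF3", 0x0409: "BIFF4"}


def biff_version(first_record: bytes) -> str:
    """Guess the BIFF version from the bytes of a file's or stream's first (BOF) record."""
    if len(first_record) < 4:
        raise ValueError("need at least a 4-byte record header")
    rec_type, length = struct.unpack_from("<HH", first_record, 0)
    if rec_type in OLD_BOF_IDS:
        return OLD_BOF_IDS[rec_type]
    if rec_type != 0x0809:
        raise ValueError(f"not a BOF record: 0x{rec_type:04X}")
    if length < 2 or len(first_record) < 6:
        raise ValueError("BOF record too short to hold a version field")
    # BIFF5/BIFF8 BOF data starts with a 2-byte version: 0x0500 or 0x0600
    (vers,) = struct.unpack_from("<H", first_record, 4)
    return {0x0500: "BIFF5", 0x0600: "BIFF8"}.get(vers, f"unknown (vers=0x{vers:04X})")
```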

How to Work With It

Opening

  • Microsoft Excel: All versions; 2007+ opens in Compatibility Mode
  • LibreOffice Calc: Good support for BIFF8
  • Google Sheets: Import and convert
  • WPS Office, OnlyOffice: Both support XLS

Creating

Modern applications default to XLSX. To save as XLS:

  • Excel: File > Save As > Excel 97-2003 Workbook (*.xls)
  • LibreOffice: File > Save As > Microsoft Excel 97-2003 (.xls)

Parsing

  • Python: xlrd (read-only, BIFF5-8); note that openpyxl does NOT support XLS
  • Java: Apache POI (HSSFWorkbook)
  • .NET: NPOI, ExcelDataReader
  • Node.js: SheetJS (xlsx package) reads XLS
  • Pandas: pd.read_excel('file.xls', engine='xlrd')
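
```python
import xlrd

# Open the workbook and take the first worksheet
wb = xlrd.open_workbook('data.xls')
ws = wb.sheet_by_index(0)

# Print each row as a list of cell values
for row_idx in range(ws.nrows):
    print(ws.row_values(row_idx))
```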

Converting

  • To XLSX: Open in Excel or LibreOffice and resave; libreoffice --convert-to xlsx
  • To CSV: Excel, LibreOffice, pandas, in2csv (csvkit)
  • To PDF: libreoffice --convert-to pdf
  • Batch conversion: libreoffice --headless --convert-to xlsx *.xls
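The CSV route can be sketched with xlrd and the standard csv module. This assumes xlrd is installed, and `write_sheet_csv` and `xls_to_csv` are hypothetical helper names. xlrd is imported inside `xls_to_csv` so that the row-writing part works with any object that exposes `nrows` and `row_values()`. Date cells come out as serial numbers, not formatted dates.

```python
import csv
from pathlib import Path


def write_sheet_csv(sheet, out) -> int:
    """Write every row of an xlrd-style sheet to an open text file; return the row count."""
    writer = csv.writer(out)
    for row_idx in range(sheet.nrows):
        # row_values(): text as str, numeric and date cells as float, empty cells as ''
        writer.writerow(sheet.row_values(row_idx))
    return sheet.nrows


def xls_to_csv(xls_path, sheet_index=0, csv_path=None) -> int:
    """Export one worksheet of an .xls file to CSV (defaults to the same name with .csv)."""
    import xlrd  # third-party; only needed when actually opening a file
    wb = xlrd.open_workbook(xls_path)
    sheet = wb.sheet_by_index(sheet_index)
    target = Path(csv_path) if csv_path else Path(xls_path).with_suffix(".csv")
    # newline="" lets the csv module control line endings
    with target.open("w", newline="", encoding="utf-8") as f:
        return write_sheet_csv(sheet, f)
```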

Common Use Cases

  • Legacy financial and accounting systems
  • Historical data archives from the 1997-2007 era
  • Government and regulatory filings in older formats
  • Data exchange with systems that cannot handle XLSX
  • VBA macro workbooks in older enterprise environments
  • Embedded systems and industrial tools that export XLS

Pros & Cons

Pros

  • Extremely wide legacy support
  • Compact binary format — fast to read/write for simple data
  • Specification published by Microsoft, enabling third-party implementations
  • Mature tooling in Apache POI, xlrd, and other libraries
  • Well-understood format after decades of use

Cons

  • Proprietary binary format — not human-readable or inspectable
  • Severe row/column limits (65,536 rows, 256 columns)
  • Security risk from embedded VBA macros
  • No longer the default format — XLSX is standard since 2007
  • Complex internal structure makes custom parsing difficult
  • Does not support modern Excel features (structured tables, slicers, Power Query)
  • OLE2 compound file structure adds overhead and complexity

Compatibility

| Platform | Support |
|---|---|
| Windows | Excel, LibreOffice, WPS Office |
| macOS | Excel, LibreOffice, Numbers (import) |
| Linux | LibreOffice, OnlyOffice, xlrd/SheetJS |
| Web | Google Sheets (import/convert), Microsoft 365 (converts to XLSX) |
| Mobile | Excel, Google Sheets |

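Modern applications generally steer users toward XLSX. Excel 2007 and later open XLS files in Compatibility Mode, and other tools offer to convert or resave them in a newer format.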

Practical Usage

Extract data from XLS files with Python and pandas

```python
import pandas as pd

# Read specific sheet with xlrd engine (required for .xls)
df = pd.read_excel('legacy_report.xls', sheet_name='Sales', engine='xlrd')
print(df.head())

# Read all sheets into a dictionary of DataFrames
all_sheets = pd.read_excel('legacy_report.xls', sheet_name=None, engine='xlrd')
for name, sheet_df in all_sheets.items():
    print(f"Sheet '{name}': {len(sheet_df)} rows, {len(sheet_df.columns)} columns")
```

Batch convert XLS files to XLSX with LibreOffice

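```bash
# Convert all XLS files in a directory to XLSX using headless LibreOffice
libreoffice --headless --convert-to xlsx --outdir ./converted/ *.xls

# Verify each converted file opens with openpyxl
# (the filename is passed as an argument rather than spliced into the Python code)
for f in ./converted/*.xlsx; do
  [ -e "$f" ] || continue  # skip if the glob matched nothing
  python3 -c 'import sys, openpyxl; wb = openpyxl.load_workbook(sys.argv[1]); print(f"OK: {sys.argv[1]} ({len(wb.sheetnames)} sheets)")' "$f"
done
```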

Inspect XLS structure with xlrd

```python
import xlrd

wb = xlrd.open_workbook('data.xls')
print(f"Sheets: {wb.sheet_names()}")
for sheet in wb.sheets():
    print(f"  '{sheet.name}': {sheet.nrows} rows x {sheet.ncols} cols")
    # Report date cells in the first five rows. XLS stores dates as floats;
    # xlrd flags them as XL_CELL_DATE based on the cell's number format.
    for row_idx in range(min(5, sheet.nrows)):
        for col_idx in range(sheet.ncols):
            cell = sheet.cell(row_idx, col_idx)
            if cell.ctype == xlrd.XL_CELL_DATE:
                # wb.datemode selects the 1900 (0) or 1904 (1) date system
                print(f"    Date at ({row_idx},{col_idx}): {xlrd.xldate_as_datetime(cell.value, wb.datemode)}")
```

Anti-Patterns

Using openpyxl to read XLS files. openpyxl only supports XLSX (Open XML) format. Attempting to open a .xls file with openpyxl will raise an exception. Use xlrd for reading XLS files, or convert to XLSX first.
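The file extension alone can mislead: some exporters write CSV or HTML under a .xls name, and renamed XLSX files also turn up. It can therefore help to choose the pandas engine from the file's first eight bytes. This is a sketch with hypothetical helper names. The OLE2 signature is shared with DOC and PPT, and the ZIP signature with XLSX, XLSB, and ODS (which need the pyxlsb and odf engines), so the check narrows things down but does not prove the content is a spreadsheet.

```python
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (BIFF in OLE2)
ZIP_SIGNATURE = b"PK\x03\x04"                        # Office Open XML packages are ZIP files


def pick_excel_engine(header: bytes) -> str:
    """Choose a pandas read_excel engine from a file's leading bytes."""
    if header.startswith(OLE2_SIGNATURE):
        return "xlrd"
    if header.startswith(ZIP_SIGNATURE):
        return "openpyxl"
    raise ValueError("neither OLE2 nor ZIP; possibly CSV or HTML saved with an .xls extension")


def engine_for_path(path: str) -> str:
    """Read the first 8 bytes of a file and pick an engine for it."""
    with open(path, "rb") as f:
        return pick_excel_engine(f.read(8))
```

A call such as `pd.read_excel(path, engine=engine_for_path(path))` then avoids sending an XLS file to openpyxl.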

Creating new XLS files for modern data workflows. The 65,536-row and 256-column limits are severely restrictive for modern datasets. Always create XLSX files for new work; the only reason to produce XLS is compatibility with legacy systems that cannot read XLSX.

Opening XLS files with macros from untrusted sources without disabling macros. XLS macro viruses were a major attack vector in the 1990s-2000s and remain a threat. Always open untrusted XLS files with macros disabled, or use a library like xlrd (which ignores macros entirely) for data extraction.

Assuming date values in XLS are stored as formatted dates. XLS stores dates as floating-point serial numbers (days since an epoch), with separate formatting applied. The workbook's date mode selects the epoch: the 1900 system (datemode 0) or the 1904 system (datemode 1). When parsing programmatically, use xlrd.xldate_as_datetime() or an equivalent with the workbook's datemode to turn the float into an actual date. Otherwise you will get meaningless numbers.
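The serial arithmetic is small enough to show directly. The sketch below is a hypothetical stand-in that roughly mirrors what xlrd's date helpers do; prefer the library function in real code.

It assumes two rules. In the 1900 system, serial 1 is 1900-01-01, and Excel's fictitious 1900-02-29 (serial 60) shifts every later date by one day. In the 1904 system, serial 0 is 1904-01-01, and serials run 1,462 lower than in the 1900 system.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial: float, datemode: int) -> datetime:
    """Convert an Excel date serial for the 1900 (datemode 0) or 1904 (datemode 1) system."""
    if serial < 0:
        raise ValueError("negative serials are not valid dates")
    if datemode == 1:
        # 1904 system: serial 0 is 1904-01-01
        return datetime(1904, 1, 1) + timedelta(days=serial)
    if datemode != 0:
        raise ValueError("datemode must be 0 (1900 system) or 1 (1904 system)")
    if 60 <= serial < 61:
        # The 1900 system treats 1900 as a leap year, so serial 60 is the nonexistent 1900-02-29
        raise ValueError("serial 60 is the fictitious date 1900-02-29")
    # Serials below 60 count from 1899-12-31; from 61 on, the phantom leap day moves the epoch back a day
    epoch = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    # The fractional part of the serial is the time of day
    return epoch + timedelta(days=serial)
```

For example, serial 43831 in a 1900-system workbook is 2020-01-01, and 43831.5 is noon on that day.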

Keeping critical business data exclusively in XLS format without migration planning. Support for XLS is narrowing. xlrd 2.0 (released in 2020) dropped XLSX support and now reads only XLS, and the libraries that still read XLS are largely in maintenance mode. Migrate to XLSX or a database for long-term data preservation.

Related Formats

  • XLSX (.xlsx): Modern XML-based replacement
  • XLSB (.xlsb): Binary XLSX (modern binary format, larger limits)
  • XLT (.xlt): XLS template format
  • XLA (.xla): Excel add-in format
  • ODS (.ods): OpenDocument Spreadsheet alternative
  • CSV (.csv): Plain text tabular data
