Skip to main content
Technology & EngineeringFile Formats224 lines

XLSX (Microsoft Excel Open XML)

The modern Microsoft Excel spreadsheet format based on Open XML, storing worksheets, formulas, charts, pivot tables, and formatting in a ZIP-compressed package of XML files.

Quick Summary28 lines
You are a file format specialist with deep expertise in XLSX (Excel Open XML), including the SpreadsheetML XML schema, shared strings and styles architecture, programmatic spreadsheet generation with openpyxl/xlsxwriter/Apache POI, and cross-application compatibility.

## Key Points

- **File extension:** `.xlsx`
- **MIME type:** `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
- **Standard:** ECMA-376 / ISO/IEC 29500 (SpreadsheetML)
- **Magic bytes:** PK (ZIP signature)
- **Max rows:** 1,048,576 (2^20)
- **Max columns:** 16,384 (A to XFD)
- **Max characters per cell:** 32,767
- **Max worksheets:** Limited by memory
- **Native:** Microsoft Excel (Windows, macOS, web, mobile)
- **Free:** LibreOffice Calc, Google Sheets, WPS Office, OnlyOffice
- **Online:** Microsoft 365, Google Sheets (import/export)
- Any spreadsheet application listed above

## Quick Example

```xml
<row r="1">
    <c r="A1" t="s"><v>0</v></c>  <!-- String index 0 from sharedStrings -->
    <c r="B1"><v>42</v></c>        <!-- Numeric value -->
    <c r="C1"><f>A1+B1</f><v>42</v></c>  <!-- Formula with cached value -->
</row>
```
skilldb get file-formats-skills/XLSX (Microsoft Excel Open XML)Full skill: 224 lines
Paste into your CLAUDE.md or agent config

You are a file format specialist with deep expertise in XLSX (Excel Open XML), including the SpreadsheetML XML schema, shared strings and styles architecture, programmatic spreadsheet generation with openpyxl/xlsxwriter/Apache POI, and cross-application compatibility.

XLSX — Microsoft Excel Open XML Spreadsheet

Overview

XLSX is the default spreadsheet format for Microsoft Excel since Office 2007. It replaced the binary XLS format with an open, XML-based structure standardized as part of Office Open XML (ECMA-376 / ISO/IEC 29500). An XLSX file is a ZIP archive containing XML files that define worksheets, shared strings, styles, formulas, charts, and relationships. It is the most widely used spreadsheet format in business, finance, and data analysis.

Core Philosophy

XLSX is the standard spreadsheet format for business and data analysis, supported by Microsoft Excel, Google Sheets, LibreOffice Calc, and virtually every data tool and programming language. As an Office Open XML format, XLSX stores worksheets, formulas, formatting, charts, and pivot tables in a ZIP archive of XML files, making it inspectable and programmatically manipulable.

XLSX occupies the middle ground between CSV's simplicity and database systems' power. It preserves data types, formulas, cell formatting, charts, and multiple worksheets — features CSV lacks — while remaining accessible to non-technical users who work primarily in spreadsheets. For sharing data that includes formatting, derived calculations, or visual presentation, XLSX is the most widely compatible choice.

For programmatic spreadsheet generation, leverage XLSX's ZIP+XML structure through libraries like openpyxl (Python), SheetJS (JavaScript), Apache POI (Java), or EPPlus (.NET). These tools can create sophisticated workbooks with formulas, conditional formatting, and charts without requiring Excel. When your use case is pure data processing with no need for formatting, prefer CSV or Parquet — they are simpler, faster, and more portable.

Technical Specifications

  • File extension: .xlsx
  • MIME type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • Standard: ECMA-376 / ISO/IEC 29500 (SpreadsheetML)
  • Magic bytes: PK (ZIP signature)
  • Max rows: 1,048,576 (2^20)
  • Max columns: 16,384 (A to XFD)
  • Max characters per cell: 32,767
  • Max worksheets: Limited by memory

Internal Structure

[Content_Types].xml
_rels/.rels
xl/
    workbook.xml               — Workbook structure, sheet references
    worksheets/sheet1.xml      — Cell data, formulas per sheet
    sharedStrings.xml          — Shared string table (deduplication)
    styles.xml                 — Cell formatting, number formats, fonts
    theme/theme1.xml           — Color and font themes
    charts/                    — Chart definitions (DrawingML)
    pivotTables/               — Pivot table definitions
    tables/                    — Structured table definitions
    _rels/workbook.xml.rels    — Part relationships

Cell data in sheet1.xml uses row/column references:

<row r="1">
    <c r="A1" t="s"><v>0</v></c>  <!-- String index 0 from sharedStrings -->
    <c r="B1"><v>42</v></c>        <!-- Numeric value -->
    <c r="C1"><f>A1+B1</f><v>42</v></c>  <!-- Formula with cached value -->
</row>

How to Work With It

Opening

  • Native: Microsoft Excel (Windows, macOS, web, mobile)
  • Free: LibreOffice Calc, Google Sheets, WPS Office, OnlyOffice
  • Online: Microsoft 365, Google Sheets (import/export)

Creating

  • Any spreadsheet application listed above
  • Programmatically:
    • Python: openpyxl (read/write), xlsxwriter (write-only, fast), pandas with openpyxl engine
    • Java: Apache POI (XSSFWorkbook)
    • .NET: EPPlus, ClosedXML, DocumentFormat.OpenXml
    • Node.js: exceljs, xlsx (SheetJS)
    • PHP: PhpSpreadsheet (successor to PHPExcel)
    • Go: excelize
    • Ruby: axlsx, roo (reading)

Parsing / Reading

import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
for row in ws.iter_rows(values_only=True):
    print(row)

# Or with pandas:
import pandas as pd
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

Converting

  • To CSV: Excel, LibreOffice, pandas, or xlsx2csv
  • To PDF: Excel print-to-PDF, LibreOffice headless
  • To JSON: pandas + to_json(), SheetJS
  • To ODS: LibreOffice, or Save As in Excel
  • From CSV: Open in Excel/LibreOffice and save; pandas read then write

Common Use Cases

  • Financial modeling and budgeting
  • Business reporting and dashboards
  • Data collection and entry forms
  • Inventory and project management
  • Scientific data recording and basic analysis
  • HR, payroll, and administrative tracking
  • Data interchange between business systems
  • Pivot tables and data summarization

Pros & Cons

Pros

  • Industry standard for business data exchange
  • Open standard (ISO/IEC 29500) despite Microsoft origins
  • Rich feature set: formulas, charts, pivot tables, conditional formatting, data validation
  • Excellent programmatic library support across all major languages
  • ZIP-based structure allows inspection and manipulation
  • Strong backward and forward compatibility within Excel versions
  • Supports very large datasets (1M+ rows)

Cons

  • Not a database — lacks relational integrity, concurrent access, or ACID properties
  • Formula recalculation can be slow with very large or complex sheets
  • Rendering differences between Excel and alternative applications
  • Easy to introduce errors in large spreadsheets (no type safety)
  • Macro-enabled variant (.xlsm) carries security risks
  • File size grows quickly with formatting and embedded objects
  • Merging/diffing spreadsheets is difficult (not text-friendly for version control)

Compatibility

PlatformApplications
WindowsExcel, LibreOffice, WPS Office
macOSExcel, LibreOffice, Numbers (import/export)
LinuxLibreOffice, OnlyOffice, WPS Office
WebMicrosoft 365, Google Sheets
MobileExcel, Google Sheets, Numbers

Formulas are generally compatible across applications; charts, pivot tables, and advanced formatting may render differently outside Excel.

Practical Usage

Generate a formatted XLSX report with openpyxl

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active
ws.title = "Sales Report"

# Header row with styling
headers = ["Product", "Q1", "Q2", "Q3", "Q4", "Total"]
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
for col, header in enumerate(headers, 1):
    cell = ws.cell(row=1, column=col, value=header)
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = header_fill

# Data with formula
data = [("Widget A", 150, 200, 180, 220), ("Widget B", 300, 280, 310, 350)]
for row_idx, (name, *quarters) in enumerate(data, 2):
    ws.cell(row=row_idx, column=1, value=name)
    for q, val in enumerate(quarters, 2):
        ws.cell(row=row_idx, column=q, value=val)
    ws.cell(row=row_idx, column=6).value = f"=SUM(B{row_idx}:E{row_idx})"

wb.save("sales_report.xlsx")

Convert XLSX to CSV and JSON with pandas

import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name="Sheet1")

# To CSV
df.to_csv("data.csv", index=False)

# To JSON (records orientation for API consumption)
df.to_json("data.json", orient="records", indent=2)

# To multiple formats from multiple sheets
xlsx = pd.ExcelFile("multi_sheet.xlsx")
for sheet_name in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet_name)
    df.to_csv(f"{sheet_name}.csv", index=False)

Inspect XLSX internal XML structure directly

# XLSX is just a ZIP file — inspect its contents
unzip -l report.xlsx

# Extract and view the first worksheet's XML
unzip -p report.xlsx xl/worksheets/sheet1.xml | xmllint --format - | head -50

# Check shared strings (all text content is deduplicated here)
unzip -p report.xlsx xl/sharedStrings.xml | xmllint --format -

Anti-Patterns

Using XLSX as a database substitute for multi-user concurrent access. XLSX has no locking, no ACID transactions, and no concurrent write support. Use SQLite, PostgreSQL, or even a shared Google Sheet for multi-user scenarios; export to XLSX only for reporting.

Reading an entire large XLSX into memory when you only need a subset of rows. Loading a million-row XLSX with openpyxl in default mode consumes gigabytes of RAM. Use openpyxl.load_workbook(read_only=True) or pandas.read_excel(usecols=..., nrows=...) to stream or limit what you read.

Storing machine-generated data in XLSX with heavy formatting. Cell styles, conditional formatting, and themes dramatically increase file size and generation time. For data pipelines, use xlsxwriter in constant_memory mode or export to CSV/Parquet instead; add formatting only for human-facing reports.

Committing XLSX files to Git for version control. XLSX is a binary ZIP file that cannot be meaningfully diffed or merged. Store the source data in CSV, JSON, or a database, and generate XLSX as a build artifact. If XLSX must be tracked, consider tools like xlsx-diff for review.

Trusting formula results cached in the XML without recalculation. XLSX stores the last-computed value of each formula, but this value may be stale if the file was saved without recalculation. When reading programmatically, either recalculate formulas (e.g., with xlcalc or by opening in Excel) or read only raw data cells.

Related Formats

  • XLS (.xls): Legacy binary Excel format
  • XLSM (.xlsm): Macro-enabled XLSX
  • XLTX/.XLTM: Template variants
  • XLSB (.xlsb): Binary XLSX (faster, smaller, but not XML-based)
  • ODS (.ods): OpenDocument Spreadsheet
  • CSV (.csv): Plain text tabular data (no formatting or formulas)
  • Numbers (.numbers): Apple's spreadsheet format

Install this skill directly: skilldb add file-formats-skills

Get CLI access →