XLSX (Microsoft Excel Open XML)
The modern Microsoft Excel spreadsheet format based on Open XML, storing worksheets, formulas, charts, pivot tables, and formatting in a ZIP-compressed package of XML files.
You are a file format specialist with deep expertise in XLSX (Excel Open XML), including the SpreadsheetML XML schema, shared strings and styles architecture, programmatic spreadsheet generation with openpyxl/xlsxwriter/Apache POI, and cross-application compatibility.
## Key Points
- **File extension:** `.xlsx`
- **MIME type:** `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
- **Standard:** ECMA-376 / ISO/IEC 29500 (SpreadsheetML)
- **Magic bytes:** PK (ZIP signature)
- **Max rows:** 1,048,576 (2^20)
- **Max columns:** 16,384 (A to XFD)
- **Max characters per cell:** 32,767
- **Max worksheets:** Limited by memory
- **Native:** Microsoft Excel (Windows, macOS, web, mobile)
- **Free:** LibreOffice Calc, Google Sheets, WPS Office, OnlyOffice
- **Online:** Microsoft 365, Google Sheets (import/export)
- Any spreadsheet application listed above
## Quick Example
```xml
<row r="1">
<c r="A1" t="s"><v>0</v></c> <!-- String index 0 from sharedStrings -->
<c r="B1"><v>42</v></c> <!-- Numeric value -->
<c r="C1"><f>A1+B1</f><v>42</v></c> <!-- Formula with cached value -->
</row>
```skilldb get file-formats-skills/XLSX (Microsoft Excel Open XML)Full skill: 224 linesYou are a file format specialist with deep expertise in XLSX (Excel Open XML), including the SpreadsheetML XML schema, shared strings and styles architecture, programmatic spreadsheet generation with openpyxl/xlsxwriter/Apache POI, and cross-application compatibility.
XLSX — Microsoft Excel Open XML Spreadsheet
Overview
XLSX is the default spreadsheet format for Microsoft Excel since Office 2007. It replaced the binary XLS format with an open, XML-based structure standardized as part of Office Open XML (ECMA-376 / ISO/IEC 29500). An XLSX file is a ZIP archive containing XML files that define worksheets, shared strings, styles, formulas, charts, and relationships. It is the most widely used spreadsheet format in business, finance, and data analysis.
Core Philosophy
XLSX is the standard spreadsheet format for business and data analysis, supported by Microsoft Excel, Google Sheets, LibreOffice Calc, and virtually every data tool and programming language. As an Office Open XML format, XLSX stores worksheets, formulas, formatting, charts, and pivot tables in a ZIP archive of XML files, making it inspectable and programmatically manipulable.
XLSX occupies the middle ground between CSV's simplicity and database systems' power. It preserves data types, formulas, cell formatting, charts, and multiple worksheets — features CSV lacks — while remaining accessible to non-technical users who work primarily in spreadsheets. For sharing data that includes formatting, derived calculations, or visual presentation, XLSX is the most widely compatible choice.
For programmatic spreadsheet generation, leverage XLSX's ZIP+XML structure through libraries like openpyxl (Python), SheetJS (JavaScript), Apache POI (Java), or EPPlus (.NET). These tools can create sophisticated workbooks with formulas, conditional formatting, and charts without requiring Excel. When your use case is pure data processing with no need for formatting, prefer CSV or Parquet — they are simpler, faster, and more portable.
Technical Specifications
- File extension:
.xlsx - MIME type:
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet - Standard: ECMA-376 / ISO/IEC 29500 (SpreadsheetML)
- Magic bytes: PK (ZIP signature)
- Max rows: 1,048,576 (2^20)
- Max columns: 16,384 (A to XFD)
- Max characters per cell: 32,767
- Max worksheets: Limited by memory
Internal Structure
[Content_Types].xml
_rels/.rels
xl/
workbook.xml — Workbook structure, sheet references
worksheets/sheet1.xml — Cell data, formulas per sheet
sharedStrings.xml — Shared string table (deduplication)
styles.xml — Cell formatting, number formats, fonts
theme/theme1.xml — Color and font themes
charts/ — Chart definitions (DrawingML)
pivotTables/ — Pivot table definitions
tables/ — Structured table definitions
_rels/workbook.xml.rels — Part relationships
Cell data in sheet1.xml uses row/column references:
<row r="1">
<c r="A1" t="s"><v>0</v></c> <!-- String index 0 from sharedStrings -->
<c r="B1"><v>42</v></c> <!-- Numeric value -->
<c r="C1"><f>A1+B1</f><v>42</v></c> <!-- Formula with cached value -->
</row>
How to Work With It
Opening
- Native: Microsoft Excel (Windows, macOS, web, mobile)
- Free: LibreOffice Calc, Google Sheets, WPS Office, OnlyOffice
- Online: Microsoft 365, Google Sheets (import/export)
Creating
- Any spreadsheet application listed above
- Programmatically:
- Python:
openpyxl(read/write),xlsxwriter(write-only, fast),pandaswith openpyxl engine - Java: Apache POI (
XSSFWorkbook) - .NET: EPPlus, ClosedXML,
DocumentFormat.OpenXml - Node.js:
exceljs,xlsx(SheetJS) - PHP: PhpSpreadsheet (successor to PHPExcel)
- Go:
excelize - Ruby:
axlsx,roo(reading)
- Python:
Parsing / Reading
import openpyxl
wb = openpyxl.load_workbook('data.xlsx')
ws = wb.active
for row in ws.iter_rows(values_only=True):
print(row)
# Or with pandas:
import pandas as pd
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
Converting
- To CSV: Excel, LibreOffice,
pandas, orxlsx2csv - To PDF: Excel print-to-PDF, LibreOffice headless
- To JSON:
pandas+to_json(), SheetJS - To ODS: LibreOffice, or Save As in Excel
- From CSV: Open in Excel/LibreOffice and save;
pandasread then write
Common Use Cases
- Financial modeling and budgeting
- Business reporting and dashboards
- Data collection and entry forms
- Inventory and project management
- Scientific data recording and basic analysis
- HR, payroll, and administrative tracking
- Data interchange between business systems
- Pivot tables and data summarization
Pros & Cons
Pros
- Industry standard for business data exchange
- Open standard (ISO/IEC 29500) despite Microsoft origins
- Rich feature set: formulas, charts, pivot tables, conditional formatting, data validation
- Excellent programmatic library support across all major languages
- ZIP-based structure allows inspection and manipulation
- Strong backward and forward compatibility within Excel versions
- Supports very large datasets (1M+ rows)
Cons
- Not a database — lacks relational integrity, concurrent access, or ACID properties
- Formula recalculation can be slow with very large or complex sheets
- Rendering differences between Excel and alternative applications
- Easy to introduce errors in large spreadsheets (no type safety)
- Macro-enabled variant (.xlsm) carries security risks
- File size grows quickly with formatting and embedded objects
- Merging/diffing spreadsheets is difficult (not text-friendly for version control)
Compatibility
| Platform | Applications |
|---|---|
| Windows | Excel, LibreOffice, WPS Office |
| macOS | Excel, LibreOffice, Numbers (import/export) |
| Linux | LibreOffice, OnlyOffice, WPS Office |
| Web | Microsoft 365, Google Sheets |
| Mobile | Excel, Google Sheets, Numbers |
Formulas are generally compatible across applications; charts, pivot tables, and advanced formatting may render differently outside Excel.
Practical Usage
Generate a formatted XLSX report with openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
ws = wb.active
ws.title = "Sales Report"
# Header row with styling
headers = ["Product", "Q1", "Q2", "Q3", "Q4", "Total"]
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
for col, header in enumerate(headers, 1):
cell = ws.cell(row=1, column=col, value=header)
cell.font = Font(bold=True, color="FFFFFF")
cell.fill = header_fill
# Data with formula
data = [("Widget A", 150, 200, 180, 220), ("Widget B", 300, 280, 310, 350)]
for row_idx, (name, *quarters) in enumerate(data, 2):
ws.cell(row=row_idx, column=1, value=name)
for q, val in enumerate(quarters, 2):
ws.cell(row=row_idx, column=q, value=val)
ws.cell(row=row_idx, column=6).value = f"=SUM(B{row_idx}:E{row_idx})"
wb.save("sales_report.xlsx")
Convert XLSX to CSV and JSON with pandas
import pandas as pd
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
# To CSV
df.to_csv("data.csv", index=False)
# To JSON (records orientation for API consumption)
df.to_json("data.json", orient="records", indent=2)
# To multiple formats from multiple sheets
xlsx = pd.ExcelFile("multi_sheet.xlsx")
for sheet_name in xlsx.sheet_names:
df = pd.read_excel(xlsx, sheet_name=sheet_name)
df.to_csv(f"{sheet_name}.csv", index=False)
Inspect XLSX internal XML structure directly
# XLSX is just a ZIP file — inspect its contents
unzip -l report.xlsx
# Extract and view the first worksheet's XML
unzip -p report.xlsx xl/worksheets/sheet1.xml | xmllint --format - | head -50
# Check shared strings (all text content is deduplicated here)
unzip -p report.xlsx xl/sharedStrings.xml | xmllint --format -
Anti-Patterns
Using XLSX as a database substitute for multi-user concurrent access. XLSX has no locking, no ACID transactions, and no concurrent write support. Use SQLite, PostgreSQL, or even a shared Google Sheet for multi-user scenarios; export to XLSX only for reporting.
Reading an entire large XLSX into memory when you only need a subset of rows. Loading a million-row XLSX with openpyxl in default mode consumes gigabytes of RAM. Use openpyxl.load_workbook(read_only=True) or pandas.read_excel(usecols=..., nrows=...) to stream or limit what you read.
Storing machine-generated data in XLSX with heavy formatting. Cell styles, conditional formatting, and themes dramatically increase file size and generation time. For data pipelines, use xlsxwriter in constant_memory mode or export to CSV/Parquet instead; add formatting only for human-facing reports.
Committing XLSX files to Git for version control. XLSX is a binary ZIP file that cannot be meaningfully diffed or merged. Store the source data in CSV, JSON, or a database, and generate XLSX as a build artifact. If XLSX must be tracked, consider tools like xlsx-diff for review.
Trusting formula results cached in the XML without recalculation. XLSX stores the last-computed value of each formula, but this value may be stale if the file was saved without recalculation. When reading programmatically, either recalculate formulas (e.g., with xlcalc or by opening in Excel) or read only raw data cells.
Related Formats
- XLS (.xls): Legacy binary Excel format
- XLSM (.xlsm): Macro-enabled XLSX
- XLTX/.XLTM: Template variants
- XLSB (.xlsb): Binary XLSX (faster, smaller, but not XML-based)
- ODS (.ods): OpenDocument Spreadsheet
- CSV (.csv): Plain text tabular data (no formatting or formulas)
- Numbers (.numbers): Apple's spreadsheet format
Install this skill directly: skilldb add file-formats-skills
Related Skills
3MF 3D Manufacturing Format
The 3MF file format — the modern replacement for STL in 3D printing, supporting colors, materials, multi-object assemblies, and precise manufacturing data in a single package.
7-Zip Compressed Archive
The 7z archive format — open-source high-ratio compression using LZMA2, with strong AES-256 encryption, solid archives, and multi-threading support.
AAC (Advanced Audio Coding)
A lossy audio codec standardized as part of MPEG-2 and MPEG-4, designed to supersede MP3 with better quality at equivalent or lower bitrates.
AC3 (Dolby Digital)
Dolby's surround sound audio codec used in cinema, DVD, Blu-ray, and broadcast television for multichannel 5.1 audio delivery.
AI Adobe Illustrator Format
AI is Adobe Illustrator's native vector graphics file format, used for
AIFF (Audio Interchange File Format)
Apple's uncompressed audio format storing raw PCM data, serving as the Mac equivalent of WAV for professional audio production.