Jupyter Notebook
Jupyter Notebook format (.ipynb) — a JSON-based document format combining live code, equations, visualizations, and narrative text in computational notebooks.
You are a file format specialist with deep expertise in the Jupyter Notebook (.ipynb) format. You understand the JSON-based nbformat schema, cell types (code, markdown, raw), output types (execute_result, display_data, stream, error), kernel specifications, and metadata structure. You can advise on creating, executing, converting (nbconvert, Quarto), version-controlling (nbstripout, jupytext, nbdime), linting (nbqa), and parameterizing (papermill) notebooks for data science, research, and education workflows.
Jupyter Notebook — Interactive Computing Format
Overview
Jupyter Notebook (.ipynb) is a JSON-based document format that combines executable code, rich text (Markdown), mathematical equations (LaTeX), visualizations, and structured output in a single file. Created as part of Project Jupyter (a spinoff from IPython in 2014), the format supports over 100 programming languages through kernel plugins. Notebooks have become the standard tool for data science, machine learning experimentation, scientific research, and technical education.
Core Philosophy
Jupyter notebooks (.ipynb) embody a philosophy of literate programming: code, documentation, visualizations, and results should live together in a single document that tells a coherent story. A notebook interleaves executable code cells with markdown narrative, and preserves the output (text, tables, charts, images) alongside the code that produced it. This makes notebooks powerful tools for data exploration, analysis communication, and educational materials.
Under the hood, an .ipynb file is JSON containing an ordered list of cells, each with a type (code, markdown, raw), source content, and optional outputs. This JSON structure means notebooks are technically version-control compatible but practically difficult to diff and merge — cell outputs (especially images and large data frames) create noisy diffs that obscure code changes. Use tools like nbstripout to remove outputs before committing, or adopt Jupytext for a more git-friendly notebook workflow.
Notebooks excel at exploration and communication but struggle as production code. The execution order problem (cells can be run out of order, creating hidden state), the difficulty of testing, and the merge conflict challenges make notebooks problematic for production pipelines. Use notebooks for prototyping, analysis, and communication; refactor validated logic into Python modules and packages for production use.
Technical Specifications
File Structure
An .ipynb file is a JSON document following the nbformat schema (since nbformat 4.5 — nbformat_minor 5 — every cell also carries a required `id`):

```json
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.11.0",
      "mimetype": "text/x-python",
      "file_extension": ".py"
    }
  },
  "cells": [
    {
      "id": "a1b2c3d4",
      "cell_type": "markdown",
      "metadata": {},
      "source": ["# Analysis Title\n", "Description of the analysis."]
    },
    {
      "id": "b2c3d4e5",
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')\n", "df.head()"],
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<table>...</table>",
            "text/plain": "   col1  col2\n0     1     2"
          },
          "metadata": {},
          "execution_count": 1
        }
      ]
    },
    {
      "id": "c3d4e5f6",
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "source": ["import matplotlib.pyplot as plt\n", "df.plot()\n", "plt.show()"],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUg...",
            "text/plain": "<Figure size 640x480>"
          },
          "metadata": {}
        }
      ]
    },
    {
      "id": "d4e5f6a7",
      "cell_type": "raw",
      "metadata": {},
      "source": ["Raw text, not rendered as markdown or executed."]
    }
  ]
}
```
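Because the format is plain JSON, a minimal valid notebook can be assembled with nothing but the standard library. The sketch below is an illustrative assumption of how you might do this by hand; in practice the nbformat package provides `new_notebook()`/`new_code_cell()` helpers and schema validation.

```python
import json
import uuid

def code_cell(source):
    """Build a minimal nbformat-4.5 code cell (every cell needs a unique id)."""
    return {
        "id": uuid.uuid4().hex[:8],
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "source": source.splitlines(keepends=True),
        "outputs": [],
    }

def make_notebook(cells):
    """Wrap cells in the top-level nbformat-4 envelope."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            "kernelspec": {"display_name": "Python 3",
                           "language": "python", "name": "python3"}
        },
        "cells": cells,
    }

nb = make_notebook([code_cell("import pandas as pd\ndf = pd.read_csv('data.csv')")])
text = json.dumps(nb, indent=1)  # ready to be written out as a .ipynb file
```

Note that `source` is stored as a list of lines with their trailing newlines kept — that is how nbformat serializes multi-line cell content.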
Cell Types
- code: Executable code cells with optional outputs.
- markdown: Rich text using Markdown (including LaTeX math: `$E=mc^2$`).
- raw: Unprocessed text, passed through without rendering or execution.
Output Types
- execute_result: Return value of the last expression in a cell.
- display_data: Rich display output (HTML, images, plots, widgets).
- stream: stdout/stderr text output.
- error: Exception traceback information.
Outputs support multiple MIME types (text/plain, text/html, image/png, application/json, etc.) — the frontend picks the richest format it can render.
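That selection step can be sketched in a few lines. The preference order below is an illustrative assumption — each frontend defines its own ranking:

```python
# Richest-first MIME preference (illustrative; real frontends define their own)
MIME_PREFERENCE = ["text/html", "image/png", "text/plain"]

def pick_representation(data):
    """Return the richest MIME type present in an output's data bundle."""
    for mime in MIME_PREFERENCE:
        if mime in data:
            return mime
    return None

bundle = {"text/plain": "<Figure size 640x480>", "image/png": "iVBORw0KGgo..."}
```

Here a plot that carries both `image/png` and `text/plain` renders as the image in a browser frontend, while a plain-text console falls back to the `text/plain` form.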
Versioning
- nbformat 4 (current): Introduced with IPython 3.0 in 2015 and the standard ever since.
- nbformat 4.5 (nbformat_minor 5, 2020): Adds a required id field on every cell for better diffing and collaboration. (The nbformat Python library is at major version 5, but the format itself remains 4.x.)
How to Work With It
Running Notebooks
```bash
# Classic Jupyter Notebook
pip install notebook
jupyter notebook   # launches browser interface

# JupyterLab (modern interface)
pip install jupyterlab
jupyter lab

# VS Code: install the "Jupyter" extension — full notebook support built in
# Google Colab: upload .ipynb or open from Google Drive — free GPU/TPU access
```
Programmatic Execution
```bash
# Execute notebook from command line (nbconvert)
jupyter nbconvert --to notebook --execute input.ipynb --output output.ipynb

# Execute with papermill (parameterized execution)
pip install papermill
papermill input.ipynb output.ipynb -p param1 value1 -p param2 value2
```

Or execute programmatically with nbclient:

```python
import nbformat
from nbclient import NotebookClient

nb = nbformat.read("notebook.ipynb", as_version=4)
client = NotebookClient(nb, timeout=600, kernel_name="python3")
client.execute()
nbformat.write(nb, "executed.ipynb")
```
Converting
```bash
# Convert to various formats
jupyter nbconvert --to html notebook.ipynb      # static HTML
jupyter nbconvert --to pdf notebook.ipynb       # PDF (requires LaTeX)
jupyter nbconvert --to markdown notebook.ipynb  # Markdown
jupyter nbconvert --to script notebook.ipynb    # .py script
jupyter nbconvert --to slides notebook.ipynb    # Reveal.js slides
jupyter nbconvert --to latex notebook.ipynb     # LaTeX

# Quarto (modern alternative for publishing)
quarto render notebook.ipynb --to html
quarto render notebook.ipynb --to pdf
```
Converting to/from Scripts
```bash
# Notebook to Python script
jupyter nbconvert --to script notebook.ipynb

# Python script to notebook (jupytext)
pip install jupytext
jupytext --to notebook script.py                        # .py -> .ipynb
jupytext --to py:percent notebook.ipynb                 # .ipynb -> .py (percent format)
jupytext --set-formats ipynb,py:percent notebook.ipynb  # pair formats
```

Jupytext percent format (a .py file that round-trips with .ipynb):

```python
# ---
# jupyter:
#   kernelspec:
#     name: python3
# ---

# %% [markdown]
# # Analysis Title

# %%
import pandas as pd
df = pd.read_csv('data.csv')

# %%
df.describe()
```
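The essence of `nbconvert --to script` for a Python notebook is easy to approximate with the standard library — a rough sketch only; the real exporter also handles magics, raw cells, and templates:

```python
import json

def notebook_to_script(nb_json):
    """Rough sketch of `jupyter nbconvert --to script` for a Python notebook:
    keep code cells as-is and turn markdown cells into comments."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb["cells"]:
        src = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append(src)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in src.splitlines()))
    return "\n\n".join(chunks) + "\n"
```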
Version Control Best Practices
Notebooks are notoriously difficult to version control because outputs (especially images) create huge diffs:
```bash
# Strip outputs before committing
jupyter nbconvert --clear-output --inplace notebook.ipynb

# Or use nbstripout (automatic via git filter)
pip install nbstripout
nbstripout --install                              # sets up git filter in the current repo
nbstripout --install --attributes .gitattributes  # store the filter config in .gitattributes
nbstripout --install --global                     # or globally, for all repos

# Use jupytext to version control .py files instead of .ipynb
jupytext --set-formats ipynb,py:percent notebook.ipynb

# ReviewNB — GitHub app for reviewing notebook diffs

# nbdime — diff and merge tool for notebooks
pip install nbdime
nbdime config-git --enable   # better git diff/merge for .ipynb
nbdiff notebook_v1.ipynb notebook_v2.ipynb
nbmerge base.ipynb local.ipynb remote.ipynb
```
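The core of what nbstripout and `--clear-output` do is small enough to sketch with the standard library:

```python
import json

def strip_outputs(nb):
    """Drop outputs and execution counts from code cells, in place —
    roughly what nbstripout does before each commit."""
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Usage sketch:
#   with open("notebook.ipynb") as f: nb = json.load(f)
#   with open("notebook.ipynb", "w") as f: json.dump(strip_outputs(nb), f, indent=1)
```

(The real tool also clears widget state and selected metadata; this sketch covers only the two fields responsible for most diff noise.)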
Validation and Linting
```bash
# Validate notebook format against the nbformat schema
python -c "import nbformat; nbformat.validate(nbformat.read('nb.ipynb', as_version=4))"

# Lint code cells
pip install nbqa
nbqa ruff notebook.ipynb    # run ruff linter on code cells
nbqa black notebook.ipynb   # format code cells with black
nbqa mypy notebook.ipynb    # type checking
nbqa isort notebook.ipynb   # sort imports
```
Common Use Cases
- Data science: Exploratory data analysis, feature engineering, visualization.
- Machine learning: Model training experiments, hyperparameter tuning, evaluation.
- Scientific research: Reproducible research papers, simulation analysis.
- Education: Interactive tutorials, course materials, coding exercises.
- Reporting: Automated reports with live data (papermill + nbconvert).
- Documentation: Runnable API examples, library tutorials.
- Prototyping: Quick iteration on algorithms before productionizing.
Pros & Cons
Pros
- Combines code, output, and documentation in a single living document.
- Rich output — inline plots, HTML tables, interactive widgets, images.
- Language-agnostic — kernels for Python, R, Julia, Scala, JavaScript, and more.
- Excellent for exploratory work — run cells interactively, iterate fast.
- Cloud-hosted options — Colab, Kaggle, SageMaker provide free compute.
- Widely adopted in data science and ML communities.
- Extensible — widgets (ipywidgets), magic commands, custom output.
Cons
- Version control is painful — JSON with embedded base64 images creates noisy diffs.
- Hidden state — cell execution order can create unreproducible results.
- Not suitable for production code — no modularity, no testing framework.
- Large file sizes when outputs (images, dataframes) are included.
- Merge conflicts are nearly impossible to resolve manually.
- Encourages poor software engineering practices (global state, no functions).
- Security risk — notebooks can contain and execute arbitrary code.
- Collaboration is difficult without specialized tools (Google Colab, ReviewNB).
Compatibility
| Interface | Description |
|---|---|
| Jupyter Notebook | Classic web interface |
| JupyterLab | Modern IDE-like interface |
| VS Code | Full notebook support via extension |
| Google Colab | Free cloud notebooks with GPU |
| Kaggle Kernels | Competition-focused cloud notebooks |
| AWS SageMaker | Enterprise ML notebook environment |
| Databricks | Spark-native notebook environment |
| Deepnote | Collaborative cloud notebooks |
| Observable | JavaScript notebooks (different format) |
| nteract | Desktop notebook application |
MIME type: application/x-ipynb+json. File extension: .ipynb.
Related Formats
- R Markdown (.Rmd): R ecosystem equivalent — text-based, better for version control.
- Quarto (.qmd): Next-gen publishing format — supports Python, R, Julia notebooks.
- Jupytext (.py, .md): Text-based representations that sync with .ipynb.
- Mathematica (.nb): Wolfram's proprietary notebook format.
- MATLAB Live Scripts (.mlx): MATLAB's notebook-like format.
- Observable (.ojs): JavaScript reactive notebook format.
- Marimo (.py): Python notebooks stored as pure Python scripts — reproducible by design.
Practical Usage
- Strip outputs before committing: Install `nbstripout` and configure it as a git filter (`nbstripout --install --attributes .gitattributes`) to automatically remove outputs from notebooks before every commit. This eliminates noisy diffs from embedded images and data.
- Parameterized execution with papermill: Use `papermill input.ipynb output.ipynb -p date 2025-03-15 -p threshold 0.95` to execute notebooks with different parameters. This enables automated reporting pipelines and batch experimentation.
- Jupytext for version control: Pair `.ipynb` files with `.py` (percent format) using jupytext. Version control the `.py` file for clean diffs, and regenerate the `.ipynb` with outputs as needed.
- Lint and format code cells: Use `nbqa black notebook.ipynb` and `nbqa ruff notebook.ipynb` to apply standard Python formatting and linting to code cells without affecting markdown cells.
- Convert to production code: Once exploratory work is complete, extract code cells into proper Python modules with functions, classes, and tests. Use `jupyter nbconvert --to script` as a starting point, then refactor into a package structure.
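Papermill's central mechanism is simple: it finds the cell tagged `parameters` and injects a cell of overrides right after it, so defaults stay in the notebook and overrides win at run time. A minimal sketch of that behaviour (the cell structure mirrors nbformat, but this is an illustration, not papermill's actual code):

```python
def inject_parameters(nb, **params):
    """Sketch of papermill's injection step: insert a cell of parameter
    overrides after the cell tagged 'parameters' (or at the top)."""
    injected = {
        "id": "injected-parameters",
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
        "outputs": [],
    }
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, injected)
            break
    else:
        nb["cells"].insert(0, injected)  # no tagged cell: put overrides first
    return nb
```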
Anti-Patterns
- Running cells out of order and relying on hidden state: Notebooks maintain kernel state across cells. Running cells out of order or re-running individual cells creates state that cannot be reproduced by running the notebook top-to-bottom. Always restart and run all before sharing.
- Committing notebooks with large embedded outputs: Base64-encoded images, large DataFrame HTML tables, and widget state in outputs create massive, unreadable diffs. Strip outputs before committing or use nbstripout.
- Using notebooks as production code: Notebooks lack modularity, are difficult to test, and encourage global state. Extract reusable logic into Python packages and use notebooks only for exploration, prototyping, and presentation.
- Storing credentials in notebook cells: Code cells with API keys, passwords, or tokens persist in the JSON file and version history. Use environment variables or credential files loaded at runtime, never hardcoded values.
- Ignoring the kernel specification: Notebooks are tied to a specific kernel. Sharing a notebook that specifies `python3.9` with a recipient who has only `python3.11` causes kernel-not-found errors. Document the required environment or ship a `requirements.txt`/`environment.yml` alongside the notebook.
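The first anti-pattern is cheap to guard against in CI: after a "restart and run all", code cells' execution counts are exactly 1..N top to bottom, so any other pattern signals out-of-order or partial execution. A small sketch of such a check (a hypothetical helper, not part of any Jupyter tool):

```python
def ran_top_to_bottom(nb):
    """Return True if the notebook's code cells were executed strictly
    in order with no cell skipped or re-run (counts are exactly 1..N)."""
    counts = [cell["execution_count"]
              for cell in nb["cells"]
              if cell["cell_type"] == "code" and cell.get("execution_count")]
    return counts == list(range(1, len(counts) + 1))
```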