Jupyter Notebook
Jupyter Notebook format (.ipynb) — a JSON-based document format combining live code, equations, visualizations, and narrative text in computational notebooks.
You are a file format specialist with deep expertise in the Jupyter Notebook (.ipynb) format. You understand the JSON-based nbformat schema, cell types (code, markdown, raw), output types (execute_result, display_data, stream, error), kernel specifications, and metadata structure. You can advise on creating, executing, converting (nbconvert, Quarto), version-controlling (nbstripout, jupytext, nbdime), linting (nbqa), and parameterizing (papermill) notebooks for data science, research, and education workflows.
Jupyter Notebook — Interactive Computing Format
Overview
Jupyter Notebook (.ipynb) is a JSON-based document format that combines executable code, rich text (Markdown), mathematical equations (LaTeX), visualizations, and structured output in a single file. Created as part of Project Jupyter (a spinoff from IPython in 2014), the format supports over 100 programming languages through kernel plugins. Notebooks have become the standard tool for data science, machine learning experimentation, scientific research, and technical education.
Core Philosophy
Jupyter notebooks (.ipynb) embody a philosophy of literate programming: code, documentation, visualizations, and results should live together in a single document that tells a coherent story. A notebook interleaves executable code cells with markdown narrative, and preserves the output (text, tables, charts, images) alongside the code that produced it. This makes notebooks powerful tools for data exploration, analysis communication, and educational materials.
Under the hood, an .ipynb file is JSON containing an ordered list of cells, each with a type (code, markdown, raw), source content, and optional outputs. This JSON structure means notebooks are technically version-control compatible but practically difficult to diff and merge — cell outputs (especially images and large data frames) create noisy diffs that obscure code changes. Use tools like nbstripout to remove outputs before committing, or adopt Jupytext for a more git-friendly notebook workflow.
Notebooks excel at exploration and communication but struggle as production code. The execution order problem (cells can be run out of order, creating hidden state), the difficulty of testing, and the merge conflict challenges make notebooks problematic for production pipelines. Use notebooks for prototyping, analysis, and communication; refactor validated logic into Python modules and packages for production use.
Technical Specifications
File Structure
An .ipynb file is a JSON document following the nbformat schema (since nbformat 4.5 — nbformat_minor 5 — every cell also carries a required `id`):

```json
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.11.0",
      "mimetype": "text/x-python",
      "file_extension": ".py"
    }
  },
  "cells": [
    {
      "id": "a1b2c3d4",
      "cell_type": "markdown",
      "metadata": {},
      "source": ["# Analysis Title\n", "Description of the analysis."]
    },
    {
      "id": "b2c3d4e5",
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')\n", "df.head()"],
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": "<table>...</table>",
            "text/plain": "   col1  col2\n0     1     2"
          },
          "metadata": {},
          "execution_count": 1
        }
      ]
    },
    {
      "id": "c3d4e5f6",
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "source": ["import matplotlib.pyplot as plt\n", "df.plot()\n", "plt.show()"],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUg...",
            "text/plain": "<Figure size 640x480>"
          },
          "metadata": {}
        }
      ]
    },
    {
      "id": "d4e5f6a7",
      "cell_type": "raw",
      "metadata": {},
      "source": ["Raw text, not rendered as markdown or executed."]
    }
  ]
}
```
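Because the format is plain JSON, a minimal valid notebook can be assembled with nothing but the standard library. The sketch below is an illustrative assumption of how you might do this by hand; in practice the nbformat package provides `new_notebook()`/`new_code_cell()` helpers and schema validation.

```python
import json
import uuid

def code_cell(source):
    """Build a minimal nbformat-4.5 code cell (every cell needs a unique id)."""
    return {
        "id": uuid.uuid4().hex[:8],
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "source": source.splitlines(keepends=True),
        "outputs": [],
    }

def make_notebook(cells):
    """Wrap cells in the top-level nbformat-4 envelope."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            "kernelspec": {"display_name": "Python 3",
                           "language": "python", "name": "python3"}
        },
        "cells": cells,
    }

nb = make_notebook([code_cell("import pandas as pd\ndf = pd.read_csv('data.csv')")])
text = json.dumps(nb, indent=1)  # ready to be written out as a .ipynb file
```

Note that `source` is stored as a list of lines with their trailing newlines kept — that is how nbformat serializes multi-line cell content.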
Cell Types
- code: Executable code cells with optional outputs.
- markdown: Rich text using Markdown (including LaTeX math: `$E=mc^2$`).
- raw: Unprocessed text, passed through without rendering or execution.
Output Types
- execute_result: Return value of the last expression in a cell.
- display_data: Rich display output (HTML, images, plots, widgets).
- stream: stdout/stderr text output.
- error: Exception traceback information.
Outputs support multiple MIME types (text/plain, text/html, image/png, application/json, etc.) — the frontend picks the richest format it can render.
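That selection step can be sketched in a few lines. The preference order below is an illustrative assumption — each frontend defines its own ranking:

```python
# Richest-first MIME preference (illustrative; real frontends define their own)
MIME_PREFERENCE = ["text/html", "image/png", "text/plain"]

def pick_representation(data):
    """Return the richest MIME type present in an output's data bundle."""
    for mime in MIME_PREFERENCE:
        if mime in data:
            return mime
    return None

bundle = {"text/plain": "<Figure size 640x480>", "image/png": "iVBORw0KGgo..."}
```

Here a plot that carries both `image/png` and `text/plain` renders as the image in a browser frontend, while a plain-text console falls back to the `text/plain` form.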
Versioning
- nbformat 4 (current): Introduced with IPython 3.0 in 2015 and the standard ever since.
- nbformat 4.5 (nbformat_minor 5, 2020): Adds a required id field on every cell for better diffing and collaboration. (The nbformat Python library is at major version 5, but the format itself remains 4.x.)
How to Work With It
Running Notebooks
```bash
# Classic Jupyter Notebook
pip install notebook
jupyter notebook   # launches browser interface

# JupyterLab (modern interface)
pip install jupyterlab
jupyter lab

# VS Code: install the "Jupyter" extension — full notebook support built in
# Google Colab: upload .ipynb or open from Google Drive — free GPU/TPU access
```
Programmatic Execution
```bash
# Execute notebook from command line (nbconvert)
jupyter nbconvert --to notebook --execute input.ipynb --output output.ipynb

# Execute with papermill (parameterized execution)
pip install papermill
papermill input.ipynb output.ipynb -p param1 value1 -p param2 value2
```

Or execute programmatically with nbclient:

```python
import nbformat
from nbclient import NotebookClient

nb = nbformat.read("notebook.ipynb", as_version=4)
client = NotebookClient(nb, timeout=600, kernel_name="python3")
client.execute()
nbformat.write(nb, "executed.ipynb")
```
Converting
```bash
# Convert to various formats
jupyter nbconvert --to html notebook.ipynb      # static HTML
jupyter nbconvert --to pdf notebook.ipynb       # PDF (requires LaTeX)
jupyter nbconvert --to markdown notebook.ipynb  # Markdown
jupyter nbconvert --to script notebook.ipynb    # .py script
jupyter nbconvert --to slides notebook.ipynb    # Reveal.js slides
jupyter nbconvert --to latex notebook.ipynb     # LaTeX

# Quarto (modern alternative for publishing)
quarto render notebook.ipynb --to html
quarto render notebook.ipynb --to pdf
```
Converting to/from Scripts
```bash
# Notebook to Python script
jupyter nbconvert --to script notebook.ipynb

# Python script to notebook (jupytext)
pip install jupytext
jupytext --to notebook script.py                        # .py -> .ipynb
jupytext --to py:percent notebook.ipynb                 # .ipynb -> .py (percent format)
jupytext --set-formats ipynb,py:percent notebook.ipynb  # pair formats
```

Jupytext percent format (a .py file that round-trips with .ipynb):

```python
# ---
# jupyter:
#   kernelspec:
#     name: python3
# ---

# %% [markdown]
# # Analysis Title

# %%
import pandas as pd
df = pd.read_csv('data.csv')

# %%
df.describe()
```
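The essence of `nbconvert --to script` for a Python notebook is easy to approximate with the standard library — a rough sketch only; the real exporter also handles magics, raw cells, and templates:

```python
import json

def notebook_to_script(nb_json):
    """Rough sketch of `jupyter nbconvert --to script` for a Python notebook:
    keep code cells as-is and turn markdown cells into comments."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb["cells"]:
        src = "".join(cell["source"])
        if cell["cell_type"] == "code":
            chunks.append(src)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in src.splitlines()))
    return "\n\n".join(chunks) + "\n"
```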
Version Control Best Practices
Notebooks are notoriously difficult to version control because outputs (especially images) create huge diffs:
```bash
# Strip outputs before committing
jupyter nbconvert --clear-output --inplace notebook.ipynb

# Or use nbstripout (automatic via git filter)
pip install nbstripout
nbstripout --install                              # sets up git filter in the current repo
nbstripout --install --attributes .gitattributes  # store the filter config in .gitattributes
nbstripout --install --global                     # or globally, for all repos

# Use jupytext to version control .py files instead of .ipynb
jupytext --set-formats ipynb,py:percent notebook.ipynb

# ReviewNB — GitHub app for reviewing notebook diffs

# nbdime — diff and merge tool for notebooks
pip install nbdime
nbdime config-git --enable   # better git diff/merge for .ipynb
nbdiff notebook_v1.ipynb notebook_v2.ipynb
nbmerge base.ipynb local.ipynb remote.ipynb
```
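The core of what nbstripout and `--clear-output` do is small enough to sketch with the standard library:

```python
import json

def strip_outputs(nb):
    """Drop outputs and execution counts from code cells, in place —
    roughly what nbstripout does before each commit."""
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Usage sketch:
#   with open("notebook.ipynb") as f: nb = json.load(f)
#   with open("notebook.ipynb", "w") as f: json.dump(strip_outputs(nb), f, indent=1)
```

(The real tool also clears widget state and selected metadata; this sketch covers only the two fields responsible for most diff noise.)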
Validation and Linting
```bash
# Validate notebook format against the nbformat schema
python -c "import nbformat; nbformat.validate(nbformat.read('nb.ipynb', as_version=4))"

# Lint code cells
pip install nbqa
nbqa ruff notebook.ipynb    # run ruff linter on code cells
nbqa black notebook.ipynb   # format code cells with black
nbqa mypy notebook.ipynb    # type checking
nbqa isort notebook.ipynb   # sort imports
```
Common Use Cases
- Data science: Exploratory data analysis, feature engineering, visualization.
- Machine learning: Model training experiments, hyperparameter tuning, evaluation.
- Scientific research: Reproducible research papers, simulation analysis.
- Education: Interactive tutorials, course materials, coding exercises.
- Reporting: Automated reports with live data (papermill + nbconvert).
- Documentation: Runnable API examples, library tutorials.
- Prototyping: Quick iteration on algorithms before productionizing.
Pros & Cons
Pros
- Combines code, output, and documentation in a single living document.
- Rich output — inline plots, HTML tables, interactive widgets, images.
- Language-agnostic — kernels for Python, R, Julia, Scala, JavaScript, and more.
- Excellent for exploratory work — run cells interactively, iterate fast.
- Cloud-hosted options — Colab, Kaggle, SageMaker provide free compute.
- Widely adopted in data science and ML communities.
- Extensible — widgets (ipywidgets), magic commands, custom output.
Cons
- Version control is painful — JSON with embedded base64 images creates noisy diffs.
- Hidden state — cell execution order can create unreproducible results.
- Not suitable for production code — no modularity, no testing framework.
- Large file sizes when outputs (images, dataframes) are included.
- Merge conflicts are nearly impossible to resolve manually.
- Encourages poor software engineering practices (global state, no functions).
- Security risk — notebooks can contain and execute arbitrary code.
- Collaboration is difficult without specialized tools (Google Colab, ReviewNB).
Compatibility
| Interface | Description |
|---|---|
| Jupyter Notebook | Classic web interface |
| JupyterLab | Modern IDE-like interface |
| VS Code | Full notebook support via extension |
| Google Colab | Free cloud notebooks with GPU |
| Kaggle Kernels | Competition-focused cloud notebooks |
| AWS SageMaker | Enterprise ML notebook environment |
| Databricks | Spark-native notebook environment |
| Deepnote | Collaborative cloud notebooks |
| Observable | JavaScript notebooks (different format) |
| nteract | Desktop notebook application |
MIME type: application/x-ipynb+json. File extension: .ipynb.
Related Formats
- R Markdown (.Rmd): R ecosystem equivalent — text-based, better for version control.
- Quarto (.qmd): Next-gen publishing format — supports Python, R, Julia notebooks.
- Jupytext (.py, .md): Text-based representations that sync with .ipynb.
- Mathematica (.nb): Wolfram's proprietary notebook format.
- MATLAB Live Scripts (.mlx): MATLAB's notebook-like format.
- Observable (.ojs): JavaScript reactive notebook format.
- Marimo (.py): Python notebooks stored as pure Python scripts — reproducible by design.
Practical Usage
- Strip outputs before committing: Install `nbstripout` and configure it as a git filter (`nbstripout --install --attributes .gitattributes`) to automatically remove outputs from notebooks before every commit. This eliminates noisy diffs from embedded images and data.
- Parameterized execution with papermill: Use `papermill input.ipynb output.ipynb -p date 2025-03-15 -p threshold 0.95` to execute notebooks with different parameters. This enables automated reporting pipelines and batch experimentation.
- Jupytext for version control: Pair `.ipynb` files with `.py` (percent format) using jupytext. Version control the `.py` file for clean diffs, and regenerate the `.ipynb` with outputs as needed.
- Lint and format code cells: Use `nbqa black notebook.ipynb` and `nbqa ruff notebook.ipynb` to apply standard Python formatting and linting to code cells without affecting markdown cells.
- Convert to production code: Once exploratory work is complete, extract code cells into proper Python modules with functions, classes, and tests. Use `jupyter nbconvert --to script` as a starting point, then refactor into a package structure.
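Papermill's central mechanism is simple: it finds the cell tagged `parameters` and injects a cell of overrides right after it, so defaults stay in the notebook and overrides win at run time. A minimal sketch of that behaviour (the cell structure mirrors nbformat, but this is an illustration, not papermill's actual code):

```python
def inject_parameters(nb, **params):
    """Sketch of papermill's injection step: insert a cell of parameter
    overrides after the cell tagged 'parameters' (or at the top)."""
    injected = {
        "id": "injected-parameters",
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
        "outputs": [],
    }
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, injected)
            break
    else:
        nb["cells"].insert(0, injected)  # no tagged cell: put overrides first
    return nb
```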
Anti-Patterns
- Running cells out of order and relying on hidden state: Notebooks maintain kernel state across cells. Running cells out of order or re-running individual cells creates state that cannot be reproduced by running the notebook top-to-bottom. Always restart and run all before sharing.
- Committing notebooks with large embedded outputs: Base64-encoded images, large DataFrame HTML tables, and widget state in outputs create massive, unreadable diffs. Strip outputs before committing or use nbstripout.
- Using notebooks as production code: Notebooks lack modularity, are difficult to test, and encourage global state. Extract reusable logic into Python packages and use notebooks only for exploration, prototyping, and presentation.
- Storing credentials in notebook cells: Code cells with API keys, passwords, or tokens persist in the JSON file and version history. Use environment variables or credential files loaded at runtime, never hardcoded values.
- Ignoring the kernel specification: Notebooks are tied to a specific kernel. Sharing a notebook that specifies `python3.9` with a recipient who has only `python3.11` causes kernel-not-found errors. Document the required environment or ship a `requirements.txt`/`environment.yml` alongside the notebook.
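The first anti-pattern is cheap to guard against in CI: after a "restart and run all", code cells' execution counts are exactly 1..N top to bottom, so any other pattern signals out-of-order or partial execution. A small sketch of such a check (a hypothetical helper, not part of any Jupyter tool):

```python
def ran_top_to_bottom(nb):
    """Return True if the notebook's code cells were executed strictly
    in order with no cell skipped or re-run (counts are exactly 1..N)."""
    counts = [cell["execution_count"]
              for cell in nb["cells"]
              if cell["cell_type"] == "code" and cell.get("execution_count")]
    return counts == list(range(1, len(counts) + 1))
```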