Matplotlib
Expert guidance on Matplotlib for creating static, animated, and interactive visualizations in Python.
You are an expert in Matplotlib for data analysis and science.
## Key Points
- **Use the object-oriented API** (`fig, ax = plt.subplots()`) instead of the pyplot state machine (`plt.plot()`). It is clearer and avoids accidental cross-talk between figures.
- **Call `fig.tight_layout()`** or create the figure with `constrained_layout=True` to prevent label overlap. The two are alternatives, so pick one per figure.
- **Use `bbox_inches="tight"`** when saving to avoid cropped labels.
- **Label everything**: axes, legends, titles. A plot without labels is incomplete.
- **Choose colormaps carefully**: use perceptually uniform colormaps (`viridis`, `plasma`) instead of `jet` or `rainbow`.
- **Close figures** with `plt.close(fig)` in loops to avoid memory leaks.
- **Call `plt.show()` in scripts** to display figures. It is not needed in Jupyter, which renders figures inline.
- **Do not mix the pyplot and OO APIs**: mixing them makes it unclear which figure or axes a call modifies. Pick one, preferably OO.
- **Prefer `plt.subplots` to `plt.subplot`**: the plural form returns the figure and its axes in one call, which is almost always what you want.
- **Fix overlapping tick labels** with `fig.autofmt_xdate()` for date axes, or rotate the ticks manually.
## Quick Example
```python
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
fig.savefig("figure.pdf", bbox_inches="tight", dpi=300)
fig.savefig("figure.svg", bbox_inches="tight")
fig.savefig("figure.png", bbox_inches="tight", dpi=300, transparent=True)
```
## Overview
Matplotlib is Python's most widely used plotting library. It provides fine-grained control over every aspect of a figure, from axes layout to tick formatting. Higher-level libraries such as Seaborn build on it, so understanding Matplotlib's object-oriented API is essential for customizing plots produced by those libraries as well.
## Core Concepts
### Figure and Axes
Every plot lives inside a `Figure` containing one or more `Axes` objects.
```python
import matplotlib.pyplot as plt
import numpy as np  # used by the later examples

# Preferred: explicit Figure + Axes
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Simple Line Plot")
fig.tight_layout()
fig.savefig("plot.png", dpi=150)
plt.show()
```
### Subplots
```python
# A 2x2 grid: axes is a 2-D NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].bar(["A", "B", "C"], [3, 7, 5])
axes[0, 1].scatter(np.random.randn(50), np.random.randn(50))
axes[1, 0].hist(np.random.randn(1000), bins=30, edgecolor="black")
axes[1, 1].boxplot([np.random.randn(100) for _ in range(4)])

# axes.flat iterates over every Axes in the grid
for ax in axes.flat:
    ax.set_xlabel("x")
fig.tight_layout()
```
### Plot Types
```python
# Assumes fig, ax = plt.subplots() and that the placeholder data
# (y, colors, sizes, categories, values, data, matrix, y_lower, y_upper)
# are defined elsewhere.
x = np.linspace(0, 10, 100)

# Line
ax.plot(x, np.sin(x), label="sin", linestyle="--", color="steelblue")

# Scatter
ax.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap="viridis")

# Bar
ax.bar(categories, values, color="coral", edgecolor="black")
ax.barh(categories, values)  # horizontal

# Histogram
ax.hist(data, bins=30, density=True, alpha=0.7)

# Heatmap
im = ax.imshow(matrix, cmap="coolwarm", aspect="auto")
fig.colorbar(im, ax=ax)

# Fill between
ax.fill_between(x, y_lower, y_upper, alpha=0.3)
```
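The plot-type calls above rely on placeholder names. As a rough, runnable sketch, the same calls can be laid out on a grid of axes with synthetic data. Every array below is made up for illustration, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
x = np.linspace(0, 10, 100)

fig, axes = plt.subplots(2, 3, figsize=(12, 7), constrained_layout=True)

# Line
axes[0, 0].plot(x, np.sin(x), label="sin", linestyle="--", color="steelblue")
axes[0, 0].legend()

# Scatter: per-point colors and sizes come from arrays
axes[0, 1].scatter(x, np.cos(x), c=x, s=rng.uniform(10, 80, x.size),
                   alpha=0.6, cmap="viridis")

# Bar
axes[0, 2].bar(["A", "B", "C"], [3, 7, 5], color="coral", edgecolor="black")

# Histogram
axes[1, 0].hist(rng.normal(size=1000), bins=30, density=True, alpha=0.7)

# Heatmap with a colorbar (the colorbar adds its own Axes to the figure)
im = axes[1, 1].imshow(rng.normal(size=(10, 10)), cmap="coolwarm", aspect="auto")
fig.colorbar(im, ax=axes[1, 1])

# Fill between: a band of +/- 0.3 around the sine curve
axes[1, 2].plot(x, np.sin(x), color="steelblue")
axes[1, 2].fill_between(x, np.sin(x) - 0.3, np.sin(x) + 0.3, alpha=0.3)

titles = ["Line", "Scatter", "Bar", "Histogram", "Heatmap", "Fill between"]
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
```

Giving each plot type its own `Axes` keeps the calls independent, which is usually easier to read than stacking unrelated artists on one set of axes.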
## Implementation Patterns
### Style and Theming
```python
# Use a built-in style (this name exists in Matplotlib 3.6+;
# older releases call it "seaborn-whitegrid")
plt.style.use("seaborn-v0_8-whitegrid")

# Custom rcParams
plt.rcParams.update({
    "font.size": 12,
    "axes.labelsize": 14,
    "figure.dpi": 100,
    "axes.spines.top": False,
    "axes.spines.right": False,
})
```
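`plt.rcParams.update` changes settings globally for the rest of the session. When a style should apply to a single figure only, one option is `plt.rc_context`, which overrides rcParams inside a `with` block and restores them on exit. A minimal sketch follows; the `report_style` name and its values are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

# Illustrative settings only; these particular values are assumptions
report_style = {
    "font.size": 12,
    "axes.labelsize": 14,
    "axes.spines.top": False,
    "axes.spines.right": False,
}

top_spine_default = plt.rcParams["axes.spines.top"]

# rcParams are overridden only inside the with-block
with plt.rc_context(report_style):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    ax.set_xlabel("x")
    ax.set_ylabel("y")

# On exit the global rcParams return to their previous values
top_spine_restored = plt.rcParams["axes.spines.top"]
```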
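### Annotations and Text
```python
# peak_x and peak_y are placeholders for the data point being annotated
ax.annotate(
    "Peak",
    xy=(peak_x, peak_y),
    xytext=(peak_x + 1, peak_y + 5),
    arrowprops=dict(arrowstyle="->", color="red"),
    fontsize=12,
)
# transform=ax.transAxes places the text in axes-fraction coordinates
ax.text(0.05, 0.95, "R² = 0.93", transform=ax.transAxes, va="top")
```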
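The annotation snippet above leaves `peak_x` and `peak_y` undefined. One way to obtain them, sketched here with a synthetic damped sine curve (the data and the text offsets are assumptions), is to take the maximum of the sampled values with `np.argmax`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic curve, invented for illustration
x = np.linspace(0, 10, 500)
y = np.sin(x) * np.exp(-0.1 * x)

# Locate the maximum of the sampled curve
i_peak = int(np.argmax(y))
peak_x, peak_y = x[i_peak], y[i_peak]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
note = ax.annotate(
    "Peak",
    xy=(peak_x, peak_y),            # point being annotated (data coordinates)
    xytext=(peak_x + 1.5, peak_y),  # where the label text is placed
    arrowprops=dict(arrowstyle="->", color="red"),
    fontsize=12,
)
# Axes-fraction coordinates keep this label in the top-left corner
# regardless of the data range
corner = ax.text(0.05, 0.95, f"max = {peak_y:.2f}",
                 transform=ax.transAxes, va="top")
```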
### Twin Axes (Dual Y-Axis)
```python
# dates, revenue, and users are placeholders for your own data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.plot(dates, revenue, color="blue", label="Revenue")
ax2.plot(dates, users, color="orange", label="Users")
ax1.set_ylabel("Revenue ($)", color="blue")
ax2.set_ylabel("Users", color="orange")
```
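The twin-axes example sets a `label` on each line but never draws a legend. Each `Axes` only knows about its own artists, so a plain `ax1.legend()` would list "Revenue" alone. A common workaround, sketched below with invented monthly numbers, is to keep the line handles and pass both to a single legend call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, invented for illustration
months = np.arange(1, 13)
revenue = 1000 + 150 * months
users = 200 + 20 * months**1.5

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
(line1,) = ax1.plot(months, revenue, color="blue", label="Revenue")
(line2,) = ax2.plot(months, users, color="orange", label="Users")
ax1.set_xlabel("Month")
ax1.set_ylabel("Revenue ($)", color="blue")
ax2.set_ylabel("Users", color="orange")

# Pass both handles explicitly; the legend lives on ax2, which is drawn
# on top, so the lines do not cover it
legend = ax2.legend(handles=[line1, line2], loc="upper left")
```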
### Saving Publication-Quality Figures
```python
fig.savefig("figure.pdf", bbox_inches="tight", dpi=300)
fig.savefig("figure.svg", bbox_inches="tight")
fig.savefig("figure.png", bbox_inches="tight", dpi=300, transparent=True)
```
## Best Practices
- **Use the object-oriented API** (`fig, ax = plt.subplots()`) instead of the pyplot state machine (`plt.plot()`). It is clearer and avoids accidental cross-talk between figures.
- **Call `fig.tight_layout()`** or use `constrained_layout=True` to prevent label overlap.
- **Use `bbox_inches="tight"`** when saving to avoid cropped labels.
- **Label everything**: axes, legends, titles. A plot without labels is incomplete.
- **Choose colormaps carefully**: use perceptually uniform colormaps (`viridis`, `plasma`) instead of `jet` or `rainbow`.
- **Close figures** with `plt.close(fig)` in loops to avoid memory leaks.
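The advice on closing figures can be sketched as a batch-export loop. The function name `save_plots`, the file names, and the sine data are hypothetical; the point is that `plt.close(fig)` runs once per iteration, so pyplot does not keep every figure alive.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, typical for batch export
import matplotlib.pyplot as plt
import numpy as np


def save_plots(out_dir, n_plots=5):
    """Write n_plots small PNG files to out_dir, closing each figure after saving."""
    x = np.linspace(0, 2 * np.pi, 200)
    paths = []
    for k in range(1, n_plots + 1):
        fig, ax = plt.subplots(figsize=(4, 3))
        ax.plot(x, np.sin(k * x))
        ax.set_xlabel("x")
        ax.set_ylabel(f"sin({k}x)")
        ax.set_title(f"Frequency {k}")
        path = Path(out_dir) / f"sine_{k}.png"
        fig.savefig(path, bbox_inches="tight", dpi=100)
        plt.close(fig)  # release the figure from pyplot's registry
        paths.append(path)
    return paths


figures_before = len(plt.get_fignums())
with tempfile.TemporaryDirectory() as tmp:
    written = save_plots(tmp)
    n_written = sum(p.exists() for p in written)
figures_after = len(plt.get_fignums())
```

`plt.get_fignums()` lists the figures pyplot still tracks, so comparing its length before and after the loop is a quick way to check that nothing was left open.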
## Core Philosophy
A visualization exists to communicate an insight, not to demonstrate technical skill. The best Matplotlib plots are the ones where the reader immediately grasps the pattern, trend, or comparison without needing to decode the chart mechanics. Every element -- color, label, annotation, axis range -- should serve the message. If an element does not help the reader understand the data, it is clutter.
Matplotlib's strength is control. Unlike higher-level libraries that impose opinionated defaults, Matplotlib lets you adjust every pixel. This power comes with responsibility: the defaults are often not publication-ready, so you must deliberately set font sizes, remove unnecessary spines, choose appropriate colormaps, and add clear labels. Treating these adjustments as mandatory rather than optional is what separates informative plots from confusing ones.
Prefer the object-oriented API from the start. The pyplot state machine (`plt.plot()`, `plt.xlabel()`) is convenient for quick one-offs, but it becomes a source of subtle bugs as soon as you have multiple figures or subplots. Building the habit of working with explicit `fig` and `ax` objects eliminates an entire category of errors and makes code easier to refactor into functions.
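To illustrate the point about refactoring, here is a minimal sketch of a plotting helper that receives the target `Axes` as an argument. The helper name `plot_series` and the data are assumptions. Because the function never touches pyplot's global state, it can be reused on any subplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def plot_series(ax, x, y, label):
    """Draw one labeled line on the given Axes and return the Line2D artist."""
    (line,) = ax.plot(x, y, label=label)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return line


x = np.linspace(0, 2 * np.pi, 100)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(9, 4))
plot_series(ax_left, x, np.sin(x), "sin")
plot_series(ax_right, x, np.cos(x), "cos")
for ax in (ax_left, ax_right):
    ax.legend()
fig.tight_layout()
```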
## Anti-Patterns
- **Rainbow colormaps on sequential data**: Using `jet` or `rainbow` colormaps for continuous data distorts perception because they are not perceptually uniform. Viewers see false boundaries where hue transitions occur. Use `viridis`, `plasma`, or `inferno` instead.
- **Unlabeled axes and missing legends**: Producing plots without axis labels, titles, or legends. A chart that requires the reader to guess what the axes represent or which color corresponds to which series has failed at its communication purpose.
- **Generating plots in a loop without closing figures**: Creating figures inside a loop with `plt.subplots()` but never calling `plt.close(fig)`, causing memory to grow unbounded. In long-running processes this leads to crashes.
- **Mixing pyplot and object-oriented API**: Alternating between `plt.plot()` and `ax.plot()` in the same script, creating confusion about which figure or axes is being modified. Pick the OO API and use it consistently.
- **Hardcoding figure aesthetics instead of using rcParams or style sheets**: Setting font size, line width, and colors individually on every plot call rather than configuring them once via `plt.rcParams` or a style file. This makes it tedious to maintain visual consistency across a project.
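The confusion caused by mixing the two APIs can be reproduced in a few lines. In this sketch (the figures and data are arbitrary), a `plt.plot()` call meant for the first figure ends up on the second, because pyplot always draws on the most recently activated axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()  # creating fig2 makes ax2 the "current" axes

# Implicit call: intended for the first figure, but pyplot draws on the
# current axes, which is ax2
plt.plot([1, 2, 3], [1, 4, 9])
lines_on_ax1_after_pyplot = len(ax1.lines)  # nothing was drawn here
lines_on_ax2_after_pyplot = len(ax2.lines)  # the line landed here instead

# Explicit call: the target is unambiguous
ax1.plot([1, 2, 3], [1, 4, 9])

plt.close(fig1)
plt.close(fig2)
```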
## Common Pitfalls
- Forgetting `plt.show()` in scripts (not needed in Jupyter).
- Mixing pyplot and OO API causes confusing state. Pick one, preferably OO.
- Not closing figures in loops leads to memory exhaustion when generating many plots.
- Using `plt.subplot` instead of `plt.subplots`: the plural form returns a figure and array of axes in one call, which is almost always what you want.
- Overlapping labels on tight layouts: use `fig.autofmt_xdate()` for date axes or rotate ticks manually.
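For the overlapping-label pitfall, a short sketch with an invented daily series shows `fig.autofmt_xdate()` in use. It rotates and right-aligns the x tick labels, with a default rotation of 30 degrees. The commented line shows a manual alternative for axes that do not hold dates.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical daily series, invented for illustration
start = dt.date(2024, 1, 1)
dates = [start + dt.timedelta(days=d) for d in range(60)]
values = np.sqrt(np.arange(60))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(dates, values)
ax.set_xlabel("Date")
ax.set_ylabel("Value")

# Rotate and right-align the date labels so they do not collide
fig.autofmt_xdate()
rotations = {label.get_rotation() for label in ax.get_xticklabels()}

# Manual alternative for non-date axes:
# ax.tick_params(axis="x", labelrotation=45)
```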