Weather and Climate Prediction
Weather and climate prediction represents one of humanity's most successful forecasting endeavors, blending physics-based numerical models, ensemble methods, machine learning, and massive observational networks. Modern weather forecasts are accurate up to 10-15 days, while climate projections extend decades. This skill covers numerical weather prediction (NWP) basics, ensemble weather models, climate modeling fundamentals, extreme event forecasting, uncertainty quantification, and skill scores for evaluating prediction quality. ## Key Points - Sensitive to initial conditions - Predictable for ~10-15 days - Question: "Will it rain next Tuesday?" - Sensitive to boundary conditions (CO2, solar, land use) - Predictable for decades (statistical properties) - Question: "Will summers be hotter in 2050?" 1. Numerical Weather Prediction solves fundamental physics equations on a 3D grid; accuracy depends critically on initial conditions (data assimilation) 2. Ensemble forecasting is essential because weather is chaotic; running 50+ perturbed forecasts quantifies forecast uncertainty 3. Weather is an initial value problem (sensitive to starting conditions), while climate is a boundary value problem (sensitive to forcings like CO2) 4. The Anomaly Correlation Coefficient (ACC) is the standard NWP skill metric; forecasts are useful when ACC > 0.6 (typically out to 10-15 days) 5. CRPS is the proper scoring rule for probabilistic continuous forecasts, generalizing the Brier Score beyond binary outcomes 6. Extreme value analysis (GEV distributions) estimates return periods for unprecedented events by fitting distributions to historical extremes ## Quick Example ``` Momentum: dv/dt = -∇p/ρ - 2Ω×v + g + F_friction Mass: ∂ρ/∂t + ∇·(ρv) = 0 Energy: dT/dt = Q/cp + (RT/cpp)·dp/dt Moisture: dq/dt = E - C (evaporation minus condensation) ```
skilldb get prediction-skills/weather-and-climate-predictionFull skill: 628 linesWeather and Climate Prediction
Overview
Weather and climate prediction represents one of humanity's most successful forecasting endeavors, blending physics-based numerical models, ensemble methods, machine learning, and massive observational networks. Modern weather forecasts are accurate up to 10-15 days, while climate projections extend decades. This skill covers numerical weather prediction (NWP) basics, ensemble weather models, climate modeling fundamentals, extreme event forecasting, uncertainty quantification, and skill scores for evaluating prediction quality.
Numerical Weather Prediction Basics
The Physics
Weather prediction solves the primitive equations of atmospheric dynamics — conservation of momentum, mass, energy, and moisture on a rotating sphere:
Momentum: dv/dt = -∇p/ρ - 2Ω×v + g + F_friction
Mass: ∂ρ/∂t + ∇·(ρv) = 0
Energy: dT/dt = Q/cp + (RT/cpp)·dp/dt
Moisture: dq/dt = E - C (evaporation minus condensation)
These are solved on a 3D grid covering the entire atmosphere.
import numpy as np
class SimpleAtmosphericModel:
"""
Highly simplified atmospheric model to illustrate NWP concepts.
Real models (GFS, ECMWF IFS) use spectral methods on grids
with ~9km horizontal resolution and 137 vertical levels.
"""
def __init__(self, grid_size: int = 100, n_levels: int = 10):
self.nx = grid_size
self.ny = grid_size
self.nz = n_levels
self.dx = 100e3 # 100 km grid spacing
# State variables
self.temperature = np.full((self.nx, self.ny, self.nz), 288.0) # Kelvin
self.pressure = np.full((self.nx, self.ny, self.nz), 101325.0) # Pa
self.u_wind = np.zeros((self.nx, self.ny, self.nz)) # m/s east-west
self.v_wind = np.zeros((self.nx, self.ny, self.nz)) # m/s north-south
self.humidity = np.full((self.nx, self.ny, self.nz), 0.01) # kg/kg
# Constants
self.R = 287.05 # Gas constant for dry air
self.cp = 1004.0 # Specific heat at constant pressure
self.g = 9.81 # Gravitational acceleration
self.omega = 7.292e-5 # Earth rotation rate
def advection(self, field: np.ndarray, u: np.ndarray,
v: np.ndarray, dt: float) -> np.ndarray:
"""
Simple upstream advection: transport a field by wind.
dF/dt = -u·∂F/∂x - v·∂F/∂y
"""
dFdx = np.gradient(field, self.dx, axis=0)
dFdy = np.gradient(field, self.dx, axis=1)
return field - dt * (u * dFdx + v * dFdy)
def pressure_gradient_force(self) -> tuple:
"""
Wind is driven by pressure gradients.
F = -(1/ρ)·∇p
"""
dpdx = np.gradient(self.pressure, self.dx, axis=0)
dpdy = np.gradient(self.pressure, self.dx, axis=1)
rho = self.pressure / (self.R * self.temperature)
ax = -dpdx / rho
ay = -dpdy / rho
return ax, ay
def coriolis_force(self, lat: float = 45.0) -> tuple:
"""
Coriolis effect: deflects moving air on rotating Earth.
f = 2Ω·sin(lat)
"""
f = 2 * self.omega * np.sin(np.radians(lat))
ax = f * self.v_wind
ay = -f * self.u_wind
return ax, ay
def step(self, dt: float = 600):
"""Advance the model by one time step (default 10 minutes)."""
# Pressure gradient force
pgf_x, pgf_y = self.pressure_gradient_force()
# Coriolis
cor_x, cor_y = self.coriolis_force()
# Update winds
self.u_wind += dt * (pgf_x + cor_x)
self.v_wind += dt * (pgf_y + cor_y)
# Advect temperature and moisture
self.temperature = self.advection(self.temperature, self.u_wind, self.v_wind, dt)
self.humidity = self.advection(self.humidity, self.u_wind, self.v_wind, dt)
# Simple radiation (cooling/heating)
solar_heating = 2.0 / 86400 # ~2K/day heating
ir_cooling = 1.5 / 86400 # ~1.5K/day cooling
self.temperature += dt * (solar_heating - ir_cooling)
def forecast(self, hours: int = 48, dt: float = 600) -> dict:
"""Run a forecast for the specified number of hours."""
n_steps = int(hours * 3600 / dt)
snapshots = []
for step in range(n_steps):
self.step(dt)
if step % (3600 / dt) == 0: # Hourly snapshots
snapshots.append({
'hour': step * dt / 3600,
'temperature': self.temperature[:, :, 0].copy(),
'pressure': self.pressure[:, :, 0].copy(),
'wind_speed': np.sqrt(self.u_wind[:, :, 0]**2 + self.v_wind[:, :, 0]**2).copy()
})
return {
'forecast_hours': hours,
'n_snapshots': len(snapshots),
'snapshots': snapshots
}
Data Assimilation
The critical step between observations and forecasts: combining observations with a prior model state to create the best possible initial conditions.
class DataAssimilation:
"""
Simplified data assimilation: combine observations with model background.
Real systems use 4D-Var or Ensemble Kalman Filter.
"""
def __init__(self):
self.observations = []
def add_observation(self, location: tuple, variable: str,
value: float, error: float, time: float):
self.observations.append({
'location': location,
'variable': variable,
'value': value,
'error_variance': error**2,
'time': time
})
def optimal_interpolation(self, background: np.ndarray,
background_error: float,
grid_locations: np.ndarray) -> np.ndarray:
"""
Optimal Interpolation (OI): the simplest data assimilation method.
Analysis = Background + K × (Observation - H(Background))
K = Kalman gain = B / (B + R)
"""
analysis = background.copy()
for obs in self.observations:
# Find nearest grid point
distances = np.sqrt(
(grid_locations[:, 0] - obs['location'][0])**2 +
(grid_locations[:, 1] - obs['location'][1])**2
)
nearest_idx = np.argmin(distances)
# Observation increment
innovation = obs['value'] - background[nearest_idx]
# Kalman gain
B = background_error**2
R = obs['error_variance']
K = B / (B + R)
# Update analysis with spatial correlation weighting
correlation_length = 200e3 # 200 km
weights = np.exp(-distances**2 / (2 * correlation_length**2))
analysis += K * innovation * weights
return analysis
def ensemble_kalman_filter(self, ensemble: np.ndarray,
observations: list) -> np.ndarray:
"""
Ensemble Kalman Filter (EnKF): use ensemble spread to estimate
background error covariance. The standard for modern NWP.
ensemble: (n_members, n_gridpoints)
"""
n_members, n_grid = ensemble.shape
ensemble_mean = np.mean(ensemble, axis=0)
# Ensemble perturbation matrix
A = ensemble - ensemble_mean # (n_members, n_grid)
# Background error covariance (estimated from ensemble)
P_b = A.T @ A / (n_members - 1)
for obs in observations:
nearest_idx = obs['grid_index']
H = np.zeros(n_grid)
H[nearest_idx] = 1 # Observation operator
# Innovation
y_obs = obs['value']
R = obs['error_variance']
# Kalman gain
HP = H @ P_b
HPH_R = HP @ H + R
K = P_b @ H / HPH_R
# Update each ensemble member
for m in range(n_members):
obs_perturbed = y_obs + np.random.normal(0, np.sqrt(R))
innovation = obs_perturbed - H @ ensemble[m]
ensemble[m] += K * innovation
return ensemble
Ensemble Weather Models
Why Ensembles Are Essential for Weather
Weather is a chaotic system: tiny errors in initial conditions grow exponentially. Ensemble forecasting runs the model many times with slightly different starting conditions to quantify forecast uncertainty.
class EnsembleWeatherForecast:
"""
Ensemble weather prediction system.
Run N perturbed forecasts to capture uncertainty.
"""
def __init__(self, n_members: int = 50):
self.n_members = n_members
self.members = []
def generate_initial_perturbations(self, analysis: np.ndarray,
analysis_error: np.ndarray) -> list:
"""
Generate perturbed initial conditions for ensemble members.
Methods: Bred vectors, Singular vectors, or EnKF perturbations.
"""
perturbations = []
for _ in range(self.n_members):
perturbation = np.random.normal(0, analysis_error)
perturbed_state = analysis + perturbation
perturbations.append(perturbed_state)
return perturbations
def run_ensemble(self, model, initial_states: list,
forecast_hours: int = 240) -> dict:
"""Run all ensemble members."""
forecasts = []
for i, initial_state in enumerate(initial_states):
model.reset(initial_state)
forecast = model.forecast(hours=forecast_hours)
forecasts.append(forecast)
return self._analyze_ensemble(forecasts)
def _analyze_ensemble(self, forecasts: list) -> dict:
"""Analyze ensemble spread and compute probabilistic forecasts."""
n_times = len(forecasts[0]['snapshots'])
analysis = {
'ensemble_mean': [],
'ensemble_spread': [],
'probability_of_rain': [],
'spaghetti_data': []
}
for t in range(n_times):
temps = np.array([f['snapshots'][t]['temperature'] for f in forecasts])
analysis['ensemble_mean'].append(np.mean(temps, axis=0))
analysis['ensemble_spread'].append(np.std(temps, axis=0))
return analysis
def probability_forecast(self, ensemble_values: np.ndarray,
threshold: float) -> float:
"""
Probability that a variable exceeds a threshold.
P(X > threshold) = fraction of ensemble members exceeding it.
"""
exceedances = np.sum(ensemble_values > threshold)
return exceedances / len(ensemble_values)
def quantile_forecast(self, ensemble_values: np.ndarray,
quantiles: list = None) -> dict:
"""Compute quantile forecasts from ensemble."""
if quantiles is None:
quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
return {
f'q{int(q*100)}': np.percentile(ensemble_values, q * 100)
for q in quantiles
}
Climate Modeling Fundamentals
The Difference Between Weather and Climate
Weather: Initial value problem
- Sensitive to initial conditions
- Predictable for ~10-15 days
- Question: "Will it rain next Tuesday?"
Climate: Boundary value problem
- Sensitive to boundary conditions (CO2, solar, land use)
- Predictable for decades (statistical properties)
- Question: "Will summers be hotter in 2050?"
class SimpleClimateModel:
"""
Energy balance climate model.
Illustrates the fundamental physics of climate prediction.
"""
def __init__(self):
self.solar_constant = 1361 # W/m² (incoming solar)
self.albedo = 0.30 # Fraction reflected
self.sigma = 5.67e-8 # Stefan-Boltzmann constant
self.heat_capacity = 4.2e8 # J/m²/K (ocean mixed layer)
def equilibrium_temperature(self, co2_ppm: float = 280) -> float:
"""
Compute equilibrium surface temperature for given CO2.
Uses logarithmic radiative forcing.
"""
# Incoming solar (absorbed)
S_absorbed = self.solar_constant / 4 * (1 - self.albedo)
# Greenhouse effect: radiative forcing from CO2
# ΔF = 5.35 × ln(C/C₀) W/m²
co2_baseline = 280 # Pre-industrial
delta_F = 5.35 * np.log(co2_ppm / co2_baseline)
# Climate sensitivity parameter
lambda_param = 0.8 # K per W/m² (includes feedbacks)
# Pre-industrial equilibrium temperature
T_0 = ((S_absorbed) / (self.sigma * 0.61)) ** 0.25 # ~288K with greenhouse
# Temperature change
delta_T = lambda_param * delta_F
return T_0 + delta_T
def transient_response(self, co2_trajectory: list,
years: int = 100) -> list:
"""
Simulate transient temperature response to changing CO2.
Includes ocean thermal inertia (lagged response).
"""
T = 288.0 # Starting temperature
results = []
dt = 3.156e7 # One year in seconds
for year in range(years):
co2 = co2_trajectory[year] if year < len(co2_trajectory) else co2_trajectory[-1]
T_eq = self.equilibrium_temperature(co2)
# Ocean inertia: temperature relaxes toward equilibrium
tau = 30 * 3.156e7 # 30-year time constant
dT = (T_eq - T) * dt / tau
T += dT
results.append({
'year': year,
'co2_ppm': co2,
'temperature': T,
'equilibrium_temperature': T_eq,
'warming_from_preindustrial': T - 288.0
})
return results
def scenario_projections(self) -> dict:
"""
Project temperature under different emissions scenarios.
Simplified versions of IPCC SSP scenarios.
"""
scenarios = {}
# SSP1-2.6: Strong mitigation, peak and decline
ssp126 = [280 + (420-280) * min(y/20, 1) - max(0, (y-40)*2) for y in range(100)]
ssp126 = [max(280, c) for c in ssp126]
scenarios['SSP1-2.6'] = {
'description': 'Strong mitigation, net-zero by 2070',
'projections': self.transient_response(ssp126)
}
# SSP2-4.5: Middle of the road
ssp245 = [280 + (600-280) * min(y/80, 1) for y in range(100)]
scenarios['SSP2-4.5'] = {
'description': 'Middle of the road, moderate action',
'projections': self.transient_response(ssp245)
}
# SSP5-8.5: Fossil-fueled development
ssp585 = [280 + y * 8 for y in range(100)]
scenarios['SSP5-8.5'] = {
'description': 'High emissions, minimal policy',
'projections': self.transient_response(ssp585)
}
return scenarios
Extreme Event Forecasting
class ExtremeEventForecaster:
"""Forecast and characterize extreme weather events."""
def __init__(self):
self.historical_extremes = []
def return_period_analysis(self, annual_maxima: np.ndarray) -> dict:
"""
Extreme value analysis using Generalized Extreme Value distribution.
Estimate return periods for extreme events.
"""
from scipy.stats import genextreme
# Fit GEV distribution
shape, loc, scale = genextreme.fit(annual_maxima)
return_periods = [2, 5, 10, 25, 50, 100, 500]
return_levels = {}
for rp in return_periods:
# Return level = value expected to be exceeded once in rp years
prob = 1 - 1/rp
level = genextreme.ppf(prob, shape, loc, scale)
return_levels[f'{rp}_year'] = level
return {
'gev_shape': shape,
'gev_location': loc,
'gev_scale': scale,
'return_levels': return_levels,
'distribution_type': (
'Frechet (heavy tail)' if shape > 0
else 'Weibull (bounded)' if shape < 0
else 'Gumbel (exponential tail)'
)
}
def heatwave_probability(self, temperature_data: np.ndarray,
threshold: float, duration: int = 3) -> dict:
"""
Estimate probability of a heatwave (consecutive days above threshold).
"""
above = temperature_data > threshold
consecutive = 0
max_consecutive = 0
heatwave_count = 0
for day in above:
if day:
consecutive += 1
if consecutive == duration:
heatwave_count += 1
else:
max_consecutive = max(max_consecutive, consecutive)
consecutive = 0
n_years = len(temperature_data) / 365
return {
'heatwave_frequency': heatwave_count / n_years,
'annual_probability': 1 - (1 - heatwave_count / n_years),
'longest_streak': max_consecutive,
'days_above_threshold': int(np.sum(above)),
'fraction_above': np.mean(above)
}
def attribution_analysis(self, observed_event: float,
natural_distribution: tuple,
anthropogenic_distribution: tuple) -> dict:
"""
Simplified attribution: compare event probability
in natural vs anthropogenic climate.
"""
from scipy.stats import norm
# Probability in natural climate
p_natural = 1 - norm.cdf(observed_event, *natural_distribution)
# Probability in current climate
p_current = 1 - norm.cdf(observed_event, *anthropogenic_distribution)
# Risk ratio
risk_ratio = p_current / max(p_natural, 1e-10)
# Fraction Attributable Risk
far = 1 - 1/risk_ratio if risk_ratio > 1 else 0
return {
'probability_natural': p_natural,
'probability_current': p_current,
'risk_ratio': risk_ratio,
'fraction_attributable': far,
'interpretation': (
f'Event is {risk_ratio:.1f}x more likely due to climate change. '
f'{far:.0%} of the risk is attributable to human influence.'
)
}
Skill Scores for Weather Prediction
class WeatherSkillScores:
"""Standard verification metrics for weather forecasts."""
@staticmethod
def rmse(forecast: np.ndarray, observed: np.ndarray) -> float:
"""Root Mean Square Error."""
return np.sqrt(np.mean((forecast - observed)**2))
@staticmethod
def mae(forecast: np.ndarray, observed: np.ndarray) -> float:
"""Mean Absolute Error."""
return np.mean(np.abs(forecast - observed))
@staticmethod
def anomaly_correlation(forecast: np.ndarray, observed: np.ndarray,
climatology: np.ndarray) -> float:
"""
Anomaly Correlation Coefficient (ACC).
The standard metric for NWP skill. ACC > 0.6 is considered useful.
"""
f_anom = forecast - climatology
o_anom = observed - climatology
numerator = np.sum(f_anom * o_anom)
denominator = np.sqrt(np.sum(f_anom**2) * np.sum(o_anom**2))
return numerator / denominator if denominator > 0 else 0
@staticmethod
def brier_score_weather(probability_forecasts: np.ndarray,
binary_observations: np.ndarray) -> float:
"""Brier Score for probabilistic weather forecasts (e.g., P(rain))."""
return np.mean((probability_forecasts - binary_observations)**2)
@staticmethod
def crps(ensemble_forecasts: np.ndarray, observation: float) -> float:
"""
Continuous Ranked Probability Score (CRPS).
Generalizes Brier Score to continuous variables.
Lower is better. Compares full probability distribution to observation.
"""
sorted_ensemble = np.sort(ensemble_forecasts)
n = len(sorted_ensemble)
crps_value = 0
for i in range(n):
cumulative_prob = (i + 1) / n
indicator = 1 if observation <= sorted_ensemble[i] else 0
crps_value += (cumulative_prob - indicator)**2
return crps_value / n
@staticmethod
def reliability_diagram(probability_forecasts: np.ndarray,
observations: np.ndarray,
n_bins: int = 10) -> dict:
"""Generate reliability diagram data for probabilistic forecasts."""
bins = np.linspace(0, 1, n_bins + 1)
points = []
for i in range(n_bins):
mask = (probability_forecasts >= bins[i]) & (probability_forecasts < bins[i+1])
if np.sum(mask) > 0:
forecast_mean = np.mean(probability_forecasts[mask])
observed_freq = np.mean(observations[mask])
count = np.sum(mask)
points.append({
'forecast_probability': forecast_mean,
'observed_frequency': observed_freq,
'count': int(count)
})
return {
'reliability_points': points,
'perfect_line': [(x, x) for x in np.linspace(0, 1, 11)]
}
@staticmethod
def spread_skill_ratio(ensemble_spread: float, rmse: float) -> float:
"""
Spread-Skill Ratio: ensemble spread should match RMSE.
Ratio = 1: well-calibrated ensemble
Ratio < 1: overconfident (spread too narrow)
Ratio > 1: underconfident (spread too wide)
"""
return ensemble_spread / rmse if rmse > 0 else float('inf')
Key Takeaways
- Numerical Weather Prediction solves fundamental physics equations on a 3D grid; accuracy depends critically on initial conditions (data assimilation)
- Ensemble forecasting is essential because weather is chaotic; running 50+ perturbed forecasts quantifies forecast uncertainty
- Weather is an initial value problem (sensitive to starting conditions), while climate is a boundary value problem (sensitive to forcings like CO2)
- The Anomaly Correlation Coefficient (ACC) is the standard NWP skill metric; forecasts are useful when ACC > 0.6 (typically out to 10-15 days)
- CRPS is the proper scoring rule for probabilistic continuous forecasts, generalizing the Brier Score beyond binary outcomes
- Extreme value analysis (GEV distributions) estimates return periods for unprecedented events by fitting distributions to historical extremes
- The Spread-Skill Ratio checks ensemble calibration: if spread consistently underestimates RMSE, the ensemble is overconfident
- Climate attribution analysis quantifies how much climate change increases the probability of specific extreme events
Install this skill directly: skilldb add prediction-skills