Skip to main content
Autonomous AgentsPrediction628 lines

Weather and Climate Prediction

Quick Summary27 lines
Weather and climate prediction represents one of humanity's most successful forecasting endeavors, blending physics-based numerical models, ensemble methods, machine learning, and massive observational networks. Modern weather forecasts are accurate up to 10-15 days, while climate projections extend decades. This skill covers numerical weather prediction (NWP) basics, ensemble weather models, climate modeling fundamentals, extreme event forecasting, uncertainty quantification, and skill scores for evaluating prediction quality.

## Key Points

- Sensitive to initial conditions
- Predictable for ~10-15 days
- Question: "Will it rain next Tuesday?"
- Sensitive to boundary conditions (CO2, solar, land use)
- Predictable for decades (statistical properties)
- Question: "Will summers be hotter in 2050?"
1. Numerical Weather Prediction solves fundamental physics equations on a 3D grid; accuracy depends critically on initial conditions (data assimilation)
2. Ensemble forecasting is essential because weather is chaotic; running 50+ perturbed forecasts quantifies forecast uncertainty
3. Weather is an initial value problem (sensitive to starting conditions), while climate is a boundary value problem (sensitive to forcings like CO2)
4. The Anomaly Correlation Coefficient (ACC) is the standard NWP skill metric; forecasts are useful when ACC > 0.6 (typically out to 10-15 days)
5. CRPS is the proper scoring rule for probabilistic continuous forecasts, generalizing the Brier Score beyond binary outcomes
6. Extreme value analysis (GEV distributions) estimates return periods for unprecedented events by fitting distributions to historical extremes

## Quick Example

```
Momentum:    dv/dt = -∇p/ρ - 2Ω×v + g + F_friction
Mass:        ∂ρ/∂t + ∇·(ρv) = 0
Energy:      dT/dt = Q/cp + (RT/cpp)·dp/dt
Moisture:    dq/dt = E - C  (evaporation minus condensation)
```
skilldb get prediction-skills/weather-and-climate-predictionFull skill: 628 lines
Paste into your CLAUDE.md or agent config

Weather and Climate Prediction

Overview

Weather and climate prediction represents one of humanity's most successful forecasting endeavors, blending physics-based numerical models, ensemble methods, machine learning, and massive observational networks. Modern weather forecasts are accurate up to 10-15 days, while climate projections extend decades. This skill covers numerical weather prediction (NWP) basics, ensemble weather models, climate modeling fundamentals, extreme event forecasting, uncertainty quantification, and skill scores for evaluating prediction quality.

Numerical Weather Prediction Basics

The Physics

Weather prediction solves the primitive equations of atmospheric dynamics — conservation of momentum, mass, energy, and moisture on a rotating sphere:

Momentum:    dv/dt = -∇p/ρ - 2Ω×v + g + F_friction
Mass:        ∂ρ/∂t + ∇·(ρv) = 0
Energy:      dT/dt = Q/cp + (RT/cpp)·dp/dt
Moisture:    dq/dt = E - C  (evaporation minus condensation)

These are solved on a 3D grid covering the entire atmosphere.

import numpy as np

class SimpleAtmosphericModel:
    """
    Highly simplified atmospheric model to illustrate NWP concepts.
    Real models (GFS, ECMWF IFS) use spectral methods on grids
    with ~9km horizontal resolution and 137 vertical levels.
    """

    def __init__(self, grid_size: int = 100, n_levels: int = 10):
        self.nx = grid_size
        self.ny = grid_size
        self.nz = n_levels
        self.dx = 100e3  # 100 km grid spacing

        # State variables
        self.temperature = np.full((self.nx, self.ny, self.nz), 288.0)  # Kelvin
        self.pressure = np.full((self.nx, self.ny, self.nz), 101325.0)  # Pa
        self.u_wind = np.zeros((self.nx, self.ny, self.nz))  # m/s east-west
        self.v_wind = np.zeros((self.nx, self.ny, self.nz))  # m/s north-south
        self.humidity = np.full((self.nx, self.ny, self.nz), 0.01)  # kg/kg

        # Constants
        self.R = 287.05   # Gas constant for dry air
        self.cp = 1004.0  # Specific heat at constant pressure
        self.g = 9.81     # Gravitational acceleration
        self.omega = 7.292e-5  # Earth rotation rate

    def advection(self, field: np.ndarray, u: np.ndarray,
                  v: np.ndarray, dt: float) -> np.ndarray:
        """
        Simple upstream advection: transport a field by wind.
        dF/dt = -u·∂F/∂x - v·∂F/∂y
        """
        dFdx = np.gradient(field, self.dx, axis=0)
        dFdy = np.gradient(field, self.dx, axis=1)
        return field - dt * (u * dFdx + v * dFdy)

    def pressure_gradient_force(self) -> tuple:
        """
        Wind is driven by pressure gradients.
        F = -(1/ρ)·∇p
        """
        dpdx = np.gradient(self.pressure, self.dx, axis=0)
        dpdy = np.gradient(self.pressure, self.dx, axis=1)
        rho = self.pressure / (self.R * self.temperature)

        ax = -dpdx / rho
        ay = -dpdy / rho
        return ax, ay

    def coriolis_force(self, lat: float = 45.0) -> tuple:
        """
        Coriolis effect: deflects moving air on rotating Earth.
        f = 2Ω·sin(lat)
        """
        f = 2 * self.omega * np.sin(np.radians(lat))
        ax = f * self.v_wind
        ay = -f * self.u_wind
        return ax, ay

    def step(self, dt: float = 600):
        """Advance the model by one time step (default 10 minutes)."""
        # Pressure gradient force
        pgf_x, pgf_y = self.pressure_gradient_force()

        # Coriolis
        cor_x, cor_y = self.coriolis_force()

        # Update winds
        self.u_wind += dt * (pgf_x + cor_x)
        self.v_wind += dt * (pgf_y + cor_y)

        # Advect temperature and moisture
        self.temperature = self.advection(self.temperature, self.u_wind, self.v_wind, dt)
        self.humidity = self.advection(self.humidity, self.u_wind, self.v_wind, dt)

        # Simple radiation (cooling/heating)
        solar_heating = 2.0 / 86400  # ~2K/day heating
        ir_cooling = 1.5 / 86400    # ~1.5K/day cooling
        self.temperature += dt * (solar_heating - ir_cooling)

    def forecast(self, hours: int = 48, dt: float = 600) -> dict:
        """Run a forecast for the specified number of hours."""
        n_steps = int(hours * 3600 / dt)
        snapshots = []

        for step in range(n_steps):
            self.step(dt)
            if step % (3600 / dt) == 0:  # Hourly snapshots
                snapshots.append({
                    'hour': step * dt / 3600,
                    'temperature': self.temperature[:, :, 0].copy(),
                    'pressure': self.pressure[:, :, 0].copy(),
                    'wind_speed': np.sqrt(self.u_wind[:, :, 0]**2 + self.v_wind[:, :, 0]**2).copy()
                })

        return {
            'forecast_hours': hours,
            'n_snapshots': len(snapshots),
            'snapshots': snapshots
        }

Data Assimilation

The critical step between observations and forecasts: combining observations with a prior model state to create the best possible initial conditions.

class DataAssimilation:
    """
    Simplified data assimilation: combine observations with model background.
    Real systems use 4D-Var or Ensemble Kalman Filter.
    """

    def __init__(self):
        self.observations = []

    def add_observation(self, location: tuple, variable: str,
                        value: float, error: float, time: float):
        self.observations.append({
            'location': location,
            'variable': variable,
            'value': value,
            'error_variance': error**2,
            'time': time
        })

    def optimal_interpolation(self, background: np.ndarray,
                               background_error: float,
                               grid_locations: np.ndarray) -> np.ndarray:
        """
        Optimal Interpolation (OI): the simplest data assimilation method.
        Analysis = Background + K × (Observation - H(Background))
        K = Kalman gain = B / (B + R)
        """
        analysis = background.copy()

        for obs in self.observations:
            # Find nearest grid point
            distances = np.sqrt(
                (grid_locations[:, 0] - obs['location'][0])**2 +
                (grid_locations[:, 1] - obs['location'][1])**2
            )
            nearest_idx = np.argmin(distances)

            # Observation increment
            innovation = obs['value'] - background[nearest_idx]

            # Kalman gain
            B = background_error**2
            R = obs['error_variance']
            K = B / (B + R)

            # Update analysis with spatial correlation weighting
            correlation_length = 200e3  # 200 km
            weights = np.exp(-distances**2 / (2 * correlation_length**2))
            analysis += K * innovation * weights

        return analysis

    def ensemble_kalman_filter(self, ensemble: np.ndarray,
                                observations: list) -> np.ndarray:
        """
        Ensemble Kalman Filter (EnKF): use ensemble spread to estimate
        background error covariance. The standard for modern NWP.

        ensemble: (n_members, n_gridpoints)
        """
        n_members, n_grid = ensemble.shape
        ensemble_mean = np.mean(ensemble, axis=0)

        # Ensemble perturbation matrix
        A = ensemble - ensemble_mean  # (n_members, n_grid)

        # Background error covariance (estimated from ensemble)
        P_b = A.T @ A / (n_members - 1)

        for obs in observations:
            nearest_idx = obs['grid_index']
            H = np.zeros(n_grid)
            H[nearest_idx] = 1  # Observation operator

            # Innovation
            y_obs = obs['value']
            R = obs['error_variance']

            # Kalman gain
            HP = H @ P_b
            HPH_R = HP @ H + R
            K = P_b @ H / HPH_R

            # Update each ensemble member
            for m in range(n_members):
                obs_perturbed = y_obs + np.random.normal(0, np.sqrt(R))
                innovation = obs_perturbed - H @ ensemble[m]
                ensemble[m] += K * innovation

        return ensemble

Ensemble Weather Models

Why Ensembles Are Essential for Weather

Weather is a chaotic system: tiny errors in initial conditions grow exponentially. Ensemble forecasting runs the model many times with slightly different starting conditions to quantify forecast uncertainty.

class EnsembleWeatherForecast:
    """
    Ensemble weather prediction system.
    Run N perturbed forecasts to capture uncertainty.
    """

    def __init__(self, n_members: int = 50):
        self.n_members = n_members
        self.members = []

    def generate_initial_perturbations(self, analysis: np.ndarray,
                                        analysis_error: np.ndarray) -> list:
        """
        Generate perturbed initial conditions for ensemble members.
        Methods: Bred vectors, Singular vectors, or EnKF perturbations.
        """
        perturbations = []
        for _ in range(self.n_members):
            perturbation = np.random.normal(0, analysis_error)
            perturbed_state = analysis + perturbation
            perturbations.append(perturbed_state)
        return perturbations

    def run_ensemble(self, model, initial_states: list,
                     forecast_hours: int = 240) -> dict:
        """Run all ensemble members."""
        forecasts = []
        for i, initial_state in enumerate(initial_states):
            model.reset(initial_state)
            forecast = model.forecast(hours=forecast_hours)
            forecasts.append(forecast)

        return self._analyze_ensemble(forecasts)

    def _analyze_ensemble(self, forecasts: list) -> dict:
        """Analyze ensemble spread and compute probabilistic forecasts."""
        n_times = len(forecasts[0]['snapshots'])
        analysis = {
            'ensemble_mean': [],
            'ensemble_spread': [],
            'probability_of_rain': [],
            'spaghetti_data': []
        }

        for t in range(n_times):
            temps = np.array([f['snapshots'][t]['temperature'] for f in forecasts])

            analysis['ensemble_mean'].append(np.mean(temps, axis=0))
            analysis['ensemble_spread'].append(np.std(temps, axis=0))

        return analysis

    def probability_forecast(self, ensemble_values: np.ndarray,
                              threshold: float) -> float:
        """
        Probability that a variable exceeds a threshold.
        P(X > threshold) = fraction of ensemble members exceeding it.
        """
        exceedances = np.sum(ensemble_values > threshold)
        return exceedances / len(ensemble_values)

    def quantile_forecast(self, ensemble_values: np.ndarray,
                           quantiles: list = None) -> dict:
        """Compute quantile forecasts from ensemble."""
        if quantiles is None:
            quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]

        return {
            f'q{int(q*100)}': np.percentile(ensemble_values, q * 100)
            for q in quantiles
        }

Climate Modeling Fundamentals

The Difference Between Weather and Climate

Weather: Initial value problem
  - Sensitive to initial conditions
  - Predictable for ~10-15 days
  - Question: "Will it rain next Tuesday?"

Climate: Boundary value problem
  - Sensitive to boundary conditions (CO2, solar, land use)
  - Predictable for decades (statistical properties)
  - Question: "Will summers be hotter in 2050?"
class SimpleClimateModel:
    """
    Energy balance climate model.
    Illustrates the fundamental physics of climate prediction.
    """

    def __init__(self):
        self.solar_constant = 1361  # W/m² (incoming solar)
        self.albedo = 0.30         # Fraction reflected
        self.sigma = 5.67e-8      # Stefan-Boltzmann constant
        self.heat_capacity = 4.2e8  # J/m²/K (ocean mixed layer)

    def equilibrium_temperature(self, co2_ppm: float = 280) -> float:
        """
        Compute equilibrium surface temperature for given CO2.
        Uses logarithmic radiative forcing.
        """
        # Incoming solar (absorbed)
        S_absorbed = self.solar_constant / 4 * (1 - self.albedo)

        # Greenhouse effect: radiative forcing from CO2
        # ΔF = 5.35 × ln(C/C₀) W/m²
        co2_baseline = 280  # Pre-industrial
        delta_F = 5.35 * np.log(co2_ppm / co2_baseline)

        # Climate sensitivity parameter
        lambda_param = 0.8  # K per W/m² (includes feedbacks)

        # Pre-industrial equilibrium temperature
        T_0 = ((S_absorbed) / (self.sigma * 0.61)) ** 0.25  # ~288K with greenhouse

        # Temperature change
        delta_T = lambda_param * delta_F

        return T_0 + delta_T

    def transient_response(self, co2_trajectory: list,
                           years: int = 100) -> list:
        """
        Simulate transient temperature response to changing CO2.
        Includes ocean thermal inertia (lagged response).
        """
        T = 288.0  # Starting temperature
        results = []
        dt = 3.156e7  # One year in seconds

        for year in range(years):
            co2 = co2_trajectory[year] if year < len(co2_trajectory) else co2_trajectory[-1]

            T_eq = self.equilibrium_temperature(co2)
            # Ocean inertia: temperature relaxes toward equilibrium
            tau = 30 * 3.156e7  # 30-year time constant
            dT = (T_eq - T) * dt / tau
            T += dT

            results.append({
                'year': year,
                'co2_ppm': co2,
                'temperature': T,
                'equilibrium_temperature': T_eq,
                'warming_from_preindustrial': T - 288.0
            })

        return results

    def scenario_projections(self) -> dict:
        """
        Project temperature under different emissions scenarios.
        Simplified versions of IPCC SSP scenarios.
        """
        scenarios = {}

        # SSP1-2.6: Strong mitigation, peak and decline
        ssp126 = [280 + (420-280) * min(y/20, 1) - max(0, (y-40)*2) for y in range(100)]
        ssp126 = [max(280, c) for c in ssp126]
        scenarios['SSP1-2.6'] = {
            'description': 'Strong mitigation, net-zero by 2070',
            'projections': self.transient_response(ssp126)
        }

        # SSP2-4.5: Middle of the road
        ssp245 = [280 + (600-280) * min(y/80, 1) for y in range(100)]
        scenarios['SSP2-4.5'] = {
            'description': 'Middle of the road, moderate action',
            'projections': self.transient_response(ssp245)
        }

        # SSP5-8.5: Fossil-fueled development
        ssp585 = [280 + y * 8 for y in range(100)]
        scenarios['SSP5-8.5'] = {
            'description': 'High emissions, minimal policy',
            'projections': self.transient_response(ssp585)
        }

        return scenarios

Extreme Event Forecasting

class ExtremeEventForecaster:
    """Forecast and characterize extreme weather events."""

    def __init__(self):
        self.historical_extremes = []

    def return_period_analysis(self, annual_maxima: np.ndarray) -> dict:
        """
        Extreme value analysis using Generalized Extreme Value distribution.
        Estimate return periods for extreme events.
        """
        from scipy.stats import genextreme

        # Fit GEV distribution
        shape, loc, scale = genextreme.fit(annual_maxima)

        return_periods = [2, 5, 10, 25, 50, 100, 500]
        return_levels = {}

        for rp in return_periods:
            # Return level = value expected to be exceeded once in rp years
            prob = 1 - 1/rp
            level = genextreme.ppf(prob, shape, loc, scale)
            return_levels[f'{rp}_year'] = level

        return {
            'gev_shape': shape,
            'gev_location': loc,
            'gev_scale': scale,
            'return_levels': return_levels,
            'distribution_type': (
                'Frechet (heavy tail)' if shape > 0
                else 'Weibull (bounded)' if shape < 0
                else 'Gumbel (exponential tail)'
            )
        }

    def heatwave_probability(self, temperature_data: np.ndarray,
                              threshold: float, duration: int = 3) -> dict:
        """
        Estimate probability of a heatwave (consecutive days above threshold).
        """
        above = temperature_data > threshold
        consecutive = 0
        max_consecutive = 0
        heatwave_count = 0

        for day in above:
            if day:
                consecutive += 1
                if consecutive == duration:
                    heatwave_count += 1
            else:
                max_consecutive = max(max_consecutive, consecutive)
                consecutive = 0

        n_years = len(temperature_data) / 365

        return {
            'heatwave_frequency': heatwave_count / n_years,
            'annual_probability': 1 - (1 - heatwave_count / n_years),
            'longest_streak': max_consecutive,
            'days_above_threshold': int(np.sum(above)),
            'fraction_above': np.mean(above)
        }

    def attribution_analysis(self, observed_event: float,
                              natural_distribution: tuple,
                              anthropogenic_distribution: tuple) -> dict:
        """
        Simplified attribution: compare event probability
        in natural vs anthropogenic climate.
        """
        from scipy.stats import norm

        # Probability in natural climate
        p_natural = 1 - norm.cdf(observed_event, *natural_distribution)

        # Probability in current climate
        p_current = 1 - norm.cdf(observed_event, *anthropogenic_distribution)

        # Risk ratio
        risk_ratio = p_current / max(p_natural, 1e-10)

        # Fraction Attributable Risk
        far = 1 - 1/risk_ratio if risk_ratio > 1 else 0

        return {
            'probability_natural': p_natural,
            'probability_current': p_current,
            'risk_ratio': risk_ratio,
            'fraction_attributable': far,
            'interpretation': (
                f'Event is {risk_ratio:.1f}x more likely due to climate change. '
                f'{far:.0%} of the risk is attributable to human influence.'
            )
        }

Skill Scores for Weather Prediction

class WeatherSkillScores:
    """Standard verification metrics for weather forecasts."""

    @staticmethod
    def rmse(forecast: np.ndarray, observed: np.ndarray) -> float:
        """Root Mean Square Error."""
        return np.sqrt(np.mean((forecast - observed)**2))

    @staticmethod
    def mae(forecast: np.ndarray, observed: np.ndarray) -> float:
        """Mean Absolute Error."""
        return np.mean(np.abs(forecast - observed))

    @staticmethod
    def anomaly_correlation(forecast: np.ndarray, observed: np.ndarray,
                           climatology: np.ndarray) -> float:
        """
        Anomaly Correlation Coefficient (ACC).
        The standard metric for NWP skill. ACC > 0.6 is considered useful.
        """
        f_anom = forecast - climatology
        o_anom = observed - climatology

        numerator = np.sum(f_anom * o_anom)
        denominator = np.sqrt(np.sum(f_anom**2) * np.sum(o_anom**2))

        return numerator / denominator if denominator > 0 else 0

    @staticmethod
    def brier_score_weather(probability_forecasts: np.ndarray,
                            binary_observations: np.ndarray) -> float:
        """Brier Score for probabilistic weather forecasts (e.g., P(rain))."""
        return np.mean((probability_forecasts - binary_observations)**2)

    @staticmethod
    def crps(ensemble_forecasts: np.ndarray, observation: float) -> float:
        """
        Continuous Ranked Probability Score (CRPS).
        Generalizes Brier Score to continuous variables.
        Lower is better. Compares full probability distribution to observation.
        """
        sorted_ensemble = np.sort(ensemble_forecasts)
        n = len(sorted_ensemble)

        crps_value = 0
        for i in range(n):
            cumulative_prob = (i + 1) / n
            indicator = 1 if observation <= sorted_ensemble[i] else 0
            crps_value += (cumulative_prob - indicator)**2

        return crps_value / n

    @staticmethod
    def reliability_diagram(probability_forecasts: np.ndarray,
                            observations: np.ndarray,
                            n_bins: int = 10) -> dict:
        """Generate reliability diagram data for probabilistic forecasts."""
        bins = np.linspace(0, 1, n_bins + 1)
        points = []

        for i in range(n_bins):
            mask = (probability_forecasts >= bins[i]) & (probability_forecasts < bins[i+1])
            if np.sum(mask) > 0:
                forecast_mean = np.mean(probability_forecasts[mask])
                observed_freq = np.mean(observations[mask])
                count = np.sum(mask)
                points.append({
                    'forecast_probability': forecast_mean,
                    'observed_frequency': observed_freq,
                    'count': int(count)
                })

        return {
            'reliability_points': points,
            'perfect_line': [(x, x) for x in np.linspace(0, 1, 11)]
        }

    @staticmethod
    def spread_skill_ratio(ensemble_spread: float, rmse: float) -> float:
        """
        Spread-Skill Ratio: ensemble spread should match RMSE.
        Ratio = 1: well-calibrated ensemble
        Ratio < 1: overconfident (spread too narrow)
        Ratio > 1: underconfident (spread too wide)
        """
        return ensemble_spread / rmse if rmse > 0 else float('inf')

Key Takeaways

  1. Numerical Weather Prediction solves fundamental physics equations on a 3D grid; accuracy depends critically on initial conditions (data assimilation)
  2. Ensemble forecasting is essential because weather is chaotic; running 50+ perturbed forecasts quantifies forecast uncertainty
  3. Weather is an initial value problem (sensitive to starting conditions), while climate is a boundary value problem (sensitive to forcings like CO2)
  4. The Anomaly Correlation Coefficient (ACC) is the standard NWP skill metric; forecasts are useful when ACC > 0.6 (typically out to 10-15 days)
  5. CRPS is the proper scoring rule for probabilistic continuous forecasts, generalizing the Brier Score beyond binary outcomes
  6. Extreme value analysis (GEV distributions) estimates return periods for unprecedented events by fitting distributions to historical extremes
  7. The Spread-Skill Ratio checks ensemble calibration: if spread consistently underestimates RMSE, the ensemble is overconfident
  8. Climate attribution analysis quantifies how much climate change increases the probability of specific extreme events

Install this skill directly: skilldb add prediction-skills

Get CLI access →