
Financial Market Prediction

Overview

Financial market prediction combines technical analysis (price patterns), fundamental analysis (intrinsic value), sentiment analysis (market psychology), and quantitative methods (statistical/ML models) to forecast asset prices, volatility, and market regime changes. Markets are notoriously difficult to predict, which is what the efficient market hypothesis implies: prices already reflect available information. Even so, exploitable edges can exist in volatility forecasting, sentiment-driven mispricings, and machine learning approaches that process alternative data at scale.

Technical Analysis

Price Pattern Recognition

import numpy as np
import pandas as pd

class TechnicalAnalyzer:
    """Core technical analysis indicators for market prediction."""

    def __init__(self, prices: pd.DataFrame):
        """prices: DataFrame with columns ['open', 'high', 'low', 'close', 'volume']"""
        self.df = prices.copy()

    def moving_averages(self, short_window: int = 20, long_window: int = 50):
        """Compute moving average crossover signals."""
        self.df['sma_short'] = self.df['close'].rolling(short_window).mean()
        self.df['sma_long'] = self.df['close'].rolling(long_window).mean()
        self.df['ema_short'] = self.df['close'].ewm(span=short_window).mean()

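        # Trend state: +1 while the short SMA is above the long SMA, -1 while below.
        # Warm-up rows where either average is undefined are set to NaN so they
        # are not misread as bearish.
        self.df['ma_signal'] = np.where(
            self.df['sma_short'] > self.df['sma_long'], 1.0, -1.0
        )
        warmup = self.df[['sma_short', 'sma_long']].isna().any(axis=1)
        self.df.loc[warmup, 'ma_signal'] = np.nan

        # Crossover detection: +2 marks a golden cross (bullish), -2 a death cross (bearish)
        self.df['crossover'] = self.df['ma_signal'].diff()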

        return self.df[['sma_short', 'sma_long', 'ma_signal', 'crossover']]

    def rsi(self, period: int = 14) -> pd.Series:
        """Relative Strength Index: momentum oscillator (0-100)."""
        delta = self.df['close'].diff()
        gain = delta.clip(lower=0)
        loss = (-delta).clip(lower=0)

        avg_gain = gain.rolling(period).mean()
        avg_loss = loss.rolling(period).mean()

        # A window with no losses gives rs = inf and RSI = 100;
        # a completely flat window (0 / 0) stays NaN.
        rs = avg_gain / avg_loss
        rsi = 100 - (100 / (1 + rs))

        self.df['rsi'] = rsi
        return rsi

    def bollinger_bands(self, period: int = 20, std_dev: float = 2.0):
        """Bollinger Bands: volatility-based envelope around price."""
        sma = self.df['close'].rolling(period).mean()
        std = self.df['close'].rolling(period).std()

        self.df['bb_upper'] = sma + std_dev * std
        self.df['bb_lower'] = sma - std_dev * std
        self.df['bb_middle'] = sma
        self.df['bb_width'] = (self.df['bb_upper'] - self.df['bb_lower']) / sma
        self.df['bb_position'] = (self.df['close'] - self.df['bb_lower']) / \
                                  (self.df['bb_upper'] - self.df['bb_lower'])

        return self.df[['bb_upper', 'bb_lower', 'bb_middle', 'bb_width', 'bb_position']]

    def macd(self, fast: int = 12, slow: int = 26, signal: int = 9):
        """MACD: trend-following momentum indicator."""
        ema_fast = self.df['close'].ewm(span=fast).mean()
        ema_slow = self.df['close'].ewm(span=slow).mean()

        self.df['macd'] = ema_fast - ema_slow
        self.df['macd_signal'] = self.df['macd'].ewm(span=signal).mean()
        self.df['macd_histogram'] = self.df['macd'] - self.df['macd_signal']

        return self.df[['macd', 'macd_signal', 'macd_histogram']]

    def volume_profile(self, window: int = 20):
        """Volume-weighted analysis."""
        self.df['vwap'] = (
            (self.df['close'] * self.df['volume']).rolling(window).sum() /
            self.df['volume'].rolling(window).sum()
        )
        self.df['volume_sma'] = self.df['volume'].rolling(window).mean()
        self.df['volume_ratio'] = self.df['volume'] / self.df['volume_sma']

        return self.df[['vwap', 'volume_sma', 'volume_ratio']]

    def generate_signals(self) -> pd.DataFrame:
        """Combine indicators into a composite signal."""
        self.moving_averages()
        self.rsi()
        self.bollinger_bands()
        self.macd()
        self.volume_profile()

        signals = pd.DataFrame(index=self.df.index)

        # Trend signal
        signals['trend'] = self.df['ma_signal']

        # RSI signal, read as a contrarian: oversold (< 30) is bullish,
        # overbought (> 70) is bearish
        signals['momentum'] = np.where(self.df['rsi'] < 30, 1,
                               np.where(self.df['rsi'] > 70, -1, 0))

        # Mean reversion signal
        signals['mean_reversion'] = np.where(self.df['bb_position'] < 0.1, 1,
                                    np.where(self.df['bb_position'] > 0.9, -1, 0))

        # MACD signal
        signals['macd_signal'] = np.where(self.df['macd_histogram'] > 0, 1, -1)

        # Volume confirmation
        signals['volume_confirm'] = np.where(self.df['volume_ratio'] > 1.5, 1, 0)

        # Composite
        signals['composite'] = (
            0.3 * signals['trend'] +
            0.2 * signals['momentum'] +
            0.2 * signals['mean_reversion'] +
            0.2 * signals['macd_signal'] +
            0.1 * signals['volume_confirm']
        )

        return signals
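
The `rsi` method above averages gains and losses with a simple rolling mean. Many charting packages instead use Wilder's smoothing, which is an exponential average with `alpha = 1 / period`. The sketch below shows that variant as a standalone function; the name `wilder_rsi` is illustrative and not part of the class above.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using Wilder's smoothing (exponential average with alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)

    # min_periods leaves the warm-up rows as NaN instead of emitting early values
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()

    # No losses in the window gives rs = inf and RSI = 100; a flat series gives NaN
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Usage: a steadily rising series saturates at 100, a falling one at 0
rising = pd.Series(np.arange(1.0, 31.0))
falling = pd.Series(np.arange(30.0, 0.0, -1.0))
rsi_up = wilder_rsi(rising)
rsi_down = wilder_rsi(falling)
```

The two variants agree on direction but differ in level, so thresholds such as 30 and 70 are best tuned against whichever smoothing is actually used.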

Fundamental Analysis

Intrinsic Value Estimation

class FundamentalAnalyzer:
    """Fundamental analysis for equity valuation."""

    def dcf_valuation(self, free_cash_flows: list, growth_rate: float,
                      terminal_growth: float, discount_rate: float,
                      shares_outstanding: int) -> dict:
        """
        Discounted Cash Flow valuation.
        Projects future cash flows and discounts to present value.
        """
        # Project cash flows
        projected = []
        last_fcf = free_cash_flows[-1]
        for year in range(1, 6):  # 5-year projection
            projected_fcf = last_fcf * (1 + growth_rate) ** year
            pv = projected_fcf / (1 + discount_rate) ** year
            projected.append({'year': year, 'fcf': projected_fcf, 'pv': pv})

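        # Terminal value (Gordon growth); only defined when the discount rate
        # exceeds the terminal growth rate
        if discount_rate <= terminal_growth:
            raise ValueError("discount_rate must exceed terminal_growth")
        terminal_fcf = projected[-1]['fcf'] * (1 + terminal_growth)
        terminal_value = terminal_fcf / (discount_rate - terminal_growth)
        terminal_pv = terminal_value / (1 + discount_rate) ** 5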

        # Enterprise value
        enterprise_value = sum(p['pv'] for p in projected) + terminal_pv

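        # Value per share. Dividing enterprise value by the share count equals
        # equity value per share only when net debt is zero; otherwise subtract
        # net debt from enterprise value first.
        equity_per_share = enterprise_value / shares_outstanding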

        return {
            'enterprise_value': enterprise_value,
            'equity_per_share': equity_per_share,
            'terminal_value_fraction': terminal_pv / enterprise_value,
            'projected_cash_flows': projected
        }

    def relative_valuation(self, target: dict, comparables: list) -> dict:
        """
        Relative valuation using comparable company multiples.
        """
        multiples = {
            'pe_ratio': [],
            'ev_ebitda': [],
            'ps_ratio': [],
            'pb_ratio': []
        }

        for comp in comparables:
            if comp.get('pe_ratio'):
                multiples['pe_ratio'].append(comp['pe_ratio'])
            if comp.get('ev_ebitda'):
                multiples['ev_ebitda'].append(comp['ev_ebitda'])
            if comp.get('ps_ratio'):
                multiples['ps_ratio'].append(comp['ps_ratio'])
            if comp.get('pb_ratio'):
                multiples['pb_ratio'].append(comp['pb_ratio'])

        implied_values = {}

        if multiples['pe_ratio'] and target.get('eps'):
            median_pe = np.median(multiples['pe_ratio'])
            implied_values['pe_implied'] = target['eps'] * median_pe

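        if multiples['ev_ebitda'] and target.get('ebitda') and target.get('shares_outstanding'):
            median_ev_ebitda = np.median(multiples['ev_ebitda'])
            # EV/EBITDA implies a total enterprise value. Convert it to equity value
            # per share so it is comparable with the per-share P/E and P/S estimates.
            # 'shares_outstanding' and 'net_debt' are assumed keys on `target`.
            implied_ev = target['ebitda'] * median_ev_ebitda
            implied_equity = implied_ev - target.get('net_debt', 0)
            implied_values['ev_ebitda_implied'] = implied_equity / target['shares_outstanding']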

        if multiples['ps_ratio'] and target.get('revenue_per_share'):
            median_ps = np.median(multiples['ps_ratio'])
            implied_values['ps_implied'] = target['revenue_per_share'] * median_ps

        if implied_values:
            avg_implied = np.mean(list(implied_values.values()))
            current_price = target.get('current_price', 0)
            return {
                'implied_values': implied_values,
                'average_implied_value': avg_implied,
                'current_price': current_price,
                # Upside in percent; None when no positive current price is supplied
                'upside': (avg_implied / current_price - 1) * 100 if current_price else None
            }

        return {'error': 'Insufficient data'}
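
Two details of the DCF above are easy to get wrong: the Gordon growth terminal value is only defined when the discount rate exceeds terminal growth, and enterprise value has to be reduced by net debt before dividing by the share count. The following is a minimal sketch of both steps; the function names and the `net_debt` argument are illustrative assumptions, not part of the class above.

```python
def gordon_terminal_value(final_fcf: float, discount_rate: float,
                          terminal_growth: float) -> float:
    """Terminal value at the end of the projection horizon (Gordon growth)."""
    if discount_rate <= terminal_growth:
        raise ValueError("discount_rate must exceed terminal_growth")
    return final_fcf * (1 + terminal_growth) / (discount_rate - terminal_growth)


def equity_value_per_share(enterprise_value: float, net_debt: float,
                           shares_outstanding: float) -> float:
    """Bridge from enterprise value to equity value per share."""
    return (enterprise_value - net_debt) / shares_outstanding


# Sensitivity: a one-point drop in the discount rate lifts terminal value sharply
tv_at_10pct = gordon_terminal_value(100.0, 0.10, 0.02)
tv_at_9pct = gordon_terminal_value(100.0, 0.09, 0.02)

# Equity bridge with hypothetical inputs
per_share = equity_value_per_share(1000.0, 200.0, 100.0)
```

Because the denominator is the small difference between two rates, terminal value usually dominates the result, which is why the class reports `terminal_value_fraction`.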

Sentiment Analysis for Markets

class MarketSentimentAnalyzer:
    """Analyze market sentiment from multiple data sources."""

    def __init__(self):
        self.sentiment_sources = {}

    def fear_greed_index(self, market_data: dict) -> dict:
        """
        Compute a CNN-style Fear & Greed Index from market indicators.
        0 = Extreme Fear, 100 = Extreme Greed.
        """
        indicators = {}

        # 1. Market momentum (S&P vs 125-day MA)
        if 'sp500' in market_data and 'sp500_ma125' in market_data:
            ratio = market_data['sp500'] / market_data['sp500_ma125']
            indicators['momentum'] = min(100, max(0, (ratio - 0.95) / 0.10 * 100))

        # 2. VIX (Volatility Index)
        if 'vix' in market_data:
            vix = market_data['vix']
            indicators['volatility'] = min(100, max(0, (50 - vix) / 50 * 100))

        # 3. Put/Call ratio
        if 'put_call_ratio' in market_data:
            pcr = market_data['put_call_ratio']
            indicators['put_call'] = min(100, max(0, (1.2 - pcr) / 0.8 * 100))

        # 4. Junk bond demand (spread over treasuries)
        if 'high_yield_spread' in market_data:
            spread = market_data['high_yield_spread']
            indicators['junk_bond'] = min(100, max(0, (8 - spread) / 6 * 100))

        # 5. Market breadth (advance/decline)
        if 'advance_decline' in market_data:
            ad = market_data['advance_decline']
            indicators['breadth'] = min(100, max(0, (ad + 1) / 2 * 100))

        if not indicators:
            return {'error': 'Insufficient market data'}

        index = np.mean(list(indicators.values()))

        return {
            'fear_greed_index': index,
            'label': self._label_sentiment(index),
            'components': indicators,
            'contrarian_signal': 'buy' if index < 25 else 'sell' if index > 75 else 'neutral'
        }

    def _label_sentiment(self, index: float) -> str:
        if index < 20: return 'Extreme Fear'
        if index < 40: return 'Fear'
        if index < 60: return 'Neutral'
        if index < 80: return 'Greed'
        return 'Extreme Greed'

    def news_sentiment(self, headlines: list) -> dict:
        """Analyze sentiment from news headlines."""
        positive_words = {'surge', 'rally', 'gain', 'jump', 'soar', 'boom',
                         'record', 'bullish', 'upgrade', 'beat', 'strong'}
        negative_words = {'crash', 'plunge', 'drop', 'fall', 'sink', 'bear',
                         'recession', 'crisis', 'downgrade', 'miss', 'weak'}

        scores = []
        for headline in headlines:
            words = set(headline.lower().split())
            pos = len(words & positive_words)
            neg = len(words & negative_words)
            if pos + neg > 0:
                score = (pos - neg) / (pos + neg)
            else:
                score = 0
            scores.append(score)

        avg_sentiment = np.mean(scores) if scores else 0

        return {
            'average_sentiment': avg_sentiment,
            'label': 'positive' if avg_sentiment > 0.1 else 'negative' if avg_sentiment < -0.1 else 'neutral',
            'n_headlines': len(headlines),
            'extreme_negative_count': sum(1 for s in scores if s < -0.5),
            'extreme_positive_count': sum(1 for s in scores if s > 0.5)
        }
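
One limitation of the keyword matcher above is that `str.split()` keeps punctuation attached, so a token such as "surge!" never matches "surge". A small sketch of a regex-based tokenizer that avoids this follows; the word lists are shortened, and `headline_score` is an illustrative name.

```python
import re

POSITIVE = {"surge", "rally", "gain", "beat", "strong"}
NEGATIVE = {"crash", "plunge", "drop", "miss", "weak"}


def headline_score(headline: str) -> float:
    """Score one headline in [-1, 1] from keyword counts, ignoring punctuation."""
    tokens = set(re.findall(r"[a-z]+", headline.lower()))
    pos = len(tokens & POSITIVE)
    neg = len(tokens & NEGATIVE)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


# Whitespace splitting keeps the exclamation mark and misses the keyword
naive_tokens = set("Stocks surge!".lower().split())
naive_hit = bool(naive_tokens & POSITIVE)

score_pos = headline_score("Stocks surge!")
score_neg = headline_score("Shares plunge, earnings miss.")
score_none = headline_score("Fed holds rates")
```

Inflected forms such as "surges" or "plunged" still go unmatched; stemming or a trained sentiment model would be needed to cover them.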

Volatility Forecasting (GARCH)

class GARCHModel:
    """
    GARCH(1,1) for volatility forecasting.
    Captures volatility clustering: large moves follow large moves.

    sigma²_t = omega + alpha * r²_{t-1} + beta * sigma²_{t-1}
    """

    def __init__(self, omega: float = 0.00001, alpha: float = 0.1,
                 beta: float = 0.85):
        self.omega = omega
        self.alpha = alpha
        self.beta = beta
        self.fitted = False

    def fit(self, returns: np.ndarray):
        """Fit GARCH(1,1) using maximum likelihood."""
        from scipy.optimize import minimize

        def neg_log_likelihood(params):
            omega, alpha, beta = params
            if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
                return 1e10

            n = len(returns)
            sigma2 = np.zeros(n)
            sigma2[0] = np.var(returns)

            for t in range(1, n):
                sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
                sigma2[t] = max(sigma2[t], 1e-10)

            ll = -0.5 * np.sum(np.log(2 * np.pi * sigma2) + returns**2 / sigma2)
            return -ll

        result = minimize(
            neg_log_likelihood,
            x0=[self.omega, self.alpha, self.beta],
            method='Nelder-Mead'
        )

        self.omega, self.alpha, self.beta = result.x
        self.fitted = True
        self._returns = returns

        # Compute final conditional variance
        n = len(returns)
        sigma2 = np.zeros(n)
        sigma2[0] = np.var(returns)
        for t in range(1, n):
            sigma2[t] = self.omega + self.alpha * returns[t-1]**2 + self.beta * sigma2[t-1]

        self._sigma2 = sigma2
        return self

    def forecast_volatility(self, steps: int = 1) -> np.ndarray:
        """Forecast conditional volatility for future periods."""
        if not self.fitted:
            raise RuntimeError("Call fit() before forecasting")
        forecasts = np.zeros(steps)
        last_sigma2 = self._sigma2[-1]
        last_return2 = self._returns[-1]**2

        for h in range(steps):
            if h == 0:
                forecasts[h] = self.omega + self.alpha * last_return2 + self.beta * last_sigma2
            else:
                forecasts[h] = self.omega + (self.alpha + self.beta) * forecasts[h-1]

        return np.sqrt(forecasts)  # Return as volatility (std dev)

    def long_run_volatility(self) -> float:
        """Unconditional (long-run) volatility."""
        long_run_var = self.omega / (1 - self.alpha - self.beta)
        return np.sqrt(long_run_var)

    def half_life(self) -> float:
        """Half-life of volatility shocks (days to decay by 50%)."""
        persistence = self.alpha + self.beta
        if persistence >= 1:
            return float('inf')
        return np.log(0.5) / np.log(persistence)
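
The recursion in `forecast_volatility` has a closed form: the h-step-ahead variance decays geometrically from the one-step forecast toward the long-run variance at rate `alpha + beta`. The sketch below checks that equivalence numerically; `garch_variance_path` is an illustrative helper, and the parameter values are the class defaults rather than fitted estimates.

```python
import numpy as np


def garch_variance_path(omega: float, alpha: float, beta: float,
                        next_var: float, steps: int) -> np.ndarray:
    """Variance forecasts for horizons 1..steps.

    Closed form: V_L + (alpha + beta)**(h - 1) * (next_var - V_L),
    where V_L = omega / (1 - alpha - beta) and next_var is the 1-step forecast.
    """
    persistence = alpha + beta
    if not 0 <= persistence < 1:
        raise ValueError("alpha + beta must be in [0, 1) for a finite long-run variance")
    long_run_var = omega / (1 - persistence)
    horizons = np.arange(steps)  # 0 corresponds to the 1-step-ahead forecast
    return long_run_var + persistence ** horizons * (next_var - long_run_var)


omega, alpha, beta = 1e-5, 0.10, 0.85
next_var = 4e-4  # hypothetical 1-step variance, twice the long-run level
closed_form = garch_variance_path(omega, alpha, beta, next_var, steps=500)

# The same path built with the recursion used in forecast_volatility
recursive = np.zeros(500)
recursive[0] = next_var
for h in range(1, 500):
    recursive[h] = omega + (alpha + beta) * recursive[h - 1]

long_run_var = omega / (1 - alpha - beta)
```

This also makes the half-life formula concrete: the gap to the long-run variance halves every `log(0.5) / log(alpha + beta)` periods.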

Options-Implied Probabilities

class OptionsImpliedProbability:
    """Extract probability distributions from option prices."""

    @staticmethod
    def implied_move(atm_straddle_price: float, stock_price: float,
                     days_to_expiry: int) -> dict:
        """
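        The at-the-money straddle price approximates the expected absolute move
        to expiry. For a normal distribution E|x| = sigma * sqrt(2/pi), so one
        standard deviation is about 1.25x the straddle-implied move.
        """
        implied_move_pct = atm_straddle_price / stock_price
        period_vol = implied_move_pct / np.sqrt(2 / np.pi)
        annualized_vol = period_vol * np.sqrt(252 / days_to_expiry)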

        return {
            'implied_move_pct': implied_move_pct * 100,
            'implied_move_dollars': atm_straddle_price,
            'implied_range': (
                stock_price * (1 - implied_move_pct),
                stock_price * (1 + implied_move_pct)
            ),
            'annualized_vol': annualized_vol * 100,
            'one_std_dev_range': (
                stock_price * (1 - annualized_vol * np.sqrt(days_to_expiry/252)),
                stock_price * (1 + annualized_vol * np.sqrt(days_to_expiry/252))
            )
        }

    @staticmethod
    def probability_above(current_price: float, strike: float,
                          implied_vol: float, days: int,
                          risk_free_rate: float = 0.05) -> float:
        """
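        Risk-neutral probability that price exceeds a given strike at expiration
        (Black-Scholes N(d2)). Real-world probabilities differ by the risk premium.
        `days` is counted in trading days (252 per year).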
        """
        from scipy.stats import norm

        T = days / 252
        d2 = (np.log(current_price / strike) +
              (risk_free_rate - 0.5 * implied_vol**2) * T) / \
             (implied_vol * np.sqrt(T))

        return norm.cdf(d2)

    @staticmethod
    def risk_neutral_density(strikes: np.ndarray, call_prices: np.ndarray,
                             risk_free_rate: float, T: float) -> dict:
        """
        Extract risk-neutral probability density from option prices
        using Breeden-Litzenberger formula.
        """
        # Second derivative of call price w.r.t. strike = risk-neutral density
        dK = np.diff(strikes)
        dC = np.diff(call_prices)
        first_deriv = dC / dK

        d2K = (dK[:-1] + dK[1:]) / 2
        d2C = np.diff(first_deriv)
        density = np.exp(risk_free_rate * T) * d2C / d2K

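        mid_strikes = strikes[1:-1]

        # Negative density values signal noisy or arbitrageable quotes. Clip them
        # to zero rather than flipping their sign, and weight by the local strike
        # spacing so unevenly spaced strikes are handled correctly.
        weights = np.clip(density, 0, None) * d2K
        mean = np.sum(mid_strikes * weights) / np.sum(weights)
        variance = np.sum((mid_strikes - mean) ** 2 * weights) / np.sum(weights)

        return {
            'strikes': mid_strikes,
            'density': density,
            'mean': mean,
            'std': np.sqrt(variance)
        }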
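
A useful sanity check on the Breeden-Litzenberger step is to feed it call prices generated by Black-Scholes, where the answer is known: the recovered density should integrate to roughly 1 and its mean should sit near the forward price. The sketch below does this with synthetic prices on an evenly spaced strike grid; all inputs are hypothetical, and the normal CDF is built from `math.erf` to keep the example dependency-light.

```python
import math
import numpy as np


def bs_call(S: float, K: np.ndarray, r: float, sigma: float, T: float) -> np.ndarray:
    """Black-Scholes call prices for an array of strikes (no dividends)."""
    norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


S, r, sigma, T = 100.0, 0.02, 0.20, 0.5
strikes = np.arange(40.0, 200.5, 0.5)
calls = bs_call(S, strikes, r, sigma, T)

# Risk-neutral density = exp(rT) * second derivative of call price w.r.t. strike,
# approximated here by a central second difference on the uniform grid
dK = strikes[1] - strikes[0]
density = math.exp(r * T) * (calls[2:] - 2 * calls[1:-1] + calls[:-2]) / dK**2
inner_strikes = strikes[1:-1]

total_prob = float(np.sum(density) * dK)
implied_mean = float(np.sum(inner_strikes * density) * dK / total_prob)
forward = S * math.exp(r * T)
```

With real quotes the second difference amplifies bid-ask noise, so in practice the call prices or implied volatilities are usually smoothed across strikes before differentiating.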

Machine Learning for Alpha Generation

class MLAlphaModel:
    """Machine learning pipeline for financial prediction."""

    def __init__(self):
        self.feature_generators = []
        self.model = None

    def generate_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Generate predictive features from OHLCV data."""
        features = pd.DataFrame(index=df.index)

        # Returns
        for period in [1, 5, 10, 20, 60]:
            features[f'return_{period}d'] = df['close'].pct_change(period)

        # Volatility
        for period in [5, 10, 20]:
            features[f'volatility_{period}d'] = df['close'].pct_change().rolling(period).std()

        # Volume features
        features['volume_ratio_20d'] = df['volume'] / df['volume'].rolling(20).mean()
        features['volume_trend'] = df['volume'].rolling(5).mean() / df['volume'].rolling(20).mean()

        # Price relative to moving averages
        for period in [10, 20, 50, 200]:
            features[f'price_vs_ma{period}'] = df['close'] / df['close'].rolling(period).mean() - 1

        # RSI
        delta = df['close'].diff()
        gain = delta.clip(lower=0).rolling(14).mean()
        loss = (-delta.clip(upper=0)).rolling(14).mean()
        features['rsi_14'] = 100 - (100 / (1 + gain / loss.replace(0, np.nan)))

        # Bollinger band position
        sma20 = df['close'].rolling(20).mean()
        std20 = df['close'].rolling(20).std()
        features['bb_position'] = (df['close'] - sma20) / (2 * std20)

        # Day of week, month
        if isinstance(df.index, pd.DatetimeIndex):
            features['day_of_week'] = df.index.dayofweek
            features['month'] = df.index.month

        return features.dropna()

    def walk_forward_backtest(self, features: pd.DataFrame, target: pd.Series,
                               train_size: int = 252, step_size: int = 21) -> dict:
        """
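        Walk-forward validation: train on past, predict future, step forward.
        Avoids the look-ahead bias of shuffled splits, provided each target value
        uses no data from its test window; multi-day forward-return targets need
        a gap between the training and test rows.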
        """
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.metrics import accuracy_score, roc_auc_score

        results = []
        n = len(features)

        # "+ 1" keeps the final full test window when it ends exactly at row n
        for start in range(train_size, n - step_size + 1, step_size):
            X_train = features.iloc[:start]
            y_train = target.iloc[:start]
            X_test = features.iloc[start:start + step_size]
            y_test = target.iloc[start:start + step_size]

            model = GradientBoostingClassifier(
                n_estimators=100, max_depth=3, learning_rate=0.1
            )
            model.fit(X_train, y_train)

            predictions = model.predict(X_test)
            probabilities = model.predict_proba(X_test)[:, 1]

            accuracy = accuracy_score(y_test, predictions)
            try:
                auc = roc_auc_score(y_test, probabilities)
            except ValueError:
                auc = 0.5

            results.append({
                'period_start': features.index[start],
                'accuracy': accuracy,
                'auc': auc,
                'n_predictions': len(y_test)
            })

        return {
            'mean_accuracy': np.mean([r['accuracy'] for r in results]),
            'mean_auc': np.mean([r['auc'] for r in results]),
            'n_periods': len(results),
            'results': results
        }
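
If the target is an h-day forward return, the labels on the last h training rows are computed from prices that fall inside the test window, which leaks information even in a walk-forward setup. One common remedy is to drop those rows from training. The sketch below generates expanding-window index splits with such a gap; `walk_forward_splits` and its `embargo` parameter are illustrative names, not part of the class above.

```python
from typing import Iterator, Tuple

import numpy as np


def walk_forward_splits(n_samples: int, train_size: int = 252,
                        step_size: int = 21, embargo: int = 0
                        ) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    The last `embargo` rows before each test block are excluded from training,
    so labels built from forward-looking windows cannot overlap the test rows.
    """
    if embargo < 0 or embargo >= train_size:
        raise ValueError("embargo must be in [0, train_size)")
    start = train_size
    while start + step_size <= n_samples:
        train_idx = np.arange(0, start - embargo)
        test_idx = np.arange(start, start + step_size)
        yield train_idx, test_idx
        start += step_size


# Usage: 300 rows, one year of training data, monthly steps, 5-day forward target
splits = list(walk_forward_splits(300, train_size=252, step_size=21, embargo=5))
first_train, first_test = splits[0]
```

The index arrays can be passed to `features.iloc[...]` and `target.iloc[...]` in place of the slices used in `walk_forward_backtest`.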

Key Takeaways

  1. Technical analysis identifies patterns in price and volume; it works best as a timing tool on top of fundamental views
  2. Fundamental analysis (DCF, relative valuation) provides the "what to buy" while technicals provide the "when to buy"
  3. Sentiment indicators (VIX, put/call ratio, Fear & Greed) are most valuable as contrarian signals at extremes
  4. GARCH models capture volatility clustering and provide calibrated volatility forecasts essential for risk management
  5. Options prices contain the market's risk-neutral probability distribution for future prices; the Breeden-Litzenberger formula extracts it
  6. Machine learning for financial prediction requires walk-forward validation to avoid look-ahead bias; random train/test splits are invalid
  7. The most robust predictive features tend to be simple: momentum, mean reversion, volume, and volatility, not complex patterns
  8. Transaction costs, slippage, and market impact are the reality check; many "profitable" strategies evaporate after costs
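
To make the last point concrete, here is a small sketch of how a flat per-trade cost erodes a high-turnover strategy. The cost model (a fixed number of basis points per unit of position change) and all numbers are hypothetical; real costs also include slippage and market impact, which grow with trade size.

```python
import numpy as np


def net_returns(asset_returns, positions, cost_bps: float = 5.0) -> np.ndarray:
    """Per-period strategy returns after proportional transaction costs.

    positions[t] is the position held during period t (for example -1, 0, or 1).
    """
    asset_returns = np.asarray(asset_returns, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Turnover is the absolute position change, including the first entry from flat
    turnover = np.abs(np.diff(positions, prepend=0.0))
    costs = turnover * cost_bps / 10_000
    return positions * asset_returns - costs


# A strategy that calls every move correctly but flips its position each period
asset_returns = [0.001, -0.001, 0.001, -0.001]
positions = [1, -1, 1, -1]

gross_total = float(np.sum(net_returns(asset_returns, positions, cost_bps=0.0)))
net_total_5bps = float(np.sum(net_returns(asset_returns, positions, cost_bps=5.0)))
net_total_10bps = float(np.sum(net_returns(asset_returns, positions, cost_bps=10.0)))
```

In this toy case the gross return of 0.4% shrinks to 0.05% at 5 bps per unit traded and turns negative at 10 bps, even though every directional call was right.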
