
Event Versioning

Evolve event schemas safely over time using upcasting, weak schema strategies, and version negotiation

Event Schema Evolution — Event Sourcing

You are an expert in event versioning and schema evolution for building event-sourced systems.

Core Philosophy

Overview

In an event-sourced system, events are immutable and stored forever. As the domain evolves, the shape of events must also change — new fields are added, old fields become irrelevant, and events may be split or merged. Event versioning provides strategies to evolve schemas without breaking existing consumers or corrupting the historical event log.

Core Concepts

Event Version: A numeric or semantic version associated with each event type (e.g., OrderPlaced_v1, OrderPlaced_v2). The version indicates the schema of the event's payload.

Upcasting: The process of transforming an older event version into a newer one at read time. The stored bytes are never modified — instead, a chain of upcasters converts old shapes into the current shape when the event is loaded.

Weak Schema: A strategy where consumers tolerate missing or extra fields. New fields have defaults; removed fields are ignored. This delays the need for explicit versioning.

Strong Schema: A strategy where event schemas are explicitly versioned and validated. Consumers know exactly which version they are handling.
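
As a rough illustration of the strong-schema idea, the sketch below accepts only the exact (type, version) pairs it knows and rejects everything else. The names `OrderPlacedV3`, `SUPPORTED`, `decode_strict`, and `UnsupportedVersionError` are hypothetical, and the field names are assumed to match the v3 `OrderPlaced` shape used elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlacedV3:
    orderId: str
    customerId: str
    subtotal: float
    tax: float
    currency: str


class UnsupportedVersionError(Exception):
    """Raised when an event arrives in a version this consumer does not handle."""


# Hypothetical table of the exact (type, version) pairs this consumer accepts
SUPPORTED = {("OrderPlaced", 3): OrderPlacedV3}


def decode_strict(event: dict) -> object:
    key = (event["event_type"], event["version"])
    cls = SUPPORTED.get(key)
    if cls is None:
        raise UnsupportedVersionError(f"Unsupported event {key}")
    # Unknown or missing fields raise TypeError instead of being silently tolerated
    return cls(**event["data"])
```

Contrast this with the weak-schema approach, where a missing `currency` would be defaulted rather than rejected.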

Copy-and-Transform Migration: For extreme schema changes, a new stream is written by replaying and transforming the old stream. The old stream is archived.
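
A minimal sketch of copy-and-transform, assuming an in-memory store modeled as a dict that maps stream names to lists of event dicts. The names `migrate_stream` and `split_amount` and the `archived:` prefix are hypothetical and do not correspond to any particular event store's API.

```python
from typing import Callable, Iterable

Event = dict
# A transform returns zero or more events, so it can drop, rewrite, or split an event.
Transform = Callable[[Event], Iterable[Event]]


def migrate_stream(
    store: dict[str, list[Event]],
    old_stream: str,
    new_stream: str,
    transform: Transform,
) -> None:
    """Replay old_stream through transform into new_stream, then archive the old one."""
    if new_stream in store:
        raise ValueError(f"Target stream {new_stream!r} already exists")
    migrated: list[Event] = []
    for event in store[old_stream]:
        # Shallow-copy each output so envelopes in the new stream are distinct objects
        migrated.extend(dict(e) for e in transform(event))
    store[new_stream] = migrated
    # Archive under a new name rather than deleting, so the old history stays readable
    store[f"archived:{old_stream}"] = store.pop(old_stream)


def split_amount(event: Event) -> list[Event]:
    """Hypothetical transform: rewrite OrderPlaced v1 as v2, pass other events through."""
    if event["event_type"] == "OrderPlaced" and event["version"] == 1:
        data = dict(event["data"])
        data["subtotal"] = data.pop("amount", 0)
        data["tax"] = 0
        return [{**event, "data": data, "version": 2}]
    return [event]


store = {"orders-1": [{"event_type": "OrderPlaced", "version": 1, "data": {"amount": 100}}]}
migrate_stream(store, "orders-1", "orders-1-v2", split_amount)
```

The old events are copied, never edited in place, so the archived stream still holds the original v1 payloads.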

Implementation Patterns

Upcaster Chain (Python)

class Upcaster:
    """Transforms events from one version to the next."""
    def can_upcast(self, event_type: str, version: int) -> bool:
        raise NotImplementedError

    def upcast(self, event: dict) -> dict:
        raise NotImplementedError


class OrderPlacedV1ToV2(Upcaster):
    """v1 had 'amount'; v2 splits it into 'subtotal' and 'tax'."""

    def can_upcast(self, event_type: str, version: int) -> bool:
        return event_type == "OrderPlaced" and version == 1

    def upcast(self, event: dict) -> dict:
        data = event["data"].copy()
        amount = data.pop("amount", 0)
        data["subtotal"] = amount
        data["tax"] = 0  # Tax was not tracked in v1
        return {**event, "data": data, "version": 2}


class OrderPlacedV2ToV3(Upcaster):
    """v2 had no currency; v3 adds a 'currency' field defaulting to USD."""

    def can_upcast(self, event_type: str, version: int) -> bool:
        return event_type == "OrderPlaced" and version == 2

    def upcast(self, event: dict) -> dict:
        data = {**event["data"], "currency": "USD"}
        return {**event, "data": data, "version": 3}


class UpcasterPipeline:
    def __init__(self, upcasters: list[Upcaster]):
        self._upcasters = upcasters

    def upcast(self, event: dict) -> dict:
        changed = True
        while changed:
            changed = False
            for upcaster in self._upcasters:
                if upcaster.can_upcast(event.get("event_type"), event.get("version", 1)):
                    event = upcaster.upcast(event)
                    changed = True
        return event

# Usage
pipeline = UpcasterPipeline([OrderPlacedV1ToV2(), OrderPlacedV2ToV3()])
raw_event = {"event_type": "OrderPlaced", "version": 1, "data": {"amount": 100}}
current_event = pipeline.upcast(raw_event)
# current_event version is now 3 with subtotal, tax, and currency fields

Weak Schema with Defaults (TypeScript)

interface OrderPlaced {
  orderId: string;
  customerId: string;
  subtotal: number;
  tax: number;
  currency: string;
}

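function deserializeOrderPlaced(raw: Record<string, unknown>): OrderPlaced {
  // The casts include `undefined` so that `??` type-checks as a real fallback.
  // No runtime type checking happens here; validate `raw` first if it is untrusted.
  return {
    orderId: raw.orderId as string,
    customerId: raw.customerId as string,
    // Fall back to the v1 shape, where only 'amount' existed
    subtotal: (raw.subtotal as number | undefined) ?? (raw.amount as number | undefined) ?? 0,
    tax: (raw.tax as number | undefined) ?? 0,
    // Added in v3
    currency: (raw.currency as string | undefined) ?? "USD",
  };
}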

Versioned Event Type Registry

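class UnknownEventError(Exception):
    """Raised when no class is registered for an (event_type, version) pair."""


class EventRegistry:
    def __init__(self, upcasters: UpcasterPipeline):
        self._types: dict[tuple[str, int], type] = {}
        # UpcasterPipeline is the class from the upcaster chain example above.
        # It is injected so the registry actually has upcasters to apply.
        self._upcasters = upcasters

    def register(self, event_type: str, version: int, cls: type) -> None:
        self._types[(event_type, version)] = cls

    def deserialize(self, raw: dict) -> object:
        # First upcast to the latest version
        upcasted = self._upcasters.upcast(raw)
        # Events without a version are treated as v1, matching the pipeline's default
        key = (upcasted["event_type"], upcasted.get("version", 1))
        cls = self._types.get(key)
        if cls is None:
            raise UnknownEventError(f"No class registered for {key}")
        return cls(**upcasted["data"])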

Schema Validation at Write Time

import jsonschema

EVENT_SCHEMAS = {
    ("OrderPlaced", 3): {
        "type": "object",
        "required": ["orderId", "customerId", "subtotal", "tax", "currency"],
        "properties": {
            "orderId": {"type": "string"},
            "customerId": {"type": "string"},
            "subtotal": {"type": "number"},
            "tax": {"type": "number"},
            "currency": {"type": "string", "minLength": 3, "maxLength": 3},
        },
        "additionalProperties": False,
    }
}

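def validate_event(event_type: str, version: int, data: dict) -> None:
    schema = EVENT_SCHEMAS.get((event_type, version))
    if schema is None:
        # Fail loudly: silently skipping would let unvalidated events into the log
        raise ValueError(f"No schema registered for {event_type} v{version}")
    # Raises jsonschema.ValidationError if data does not match the schema
    jsonschema.validate(data, schema)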

Best Practices

  • Never modify stored events. The event log is immutable. All transformations happen at read time through upcasting.
  • Start with weak schemas and escalate to upcasters when needed. Adding optional fields with defaults is the simplest evolution path.
  • Version every event type from the start. Even if you do not need it yet, including a version field in the event envelope costs nothing and avoids retrofitting later.
  • Keep upcasters simple and composable. Each upcaster transforms exactly one version to the next. Chain them to handle multi-step jumps.
  • Store the schema version in the event envelope, not in the payload. This makes the version available before deserialization.
  • Maintain a schema registry or changelog that documents every version of every event type, what changed, and when.
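
To illustrate the envelope and versioning points above, here is a minimal sketch assuming JSON-serialized events. The `EventEnvelope` class, its field names, and `peek_version` are hypothetical. The point is only that `version` sits beside the payload, so a reader can choose an upcaster or deserializer before mapping `data` to a domain type.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical envelope: metadata sits outside the payload."""
    event_id: str
    event_type: str
    version: int
    data: dict  # payload; its shape is determined by (event_type, version)


def serialize(envelope: EventEnvelope) -> str:
    return json.dumps(asdict(envelope))


def peek_version(raw: str) -> tuple[str, int]:
    """Read type and version from the envelope without mapping the payload to a domain type."""
    envelope = json.loads(raw)
    # Assumption: events written before versioning was introduced are treated as v1
    return envelope["event_type"], envelope.get("version", 1)


stored = serialize(
    EventEnvelope(
        "evt-1",
        "OrderPlaced",
        3,
        {"orderId": "o-1", "customerId": "c-1", "subtotal": 100, "tax": 8, "currency": "USD"},
    )
)
event_type, version = peek_version(stored)
```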

Common Pitfalls

  • Mutating events in the store. Running UPDATE queries against the event table destroys the audit trail and risks data corruption.
  • Skipping version numbers. If upcasters exist from v1 to v2 but not from v2 to v3, every event stored as v1 or v2 stalls at v2 and cannot be loaded into the current v3 shape. Keep the chain continuous.
  • Putting business logic in upcasters. Upcasters should perform mechanical data transformations (renaming fields, providing defaults), not recalculate domain values.
  • Ignoring downstream consumers. When event schemas change, all projections and subscribers must also be updated or tested for compatibility.
  • Not testing with historical events. Unit tests should replay real v1 events through the upcaster chain and assert the resulting shape matches the current version.
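
The version-gap and historical-event pitfalls can both be guarded against with tests along these lines. This is a minimal sketch: the function-based upcasters are condensed stand-ins for the classes in the upcaster chain example, and `CURRENT_VERSION`, `UPCASTERS`, and `HISTORICAL_V1` are hypothetical names. In a real suite the fixture would be loaded from events captured in production.

```python
import copy

CURRENT_VERSION = 3


def v1_to_v2(event: dict) -> dict:
    data = dict(event["data"])
    data["subtotal"] = data.pop("amount", 0)
    data["tax"] = 0
    return {**event, "data": data, "version": 2}


def v2_to_v3(event: dict) -> dict:
    return {**event, "data": {**event["data"], "currency": "USD"}, "version": 3}


# Keyed by the version each upcaster accepts
UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}


def upcast(event: dict) -> dict:
    while event["version"] in UPCASTERS:
        event = UPCASTERS[event["version"]](event)
    return event


# Stand-in fixture for a real historical v1 event
HISTORICAL_V1 = {
    "event_type": "OrderPlaced",
    "version": 1,
    "data": {"orderId": "o-1", "customerId": "c-1", "amount": 100},
}


def test_chain_is_continuous() -> None:
    # Every version below the current one needs an upcaster, with no gaps
    assert sorted(UPCASTERS) == list(range(1, CURRENT_VERSION))


def test_v1_event_reaches_current_shape() -> None:
    original = copy.deepcopy(HISTORICAL_V1)
    result = upcast(HISTORICAL_V1)
    assert result["version"] == CURRENT_VERSION
    assert set(result["data"]) == {"orderId", "customerId", "subtotal", "tax", "currency"}
    assert result["data"]["subtotal"] == 100
    # Upcasting must not mutate the stored event
    assert HISTORICAL_V1 == original


test_chain_is_continuous()
test_v1_event_reaches_current_shape()
```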

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
