Event Versioning
Evolve event schemas safely over time using upcasting, weak schema strategies, and version negotiation
Core Philosophy
You are an expert in event versioning and schema evolution for building event-sourced systems.
Overview
In an event-sourced system, events are immutable and stored forever. As the domain evolves, the shape of events must also change — new fields are added, old fields become irrelevant, and events may be split or merged. Event versioning provides strategies to evolve schemas without breaking existing consumers or corrupting the historical event log.
Core Concepts
Event Version: A numeric or semantic version associated with each event type (e.g., OrderPlaced_v1, OrderPlaced_v2). The version indicates the schema of the event's payload.
Upcasting: The process of transforming an older event version into a newer one at read time. The stored bytes are never modified — instead, a chain of upcasters converts old shapes into the current shape when the event is loaded.
Weak Schema: A strategy where consumers tolerate missing or extra fields. New fields have defaults; removed fields are ignored. This delays the need for explicit versioning.
Strong Schema: A strategy where event schemas are explicitly versioned and validated. Consumers know exactly which version they are handling.
Copy-and-Transform Migration: For extreme schema changes, a new stream is written by replaying and transforming the old stream. The old stream is archived.
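The copy-and-transform strategy above can be sketched in a few lines. This is a minimal illustration using an in-memory list as the stream; names like `migrate_stream` and `merge_name_fields` are hypothetical, not part of any event-store API.

```python
def migrate_stream(old_stream: list[dict], transform) -> list[dict]:
    """Replay every event from the old stream, transform it, and collect
    it into a new stream. The old stream is never mutated, so it can be
    archived intact after the migration."""
    return [transform(event) for event in old_stream]

# Example transform for an extreme schema change: merging two legacy
# fields into one (something an ordinary upcaster could also do, but
# shown here as a one-time rewrite into a new stream).
def merge_name_fields(event: dict) -> dict:
    data = dict(event["data"])  # copy so the old event is untouched
    if "first_name" in data and "last_name" in data:
        data["full_name"] = f"{data.pop('first_name')} {data.pop('last_name')}"
    return {**event, "data": data}

old = [{"event_type": "CustomerRegistered", "version": 1,
        "data": {"first_name": "Ada", "last_name": "Lovelace"}}]
new = migrate_stream(old, merge_name_fields)
```

Note that the old stream is left untouched; consumers are switched over to the new stream only after the copy completes.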
Implementation Patterns
Upcaster Chain (Python)
class Upcaster:
    """Transforms events from one version to the next."""

    def can_upcast(self, event_type: str, version: int) -> bool:
        raise NotImplementedError

    def upcast(self, event: dict) -> dict:
        raise NotImplementedError


class OrderPlacedV1ToV2(Upcaster):
    """v1 had 'amount'; v2 splits it into 'subtotal' and 'tax'."""

    def can_upcast(self, event_type: str, version: int) -> bool:
        return event_type == "OrderPlaced" and version == 1

    def upcast(self, event: dict) -> dict:
        data = event["data"].copy()
        amount = data.pop("amount", 0)
        data["subtotal"] = amount
        data["tax"] = 0  # Tax was not tracked in v1
        return {**event, "data": data, "version": 2}


class OrderPlacedV2ToV3(Upcaster):
    """v2 had no currency; v3 adds a 'currency' field defaulting to USD."""

    def can_upcast(self, event_type: str, version: int) -> bool:
        return event_type == "OrderPlaced" and version == 2

    def upcast(self, event: dict) -> dict:
        data = {**event["data"], "currency": "USD"}
        return {**event, "data": data, "version": 3}


class UpcasterPipeline:
    def __init__(self, upcasters: list[Upcaster]):
        self._upcasters = upcasters

    def upcast(self, event: dict) -> dict:
        changed = True
        while changed:
            changed = False
            for upcaster in self._upcasters:
                if upcaster.can_upcast(event.get("event_type"), event.get("version", 1)):
                    event = upcaster.upcast(event)
                    changed = True
        return event


# Usage
pipeline = UpcasterPipeline([OrderPlacedV1ToV2(), OrderPlacedV2ToV3()])
raw_event = {"event_type": "OrderPlaced", "version": 1, "data": {"amount": 100}}
current_event = pipeline.upcast(raw_event)
# current_event version is now 3 with subtotal, tax, and currency fields
Weak Schema with Defaults (TypeScript)
interface OrderPlaced {
  orderId: string;
  customerId: string;
  subtotal: number;
  tax: number;
  currency: string;
}

function deserializeOrderPlaced(raw: Record<string, unknown>): OrderPlaced {
  return {
    orderId: raw.orderId as string,
    customerId: raw.customerId as string,
    // Handle v1 shape where only 'amount' existed
    subtotal: (raw.subtotal as number) ?? (raw.amount as number) ?? 0,
    tax: (raw.tax as number) ?? 0,
    // Added in v3
    currency: (raw.currency as string) ?? "USD",
  };
}
Versioned Event Type Registry
class UnknownEventError(Exception):
    """Raised when no class is registered for an (event_type, version) pair."""

class EventRegistry:
    def __init__(self):
        self._types: dict[tuple[str, int], type] = {}
        self._upcasters = UpcasterPipeline([])

    def register(self, event_type: str, version: int, cls: type):
        self._types[(event_type, version)] = cls

    def deserialize(self, raw: dict) -> object:
        # First upcast to the latest version
        upcasted = self._upcasters.upcast(raw)
        key = (upcasted["event_type"], upcasted["version"])
        cls = self._types.get(key)
        if not cls:
            raise UnknownEventError(f"No class for {key}")
        return cls(**upcasted["data"])
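Wiring the registry together might look like the following. This is a self-contained sketch: the upcaster is inlined as a plain function standing in for the pipeline above, and the dataclass and event data are illustrative.

```python
from dataclasses import dataclass

class UnknownEventError(Exception):
    pass

@dataclass
class OrderPlaced:
    orderId: str
    customerId: str
    subtotal: float
    tax: float
    currency: str

class EventRegistry:
    def __init__(self, upcast):
        self._types: dict[tuple[str, int], type] = {}
        self._upcast = upcast  # callable taking a raw dict to the latest shape

    def register(self, event_type: str, version: int, cls: type):
        self._types[(event_type, version)] = cls

    def deserialize(self, raw: dict) -> object:
        upcasted = self._upcast(raw)
        key = (upcasted["event_type"], upcasted["version"])
        cls = self._types.get(key)
        if cls is None:
            raise UnknownEventError(f"No class for {key}")
        return cls(**upcasted["data"])

def upcast_order_placed(event: dict) -> dict:
    # Inlined stand-in for the v1 -> v2 -> v3 upcaster chain.
    data = dict(event["data"])
    version = event.get("version", 1)
    if version == 1:
        data["subtotal"] = data.pop("amount", 0)
        data["tax"] = 0  # tax was not tracked in v1
        version = 2
    if version == 2:
        data.setdefault("currency", "USD")  # added in v3
        version = 3
    return {**event, "data": data, "version": version}

registry = EventRegistry(upcast_order_placed)
registry.register("OrderPlaced", 3, OrderPlaced)
event = registry.deserialize({
    "event_type": "OrderPlaced", "version": 1,
    "data": {"orderId": "o-1", "customerId": "c-1", "amount": 100},
})
```

Only the latest version is registered as a class; every older shape reaches it through the upcast step.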
Schema Validation at Write Time
import jsonschema

EVENT_SCHEMAS = {
    ("OrderPlaced", 3): {
        "type": "object",
        "required": ["orderId", "customerId", "subtotal", "tax", "currency"],
        "properties": {
            "orderId": {"type": "string"},
            "customerId": {"type": "string"},
            "subtotal": {"type": "number"},
            "tax": {"type": "number"},
            "currency": {"type": "string", "minLength": 3, "maxLength": 3},
        },
        "additionalProperties": False,
    }
}

def validate_event(event_type: str, version: int, data: dict) -> None:
    schema = EVENT_SCHEMAS.get((event_type, version))
    if schema:
        jsonschema.validate(data, schema)
Best Practices
- Never modify stored events. The event log is immutable. All transformations happen at read time through upcasting.
- Start with weak schemas and escalate to upcasters when needed. Adding optional fields with defaults is the simplest evolution path.
- Version every event type from the start. Even if you do not need it yet, including a version field in the event envelope costs nothing and avoids retrofitting later.
- Keep upcasters simple and composable. Each upcaster transforms exactly one version to the next. Chain them to handle multi-step jumps.
- Store the schema version in the event envelope, not in the payload. This makes the version available before deserialization.
- Maintain a schema registry or changelog that documents every version of every event type, what changed, and when.
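The envelope and changelog practices above can be sketched as plain data. Field names here are illustrative, not a prescribed envelope format.

```python
# Illustrative event envelope: the version sits beside the payload, so a
# reader can choose an upcaster before deserializing the data at all.
envelope = {
    "event_id": "evt-001",          # hypothetical identifier
    "event_type": "OrderPlaced",
    "version": 3,
    "occurred_at": "2024-01-15T10:00:00Z",
    "data": {"orderId": "o-1", "customerId": "c-1",
             "subtotal": 100, "tax": 8, "currency": "USD"},
}

# Illustrative in-code changelog: one entry per (event_type, version)
# documenting what changed and when.
CHANGELOG = {
    ("OrderPlaced", 1): "2022-03: initial shape: orderId, customerId, amount",
    ("OrderPlaced", 2): "2023-01: split amount into subtotal + tax",
    ("OrderPlaced", 3): "2023-09: added currency field (default USD)",
}
```

Keeping the changelog next to the upcasters makes it easy to verify that every recorded version has a continuous upcaster chain.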
Common Pitfalls
- Mutating events in the store. Running UPDATE queries against the event table destroys the audit trail and risks data corruption.
- Skipping version numbers. If you have v1 and v3 but no v2 upcaster, events written as v2 cannot be read. Keep the chain continuous.
- Putting business logic in upcasters. Upcasters should perform mechanical data transformations (renaming fields, providing defaults), not recalculate domain values.
- Ignoring downstream consumers. When event schemas change, all projections and subscribers must also be updated or tested for compatibility.
- Not testing with historical events. Unit tests should replay real v1 events through the upcaster chain and assert the resulting shape matches the current version.
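The last pitfall above suggests a regression test that replays captured v1 events and asserts the upcasted shape. This sketch inlines the v1-to-v3 transformation as a stand-in for the pipeline, so `upcast_to_current` and the fixture data are illustrative.

```python
# Captured historical v1 events, stored as test fixtures.
HISTORICAL_V1_EVENTS = [
    {"event_type": "OrderPlaced", "version": 1,
     "data": {"orderId": "o-1", "customerId": "c-1", "amount": 100}},
    {"event_type": "OrderPlaced", "version": 1,
     "data": {"orderId": "o-2", "customerId": "c-2", "amount": 250}},
]

def upcast_to_current(event: dict) -> dict:
    # Stand-in for pipeline.upcast: mechanically applies v1 -> v2 -> v3.
    data = dict(event["data"])
    if event.get("version", 1) == 1:
        data["subtotal"] = data.pop("amount", 0)
        data["tax"] = 0
        data.setdefault("currency", "USD")
    return {**event, "data": data, "version": 3}

def test_v1_events_upcast_to_current_shape():
    for raw in HISTORICAL_V1_EVENTS:
        event = upcast_to_current(raw)
        # The result must match the current (v3) schema exactly.
        assert event["version"] == 3
        assert {"orderId", "customerId", "subtotal", "tax", "currency"} <= event["data"].keys()
        assert "amount" not in event["data"]
```

Running this test on every schema change catches broken or missing links in the upcaster chain before they reach production reads.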
Anti-Patterns
Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
Related Skills
CQRS
Implement Command Query Responsibility Segregation to separate write and read models in event-sourced architectures
Event Store
Design and implement event stores for persisting domain events with append-only semantics and optimistic concurrency
Eventual Consistency
Handle eventual consistency challenges in distributed event-sourced systems with practical patterns and strategies
Projections
Build and maintain read model projections from event streams for optimized query performance
Sagas
Coordinate long-running business processes across aggregates and services using sagas and process managers
Snapshots
Optimize aggregate loading with snapshot patterns to avoid replaying long event streams from the beginning