Eventual Consistency
Handle eventual consistency challenges in distributed event-sourced systems with practical patterns and strategies
You are an expert in eventual consistency patterns for building event-sourced systems. ## Key Points - **Make the consistency window visible to users.** Show "pending" or "processing" states rather than hiding the delay. Users understand "Your order is being confirmed" better than a stale page. - **Use consistency tokens (positions) for read-your-own-writes.** This is the lightest mechanism that solves the most common UX problem. - **Design idempotent projections so reprocessing events is safe.** At-least-once delivery is the norm in distributed systems. - **Set SLAs on propagation delay.** Monitor projection lag and alert when it exceeds acceptable bounds (e.g., 2 seconds for user-facing views, 30 seconds for analytics). - **Accept that cross-aggregate queries may be stale.** Design business processes to tolerate this, using sagas to coordinate when strong ordering is needed. - **Document which read models are eventually consistent** and which are strongly consistent so that API consumers set correct expectations. - **Treating eventual consistency as a defect.** Fighting it with synchronous coupling or distributed transactions reintroduces the scaling problems CQRS was meant to solve. - **Assuming read models are immediately up-to-date after a write.** This is the most common source of bugs in event-sourced UIs. Always account for the delay. - **Using unbounded waits for consistency.** Polling without a timeout can hang requests. Always set a deadline and fall back to returning stale data with a flag. - **Ignoring ordering guarantees.** If events within a stream are processed out of order, projections will be incorrect. Ensure per-stream ordering in your event consumer.
skilldb get event-sourcing-skills/Eventual ConsistencyFull skill: 197 linesEventual Consistency Handling — Event Sourcing
You are an expert in eventual consistency patterns for building event-sourced systems.
Core Philosophy
Overview
In an event-sourced, CQRS-based system, the write side and read side are eventually consistent. After a command produces new events, projections must process those events before queries reflect the change. This delay — often milliseconds, sometimes seconds — requires deliberate design in the UI, API, and infrastructure layers. Eventual consistency is not a bug to work around; it is a fundamental property to design for.
Core Concepts
Consistency Boundary: The scope within which strong consistency is guaranteed. In event sourcing, a single aggregate is the consistency boundary. Anything spanning multiple aggregates or services is eventually consistent.
Read-Your-Own-Writes: A guarantee that after a user performs a write, their subsequent reads will reflect that write. This does not require global strong consistency — only that the specific user sees their own changes.
Causal Consistency: Events that are causally related are processed in order. If event B was caused by event A, every consumer sees A before B.
Convergence: All read models will eventually converge to the same state given the same set of events, regardless of the order they are processed (assuming idempotent handlers).
Propagation Delay: The time between an event being committed to the event store and a projection reflecting it. This is the "window of inconsistency."
Implementation Patterns
Polling for Read-Your-Own-Writes
class ConsistencyAwareQueryHandler:
"""Waits for a projection to catch up to a known position before querying."""
def __init__(self, read_db, checkpoint_store):
self._db = read_db
self._checkpoints = checkpoint_store
def query(self, query, min_position: int | None = None, timeout: float = 2.0):
"""
Execute a query, optionally waiting until the projection has caught up
to min_position (returned from the write side after appending events).
"""
if min_position is not None:
deadline = time.time() + timeout
while time.time() < deadline:
current = self._checkpoints.get("OrderSummaryProjector") or 0
if current >= min_position:
break
time.sleep(0.05)
else:
# Timed out — return potentially stale data with a warning
pass
return self._db.execute(query)
Returning Write Position from Command Handler
class PlaceOrderHandler:
def handle(self, command: PlaceOrder) -> CommandResult:
order = Order.create(command.order_id)
order.place(customer_id=command.customer_id, items=command.items)
position = self._store.append(
stream_id=f"order-{command.order_id}",
events=order.pending_events,
expected_version=0
)
# Return the global position so the caller can do read-your-own-writes
return CommandResult(
aggregate_id=command.order_id,
global_position=position
)
API-Level Consistency Token
from fastapi import FastAPI, Response, Request
app = FastAPI()
@app.post("/orders")
def place_order(body: PlaceOrderRequest, response: Response):
result = command_bus.dispatch(PlaceOrder(
order_id=body.order_id,
customer_id=body.customer_id,
items=body.items
))
# Return the consistency token as a header
response.headers["X-Consistency-Position"] = str(result.global_position)
return {"order_id": body.order_id, "status": "accepted"}
@app.get("/orders/{order_id}")
def get_order(order_id: str, request: Request):
# Client passes the token from a previous write
min_position = request.headers.get("X-Consistency-Position")
if min_position:
min_position = int(min_position)
result = query_handler.query(
GetOrderSummary(order_id=order_id),
min_position=min_position
)
return result
Optimistic UI Pattern (Frontend)
async function placeOrder(order: OrderRequest): Promise<void> {
// 1. Optimistically update the local UI state
localStore.addOrder({
...order,
status: "placed",
isPending: true,
});
// 2. Send the command to the server
const response = await fetch("/orders", {
method: "POST",
body: JSON.stringify(order),
});
const consistencyPosition = response.headers.get("X-Consistency-Position");
// 3. Poll the read model until it catches up
let attempts = 0;
while (attempts < 20) {
const orderData = await fetch(`/orders/${order.orderId}`, {
headers: { "X-Consistency-Position": consistencyPosition ?? "" },
});
if (orderData.ok) {
const confirmed = await orderData.json();
localStore.updateOrder(order.orderId, { ...confirmed, isPending: false });
return;
}
await new Promise((r) => setTimeout(r, 200));
attempts++;
}
}
Subscription-Based Consistency (WebSocket)
class EventNotifier:
"""Pushes projection updates to connected clients via WebSocket."""
def __init__(self):
self._subscribers: dict[str, list] = {}
def subscribe(self, entity_id: str, websocket):
self._subscribers.setdefault(entity_id, []).append(websocket)
async def notify(self, entity_id: str, updated_data: dict):
for ws in self._subscribers.get(entity_id, []):
await ws.send_json({"type": "updated", "data": updated_data})
# In the projection runner, after updating a read model:
async def on_event_projected(event, read_model_row):
entity_id = event["data"]["order_id"]
await notifier.notify(entity_id, read_model_row)
Best Practices
- Make the consistency window visible to users. Show "pending" or "processing" states rather than hiding the delay. Users understand "Your order is being confirmed" better than a stale page.
- Use consistency tokens (positions) for read-your-own-writes. This is the lightest mechanism that solves the most common UX problem.
- Design idempotent projections so reprocessing events is safe. At-least-once delivery is the norm in distributed systems.
- Set SLAs on propagation delay. Monitor projection lag and alert when it exceeds acceptable bounds (e.g., 2 seconds for user-facing views, 30 seconds for analytics).
- Accept that cross-aggregate queries may be stale. Design business processes to tolerate this, using sagas to coordinate when strong ordering is needed.
- Document which read models are eventually consistent and which are strongly consistent so that API consumers set correct expectations.
Common Pitfalls
- Treating eventual consistency as a defect. Fighting it with synchronous coupling or distributed transactions reintroduces the scaling problems CQRS was meant to solve.
- Assuming read models are immediately up-to-date after a write. This is the most common source of bugs in event-sourced UIs. Always account for the delay.
- Using unbounded waits for consistency. Polling without a timeout can hang requests. Always set a deadline and fall back to returning stale data with a flag.
- Ignoring ordering guarantees. If events within a stream are processed out of order, projections will be incorrect. Ensure per-stream ordering in your event consumer.
- Compensating at the wrong level. Eventual consistency issues in the UI should be solved in the UI (optimistic updates, polling). Do not add synchronous coupling to the backend to fix a frontend display problem.
Anti-Patterns
Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
Install this skill directly: skilldb add event-sourcing-skills
Related Skills
Cqrs
Implement Command Query Responsibility Segregation to separate write and read models in event-sourced architectures
Event Store
Design and implement event stores for persisting domain events with append-only semantics and optimistic concurrency
Event Versioning
Evolve event schemas safely over time using upcasting, weak schema strategies, and version negotiation
Projections
Build and maintain read model projections from event streams for optimized query performance
Sagas
Coordinate long-running business processes across aggregates and services using sagas and process managers
Snapshots
Optimize aggregate loading with snapshot patterns to avoid replaying long event streams from the beginning