Distributed Caching
Distributed caching strategies for reducing latency and database load in large-scale systems
You are an expert in Distributed Caching for designing scalable distributed systems.
Core Philosophy
Caching is not a performance optimization bolted on at the end — it is an architectural decision that shapes how data flows through the system. Every cache entry represents a contract: the system accepts slightly stale data in exchange for dramatically lower latency and reduced backend load. Making that contract explicit — what staleness is acceptable, for whom, and for how long — is the foundation of a sound caching strategy.
The hardest problem in caching is not storing data; it is knowing when to invalidate it. Cache invalidation is difficult because it requires understanding the relationship between cached data and the source-of-truth mutations that affect it. The simplest approach that works — TTL-based expiration with cache-aside reads — should be the starting point. More sophisticated invalidation (event-driven, write-through) should be adopted only when TTL-based staleness is provably insufficient.
A cache should always be a performance layer, never a durability layer. If losing the cache causes data loss, the architecture is wrong. The system must function correctly (if slowly) with an empty cache. This property also enables operational simplicity: caches can be flushed, resized, or restarted without data integrity concerns.
Overview
Distributed caching places frequently accessed data in fast, in-memory stores (Redis, Memcached) shared across service instances. It reduces database load, lowers response latency, and absorbs traffic spikes. Effective caching strategy is often the difference between a system that scales and one that collapses under load.
Core Concepts
Cache Hierarchy
[Client] --> [CDN Cache] --> [API Gateway Cache] --> [Application]
|
[L1: Local Cache]
|
[L2: Distributed Cache (Redis)]
|
[L3: Database]
- L1 (Local/In-Process): Fastest, no network hop, but per-instance and limited in size. Good for hot, rarely changing data.
- L2 (Distributed): Shared across instances, survives restarts, larger capacity. Redis or Memcached.
- L3 (Database): Source of truth, slowest, most durable.
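The hierarchy above can be sketched as a tiered lookup with backfill. This is a minimal illustration using plain dicts to stand in for a real in-process cache and a Redis client; the `db_fetch` callback is a hypothetical stand-in for a database query:

```python
def tiered_get(key, l1, l2, db_fetch):
    """Look up key through L1 -> L2 -> database, backfilling on the way out."""
    if key in l1:              # L1: in-process, no network hop
        return l1[key]
    if key in l2:              # L2: shared distributed cache
        l1[key] = l2[key]      # promote to L1 for the next local read
        return l2[key]
    value = db_fetch(key)      # L3: source of truth
    l2[key] = value            # populate both tiers on the way back
    l1[key] = value
    return value
```

Note that each tier is populated on the return path, so a hot key migrates toward the fastest tier under repeated reads.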
Cache Strategies
- Cache-Aside (Lazy Loading): Application checks cache first; on miss, reads from DB and populates cache. Most common pattern.
- Write-Through: Every write goes to the cache and the database simultaneously. Cache is always consistent but writes are slower.
- Write-Behind (Write-Back): Writes go to the cache first, then asynchronously flushed to the database. Fast writes but risk of data loss.
- Read-Through: Cache itself fetches from the database on a miss, abstracting the load logic from the application.
- Refresh-Ahead: Proactively reload cache entries before they expire, based on predicted access patterns.
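Cache-aside, the most common pattern above, can be sketched as follows. A dict stands in for Redis/Memcached, and the `loader` callback is a hypothetical database query function; a real client would use commands like `GET`/`SET` with a TTL instead:

```python
import time

class CacheAside:
    """Cache-aside: the application owns both the lookup and the backfill."""

    def __init__(self, loader, ttl_seconds=60):
        self._loader = loader    # stand-in for a database query
        self._ttl = ttl_seconds
        self._store = {}         # stand-in for Redis/Memcached

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value     # cache hit
        value = self._loader(key)  # miss: read from the DB...
        self._store[key] = (value, time.monotonic() + self._ttl)  # ...and populate
        return value

    def invalidate(self, key):
        self._store.pop(key, None)  # drop the entry after a write to the DB
```

Because the application falls back to the loader on any miss, a flushed or failed cache degrades to slower reads rather than errors.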
Eviction Policies
- LRU (Least Recently Used): Default for most systems. Evicts the entry not accessed for the longest time.
- LFU (Least Frequently Used): Evicts entries with the fewest accesses. Better for skewed access patterns.
- TTL (Time-To-Live): Entries expire after a fixed duration regardless of access.
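As a concrete illustration of LRU eviction, here is a minimal sketch built on `collections.OrderedDict`, which tracks recency via insertion order (production caches implement this in the server, e.g. Redis's `maxmemory-policy`):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the entry not accessed for the longest time once capacity is hit."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._data = OrderedDict()  # order of items tracks recency

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
```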
Implementation Patterns
Cache Key Design
Use structured, namespaced keys: user:{id}:profile, product:{id}:details:v2. Include version numbers so schema changes can coexist. Avoid overly broad keys that serialize unrelated invalidations.
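A small key-builder keeps this schema consistent across a codebase; the function name and argument names here are illustrative, not prescribed:

```python
def cache_key(namespace, entity_id, field, version=1):
    """Build a structured, versioned cache key like 'user:42:profile:v1'."""
    return f"{namespace}:{entity_id}:{field}:v{version}"
```

Bumping `version` after a schema change lets old and new entries coexist until the old ones expire, instead of requiring a bulk invalidation.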
Consistent Hashing for Sharding
Distribute keys across multiple cache nodes using consistent hashing. When a node is added or removed, only a fraction of keys need to migrate, minimizing cache miss storms.
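A minimal consistent-hash ring with virtual nodes can be sketched as below (MD5 is used only as a convenient uniform hash; node names are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Removing a node only
    remaps the keys whose successor point belonged to that node."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        # A key is owned by the first ring point clockwise from its hash.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Virtual nodes smooth out load: each physical node owns many small arcs of the ring instead of one large one, so removing a node spreads its keys across all survivors.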
Cache Stampede Prevention
When a popular key expires, many concurrent requests may hit the database simultaneously. Mitigations:
- Locking: Only one request fetches from DB; others wait.
- Probabilistic early expiration: Each reader has a small chance of refreshing the entry before TTL expires.
- Background refresh: A separate process refreshes hot keys before they expire.
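The locking mitigation can be sketched as a single-flight wrapper: on a miss, the first caller loads the key while concurrent callers block on a per-key lock and then reuse the result. This in-process sketch uses `threading`; a distributed variant would use a cache-level lock (e.g. a Redis `SET ... NX` key) instead:

```python
import threading

class SingleFlight:
    """On a miss, only one thread loads the key; concurrent callers
    wait and reuse the result instead of stampeding the database."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                       # first caller loads; others block here
            if key not in self._cache:   # re-check: a waiter may find it filled
                self._cache[key] = self._loader(key)
        return self._cache[key]
```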
Multi-Region Caching
In globally distributed systems, maintain a cache cluster per region. Use write-through replication or event-driven invalidation to keep regional caches in sync. Accept slight staleness for lower latency.
Trade-offs
| Factor | Caching Enabled | No Cache |
|---|---|---|
| Read latency | Sub-millisecond (L1), 1-2ms (L2) | 5-50ms (DB) |
| Data freshness | Potentially stale | Always current |
| Infrastructure cost | Additional memory/nodes | Lower |
| Complexity | Invalidation logic required | Simpler |
| Failure mode | Cache miss = fallback to DB | Direct DB load |
Use caching for read-heavy workloads with tolerance for slight staleness. Avoid caching data that changes on every request or where stale data causes correctness issues (e.g., financial balances during transactions).
Best Practices
- Set TTLs on every cache entry — unbounded caches grow until they cause memory pressure and unpredictable evictions.
- Use cache-aside as the default strategy; it is the simplest, most debuggable, and handles cache failures gracefully since the application can always fall back to the database.
- Monitor cache hit ratio continuously; a ratio below 80% usually indicates poor key design, wrong TTLs, or caching data that should not be cached.
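As a minimal sketch of hit-ratio monitoring, a pair of counters is enough to feed a metrics system or alert (class and method names here are illustrative; in production these would typically be exported as metrics, e.g. via a Prometheus client):

```python
class HitRatioMonitor:
    """Track cache hit ratio so drops below the ~80% target can be alerted on."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```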
Common Pitfalls
- Caching database query results with no invalidation strategy, leading to users seeing stale data for unpredictable durations.
- Using the cache as a primary data store without persistence — a cache restart loses all data and can cause a thundering herd of database requests that overwhelms the backend.
Anti-Patterns
- Cache as Source of Truth: Storing data only in the cache without a durable backing store. Cache eviction or restart silently deletes data, and the system has no way to recover it.
- Unbounded TTLs: Caching entries without expiration, assuming they will be manually invalidated. Over time, stale entries accumulate and the cache silently serves incorrect data that nobody notices until a bug report arrives.
- Cache-Everything Mentality: Caching low-frequency or high-cardinality data that is rarely re-read. This wastes memory, lowers the overall hit ratio, and evicts genuinely hot entries.
- Ignoring Cache Stampedes: Letting a popular key expire without protection, causing hundreds of concurrent cache misses that all hit the database simultaneously. This can take down the backend during traffic spikes.
- Inconsistent Key Schemas: Using ad-hoc, unstructured cache keys across the codebase. This leads to key collisions, makes bulk invalidation impossible, and turns debugging cache issues into guesswork.
Related Skills
- API Gateway Design: API gateway and Backend-for-Frontend (BFF) patterns for managing client-service communication
- Circuit Breaker: Circuit breaker and resilience patterns for building fault-tolerant distributed systems
- Database Scaling: Database scaling patterns including sharding, replication, and read replicas for high-throughput systems
- Event-Driven: Event-driven architecture and CQRS patterns for reactive, decoupled distributed systems
- Message Queues: Message queue patterns including pub/sub, fan-out, and reliable delivery for asynchronous communication
- Microservices: Microservices architecture patterns for building independently deployable, loosely coupled services