
Distributed Caching

Distributed caching strategies for reducing latency and database load in large-scale systems


Distributed Caching — System Design

You are an expert in Distributed Caching for designing scalable distributed systems.

Core Philosophy

Caching is not a performance optimization bolted on at the end — it is an architectural decision that shapes how data flows through the system. Every cache entry represents a contract: the system accepts slightly stale data in exchange for dramatically lower latency and reduced backend load. Making that contract explicit — what staleness is acceptable, for whom, and for how long — is the foundation of a sound caching strategy.

The hardest problem in caching is not storing data; it is knowing when to invalidate it. Cache invalidation is difficult because it requires understanding the relationship between cached data and the source-of-truth mutations that affect it. The simplest approach that works — TTL-based expiration with cache-aside reads — should be the starting point. More sophisticated invalidation (event-driven, write-through) should be adopted only when TTL-based staleness is provably insufficient.

A cache should always be a performance layer, never a durability layer. If losing the cache causes data loss, the architecture is wrong. The system must function correctly (if slowly) with an empty cache. This property also enables operational simplicity: caches can be flushed, resized, or restarted without data integrity concerns.

Overview

Distributed caching places frequently accessed data in fast, in-memory stores (Redis, Memcached) shared across service instances. It reduces database load, lowers response latency, and absorbs traffic spikes. Effective caching strategy is often the difference between a system that scales and one that collapses under load.

Core Concepts

Cache Hierarchy

[Client] --> [CDN Cache] --> [API Gateway Cache] --> [Application]
                                                         |
                                                    [L1: Local Cache]
                                                         |
                                                    [L2: Distributed Cache (Redis)]
                                                         |
                                                    [L3: Database]
  • L1 (Local/In-Process): Fastest, no network hop, but per-instance and limited in size. Good for hot, rarely changing data.
  • L2 (Distributed): Shared across instances, survives restarts, larger capacity. Redis or Memcached.
  • L3 (Database): Source of truth, slowest, most durable.
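The hierarchy above amounts to a tiered lookup: check L1, then L2, then fall back to the database and backfill the cache tiers on the way out. A minimal sketch, with plain dicts standing in for the local cache, the Redis cluster, and the database (all names here are illustrative):

```python
# Tiered read: L1 (local) -> L2 (distributed) -> L3 (database).
# Plain dicts stand in for the real stores in this sketch.
l1_cache = {}                            # per-instance, in-process
l2_cache = {}                            # stand-in for a shared Redis/Memcached cluster
database = {"user:42": {"name": "Ada"}}  # source of truth

def tiered_get(key):
    if key in l1_cache:          # fastest: no network hop
        return l1_cache[key]
    if key in l2_cache:          # shared across instances
        value = l2_cache[key]
        l1_cache[key] = value    # backfill L1
        return value
    value = database.get(key)    # slowest: source of truth
    if value is not None:        # backfill both cache tiers
        l2_cache[key] = value
        l1_cache[key] = value
    return value
```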

Cache Strategies

  • Cache-Aside (Lazy Loading): Application checks cache first; on miss, reads from DB and populates cache. Most common pattern.
  • Write-Through: Every write goes to the cache and the database simultaneously. Cache is always consistent but writes are slower.
  • Write-Behind (Write-Back): Writes go to the cache first, then asynchronously flushed to the database. Fast writes but risk of data loss.
  • Read-Through: Cache itself fetches from the database on a miss, abstracting the load logic from the application.
  • Refresh-Ahead: Proactively reload cache entries before they expire, based on predicted access patterns.
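The two most common strategies, cache-aside and write-through, can be sketched side by side. This is illustrative only: dicts stand in for Redis and the database, and the 300-second TTL is an assumed value, not a recommendation.

```python
import time

cache = {}         # stand-in for Redis: key -> (value, expires_at)
db = {}            # stand-in for the database

TTL_SECONDS = 300  # illustrative TTL

def cache_aside_read(key):
    """Cache-aside: the app checks the cache first, loads from the DB on a miss."""
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value                       # cache hit
    value = db.get(key)                        # cache miss: read DB
    if value is not None:                      # populate cache with a TTL
        cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

def write_through(key, value):
    """Write-through: update the DB and the cache together on every write."""
    db[key] = value
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
```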

Eviction Policies

  • LRU (Least Recently Used): Default for most systems. Evicts the entry not accessed for the longest time.
  • LFU (Least Frequently Used): Evicts entries with the fewest accesses. Better for skewed access patterns.
  • TTL (Time-To-Live): Entries expire after a fixed duration regardless of access.
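LRU eviction is compact enough to sketch directly; `collections.OrderedDict` keeps insertion order, so moving a key to the end on access makes the front of the dict the least recently used entry. The capacity of 2 in the usage below is purely illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: at capacity, evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
```

Usage: with capacity 2, inserting `a` and `b`, touching `a`, then inserting `c` evicts `b`, because `b` is the entry that has gone longest without access.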

Implementation Patterns

Cache Key Design

Use structured, namespaced keys: user:{id}:profile, product:{id}:details:v2. Include version numbers so schema changes can coexist. Avoid overly broad keys that serialize unrelated invalidations.
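Centralizing key construction in one helper keeps the schema consistent across the codebase. A small sketch of such a helper (the function name and parameters are illustrative, not from any particular library):

```python
def cache_key(namespace, entity_id, field, version=1):
    """Build a structured, versioned cache key, e.g. user:42:profile:v1."""
    return f"{namespace}:{entity_id}:{field}:v{version}"
```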

Consistent Hashing for Sharding

Distribute keys across multiple cache nodes using consistent hashing. When a node is added or removed, only a fraction of keys need to migrate, minimizing cache miss storms.
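A consistent hash ring can be sketched with a sorted list of virtual-node hashes: a key maps to the first node hash clockwise from its own hash, so removing a node remaps only the keys that landed in that node's segments. This is a simplified sketch (MD5 and 100 virtual nodes are arbitrary choices for illustration):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Removing a node remaps
    only the keys that previously fell in that node's ring segments."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []          # sorted list of (hash, node)
        self.vnodes = vnodes
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        # First virtual node clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```

After `remove()`, only keys that previously mapped to the removed node change owners; every other key keeps its node, which is what limits miss storms during topology changes.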

Cache Stampede Prevention

When a popular key expires, many concurrent requests may hit the database simultaneously. Mitigations:

  • Locking: Only one request fetches from DB; others wait.
  • Probabilistic early expiration: Each reader has a small chance of refreshing the entry before TTL expires.
  • Background refresh: A separate process refreshes hot keys before they expire.
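The locking mitigation can be sketched with a per-key lock and a double check: threads that lose the race wait, then find the entry already populated instead of hitting the database again. A simplified in-process sketch (a distributed deployment would need a distributed lock, e.g. via Redis, which is not shown here):

```python
import threading

cache = {}                       # stand-in for the shared cache
_locks = {}                      # key -> lock guarding the DB fetch for that key
_locks_guard = threading.Lock()  # protects the _locks dict itself

def get_with_lock(key, load_from_db):
    """Only one thread fetches a missing key from the DB; the rest wait,
    then read the freshly populated cache entry."""
    if key in cache:
        return cache[key]
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:         # another thread filled it while we waited
            return cache[key]
        value = load_from_db(key)
        cache[key] = value
        return value
```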

Multi-Region Caching

In globally distributed systems, maintain a cache cluster per region. Use write-through replication or event-driven invalidation to keep regional caches in sync. Accept slight staleness for lower latency.
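Event-driven invalidation can be sketched as regional caches subscribed to a shared event stream: a write in any region publishes an invalidation event, and every regional cache drops its copy. In-process callbacks stand in for the real cross-region transport (e.g. Kafka or Redis pub/sub) in this sketch; the class names are illustrative.

```python
class Region:
    """A regional cache that drops an entry when it sees an invalidation event."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def on_invalidate(self, key):
        self.cache.pop(key, None)

class InvalidationBus:
    """Stand-in for a cross-region event stream (e.g. Kafka, Redis pub/sub)."""

    def __init__(self):
        self.regions = []

    def subscribe(self, region):
        self.regions.append(region)

    def publish_invalidate(self, key):
        for region in self.regions:       # fan out to every regional cache
            region.on_invalidate(key)
```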

Trade-offs

| Factor | Caching Enabled | No Cache |
| --- | --- | --- |
| Read latency | Sub-millisecond (L1), 1-2ms (L2) | 5-50ms (DB) |
| Data freshness | Potentially stale | Always current |
| Infrastructure cost | Additional memory/nodes | Lower |
| Complexity | Invalidation logic required | Simpler |
| Failure mode | Cache miss = fallback to DB | Direct DB load |

Use caching for read-heavy workloads with tolerance for slight staleness. Avoid caching data that changes on every request or where stale data causes correctness issues (e.g., financial balances during transactions).

Best Practices

  • Set TTLs on every cache entry — unbounded caches grow until they cause memory pressure and unpredictable evictions.
  • Use cache-aside as the default strategy; it is the simplest, most debuggable, and handles cache failures gracefully since the application can always fall back to the database.
  • Monitor cache hit ratio continuously; a ratio below 80% usually indicates poor key design, wrong TTLs, or caching data that should not be cached.

Common Pitfalls

  • Caching database query results with no invalidation strategy, leading to users seeing stale data for unpredictable durations.
  • Using the cache as a primary data store without persistence — a cache restart loses all data and can cause a thundering herd of database requests that overwhelms the backend.

Anti-Patterns

  • Cache as Source of Truth: Storing data only in the cache without a durable backing store. Cache eviction or restart silently deletes data, and the system has no way to recover it.

  • Unbounded TTLs: Caching entries without expiration, assuming they will be manually invalidated. Over time, stale entries accumulate and the cache silently serves incorrect data that nobody notices until a bug report arrives.

  • Cache-Everything Mentality: Caching low-frequency or high-cardinality data that is rarely re-read. This wastes memory, lowers the overall hit ratio, and evicts genuinely hot entries.

  • Ignoring Cache Stampedes: Letting a popular key expire without protection, causing hundreds of concurrent cache misses that all hit the database simultaneously. This can take down the backend during traffic spikes.

  • Inconsistent Key Schemas: Using ad-hoc, unstructured cache keys across the codebase. This leads to key collisions, makes bulk invalidation impossible, and turns debugging cache issues into guesswork.
