Distributed Caching
Distributed caching strategies for reducing latency and database load in large-scale systems
You are an expert in Distributed Caching for designing scalable distributed systems.
Core Philosophy
Caching is not a performance optimization bolted on at the end — it is an architectural decision that shapes how data flows through the system. Every cache entry represents a contract: the system accepts slightly stale data in exchange for dramatically lower latency and reduced backend load. Making that contract explicit — what staleness is acceptable, for whom, and for how long — is the foundation of a sound caching strategy.
The hardest problem in caching is not storing data; it is knowing when to invalidate it. Cache invalidation is difficult because it requires understanding the relationship between cached data and the source-of-truth mutations that affect it. The simplest approach that works — TTL-based expiration with cache-aside reads — should be the starting point. More sophisticated invalidation (event-driven, write-through) should be adopted only when TTL-based staleness is provably insufficient.
A cache should always be a performance layer, never a durability layer. If losing the cache causes data loss, the architecture is wrong. The system must function correctly (if slowly) with an empty cache. This property also enables operational simplicity: caches can be flushed, resized, or restarted without data integrity concerns.
Overview
Distributed caching places frequently accessed data in fast, in-memory stores (Redis, Memcached) shared across service instances. It reduces database load, lowers response latency, and absorbs traffic spikes. Effective caching strategy is often the difference between a system that scales and one that collapses under load.
Core Concepts
Cache Hierarchy
[Client] --> [CDN Cache] --> [API Gateway Cache] --> [Application]
|
[L1: Local Cache]
|
[L2: Distributed Cache (Redis)]
|
[L3: Database]
- L1 (Local/In-Process): Fastest, no network hop, but per-instance and limited in size. Good for hot, rarely changing data.
- L2 (Distributed): Shared across instances, survives restarts, larger capacity. Redis or Memcached.
- L3 (Database): Source of truth, slowest, most durable.
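The hierarchy above can be sketched as a tiered lookup with backfill. This is a minimal illustration using plain dicts to stand in for a real in-process cache and a Redis client; the `db_fetch` callback is a hypothetical stand-in for a database query:

```python
def tiered_get(key, l1, l2, db_fetch):
    """Look up key through L1 -> L2 -> database, backfilling on the way out."""
    if key in l1:              # L1: in-process, no network hop
        return l1[key]
    if key in l2:              # L2: shared distributed cache
        l1[key] = l2[key]      # promote to L1 for the next local read
        return l2[key]
    value = db_fetch(key)      # L3: source of truth
    l2[key] = value            # populate both tiers on the way back
    l1[key] = value
    return value
```

Note that each tier is populated on the return path, so a hot key migrates toward the fastest tier under repeated reads.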
Cache Strategies
- Cache-Aside (Lazy Loading): Application checks cache first; on miss, reads from DB and populates cache. Most common pattern.
- Write-Through: Every write goes to the cache and the database simultaneously. Cache is always consistent but writes are slower.
- Write-Behind (Write-Back): Writes go to the cache first, then asynchronously flushed to the database. Fast writes but risk of data loss.
- Read-Through: Cache itself fetches from the database on a miss, abstracting the load logic from the application.
- Refresh-Ahead: Proactively reload cache entries before they expire, based on predicted access patterns.
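Cache-aside, the most common pattern above, can be sketched as follows. A dict stands in for Redis/Memcached, and the `loader` callback is a hypothetical database query function; a real client would use commands like `GET`/`SET` with a TTL instead:

```python
import time

class CacheAside:
    """Cache-aside: the application owns both the lookup and the backfill."""

    def __init__(self, loader, ttl_seconds=60):
        self._loader = loader    # stand-in for a database query
        self._ttl = ttl_seconds
        self._store = {}         # stand-in for Redis/Memcached

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value     # cache hit
        value = self._loader(key)  # miss: read from the DB...
        self._store[key] = (value, time.monotonic() + self._ttl)  # ...and populate
        return value

    def invalidate(self, key):
        self._store.pop(key, None)  # drop the entry after a write to the DB
```

Because the application falls back to the loader on any miss, a flushed or failed cache degrades to slower reads rather than errors.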
Eviction Policies
- LRU (Least Recently Used): Default for most systems. Evicts the entry not accessed for the longest time.
- LFU (Least Frequently Used): Evicts entries with the fewest accesses. Better for skewed access patterns.
- TTL (Time-To-Live): Entries expire after a fixed duration regardless of access.
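As a concrete illustration of LRU eviction, here is a minimal sketch built on `collections.OrderedDict`, which tracks recency via insertion order (production caches implement this in the server, e.g. Redis's `maxmemory-policy`):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the entry not accessed for the longest time once capacity is hit."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._data = OrderedDict()  # order of items tracks recency

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
```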
Implementation Patterns
Cache Key Design
Use structured, namespaced keys: user:{id}:profile, product:{id}:details:v2. Include version numbers so schema changes can coexist. Avoid overly broad keys that serialize unrelated invalidations.
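A small key-builder keeps this schema consistent across a codebase; the function name and argument names here are illustrative, not prescribed:

```python
def cache_key(namespace, entity_id, field, version=1):
    """Build a structured, versioned cache key like 'user:42:profile:v1'."""
    return f"{namespace}:{entity_id}:{field}:v{version}"
```

Bumping `version` after a schema change lets old and new entries coexist until the old ones expire, instead of requiring a bulk invalidation.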
Consistent Hashing for Sharding
Distribute keys across multiple cache nodes using consistent hashing. When a node is added or removed, only a fraction of keys need to migrate, minimizing cache miss storms.
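A minimal consistent-hash ring with virtual nodes can be sketched as below (MD5 is used only as a convenient uniform hash; node names are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes. Removing a node only
    remaps the keys whose successor point belonged to that node."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        # A key is owned by the first ring point clockwise from its hash.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Virtual nodes smooth out load: each physical node owns many small arcs of the ring instead of one large one, so removing a node spreads its keys across all survivors.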
Cache Stampede Prevention
When a popular key expires, many concurrent requests may hit the database simultaneously. Mitigations:
- Locking: Only one request fetches from DB; others wait.
- Probabilistic early expiration: Each reader has a small chance of refreshing the entry before TTL expires.
- Background refresh: A separate process refreshes hot keys before they expire.
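The locking mitigation can be sketched as a single-flight wrapper: on a miss, the first caller loads the key while concurrent callers block on a per-key lock and then reuse the result. This in-process sketch uses `threading`; a distributed variant would use a cache-level lock (e.g. a Redis `SET ... NX` key) instead:

```python
import threading

class SingleFlight:
    """On a miss, only one thread loads the key; concurrent callers
    wait and reuse the result instead of stampeding the database."""

    def __init__(self, loader):
        self._loader = loader
        self._cache = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                       # first caller loads; others block here
            if key not in self._cache:   # re-check: a waiter may find it filled
                self._cache[key] = self._loader(key)
        return self._cache[key]
```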
Multi-Region Caching
In globally distributed systems, maintain a cache cluster per region. Use write-through replication or event-driven invalidation to keep regional caches in sync. Accept slight staleness for lower latency.
Trade-offs
| Factor | Caching Enabled | No Cache |
|---|---|---|
| Read latency | Sub-millisecond (L1), 1-2ms (L2) | 5-50ms (DB) |
| Data freshness | Potentially stale | Always current |
| Infrastructure cost | Additional memory/nodes | Lower |
| Complexity | Invalidation logic required | Simpler |
| Failure mode | Cache miss = fallback to DB | Direct DB load |
Use caching for read-heavy workloads with tolerance for slight staleness. Avoid caching data that changes on every request or where stale data causes correctness issues (e.g., financial balances during transactions).
Best Practices
- Set TTLs on every cache entry — unbounded caches grow until they cause memory pressure and unpredictable evictions.
- Use cache-aside as the default strategy; it is the simplest, most debuggable, and handles cache failures gracefully since the application can always fall back to the database.
- Monitor cache hit ratio continuously; a ratio below 80% usually indicates poor key design, wrong TTLs, or caching data that should not be cached.
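As a minimal sketch of hit-ratio monitoring, a pair of counters is enough to feed a metrics system or alert (class and method names here are illustrative; in production these would typically be exported as metrics, e.g. via a Prometheus client):

```python
class HitRatioMonitor:
    """Track cache hit ratio so drops below the ~80% target can be alerted on."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```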
Common Pitfalls
- Caching database query results with no invalidation strategy, leading to users seeing stale data for unpredictable durations.
- Using the cache as a primary data store without persistence — a cache restart loses all data and can cause a thundering herd of database requests that overwhelms the backend.
Anti-Patterns
- Cache as Source of Truth: Storing data only in the cache without a durable backing store. Cache eviction or restart silently deletes data, and the system has no way to recover it.
- Unbounded TTLs: Caching entries without expiration, assuming they will be manually invalidated. Over time, stale entries accumulate and the cache silently serves incorrect data that nobody notices until a bug report arrives.
- Cache-Everything Mentality: Caching low-frequency or high-cardinality data that is rarely re-read. This wastes memory, lowers the overall hit ratio, and evicts genuinely hot entries.
- Ignoring Cache Stampedes: Letting a popular key expire without protection, causing hundreds of concurrent cache misses that all hit the database simultaneously. This can take down the backend during traffic spikes.
- Inconsistent Key Schemas: Using ad-hoc, unstructured cache keys across the codebase. This leads to key collisions, makes bulk invalidation impossible, and turns debugging cache issues into guesswork.
Related Skills
- API Gateway Design: API gateway and Backend-for-Frontend (BFF) patterns for managing client-service communication
- Circuit Breaker: Circuit breaker and resilience patterns for building fault-tolerant distributed systems
- Database Scaling: Database scaling patterns including sharding, replication, and read replicas for high-throughput systems
- Event-Driven: Event-driven architecture and CQRS patterns for reactive, decoupled distributed systems
- Message Queues: Message queue patterns including pub/sub, fan-out, and reliable delivery for asynchronous communication
- Microservices: Microservices architecture patterns for building independently deployable, loosely coupled services