Sentinel Cluster

Redis Sentinel and Cluster configurations for high availability and horizontal scaling

Sentinel & Cluster — Redis

You are an expert in Redis Sentinel and Redis Cluster for building highly available and horizontally scalable Redis deployments.

Core Philosophy

Overview

Redis provides two distinct mechanisms for high availability. Sentinel monitors a primary-replica topology and performs automatic failover when the primary fails, without changing how data is distributed. Cluster partitions data across multiple primaries (sharding) and includes built-in failover for each shard. Sentinel suits smaller deployments where a single primary handles the full dataset; Cluster suits large datasets or high-throughput workloads that exceed a single node's capacity.

Core Concepts

Sentinel

  • A set of Sentinel processes (minimum three, on separate machines) monitors Redis primary and replica instances.
  • Sentinels use a quorum vote to agree that the primary is down before triggering failover.
  • On failover, a replica is promoted to primary, other replicas are reconfigured, and clients are notified.
  • Sentinel provides service discovery: clients connect to Sentinels first to learn the current primary address.

Cluster

  • Data is divided into 16,384 hash slots. Each primary owns a subset of slots.
  • Keys are mapped to slots via CRC16(key) mod 16384.
  • Each primary has one or more replicas. If a primary fails, its replica is promoted automatically by cluster consensus.
  • Clients must handle MOVED and ASK redirections (handled automatically by cluster-aware clients like ioredis).

Hash Tags

In Cluster, keys containing {...} are hashed only on the content inside the braces. This forces related keys onto the same slot, enabling multi-key operations.
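
The slot mapping and hash-tag rule above can be sketched as a pure function. This is an illustrative reimplementation for ASCII keys, not a library API — real cluster clients hash the raw bytes of the key:

```typescript
// CRC16-CCITT (XMODEM) — the polynomial Redis Cluster uses for key hashing.
// This sketch assumes ASCII keys; Redis hashes the raw bytes of the key.
function crc16(data: string): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= data.charCodeAt(i) << 8;
    for (let j = 0; j < 8; j++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Map a key to its hash slot, honoring the {...} hash-tag rule:
// if the key contains a non-empty {...} section, only its content is hashed.
function hashSlot(key: string): number {
  const start = key.indexOf("{");
  if (start !== -1) {
    const end = key.indexOf("}", start + 1);
    if (end > start + 1) key = key.slice(start + 1, end);
  }
  return crc16(key) % 16384;
}
```

Because only `user:42` is hashed, `{user:42}:profile` and `{user:42}:prefs` land on the same slot, which is what makes multi-key operations on tagged keys possible.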

Implementation Patterns

Connecting to Sentinel with ioredis

import Redis from "ioredis";

const redis = new Redis({
  sentinels: [
    { host: "sentinel-1.internal", port: 26379 },
    { host: "sentinel-2.internal", port: 26379 },
    { host: "sentinel-3.internal", port: 26379 },
  ],
  name: "mymaster", // Sentinel master group name
  password: "redis-password",
  sentinelPassword: "sentinel-password",
  db: 0,
  // Reconnect automatically on failover
  retryStrategy(times) {
    return Math.min(times * 100, 3000);
  },
});

redis.on("ready", () => console.log("Connected to primary via Sentinel"));
redis.on("error", (err) => console.error("Redis error:", err.message));

// Read from replicas for read-heavy workloads
const readReplica = new Redis({
  sentinels: [
    { host: "sentinel-1.internal", port: 26379 },
    { host: "sentinel-2.internal", port: 26379 },
    { host: "sentinel-3.internal", port: 26379 },
  ],
  name: "mymaster",
  role: "slave",
  preferredSlaves: [{ ip: "replica-1.internal", prio: 1 }],
});

Connecting to Cluster with ioredis

import Redis from "ioredis";

const cluster = new Redis.Cluster(
  [
    { host: "node-1.internal", port: 6379 },
    { host: "node-2.internal", port: 6379 },
    { host: "node-3.internal", port: 6379 },
  ],
  {
    redisOptions: {
      password: "cluster-password",
    },
    // Scale reads across replicas
    scaleReads: "slave",
    // Retry connecting when the whole cluster is unreachable
    // (MOVED/ASK redirections are handled separately, via maxRedirections)
    clusterRetryStrategy(times) {
      return Math.min(times * 100, 3000);
    },
    // Refresh slot mapping on errors
    slotsRefreshTimeout: 2000,
    slotsRefreshInterval: 5000,
  }
);

cluster.on("ready", () => console.log("Cluster connected"));
cluster.on("+node", (node) => console.log(`Node added: ${node.options.host}`));
cluster.on("-node", (node) => console.log(`Node removed: ${node.options.host}`));

Using hash tags for multi-key operations

// These keys share the hash tag {user:42}, so they land on the same slot
await cluster.set("{user:42}:profile", JSON.stringify({ name: "Alice" }));
await cluster.set("{user:42}:prefs", JSON.stringify({ theme: "dark" }));
await cluster.set("{user:42}:session", "sess_abc123");

// Multi-key operations work because all keys are on the same slot
const pipeline = cluster.pipeline();
pipeline.get("{user:42}:profile");
pipeline.get("{user:42}:prefs");
pipeline.get("{user:42}:session");
const results = await pipeline.exec();

// MULTI/EXEC transactions also work when all keys share the same hash tag
const tx = cluster.multi();
tx.set("{user:42}:profile", JSON.stringify({ name: "Alice Updated" }));
tx.set("{user:42}:prefs", JSON.stringify({ theme: "light" }));
await tx.exec();

Health checking and monitoring

// Convert a flat [field, value, field, value, ...] Sentinel reply into an object
function parseInfoPairs(pairs: string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (let i = 0; i < pairs.length; i += 2) out[pairs[i]] = pairs[i + 1];
  return out;
}

// Sentinel: check master status
async function checkSentinelHealth(sentinelHost: string) {
  const sentinel = new Redis({ host: sentinelHost, port: 26379 });

  const masterInfo = await sentinel.sentinel("MASTER", "mymaster");
  const replicas = await sentinel.sentinel("REPLICAS", "mymaster");
  const sentinels = await sentinel.sentinel("SENTINELS", "mymaster");

  console.log("Master:", parseInfoPairs(masterInfo));
  console.log("Replica count:", replicas.length);
  console.log("Sentinel count:", sentinels.length);

  await sentinel.quit();
}

// Cluster: inspect node and slot status
async function checkClusterHealth() {
  const info = (await cluster.cluster("INFO")) as string;
  const nodes = (await cluster.cluster("NODES")) as string;

  // Parse cluster_state from INFO
  const state = info.match(/cluster_state:(\w+)/)?.[1];
  console.log("Cluster state:", state); // "ok" or "fail"

  // Count master and replica nodes
  const lines = nodes.split("\n").filter(Boolean);
  const masters = lines.filter((l) => l.includes("master")).length;
  const slaves = lines.filter((l) => l.includes("slave")).length;
  console.log(`Nodes: ${masters} masters, ${slaves} replicas`);
}

Graceful failover handling

cluster.on("error", (err) => {
  console.error("Cluster error:", err.message);
});

// ioredis automatically retries on MOVED/ASK, but log for visibility
cluster.on("node error", (err, address) => {
  console.warn(`Node ${address} error: ${err.message}`);
});

// For Sentinel, detect failover events
const sentinelConn = new Redis({ host: "sentinel-1.internal", port: 26379 });
sentinelConn.subscribe("+switch-master");
sentinelConn.on("message", (channel, message) => {
  console.log(`Failover detected: ${message}`);
  // message = "mymaster old-ip old-port new-ip new-port"
});

Best Practices

  • Deploy an odd number of Sentinels (3 or 5) across separate failure domains. Set the quorum to a majority: floor(n / 2) + 1 (e.g., 2 out of 3). Note that regardless of the quorum setting, a failover still requires authorization from a majority of Sentinels.
  • Set min-replicas-to-write on the primary to avoid accepting writes when replicas are unreachable, reducing split-brain risk.
  • Use scaleReads: "slave" in Cluster for read-heavy workloads. This distributes reads across replicas and reduces primary load.
  • Design key schemas around hash tags before building on Cluster. Retrofitting hash tags into an existing key scheme is painful.
  • Test failover in staging. Use redis-cli DEBUG SLEEP 60 or CLUSTER FAILOVER to simulate failures and verify that your application handles them gracefully.
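
A minimal sketch of the configuration behind the first two practices — hostnames, quorum, and timeout values here are illustrative, not recommendations for every deployment:

```conf
# redis.conf on the primary: refuse writes if fewer than 1 replica
# is connected, or if replication lag exceeds 10 seconds
min-replicas-to-write 1
min-replicas-max-lag 10

# sentinel.conf on each Sentinel: monitor "mymaster" with a quorum of 2
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```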

Common Pitfalls

  • Hardcoding the primary address. After a Sentinel failover, the primary IP changes. Always connect through Sentinel, never directly to a specific node.
  • Running fewer than 3 Sentinels. With only 2 Sentinels, one failure means no quorum and no automatic failover.
  • Cross-slot operations in Cluster without hash tags. MGET key1 key2 fails if the keys are on different slots. Use hash tags or pipeline individual commands.
  • Ignoring replication lag. After failover, the new primary may be missing writes that the old primary accepted but did not replicate. Redis replication is asynchronous by default.
  • Using SELECT (multiple databases) with Cluster. Cluster only supports database 0. Applications relying on multiple databases must migrate to key prefixes.
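
For the last pitfall, one way to fold logical databases into the key namespace when migrating to Cluster — the `db<N>:` prefix convention here is an illustrative assumption, not a Redis standard:

```typescript
// Cluster only supports database 0, so a SELECT-based layout must be
// flattened into the key namespace. One option: prefix each key with
// the former database index.
function prefixedKey(db: number, key: string): string {
  return `db${db}:${key}`;
}

// Before (database 2): SELECT 2; GET session:abc
// After  (database 0): GET db2:session:abc
```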

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
