Scaling WebSockets
Scaling WebSocket applications with Redis pub/sub, sticky sessions, horizontal scaling, and load balancing strategies
You are an expert in scaling WebSocket applications across multiple servers for high availability and performance.
Overview
A single server can handle tens of thousands of concurrent WebSocket connections, but production systems need horizontal scaling for redundancy, capacity, and geographic distribution. The fundamental challenge is that WebSocket connections are stateful and long-lived: a message sent to "room A" must reach all members of room A, even if they are connected to different servers. This requires a coordination layer between servers.
Core Concepts
The Multi-Server Problem
With a single server, broadcasting to a room is trivial — iterate over local connections. With N servers, each server only knows about its own connections. A broadcast must be fanned out to all servers so each can deliver to its local clients.
Pub/Sub Backbone
The standard solution is a pub/sub system (most commonly Redis) that all servers subscribe to. When server A needs to broadcast to a room, it publishes to a Redis channel. All servers (including A) receive the message and deliver it to their local clients in that room.
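The fan-out can be simulated in-process with Node's built-in `EventEmitter` standing in for Redis. This is a sketch of the pattern only — the server names, room id, and `broadcast` event are illustrative, not a real Redis API:

```javascript
import { EventEmitter } from 'node:events';

// Stand-in for Redis pub/sub: one shared bus that all "servers" subscribe to.
const bus = new EventEmitter();
const deliveries = [];

// Each server tracks only its own local clients per room.
function makeServer(name) {
  const localRooms = new Map(); // room -> Set of local client names
  bus.on('broadcast', ({ room, payload }) => {
    // Every server receives the publish and delivers to ITS local members only.
    for (const client of localRooms.get(room) ?? []) {
      deliveries.push(`${name}:${client}:${payload}`);
    }
  });
  return {
    join(client, room) {
      if (!localRooms.has(room)) localRooms.set(room, new Set());
      localRooms.get(room).add(client);
    },
    publish(room, payload) {
      bus.emit('broadcast', { room, payload }); // analogous to PUBLISH room:<id>
    },
  };
}

const serverA = makeServer('A');
const serverB = makeServer('B');
serverA.join('alice', 'room-1');
serverB.join('bob', 'room-1');

// A publish from server A reaches bob, who is connected to server B.
serverA.publish('room-1', 'hello');
console.log(deliveries); // → [ 'A:alice:hello', 'B:bob:hello' ]
```

The key property: the publisher does not need to know which server holds which client — every server filters the broadcast against its own local membership.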
Sticky Sessions
WebSocket connections are long-lived and stateful. If a load balancer routes the initial HTTP upgrade to server A, all subsequent frames for that connection must also go to server A. This is called sticky sessions or session affinity.
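The mechanism behind `ip_hash`-style affinity can be sketched as: hash the client IP, map the digest to a server index, and the same IP deterministically lands on the same backend. The server names are illustrative, and real load balancers also handle server removal and rebalancing:

```javascript
import { createHash } from 'node:crypto';

// Sketch of ip_hash-style affinity: a deterministic hash of the client IP
// picks the backend, so every frame from one client hits the same server.
function pickServer(clientIp, servers) {
  const digest = createHash('sha256').update(clientIp).digest();
  const index = digest.readUInt32BE(0) % servers.length;
  return servers[index];
}

const servers = ['ws-server-1', 'ws-server-2', 'ws-server-3'];
const first = pickServer('203.0.113.7', servers);
// Repeated calls with the same IP always return the same backend.
console.log(pickServer('203.0.113.7', servers) === first); // → true
```

The weakness of IP-based affinity — many clients behind one NAT hash to one server — is why the cookie-based alternative below is often preferred.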
Connection Limits
Each WebSocket connection consumes a file descriptor and memory. Key limits:
- File descriptors — default OS limit is often 1024; production servers need 100k+
- Memory — each connection uses roughly 10-50 KB depending on buffering
- CPU — serialization/deserialization and message routing consume CPU per message
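A back-of-envelope capacity estimate follows from these limits. The 30 KB per-connection figure below is an assumed midpoint of the 10-50 KB range — measure your own workload before trusting any such number:

```javascript
// Rough capacity estimate: a server is bound by whichever runs out first,
// memory or file descriptors. perConnBytes (30 KB) is an assumed midpoint
// of the 10-50 KB range quoted above.
function estimateMaxConnections({ memoryBytes, perConnBytes = 30 * 1024, fdLimit }) {
  const byMemory = Math.floor(memoryBytes / perConnBytes);
  return Math.min(byMemory, fdLimit); // whichever limit binds first
}

// A 4 GiB server with nofile raised to 1,000,000 is memory-bound:
const estimate = estimateMaxConnections({
  memoryBytes: 4 * 1024 ** 3,
  fdLimit: 1_000_000,
});
console.log(estimate); // → 139810
```

With the default 1024-descriptor limit the same server would cap out at 1024 connections, which is why raising `nofile` (see OS Tuning below) comes before any other scaling work.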
Implementation Patterns
Socket.IO with Redis Adapter
import { createServer } from 'node:http';
import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';
const httpServer = createServer();
const io = new Server(httpServer, {
cors: { origin: '*' },
});
// Create Redis pub/sub clients
const pubClient = createClient({ url: 'redis://redis:6379' });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
// Attach the Redis adapter
io.adapter(createAdapter(pubClient, subClient));
// Now io.to('room').emit() works across all server instances
io.on('connection', (socket) => {
socket.on('join', (room) => {
socket.join(room);
});
socket.on('message', ({ room, content }) => {
// This broadcast reaches clients on ALL servers
io.to(room).emit('message', { from: socket.id, content });
});
});
Raw WebSocket with Redis Pub/Sub
import { WebSocketServer } from 'ws';
import Redis from 'ioredis';
const wss = new WebSocketServer({ port: 8080 });
const pub = new Redis();
const sub = new Redis();
// Local room membership: room -> Set of local WebSocket clients
const localRooms = new Map();
// Subscribe to all room channels
sub.on('message', (channel, message) => {
const roomId = channel.replace('room:', '');
const clients = localRooms.get(roomId);
if (!clients) return;
for (const ws of clients) {
if (ws.readyState === 1) { // OPEN
ws.send(message);
}
}
});
wss.on('connection', (ws) => {
const rooms = new Set();
ws.on('message', (raw) => {
let msg;
try { msg = JSON.parse(raw); } catch { return; } // ignore malformed frames
switch (msg.type) {
case 'join': {
const { roomId } = msg;
rooms.add(roomId);
if (!localRooms.has(roomId)) {
localRooms.set(roomId, new Set());
sub.subscribe(`room:${roomId}`);
}
localRooms.get(roomId).add(ws);
break;
}
case 'message': {
const { roomId, content } = msg;
// Publish to Redis; all servers (including this one) will receive it
pub.publish(`room:${roomId}`, JSON.stringify({
type: 'message',
from: ws.userId, // assumes userId was attached to the socket during auth
content,
timestamp: Date.now(),
}));
break;
}
}
});
ws.on('close', () => {
for (const roomId of rooms) {
const clients = localRooms.get(roomId);
if (clients) {
clients.delete(ws);
if (clients.size === 0) {
localRooms.delete(roomId);
sub.unsubscribe(`room:${roomId}`);
}
}
}
});
});
Nginx Load Balancer with Sticky Sessions
upstream websocket_servers {
# IP hash for sticky sessions based on client IP
ip_hash;
server ws-server-1:8080;
server ws-server-2:8080;
server ws-server-3:8080;
}
# Alternative: cookie-based sticky sessions (more reliable behind NAT).
# Note: the `sticky` directive requires NGINX Plus; open-source Nginx
# only ships ip_hash and the generic hash directive.
upstream websocket_servers_cookie {
server ws-server-1:8080;
server ws-server-2:8080;
server ws-server-3:8080;
sticky cookie srv_id expires=1h domain=.example.com path=/;
}
server {
listen 443 ssl;
server_name ws.example.com;
ssl_certificate /etc/ssl/certs/example.com.pem;
ssl_certificate_key /etc/ssl/private/example.com.key;
location /socket {
proxy_pass http://websocket_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts for long-lived connections
proxy_read_timeout 86400s;
proxy_send_timeout 86400s;
# Disable buffering for real-time data
proxy_buffering off;
}
}
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: websocket-server
spec:
replicas: 3
selector:
matchLabels:
app: websocket-server
template:
metadata:
labels:
app: websocket-server
spec:
containers:
- name: ws-server
image: myapp/ws-server:latest
ports:
- containerPort: 8080
env:
- name: REDIS_URL
value: redis://redis-service:6379
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
# Graceful shutdown: allow time for connections to drain
terminationGracePeriodSeconds: 60
---
apiVersion: v1
kind: Service
metadata:
name: websocket-service
spec:
type: ClusterIP
# Session affinity ensures the same client hits the same pod
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 3600
ports:
- port: 80
targetPort: 8080
selector:
app: websocket-server
Graceful Shutdown
// Server: drain connections before shutting down
process.on('SIGTERM', async () => {
console.log('SIGTERM received, starting graceful shutdown');
// 1. Stop accepting new connections
wss.close();
// 2. Notify all connected clients
for (const client of wss.clients) {
client.send(JSON.stringify({ type: 'server-shutdown', reconnectAfter: 5000 }));
client.close(4001, 'Server shutting down');
}
// 3. Wait for in-flight operations to complete
await flushPendingWrites();
// 4. Close Redis connections
await pub.quit();
await sub.quit();
// 5. Close HTTP server
httpServer.close(() => {
console.log('Server shut down cleanly');
process.exit(0);
});
// Force exit after timeout
setTimeout(() => process.exit(1), 30000);
});
Monitoring and Metrics
import { register, Gauge, Counter, Histogram } from 'prom-client';
const connectedClients = new Gauge({
name: 'ws_connected_clients',
help: 'Number of connected WebSocket clients',
});
const messagesTotal = new Counter({
name: 'ws_messages_total',
help: 'Total WebSocket messages processed',
labelNames: ['direction', 'type'],
});
const messageLatency = new Histogram({
name: 'ws_message_duration_seconds',
help: 'Message processing latency',
buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
});
wss.on('connection', (ws) => {
connectedClients.inc();
ws.on('message', (data) => {
messagesTotal.inc({ direction: 'inbound', type: 'message' });
const end = messageLatency.startTimer();
processMessage(data);
end();
});
ws.on('close', () => {
connectedClients.dec();
});
});
// Expose metrics endpoint for Prometheus (app is an Express instance)
app.get('/metrics', async (req, res) => {
res.set('Content-Type', register.contentType);
res.end(await register.metrics());
});
OS Tuning for High Connection Counts
# Increase per-user file descriptor limits in /etc/security/limits.conf:
#   wsuser soft nofile 1000000
#   wsuser hard nofile 1000000

# Increase the system-wide file descriptor limit
sysctl -w fs.file-max=2000000

# Tune TCP backlogs for many concurrent connections
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
sysctl -w net.core.netdev_max_backlog=65535

# Increase the ephemeral port range (for outbound connections to Redis, etc.)
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Enable TCP keepalive to detect dead connections at the OS level
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6
Best Practices
- Use Redis pub/sub as the default coordination layer — it is simple, battle-tested, and handles moderate scale (tens of thousands of messages/second) without tuning.
- Implement graceful shutdown — on SIGTERM, stop accepting new connections, notify existing clients, drain in-flight work, and then exit. This prevents message loss during deployments.
- Monitor connection counts and message throughput per server — use Prometheus/Grafana to track saturation and plan capacity.
- Use sticky sessions at the load balancer — without them, the HTTP upgrade handshake may go to one server and subsequent frames to another, breaking the connection.
- Set `proxy_read_timeout` high — the default Nginx read timeout (60s) will kill idle WebSocket connections. Set it to 24h or longer.
- Test at scale before deploying — use tools like `artillery` or `k6` with WebSocket support to simulate thousands of concurrent connections and measure tail latency.
Common Pitfalls
- Forgetting the Redis adapter — without it, `io.to(room).emit()` only reaches clients on the local server. This is the single most common scaling mistake with Socket.IO.
- Publishing large payloads through Redis — Redis pub/sub is not designed for large messages. If you need to broadcast large blobs, publish a reference (e.g., S3 URL) and have clients fetch the payload separately.
- Single Redis instance as a bottleneck — for very high throughput, a single Redis instance becomes the bottleneck. Consider Redis Cluster, sharding rooms across multiple Redis instances, or alternatives like NATS or Kafka.
- Not handling server restarts — when a server pod is killed, all its connections drop. Clients must reconnect to a different server and resynchronize state. Test this path explicitly.
- Ignoring backpressure — if a client cannot consume messages fast enough, the server buffers them in memory. Without limits, a slow client can exhaust server memory. Implement per-connection buffer limits and drop or disconnect slow consumers.
- Over-scaling with WebSockets — before reaching for WebSockets, consider whether SSE or polling would suffice. WebSockets add operational complexity; use them when bidirectional, low-latency communication is genuinely needed.
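The server-restart pitfall above has a client-side half: if thousands of dropped clients all retry at once, they hammer the surviving servers. One common remedy is exponential backoff with full jitter; this is an illustrative sketch, and the base/cap values are arbitrary:

```javascript
// Exponential backoff with full jitter for client reconnects. Spreading
// retries randomly over the backoff window prevents a thundering herd
// when a server dies. Base and cap values here are illustrative.
function reconnectDelayMs(attempt, baseMs = 500, capMs = 30_000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling); // full jitter: [0, ceiling)
}

// attempt 0 waits up to 500 ms, attempt 3 up to 4 s, later attempts cap at 30 s
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ${reconnectDelayMs(attempt)} ms`);
}
```

After reconnecting, the client must still resynchronize state (rejoin rooms, fetch missed messages) — backoff only solves the timing half of the problem.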
Core Philosophy
Scaling WebSockets is fundamentally about solving the coordination problem: how do messages reach the right clients when those clients are spread across multiple server instances? The answer for most applications is a pub/sub backbone — typically Redis — that all servers subscribe to. When any server needs to broadcast to a room, it publishes to Redis, and every server delivers to its local clients. This pattern is simple, battle-tested, and handles moderate scale without tuning.
Graceful shutdown is the most underappreciated aspect of WebSocket scaling. In a containerized or autoscaling environment, server instances are created and destroyed regularly. A server that dies abruptly drops all its connections, causing a reconnection storm. A server that shuts down gracefully — stopping new connections, notifying existing clients, draining in-flight work — gives clients time to reconnect to healthy instances smoothly.
Monitoring is the foundation of scaling decisions. Track connected clients per server, message throughput, Redis pub/sub latency, and tail latency of message delivery. Without these metrics, you are guessing at capacity and reacting to outages instead of preventing them. Set up alerts for connection count approaching file descriptor limits, Redis memory usage, and message delivery latency exceeding your SLA.
Anti-Patterns
- Forgetting the Redis adapter for Socket.IO — without the Redis adapter, `io.to(room).emit()` only reaches clients on the local server, which is the most common scaling mistake with Socket.IO.
- Publishing large payloads through Redis pub/sub — Redis pub/sub is not designed for multi-megabyte messages; publish references (URLs or IDs) and let clients fetch the payload separately.
- Not implementing graceful shutdown — killing server instances without draining connections causes reconnection storms that can cascade into further instance failures.
- Ignoring backpressure from slow consumers — if a client cannot consume messages fast enough, the server buffers them in memory indefinitely; implement per-connection buffer limits and disconnect slow consumers.
- Using a single Redis instance as the coordination layer at very high scale — a single Redis instance becomes a throughput bottleneck above tens of thousands of messages per second; consider Redis Cluster, sharding rooms across instances, or using NATS for higher throughput.
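The backpressure anti-pattern can be guarded against with the `bufferedAmount` property that both browser WebSockets and the `ws` library expose: it reports bytes queued but not yet flushed to the socket. This is a sketch, and the 1 MB limit is an arbitrary example value:

```javascript
// Backpressure guard: before sending, check how many bytes are already
// queued in the socket's internal buffer (`bufferedAmount`). Disconnect
// consumers that fall too far behind. The 1 MB limit is an example value.
const MAX_BUFFERED_BYTES = 1024 * 1024;

function sendWithBackpressure(ws, data) {
  if (ws.bufferedAmount > MAX_BUFFERED_BYTES) {
    ws.close(1008, 'slow consumer'); // 1008 = policy violation
    return false;
  }
  ws.send(data);
  return true;
}

// Exercise the guard with a minimal mock socket:
const sent = [];
const mock = { bufferedAmount: 0, send: (d) => sent.push(d), close: () => {} };
console.log(sendWithBackpressure(mock, 'ok')); // → true
mock.bufferedAmount = 2 * 1024 * 1024;
console.log(sendWithBackpressure(mock, 'dropped')); // → false
```

Disconnecting is the blunt option; depending on the application, dropping low-priority messages or coalescing updates for the slow client may be gentler alternatives.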
Related Skills
- Chat Rooms — chat room architecture covering message routing, history, moderation, and scalable room-based real-time systems
- Collaborative Editing — real-time collaborative editing with CRDTs and Operational Transform for conflict-free concurrent document editing
- Presence — user presence system design for tracking online/offline status, typing indicators, and activity in real-time apps
- Reconnection — reconnection and offline resilience patterns for WebSocket apps including retry strategies and state synchronization
- Server-Sent Events — Server-Sent Events (SSE) patterns for efficient unidirectional real-time streaming from server to client
- Socket.IO — Socket.IO patterns for event-driven real-time communication with automatic reconnection and room management