
Scaling WebSockets

Scaling WebSocket applications with Redis pub/sub, sticky sessions, horizontal scaling, and load balancing strategies

Quick Summary
You are an expert in scaling WebSocket applications across multiple servers for high availability and performance.

## Key Points

- **File descriptors** — default OS limit is often 1024; production servers need 100k+
- **Memory** — each connection uses roughly 10-50 KB depending on buffering
- **CPU** — serialization/deserialization and message routing consume CPU per message
- **Use Redis pub/sub as the default coordination layer** — it is simple, battle-tested, and handles moderate scale (tens of thousands of messages/second) without tuning.
- **Implement graceful shutdown** — on SIGTERM, stop accepting new connections, notify existing clients, drain in-flight work, and then exit. This prevents message loss during deployments.
- **Monitor connection counts and message throughput per server** — use Prometheus/Grafana to track saturation and plan capacity.
- **Use sticky sessions at the load balancer** — without them, the HTTP upgrade handshake may go to one server and subsequent frames to another, breaking the connection.
- **Set `proxy_read_timeout` high** — the default Nginx read timeout (60s) will kill idle WebSocket connections. Set it to 24h or longer.
- **Test at scale before deploying** — use tools like `artillery` or `k6` with WebSocket support to simulate thousands of concurrent connections and measure tail latency.
- **Forgetting the Redis adapter** — without it, `io.to(room).emit()` only reaches clients on the local server. This is the single most common scaling mistake with Socket.IO.
- **Not handling server restarts** — when a server pod is killed, all its connections drop. Clients must reconnect to a different server and resynchronize state. Test this path explicitly.

Scaling WebSockets — WebSockets & Real-Time

You are an expert in scaling WebSocket applications across multiple servers for high availability and performance.

Overview

A single server can handle tens of thousands of concurrent WebSocket connections, but production systems need horizontal scaling for redundancy, capacity, and geographic distribution. The fundamental challenge is that WebSocket connections are stateful and long-lived: a message sent to "room A" must reach all members of room A, even if they are connected to different servers. This requires a coordination layer between servers.

Core Concepts

The Multi-Server Problem

With a single server, broadcasting to a room is trivial — iterate over local connections. With N servers, each server only knows about its own connections. A broadcast must be fanned out to all servers so each can deliver to its local clients.

Pub/Sub Backbone

The standard solution is a pub/sub system (most commonly Redis) that all servers subscribe to. When server A needs to broadcast to a room, it publishes to a Redis channel. All servers (including A) receive the message and deliver it to their local clients in that room.
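The fan-out idea can be sketched with a tiny in-memory simulation, where a shared bus stands in for Redis (all class and variable names here are illustrative, not part of any library):

```javascript
// In-memory stand-in for Redis pub/sub: every "server" subscribes,
// and a publish reaches all of them (including the publisher).
class Bus {
  constructor() { this.subscribers = []; }
  subscribe(fn) { this.subscribers.push(fn); }
  publish(channel, message) {
    for (const fn of this.subscribers) fn(channel, message);
  }
}

class Server {
  constructor(bus) {
    this.rooms = new Map(); // room -> Set of local client callbacks
    this.bus = bus;
    bus.subscribe((channel, message) => this.deliverLocal(channel, message));
  }
  join(room, client) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(client);
  }
  broadcast(room, message) {
    this.bus.publish(room, message); // fan out to every server
  }
  deliverLocal(room, message) {
    for (const client of this.rooms.get(room) ?? []) client(message);
  }
}

const bus = new Bus();
const serverA = new Server(bus);
const serverB = new Server(bus);

const received = [];
serverB.join('room-1', (msg) => received.push(msg));

// A client on server A broadcasts; the client on server B still gets it.
serverA.broadcast('room-1', 'hello');
console.log(received); // → [ 'hello' ]
```

The real implementations below follow the same shape: local room membership stays on each server, and only the publish/deliver step goes through Redis.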

Sticky Sessions

WebSocket connections are long-lived and stateful. If a load balancer routes the initial HTTP upgrade to server A, all subsequent frames for that connection must also go to server A. This is called sticky sessions or session affinity.

Connection Limits

Each WebSocket connection consumes a file descriptor and memory. Key limits:

  • File descriptors — default OS limit is often 1024; production servers need 100k+
  • Memory — each connection uses roughly 10-50 KB depending on buffering
  • CPU — serialization/deserialization and message routing consume CPU per message
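The memory figure above supports a quick back-of-envelope capacity check; the per-connection cost is a rough estimate, so measure your own workload before committing to instance sizes:

```javascript
// Rough memory estimate for a target connection count, using the
// 10-50 KB per-connection range cited above (an estimate, not a spec).
function estimateMemoryMB(connections, kbPerConnection) {
  return (connections * kbPerConnection) / 1024;
}

// 100k connections at the pessimistic 50 KB each:
console.log(estimateMemoryMB(100_000, 50)); // → 4882.8125 (≈ 4.8 GB)
```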

Implementation Patterns

Socket.IO with Redis Adapter

import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';

const io = new Server(httpServer, {
  cors: { origin: '*' },
});

// Create Redis pub/sub clients
const pubClient = createClient({ url: 'redis://redis:6379' });
const subClient = pubClient.duplicate();

await Promise.all([pubClient.connect(), subClient.connect()]);

// Attach the Redis adapter
io.adapter(createAdapter(pubClient, subClient));

// Now io.to('room').emit() works across all server instances
io.on('connection', (socket) => {
  socket.on('join', (room) => {
    socket.join(room);
  });

  socket.on('message', ({ room, content }) => {
    // This broadcast reaches clients on ALL servers
    io.to(room).emit('message', { from: socket.id, content });
  });
});

Raw WebSocket with Redis Pub/Sub

import { WebSocketServer } from 'ws';
import Redis from 'ioredis';

const wss = new WebSocketServer({ port: 8080 });
const pub = new Redis();
const sub = new Redis();

// Local room membership: room -> Set of local WebSocket clients
const localRooms = new Map();

// Subscribe to all room channels
sub.on('message', (channel, message) => {
  const roomId = channel.replace('room:', '');
  const clients = localRooms.get(roomId);
  if (!clients) return;

  for (const ws of clients) {
    if (ws.readyState === 1) { // OPEN
      ws.send(message);
    }
  }
});

wss.on('connection', (ws) => {
  const rooms = new Set();

  ws.on('message', (raw) => {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch {
      return; // ignore malformed frames instead of crashing the handler
    }

    switch (msg.type) {
      case 'join': {
        const { roomId } = msg;
        rooms.add(roomId);
        if (!localRooms.has(roomId)) {
          localRooms.set(roomId, new Set());
          sub.subscribe(`room:${roomId}`);
        }
        localRooms.get(roomId).add(ws);
        break;
      }
      case 'message': {
        const { roomId, content } = msg;
        // Publish to Redis; all servers (including this one) will receive it.
        // ws.userId is assumed to be set earlier during authentication (not shown).
        pub.publish(`room:${roomId}`, JSON.stringify({
          type: 'message',
          from: ws.userId,
          content,
          timestamp: Date.now(),
        }));
        break;
      }
    }
  });

  ws.on('close', () => {
    for (const roomId of rooms) {
      const clients = localRooms.get(roomId);
      if (clients) {
        clients.delete(ws);
        if (clients.size === 0) {
          localRooms.delete(roomId);
          sub.unsubscribe(`room:${roomId}`);
        }
      }
    }
  });
});
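A client speaking the protocol above sends JSON frames of type `join` and `message`. A minimal sketch of the client side (helper names are illustrative):

```javascript
// Illustrative helpers matching the server's wire protocol above.
function joinMessage(roomId) {
  return JSON.stringify({ type: 'join', roomId });
}

function chatMessage(roomId, content) {
  return JSON.stringify({ type: 'message', roomId, content });
}

// Usage with any WebSocket client (browser, or Node's global WebSocket):
// const ws = new WebSocket('wss://ws.example.com/socket');
// ws.onopen = () => {
//   ws.send(joinMessage('room-1'));
//   ws.send(chatMessage('room-1', 'hello'));
// };
```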

Nginx Load Balancer with Sticky Sessions

upstream websocket_servers {
    # IP hash for sticky sessions based on client IP
    ip_hash;

    server ws-server-1:8080;
    server ws-server-2:8080;
    server ws-server-3:8080;
}

# Alternative: cookie-based sticky sessions (more reliable behind NAT).
# Note: the `sticky` directive requires NGINX Plus; open-source Nginx
# builds need ip_hash or a third-party module instead.
upstream websocket_servers_cookie {
    server ws-server-1:8080;
    server ws-server-2:8080;
    server ws-server-3:8080;

    sticky cookie srv_id expires=1h domain=.example.com path=/;
}

server {
    listen 443 ssl;
    server_name ws.example.com;

    ssl_certificate /etc/ssl/certs/example.com.pem;
    ssl_certificate_key /etc/ssl/private/example.com.key;

    location /socket {
        proxy_pass http://websocket_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for long-lived connections
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;

        # Disable buffering for real-time data
        proxy_buffering off;
    }
}

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: websocket-server
  template:
    metadata:
      labels:
        app: websocket-server
    spec:
      containers:
        - name: ws-server
          image: myapp/ws-server:latest
          ports:
            - containerPort: 8080
          env:
            - name: REDIS_URL
              value: redis://redis-service:6379
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
      # Graceful shutdown: allow time for connections to drain
      terminationGracePeriodSeconds: 60
---
apiVersion: v1
kind: Service
metadata:
  name: websocket-service
spec:
  type: ClusterIP
  # Session affinity ensures the same client hits the same pod
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: websocket-server

Graceful Shutdown

// Server: drain connections before shutting down
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, starting graceful shutdown');

  // 1. Stop accepting new connections
  wss.close();

  // 2. Notify all connected clients that are still open
  for (const client of wss.clients) {
    if (client.readyState === 1) { // OPEN
      client.send(JSON.stringify({ type: 'server-shutdown', reconnectAfter: 5000 }));
    }
    client.close(4001, 'Server shutting down');
  }

  // 3. Wait for in-flight operations to complete
  await flushPendingWrites();

  // 4. Close Redis connections
  await pub.quit();
  await sub.quit();

  // 5. Close HTTP server
  httpServer.close(() => {
    console.log('Server shut down cleanly');
    process.exit(0);
  });

  // Force exit after timeout
  setTimeout(() => process.exit(1), 30000);
});
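On the client side, the `server-shutdown` notice pairs naturally with reconnection backoff. A deterministic sketch of the common exponential-backoff pattern (in production, add random jitter so a fleet of clients does not reconnect in lockstep; it is omitted here to keep the function deterministic):

```javascript
// Exponential backoff with a cap, used when reconnecting after a
// 'server-shutdown' notice or an unexpected close.
function reconnectDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

console.log(reconnectDelayMs(0));  // → 1000
console.log(reconnectDelayMs(3));  // → 8000
console.log(reconnectDelayMs(10)); // → 30000 (capped)

// Sketch of handling the shutdown message sent by the server above:
// ws.onmessage = (event) => {
//   const msg = JSON.parse(event.data);
//   if (msg.type === 'server-shutdown') {
//     setTimeout(connect, msg.reconnectAfter ?? reconnectDelayMs(0));
//   }
// };
```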

Monitoring and Metrics

import { register, Gauge, Counter, Histogram } from 'prom-client';

const connectedClients = new Gauge({
  name: 'ws_connected_clients',
  help: 'Number of connected WebSocket clients',
});

const messagesTotal = new Counter({
  name: 'ws_messages_total',
  help: 'Total WebSocket messages processed',
  labelNames: ['direction', 'type'],
});

const messageLatency = new Histogram({
  name: 'ws_message_duration_seconds',
  help: 'Message processing latency',
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
});

wss.on('connection', (ws) => {
  connectedClients.inc();

  ws.on('message', (data) => {
    messagesTotal.inc({ direction: 'inbound', type: 'message' });
    const end = messageLatency.startTimer();
    processMessage(data);
    end();
  });

  ws.on('close', () => {
    connectedClients.dec();
  });
});

// Expose metrics endpoint for Prometheus (assumes an Express `app`)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

OS Tuning for High Connection Counts

# Increase file descriptor limits
# /etc/security/limits.conf
# wsuser soft nofile 1000000
# wsuser hard nofile 1000000

# Increase system-wide file descriptor limit
# sysctl -w fs.file-max=2000000

# Tune TCP for many connections
# sysctl -w net.core.somaxconn=65535
# sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# sysctl -w net.core.netdev_max_backlog=65535

# Increase ephemeral port range (for outbound connections to Redis, etc.)
# sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Enable TCP keepalive for detecting dead connections at the OS level
# sysctl -w net.ipv4.tcp_keepalive_time=60
# sysctl -w net.ipv4.tcp_keepalive_intvl=10
# sysctl -w net.ipv4.tcp_keepalive_probes=6

Best Practices

  • Use Redis pub/sub as the default coordination layer — it is simple, battle-tested, and handles moderate scale (tens of thousands of messages/second) without tuning.
  • Implement graceful shutdown — on SIGTERM, stop accepting new connections, notify existing clients, drain in-flight work, and then exit. This prevents message loss during deployments.
  • Monitor connection counts and message throughput per server — use Prometheus/Grafana to track saturation and plan capacity.
  • Use sticky sessions at the load balancer — without them, the HTTP upgrade handshake may go to one server and subsequent frames to another, breaking the connection.
  • Set proxy_read_timeout high — the default Nginx read timeout (60s) will kill idle WebSocket connections. Set it to 24h or longer.
  • Test at scale before deploying — use tools like artillery or k6 with WebSocket support to simulate thousands of concurrent connections and measure tail latency.

Common Pitfalls

  • Forgetting the Redis adapter — without it, io.to(room).emit() only reaches clients on the local server. This is the single most common scaling mistake with Socket.IO.
  • Publishing large payloads through Redis — Redis pub/sub is not designed for large messages. If you need to broadcast large blobs, publish a reference (e.g., S3 URL) and have clients fetch the payload separately.
  • Single Redis instance as a bottleneck — for very high throughput, a single Redis instance becomes the bottleneck. Consider Redis Cluster, sharding rooms across multiple Redis instances, or alternatives like NATS or Kafka.
  • Not handling server restarts — when a server pod is killed, all its connections drop. Clients must reconnect to a different server and resynchronize state. Test this path explicitly.
  • Ignoring backpressure — if a client cannot consume messages fast enough, the server buffers them in memory. Without limits, a slow client can exhaust server memory. Implement per-connection buffer limits and drop or disconnect slow consumers.
  • Over-scaling with WebSockets — before reaching for WebSockets, consider whether SSE or polling would suffice. WebSockets add operational complexity; use them when bidirectional, low-latency communication is genuinely needed.
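The backpressure pitfall above can be addressed with a per-connection buffer guard. Both the browser WebSocket API and the `ws` library expose `bufferedAmount` (bytes queued but not yet flushed to the socket); the limit below is an assumption to tune for your message sizes:

```javascript
// Sketch of a per-connection backpressure guard: if a client falls
// too far behind, disconnect it instead of buffering without bound.
const MAX_BUFFERED_BYTES = 1024 * 1024; // 1 MB per connection (assumed limit)

function sendWithBackpressure(ws, data) {
  if (ws.bufferedAmount > MAX_BUFFERED_BYTES) {
    ws.close(1013, 'Slow consumer'); // 1013 = Try Again Later
    return false;
  }
  ws.send(data);
  return true;
}
```

Dropping individual messages instead of disconnecting is also viable for lossy feeds (e.g. live cursors), where only the latest update matters.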

Core Philosophy

Scaling WebSockets is fundamentally about solving the coordination problem: how do messages reach the right clients when those clients are spread across multiple server instances? The answer for most applications is a pub/sub backbone — typically Redis — that all servers subscribe to. When any server needs to broadcast to a room, it publishes to Redis, and every server delivers to its local clients. This pattern is simple, battle-tested, and handles moderate scale without tuning.

Graceful shutdown is the most underappreciated aspect of WebSocket scaling. In a containerized or autoscaling environment, server instances are created and destroyed regularly. A server that dies abruptly drops all its connections, causing a reconnection storm. A server that shuts down gracefully — stopping new connections, notifying existing clients, draining in-flight work — gives clients time to reconnect to healthy instances smoothly.

Monitoring is the foundation of scaling decisions. Track connected clients per server, message throughput, Redis pub/sub latency, and tail latency of message delivery. Without these metrics, you are guessing at capacity and reacting to outages instead of preventing them. Set up alerts for connection count approaching file descriptor limits, Redis memory usage, and message delivery latency exceeding your SLA.

Anti-Patterns

  • Forgetting the Redis adapter for Socket.IO — without the Redis adapter, io.to(room).emit() only reaches clients on the local server, which is the most common scaling mistake with Socket.IO.

  • Publishing large payloads through Redis pub/sub — Redis pub/sub is not designed for multi-megabyte messages; publish references (URLs or IDs) and let clients fetch the payload separately.

  • Not implementing graceful shutdown — killing server instances without draining connections causes reconnection storms that can cascade into further instance failures.

  • Ignoring backpressure from slow consumers — if a client cannot consume messages fast enough, the server buffers them in memory indefinitely; implement per-connection buffer limits and disconnect slow consumers.

  • Using a single Redis instance as the coordination layer at very high scale — a single Redis instance becomes a throughput bottleneck above tens of thousands of messages per second; consider Redis Cluster, sharding rooms across instances, or using NATS for higher throughput.
