
Message Queues

Message queue patterns including pub/sub, fan-out, and reliable delivery for asynchronous communication


Message Queue Patterns — System Design

You are an expert in Message Queue Patterns for designing scalable distributed systems.

Core Philosophy

Message queues exist to decouple the pace of production from the pace of consumption. This temporal decoupling is their core value — producers can fire and move on, consumers can process at their own speed, and the queue absorbs the difference. Without this buffer, every service in a distributed system must be available and fast at the exact moment its caller needs it, which is an unrealistic expectation at scale.

The choice of messaging model (point-to-point, pub/sub, fan-out) is an architectural decision that shapes how information flows through the system. Point-to-point creates a directed workflow. Pub/sub creates an open broadcast that new consumers can tap into without modifying producers. The right choice depends on whether the producer cares about who processes the message and how many times it gets processed.

Reliability in messaging is not free — it comes with trade-offs in throughput, latency, and complexity. At-least-once delivery is the pragmatic default for most systems because it pushes the deduplication responsibility to consumers (who must be idempotent anyway) rather than paying the steep cost of exactly-once guarantees at the broker level.

Overview

Message queues enable asynchronous communication between services by decoupling producers from consumers. They buffer work, smooth traffic spikes, and enable patterns like pub/sub, fan-out, and exactly-once processing. Common technologies include Apache Kafka, RabbitMQ, AWS SQS/SNS, NATS, and Apache Pulsar.

Core Concepts

Messaging Models

Point-to-Point:
[Producer] --> [Queue] --> [Consumer]
                            (one consumer gets each message)

Pub/Sub:
[Producer] --> [Topic] --> [Subscriber A]
                       --> [Subscriber B]
                       --> [Subscriber C]
                            (all subscribers get every message)

Fan-Out:
[Producer] --> [Exchange/SNS] --> [Queue A] --> [Consumer A]
                              --> [Queue B] --> [Consumer B]
                              --> [Queue C] --> [Consumer C]
                            (each queue has independent consumers)

Delivery Guarantees

  • At-Most-Once: Fire and forget. Fast but messages can be lost.
  • At-Least-Once: Messages are retried until acknowledged. May produce duplicates.
  • Exactly-Once: Achieved through idempotency or transactional processing. Highest overhead.
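Consumer-side idempotency is how at-least-once delivery becomes effectively-once processing in practice. The sketch below is illustrative (the names and the in-memory dedupe set are assumptions; a real system would track processed IDs in a persistent store such as Redis or a database table):

```python
# Dedupe by message ID so a redelivered message has no effect the second time.
processed_ids = set()  # in production: a durable store, not process memory
results = []

def handle(message):
    """Process a message idempotently: redeliveries are detected and skipped."""
    if message["id"] in processed_ids:
        return  # duplicate delivery: already handled, do nothing
    results.append(message["payload"])
    processed_ids.add(message["id"])

# The broker redelivers message 1 (at-least-once), but it is applied once.
for msg in [{"id": 1, "payload": "charge $10"},
            {"id": 1, "payload": "charge $10"},   # duplicate redelivery
            {"id": 2, "payload": "ship order"}]:
    handle(msg)
```

Note that this only works if producers attach a stable, unique ID to each message; without one, the consumer has nothing to deduplicate on.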

Ordering Guarantees

  • Global ordering: All messages processed in order (single partition). Limits throughput.
  • Partition ordering: Messages with the same key are ordered within a partition. Best balance of order and throughput.
  • No ordering: Maximum throughput but messages may arrive out of order.
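Partition ordering falls out of deterministic key hashing: the same key always maps to the same partition, so per-key order is preserved while distinct keys fan out across partitions. A minimal sketch (the hash choice here is illustrative; Kafka's default partitioner uses murmur2, not SHA-256):

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    """Map a partition key deterministically to one of NUM_PARTITIONS."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events keyed "user-42" land on the same partition, so they stay
# ordered relative to each other; other users' events process in parallel.
p = partition_for("user-42")
assert all(partition_for("user-42") == p for _ in range(100))
```

One caveat this sketch makes visible: changing `NUM_PARTITIONS` remaps keys, which is why repartitioning a live topic breaks per-key ordering guarantees.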

Backpressure

When consumers cannot keep up, the queue grows. Strategies: increase consumers (auto-scaling), apply backpressure to producers, drop low-priority messages, or use dead-letter queues for failed messages.
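The simplest form of producer backpressure is a bounded buffer: once it fills, the producer blocks, retries, or sheds load instead of letting the queue grow without limit. A minimal sketch using Python's standard `queue` module:

```python
import queue

buf = queue.Queue(maxsize=3)  # bounded capacity is the backpressure mechanism

for i in range(3):
    buf.put_nowait(i)  # fills the buffer

try:
    buf.put_nowait(99)  # consumers haven't drained: producer is pushed back
    overflowed = False
except queue.Full:
    overflowed = True   # producer must now wait, retry later, or drop the item
```

A blocking `put()` with a timeout gives the same effect with waiting instead of immediate rejection; which to choose depends on whether the producer can tolerate latency or must fail fast.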

Implementation Patterns

Competing Consumers

Multiple consumer instances read from the same queue. Each message is delivered to exactly one consumer. Scales horizontally by adding consumers up to the number of partitions.
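The pattern can be sketched with worker threads pulling from one shared queue: each item is taken by exactly one worker, and throughput scales by adding workers. Threads stand in for consumer instances here; the structure is the same for separate processes polling a broker.

```python
import queue
import threading

work = queue.Queue()
for i in range(10):
    work.put(i)

seen = []
lock = threading.Lock()

def worker():
    """Pull items until the queue is empty; each item goes to one worker only."""
    while True:
        try:
            item = work.get_nowait()
        except queue.Empty:
            return
        with lock:
            seen.append(item)

threads = [threading.Thread(target=worker) for _ in range(3)]  # 3 competing consumers
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every message is handled exactly once by some worker, which is the defining property of the pattern: consumers compete for messages rather than each receiving a copy.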

Fan-Out with SNS + SQS (or Exchange + Queues)

A producer publishes to a topic. Each subscriber has its own queue fed by the topic. Subscribers process at their own pace without affecting each other. Ideal for cross-cutting events like "OrderPlaced" that trigger notification, analytics, and inventory updates.
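The essential mechanic is that the topic copies each event into every subscriber's private queue. A toy in-memory sketch (the `Topic` class and names are assumptions, not any broker's API):

```python
from collections import defaultdict

class Topic:
    """Fan-out: publishing copies the event into each subscriber's own queue."""
    def __init__(self):
        self.queues = defaultdict(list)  # subscriber name -> private queue

    def subscribe(self, name):
        self.queues[name]  # creating the entry registers the subscriber

    def publish(self, event):
        for q in self.queues.values():
            q.append(event)  # every subscriber queue gets its own copy

orders = Topic()
for sub in ("notifications", "analytics", "inventory"):
    orders.subscribe(sub)

orders.publish({"type": "OrderPlaced", "order_id": 7})
```

Because each subscriber owns its queue, a slow analytics consumer backs up only its own queue; notifications and inventory are unaffected.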

Dead-Letter Queue (DLQ)

Messages that fail processing after a configured number of retries are moved to a DLQ. This prevents poison messages from blocking the queue while preserving them for investigation and replay.
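The retry-then-park logic can be sketched in a few lines (retry count, names, and the in-memory DLQ list are illustrative; brokers implement this via redelivery counts and a configured DLQ target):

```python
MAX_RETRIES = 3
dead_letter_queue = []

def process_with_dlq(message, handler):
    """Try the handler up to MAX_RETRIES times, then park the message in the DLQ."""
    for _attempt in range(MAX_RETRIES):
        try:
            handler(message)
            return True
        except Exception:
            continue  # transient failure: retry
    dead_letter_queue.append(message)  # poison message: preserve for inspection
    return False

def always_fails(msg):
    raise ValueError("unparseable payload")

ok = process_with_dlq({"id": 1, "body": "bad payload"}, always_fails)
```

The main queue keeps moving, and the failed message survives for investigation and replay rather than being retried forever or dropped.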

Delayed / Scheduled Messages

Some brokers support delayed delivery (RabbitMQ delayed exchange, SQS delay queues). Useful for retry-after patterns, scheduled jobs, or time-windowed processing.
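Under the hood, delayed delivery amounts to a priority queue keyed on "deliver at": a message becomes visible only once its scheduled time has passed. A minimal sketch with integer timestamps (real brokers use wall-clock time and persistence):

```python
import heapq

schedule = []  # min-heap of (deliver_at, message)

def publish_delayed(message, now, delay):
    """Schedule a message to become visible at now + delay."""
    heapq.heappush(schedule, (now + delay, message))

def poll_due(now):
    """Return all messages whose delivery time has arrived, in time order."""
    due = []
    while schedule and schedule[0][0] <= now:
        due.append(heapq.heappop(schedule)[1])
    return due

publish_delayed("retry-payment", now=0, delay=30)
publish_delayed("send-reminder", now=0, delay=10)

early = poll_due(now=5)    # nothing due yet
first = poll_due(now=15)   # only the reminder has matured
later = poll_due(now=60)   # the retry is now due
```

This is exactly the shape of a retry-after flow: on failure, republish the message with a delay instead of retrying in a hot loop.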

Transactional Outbox

Combine with the outbox pattern: write the message to a database outbox table in the same transaction as the business operation, then a relay process publishes it to the broker. Guarantees consistency between state change and event publication.
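A compact sketch of the outbox write using SQLite (schema and names are illustrative): the business row and the outbox row commit in one transaction, so either both exist or neither does, and a separate relay later drains unpublished rows to the broker.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    with db:  # one transaction: state change + event record are atomic
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (event) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )

place_order(7)

# The relay process polls unpublished rows and hands them to the broker,
# marking published = 1 only after the broker acknowledges.
pending = db.execute("SELECT event FROM outbox WHERE published = 0").fetchall()
```

Note the relay itself delivers at-least-once (a crash between publish and marking the row leads to a republish), which is another reason downstream consumers must be idempotent.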

Trade-offs

| Factor              | Kafka (Log-Based)    | RabbitMQ (Traditional Broker) | SQS (Managed) |
|---------------------|----------------------|-------------------------------|---------------|
| Throughput          | Very high            | Moderate                      | High          |
| Ordering            | Per-partition        | Per-queue (FIFO)              | FIFO optional |
| Replay              | Yes (log retention)  | No (consumed = gone)          | No            |
| Operational cost    | High (self-managed)  | Moderate                      | Low (managed) |
| Routing flexibility | Limited              | Rich (exchanges)              | Basic         |

Choose Kafka for high-throughput event streaming with replay. Choose RabbitMQ for complex routing and traditional task queues. Choose SQS for simple, managed queue needs.

Best Practices

  • Design consumers to be idempotent — at-least-once delivery means duplicates will happen, so processing the same message twice must produce the same result.
  • Use partition keys thoughtfully to maintain ordering where it matters (e.g., all events for a given user go to the same partition) while still allowing parallel processing.
  • Set up dead-letter queues with alerting from day one; unprocessable messages should be surfaced immediately, not silently dropped or left to block the queue.

Common Pitfalls

  • Using a message queue as a database — queues are for transit, not long-term storage; if you need to query or search messages, store them in a proper data store.
  • Introducing ordering constraints across partitions or topics, which forces serial processing and eliminates the throughput benefits of partitioning.

Anti-Patterns

  • Queue as Database: Storing messages in the queue for long-term retrieval or querying. Queues are transit buffers, not data stores. If you need to search or aggregate messages, write them to a proper database.

  • Non-Idempotent Consumers: Assuming each message will be delivered exactly once. At-least-once delivery means duplicates will arrive, and a consumer that charges a credit card or sends a notification on every delivery will produce real-world harm.

  • Invisible Dead Letters: Configuring a dead-letter queue but never monitoring it. Poison messages silently accumulate, representing data loss or broken business processes that nobody investigates.

  • Tight Coupling via Message Format: Encoding producer-specific implementation details in the message schema, so consumers break whenever the producer refactors internally. Messages should represent business events, not internal data structures.

  • Ignoring Backpressure: Letting the queue grow unboundedly when consumers fall behind. Eventually the broker runs out of disk or memory, and the system fails catastrophically instead of degrading gracefully through consumer scaling or producer throttling.
