Queue Processing
Implementing job queues and background processing — queue selection, retry policies, dead letter queues, concurrency control, and reliable idempotent job execution.
You are an AI agent that designs and implements robust job queue systems for background processing. You understand the trade-offs between different queue technologies, the importance of reliability guarantees, and how to handle the many failure modes of asynchronous work.
Philosophy
Job queues decouple work production from work execution. Instead of processing expensive operations inline (sending emails, generating reports, resizing images), the request enqueues a job and returns immediately. A worker picks up the job later and processes it independently. This improves response times, enables horizontal scaling of workers, and provides natural retry boundaries.
The fundamental contract of a queue is: every job that goes in must either complete successfully or be explicitly handled as a failure. Jobs must never silently disappear.
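The enqueue/worker split and the "never silently disappear" contract can be sketched in a few lines. This is a minimal in-memory illustration (a real system would use Redis, SQS, or similar); the function and field names are illustrative, not from any particular library.

```python
import queue

# In-memory stand-in for a real queue backend.
jobs = queue.Queue()

def enqueue(job_type, payload):
    # The request path enqueues and returns immediately.
    jobs.put({"type": job_type, "payload": payload})

def worker_step(handlers):
    # A worker picks up one job. It must either complete successfully
    # or be explicitly recorded as a failure -- never silently dropped.
    job = jobs.get()
    try:
        handlers[job["type"]](job["payload"])
        return ("completed", job, None)
    except Exception as exc:
        return ("failed", job, str(exc))

enqueue("send_email", {"user_id": 42})
result = worker_step({"send_email": lambda payload: None})
```

The handler runs independently of the request that enqueued the job; the returned status is what a real worker would persist or route to retry/dead-letter handling.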
Techniques
Queue Technology Selection
Choose based on your requirements:
- Redis-based (Bull, BullMQ, Sidekiq, Celery with Redis): Low latency, good for most workloads, requires Redis infrastructure. Excellent for web applications needing fast background jobs.
- AWS SQS: Managed, highly durable, scales to virtually any volume. At-least-once delivery with standard queues; exactly-once processing (within a deduplication window) with FIFO queues. No infrastructure to manage.
- RabbitMQ: Full AMQP support, flexible routing, dead letter exchanges built in. Good when you need complex routing or exchange patterns.
- PostgreSQL-based (Graphile Worker, pg-boss): Uses your existing database as the queue. Transactional job enqueuing (enqueue within the same transaction as your write). Lower throughput ceiling but eliminates an infrastructure dependency.
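The transactional enqueuing that makes the database-backed option attractive can be sketched as follows. SQLite stands in for PostgreSQL here, and the table and function names are illustrative: the point is that the job row commits atomically with the business write, so a rollback discards both.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, type TEXT)")

def place_order_with_job(order_id):
    # One transaction: either both rows commit or neither does,
    # so a job can never reference a write that was rolled back.
    with db:
        db.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        db.execute("INSERT INTO jobs (type) VALUES (?)", ("fulfil_order",))

place_order_with_job(1)
```

With a separate queue backend (Redis, SQS) this atomicity is not available directly, which is why patterns like the transactional outbox exist.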
Job Serialization
Jobs must be serializable to survive the queue:
- Serialize job arguments as JSON — avoid passing complex objects, class instances, or closures
- Store identifiers (user ID, order ID) rather than full objects — the data may change between enqueue and processing
- Include a job type or name field to route to the correct handler
- Version your job payloads so workers can handle jobs enqueued by older code during deployments
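The rules above combine into a simple job envelope: JSON-serializable arguments, identifiers instead of full objects, a type field for routing, and a payload version for rolling deployments. A minimal sketch, with illustrative field names:

```python
import json

def build_job(job_type, payload, version=1):
    # The envelope must round-trip through JSON -- no class
    # instances or closures, only plain data and identifiers.
    envelope = {"type": job_type, "version": version, "payload": payload}
    return json.dumps(envelope)

# Store the image ID, not the image: the data may change (or grow)
# between enqueue and processing.
raw = build_job("resize_image", {"image_id": 123, "width": 800})
decoded = json.loads(raw)
```

A worker reads `type` to pick a handler and checks `version` before interpreting the payload, so workers running older code can reject or adapt jobs they don't understand.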
Retry Policies
Not all failures are alike. Design retry strategies accordingly:
- Transient failures (network timeout, temporary 503): Retry with exponential backoff. Start at 1 second, double each attempt, cap at a reasonable maximum (e.g., 5 minutes).
- Permanent failures (invalid input, missing resource): Do not retry. Route to dead letter queue immediately.
- Ambiguous failures (database connection lost mid-operation): Retry, but the handler must be idempotent since the operation may have partially completed.
Set a maximum retry count (3-5 for most jobs) to prevent infinite retry loops. Add jitter to backoff intervals to avoid thundering herds when many jobs fail simultaneously.
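The backoff schedule described above (start at 1 second, double per attempt, cap at 5 minutes, with jitter) can be sketched as a small function. The "full jitter" variant shown here is one common choice, not the only one:

```python
import random

def backoff_seconds(attempt, base=1.0, cap=300.0, rng=random.random):
    # Exponential: 1 s, 2 s, 4 s, ... capped at 5 minutes.
    delay = min(base * (2 ** attempt), cap)
    # Full jitter: sleep a uniform random fraction of the delay,
    # so simultaneous failures don't retry in lockstep.
    return rng() * delay
```

Attempts 0 through 3 have un-jittered ceilings of 1, 2, 4, and 8 seconds; the `rng` parameter is injectable here only so the schedule is testable.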
Dead Letter Queues
When a job exhausts its retries, move it to a dead letter queue (DLQ):
- Store the original job payload, error messages, attempt history, and timestamps
- Monitor DLQ depth with alerts — a growing DLQ indicates a systemic problem
- Provide tooling to inspect dead letters, fix the underlying issue, and replay them
- Never auto-purge dead letters without human review
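A dead-letter record should carry everything needed for later inspection and replay. A sketch of the record shape, using a list as a stand-in for durable storage and illustrative field names:

```python
import time

dead_letters = []

def move_to_dlq(job, errors, attempts):
    # Keep the original payload plus full failure context so a human
    # can diagnose the issue and replay the job after fixing it.
    dead_letters.append({
        "type": job["type"],
        "payload": job["payload"],
        "errors": errors,            # one message per failed attempt
        "attempts": attempts,
        "dead_lettered_at": time.time(),
    })

move_to_dlq(
    {"type": "send_email", "payload": {"user_id": 7}},
    errors=["timeout", "timeout", "timeout"],
    attempts=3,
)
```

Replay tooling then just re-enqueues `type` and `payload`; the error history stays behind as an audit trail.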
Concurrency Control
Manage how many jobs process simultaneously:
- Set worker concurrency based on job characteristics: CPU-bound jobs should match core count; I/O-bound jobs can exceed it
- Use named queues or priority levels to isolate different job types
- Implement rate limiting within workers when they call external APIs
- Consider global concurrency locks for jobs that must not run in parallel (e.g., jobs modifying the same resource)
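The last bullet, a concurrency lock for jobs that must not run in parallel on the same resource, can be sketched with in-process locks. In a distributed deployment this would be a Redis lock or a database advisory lock instead; the names here are illustrative:

```python
import threading

resource_locks = {}
registry_guard = threading.Lock()

def run_exclusively(resource_id, handler):
    # One lock per resource; the registry itself needs guarding too.
    with registry_guard:
        lock = resource_locks.setdefault(resource_id, threading.Lock())
    if not lock.acquire(blocking=False):
        # Another worker holds the resource: requeue rather than block.
        return "requeued"
    try:
        handler()
        return "completed"
    finally:
        lock.release()
```

Returning "requeued" instead of blocking keeps the worker free to process other jobs while the contended one retries later.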
Job Priorities
When some jobs are more urgent than others:
- Use separate queues per priority level with weighted consumption
- Or use a single queue with priority values if the queue technology supports it
- Avoid starvation: ensure low-priority jobs still eventually process
- Critical system jobs (alerting, security) should bypass normal priority queuing entirely
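Weighted consumption with starvation protection can be sketched as a fixed schedule over named queues. The 3:1 weighting below is illustrative; the key property is that the low-priority queue always gets a turn:

```python
import itertools
from collections import deque

queues = {"high": deque(), "low": deque()}
# Drain "high" three times for every "low" slot -- weighted, not absolute,
# so low-priority jobs still eventually process.
weights = ["high", "high", "high", "low"]
schedule = itertools.cycle(weights)

def next_job():
    # Scan one full schedule period so an empty queue's slot
    # falls through to whichever queue has work.
    for _ in range(len(weights)):
        name = next(schedule)
        if queues[name]:
            return queues[name].popleft()
    return None
```

An absolute-priority scheme (always drain "high" first) is simpler but starves "low" under sustained high-priority load, which is exactly what the weighting avoids.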
Progress Tracking
For long-running jobs, provide visibility:
- Update a progress field in the job metadata as processing advances
- Emit progress events that the frontend can poll or subscribe to via WebSocket
- Store intermediate state so interrupted jobs can resume rather than restart
- Report meaningful progress units (records processed, files generated) rather than bare percentages
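Tracking progress in meaningful units with a resume point can be sketched as follows. A dict stands in for persisted job metadata, and the field names are illustrative:

```python
job_state = {"records_done": 0, "last_record_id": None}

def process_batch(records):
    for record_id in records:
        # ... do the actual work for record_id here ...
        # Progress in real units (records), not a percentage.
        job_state["records_done"] += 1
        # Intermediate state: if the job is interrupted, a restart
        # can resume after last_record_id instead of starting over.
        job_state["last_record_id"] = record_id

process_batch([101, 102, 103])
```

A frontend polling this state can render "3 records processed" directly, and a restarted worker can skip everything up to `last_record_id`.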
Graceful Shutdown
Workers must shut down cleanly:
- On receiving a shutdown signal (SIGTERM), stop accepting new jobs
- Allow in-progress jobs to complete within a timeout window
- If the timeout expires, release the job back to the queue for another worker
- Never kill a worker mid-job without returning the job to the queue
Best Practices
- Make every job handler idempotent — at-least-once delivery means duplicate processing is possible
- Enqueue jobs within the same database transaction as the triggering write when possible
- Keep job payloads small — store large data in object storage and pass references
- Use separate queues for jobs with different latency requirements
- Monitor queue depth, processing time, and failure rate as key operational metrics
- Test job handlers with simulated failures to verify retry behavior
- Log job lifecycle events (enqueued, started, completed, failed, retried) with correlation IDs
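The first bullet, idempotent handlers, is worth a concrete sketch: record processed job IDs so a duplicate delivery becomes a no-op. The set here stands in for a durable store (a database unique constraint or Redis `SET NX` in practice), and the names are illustrative:

```python
processed_ids = set()
emails_sent = []

def handle_send_email(job_id, user_id):
    # At-least-once delivery means this handler may run twice
    # for the same job; the ID check makes the repeat harmless.
    if job_id in processed_ids:
        return "skipped"
    emails_sent.append(user_id)   # the side effect happens once
    processed_ids.add(job_id)
    return "sent"

handle_send_email("job-1", 42)
handle_send_email("job-1", 42)    # simulated redelivery
```

In a real system the "check then record" pair must itself be atomic (e.g. an insert that fails on a duplicate key), otherwise two concurrent deliveries can race past the check.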
Anti-Patterns
- The Mega Job: A single job that does 15 things — if step 12 fails, steps 1-11 must be repeated
- The Optimistic Skip: Not implementing dead letter handling because "jobs rarely fail"
- The Hard Kill: Sending SIGKILL to workers instead of SIGTERM, losing in-progress work
- The Unbounded Retry: Retrying forever without a maximum attempt count or dead letter destination
- The Fat Payload: Passing megabytes of data in the job payload instead of a reference to stored data
- The Fire and Forget: Enqueuing jobs without any monitoring of queue depth or processing success rates
- The Implicit Dependency: Job handlers that assume specific ordering of other jobs without explicit synchronization
- The Shared Mutable State: Multiple concurrent workers modifying the same resource without coordination
Related Skills
Abstraction Control
Avoiding over-abstraction and unnecessary complexity by choosing the simplest solution that solves the actual problem.
Accessibility Implementation
Making web content accessible through ARIA attributes, semantic HTML, keyboard navigation, screen reader support, color contrast, focus management, and WCAG compliance.
API Design Patterns
Designing and implementing clean APIs with proper REST conventions, pagination, versioning, authentication, and backward compatibility.
API Integration
Integrating with external APIs effectively — reading API docs, authentication patterns, error handling, rate limiting, retry with backoff, response validation, SDK vs raw HTTP decisions, and API versioning.
Assumption Validation
Detecting and validating assumptions before acting on them to prevent cascading errors from wrong guesses.
Authentication Implementation
Implementing authentication flows correctly including OAuth 2.0/OIDC, JWT handling, session management, password hashing, MFA, token refresh, and CSRF protection.