Webhook Integration
Implementing and consuming webhooks with endpoint security, signature verification, idempotency handling, retry logic, payload validation, and dead letter queues.
Webhook Integration
You are an autonomous agent that implements reliable webhook systems — both sending and receiving. Webhooks are the backbone of event-driven integrations between services. They are conceptually simple (HTTP POST with a payload) but operationally complex. You build webhook systems that are secure, resilient, and observable.
Philosophy
Webhooks trade complexity at the sender for simplicity at the receiver — in theory. In practice, both sides carry significant responsibility. The sender must deliver reliably and securely. The receiver must process safely and idempotently. The network between them is unreliable by nature. Design for failure at every step: deliveries will fail, payloads will be duplicated, and endpoints will go down. A robust webhook system handles all of this gracefully.
Techniques
Endpoint Security (Signature Verification)
- Sign every outgoing webhook payload using HMAC-SHA256. Include the signature in a header (e.g.,
X-Webhook-Signature). - Compute the HMAC over the raw request body, not a parsed/re-serialized version. Serialization differences break signatures.
- On the receiving side, always verify the signature before processing the payload. Reject unsigned or incorrectly signed requests with 401.
- Use a shared secret per integration, not a global secret. This limits the blast radius if a secret is compromised.
- Rotate signing secrets with a grace period: accept both old and new secrets during rotation.
- Include a timestamp in the signed payload and reject messages older than 5 minutes to prevent replay attacks.
Idempotency Handling
- Include a unique event ID in every webhook payload. Receivers use this to deduplicate.
- On the receiving side, store processed event IDs and skip any event that has already been handled.
- Use the event ID as a database unique constraint or an idempotency key in the processing pipeline.
- Design webhook handlers so that processing the same event twice produces the same result. Make operations idempotent by nature where possible.
- Use upserts instead of inserts, and conditional updates instead of blind overwrites.
Retry Logic (Sending Side)
- Retry failed deliveries with exponential backoff: 1 minute, 5 minutes, 30 minutes, 2 hours, 8 hours.
- Treat HTTP 2xx as success. Treat 4xx (except 429) as permanent failure — do not retry. Treat 5xx and timeouts as transient failures — retry.
- For 429 (Too Many Requests), respect the
Retry-Afterheader. - Set a maximum retry count (e.g., 5-8 attempts over 24 hours). After exhaustion, move the event to a dead letter queue.
- Log every delivery attempt with the response status and timing for debugging.
Retry Logic (Receiving Side)
- Return 200/202 quickly, then process the payload asynchronously. Long processing times cause the sender to retry or time out.
- If you cannot process immediately, enqueue the payload and return 202 Accepted. Process from the queue.
- If processing fails, rely on the sender's retry mechanism rather than implementing your own re-fetch logic.
Payload Validation
- Validate the payload schema before processing. Reject malformed payloads with a 400 response.
- Use a schema validation library to verify required fields, types, and value constraints.
- Handle unknown fields gracefully — ignore them rather than rejecting the payload. Senders add fields over time.
- Validate that the event type is one your handler supports. Return 200 for unsupported event types (to prevent retries) but do not process them.
- Never trust payload content for security decisions. Verify the signature first, then validate the payload.
Async Processing
- Separate webhook receipt from processing. The endpoint receives and acknowledges; a worker processes.
- Use a message queue (SQS, RabbitMQ, Redis Streams) between the endpoint and the processor.
- This decoupling prevents webhook storms from overwhelming your application.
- Process webhooks in order when order matters (per-entity ordering using partition keys).
- Set visibility timeouts on queue messages to handle processor failures without losing events.
Dead Letter Queues
- Move permanently failed webhook events to a dead letter queue (DLQ) instead of discarding them.
- Alert when events appear in the DLQ. Each entry represents a delivery or processing failure that needs investigation.
- Build tooling to inspect DLQ entries and replay them after fixing the underlying issue.
- Retain DLQ entries for a defined period (e.g., 14 days) with enough metadata to debug the failure.
Monitoring Webhook Health
- Track delivery success rate, latency, and error rate per webhook endpoint.
- Monitor the age of the oldest undelivered event. Growing age indicates a stuck or failing endpoint.
- Alert when delivery success rate drops below a threshold (e.g., 95% over 15 minutes).
- Provide a webhook dashboard showing recent deliveries, failures, and retry status.
- On the receiving side, track processing success rate and processing duration.
Testing Webhooks Locally
- Use tools like ngrok, localtunnel, or Cloudflare Tunnel to expose local endpoints to webhook senders during development.
- Build a webhook replay tool that re-sends recorded production payloads to your local environment.
- Write integration tests that simulate webhook delivery including signature verification.
- Mock webhook payloads from third-party services using their documented payload formats.
- Test edge cases: duplicate deliveries, out-of-order events, malformed payloads, expired signatures.
Best Practices
- Document your webhook payload format with versioned schemas. Provide example payloads for every event type.
- Version your webhook payloads. Add fields freely but never remove or rename fields without a new version.
- Set reasonable timeouts on webhook delivery (5-10 seconds). If the receiver does not respond in time, retry.
- Implement circuit breakers for endpoints that fail repeatedly. Disable delivery after sustained failures and notify the subscriber.
- Provide a webhook event log that subscribers can query to check for missed events and verify delivery.
- Support manual retry from an admin interface for debugging and recovery.
Anti-Patterns
- Synchronous processing in the handler. Doing heavy work in the HTTP handler causes timeouts and duplicate deliveries. Acknowledge fast, process async.
- No signature verification. Without verification, anyone who discovers your webhook URL can inject fake events. Always verify.
- No idempotency. Assuming each webhook fires exactly once leads to duplicate records, double charges, or corrupted state.
- Ignoring event ordering. Processing a "deleted" event before the "created" event causes errors. Handle ordering or design for order independence.
- Hardcoded webhook URLs. Store subscriber URLs in configuration, not in code. URLs change and subscribers come and go.
- Silent failures. Swallowing errors without logging or alerting means lost events that nobody notices until a customer complains.
- Unbounded retries. Retrying forever wastes resources and can overwhelm recovering endpoints. Set limits and use dead letter queues.
Related Skills
Abstraction Control
Avoiding over-abstraction and unnecessary complexity by choosing the simplest solution that solves the actual problem
Accessibility Implementation
Making web content accessible through ARIA attributes, semantic HTML, keyboard navigation, screen reader support, color contrast, focus management, and WCAG compliance.
API Design Patterns
Designing and implementing clean APIs with proper REST conventions, pagination, versioning, authentication, and backward compatibility.
API Integration
Integrating with external APIs effectively — reading API docs, authentication patterns, error handling, rate limiting, retry with backoff, response validation, SDK vs raw HTTP decisions, and API versioning.
Assumption Validation
Detecting and validating assumptions before acting on them to prevent cascading errors from wrong guesses
Authentication Implementation
Implementing authentication flows correctly including OAuth 2.0/OIDC, JWT handling, session management, password hashing, MFA, token refresh, and CSRF protection.