
Webhook Design

Webhook design and delivery patterns for reliable, secure event-driven API integrations


Webhook Design — API Design

You are an expert in webhook design and delivery patterns for designing robust APIs.

Overview

Webhooks invert the typical request-response flow: instead of consumers polling for changes, your API pushes events to consumer-registered URLs. This enables real-time integrations but introduces challenges around reliability, security, and ordering.

Core Concepts

Event Payload Structure

Design a consistent envelope that wraps every event type.

{
  "id": "evt_abc123",
  "type": "order.completed",
  "created_at": "2025-09-15T14:30:00Z",
  "api_version": "2025-01-01",
  "data": {
    "id": "order_456",
    "total": 99.00,
    "currency": "USD",
    "customer_id": "cust_789"
  }
}
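The envelope above could be produced by a small builder so that every event type gets the same outer fields. This is a minimal sketch: the `make_event` helper, the `API_VERSION` constant, and the UUID-based ID scheme are assumptions for illustration, not part of the original design.

```python
import uuid
from datetime import datetime, timezone

API_VERSION = "2025-01-01"  # assumed: how the version is pinned is up to your API


def make_event(event_type: str, data: dict) -> dict:
    """Wrap an event-specific payload in the shared envelope (hypothetical helper)."""
    return {
        "id": f"evt_{uuid.uuid4().hex}",  # unique ID that consumers can deduplicate on
        "type": event_type,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "api_version": API_VERSION,
        "data": data,  # only this part varies between event types
    }


event = make_event("order.completed", {"id": "order_456", "total": 99.00})
```

Keeping the type-specific fields under `data` lets consumers route on `type` and deduplicate on `id` without knowing the shape of every payload.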

Subscription Registration

Let consumers register endpoints and choose which event types they want to receive.

POST /v1/webhooks HTTP/1.1
Content-Type: application/json

{
  "url": "https://consumer.example.com/hooks",
  "events": ["order.completed", "order.refunded"],
  "secret": "whsec_..."
}
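On the server side, a registration request like the one above usually needs validation before it is stored, and each outgoing event needs to be matched against the stored event list. The following is a sketch under assumptions: `KNOWN_EVENTS`, `validate_subscription`, and `is_subscribed` are hypothetical names, and requiring HTTPS is a common policy rather than something the text mandates.

```python
from urllib.parse import urlparse

KNOWN_EVENTS = {"order.completed", "order.refunded"}  # assumed event catalog


def validate_subscription(body: dict) -> list[str]:
    """Return validation errors; an empty list means the request is acceptable."""
    errors = []
    url = urlparse(str(body.get("url") or ""))
    if url.scheme != "https" or not url.netloc:
        errors.append("url must be an absolute https URL")
    events = body.get("events") or []
    if not events:
        errors.append("events must list at least one event type")
    unknown = sorted(set(events) - KNOWN_EVENTS)
    if unknown:
        errors.append(f"unknown event types: {', '.join(unknown)}")
    return errors


def is_subscribed(subscription: dict, event_type: str) -> bool:
    """True when the endpoint asked to receive this event type."""
    return event_type in subscription["events"]
```

Rejecting unknown event types at registration time surfaces typos immediately instead of producing a subscription that silently never fires.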

Signature Verification

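Sign every payload so consumers can verify that it came from your API and was not tampered with. In the header below, t is the Unix timestamp at signing time and v1 is the hex-encoded HMAC-SHA256 of the string "<timestamp>.<raw request body>", keyed with the endpoint's secret.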

Webhook-Signature: t=1694789400,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd

Implementation Patterns

HMAC Signature Generation

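import hashlib
import hmac
import time

def sign_payload(payload_bytes: bytes, secret: str) -> str:
    """Return a Webhook-Signature header value for the exact bytes being sent."""
    timestamp = int(time.time())
    # Sign "<timestamp>.<raw body>" so the timestamp is covered by the MAC.
    # Concatenating bytes avoids a decode/encode round trip on the payload.
    signed_content = f"{timestamp}.".encode() + payload_bytes
    signature = hmac.new(
        secret.encode(),
        signed_content,
        hashlib.sha256
    ).hexdigest()
    return f"t={timestamp},v1={signature}"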

Consumer-Side Verification

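import hashlib
import hmac
import time

def verify_webhook(payload: bytes, header: str, secret: str, tolerance_sec: int = 300) -> None:
    """Raise ValueError unless header is a fresh, valid signature for payload."""
    try:
        parts = dict(p.split("=", 1) for p in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        raise ValueError("Malformed Webhook-Signature header") from None

    # Reject stale timestamps so a captured request cannot be replayed later.
    if abs(time.time() - timestamp) > tolerance_sec:
        raise ValueError("Timestamp outside tolerance; possible replay attack")

    # Recompute the MAC over the raw body bytes, exactly as the sender signed them.
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.".encode() + payload,
        hashlib.sha256
    ).hexdigest()

    # compare_digest runs in constant time, so it does not leak the signature via timing.
    if not hmac.compare_digest(expected, received):
        raise ValueError("Invalid signature")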
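To see signing and verification work together, the sketch below restates both sides in compact form so that it runs on its own, then checks a fresh payload, a tampered payload, and a stale signature. The names `sign` and `verify`, the boolean return value, and the `whsec_test` secret are assumptions made for this illustration.

```python
import hashlib
import hmac
import time


def sign(payload: bytes, secret: str, timestamp: int) -> str:
    """Build the header value: HMAC-SHA256 over b"<timestamp>." + payload."""
    mac = hmac.new(secret.encode(), f"{timestamp}.".encode() + payload, hashlib.sha256)
    return f"t={timestamp},v1={mac.hexdigest()}"


def verify(payload: bytes, header: str, secret: str, tolerance_sec: int = 300) -> bool:
    """Return True only for a recent header whose MAC matches the payload."""
    parts = dict(p.split("=", 1) for p in header.split(","))
    timestamp = int(parts["t"])
    if abs(time.time() - timestamp) > tolerance_sec:
        return False  # outside the tolerance window: treat as a replay
    expected = sign(payload, secret, timestamp).split("v1=", 1)[1]
    return hmac.compare_digest(expected, parts["v1"])


secret = "whsec_test"  # placeholder secret, for illustration only
body = b'{"id":"evt_abc123","type":"order.completed"}'
header = sign(body, secret, int(time.time()))

fresh_ok = verify(body, header, secret)
tampered_ok = verify(body.replace(b"completed", b"refunded"), header, secret)
stale_ok = verify(body, sign(body, secret, int(time.time()) - 3600), secret)
```

Only `fresh_ok` is true. Verification must run against the raw request bytes; re-serializing parsed JSON can change whitespace or key order and break the signature.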

Retry with Exponential Backoff

Retry failed deliveries (timeouts, connection errors, and 5xx responses) with increasing delays, and stop after a fixed number of attempts.

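import asyncio
import json

import httpx  # assumed HTTP client; the original snippet left `http` undefined

RETRY_SCHEDULE = [60, 300, 1800, 7200, 28800]  # seconds to wait before each retry

async def deliver_webhook(event, endpoint):
    # record_delivery and disable_endpoint_if_threshold are application-defined helpers.
    # Serialize once so the signed bytes are exactly the bytes sent.
    body = json.dumps(event).encode()
    async with httpx.AsyncClient() as http:
        # One immediate attempt, then one retry per scheduled delay.
        for delay in [0, *RETRY_SCHEDULE]:
            await asyncio.sleep(delay)
            try:
                response = await http.post(
                    endpoint.url,
                    content=body,
                    headers={
                        "Content-Type": "application/json",
                        "Webhook-Signature": sign_payload(body, endpoint.secret),
                    },
                    timeout=30,
                )
            except httpx.HTTPError:
                continue  # network error or timeout: try again after the next delay
            if 200 <= response.status_code < 300:
                await record_delivery(event["id"], endpoint.id, "delivered")
                return
            if response.status_code < 500:
                # 3xx/4xx: the endpoint rejected the request, so do not retry
                await record_delivery(event["id"], endpoint.id, "rejected")
                return
    await record_delivery(event["id"], endpoint.id, "failed")
    await disable_endpoint_if_threshold(endpoint)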
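A fixed schedule is easy to reason about, but the delays can also be computed. The sketch below is one possible variation rather than the document's method: `backoff_delays` generates capped exponential delays with random jitter (so many failing endpoints do not all retry at the same instant), and `should_retry` additionally treats 408 and 429 as transient. Both helpers and their default parameters are assumptions.

```python
import random


def backoff_delays(base: float = 60, factor: float = 5, cap: float = 28800,
                   attempts: int = 5, jitter: float = 0.2, rng=random):
    """Yield one delay in seconds per retry: base * factor**n, capped, with +/- jitter."""
    for n in range(attempts):
        delay = min(cap, base * factor ** n)
        spread = delay * jitter
        yield delay + rng.uniform(-spread, spread)


def should_retry(status_code: int) -> bool:
    """Retry server errors, plus 408 and 429, which usually signal a transient condition."""
    return status_code >= 500 or status_code in (408, 429)


# With jitter disabled the sequence is deterministic.
plain = list(backoff_delays(jitter=0))
```

With the defaults and no jitter, the delays are 60, 300, 1500, 7500, and 28800 seconds, which is close to the fixed schedule above.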

Idempotency on the Consumer Side

Consumers should deduplicate by event ID to handle redeliveries safely.

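async def handle_webhook(request):
    # `request` and `Response` follow a Starlette-style interface (assumed);
    # already_processed, process_event, and mark_processed are application-defined.
    event = await request.json()
    if await already_processed(event["id"]):
        return Response(status_code=200)  # Acknowledge the redelivery but skip the work
    await process_event(event)
    # Mark only after success so that a failed attempt is retried by the sender.
    await mark_processed(event["id"])
    return Response(status_code=200)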
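A check followed by a later write leaves a window in which two concurrent redeliveries can both pass the check. One way to close it is to claim the event ID atomically through a unique constraint. The sketch below uses an in-memory SQLite table purely for illustration; the table name and the `claim_event` and `handle` helpers are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumed store; any database with a unique constraint works
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")


def claim_event(event_id: str) -> bool:
    """Atomically record the event ID; return True only for the first caller."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cur.rowcount == 1  # 0 means the row already existed


def handle(event: dict, process) -> str:
    """Process an event at most once per ID (hypothetical handler core)."""
    if not claim_event(event["id"]):
        return "duplicate"  # still acknowledge with 2xx, but skip the work
    process(event)
    return "processed"
```

The trade-off is that claiming before processing means a crash in `process` leaves the ID claimed. A production version would release the claim on failure or record a status column alongside the ID.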

Best Practices

  • Respond with 2xx within 5 seconds and process the event asynchronously; if the consumer takes too long, the sender may time out and retry.
  • Include a webhook event log or dashboard where consumers can inspect recent deliveries, replay events, and debug failures.
  • Disable endpoints automatically after repeated failures and notify the consumer so broken integrations do not silently accumulate retry load.
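The automatic-disable practice can be sketched as a counter of consecutive failures per endpoint. The `EndpointHealth` class and the threshold of 5 are assumptions; a real system would persist this state and send the notification when `record` returns True.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 5  # assumed value; tune it to your retry schedule


@dataclass
class EndpointHealth:
    consecutive_failures: int = 0
    disabled: bool = False

    def record(self, delivered: bool) -> bool:
        """Update state after a delivery; return True if this call disabled the endpoint."""
        if delivered:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        if not self.disabled and self.consecutive_failures >= FAILURE_THRESHOLD:
            self.disabled = True  # the caller should now notify the consumer
            return True
        return False
```

Counting consecutive failures rather than total failures keeps an endpoint with occasional blips enabled, while one that is fully down is disabled after a bounded number of events.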

Common Pitfalls

  • Delivering events without a signature, leaving consumers unable to verify authenticity and vulnerable to spoofed payloads.
  • Assuming exactly-once delivery; network issues and retries make at-least-once the realistic guarantee, so consumers must be idempotent.

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.