Idempotency Audit
Purpose
Verify that re-running any action produces safe, consistent results. In distributed systems, retries are inevitable: clients time out, networks drop, queues redeliver, webhooks fire twice. If your operations are not idempotent, every retry is a potential data corruption event.
This audit systematically checks every mutation endpoint, background job, and event handler for safe re-execution.
Scope
| Category | What We Test |
|---|---|
| API endpoints | POST/PUT/PATCH re-submission with same payload |
| Background jobs | Queue redelivery, duplicate job execution |
| Webhooks / callbacks | Duplicate event delivery from providers |
| Billing / payments | Double-charge prevention |
| Notifications | Duplicate email/push/SMS prevention |
| Resource creation | Duplicate record prevention |
| File operations | Duplicate upload/processing prevention |
| State transitions | Re-applying same transition |
Risk Pattern Table
| Pattern | What It Hits | Risk | Symptom |
|---|---|---|---|
| POST without idempotency key | API, Data | HIGH | Client timeout + retry creates duplicate record |
| Queue without deduplication | Jobs | HIGH | Failed ack -> redelivery -> job runs twice |
| Webhook without event ID tracking | API, Data | HIGH | Provider retries webhook, side effects execute twice |
| Payment without charge idempotency | Billing | CRITICAL | Timeout + retry = customer charged twice |
| Email/notification without dedupe | UX | MEDIUM | User receives duplicate notifications |
| Counter increment on retry | Data | HIGH | Retry increments counter twice; stats are wrong |
| File creation on retry | Storage | MEDIUM | Duplicate files created, storage waste |
| DB insert without unique constraint | Data | HIGH | Duplicate rows; breaks assumptions downstream |
| Upsert that re-triggers side effects | Data, Billing | HIGH | Upsert succeeds (no duplicate row) but side effects fire again |
| Callback sets status unconditionally | State | MEDIUM | Late callback overwrites newer status |
Concrete Test Cases
TEST-ID-001: Repeated POST with Same Payload
Objective: Verify that sending the same create request twice does not create duplicate resources.
Steps:
- Generate an idempotency key (UUID).
- Send POST /api/resources with the key and a creation payload.
- Record the response (201 Created, resource ID).
- Send the exact same POST with the same idempotency key.
- Record the response.
Pass Criteria:
- Second request returns the same resource ID as the first.
- Second request returns 200 (cached response) or 201 (same resource).
- Only one resource exists in the database.
- No side effects executed twice (no duplicate email, no duplicate job).
Fail Criteria:
- Two resources created with identical data.
- Second request returns 409 Conflict without returning the original resource.
- Side effects (email, webhook, job) triggered twice.
Test Without Idempotency Key:
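# If the endpoint does NOT support idempotency keys, test natural deduplication
# by sending the same request twice in parallel (BASE_URL is your API's base URL):
curl -X POST "$BASE_URL/api/projects" -H "Content-Type: application/json" -d '{"name": "Test Project"}' &
curl -X POST "$BASE_URL/api/projects" -H "Content-Type: application/json" -d '{"name": "Test Project"}' &
wait
# How many projects named "Test Project" exist?
# If 2: FAIL (no deduplication mechanism)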
TEST-ID-002: Webhook Replay
Objective: Verify that replaying a webhook event does not cause duplicate side effects.
Steps:
- Capture a webhook payload from a provider (payment processor, AI service, etc.).
- Send it to the webhook endpoint.
- Verify the side effect (status update, credit applied, etc.).
- Send the exact same webhook payload again.
- Verify no duplicate side effect.
Pass Criteria:
- First delivery: side effect executes, 200 returned.
- Second delivery: no side effect, 200 returned (not 4xx error).
- Third delivery: same as second.
- Event ID is recorded and checked before processing.
Implementation Check:
[ ] Webhook handler extracts event ID from payload
[ ] Event ID checked against processed-events store BEFORE processing
[ ] Processing is wrapped in: if not processed, process and record
[ ] Processed-events store has TTL (e.g., 30 days) to prevent unbounded growth
[ ] Response is 200 even for duplicate (so provider stops retrying)
Webhook Idempotency Template:
from datetime import timedelta

# event_store, process_event, and Response are application-specific.
async def handle_webhook(request):
    event_id = request.json['event_id']
    # Check if already processed
    if await event_store.exists(event_id):
        return Response(status=200)  # Acknowledge but don't reprocess
    # Process the event, then record it. If process_event raises, the
    # event ID is NOT recorded, which allows genuine retries to succeed.
    await process_event(request.json)
    await event_store.record(event_id, ttl=timedelta(days=30))
    return Response(status=200)
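The template above checks the store and records the event in two separate steps, so two concurrent deliveries of the same event could both pass the check. One way to close that window is to claim the event ID atomically before processing and release the claim on failure. The sketch below is a minimal in-memory illustration; `ProcessedEvents`, `claim`, `release`, and `handle_event` are hypothetical names, and a real store would use something like a unique constraint or a set-if-absent operation.

```python
import threading


class ProcessedEvents:
    """In-memory stand-in for a processed-events store (hypothetical)."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, event_id):
        """Atomically record event_id; return False if it was already claimed."""
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True

    def release(self, event_id):
        """Forget a claim so that a provider retry can be processed."""
        with self._lock:
            self._seen.discard(event_id)


def handle_event(store, event, process):
    """Run process(event) at most once per event ID; return an HTTP-style status."""
    event_id = event["event_id"]
    if not store.claim(event_id):
        return 200  # duplicate delivery: acknowledge, do not reprocess
    try:
        process(event)
    except Exception:
        store.release(event_id)  # processing failed: let the retry through
        raise
    return 200
```

A duplicate that arrives while the first delivery is still running is acknowledged with 200 here, even though that first delivery could still fail. Whether that trade-off is acceptable depends on the provider's retry behavior.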
TEST-ID-003: Client Timeout + Retry
Objective: Verify that a client timeout followed by retry does not cause duplicate operations.
Steps:
- Send a request to a slow endpoint (e.g., generation that takes 30s).
- Set client timeout to 5 seconds.
- Client times out; request may or may not have been processed server-side.
- Client retries with same idempotency key.
- Verify outcome.
Pass Criteria:
- If original request completed: retry returns the completed result.
- If original request is still processing: retry returns "in progress" status.
- If original request failed: retry re-executes (idempotency key is cleared on failure).
- No duplicate resources, charges, or side effects.
Scenarios:
Scenario A: Server completed before retry
Request 1: sent -> server processes -> client timeout -> server commits
Request 2: sent -> server checks idempotency key -> returns cached result
Result: One resource, one response. CORRECT.
Scenario B: Server still processing when retry arrives
Request 1: sent -> server starts processing -> client timeout
Request 2: sent -> server detects in-progress operation -> returns 409 or poll URL
Result: One operation, client waits. CORRECT.
Scenario C: Server failed before retry
Request 1: sent -> server fails -> idempotency key NOT stored (or marked failed)
Request 2: sent -> server re-executes -> succeeds
Result: One resource from retry. CORRECT.
TEST-ID-004: Double Billing Prevention
Objective: Verify that payment operations cannot be duplicated.
Steps:
- Initiate a payment/charge operation.
- Simulate: client timeout, webhook retry, server restart.
- Verify only one charge exists.
Pass Criteria:
- Payment processor called with idempotency key.
- Webhook that confirms payment is idempotent (checked by event ID).
- User's balance/credits updated exactly once.
- Audit log shows one charge, not two.
Critical Verification:
[ ] Payment API calls include provider-level idempotency key
- Stripe: Idempotency-Key header
- PayPal: PayPal-Request-Id header
[ ] Credit/balance updates use atomic operations
- NOT: read balance, add amount, write balance (race condition)
- YES: one transaction that records charge_id as processed (unique constraint) and runs UPDATE ... SET balance = balance + amount
[ ] Payment webhook handler checks event ID before applying credits
[ ] Refund operations are also idempotent
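The atomic credit update in the checklist above can be sketched with SQLite from the Python standard library. The table names (`accounts`, `processed_charges`) and the helpers `make_billing_db` and `apply_credit` are assumptions for illustration only. The point is that recording the charge ID and changing the balance happen in one transaction.

```python
import sqlite3


def make_billing_db(user_id):
    """Create an in-memory database with one account (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE processed_charges (charge_id TEXT PRIMARY KEY);
        """
    )
    conn.execute("INSERT INTO accounts VALUES (?, 0)", (user_id,))
    conn.commit()
    return conn


def apply_credit(conn, charge_id, user_id, amount_cents):
    """Apply a credit once per charge_id. Return True if applied, False if duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on any error
            # The primary key rejects a charge_id that was already applied.
            conn.execute(
                "INSERT INTO processed_charges (charge_id) VALUES (?)", (charge_id,)
            )
            # The increment happens inside the database, not read-modify-write.
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
                (amount_cents, user_id),
            )
            if cur.rowcount != 1:
                raise LookupError(f"unknown account {user_id}")
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `apply_credit` twice with the same `charge_id` changes the balance once. The second call fails on the primary key and rolls back.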
TEST-ID-005: Queue Job Redelivery
Objective: Verify that a job redelivered by the queue system does not execute twice.
Steps:
- Enqueue a job.
- Worker picks up the job and processes it.
- Simulate: worker crashes before acknowledging the job.
- Queue redelivers the job to another worker.
- Second worker processes the job.
- Verify only one result exists.
Pass Criteria:
- Output exists exactly once (not duplicated).
- External API called exactly once (or second call is a no-op due to provider idempotency).
- Database records created exactly once.
- Billing/metering reflects one execution, not two.
Implementation Patterns:
Pattern 1: Idempotent job design
- Job writes results keyed by job ID
- Re-execution overwrites (upsert) instead of creating new
- External calls use job ID as idempotency key
Pattern 2: Deduplication at queue level
- Queue rejects a message whose deduplication ID it has already seen
- SQS FIFO queues: MessageDeduplicationId (5-minute deduplication window)
- Redis: SET NX with job ID before processing
Pattern 3: Exactly-once processing with transactions
- Process job AND mark as complete in same DB transaction
- If transaction fails, both roll back (safe to retry)
- If transaction succeeds, re-delivery finds job already complete
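Pattern 3 can be sketched as follows, assuming a hypothetical `job_results` table keyed by job ID. Writing the result row is what marks the job complete, so a redelivered job finds the row and skips the work.

```python
import sqlite3


def make_jobs_db():
    """Create an in-memory database with a job_results table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE job_results (job_id TEXT PRIMARY KEY, result TEXT NOT NULL);"
    )
    return conn


def run_job_once(conn, job_id, compute):
    """Run compute() unless job_id already has a result. Return True if it ran."""
    done = conn.execute(
        "SELECT 1 FROM job_results WHERE job_id = ?", (job_id,)
    ).fetchone()
    if done:
        return False  # redelivery after a lost ack: nothing to do
    result = compute()
    with conn:
        # Keyed by job ID, so even a racing second worker overwrites
        # the same row instead of creating a new one.
        conn.execute(
            "INSERT OR REPLACE INTO job_results (job_id, result) VALUES (?, ?)",
            (job_id, result),
        )
    return True
```

If the worker crashes inside `compute()`, no row is written and the redelivered job runs again. If it crashes after the commit but before the ack, the redelivery is a no-op. Two workers running the same job at the same moment could both call `compute()`, so external calls inside it still need the job ID as their idempotency key.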
TEST-ID-006: Notification Deduplication
Objective: Verify that users do not receive duplicate notifications.
Steps:
- Trigger an action that sends a notification (email, push, SMS).
- Retry the same action.
- Check notification delivery.
Pass Criteria:
- One notification sent (not two).
- Notification keyed by: (user_id, notification_type, entity_id, event_id).
- Deduplication window appropriate for notification type (e.g., 1 hour for email).
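A deduplication window keyed by (user_id, notification_type, entity_id, event_id) might look like the sketch below. `NotificationDeduper` is a hypothetical in-memory class. A production version would keep the keys in a shared store with expiry so that all workers see the same state.

```python
import time


class NotificationDeduper:
    """Suppress repeats of the same notification inside a time window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_sent = {}

    def should_send(self, user_id, notification_type, entity_id, event_id):
        """Return True and start a new window, or False if one is still open."""
        key = (user_id, notification_type, entity_id, event_id)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._window:
            return False
        self._last_sent[key] = now
        return True
```

The caller sends the notification only when `should_send` returns True, for example `NotificationDeduper(3600)` for the one-hour email window mentioned above.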
TEST-ID-007: Re-applying Same State Transition
Objective: Verify that applying a state transition that has already been applied is safe.
Steps:
- Transition a job from "processing" to "completed".
- Attempt to transition the same job from "processing" to "completed" again.
Pass Criteria:
- Second attempt is rejected (job is no longer in "processing" state).
- No side effects re-executed (no duplicate "completed" webhook, email, etc.).
- Error message is clear: "Job is already in 'completed' state."
- Alternatively: second attempt returns success idempotently (acknowledges completion).
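One common way to get the rejection behavior is a compare-and-set update: the transition applies only if the row is still in the expected source state. The sketch below uses SQLite. The `jobs` table and the `transition` helper are illustrative names, not part of any particular system.

```python
import sqlite3


def make_state_db():
    """Create an in-memory database with one job in 'processing' (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL);"
    )
    conn.execute("INSERT INTO jobs VALUES ('j1', 'processing')")
    conn.commit()
    return conn


def transition(conn, job_id, from_status, to_status):
    """Move the job only if it is still in from_status. Return True if it moved."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = ? WHERE id = ? AND status = ?",
            (to_status, job_id, from_status),
        )
    # rowcount is 0 when the job was not in from_status (or does not exist).
    return cur.rowcount == 1
```

Side effects such as a "completed" webhook should fire only when `transition` returns True. The same guard also stops a late callback from overwriting a newer status.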
Making Non-Idempotent Operations Safe
Pattern: Idempotency Key Store
Request -> Extract idempotency key -> Check store:
Found + completed: Return cached response
Found + in-progress: Return 409 or wait
Found + failed: Clear entry, re-execute
Not found: Record key as in-progress, execute, store response
Store schema:
idempotency_key: string (primary key)
status: 'in_progress' | 'completed' | 'failed'
response_code: int
response_body: json
created_at: timestamp
expires_at: timestamp (TTL for cleanup)
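The decision flow above can be sketched in a few lines. This is a single-process, in-memory illustration: `IdempotencyStore`, `Entry`, and `execute` are hypothetical names, TTL cleanup is omitted, and a real store would need an atomic "insert if absent" to be safe across concurrent requests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    status: str  # 'in_progress' | 'completed' | 'failed'
    response_code: Optional[int] = None
    response_body: Any = None


class IdempotencyStore:
    def __init__(self):
        self._entries: Dict[str, Entry] = {}

    def execute(self, key: str, operation: Callable[[], Tuple[int, Any]]) -> Tuple[int, Any]:
        """Run operation once per key; replay the stored response on retries."""
        entry = self._entries.get(key)
        if entry is not None:
            if entry.status == "completed":
                return entry.response_code, entry.response_body  # cached response
            if entry.status == "in_progress":
                return 409, {"error": "operation in progress"}
            # status == 'failed': fall through and re-execute
        self._entries[key] = Entry(status="in_progress")
        try:
            code, body = operation()
        except Exception:
            self._entries[key] = Entry(status="failed")
            raise
        self._entries[key] = Entry("completed", code, body)
        return code, body
```

A handler would wrap its mutation as `store.execute(request_key, create_resource)`, where `create_resource` returns a `(status_code, body)` pair.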
Pattern: Natural Idempotency Keys
Not all operations need client-provided keys. Some have natural deduplication:
| Operation | Natural Key | Implementation |
|-----------|------------|----------------|
| "Generate assets for project X" | project_id + operation_type | UNIQUE(project_id, op_type) WHERE status = 'active' |
| "Process webhook event ABC" | event_id | UNIQUE(event_id) in processed_events |
| "Send welcome email to user Y" | user_id + email_type | UNIQUE(user_id, email_type) with time window |
| "Charge $10 for order Z" | order_id | UNIQUE(order_id) in charges |
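When a natural key has to be passed to an external provider as an idempotency key, it helps to derive it deterministically from the identifying fields, so every retry sends the same value. The sketch below uses name-based UUIDs from the standard library. The namespace string and the `natural_key` helper are assumptions for illustration.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "idempotency.example.com")


def natural_key(operation, *parts):
    """Derive a stable key from the fields that identify an operation.

    Fields are joined with '|', so they must not contain that character
    themselves, or two different inputs could produce the same key.
    """
    return str(uuid.uuid5(NAMESPACE, "|".join((operation, *parts))))
```

For example, `natural_key("charge", order_id)` yields the same string on every retry for that order, and a different string for any other order.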
Pattern: Upsert with Side-Effect Guard
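-- Insert or find existing. natural_key must have a UNIQUE constraint.
INSERT INTO resources (id, natural_key, data, created_at)
VALUES (gen_id(), 'key', 'payload', NOW())
ON CONFLICT (natural_key) DO UPDATE SET natural_key = EXCLUDED.natural_key
RETURNING id, (xmax = 0) AS was_inserted;
-- The no-op DO UPDATE is needed because DO NOTHING returns no row on conflict.
-- xmax is a PostgreSQL system column; xmax = 0 identifies a freshly inserted row.
-- Only execute side effects if was_inserted = true
-- This prevents duplicate side effects on retry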
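The same guard can be applied in application code. The sketch below uses SQLite rather than PostgreSQL, so it reads the cursor's `rowcount` instead of `xmax`. The schema and the `create_resource` helper are illustrative assumptions.

```python
import sqlite3


def make_resources_db():
    """Create an in-memory database with a resources table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resources (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            natural_key TEXT NOT NULL UNIQUE,
            data TEXT
        );
        """
    )
    return conn


def create_resource(conn, natural_key, data, on_created):
    """Insert once per natural_key; call on_created only when a new row was written."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO resources (natural_key, data) VALUES (?, ?)",
            (natural_key, data),
        )
        was_inserted = cur.rowcount == 1  # 0 when the row already existed
    row_id = conn.execute(
        "SELECT id FROM resources WHERE natural_key = ?", (natural_key,)
    ).fetchone()[0]
    if was_inserted:
        on_created(row_id)  # side effect guarded by the insert result
    return row_id, was_inserted
```

A retry returns the same `row_id` with `was_inserted` set to False, so the side effect does not fire again. The side effect runs after the commit, so a crash between the two can lose it. The outbox pattern addresses that gap.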
Pattern: Outbox for Side Effects
Instead of:
1. Insert record
2. Send email <- If this fails, retry re-inserts record (duplicate)
Use outbox:
1. In single transaction: Insert record + Insert outbox entry
2. Separate process reads outbox, sends email, marks entry as sent
3. Retry-safe: outbox entry checked before re-sending
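A minimal sketch of the outbox steps, using SQLite and hypothetical `users` and `outbox` tables. `create_user` covers step 1 and `drain_outbox` covers steps 2 and 3. The `send` callable stands in for whatever actually delivers the email.

```python
import sqlite3


def make_outbox_db():
    """Create an in-memory database with users and outbox tables (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            kind TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def create_user(conn, email):
    """Write the record and its outbox entry in a single transaction."""
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO outbox (user_id, kind) VALUES (?, 'welcome_email')",
            (cur.lastrowid,),
        )


def drain_outbox(conn, send):
    """Send every unsent entry, marking each one as sent after send() returns."""
    rows = conn.execute(
        "SELECT id, user_id, kind FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for entry_id, user_id, kind in rows:
        send(user_id, kind)
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (entry_id,))
    return len(rows)
```

If the process dies between `send` and the update, the entry is sent again on the next drain, so delivery is at-least-once. Pairing the outbox with a deduplication key on the sending side covers that remaining case.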
Idempotency Audit Matrix
For every mutation in the system, fill in:
| Endpoint / Operation | Idempotency Key | Dedupe Mechanism | Side Effects | Side Effect Guard | Verdict |
|---------------------|-----------------|------------------|-------------|-------------------|---------|
| POST /projects | X-Idempotency-Key | Key store | Email, webhook | Outbox pattern | PASS |
| POST /generate | project_id | Active job check | API call, storage | Job dedupe | PASS |
| POST /webhooks/payment | event_id | Processed events | Credit update | Event ID check | PASS |
| PUT /projects/:id | version field | Optimistic lock | None | N/A | PASS |
| POST /invite | email + project | Unique constraint | Email | NOT GUARDED | FAIL |
Post-Audit Checklist
[ ] All POST endpoints support idempotency keys (client-provided or natural)
[ ] Webhook handlers check event ID before processing
[ ] Payment operations use provider-level idempotency keys
[ ] Queue jobs are designed for safe redelivery
[ ] Side effects use outbox pattern or deduplication guard
[ ] Notifications deduplicated by (user, type, entity, time window)
[ ] State transitions reject duplicate applications
[ ] Idempotency key store has TTL-based cleanup
[ ] Failed operations clear idempotency key (allow genuine retry)
[ ] In-progress operations detected on retry (return 409 or poll URL)
[ ] Upsert operations guard side effects with was_inserted check
[ ] All idempotency mechanisms tested with automated retry simulation
What Earlier Audits Miss
Standard testing sends each request once. This audit matters because:
- Unit tests call each function once. They never test "what if this function runs twice with the same input?"
- Integration tests verify the happy path. They do not simulate network retries, webhook redelivery, or queue job duplication.
- Code reviews focus on correctness of single execution, not safety of repeated execution.
- QA testers click each button once and verify the result. They do not double-click or test under flaky network conditions.
- Payment testing verifies charges work, not that the same charge cannot happen twice.
An idempotency audit closes this gap: it tests whether re-running any mutation produces safe, consistent, non-duplicated results under client retry, webhook replay, queue redelivery, and network timeout conditions.
Automation Opportunities
| Test | Automatable? | Method |
|---|---|---|
| TEST-ID-001: Repeated POST | YES | Send same request twice with same idempotency key; assert single resource |
| TEST-ID-002: Webhook replay | YES | Replay captured webhook payload; assert no duplicate side effect |
| TEST-ID-003: Client timeout + retry | YES | Mock slow server, timeout client, retry; assert single result |
| TEST-ID-004: Double billing | YES | Send payment request twice; assert single charge via the provider's API |
| TEST-ID-005: Queue redelivery | YES | Process job, skip ack, redeliver; assert single output |
| TEST-ID-006: Notification dedupe | YES | Trigger same action twice; assert single notification sent |
| TEST-ID-007: State transition replay | YES | Apply same transition twice; assert rejection or idempotent acceptance |
# Automated idempotency test for all POST endpoints
BASE_URL="${BASE_URL:?set BASE_URL to the API base URL}"
ENDPOINTS=("/api/projects" "/api/assets" "/api/generate")
for endpoint in "${ENDPOINTS[@]}"; do
  KEY=$(uuidgen)
  # Send the identical request twice with the same idempotency key
  R1=$(curl -s -X POST "$BASE_URL$endpoint" \
    -H "X-Idempotency-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "idempotency-test"}')
  R2=$(curl -s -X POST "$BASE_URL$endpoint" \
    -H "X-Idempotency-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "idempotency-test"}')
  ID1=$(echo "$R1" | jq -r '.id')
  ID2=$(echo "$R2" | jq -r '.id')
  # Require a real ID so two error responses ("null" = "null") do not pass
  if [ -n "$ID1" ] && [ "$ID1" != "null" ] && [ "$ID1" = "$ID2" ]; then
    echo "PASS: $endpoint"
  else
    echo "FAIL: $endpoint (id1=$ID1, id2=$ID2)"
  fi
done
Reusable Audit Report Template
# Idempotency Audit Report
## System: _______________
## Date: YYYY-MM-DD
## Auditor: _______________
## Mutation Inventory
| Endpoint/Operation | Idempotency Key | Dedupe Mechanism | Side Effects Guarded? | Verdict |
|-------------------|-----------------|------------------|----------------------|---------|
| POST /projects | | | | |
| POST /generate | | | | |
| Webhook handler | | | | |
## Test Results
| Test ID | Description | Result | Evidence |
|---------|-------------|--------|----------|
| TEST-ID-001 | Repeated POST | PASS/FAIL | Duplicate records: ___ |
| TEST-ID-002 | Webhook replay | PASS/FAIL | Duplicate side effects: ___ |
| TEST-ID-003 | Timeout + retry | PASS/FAIL | Duplicate operations: ___ |
| TEST-ID-004 | Double billing | PASS/FAIL | Double charges: ___ |
| TEST-ID-005 | Queue redelivery | PASS/FAIL | Duplicate outputs: ___ |
| TEST-ID-006 | Notification dedupe | PASS/FAIL | Duplicate notifications: ___ |
| TEST-ID-007 | Transition replay | PASS/FAIL | Duplicate transitions: ___ |
## Score: PASS / PARTIAL / FAIL
Priority Targeting
Run this audit FIRST if:
- Users report duplicate records appearing
- Billing shows double charges
- Users receive duplicate emails or notifications
- The system processes webhooks from external providers
- Background jobs use at-least-once delivery queues
- Any operation involves external API calls that cost money
- Client-side retry logic exists (axios retry, fetch retry, etc.)