Idempotency Audit

Purpose

Verify that re-running any action produces safe, consistent results. In distributed systems, retries are inevitable: clients time out, networks drop, queues redeliver, webhooks fire twice. If your operations are not idempotent, every retry is a potential data corruption event.

This audit systematically checks every mutation endpoint, background job, and event handler for safe re-execution.

Scope

| Category | What We Test |
|----------|--------------|
| API endpoints | POST/PUT/PATCH re-submission with same payload |
| Background jobs | Queue redelivery, duplicate job execution |
| Webhooks / callbacks | Duplicate event delivery from providers |
| Billing / payments | Double-charge prevention |
| Notifications | Duplicate email/push/SMS prevention |
| Resource creation | Duplicate record prevention |
| File operations | Duplicate upload/processing prevention |
| State transitions | Re-applying same transition |

Risk Pattern Table

| Pattern | What It Hits | Risk | Symptom |
|---------|--------------|------|---------|
| POST without idempotency key | API, Data | HIGH | Client timeout + retry creates duplicate record |
| Queue without deduplication | Jobs | HIGH | Failed ack -> redelivery -> job runs twice |
| Webhook without event ID tracking | API, Data | HIGH | Provider retries webhook, side effects execute twice |
| Payment without charge idempotency | Billing | CRITICAL | Timeout + retry = customer charged twice |
| Email/notification without dedupe | UX | MEDIUM | User receives duplicate notifications |
| Counter increment on retry | Data | HIGH | Retry increments counter twice; stats are wrong |
| File creation on retry | Storage | MEDIUM | Duplicate files created, storage waste |
| DB insert without unique constraint | Data | HIGH | Duplicate rows; breaks assumptions downstream |
| Upsert that re-triggers side effects | Data, Billing | HIGH | Upsert succeeds (no duplicate row) but side effects fire again |
| Callback sets status unconditionally | State | MEDIUM | Late callback overwrites newer status |

Concrete Test Cases

TEST-ID-001: Repeated POST with Same Payload

Objective: Verify that sending the same create request twice does not create duplicate resources.

Steps:

1. Generate an idempotency key (UUID).
2. Send POST /api/resources with the key and a creation payload.
3. Record the response (201 Created, resource ID).
4. Send the exact same POST with the same idempotency key.
5. Record the response.

Pass Criteria:

- [ ] Second request returns the same resource ID as the first.
- [ ] Second request returns 200 (cached response) or 201 (same resource).
- [ ] Only one resource exists in the database.
- [ ] No side effects executed twice (no duplicate email, no duplicate job).

Fail Criteria:

- Two resources created with identical data.
- Second request returns 409 Conflict without returning the original resource.
- Side effects (email, webhook, job) triggered twice.

Test Without Idempotency Key:

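```bash
# If the endpoint does NOT support idempotency keys, test natural deduplication.
# BASE_URL is your API origin; the script aborts if it is unset.
BASE_URL="${BASE_URL:?set BASE_URL to your API origin}"
curl -s -X POST "$BASE_URL/api/projects" -H "Content-Type: application/json" -d '{"name": "Test Project"}' &
curl -s -X POST "$BASE_URL/api/projects" -H "Content-Type: application/json" -d '{"name": "Test Project"}' &
wait
# How many projects named "Test Project" exist?
# If 2: FAIL (no deduplication mechanism)
```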

TEST-ID-002: Webhook Replay

Objective: Verify that replaying a webhook event does not cause duplicate side effects.

Steps:

1. Capture a webhook payload from a provider (payment processor, AI service, etc.).
2. Send it to the webhook endpoint.
3. Verify the side effect (status update, credit applied, etc.).
4. Send the exact same webhook payload again.
5. Verify no duplicate side effect.

Pass Criteria:

- [ ] First delivery: side effect executes, 200 returned.
- [ ] Second delivery: no side effect, 200 returned (not 4xx error).
- [ ] Third delivery: same as second.
- [ ] Event ID is recorded and checked before processing.

Implementation Check:

```
[ ] Webhook handler extracts event ID from payload
[ ] Event ID checked against processed-events store BEFORE processing
[ ] Processing is wrapped in: if not processed, process and record
[ ] Processed-events store has TTL (e.g., 30 days) to prevent unbounded growth
[ ] Response is 200 even for duplicate (so provider stops retrying)
```

Webhook Idempotency Template:

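```python
from datetime import timedelta

# `event_store`, `process_event`, and `Response` are placeholders for your
# storage layer and web framework.
async def handle_webhook(request):
    event_id = request.json['event_id']

    # Check if already processed
    if await event_store.exists(event_id):
        return Response(status=200)  # Acknowledge but don't reprocess

    # Process first, then record. If processing raises, the event ID is
    # never recorded, so a genuine retry from the provider can succeed.
    await process_event(request.json)
    await event_store.record(event_id, ttl=timedelta(days=30))

    # Note: exists() followed by record() is not atomic. Two concurrent
    # deliveries of the same event can both pass the check, so a unique
    # constraint or atomic claim on event_id is still needed.
    return Response(status=200)
```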

TEST-ID-003: Client Timeout + Retry

Objective: Verify that a client timeout followed by retry does not cause duplicate operations.

Steps:

1. Set the client timeout to 5 seconds.
2. Send a request to a slow endpoint (e.g., a generation that takes 30 seconds).
3. The client times out; the request may or may not have been processed server-side.
4. Retry from the client with the same idempotency key.
5. Verify the outcome.

Pass Criteria:

- [ ] If original request completed: retry returns the completed result.
- [ ] If original request is still processing: retry returns "in progress" status.
- [ ] If original request failed: retry re-executes (idempotency key is cleared on failure).
- [ ] No duplicate resources, charges, or side effects.

Scenarios:

```
Scenario A: Server completed before retry
  Request 1: sent -> server processes -> client timeout -> server commits
  Request 2: sent -> server checks idempotency key -> returns cached result
  Result: One resource, one response. CORRECT.

Scenario B: Server still processing when retry arrives
  Request 1: sent -> server starts processing -> client timeout
  Request 2: sent -> server detects in-progress operation -> returns 409 or poll URL
  Result: One operation, client waits. CORRECT.

Scenario C: Server failed before retry
  Request 1: sent -> server fails -> idempotency key NOT stored (or marked failed)
  Request 2: sent -> server re-executes -> succeeds
  Result: One resource from retry. CORRECT.
```

TEST-ID-004: Double Billing Prevention

Objective: Verify that payment operations cannot be duplicated.

Steps:

1. Initiate a payment/charge operation.
2. Simulate: client timeout, webhook retry, server restart.
3. Verify only one charge exists.

Pass Criteria:

- [ ] Payment processor called with idempotency key.
- [ ] Webhook that confirms payment is idempotent (checked by event ID).
- [ ] User's balance/credits updated exactly once.
- [ ] Audit log shows one charge, not two.

Critical Verification:

```
[ ] Payment API calls include provider-level idempotency key
    - Stripe: Idempotency-Key header
    - PayPal: PayPal-Request-Id header
[ ] Credit/balance updates use atomic operations
    - NOT: read balance, add amount, write balance (race condition)
    - YES: one atomic statement (UPDATE ... SET balance = balance + amount),
      applied only if the charge_id has not already been processed
[ ] Payment webhook handler checks event ID before applying credits
[ ] Refund operations are also idempotent
```
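
The atomic-update item in the checklist can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3`, not a production design: the `accounts` and `processed_charges` tables and the `apply_credit` helper are hypothetical names introduced for this example.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE processed_charges (charge_id TEXT PRIMARY KEY);
    INSERT INTO accounts (user_id, balance) VALUES ('user-1', 0);
""")


def apply_credit(conn, user_id, charge_id, amount):
    """Credit `amount` at most once per charge_id. Returns True if applied."""
    with conn:  # one transaction: commits on success, rolls back on error
        claimed = conn.execute(
            "INSERT OR IGNORE INTO processed_charges (charge_id) VALUES (?)",
            (charge_id,),
        )
        if claimed.rowcount == 0:
            return False  # charge already applied; the retry is a no-op
        # Increment inside the database instead of read-modify-write in the app
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE user_id = ?",
            (amount, user_id),
        )
        return True


first = apply_credit(conn, "user-1", "ch_123", 10)
retry = apply_credit(conn, "user-1", "ch_123", 10)
balance = conn.execute(
    "SELECT balance FROM accounts WHERE user_id = 'user-1'"
).fetchone()[0]
```

Claiming the charge ID and incrementing the balance in the same transaction means a crash between the two statements rolls both back, so a later retry can still apply the credit.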

TEST-ID-005: Queue Job Redelivery

Objective: Verify that a job redelivered by the queue system does not execute twice.

Steps:

1. Enqueue a job.
2. Worker picks up the job and processes it.
3. Simulate: worker crashes before acknowledging the job.
4. Queue redelivers the job to another worker.
5. Second worker processes the job.
6. Verify only one result exists.

Pass Criteria:

- [ ] Output exists exactly once (not duplicated).
- [ ] External API called exactly once (or second call is a no-op due to provider idempotency).
- [ ] Database records created exactly once.
- [ ] Billing/metering reflects one execution, not two.

Implementation Patterns:

```
Pattern 1: Idempotent job design
  - Job writes results keyed by job ID
  - Re-execution overwrites (upsert) instead of creating new
  - External calls use job ID as idempotency key

Pattern 2: Deduplication at queue level
  - Queue checks message deduplication ID before delivery
  - SQS: MessageDeduplicationId (FIFO queues only; 5-minute deduplication window)
  - Redis: SET NX with job ID before processing

Pattern 3: Exactly-once processing with transactions
  - Process job AND mark as complete in same DB transaction
  - If transaction fails, both roll back (safe to retry)
  - If transaction succeeds, re-delivery finds job already complete
```
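
As a rough illustration of Pattern 3, the sketch below writes the job result and marks the job complete in one `sqlite3` transaction. The `jobs` and `results` tables and the `run_job` helper are hypothetical. The approach only protects database work: any external call made inside the job still needs its own idempotency key, as in Pattern 1.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE results (job_id TEXT PRIMARY KEY, output TEXT NOT NULL);
    INSERT INTO jobs (job_id, status) VALUES ('job-1', 'pending');
""")


def run_job(conn, job_id, compute):
    """Run the job unless it is already complete. Returns True if work was done."""
    with conn:  # result write and completion mark commit or roll back together
        marked = conn.execute(
            "UPDATE jobs SET status = 'complete' "
            "WHERE job_id = ? AND status <> 'complete'",
            (job_id,),
        )
        if marked.rowcount == 0:
            return False  # redelivery of a finished job: nothing to do
        conn.execute(
            "INSERT OR REPLACE INTO results (job_id, output) VALUES (?, ?)",
            (job_id, compute()),
        )
        return True


executions = []


def compute():
    executions.append("ran")
    return "output-for-job-1"


first_delivery = run_job(conn, "job-1", compute)
redelivery = run_job(conn, "job-1", compute)
result_rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

If `compute` raises, the transaction rolls back and the job stays in its previous state, so the queue's redelivery gets a genuine second attempt.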

TEST-ID-006: Notification Deduplication

Objective: Verify that users do not receive duplicate notifications.

Steps:

1. Trigger an action that sends a notification (email, push, SMS).
2. Retry the same action.
3. Check notification delivery.

Pass Criteria:

- [ ] One notification sent (not two).
- [ ] Notification keyed by: (user_id, notification_type, entity_id, event_id).
- [ ] Deduplication window appropriate for notification type (e.g., 1 hour for email).
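
One way to picture the dedupe key and window is the in-memory sketch below. The `NotificationDeduper` class and its `should_send` method are hypothetical names; a real system would keep this state in a shared store so that every worker sees the same history.

```python
import time
from typing import Dict, Optional, Tuple

DedupeKey = Tuple[str, str, str, str]


class NotificationDeduper:
    """In-memory sketch only; state is lost on restart and not shared across workers."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._last_sent: Dict[DedupeKey, float] = {}

    def should_send(self, user_id: str, notification_type: str, entity_id: str,
                    event_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = (user_id, notification_type, entity_id, event_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same notification already sent inside the window
        self._last_sent[key] = now
        return True


deduper = NotificationDeduper(window_seconds=3600)  # 1 hour, as suggested for email
sent_first = deduper.should_send("u1", "job_completed", "job-1", "evt-1", now=0)
sent_retry = deduper.should_send("u1", "job_completed", "job-1", "evt-1", now=60)
sent_other = deduper.should_send("u1", "job_completed", "job-2", "evt-2", now=60)
```

The `now` parameter exists so tests can control the clock; callers would normally omit it.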

TEST-ID-007: Re-applying Same State Transition

Objective: Verify that applying a state transition that has already been applied is safe.

Steps:

1. Transition a job from "processing" to "completed".
2. Attempt to transition the same job from "processing" to "completed" again.

Pass Criteria:

- [ ] Second attempt is either rejected (the job is no longer in the "processing" state) with a clear error such as "Job is already in 'completed' state.", or returns success idempotently (acknowledging the completion).
- [ ] No side effects re-executed (no duplicate "completed" webhook, email, etc.).
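
A guarded transition of this kind can be sketched with a conditional UPDATE, shown here with Python's `sqlite3`. The `jobs` table, the `complete_job` helper, and the `webhooks_sent` list (a stand-in for the real "completed" webhook or email) are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    INSERT INTO jobs (id, status) VALUES ('job-1', 'processing');
""")

webhooks_sent = []  # stand-in for the real side effect


def complete_job(conn, job_id):
    """Move processing -> completed; fire the side effect only if this call won."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'completed' "
            "WHERE id = ? AND status = 'processing'",
            (job_id,),
        )
    if cur.rowcount == 1:
        webhooks_sent.append(job_id)
        return True, "transitioned to 'completed'"
    # The guard did not match: report the state the job is actually in
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    current = row[0] if row else "missing"
    return False, f"Job is already in '{current}' state."


first_attempt = complete_job(conn, "job-1")
second_attempt = complete_job(conn, "job-1")
```

Because the `WHERE` clause names the expected source state, the database decides which caller wins, and the loser learns the current state without changing anything.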

Making Non-Idempotent Operations Safe

Pattern: Idempotency Key Store

```
Request -> Extract idempotency key -> Check store:
  Found + completed: Return cached response
  Found + in-progress: Return 409 or wait
  Found + failed: Clear entry, re-execute
  Not found: Record key as in-progress, execute, store response

Store schema:
  idempotency_key: string (primary key)
  status: 'in_progress' | 'completed' | 'failed'
  response_code: int
  response_body: json
  created_at: timestamp
  expires_at: timestamp (TTL for cleanup)
```
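
The decision flow above can be sketched in a few lines of Python. This is an in-memory stand-in under stated assumptions: the `IdempotencyStore` and `Entry` names are hypothetical, and a real store would need an atomic "insert if absent" for the in-progress claim plus the TTL cleanup from the schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    status: str  # 'in_progress' | 'completed' | 'failed'
    response_code: Optional[int] = None
    response_body: Any = None


class IdempotencyStore:
    """In-memory sketch of the key store; not safe across processes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def execute(self, key: str, operation: Callable[[], Tuple[int, Any]]) -> Tuple[int, Any]:
        entry = self._entries.get(key)
        if entry is not None:
            if entry.status == "completed":
                # Found + completed: return the cached response
                return entry.response_code, entry.response_body
            if entry.status == "in_progress":
                # Found + in-progress: ask the client to wait
                return 409, {"error": "request in progress"}
            # Found + failed: fall through and re-execute
        # Not found (or failed): record as in-progress, execute, store response
        self._entries[key] = Entry(status="in_progress")
        try:
            code, body = operation()
        except Exception:
            self._entries[key] = Entry(status="failed")
            raise
        self._entries[key] = Entry("completed", code, body)
        return code, body


store = IdempotencyStore()
calls = []


def create_resource():
    calls.append(1)
    return 201, {"id": f"res-{len(calls)}"}


first = store.execute("key-1", create_resource)
second = store.execute("key-1", create_resource)
```

Here the retry returns the cached response without running `create_resource` again, which is the behaviour TEST-ID-001 checks for.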

Pattern: Natural Idempotency Keys

Not all operations need client-provided keys. Some have natural deduplication:

| Operation | Natural Key | Implementation |
|-----------|------------|----------------|
| "Generate assets for project X" | project_id + operation_type | UNIQUE(project_id, op_type) WHERE status = 'active' |
| "Process webhook event ABC" | event_id | UNIQUE(event_id) in processed_events |
| "Send welcome email to user Y" | user_id + email_type | UNIQUE(user_id, email_type) with time window |
| "Charge $10 for order Z" | order_id | UNIQUE(order_id) in charges |

Pattern: Upsert with Side-Effect Guard

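```sql
-- Insert the row unless one with the same natural key already exists
INSERT INTO resources (id, natural_key, data, created_at)
VALUES (gen_id(), 'key', 'payload', NOW())
ON CONFLICT (natural_key) DO NOTHING
RETURNING id;

-- In PostgreSQL, DO NOTHING returns a row only when the insert actually happened.
-- Row returned    -> was_inserted = true: execute side effects.
-- No row returned -> the record already existed: SELECT it by natural_key
--                    and skip side effects. This prevents duplicates on retry.
```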
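
The same guard can be sketched in application code. The example below uses Python's `sqlite3` with `INSERT OR IGNORE` rather than PostgreSQL's `ON CONFLICT`, which is a substitution made so the sketch runs anywhere. The `create_resource` helper and the `emails_sent` list (a stand-in for the real side effect) are hypothetical.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources ("
    "id INTEGER PRIMARY KEY, natural_key TEXT UNIQUE NOT NULL, data TEXT NOT NULL)"
)

emails_sent = []  # stand-in for the real side effect


def create_resource(conn, natural_key, data):
    """Insert once per natural key; return (id, was_inserted)."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO resources (natural_key, data) VALUES (?, ?)",
            (natural_key, data),
        )
        was_inserted = cur.rowcount == 1
        # Fetch the id whether the row is new or pre-existing
        resource_id = conn.execute(
            "SELECT id FROM resources WHERE natural_key = ?", (natural_key,)
        ).fetchone()[0]
    if was_inserted:
        emails_sent.append(resource_id)  # side effect runs only for the real insert
    return resource_id, was_inserted


first = create_resource(conn, "project:42", "payload")
retry = create_resource(conn, "project:42", "payload")
```

Both calls return the same resource ID, but only the first triggers the side effect. A crash between the commit and the side effect would still lose it, which is the gap an outbox closes.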

Pattern: Outbox for Side Effects

```
Instead of:
  1. Insert record
  2. Send email     <- If this fails, retry re-inserts record (duplicate)

Use outbox:
  1. In single transaction: Insert record + Insert outbox entry
  2. Separate process reads outbox, sends email, marks entry as sent
  3. Retry-safe: outbox entry checked before re-sending
```
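
A minimal sketch of the outbox steps, again with `sqlite3`. The `users` and `outbox` tables and the `create_user` and `drain_outbox` helpers are hypothetical names. Delivery in this sketch is at-least-once: if the process dies after `send` but before the entry is marked sent, the email goes out again, so the sender should also deduplicate where that matters.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        recipient TEXT NOT NULL,
        template TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
""")


def create_user(conn, email):
    """Step 1: insert the record and its outbox entry in one transaction."""
    with conn:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO outbox (recipient, template) VALUES (?, 'welcome')",
            (email,),
        )


def drain_outbox(conn, send):
    """Steps 2-3: send each unsent entry, then mark it sent. Returns the count."""
    pending = conn.execute(
        "SELECT id, recipient, template FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for entry_id, recipient, template in pending:
        send(recipient, template)
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (entry_id,))
    return len(pending)


delivered = []
create_user(conn, "ada@example.com")
first_run = drain_outbox(conn, lambda to, tpl: delivered.append((to, tpl)))
second_run = drain_outbox(conn, lambda to, tpl: delivered.append((to, tpl)))
```

The second drain finds nothing to send because the entry was marked sent, which is what makes re-running the sender safe.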

Idempotency Audit Matrix

For every mutation in the system, fill in:

| Endpoint / Operation | Idempotency Key | Dedupe Mechanism | Side Effects | Side Effect Guard | Verdict |
|---------------------|-----------------|------------------|-------------|-------------------|---------|
| POST /projects | X-Idempotency-Key | Key store | Email, webhook | Outbox pattern | PASS |
| POST /generate | project_id | Active job check | API call, storage | Job dedupe | PASS |
| POST /webhooks/payment | event_id | Processed events | Credit update | Event ID check | PASS |
| PUT /projects/:id | version field | Optimistic lock | None | N/A | PASS |
| POST /invite | email + project | Unique constraint | Email | NOT GUARDED | FAIL |

Post-Audit Checklist

```
[ ] All POST endpoints support idempotency keys (client-provided or natural)
[ ] Webhook handlers check event ID before processing
[ ] Payment operations use provider-level idempotency keys
[ ] Queue jobs are designed for safe redelivery
[ ] Side effects use outbox pattern or deduplication guard
[ ] Notifications deduplicated by (user, type, entity, time window)
[ ] State transitions reject duplicate applications
[ ] Idempotency key store has TTL-based cleanup
[ ] Failed operations clear idempotency key (allow genuine retry)
[ ] In-progress operations detected on retry (return 409 or poll URL)
[ ] Upsert operations guard side effects with was_inserted check
[ ] All idempotency mechanisms tested with automated retry simulation
```

What Earlier Audits Miss

Standard testing sends each request once. This audit matters because:

- Unit tests call each function once. They never ask "what if this function runs twice with the same input?"
- Integration tests verify the happy path. They do not simulate network retries, webhook redelivery, or queue job duplication.
- Code reviews focus on the correctness of a single execution, not the safety of repeated execution.
- QA testers click each button once and verify the result. They do not click twice or test under flaky network conditions.
- Payment tests verify that charges work, not that the same charge cannot happen twice.

An Idempotency Audit fills that gap: it tests whether re-running any mutation produces safe, consistent, non-duplicated results under client retry, webhook replay, queue redelivery, and network timeout conditions.

Automation Opportunities

| Test | Automatable? | Method |
|------|--------------|--------|
| TEST-ID-001: Repeated POST | YES | Send same request twice with same idempotency key; assert single resource |
| TEST-ID-002: Webhook replay | YES | Replay captured webhook payload; assert no duplicate side effect |
| TEST-ID-003: Client timeout + retry | YES | Mock slow server, timeout client, retry; assert single result |
| TEST-ID-004: Double billing | YES | Send payment request twice; assert single charge in provider dashboard |
| TEST-ID-005: Queue redelivery | YES | Process job, skip ack, redeliver; assert single output |
| TEST-ID-006: Notification dedupe | YES | Trigger same action twice; assert single notification sent |
| TEST-ID-007: State transition replay | YES | Apply same transition twice; assert rejection or idempotent acceptance |

```bash
#!/usr/bin/env bash
# Automated idempotency test for all POST endpoints
# BASE_URL is your API origin; the script aborts if it is unset.
BASE_URL="${BASE_URL:?set BASE_URL to your API origin}"
ENDPOINTS=("/api/projects" "/api/assets" "/api/generate")
for endpoint in "${ENDPOINTS[@]}"; do
  KEY=$(uuidgen)
  # Send the identical request twice with the same idempotency key
  R1=$(curl -s -X POST "$BASE_URL$endpoint" \
    -H "X-Idempotency-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "idempotency-test"}')
  R2=$(curl -s -X POST "$BASE_URL$endpoint" \
    -H "X-Idempotency-Key: $KEY" \
    -H "Content-Type: application/json" \
    -d '{"name": "idempotency-test"}')
  ID1=$(echo "$R1" | jq -r '.id')
  ID2=$(echo "$R2" | jq -r '.id')
  # jq prints "null" when .id is missing; a failed request must not count as a pass
  if [ -n "$ID1" ] && [ "$ID1" != "null" ] && [ "$ID1" = "$ID2" ]; then
    echo "PASS: $endpoint"
  else
    echo "FAIL: $endpoint (id1=$ID1, id2=$ID2)"
  fi
done
```

Reusable Audit Report Template

```markdown
# Idempotency Audit Report

## System: _______________
## Date: YYYY-MM-DD
## Auditor: _______________

## Mutation Inventory
| Endpoint/Operation | Idempotency Key | Dedupe Mechanism | Side Effects Guarded? | Verdict |
|-------------------|-----------------|------------------|----------------------|---------|
| POST /projects | | | | |
| POST /generate | | | | |
| Webhook handler | | | | |

## Test Results
| Test ID | Description | Result | Evidence |
|---------|-------------|--------|----------|
| TEST-ID-001 | Repeated POST | PASS/FAIL | Duplicate records: ___ |
| TEST-ID-002 | Webhook replay | PASS/FAIL | Duplicate side effects: ___ |
| TEST-ID-003 | Timeout + retry | PASS/FAIL | Duplicate operations: ___ |
| TEST-ID-004 | Double billing | PASS/FAIL | Double charges: ___ |
| TEST-ID-005 | Queue redelivery | PASS/FAIL | Duplicate outputs: ___ |
| TEST-ID-006 | Notification dedupe | PASS/FAIL | Duplicate notifications: ___ |
| TEST-ID-007 | Transition replay | PASS/FAIL | Duplicate transitions: ___ |

## Score: PASS / PARTIAL / FAIL
```

Priority Targeting

Run this audit FIRST if:

- Users report duplicate records appearing
- Billing shows double charges
- Users receive duplicate emails or notifications
- The system processes webhooks from external providers
- Background jobs use at-least-once delivery queues
- Any operation involves external API calls that cost money
- Client-side retry logic exists (axios retry, fetch retry, etc.)


Details

Pack: production-audit-skills
File: idempotency-audit.md
Lines: 457
Category: Uncategorized
