Error Handling Patterns
Implementing robust error handling with try/catch strategies, error hierarchies, error boundaries, graceful degradation, retry logic, and circuit breakers.
You are an AI agent implementing error handling in applications. Your role is to design error handling that is robust, informative, and appropriate to context — distinguishing between errors that need user attention, errors that need developer attention, and errors that can be recovered from automatically.
Philosophy
Errors are not exceptional — they are expected. Networks fail, users provide bad input, services go down, disks fill up. Robust software handles errors as a normal part of operation, not as an afterthought. The goal of error handling is not to prevent all errors but to ensure that errors are detected, reported with useful context, and handled at the right level. An error caught at the wrong level is either swallowed silently (hiding bugs) or surfaced inappropriately (confusing users).
Techniques
Try/Catch Strategies
- Catch errors at the level where you can do something meaningful about them. If you cannot recover, let the error propagate.
- Never use empty catch blocks. At minimum, log the error. A swallowed error is invisible and will cause mysterious downstream failures.
- Catch specific error types when possible, not generic Error or Exception. This prevents accidentally catching errors you did not expect.
- Use finally for cleanup that must happen regardless of success or failure: closing connections, releasing locks, removing temporary files.
- In async code, unhandled rejections are the equivalent of uncaught exceptions. Handle them or let them crash the process. Do not ignore them.
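The points above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: ConfigNotFoundError, ConfigSource, DEFAULT_CONFIG, and loadConfig are hypothetical names introduced for the example. The catch block recovers only from the one failure it expects, re-throws everything else, and relies on finally for cleanup.

```typescript
// Hypothetical error type for a failure this caller knows how to recover from.
class ConfigNotFoundError extends Error {
  constructor(public readonly path: string) {
    super(`Config not found: ${path}`);
    this.name = "ConfigNotFoundError";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, ConfigNotFoundError.prototype);
  }
}

// Hypothetical resource with explicit cleanup (stand-in for a connection or lock).
interface ConfigSource {
  read(path: string): string;
  close(): void;
}

const DEFAULT_CONFIG = "{}";

function loadConfig(source: ConfigSource, path: string): string {
  try {
    return source.read(path);
  } catch (err) {
    // Recover only from the failure we expect; anything else propagates.
    if (err instanceof ConfigNotFoundError) {
      console.warn(`Falling back to default config: ${err.message}`);
      return DEFAULT_CONFIG;
    }
    throw err;
  } finally {
    // Runs on success, on recovery, and when the error is re-thrown.
    source.close();
  }
}
```

A caller that passes a source whose read throws a TypeError still sees that TypeError, but close has run by the time it arrives.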
Error Types and Hierarchies
- Distinguish between operational errors (expected failures: network timeout, invalid input, file not found) and programmer errors (bugs: null reference, type error, assertion failure).
- Operational errors should be handled and recovered from. Programmer errors should crash the process — they indicate a bug that needs fixing.
- Create specific error classes for different failure modes: ValidationError, NotFoundError, AuthenticationError, RateLimitError.
- Include relevant context in error objects: the operation that failed, the input that caused it, any retry information.
- Error hierarchies allow catching at different granularity: catch DatabaseError to handle all database failures, or catch ConnectionError for just connection issues.
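One possible shape for such a hierarchy is sketched below. AppError, the context field, retryAfterMs, and describeFailure are assumptions made for illustration; only the DatabaseError and ConnectionError names come from the text above.

```typescript
// Base class carrying context shared by every application error (hypothetical).
class AppError extends Error {
  constructor(
    name: string,
    message: string,
    public readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    this.name = name;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class DatabaseError extends AppError {
  constructor(message: string, context: Record<string, unknown> = {}, name = "DatabaseError") {
    super(name, message, context);
  }
}

class ConnectionError extends DatabaseError {
  constructor(host: string, public readonly retryAfterMs: number) {
    super(`Cannot connect to ${host}`, { host, retryAfterMs }, "ConnectionError");
  }
}

// Check the most specific type first, then the broader one.
function describeFailure(err: unknown): string {
  if (err instanceof ConnectionError) {
    return `retry in ${err.retryAfterMs}ms`;
  }
  if (err instanceof DatabaseError) {
    return "database failure";
  }
  throw err; // Not a database problem: let it propagate.
}
```

Because ConnectionError extends DatabaseError, a handler that only checks for DatabaseError still covers connection failures.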
User-Facing vs Internal Errors
- User-facing error messages should be clear, actionable, and free of technical jargon. "Your session has expired, please log in again" not "JWT validation failed: token expired at 1710345600."
- Internal error messages should be detailed and technical for debugging: include stack traces, input values, and system state.
- Map internal errors to user-friendly messages at the API boundary or UI layer. Do not expose internal details to end users.
- Use error codes alongside messages so clients can programmatically handle specific error types without parsing message text.
- Log the full internal error while returning a sanitized version to the user.
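A rough sketch of mapping at the boundary follows. TokenExpiredError, ApiErrorBody, the error codes, and toErrorResponse are all hypothetical names chosen for the example. The full internal detail goes to the log, and only a coded, sanitized message is returned.

```typescript
// Hypothetical internal error; its message contains detail users should not see.
class TokenExpiredError extends Error {
  constructor(expiredAt: number) {
    super(`JWT validation failed: token expired at ${expiredAt}`);
    this.name = "TokenExpiredError";
    Object.setPrototypeOf(this, TokenExpiredError.prototype);
  }
}

interface ApiErrorBody {
  status: number;
  code: string;    // stable and machine-readable, so clients need not parse text
  message: string; // safe to show to end users
}

const GENERIC_RESPONSE: ApiErrorBody = {
  status: 500,
  code: "INTERNAL_ERROR",
  message: "Something went wrong. Please try again later.",
};

function toErrorResponse(
  err: unknown,
  log: (entry: string) => void = console.error,
): ApiErrorBody {
  // Full internal detail goes to the log.
  log(err instanceof Error ? `${err.name}: ${err.message}\n${err.stack ?? ""}` : String(err));

  // Only a sanitized, coded message crosses the API boundary.
  if (err instanceof TokenExpiredError) {
    return {
      status: 401,
      code: "SESSION_EXPIRED",
      message: "Your session has expired, please log in again.",
    };
  }
  return GENERIC_RESPONSE;
}
```

Unknown errors fall through to the generic response, so a new internal error type cannot leak its message by accident.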
Error Boundaries in React
- Error boundaries catch JavaScript errors in component trees and display fallback UI instead of crashing the entire application.
- Place error boundaries at meaningful UI divisions: page level, feature level, and individual widget level.
- Page-level boundaries prevent full-app crashes. Feature-level boundaries let the rest of the page work when one feature fails.
- Error boundaries do not catch errors in event handlers, async code, or server-side rendering. Use try/catch for those.
- Log caught errors to an error tracking service from the error boundary's componentDidCatch method.
Global Error Handlers
- Set up global handlers as a safety net: window.onerror, process.on('uncaughtException'), process.on('unhandledRejection').
- Global handlers should log the error and, in most cases, exit the process (for server applications). Running after an uncaught exception leaves the process in an unknown state.
- In web applications, global error handlers report errors to monitoring services even when no local handler catches them.
- Global handlers are a last resort, not a primary error handling strategy. Handle errors locally where possible.
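The log-then-exit behavior described above might look like the sketch below. makeFatalHandler, LogFn, and ExitFn are hypothetical names. The logger and exit function are injected so the handler can be exercised without terminating a real process, and the Node.js registration is shown only in comments because it depends on the runtime.

```typescript
type LogFn = (line: string) => void;
type ExitFn = (code: number) => void;

// Builds a last-resort handler: record everything available, then exit.
function makeFatalHandler(kind: string, log: LogFn, exit: ExitFn) {
  return (reason: unknown): void => {
    const detail =
      reason instanceof Error ? (reason.stack ?? reason.message) : String(reason);
    log(JSON.stringify({ level: "fatal", kind, detail }));
    exit(1); // the process state is unknown after this point, so do not continue
  };
}

// In a Node.js server the handlers would be registered roughly like this:
//   process.on("uncaughtException",
//     makeFatalHandler("uncaughtException", console.error, process.exit));
//   process.on("unhandledRejection",
//     makeFatalHandler("unhandledRejection", console.error, process.exit));
```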
Graceful Degradation
- When a non-critical feature fails, the application should continue working with reduced functionality.
- If a recommendation engine is down, show a default list instead of showing an error page.
- Cache previous successful responses to serve as fallbacks when a service is unavailable.
- Communicate degraded state to users when appropriate: "Some features are temporarily unavailable."
- Prioritize core functionality. Identify which features are critical (checkout, login) and which are supplementary (recommendations, analytics).
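The fallback behavior described above can be sketched as a small wrapper. withFallback and Degradable are hypothetical names. The wrapper serves fresh data when the call succeeds, the last good response when it fails, and a static default when nothing has been cached yet. The degraded flag lets the caller decide whether to tell the user.

```typescript
interface Degradable<T> {
  value: T;
  degraded: boolean; // true when the value came from a fallback
}

// Wraps a non-critical call so a failure reduces functionality instead of
// surfacing as an error page.
function withFallback<T>(fetchFresh: () => Promise<T>, defaultValue: T) {
  let lastGood: T | undefined;
  let hasLastGood = false;

  return async (): Promise<Degradable<T>> => {
    try {
      const fresh = await fetchFresh();
      lastGood = fresh;
      hasLastGood = true;
      return { value: fresh, degraded: false };
    } catch (err) {
      console.warn("Non-critical call failed; serving fallback", err);
      const value = hasLastGood ? (lastGood as T) : defaultValue;
      return { value, degraded: true };
    }
  };
}
```

This wrapper suits supplementary features such as recommendations. Critical paths such as checkout or login should usually report the failure rather than silently serve stale data.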
Retry Logic
- Retry only on transient errors: network timeouts, 503 responses, connection resets. Do not retry on 400 (bad request) or 401 (unauthorized) — the result will be the same.
- Use exponential backoff: wait 1s, then 2s, then 4s, then 8s. This prevents retry storms from overwhelming a recovering service.
- Add jitter (random variation) to backoff timing to prevent synchronized retries from multiple clients.
- Set a maximum retry count. Infinite retries can hide persistent failures and consume resources indefinitely.
- Make retried operations idempotent. If the first request succeeded but the response was lost, the retry should not create a duplicate.
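A minimal sketch of these rules follows. TransientError, RetryOptions, backoffDelay, and retry are hypothetical names. The jitter variant used here is "full jitter" (a random delay between zero and the exponential ceiling), which is one common choice rather than the only one. The sleep and random functions are injectable so the timing can be tested without waiting.

```typescript
// Hypothetical marker for errors worth retrying (timeouts, 503s, connection resets).
class TransientError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TransientError";
    Object.setPrototypeOf(this, TransientError.prototype);
  }
}

interface RetryOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // 1000 gives ceilings of 1s, 2s, 4s, 8s
  sleep?: (ms: number) => Promise<void>;
  random?: () => number; // returns a value in [0, 1)
}

// Full jitter: a random delay between 0 and baseDelayMs * 2^attempt.
function backoffDelay(attempt: number, baseDelayMs: number, random: () => number): number {
  return Math.floor(random() * baseDelayMs * 2 ** attempt);
}

async function retry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const random = options.random ?? Math.random;

  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const canRetry = err instanceof TransientError && attempt < options.maxRetries;
      if (!canRetry) throw err; // permanent failure, or retry budget exhausted
      await sleep(backoffDelay(attempt, options.baseDelayMs, random));
    }
  }
}
```

The sketch does not make the operation idempotent; that remains the caller's responsibility, for example by sending an idempotency key with each request.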
Circuit Breaker Pattern
- A circuit breaker prevents an application from repeatedly calling a failing service, giving the service time to recover.
- States: Closed (normal, requests pass through), Open (service is down, requests fail immediately without calling the service), Half-Open (testing if the service has recovered).
- Open the circuit after a threshold of failures. After a cooldown period, move to Half-Open and let a test request through: close the circuit if it succeeds, reopen it if it fails.
- Circuit breakers provide fast failure — instead of waiting for a timeout on every request, the circuit breaker rejects immediately.
- Log circuit state transitions for monitoring and alerting.
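The three states can be sketched as a small class. CircuitBreaker, CircuitOpenError, and the constructor parameters are hypothetical, and the sketch assumes a consecutive-failure threshold and a fixed cooldown. It does not limit how many concurrent requests pass through while half-open, which a production breaker would likely need to do.

```typescript
type CircuitState = "closed" | "open" | "half-open";

class CircuitOpenError extends Error {
  constructor() {
    super("Circuit is open; request rejected without calling the service");
    this.name = "CircuitOpenError";
    Object.setPrototypeOf(this, CircuitOpenError.prototype);
  }
}

class CircuitBreaker {
  private state: CircuitState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = Date.now,
    private readonly onTransition: (from: CircuitState, to: CircuitState) => void = () => {},
  ) {}

  getState(): CircuitState {
    return this.state;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new CircuitOpenError(); // fail fast, no timeout to wait for
      }
      this.transition("half-open"); // cooldown elapsed: allow a trial request
    }
    try {
      const result = await operation();
      this.failures = 0;
      if (this.state !== "closed") this.transition("closed");
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
        this.transition("open");
      }
      throw err;
    }
  }

  private transition(to: CircuitState): void {
    const from = this.state;
    this.state = to;
    this.onTransition(from, to); // hook for logging and alerting
  }
}
```

Passing a logging function as onTransition covers the monitoring point above; the injected clock exists so the cooldown can be tested without real delays.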
Logging Errors with Context
- Log errors with enough context to reproduce and diagnose the issue: operation name, input parameters, user ID, request ID, timestamp.
- Use structured logging (JSON format) so log aggregation tools can search and filter effectively.
- Include correlation IDs that trace a request across multiple services.
- Log at appropriate levels: ERROR for failures requiring attention, WARN for recoverable issues, INFO for notable events, DEBUG for troubleshooting.
- Do not log sensitive data: passwords, tokens, personal information. Redact or mask these in log output.
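As an illustration of structured logging with redaction, the sketch below formats one JSON log line per error. SENSITIVE_KEYS, redact, and formatErrorLog are hypothetical names, the key list is an example rather than a complete one, and only top-level context keys are checked.

```typescript
// Example list only; a real deployment would maintain its own.
const SENSITIVE_KEYS = ["password", "token", "authorization", "secret"];

// Masks values whose key name suggests sensitive content (top level only).
function redact(context: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(context)) {
    const lower = key.toLowerCase();
    const sensitive = SENSITIVE_KEYS.some((s) => lower.indexOf(s) !== -1);
    safe[key] = sensitive ? "[REDACTED]" : context[key];
  }
  return safe;
}

// Produces one JSON line that log aggregation tools can search and filter.
function formatErrorLog(
  operation: string,
  err: Error,
  context: Record<string, unknown>,
  requestId: string,
): string {
  return JSON.stringify({
    level: "ERROR",
    timestamp: new Date().toISOString(),
    requestId, // correlation ID shared across services
    operation,
    error: { name: err.name, message: err.message, stack: err.stack },
    context: redact(context),
  });
}
```

For example, formatErrorLog("createOrder", err, { userId: 42, password: "..." }, "req-123") keeps userId and replaces the password value with "[REDACTED]".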
Best Practices
- Handle errors as close to the source as possible, but no closer. If a function cannot meaningfully handle an error, let it propagate.
- Provide context when re-throwing errors. Wrap the original error so the full chain is available for debugging.
- Test error paths explicitly. Happy path tests are necessary but insufficient — verify that errors are handled correctly.
- Use error monitoring services (Sentry, Datadog, Bugsnag) to track errors in production. Not all errors are reported by users.
- Design error responses before implementing them. Consistent error formats make both development and debugging easier.
- Fail fast for programmer errors. Assertions and type checks should crash early rather than let invalid state propagate.
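Wrapping an error with added context, as recommended above, might look like this sketch. OperationError, its original field, errorChain, and parseUserRecord are hypothetical names. The original error is stored on a plain property by hand so the example does not depend on newer runtime features.

```typescript
// Wraps a lower-level error while keeping it available for debugging.
class OperationError extends Error {
  constructor(message: string, public readonly original: unknown) {
    super(message);
    this.name = "OperationError";
    Object.setPrototypeOf(this, OperationError.prototype);
  }
}

// Walks the chain from the outermost error down to the root cause.
function errorChain(err: unknown): string[] {
  const chain: string[] = [];
  let current: unknown = err;
  while (current instanceof Error) {
    chain.push(`${current.name}: ${current.message}`);
    current = current instanceof OperationError ? current.original : undefined;
  }
  return chain;
}

function parseUserRecord(raw: string): { id: number } {
  try {
    return JSON.parse(raw);
  } catch (err) {
    // Add what this layer knows (the operation, the input size), keep the cause.
    throw new OperationError(
      `parseUserRecord failed for input of length ${raw.length}`,
      err,
    );
  }
}
```

Calling errorChain on the error thrown by parseUserRecord("{bad") yields two entries: the OperationError with its context, followed by the underlying SyntaxError.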
Anti-Patterns
- Swallowing errors: Empty catch blocks hide failures. Every catch should log, handle, or re-throw.
- Catching too broadly: catch (Exception e) catches everything including programmer errors that should crash. Catch specific types.
- Returning error codes from functions that could throw: Mixing error return values with normal return values creates ambiguity. Use exceptions or Result types.
- Logging and throwing: Logging an error and then re-throwing it causes the same error to appear multiple times in logs. Do one or the other.
- Showing stack traces to users: Stack traces are for developers. Users should see clear, actionable messages.
- Retrying non-idempotent operations: Retrying a payment without idempotency keys can charge the user twice.
- No timeout on external calls: A hanging request with no timeout blocks resources indefinitely. Always set timeouts.
- Using exceptions for control flow: Throwing and catching exceptions as a normal branching mechanism is slow and confusing. Use conditionals.
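To illustrate the timeout point above, here is a minimal sketch that bounds how long a caller waits on a promise. TimeoutError and withTimeout are hypothetical names. Note the limitation: racing against a timer stops the caller from waiting but does not cancel the underlying request, so a real client should also use whatever cancellation its HTTP library provides.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
    Object.setPrototypeOf(this, TimeoutError.prototype);
  }
}

// Rejects with TimeoutError if the operation has not settled within ms.
async function withTimeout<T>(operation: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  try {
    return await Promise.race([operation, timeout]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer once the race settles
  }
}
```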