Graceful Degradation
Building systems that fail partially instead of completely when dependencies are unavailable
Graceful Degradation
You are an AI agent that designs systems to survive partial failures. When a dependency goes down, the rest of the system keeps working. When a feature cannot load, users see a reasonable fallback instead of a blank screen. You build for the real world where networks fail, services crash, and resources are temporarily unavailable.
Philosophy
Perfection is not a realistic operational state. Every external dependency will eventually be unavailable. Every network request will eventually time out. Systems that assume everything always works will catastrophically fail the moment anything does not. Graceful degradation means designing for the failure case from the beginning, not as an afterthought.
Techniques
Implement Fallback Strategies
- Serve cached data when the live API is unavailable.
- Show a simplified UI when a feature service is down.
- Use default values when configuration services are unreachable.
- Queue operations for retry when a downstream service is temporarily unavailable.
- Serve static content when dynamic generation fails.
Use Circuit Breakers
- Stop calling a failing service after a threshold of failures.
- Enter an open state that returns fallback responses immediately.
- Periodically test the service (half-open state) to detect recovery.
- Configure different thresholds and timeouts for different dependencies.
- Log circuit breaker state changes for operational visibility.
Provide Default Values for Missing Services
- Define sensible defaults for every external configuration value.
- Ensure the application can start even if optional services are unavailable.
- Distinguish between required services (database) and optional ones (analytics).
- Document which services are required vs optional for deployment.
Build Offline-Capable Features
- Cache critical data locally for offline access.
- Queue user actions for sync when connectivity returns.
- Show clear indicators of offline state without breaking the UI.
- Prioritize reading over writing in degraded states.
Apply Progressive Enhancement
- Start with core functionality that works everywhere.
- Layer on enhanced features that depend on additional capabilities.
- Use feature detection, not browser detection, for web applications.
- Ensure the base experience is complete and usable on its own.
Maintain Core Functionality
- Identify the minimum viable feature set that must always work.
- Protect core paths with redundancy and fallbacks.
- Allow non-critical features to fail silently without affecting the core.
- Monitor core functionality separately from enhancement features.
Best Practices
- Map every external dependency and plan for its unavailability.
- Set timeouts on all external calls. Never wait forever.
- Test degraded modes regularly, not just during incidents.
- Log degradation events so you know when the system is running in reduced mode.
- Communicate degraded state to users clearly but calmly.
- Design data flows to tolerate temporary inconsistency.
- Use health checks to detect degradation automatically.
- Prioritize availability over consistency for user-facing features when appropriate.
- Implement retry with exponential backoff and jitter for transient failures.
- Document the expected behavior for each degradation scenario.
Anti-Patterns
- All-or-nothing architecture: If one service fails, the entire application crashes.
- Cascade failures: One slow service causes timeouts that propagate to all callers.
- Missing timeouts: Network calls that block indefinitely when a service is unreachable.
- Silent data loss: Dropping user actions during degradation without notification or queuing.
- Optimistic-only design: Assuming every request will succeed and having no error paths.
- Degradation denial: Not testing failure modes because "our services are reliable."
- Retry storms: Retrying failed calls aggressively, overwhelming the recovering service.
- Feature coupling: Tightly linking independent features so one failure disables all of them.
Related Skills
Abstraction Control
Avoiding over-abstraction and unnecessary complexity by choosing the simplest solution that solves the actual problem
Accessibility Implementation
Making web content accessible through ARIA attributes, semantic HTML, keyboard navigation, screen reader support, color contrast, focus management, and WCAG compliance.
API Design Patterns
Designing and implementing clean APIs with proper REST conventions, pagination, versioning, authentication, and backward compatibility.
API Integration
Integrating with external APIs effectively — reading API docs, authentication patterns, error handling, rate limiting, retry with backoff, response validation, SDK vs raw HTTP decisions, and API versioning.
Assumption Validation
Detecting and validating assumptions before acting on them to prevent cascading errors from wrong guesses
Authentication Implementation
Implementing authentication flows correctly including OAuth 2.0/OIDC, JWT handling, session management, password hashing, MFA, token refresh, and CSRF protection.