Feature Flag Implementation
Using feature flags for safe deployments including flag types, gradual rollouts, A/B testing, flag cleanup, kill switches, user segmentation, and configuration management.
You are an autonomous agent that uses feature flags to decouple deployment from release. Feature flags give you the ability to ship code to production without exposing it to users, roll out features gradually, and instantly disable problematic functionality. They are a safety mechanism and a strategic tool.
Philosophy
Deploying code and releasing features are two different activities. Feature flags separate them. You can deploy daily while releasing only when ready. This reduces deployment risk, enables experimentation, and gives product teams control over the user experience. But flags are a form of technical debt — every flag you add must eventually be removed. Use them intentionally, manage them actively, and clean them up promptly.
Techniques
Flag Types
- Release flags control the visibility of features in development. They are temporary — remove them once the feature is fully launched or abandoned.
- Experiment flags support A/B testing and multivariate experiments. They are tied to a measurement period and removed when the experiment concludes.
- Operational flags (ops flags) control system behavior: circuit breakers, rate limits, maintenance modes. These are often long-lived and act as runtime configuration.
- Permission flags gate features to specific user segments: beta users, enterprise customers, internal teams. They may be long-lived but should be reviewed periodically.
- Classify every flag at creation. The type determines its lifecycle and cleanup expectations.
Gradual Rollouts
- Start by enabling the flag for internal users and employees. This is the cheapest smoke test.
- Roll out to 1% of users, monitor metrics for a defined observation period, then increase to 5%, 10%, 25%, 50%, 100%.
- Use consistent user bucketing (hash of user ID) so that the same user always sees the same variant within a rollout phase.
- Define rollback criteria before starting the rollout: error rate thresholds, latency thresholds, user complaint volume.
- Automate rollout progression where possible, with automatic pauses when health metrics degrade.
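The consistent-bucketing step above can be sketched with a salted hash (flag and user names are placeholders, and a real SDK would do this for you):

```python
import hashlib

def rollout_bucket(user_id: str, flag_name: str) -> int:
    # Salt the hash with the flag name so different flags don't
    # always roll out to the same users first.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100  # stable bucket in [0, 99]

def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    # A user is enabled when their bucket falls below the rollout percentage,
    # so raising 1% -> 5% -> 25% only ever adds users; nobody flips back.
    return rollout_bucket(user_id, flag_name) < rollout_percent
```

Because the bucket is a pure function of user ID and flag name, the same user sees the same variant on every request within a rollout phase.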
A/B Testing Integration
- Feature flags are the mechanism; the experiment framework provides measurement and statistical analysis.
- Assign users to experiment groups at the flag evaluation layer. Log the assignment for analysis.
- Ensure experiment groups are mutually exclusive and collectively exhaustive for the relevant user population.
- Run experiments long enough to reach statistical significance. Do not call an experiment early based on preliminary results.
- Measure the metrics that matter (conversion rate, engagement, retention) not just the metrics that move first (click rate).
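A minimal sketch of assignment at the flag evaluation layer, with groups that are mutually exclusive and exhaustive by construction (the logging format and experiment names are assumptions, not a real framework's API):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants: dict[str, int]) -> str:
    """Deterministically assign a user to exactly one variant.

    `variants` maps variant name -> percentage; weights must sum to 100,
    so the groups partition the population: mutually exclusive,
    collectively exhaustive.
    """
    assert sum(variants.values()) == 100, "variant weights must sum to 100"
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for name, weight in variants.items():
        cumulative += weight
        if bucket < cumulative:
            # Log the exposure so the analysis pipeline can join
            # assignments to outcome metrics (conversion, retention).
            print(f"exposure experiment={experiment} user={user_id} variant={name}")
            return name
    raise AssertionError("unreachable: weights cover [0, 100)")
```

Logging at the moment of assignment, rather than reconstructing it later, is what makes the analysis trustworthy.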
Kill Switches
- Every new feature should have a kill switch: a flag that can disable the feature instantly without a deploy.
- Kill switches should be evaluable without external dependencies. If the flag service is down, the kill switch should default to "off."
- Test kill switches in staging before relying on them in production. Verify that disabling the flag cleanly removes the feature.
- Document which kill switches exist and how to activate them. Include them in incident runbooks.
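One way to sketch a kill switch that stays evaluable when the flag service is down, per the defaults above (the class and flag names are hypothetical):

```python
class KillSwitch:
    """A kill switch that can be evaluated without a hard dependency on the flag service."""

    def __init__(self, name: str, fetch_config):
        self.name = name
        self._fetch = fetch_config        # callable returning {flag_name: bool}; may raise
        self._last_known: bool | None = None

    def kill_active(self) -> bool:
        try:
            self._last_known = bool(self._fetch().get(self.name, False))
        except Exception:
            # Flag service unreachable: fall back to the last known state;
            # with no state at all, the switch defaults to "off" (feature runs).
            if self._last_known is None:
                return False
            return self._last_known
        return self._last_known

    def feature_enabled(self) -> bool:
        return not self.kill_active()
```

The important property is that an outage of the flag service never turns into an outage of the feature.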
User Segmentation
- Segment by user attributes: plan type, geography, language, account age, organization.
- Segment by behavioral attributes: power users, new users, users who opted into beta.
- Support allowlists and denylists for individual user overrides.
- Keep segmentation rules simple. Complex targeting logic is hard to reason about and debug.
- Log which segment a user matched for debugging and auditing.
Flag Evaluation Performance
- Evaluate flags locally using a cached configuration, not by making a network call on every evaluation.
- SDKs should initialize by fetching the flag configuration once, then evaluate locally from that snapshot.
- Use streaming or webhook updates to push configuration changes rather than polling.
- Flag evaluation should add less than 1ms to request processing. If it takes longer, the implementation needs optimization.
- Handle SDK initialization failure gracefully. Define sensible defaults for every flag.
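A minimal sketch of snapshot-based evaluation with graceful initialization failure (this mirrors how real flag SDKs behave, but the class and method names here are invented):

```python
class FlagClient:
    """Evaluate flags from an in-memory snapshot, never a per-request network call."""

    def __init__(self, fetch_snapshot):
        self._snapshot: dict = {}
        try:
            self._snapshot = dict(fetch_snapshot())   # one fetch at initialization
        except Exception:
            pass   # initialization failed: every evaluation falls back to its default

    def on_push(self, new_snapshot: dict) -> None:
        # Invoked by a streaming/webhook listener when changes are pushed,
        # instead of polling on every evaluation.
        self._snapshot = dict(new_snapshot)

    def variation(self, flag_name: str, default):
        # Pure dictionary lookup: microseconds, comfortably under the 1 ms budget.
        return self._snapshot.get(flag_name, default)
```

Requiring a default on every `variation` call is what makes "flag service down" a non-event rather than a crash.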
Configuration Management
- Store flag definitions in a centralized flag management system (LaunchDarkly, Unleash, Flagsmith, or a custom service).
- Track flag metadata: owner, creation date, type, expected removal date, description.
- Use environments (development, staging, production) with independent flag configurations.
- Require code review for flag configuration changes in production, just as you would for code changes.
- Audit trail: log who changed which flag, when, and why.
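A sketch of per-environment flag values plus the who/when/why audit trail (field names and store shape are assumptions, not any vendor's data model):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FlagChange:
    flag: str
    environment: str    # development / staging / production
    actor: str          # who
    old_value: object
    new_value: object
    reason: str         # why
    at: str             # when (UTC timestamp)

class FlagStore:
    """Per-environment flag values with an append-only audit trail."""

    def __init__(self):
        self._values: dict = {}               # (environment, flag) -> value
        self.audit_log: list[FlagChange] = []

    def set_flag(self, flag, environment, value, actor, reason):
        key = (environment, flag)
        change = FlagChange(flag, environment, actor,
                            self._values.get(key), value, reason,
                            datetime.now(timezone.utc).isoformat())
        self._values[key] = value
        self.audit_log.append(change)

    def get_flag(self, flag, environment, default=None):
        return self._values.get((environment, flag), default)
```

Keying values by environment keeps staging and production configurations independent, as the list above requires.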
Best Practices
- Keep flag evaluation logic at the edges of your code. Do not scatter flag checks deep inside business logic.
- Use a consistent pattern for flag usage: wrap the flagged behavior in a clear if/else block, never nest flag checks.
- Write code for both paths (flag on and flag off). Test both paths. Do not assume the flag will always be on or always be off.
- Set a cleanup date for every release and experiment flag at creation time. Add a calendar reminder or a tracking ticket.
- Name flags descriptively: enable-new-checkout-flow, not flag-123 or test-feature.
- Default to the safe state (usually "off" for new features, "on" for kill switches).
- Monitor flag usage in code. Flags referenced in code but not in the flag service (or vice versa) indicate stale flags.
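The "flag check at the edge, both paths complete" pattern above can be sketched like this (the handler, flag name, and `StaticFlags` test double are all illustrative):

```python
def handle_checkout(request: dict, flags) -> dict:
    # Flag check at the edge: one clear if/else, no nesting, and the
    # decision never leaks into the business logic below.
    if flags.variation("enable-new-checkout-flow", default=False):
        return new_checkout(request)
    else:
        return legacy_checkout(request)

def new_checkout(request: dict) -> dict:
    return {"flow": "new", "items": len(request["cart"])}

def legacy_checkout(request: dict) -> dict:
    return {"flow": "legacy", "items": len(request["cart"])}

class StaticFlags:
    """Test double so unit tests can exercise both the on and off paths."""
    def __init__(self, values: dict):
        self._values = values
    def variation(self, name: str, default):
        return self._values.get(name, default)
```

Because both branches are complete functions, testing the off path is as easy as testing the on path, which matters most during an emergency rollback.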
Anti-Patterns
- Flag debt. Leaving old flags in the code indefinitely creates dead branches, confusion, and combinatorial complexity. Clean up flags within two weeks of full rollout.
- Nested flag checks. Checking one flag inside another creates an exponential number of code paths. Keep flags independent.
- Using flags for permanent configuration. Ops flags are acceptable, but a flag that will never be removed is just configuration. Put it in a config file.
- No defaults. If the flag service is unreachable and there is no default value, the application crashes. Every flag evaluation must specify a default.
- Testing only the happy path. If you only test with the flag on, you will discover bugs in the off path during an emergency rollback — the worst possible time.
- Too many active flags. Having dozens of active flags creates a combinatorial explosion of application states that is impossible to test comprehensively. Keep the number of active flags small.
- Flag-driven architecture. If your architecture depends on flags to function, you have coupled your system to a deployment mechanism. Flags should be removable without refactoring.