Kill switches for quick disable
Critical features have kill switches that can be toggled in < 5 minutes without a deploy, with documented procedures and controlled access
Question to ask
"How fast can you disable that new feature if it's melting prod?"
Verification guide
Severity: Recommended
Kill switches are feature flags that can instantly disable functionality in production without a deploy. Critical for incident response when a feature causes problems.
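As a rough illustration of the idea, the sketch below wraps a code path in a flag check that is evaluated on every call, so flipping the flag changes behavior on the next request. `FlagSource`, `InMemoryFlags`, `withKillSwitch`, and the `new-checkout` key are hypothetical names for this example, not part of any particular SDK; a real flag client would be adapted to the interface.

```typescript
// Hypothetical minimal interface; a real flag SDK client would be adapted to it.
interface FlagSource {
  isOn(key: string): boolean;
}

// Runs `action` only while the flag is on; otherwise returns `fallback`.
// The flag is read on every call, so a toggle applies to the next call
// without a restart or deploy.
function withKillSwitch<T>(
  flags: FlagSource,
  key: string,
  action: () => T,
  fallback: () => T,
): T {
  return flags.isOn(key) ? action() : fallback();
}

// In-memory stand-in for a flag service, for illustration only.
class InMemoryFlags implements FlagSource {
  private state = new Map<string, boolean>();

  set(key: string, on: boolean): void {
    this.state.set(key, on);
  }

  isOn(key: string): boolean {
    return this.state.get(key) ?? false; // unknown flags read as off
  }
}

const flags = new InMemoryFlags();
flags.set("new-checkout", true);

const render = (): string =>
  withKillSwitch(
    flags,
    "new-checkout",
    () => "new checkout",
    () => "legacy checkout",
  );

const beforeToggle = render();
flags.set("new-checkout", false); // simulate flipping the kill switch
const afterToggle = render();
```

Providing an explicit fallback keeps the disabled state a working code path rather than an error.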
Check automatically:
- Check for kill switch naming patterns:
# Common kill switch naming
grep -rE "KILL_|DISABLE_|ENABLE_|EMERGENCY_" src/ app/ lib/ --include="*.ts" --include="*.js" 2>/dev/null
# Feature flags that control critical features
grep -rE "isOn\(['\"].*payment|isOn\(['\"].*checkout|isOn\(['\"].*auth" src/ app/ lib/ --include="*.ts" --include="*.js" 2>/dev/null
- Check toggle speed (can flags be changed without deploy?):
For env var flags:
- Require restart/redeploy to toggle = slow (minutes to hours)
- NOT suitable for kill switches
For GrowthBook/feature flag services:
- Toggle via dashboard = instant (seconds)
- Suitable for kill switches
# Check if using env vars (slow toggle) vs SDK (fast toggle)
grep -rE "process\.env\.(FEATURE_|ENABLE_|DISABLE_)" src/ app/ lib/ --include="*.ts" --include="*.js" 2>/dev/null
# Check for GrowthBook SDK (fast toggle)
grep -rE "gb\.isOn|gb\.evalFeature|useFeatureIsOn" src/ app/ lib/ --include="*.ts" --include="*.js" 2>/dev/null
- Check for documented kill switch procedures:
# Look for runbooks or incident docs
find . -name "*.md" -not -path "*/node_modules/*" -exec grep -l -iE "kill switch|disable feature|emergency" {} \; 2>/dev/null
# Check CLAUDE.md or operational docs
grep -iE "kill switch|disable|emergency" CLAUDE.md README.md docs/*.md 2>/dev/null
- Check for critical features that should have kill switches:
Identify features that could cause incidents if they break:
- Payment processing
- External API integrations
- New features in active development
- Third-party service dependencies
# Find critical integrations that might need kill switches
grep -rEi "stripe|paypal|twilio|sendgrid|openai|anthropic" src/ app/ lib/ --include="*.ts" --include="*.js" 2>/dev/null | head -20
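The toggle-speed check above can also be sketched as a small classifier that labels each flag reference as slow (env var) or fast (flag SDK). This is only a sketch: the regular expressions mirror the grep patterns, and `classifyFlagUsages`, `FlagUsage`, and the sample source are hypothetical names introduced for illustration.

```typescript
type ToggleSpeed = "slow (env var)" | "fast (flag SDK)";

interface FlagUsage {
  line: number; // 1-based line number within the scanned text
  name: string; // env var name or flag key
  speed: ToggleSpeed;
}

// Same patterns as the grep commands, with a capture group for the flag name.
const ENV_FLAG = /process\.env\.((?:FEATURE_|ENABLE_|DISABLE_)\w+)/;
const SDK_FLAG = /(?:gb\.isOn|gb\.evalFeature|useFeatureIsOn)\(\s*['"]([^'"]+)['"]/;

// Returns the first capture group of every match of `pattern` in `text`.
function findAll(pattern: RegExp, text: string): string[] {
  const re = new RegExp(pattern.source, "g"); // fresh regex so lastIndex starts at 0
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    names.push(m[1]);
  }
  return names;
}

function classifyFlagUsages(source: string): FlagUsage[] {
  const usages: FlagUsage[] = [];
  source.split("\n").forEach((text, index) => {
    for (const name of findAll(ENV_FLAG, text)) {
      usages.push({ line: index + 1, name, speed: "slow (env var)" });
    }
    for (const name of findAll(SDK_FLAG, text)) {
      usages.push({ line: index + 1, name, speed: "fast (flag SDK)" });
    }
  });
  return usages;
}

// Hypothetical source text to scan.
const sample = [
  "if (process.env.DISABLE_PAYMENTS === 'true') return;",
  "const enabled = gb.isOn('new-checkout');",
  "const plain = 1;",
].join("\n");

const usages = classifyFlagUsages(sample);
```

A critical feature that only appears with the slow label is a candidate for the "toggling requires a deploy" finding.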
Ask user:
- "How quickly can you disable a feature in production? (< 5 min = good, requires deploy = bad)"
- "Which features have kill switches? (payments, external APIs, new features)"
- "Is there documentation on how to disable features in an emergency?"
- "Who has access to toggle kill switches?"
Cross-reference with:
- FF-001 (kill switches are a type of feature flag)
- Section 35 (incident response - kill switches in runbooks)
- DEPLOY-001 (deployments should be fast, but kill switches should be faster)
Pass criteria:
- Critical features have kill switches
- Kill switches can be toggled in < 5 minutes (ideally instant via dashboard)
- Team knows how to use them (documented or well-known)
- Kill switch access is controlled but available to on-call
Fail criteria:
- No kill switches for risky features (payments, external APIs)
- Toggling requires a deploy (defeats the purpose)
- No documentation on how to disable features in an emergency
- Only one person knows how to toggle (bus factor = 1)
Notes on kill switch implementation:
Env var kill switches (acceptable for simple cases):
- Set DISABLE_PAYMENTS=true in environment
- Requires restart/redeploy to take effect
- OK for non-urgent features, not for emergencies
Feature flag service kill switches (recommended):
- Toggle in GrowthBook dashboard
- Takes effect within seconds (SDK polls or streams updates via SSE)
- Proper audit trail of who toggled what when
Hybrid approach:
- Use feature flag service for instant toggles
- Have env var override as backup if flag service is down
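// Env vars are strings, so compare explicitly: DISABLE_PAYMENTS=false must not disable payments
const paymentsDisabled = process.env.DISABLE_PAYMENTS === 'true' || !gb.isOn('payments')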
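Environment variables are always strings, so a bare truthiness check would treat DISABLE_PAYMENTS=false as set. A minimal sketch of the hybrid check with explicit parsing follows. `FlagClient`, `Env`, `envFlagSet`, and `isPaymentsDisabled` are hypothetical names, and accepting "1" alongside "true" is an assumption for this example rather than a convention from the checklist.

```typescript
// Hypothetical minimal interface; a real flag SDK client would be adapted to it.
interface FlagClient {
  isOn(key: string): boolean;
}

type Env = Record<string, string | undefined>;

// Env values are strings, so "false", "0", and "" must not count as set.
// Assumption: only "true" and "1" (case-insensitive) enable the override.
function envFlagSet(env: Env, name: string): boolean {
  const value = (env[name] ?? "").trim().toLowerCase();
  return value === "true" || value === "1";
}

// The env var is the backup override and wins when set;
// otherwise the flag service decides.
function isPaymentsDisabled(env: Env, flags: FlagClient): boolean {
  if (envFlagSet(env, "DISABLE_PAYMENTS")) {
    return true;
  }
  return !flags.isOn("payments");
}

// Stand-in clients for illustration.
const flagOn: FlagClient = { isOn: () => true };
const flagOff: FlagClient = { isOn: () => false };

const envSaysFalse = isPaymentsDisabled({ DISABLE_PAYMENTS: "false" }, flagOn);
const envOverride = isPaymentsDisabled({ DISABLE_PAYMENTS: "true" }, flagOn);
const flagDisabled = isPaymentsDisabled({}, flagOff);
```

In a Node.js service the first argument would typically be `process.env`. If the client reports the flag as off while the flag service is unreachable, this sketch treats payments as disabled; whether that is the right default is a per-feature decision.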
Evidence to capture:
- Kill switches identified and their toggle speed
- Critical features covered (or not)
- Documentation/runbook status
- Who has access to toggle kill switches
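One possible way to record this evidence and check it against the pass and fail criteria is sketched below. The `KillSwitchEvidence` shape, its field names, and the finding strings are hypothetical, chosen only to mirror the criteria listed earlier; treating an unknown toggle time as a finding is an assumption.

```typescript
interface KillSwitchEvidence {
  feature: string;
  hasKillSwitch: boolean;
  toggleMinutes: number | null; // measured or estimated time to disable; null if unknown
  requiresDeploy: boolean;
  documented: boolean;
  peopleWithAccess: number; // how many people can and know how to toggle
}

// Returns the list of findings for one feature; an empty list means it passes.
function findings(e: KillSwitchEvidence): string[] {
  if (!e.hasKillSwitch) {
    return ["no kill switch"];
  }
  const out: string[] = [];
  if (e.requiresDeploy) {
    out.push("toggling requires a deploy");
  }
  if (e.toggleMinutes === null || e.toggleMinutes >= 5) {
    out.push("toggle time is not under 5 minutes");
  }
  if (!e.documented) {
    out.push("no documented procedure");
  }
  if (e.peopleWithAccess <= 1) {
    out.push("only one person can toggle");
  }
  return out;
}

// Hypothetical records for two features.
const payments: KillSwitchEvidence = {
  feature: "payments",
  hasKillSwitch: true,
  toggleMinutes: 1,
  requiresDeploy: false,
  documented: true,
  peopleWithAccess: 4,
};

const emailSending: KillSwitchEvidence = {
  feature: "email sending",
  hasKillSwitch: true,
  toggleMinutes: 30,
  requiresDeploy: true,
  documented: false,
  peopleWithAccess: 1,
};

const paymentsFindings = findings(payments);
const emailFindings = findings(emailSending);
```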