LST-007 recommended stress-testing
Graceful degradation under load
When load exceeds capacity, good systems degrade gracefully instead of crashing completely. They shed load, return cached responses, or disable non-critical features.
Question to ask
"When a dependency dies, does it take everything down?"
Pass criteria
- ✓ Degradation strategy documented and implemented
- ✓ Circuit breakers protect against cascading failures
- ✓ Non-critical features can be disabled (feature flags, config)
- ✓ System returns errors gracefully rather than hanging/crashing
Fail criteria
- ✗ System crashes or hangs completely under overload
- ✗ No circuit breakers (one slow dependency takes down everything)
- ✗ "We just hope it doesn't happen"
- ✗ Degradation is uncontrolled (random failures)
Related items
- LST-006 Breaking points identified (stress testing)
- LST-008 Auto-scaling triggers tested
- Section 30 (rate limiting)
- Section 19 (error handling)
- Section 33 (feature flags)
Verification guide
Severity: Recommended
Check automatically:
- Look for degradation patterns in code:
# Look for circuit breakers, load shedding, degradation patterns
grep -riE "circuit.*breaker|load.*shed|graceful.*degrad|fallback|bulkhead" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null
# Check for libraries that implement these patterns
grep -iE "opossum|cockatiel|hystrix|resilience4j|polly|gobreaker|circuitbreaker|pybreaker" package.json requirements.txt go.mod Gemfile pom.xml build.gradle *.csproj 2>/dev/null
# Look for rate limiting / throttling at app level
grep -riE "rate.*limit|throttl|too.*many.*request|429" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null
# Check for feature flags that could disable features under load
grep -riE "feature.*flag|launchdarkly|flagsmith|unleash|growthbook" package.json src/ --include="*.json" --include="*.ts" --include="*.js" 2>/dev/null
# Look for queue/backpressure patterns
grep -riE "back.?pressure|queue.*full|reject.*request|load.?shed|shed.*load" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null
Ask user:
- "What happens when your system is overloaded?"
- "Do you have circuit breakers for external dependencies?"
- "Can you disable non-critical features under load?"
- "Is there a 'degraded mode' the system can operate in?"
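To make the last two questions concrete, here is a minimal sketch of a "degraded mode" switch. The `DegradationPolicy` class, the feature names, and `render_product_page` are all hypothetical and stand in for whatever flag system the project actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class DegradationPolicy:
    """Hypothetical kill-switch registry; not a real feature-flag SDK."""

    # Features that may be switched off when the system is overloaded (assumed names).
    non_critical: set = field(
        default_factory=lambda: {"recommendations", "search_suggestions"}
    )
    degraded: bool = False

    def is_enabled(self, feature: str) -> bool:
        # Critical features stay on; non-critical ones are off in degraded mode.
        return not (self.degraded and feature in self.non_critical)


def render_product_page(policy: DegradationPolicy) -> dict:
    page = {"product": "core product data"}  # critical path, always served
    if policy.is_enabled("recommendations"):
        page["recommendations"] = ["expensive", "personalised", "results"]
    return page


policy = DegradationPolicy()
full_page = render_product_page(policy)

policy.degraded = True  # e.g. flipped by an operator or an overload alarm
degraded_page = render_product_page(policy)
```

A system that passes this check can usually point to an equivalent switch and name who or what flips it.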
Graceful degradation strategies:
| Strategy | Description |
|---|---|
| Circuit breakers | Stop calling failing services, return fallback |
| Load shedding | Reject excess requests early (HTTP 429 Too Many Requests) |
| Feature flags | Disable non-critical features |
| Cached fallbacks | Return stale data instead of failing |
| Queue limits | Cap queue depth, reject when full |
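The circuit breaker row can be sketched as follows. This is a hand-rolled illustration using only the standard library, not the API of any library named above. The thresholds, the injected `clock`, and the `flaky`/`healthy` functions are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Minimal breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the demo needs no real waiting
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # open: fail fast, do not touch the dependency
            # Otherwise half-open: let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None       # success closes the circuit
        return result


# Demo with a fake clock (hypothetical dependency functions).
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
calls = []


def flaky():
    calls.append(now[0])
    raise ConnectionError("dependency down")


def healthy():
    return "live"


def fallback():
    return "fallback"


results = [breaker.call(flaky, fallback) for _ in range(4)]
now[0] = 11.0                       # reset timeout has elapsed
recovered = breaker.call(healthy, fallback)
```

In the demo, only the first two of the four calls reach the failing dependency. The remaining two are answered by the fallback without any call, which is what stops a slow or dead dependency from tying up the caller.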
Cross-reference with:
- LST-006 (stress testing reveals what needs degradation handling)
- LST-008 (auto-scaling is one response, degradation is another)
- Section 30 (rate limiting) - rate limiting is a form of load shedding
- Section 19 (error handling) - graceful errors under load
- Section 33 (feature flags) - kill switches for degradation
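Load shedding, as referenced above, can be as simple as capping in-flight work and rejecting the excess immediately. The sketch below assumes a hypothetical `LoadShedder` wrapper and returns `(status, body)` tuples in place of a real HTTP framework.

```python
import threading


class LoadShedder:
    """Caps in-flight requests; excess requests are rejected immediately."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        # Non-blocking acquire: never queue behind an already saturated system.
        if not self._slots.acquire(blocking=False):
            return 429, "Too Many Requests"
        try:
            return 200, work()
        finally:
            self._slots.release()


shedder = LoadShedder(max_in_flight=1)

# Simulate a second request arriving while the first is still in flight
# by issuing it from inside the first request's work function.
outer_status, inner_response = shedder.handle(
    lambda: shedder.handle(lambda: "second")
)

# Once the first request finishes, capacity is available again.
after_status, after_body = shedder.handle(lambda: "third")
```

Rejecting early keeps latency bounded for the requests that are admitted. A real service would typically also send a `Retry-After` header with the 429.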
Pass criteria:
- Degradation strategy documented and implemented
- Circuit breakers protect against cascading failures
- Non-critical features can be disabled (feature flags, config)
- System returns errors gracefully rather than hanging/crashing
Fail criteria:
- System crashes or hangs completely under overload
- No circuit breakers (one slow dependency takes down everything)
- "We just hope it doesn't happen"
- Degradation is uncontrolled (random failures)
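The "hangs completely" failure often traces back to calls with no deadline. As a rough sketch, a dependency call can be bounded with a timeout and answered with a fallback. `call_with_deadline`, the pool size, and the sample functions are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool; the size is an arbitrary choice for this example.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(func, timeout_s, fallback):
    """Run func, but give the caller an answer within timeout_s seconds."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a thread that is already running is not interrupted.
        future.cancel()
        return fallback
    except Exception:
        # The dependency failed outright; degrade instead of propagating.
        return fallback


def slow_dependency():
    time.sleep(0.5)  # stands in for an overloaded downstream service
    return "slow result"


def fast_dependency():
    return "fast result"


timed_out = call_with_deadline(slow_dependency, timeout_s=0.05, fallback="degraded")
succeeded = call_with_deadline(fast_dependency, timeout_s=1.0, fallback="degraded")
```

The caller is unblocked after the deadline, but the worker thread stays busy until the slow call returns, so a timeout like this is usually paired with a circuit breaker or a concurrency cap.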
Evidence to capture:
- Degradation strategies in place (circuit breakers, load shedding, feature flags)
- Libraries/patterns used
- What features can be disabled under load
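For the "libraries/patterns used" evidence, a small script can record which resilience libraries appear in a few common dependency manifests. This sketch reuses the library names from the automated check. The function names and the temporary sample project are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Library names taken from the automated check; matching is case-insensitive.
LIBRARY_PATTERN = re.compile(
    r"opossum|cockatiel|hystrix|resilience4j|polly|circuitbreaker|pybreaker",
    re.IGNORECASE,
)
MANIFESTS = ["package.json", "requirements.txt", "go.mod", "Gemfile"]


def find_resilience_libraries(text: str) -> list:
    """Return the sorted, de-duplicated library names mentioned in text."""
    return sorted({m.group(0).lower() for m in LIBRARY_PATTERN.finditer(text)})


def collect_evidence(root: str = ".") -> dict:
    """Map each manifest under root to the resilience libraries it mentions."""
    evidence = {}
    for name in MANIFESTS:
        path = Path(root) / name
        if path.is_file():
            found = find_resilience_libraries(path.read_text(errors="ignore"))
            if found:
                evidence[name] = found
    return evidence


# Demo against a throwaway directory with made-up manifest contents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "requirements.txt").write_text("pybreaker\nrequests\n")
    Path(tmp, "package.json").write_text('{"dependencies": {"express": "*"}}')
    evidence = collect_evidence(tmp)
```

A hit in a manifest only shows the library is installed. Whether it actually wraps the critical dependencies still has to be confirmed in code or with the team.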