LST-007 recommended stress-testing

Graceful degradation under load

When load exceeds capacity, good systems degrade gracefully instead of crashing completely. They shed load, return cached responses, or disable non-critical features.

Question to ask

"When a dependency dies, does it take everything down?"

Verification guide

Severity: Recommended

Check automatically:

Look for degradation patterns in code:

# Look for circuit breakers, load shedding, degradation patterns
grep -riE "circuit.*breaker|load.*shed|graceful.*degrad|fallback|bulkhead" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null

# Check for libraries that implement these patterns
grep -iE "opossum|cockatiel|hystrix|resilience4j|polly|circuitbreaker|pybreaker" package.json requirements.txt go.mod Gemfile pom.xml build.gradle 2>/dev/null

# Look for rate limiting / throttling at app level
grep -riE "rate.*limit|throttl|too.*many.*request|\b429\b" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null

# Check for feature flags that could disable features under load
grep -riE "feature.*flag|launchdarkly|flagsmith|unleash|growthbook" package.json src/ --include="*.json" --include="*.ts" 2>/dev/null

# Look for queue/backpressure patterns
grep -riE "backpressure|queue.*full|reject.*request|\bshed(ding)?\b" src/ lib/ app/ --include="*.ts" --include="*.js" --include="*.py" --include="*.go" 2>/dev/null

Ask user:

"What happens when your system is overloaded?"
"Do you have circuit breakers for external dependencies?"
"Can you disable non-critical features under load?"
"Is there a 'degraded mode' the system can operate in?"

Graceful degradation strategies:

Strategy	Description
Circuit breakers	Stop calling failing services, return fallback
Load shedding	Reject excess requests early (429)
Feature flags	Disable non-critical features
Cached fallbacks	Return stale data instead of failing
Queue limits	Cap queue depth, reject when full
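
The first and fourth strategies in the table can be combined. The following is a minimal sketch, not a production implementation: it assumes a single-threaded caller, and the names CircuitBreaker, flaky_price_service, and the in-memory cache are hypothetical, introduced only for illustration. Libraries such as pybreaker or opossum cover the same ground with more care.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after N consecutive failures, then allows one trial call
    once the cool-down has elapsed (the usual half-open behaviour)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the cool-down can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cool-down has passed, the next call is let through as a trial.
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()  # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Hypothetical dependency and stale cache, for illustration only.
cache = {"price": 100}
attempts = {"count": 0}


def flaky_price_service() -> int:
    attempts["count"] += 1
    raise ConnectionError("upstream down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
for _ in range(5):
    price = breaker.call(flaky_price_service, lambda: cache["price"])

print(price, attempts["count"])  # 100 2
```

Only the first two of the five calls reach the failing service. The remaining three are answered from the cache immediately, which is what keeps one slow dependency from tying up every request.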

Cross-reference with:

LST-006 (stress testing reveals what needs degradation handling)
LST-008 (auto-scaling is one response, degradation is another)
Section 30 (rate limiting) - rate limiting is a form of load shedding
Section 19 (error handling) - graceful errors under load
Section 33 (feature flags) - kill switches for degradation

Pass criteria:

Degradation strategy documented and implemented
Circuit breakers protect against cascading failures
Non-critical features can be disabled (feature flags, config)
System returns errors gracefully rather than hanging/crashing
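
As an illustration of the last two criteria, here is a hedged sketch of load shedding combined with a kill switch. The LoadShedder class, the feature names, and the choice of 503 for a disabled feature are assumptions made for this example, not something the checklist prescribes.

```python
import threading
from typing import Callable, Optional, Set, Tuple


class LoadShedder:
    """Caps in-flight requests and lets non-critical features be switched off."""

    def __init__(self, max_in_flight: int,
                 disabled_features: Optional[Set[str]] = None) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self.disabled_features: Set[str] = set(disabled_features or ())

    def handle(self, feature: str, handler: Callable[[], str]) -> Tuple[int, str]:
        # Kill switch: a disabled feature is refused before it uses any capacity.
        if feature in self.disabled_features:
            return 503, f"{feature} is temporarily disabled"
        # Non-blocking acquire: when no slot is free, reject right away
        # instead of queueing the request and letting latency pile up.
        if not self._slots.acquire(blocking=False):
            return 429, "Too Many Requests"
        try:
            return 200, handler()
        except Exception:
            # Return a controlled error rather than letting the failure propagate.
            return 500, "Internal Server Error"
        finally:
            self._slots.release()


shedder = LoadShedder(max_in_flight=100)
shedder.disabled_features.add("recommendations")  # operator flips the switch

print(shedder.handle("recommendations", lambda: "recs"))
# (503, 'recommendations is temporarily disabled')
print(shedder.handle("checkout", lambda: "order placed"))
# (200, 'order placed')
```

Rejecting early with a clear status code is the point of the design: an overloaded caller gets a fast, explicit answer and can back off, instead of waiting on a request that will time out anyway.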

Fail criteria:

System crashes or hangs completely under overload
No circuit breakers (one slow dependency takes down everything)
"We just hope it doesn't happen"
Degradation is uncontrolled (random failures)

Evidence to capture:

Degradation strategies in place (circuit breakers, load shedding, feature flags)
Libraries/patterns used
What features can be disabled under load

Section: 36. Load & Stress Testing (Operations & Incident Management)
