LST-008 recommended stress-testing

Auto-scaling triggers tested

Auto-scaling that's never been triggered is theoretical. It might not scale fast enough, might have permission issues, or might hit account limits. If you rely on auto-scaling, test it.

Question to ask

"Auto-scaling configured but never triggered — sure it works?"

Verification guide

Severity: Recommended

Auto-scaling that's never been triggered is theoretical. It might not scale fast enough, might have permission issues, or might hit account limits. If you rely on auto-scaling, test it.

Check automatically:

Check for auto-scaling configs:

# Check for Kubernetes HPA
find . -name "*.yaml" -o -name "*.yml" | xargs grep -l "HorizontalPodAutoscaler" 2>/dev/null | grep -v node_modules

# Check Terraform for auto-scaling
grep -riE "autoscal|aws_autoscaling|google_compute_autoscaler" terraform/ infrastructure/ --include="*.tf" 2>/dev/null

# Check for AWS auto-scaling
grep -riE "AutoScalingGroup|TargetTrackingScaling|ScalingPolicy" terraform/ cloudformation/ --include="*.tf" --include="*.yml" --include="*.yaml" --include="*.json" 2>/dev/null

# Check for GCP auto-scaling
grep -riE "autoscaler|minReplicas|maxReplicas" terraform/ --include="*.tf" 2>/dev/null

# Check for scaling documentation
grep -riE "auto.*scal|scale.*up|scale.*down|scaling.*trigger|hpa" docs/ README.md CLAUDE.md --include="*.md" 2>/dev/null

# Check Kubernetes manifests for resource requests (needed for HPA)
grep -riE "resources:|requests:|limits:|cpu:|memory:" k8s/ kubernetes/ helm/ --include="*.yaml" --include="*.yml" 2>/dev/null

Ask user:

"Do you use auto-scaling?"
"Has it ever actually triggered? (intentionally or during an incident)"
"What are the scaling triggers? (CPU, memory, request count, custom metrics)"
"How long does it take to scale up?"

Auto-scaling validation checklist:

Scaling triggers defined and appropriate
Min/max instances configured
Scale-up time measured (cold start + provisioning)
Account quotas checked (won't hit limits during scale)
Scaling has been tested under real load
Cooldown periods appropriate

Cross-reference with:

LST-006 (stress testing can validate auto-scaling)
LST-007 (graceful degradation while waiting for scale-up)
Section 26 (high availability) - auto-scaling is part of HA strategy
Section 38 (cost monitoring) - auto-scaling affects costs

Pass criteria:

Auto-scaling configured with appropriate triggers
Scaling has been tested under load (not just configured and hoped)
Scale-up time is known and acceptable
Account/quota limits checked (won't hit "max instances" unexpectedly)

Fail criteria:

Auto-scaling configured but never tested
Scaling triggers misconfigured (scales on wrong metric, too slow)
Hits account limits during scale-up
No auto-scaling when traffic is unpredictable

Notes: Not all systems need auto-scaling. Fixed capacity is fine if traffic is predictable and you've sized appropriately. The key is: if you're relying on auto-scaling, test it.

Evidence to capture:

Auto-scaling configuration (triggers, min/max instances)
Whether scaling has been tested
Scale-up time under load
Any account limits that could block scaling

Section

36. Load & Stress Testing

Operations & Incident Management

Auto-scaling triggers tested

Related items

Verification guide