LST-005 recommended capacity-planning
Capacity limits documented
"How much traffic can we handle?" is a question every team should be able to answer. Documented capacity limits inform scaling decisions, incident response, and business planning.
Question to ask
"How much traffic can you handle before things break?"
Pass criteria
- ✓ Capacity limits documented per service/endpoint
- ✓ Limits based on actual testing (not guesses)
- ✓ Team knows the bottleneck (database, CPU, memory, external API)
Fail criteria
- ✗ No idea what limits are ("never tested")
- ✗ Limits documented but never validated
- ✗ Only discovered limits during outages
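For a sense of what passes, documented limits might look like the snippet below (entirely hypothetical; service names and figures are placeholders). The key properties are per-service figures, a named bottleneck, and a validation date:

```markdown
<!-- Hypothetical example; service names and figures are placeholders -->
| Service      | Max sustained load | First bottleneck         | Last validated |
|--------------|--------------------|--------------------------|----------------|
| checkout-api | 450 RPS            | Postgres connection pool | 2024-06-12     |
| search       | 1,200 RPS          | CPU on search nodes      | 2024-05-30     |
```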
Related items
Verification guide
Severity: Recommended
"How much traffic can we handle?" is a question every team should be able to answer. Documented capacity limits inform scaling decisions, incident response, and business planning.
Check automatically:
# Look for capacity documentation
grep -riE "capacity|limit|max.*request|rps|requests.*per.*second|concurrent.*users|throughput" docs/ README.md CLAUDE.md --include="*.md" 2>/dev/null
# Check for architecture/scaling docs
find . -maxdepth 3 -name "*capacity*" -o -name "*scaling*" -o -name "*architecture*" 2>/dev/null | grep -v node_modules
# Look for load test results that document limits
find . -maxdepth 3 -type d \( -name "*loadtest*" -o -name "*results*" \) 2>/dev/null | grep -v node_modules
# Check for runbooks mentioning capacity
grep -riE "capacity|scaling|traffic.*spike" runbooks/ docs/runbooks/ --include="*.md" 2>/dev/null
Ask user:
- "What's the max RPS your API can handle?"
- "At what point does your database become the bottleneck?"
- "Where are capacity limits documented?"
- "What component fails first under load?"
Cross-reference with:
- LST-002 (baselines include capacity info)
- LST-006 (breaking points are the extreme end of capacity)
- Section 21 (caching) - caching affects capacity
- Section 30 (rate limiting) - rate limits should be below capacity limits
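The rate-limiting cross-check can be made concrete: whatever figure the limiter enforces should sit below the tested capacity, with headroom. A minimal sketch with hypothetical numbers; substitute figures from your own gateway config and load-test results:

```shell
# Compare the configured rate limit against tested capacity.
# Both values are hypothetical placeholders.
tested_capacity_rps=500   # from the most recent load-test results
rate_limit_rps=400        # from the API gateway / rate-limiter config
if [ "$rate_limit_rps" -lt "$tested_capacity_rps" ]; then
  echo "OK: rate limit ${rate_limit_rps} rps is below tested capacity ${tested_capacity_rps} rps"
else
  echo "WARNING: rate limit is at or above tested capacity; the service may fail before limiting kicks in"
fi
```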
Pass criteria:
- Capacity limits documented per service/endpoint
- Limits based on actual testing (not guesses)
- Team knows the bottleneck (database, CPU, memory, external API)
Fail criteria:
- No idea what limits are ("never tested")
- Limits documented but never validated
- Only discovered limits during outages
Evidence to capture:
- Documented capacity limits (RPS, concurrent users, etc.)
- Known bottleneck(s)
- When limits were last validated