COST-002 (Recommended): Anomaly Detection
Anomaly detection and unexpected cost alerts
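Fixed budget thresholds fire only once cumulative spend crosses a set amount, so they can miss a spike that starts mid-month. Anomaly detection flags spending that departs from the normal pattern. Usage-based services, whose costs scale directly with traffic, need dedicated monitoring.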
Question to ask
"What if costs tripled overnight — when would you know?"
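To make the question concrete, the sketch below simulates when a cumulative budget alert would fire if daily spend tripled partway through a month. The function name `threshold_alert_day` and all figures (baseline, spike day, budget) are hypothetical example inputs, not data from any real account or tool.

```bash
#!/usr/bin/env bash
# Illustrative only: prints the day of the month on which a cumulative
# (fixed) budget alert first fires when daily spend jumps from `baseline`
# to `baseline * multiplier` on `spike_day`. Prints "none" if the threshold
# is never crossed within the month. Inputs are whole-dollar example figures.
threshold_alert_day() {
  local baseline=$1 multiplier=$2 spike_day=$3 threshold=$4
  local days_in_month=${5:-30}
  local day daily total=0
  for (( day = 1; day <= days_in_month; day++ )); do
    if (( day >= spike_day )); then
      daily=$(( baseline * multiplier ))   # spend after the spike
    else
      daily=$baseline                      # normal spend before the spike
    fi
    total=$(( total + daily ))             # month-to-date spend
    if (( total >= threshold )); then
      echo "$day"
      return 0
    fi
  done
  echo "none"
}

# Example: $30/day baseline, tripling on day 10, alert at $800 (80% of $1,000)
threshold_alert_day 30 3 10 800   # prints 15
```

In this example the alert arrives on day 15, five days after the spike began, by which point about $360 of extra spend (6 days at +$60) has accrued. Without the spike, `threshold_alert_day 30 1 10 800` fires on day 27, so the spike only pulls the same alert forward; the alert itself says nothing about a spike.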
Pass criteria
- ✓ Anomaly detection enabled
- ✓ Usage-based services identified
- ✓ High-risk services monitored
Fail criteria
- ✗ Only fixed thresholds
- ✗ Unaware of usage-based pricing risks
- ✗ No monitoring for variable-cost services
Verification guide
Severity: Recommended
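A fixed threshold alerts only when cumulative spend crosses a set amount, so a spike that begins mid-month may not trigger anything for days, or at all if the month still closes under the threshold. Anomaly detection compares spend against a historical baseline and flags unusual patterns regardless of absolute amounts.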
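The difference can be illustrated with a deliberately simple rule: flag any day whose cost exceeds a multiple of the trailing N-day average. This is an assumption-laden sketch, not the model that AWS Cost Anomaly Detection or third-party tools actually use. The function name `flag_cost_anomalies`, the 7-day window, and the 2x factor are illustrative choices. Input is one `date,cost` line per day, oldest first.

```bash
#!/usr/bin/env bash
# Simplified anomaly rule (illustrative, not AWS's model): read "date,cost"
# lines, oldest first, and print any day whose cost exceeds `factor` times
# the mean of the previous `window` days. The first `window` days only build
# history and are never flagged.
flag_cost_anomalies() {
  local window=${1:-7} factor=${2:-2}
  awk -F, -v window="$window" -v factor="$factor" '
    {
      cost = $2 + 0
      if (NR > window) {
        sum = 0
        for (i = NR - window; i < NR; i++) sum += hist[i]   # trailing window
        mean = sum / window
        if (mean > 0 && cost > factor * mean)
          printf "%s,%.2f,baseline=%.2f\n", $1, cost, mean
      }
      hist[NR] = cost   # flagged days also enter the history
    }'
}

# Example: a week at roughly $30/day, then $95 on day 8
printf '%s\n' 2024-01-01,30 2024-01-02,31 2024-01-03,29 2024-01-04,30 \
  2024-01-05,32 2024-01-06,28 2024-01-07,30 2024-01-08,95 |
  flag_cost_anomalies 7 2
# prints 2024-01-08,95.00,baseline=30.00
```

Daily costs could be fed in from Cost Explorer, for example `aws ce get-cost-and-usage --time-period Start=2024-01-01,End=2024-02-01 --granularity DAILY --metrics UnblendedCost --query 'ResultsByTime[].[TimePeriod.Start,Total.UnblendedCost.Amount]' --output text | tr '\t' ',' | flag_cost_anomalies 7 2`. Because flagged days stay in the history, a sustained spike gradually raises the baseline and eventually stops being flagged; a production tool has to handle that, along with weekly seasonality, more carefully than a flat average does.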
Check automatically:
```bash
# AWS - Check for Cost Anomaly Detection monitors
aws ce get-anomaly-monitors
# AWS - Check anomaly subscriptions (who gets notified)
aws ce get-anomaly-subscriptions
# Look for third-party cost tools
grep -riE "vantage|cloudhealth|kubecost|finops|cost.*monitor" docs/ README.md package.json --include="*.md" --include="*.json" 2>/dev/null
# Check for usage-based services that could spike
grep -riE "openai|anthropic|twilio|sendgrid|stripe.*metered|bandwidth|cdn" src/ app/ --include="*.ts" --include="*.js" --include="*.py" 2>/dev/null | head -20
```
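If infrastructure is managed as code, anomaly monitors may be declared in the repository even when you lack AWS CLI access. The sketch below greps for the Terraform resource types `aws_ce_anomaly_monitor` / `aws_ce_anomaly_subscription`, the CloudFormation types `AWS::CE::AnomalyMonitor` / `AWS::CE::AnomalySubscription`, and the CDK classes `CfnAnomalyMonitor` / `CfnAnomalySubscription`. The function name `find_anomaly_iac` and the set of file extensions searched are assumptions; adjust them to the repository's layout.

```bash
#!/usr/bin/env bash
# Lists files that declare AWS Cost Anomaly Detection resources in Terraform,
# CloudFormation, or CDK. The file extensions searched are an assumption.
# Options come before the pattern so BSD grep (macOS) parses them too.
find_anomaly_iac() {
  local root=${1:-.}
  grep -rlE \
    --include='*.tf' --include='*.yaml' --include='*.yml' \
    --include='*.json' --include='*.ts' --include='*.py' \
    -e 'aws_ce_anomaly_(monitor|subscription)|AWS::CE::Anomaly(Monitor|Subscription)|CfnAnomaly(Monitor|Subscription)' \
    "$root" 2>/dev/null
}

# Example usage from the repository root:
# find_anomaly_iac . || echo "No anomaly-detection resources found in IaC"
```

A match only shows that the resource is defined, not that it has been applied, so confirm with `aws ce get-anomaly-monitors` where possible.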
Ask user:
- "Do you have anomaly detection beyond fixed thresholds?"
- "Would you catch a sudden 3x spike mid-month before hitting budget?"
- "Any third-party cost monitoring tools?" (Vantage, CloudHealth, Kubecost)
- "Which tools have usage-based pricing that could spike?" (AI APIs, SMS, bandwidth)
Common usage-based services to monitor:
| Service Type | Examples | Spike Risk |
|---|---|---|
| AI/ML APIs | OpenAI, Anthropic, Amazon Bedrock | High - per-token billing |
| Communications | Twilio, SendGrid | High - per-message billing |
| CDN/Bandwidth | CloudFront, Cloudflare (paid tier) | Medium - traffic spikes |
| Serverless | Lambda, Cloud Functions | Medium - invocation count and duration |
| Database | Aurora Serverless, Firestore | Medium - capacity scaling (Aurora), read/write ops (Firestore) |
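Grepping source files finds SDK calls; a complementary check is to scan dependency manifests for the SDK packages behind the high-risk rows of this table. The mapping below is a hypothetical, non-exhaustive starting point (npm: `openai`, `@anthropic-ai/sdk`, `@aws-sdk/client-bedrock-runtime`, `twilio`, `@sendgrid/mail`; PyPI: `openai`, `anthropic`, `twilio`, `sendgrid`), and `scan_usage_based_deps` is an illustrative name.

```bash
#!/usr/bin/env bash
# Reports usage-based SDKs found in common dependency manifests.
# The package -> category list is a hypothetical, non-exhaustive starting point.
scan_usage_based_deps() {
  local root=${1:-.}
  local manifests=() f
  for f in package.json requirements.txt pyproject.toml; do
    if [[ -f "$root/$f" ]]; then
      manifests+=("$root/$f")
    fi
  done
  (( ${#manifests[@]} > 0 )) || return 0

  local entry pkg category regex
  for entry in \
    'openai:AI/ML APIs' \
    '@anthropic-ai/sdk:AI/ML APIs' \
    'anthropic:AI/ML APIs' \
    '@aws-sdk/client-bedrock-runtime:AI/ML APIs' \
    'twilio:Communications' \
    '@sendgrid/mail:Communications' \
    'sendgrid:Communications'; do
    pkg=${entry%%:*}
    category=${entry#*:}
    # Require a quote, whitespace, or version specifier around the name so that
    # "anthropic" does not match inside "@anthropic-ai/sdk".
    regex='(^|["[:space:]])'"$pkg"'(["[:space:]=<>~!;]|$)'
    if grep -qE -e "$regex" "${manifests[@]}"; then
      echo "$category: $pkg"
    fi
  done
}

# Example usage from the repository root:
# scan_usage_based_deps .
```

Each hit maps back to a row of the table, which gives a concrete list of services to check for budgets or alerts.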
Pass criteria:
- Anomaly detection enabled for cloud (native or third-party)
- Awareness of usage-based tools with spike potential
- Monitoring in place for high-risk usage-based services
- Would catch unexpected spikes before the invoice arrives
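Where the first criterion fails on AWS, one remediation path is a per-service monitor plus a daily email subscription. This sketch assumes AWS CLI credentials with Cost Explorer permissions; the monitor name, subscription name, email address, and the $100 total-impact threshold are placeholders. `build_anomaly_subscription` is a hypothetical helper that only prints the JSON, so the document can be reviewed before anything is created.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints an AnomalySubscription JSON document for
# `aws ce create-anomaly-subscription`. Arguments are inserted as-is (no JSON
# escaping), so keep them to plain names, ARNs, and email addresses.
build_anomaly_subscription() {
  local name=$1 monitor_arn=$2 email=$3 min_impact_usd=$4
  cat <<EOF
{
  "SubscriptionName": "$name",
  "MonitorArnList": ["$monitor_arn"],
  "Subscribers": [{"Type": "EMAIL", "Address": "$email"}],
  "Frequency": "DAILY",
  "ThresholdExpression": {
    "Dimensions": {
      "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
      "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
      "Values": ["$min_impact_usd"]
    }
  }
}
EOF
}

# Example usage (requires AWS credentials; names and address are placeholders):
# monitor_arn=$(aws ce create-anomaly-monitor \
#   --anomaly-monitor '{"MonitorName":"per-service","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}' \
#   --query MonitorArn --output text)
# aws ce create-anomaly-subscription \
#   --anomaly-subscription "$(build_anomaly_subscription cost-spikes "$monitor_arn" finops@example.com 100)"
```

Email subscribers receive daily or weekly digests; immediate per-anomaly alerts are delivered through an SNS topic subscriber instead. An account may already have an AWS-services monitor, in which case reusing its ARN from `aws ce get-anomaly-monitors` avoids creating a duplicate.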
Fail criteria:
- Only fixed threshold alerts
- "We'd notice when the bill comes"
- Usage-based tools with no monitoring or alerts
- No awareness of which services have variable pricing
Cross-reference with:
- COST-001 (anomaly detection complements threshold alerts)
- COST-003 (usage-based tools should have budget awareness)
Evidence to capture:
- Anomaly detection tools in use (AWS native, third-party)
- Usage-based services identified
- Monitoring status for each high-risk service