COST-002 · Recommended · Anomaly Detection

Anomaly detection and unexpected cost alerts

Fixed thresholds miss mid-month spikes. Anomaly detection catches unusual spending patterns, and usage-based services, whose costs scale with traffic or API calls, need dedicated monitoring.

Question to ask

"What if costs tripled overnight — when would you know?"

Pass criteria

  • Anomaly detection enabled
  • Usage-based services identified
  • High-risk services monitored

Fail criteria

  • Only fixed thresholds
  • Unaware of usage-based pricing risks
  • No monitoring for variable-cost services

Verification guide

Severity: Recommended

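Fixed thresholds fire only once cumulative spend crosses a preset amount, so a spike that starts mid-month can run for days before any alert triggers. Anomaly detection compares spend against a recent baseline and flags unusual patterns regardless of absolute amounts.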
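To make the difference concrete, here is a minimal sketch that runs both kinds of check over the same daily spend series. The helper names (`flag_anomalies`, `first_threshold_day`), the mean-plus-3-standard-deviations rule, and all figures are illustrative assumptions; AWS Cost Anomaly Detection and third-party tools use their own models.

```shell
#!/usr/bin/env bash
# Illustrative only: compare a simple baseline-deviation rule with a fixed
# cumulative threshold. Helper names and numbers are hypothetical.

# flag_anomalies "COSTS" K
# Prints "day=N cost=C" for each day whose cost exceeds the mean plus K
# standard deviations of all earlier days (requires at least 3 earlier days).
flag_anomalies() {
  echo "$1" | tr ' ' '\n' | awk -v k="$2" '
    {
      n = NR - 1                          # number of earlier days
      if (n >= 3) {
        mean = sum / n
        var = sumsq / n - mean * mean
        sd = (var > 0) ? sqrt(var) : 0    # guard against rounding below zero
        if ($1 > mean + k * sd) printf "day=%d cost=%s\n", NR, $1
      }
      sum += $1; sumsq += $1 * $1         # add today to the baseline afterwards
    }'
}

# first_threshold_day "COSTS" LIMIT
# Prints the first day on which month-to-date spend reaches LIMIT, or "none".
first_threshold_day() {
  echo "$1" | tr ' ' '\n' | awk -v limit="$2" '
    { total += $1; if (total >= limit && !hit) { print NR; hit = 1 } }
    END { if (!hit) print "none" }'
}
```

With a steady baseline near $100/day, `flag_anomalies "100 102 98 101 99 100 300 300" 3` flags day 7, while `first_threshold_day` with a $3,000 limit still reports `none` (month-to-date spend is only $1,200 after day 8). This naive baseline also absorbs the spike, so day 8 is not flagged again; a production detector would need to handle that.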

Check automatically:

```shell
# AWS: list Cost Anomaly Detection monitors
aws ce get-anomaly-monitors

# AWS: list anomaly subscriptions (who gets notified)
aws ce get-anomaly-subscriptions

# Look for third-party cost tools
grep -riE "vantage|cloudhealth|kubecost|finops|cost.*monitor" docs/ README.md package.json --include="*.md" --include="*.json" 2>/dev/null

# Check for usage-based services that could spike
grep -riE "openai|anthropic|bedrock|twilio|sendgrid|stripe.*metered|bandwidth|cdn" src/ app/ --include="*.ts" --include="*.js" --include="*.py" 2>/dev/null | head -20
```
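The source grep looks for SDK usage in code; dependency manifests are often a more reliable signal that a usage-billed vendor is in play. A minimal sketch, assuming npm (`package.json`) and pip (`requirements*.txt`) manifests; `scan_manifests` is a hypothetical helper, and the package list (`openai`, `@anthropic-ai/sdk`, `anthropic`, `twilio`, `@sendgrid/mail`, `sendgrid`) is an assumed, incomplete selection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list manifest lines that declare SDKs for
# usage-billed vendors. The package names are an assumed, incomplete list.

# scan_manifests DIR
# Prints "file:line:content" for matching dependencies in package.json and
# requirements*.txt files under DIR, skipping node_modules directories.
scan_manifests() {
  local dir="$1"
  # npm form:  "name": "version"    pip form: name==1.2, name>=1, or bare name
  local pattern='"(openai|@anthropic-ai/sdk|twilio|@sendgrid/mail)"[[:space:]]*:|^(openai|anthropic|twilio|sendgrid)[[:space:]]*([=<>~!]|$)'
  find "$dir" -path '*/node_modules' -prune -o \
    \( -name package.json -o -name 'requirements*.txt' \) -type f -print |
    while IFS= read -r file; do
      grep -HnE "$pattern" "$file"
    done
}
```

Run it as `scan_manifests .` from the repository root and record each hit as a usage-based service to review.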

Ask user:

  • "Do you have anomaly detection beyond fixed thresholds?"
  • "Would you catch a sudden 3x spike mid-month before hitting budget?"
  • "Any third-party cost monitoring tools?" (Vantage, CloudHealth, Kubecost)
  • "Which tools have usage-based pricing that could spike?" (AI APIs, SMS, bandwidth)
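The 3x question can be answered with simple arithmetic. A minimal sketch, assuming a steady $100/day baseline, a fixed alert at 120% of a normal 30-day month ($3,600), and spend tripling from day 10; `alert_day` is a hypothetical helper and all figures are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative only: on which day does a fixed month-to-date alert fire
# when daily spend jumps by MULTIPLIER from SPIKE_DAY onward?

# alert_day BASELINE SPIKE_DAY MULTIPLIER LIMIT [DAYS_IN_MONTH]
# Prints the first day cumulative spend reaches LIMIT, or "none".
alert_day() {
  awk -v base="$1" -v spike="$2" -v mult="$3" -v limit="$4" -v days="${5:-30}" '
    BEGIN {
      for (d = 1; d <= days; d++) {
        total += (d >= spike) ? base * mult : base   # spend for day d
        if (total >= limit) { print d; exit }
      }
      print "none"
    }'
}
```

Under those assumptions `alert_day 100 10 3 3600` prints `18`: the fixed alert fires only after nine days at triple the usual rate (about $1,800 of extra spend), while `alert_day 100 10 1 3600` prints `none` for a normal month. A baseline-deviation check could flag day 10 itself once that day's cost data is available.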

Common usage-based services to monitor:

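| Service type   | Examples                           | Spike risk (cost driver)               |
|----------------|------------------------------------|----------------------------------------|
| AI/ML APIs     | OpenAI, Anthropic, AWS Bedrock     | High (token usage)                     |
| Communications | Twilio, SendGrid                   | High (per message)                     |
| CDN/Bandwidth  | CloudFront, Cloudflare (paid tier) | Medium (traffic spikes)                |
| Serverless     | Lambda, Cloud Functions            | Medium (invocation count and duration) |
| Database       | Aurora Serverless, Firestore       | Medium (read/write operations)         |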
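When recording evidence, it can help to tag each detected vendor with the table's spike-risk tier. A minimal sketch; `spike_risk` is a hypothetical helper, and its keyword matching is an assumption that will return `Unknown` for any vendor not listed in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a vendor or service name to the spike-risk tier
# from the usage-based services table. Keyword matching is an assumption.

spike_risk() {
  # Lowercase the input so "AWS Lambda" and "aws lambda" match alike
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *openai*|*anthropic*|*bedrock*|*twilio*|*sendgrid*)
      echo "High" ;;
    *cloudfront*|*cloudflare*|*lambda*|*cloud?functions*|*cloudfunctions*|*aurora*|*firestore*)
      echo "Medium" ;;
    *)
      echo "Unknown" ;;   # not in the table: review manually
  esac
}
```

For example, `spike_risk "Twilio"` prints `High` and `spike_risk "AWS Lambda"` prints `Medium`, which gives a quick risk column for the evidence record.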

Pass criteria:

  • Anomaly detection enabled for cloud (native or third-party)
  • Awareness of usage-based tools with spike potential
  • Monitoring in place for high-risk usage-based services
  • Would catch unexpected spikes before the invoice arrives

Fail criteria:

  • Only fixed threshold alerts
  • "We'd notice when the bill comes"
  • Usage-based tools with no monitoring or alerts
  • No awareness of which services have variable pricing

Cross-reference with:

  • COST-001 (anomaly detection complements threshold alerts)
  • COST-003 (usage-based tools should have budget awareness)

Evidence to capture:

  • Anomaly detection tools in use (AWS native, third-party)
  • Usage-based services identified
  • Monitoring status for each high-risk service

Section

38. Cost Monitoring & Budget Alerts
