MON-002 critical general

Database performance monitoring

Slow query logging enabled with reasonable threshold. Query analysis tooling available (pg_stat_statements, Performance Schema, etc.). Dashboard or report exists showing slow queries, OR audit record shows regular review.

Question to ask

"When did you last look at your slow query log?"

Verification guide

Severity: Critical

Check automatically:

  1. Verify slow query logging is enabled:

    PostgreSQL:

    -- Check slow query threshold
    SHOW log_min_duration_statement;
    -- Should return a value (e.g., 1000 for 1 second), not -1 (disabled)
    
    -- Check if pg_stat_statements is enabled
    SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';
    
    -- Verify it's collecting data
    SELECT count(*) FROM pg_stat_statements;
    

    MySQL:

    -- Check slow query log status
    SHOW VARIABLES LIKE 'slow_query_log';
    -- Should be ON
    
    SHOW VARIABLES LIKE 'long_query_time';
    -- Should be set (e.g., 1 or 2 seconds)
    
    -- Check Performance Schema
    SHOW VARIABLES LIKE 'performance_schema';
    -- Should be ON
    

    Cloud SQL (GCP):

    # Check query insights enabled
    gcloud sql instances describe INSTANCE_NAME --format='json' | jq '.settings.insightsConfig'
    

    RDS (AWS):

    # Check Performance Insights enabled
    aws rds describe-db-instances --db-instance-identifier INSTANCE_ID --query 'DBInstances[].PerformanceInsightsEnabled'
    
    # Check parameter group for slow query settings
    aws rds describe-db-parameters --db-parameter-group-name PARAM_GROUP --query 'Parameters[?ParameterName==`slow_query_log`]'
    
  2. Check for query analysis tooling:

    # Look for query analysis tools in codebase
    grep -riE "pg_stat_statements|performance_schema|query.*(insight|analysis|monitor)" . --include="*.yml" --include="*.yaml" --include="*.json" --include="*.tf" 2>/dev/null
    
    # Check for APM with DB tracing
    grep -riE "datadog.*trace|newrelic.*transaction|apm.*database" . --include="*.yml" --include="*.yaml" --include="*.json" 2>/dev/null
    
  3. Verify dashboard or audit record exists:

    # Look for references to slow query dashboards/reports
    grep -riE "slow.*(query|queries).*dashboard|query.*performance.*report|database.*performance" . --include="*.md" 2>/dev/null
    

Ask user for evidence: "Slow query logging must have a dashboard/report OR audit record showing regular review.

Please provide ONE of:

  1. Dashboard: Screenshot of slow query dashboard (Datadog, CloudWatch, GCP Query Insights, pgAdmin, etc.)
  2. Report: Recent slow query report or analysis document
  3. Audit record: Evidence of recent slow query review (ticket, meeting notes, commit fixing slow query)

When was the last time slow queries were reviewed and acted upon?"

Cross-reference with:

  • DB-001 (Connection pooling) - slow queries exhaust connection pools
  • MON-001 (Infrastructure metrics) - DB CPU metrics may indicate query issues

Pass criteria:

  • Slow query logging is enabled with reasonable threshold
  • Query analysis tool/extension available (pg_stat_statements, Performance Schema, etc.)
  • Dashboard exists showing slow queries, OR
  • Audit record shows regular review of slow query data
  • Evidence of queries being optimized based on findings

Fail criteria:

  • Slow query logging disabled
  • No query analysis capability
  • Logging enabled but no dashboard/report
  • No evidence anyone reviews slow query data
  • Last review was > 3 months ago

Evidence to capture:

  • Slow query threshold setting
  • Query analysis tool in use
  • Dashboard screenshot OR audit record
  • Date of last slow query review
  • Examples of queries optimized (if available)

Section

12. Monitoring

Observability