ANA-003 recommended Data Pipeline

GitHub data flows to warehouse

GitHub data (PRs, commits, contributors) flows into the data warehouse, where it feeds engineering metrics and can be correlated with product analytics.

Question to ask

"What's your team's actual PR cycle time this quarter?"

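As a rough illustration of what answering this question involves, the sketch below computes a per-PR cycle time in hours from created and merged timestamps. The function name `pr_cycle_hours` and the tab-separated input layout are hypothetical, "cycle time" is assumed here to mean the time from PR creation to merge, and the timestamp parsing relies on GNU `date -d`, so it would need adjusting on macOS/BSD.

```shell
# Sketch only: pr_cycle_hours is a hypothetical helper, not part of any tool.
# Reads "number<TAB>created_at<TAB>merged_at" lines on stdin (ISO 8601 timestamps)
# and prints "number<TAB>hours" for every merged PR, in whole hours rounded down.
pr_cycle_hours() {
  local tab number created merged start end
  tab=$(printf '\t')
  while IFS="$tab" read -r number created merged; do
    [ -n "$merged" ] || continue                      # skip PRs that were never merged
    # Assumption: GNU date, which accepts ISO 8601 input via -d.
    start=$(date -u -d "$created" +%s 2>/dev/null) || continue
    end=$(date -u -d "$merged" +%s 2>/dev/null) || continue
    printf '%s\t%d\n' "$number" $(( (end - start) / 3600 ))
  done
}
```

One possible way to produce the input, assuming the GitHub CLI is installed and authenticated, is `gh pr list --state merged --limit 200 --json number,createdAt,mergedAt --jq '.[] | [.number, .createdAt, .mergedAt] | @tsv' | pr_cycle_hours`. Once the data is in a warehouse, the same calculation would normally be a SQL query over the PR table instead.
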
What to check

  • Look for GitHub API usage for data extraction
  • Check for webhooks feeding GitHub events to warehouse
  • Look for GitHub tables in Dataform/dbt models
  • Check for engineering analytics tools (LinearB, Sleuth, Swarmia)

Verification guide

Severity: Optional

With GitHub data (PRs, commits, contributors) in the warehouse, engineering activity can be correlated with product analytics and queried for custom engineering reports.

Check automatically:

  1. Look for GitHub data extraction:
# GitHub API usage for analytics
grep -riE "octokit|github.*api" --include="*.ts" --include="*.js" --include="*.py" . 2>/dev/null | grep -v node_modules | head -10

# Scripts that extract PR/commit data
grep -riE "pulls|commits|contributors|pull_request" --include="*.ts" --include="*.js" --include="*.py" scripts/ jobs/ cron/ 2>/dev/null | head -10

  2. Check for GitHub webhooks feeding data:
# Webhook handlers for GitHub events
grep -riE "webhook.*github|github.*webhook|pull_request.*event|push.*event" --include="*.ts" --include="*.js" --include="*.py" . 2>/dev/null | grep -v node_modules | head -10

  3. Check for GitHub data in pipeline configs:
# GitHub in ETL/Dataform/dbt configs (directories are searched recursively)
grep -riE "github" dataform.json definitions/ models/ 2>/dev/null | head -5
grep -riE "github" fivetran/ airbyte/ 2>/dev/null | head -5

  4. Check for engineering analytics tools:
# Dedicated tools that sync GitHub data
grep -riE "linearb|sleuth|swarmia|gitprime|pluralsight.*flow|faros" . 2>/dev/null | grep -v node_modules | head -5
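
The four checks above can be bundled into one pass that prints a found / not found line per check. This is a minimal sketch: the function names `ana003_check` and `ana003_scan` and the check labels are hypothetical, the patterns are trimmed versions of the ones above, and the file filters for the pipeline-config check (`*.sqlx`, `*.sql`, `dataform.json`) are assumptions about where Dataform and dbt definitions live.

```shell
# Sketch only: function names and labels are hypothetical.

# ana003_check LABEL PATTERN DIR [extra grep options...]
# Prints "LABEL: found" if any file under DIR matches PATTERN, else "LABEL: not found".
ana003_check() {
  local label="$1" pattern="$2" dir="$3"
  shift 3
  if grep -rqiE --exclude-dir=node_modules --exclude-dir=.git "$@" -e "$pattern" "$dir" 2>/dev/null; then
    echo "$label: found"
  else
    echo "$label: not found"
  fi
}

# ana003_scan [ROOT]
# Runs all four checks against ROOT (default: current directory).
ana003_scan() {
  local root="${1:-.}"
  ana003_check "api-extraction" "octokit|github.*api" "$root"
  ana003_check "webhooks" "webhook.*github|github.*webhook" "$root"
  # Assumption: pipeline definitions are .sqlx/.sql files or dataform.json.
  ana003_check "pipeline-config" "github" "$root" \
    --include="*.sqlx" --include="*.sql" --include="dataform.json"
  ana003_check "analytics-tools" "linearb|sleuth|swarmia|gitprime|pluralsight.*flow|faros" "$root"
}
```

Running `ana003_scan /path/to/repo` gives a quick summary. A "found" line only means a pattern matched somewhere, so the matching files still need a manual look before recording evidence.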

If not found in code, ask user:

  • "Do you track GitHub activity (PRs, commits) in your data warehouse?"
  • "Can you correlate engineering activity with product metrics?"
  • "How do you measure engineering velocity?"

Cross-reference with:

  • ANA-002 (requires warehouse to exist)
  • ANA-005 (GitHub data enables git analytics)

Pass criteria:

  • GitHub data (PRs, commits, contributors) flows to warehouse
  • Can query engineering metrics alongside product analytics
  • Automated sync (webhook or scheduled job)
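
To make the "automated sync" criterion concrete, here is a minimal sketch of the scheduled-job variant: a function that pulls recent pull requests from the GitHub REST API and writes them to a file for a later warehouse load. The function name `sync_github_prs`, the output path, and the cron schedule are hypothetical. It fetches only the first page of 100 results and leaves out the warehouse load step, since that depends on the warehouse in use.

```shell
# Sketch only: sync_github_prs is a hypothetical helper.
# sync_github_prs OWNER REPO OUT_FILE
# Fetches the 100 most recently updated PRs and writes the raw JSON to OUT_FILE.
# Assumption: GITHUB_TOKEN holds a token with read access to the repository.
sync_github_prs() {
  local owner="$1" repo="$2" out="$3"
  # Write to a temp file first so a failed request never overwrites the last good export.
  curl -fsSL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN:-}" \
    "https://api.github.com/repos/${owner}/${repo}/pulls?state=all&sort=updated&direction=desc&per_page=100" \
    > "${out}.tmp" && mv "${out}.tmp" "$out"
}

# Example crontab entry (hypothetical script path), running hourly:
# 0 * * * * /opt/jobs/sync_github_prs.sh
```

A production job would also need pagination, incremental cursors, and rate-limit handling, which is one reason teams often use a managed connector or webhooks instead.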

Fail criteria:

  • No GitHub data in warehouse
  • Only using GitHub's built-in insights (can't write custom queries)

Partial (acceptable):

  • Using a dedicated tool (LinearB, Sleuth) instead of warehouse - still provides visibility
  • Only PR data synced, not commits - note limitation
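
The pass, fail, and partial criteria can be read as a small decision rule. The sketch below is one possible reading, not an official scoring script: the function name `ana003_verdict` and its yes/no arguments are hypothetical, and treating "data in the warehouse but no automated sync" as partial is an assumption, since the criteria above do not cover that case.

```shell
# Sketch only: ana003_verdict is a hypothetical helper.
# ana003_verdict IN_WAREHOUSE AUTOMATED_SYNC DEDICATED_TOOL HAS_PRS HAS_COMMITS
# Each argument is "yes" or "no". Prints pass, partial, or fail.
ana003_verdict() {
  local in_warehouse="$1" automated="$2" tool="$3" prs="$4" commits="$5"
  if [ "$in_warehouse" = yes ]; then
    if [ "$automated" = yes ] && [ "$prs" = yes ] && [ "$commits" = yes ]; then
      echo pass
    else
      # Data is in the warehouse but incomplete (e.g. PRs only) or, by assumption, synced manually.
      echo partial
    fi
  elif [ "$tool" = yes ]; then
    # A dedicated tool (LinearB, Sleuth) gives visibility without warehouse data.
    echo partial
  else
    echo fail
  fi
}
```

For example, `ana003_verdict yes yes no yes no` prints `partial`, matching the "only PR data synced" case.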

Evidence to capture:

  • Sync method (webhook, API polling, managed connector)
  • Data points captured (PRs, commits, reviews, etc.)
  • Where data lands (warehouse table names or external tool)

Section

18. Analytics

Performance & Analytics