Analyze GitHub repositories to extract insights about commit frequency, outstanding contributors, release timeline, and project health metrics...
This skill guides analysis of GitHub repositories to extract meaningful insights about development activity, contributor patterns, and release cycles.
Use GitHub's REST or GraphQL API for efficient data retrieval:
import requests
from datetime import datetime

def analyze_commits(owner, repo, token=None):
    """Fetch every commit of a repository, following pagination."""
    headers = {'Authorization': f'token {token}'} if token else {}
    url = f'https://api.github.com/repos/{owner}/{repo}/commits'
    all_commits = []
    page = 1
    while True:
        response = requests.get(url, headers=headers,
                                params={'page': page, 'per_page': 100}, timeout=30)
        # An error response is a JSON object, not a list of commits, so fail early.
        response.raise_for_status()
        commits = response.json()
        if not commits:
            break
        all_commits.extend(commits)
        page += 1
    return all_commits

def analyze_contributors(commits):
    """Rank authors by commit count, most active first."""
    contributor_stats = {}
    for commit in commits:
        author = commit['commit']['author']['name']
        contributor_stats[author] = contributor_stats.get(author, 0) + 1
    return sorted(contributor_stats.items(), key=lambda x: x[1], reverse=True)

def analyze_releases(owner, repo, token=None):
    """Fetch the first page of releases (up to 100)."""
    headers = {'Authorization': f'token {token}'} if token else {}
    url = f'https://api.github.com/repos/{owner}/{repo}/releases'
    response = requests.get(url, headers=headers, params={'per_page': 100}, timeout=30)
    response.raise_for_status()
    return response.json()
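The pagination loop above issues one request per 100 commits, so on larger repositories it can run into GitHub's rate limits. The following is a minimal sketch of retrying with exponential backoff. It assumes a hypothetical `send` callable that returns a `requests`-style response exposing `.status_code` and `.headers`. The helper names `wait_seconds` and `get_with_backoff` are illustrative and not part of any library. It reads the `Retry-After`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` response headers.

```python
import time


def wait_seconds(headers, attempt, now=None, base=1.0, cap=60.0):
    """Return how many seconds to wait before retrying a limited request."""
    now = time.time() if now is None else now
    # An explicit Retry-After value (in seconds) takes priority.
    if 'Retry-After' in headers:
        return float(headers['Retry-After'])
    # Quota exhausted: wait until the reset time (epoch seconds).
    if headers.get('X-RateLimit-Remaining') == '0' and 'X-RateLimit-Reset' in headers:
        return max(0.0, float(headers['X-RateLimit-Reset']) - now)
    # Otherwise use capped exponential backoff: base * 2^attempt.
    return min(cap, base * (2 ** attempt))


def get_with_backoff(send, max_retries=5, sleep=time.sleep):
    """Call send() until it stops returning 403/429 or retries run out.

    Assumption: 403 and 429 are treated as rate-limit signals. A 403 caused
    by missing permissions will simply be retried and then returned.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in (403, 429):
            return response
        if attempt == max_retries:
            break
        sleep(wait_seconds(response.headers, attempt))
    return response
```

Inside `analyze_commits`, the call could then look like `get_with_backoff(lambda: requests.get(url, headers=headers, params=params))`. Passing the request as a callable keeps the retry logic independent of the HTTP library and easy to exercise with fake responses.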
Benefits:
Use git commands for detailed analysis when the repository is already cloned:
# Get commit history with timestamps
git log --pretty=format:"%h|%an|%ae|%ad|%s" --date=iso > commits.txt
# Count commits by author
git shortlog -sn --all
# Get all tags/releases
git tag -l --sort=-version:refname
# Commit frequency by week (year and week number, weeks starting on Monday)
git log --pretty=format:"%ad" --date=format:"%Y-W%W" | sort | uniq -c
# Commits per month
git log --pretty=format:"%ad" --date=format:"%Y-%m" | sort | uniq -c
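The `commits.txt` file written by the first command can be post-processed in Python. The following is a minimal sketch, assuming the `%h|%an|%ae|%ad|%s` format with `--date=iso` shown above. The function names `parse_git_log` and `commits_per_week` are hypothetical. It assumes author names and emails contain no `|`; the subject is split off last because it may contain one.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_git_log(lines):
    """Parse lines in the 'hash|author|email|date|subject' format."""
    commits = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # maxsplit=4 keeps any '|' inside the subject intact.
        sha, name, email, date, subject = line.split('|', 4)
        commits.append({
            'sha': sha,
            'name': name,
            'email': email,
            # --date=iso output looks like '2024-01-15 10:30:00 +0100'.
            'date': datetime.strptime(date, '%Y-%m-%d %H:%M:%S %z'),
            'subject': subject,
        })
    return commits


def commits_per_week(commits):
    """Count commits per ISO week in UTC, keyed like '2024-W03'."""
    counts = Counter()
    for commit in commits:
        year, week, _ = commit['date'].astimezone(timezone.utc).isocalendar()
        counts[f'{year}-W{week:02d}'] += 1
    return dict(sorted(counts.items()))
```

A typical call would be `parse_git_log(open('commits.txt', encoding='utf-8'))` followed by `commits_per_week(...)`. Converting to UTC before bucketing means a commit made late on Sunday in a western timezone may be counted in the following week.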
Combine both methods for comprehensive analysis:
1. Repository Identification
2. Data Collection
3. Data Processing
4. Insight Generation
5. Visualization & Reporting
Repository: owner/repo
Analysis Period: YYYY-MM-DD to YYYY-MM-DD
Commit Activity:
- Total Commits: N
- Active Contributors: N
- Average Commits/Week: N
- Most Active Period: YYYY-MM
Top Contributors:
1. Name (N commits, X%)
2. Name (N commits, X%)
...
Recent Releases:
- v1.2.3 (YYYY-MM-DD) - N days since previous
- v1.2.2 (YYYY-MM-DD) - N days since previous
...
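The contributor percentages and release intervals in this template can be derived from the collected data. The following is a minimal sketch, assuming ranked `(name, commits)` pairs as returned by `analyze_contributors` and release dictionaries carrying the `tag_name` and `published_at` fields of the GitHub releases API. The helper names `contributor_shares` and `release_gaps` are hypothetical.

```python
from datetime import datetime


def contributor_shares(ranked, top=5):
    """Turn [(name, commits), ...] into (name, commits, percent) rows."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return []
    return [(name, count, round(100 * count / total, 1))
            for name, count in ranked[:top]]


def release_gaps(releases):
    """Return (tag, date, days since previous release), newest first."""
    dated = []
    for release in releases:
        published = release.get('published_at')
        if not published:  # skip entries without a publication date
            continue
        # GitHub timestamps look like '2024-03-01T12:00:00Z'.
        when = datetime.strptime(published, '%Y-%m-%dT%H:%M:%SZ')
        dated.append((release['tag_name'], when))
    dated.sort(key=lambda item: item[1])  # oldest first
    rows = []
    for i, (tag, when) in enumerate(dated):
        # The oldest release has no predecessor, so its gap is None.
        gap = (when - dated[i - 1][1]).days if i > 0 else None
        rows.append((tag, when.date().isoformat(), gap))
    return rows[::-1]
```

Percentages are computed against all commits in `ranked`, not only the rows that are displayed, so the shares of a truncated list need not sum to 100.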
Handle rate limiting: Check the API rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset) on every response and back off exponentially when the limit is reached
Large repositories: For repos with 10k+ commits, consider:
Privacy considerations: Without authentication the GitHub API exposes public data only; private repos require a token with access to them
Timezone handling: Normalize all timestamps to UTC for consistent analysis
Bot commits: Filter out automated commits (dependabot, renovate) for human contributor analysis
Email normalization: Same contributor may use different email addresses; consider consolidation
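The timezone, bot, and email practices above can be sketched as small helpers. These are illustrative only: the bot name patterns, the handling of GitHub `noreply` addresses, and all function names are assumptions and should be adapted to the repository being analyzed. Commits are assumed to be dictionaries with `name` and `email` keys.

```python
import re
from datetime import datetime, timezone

# Assumed patterns; extend them for the bots active in the repository.
BOT_PATTERN = re.compile(r'(\[bot\]$|^dependabot|^renovate)', re.IGNORECASE)


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00Z' to UTC."""
    # Older Python versions reject a trailing 'Z' in fromisoformat.
    parsed = datetime.fromisoformat(timestamp.replace('Z', '+00:00'))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def is_bot(author_name):
    """Heuristically detect automated authors such as 'dependabot[bot]'."""
    return bool(BOT_PATTERN.search(author_name))


def canonical_email(email):
    """Lowercase an email and drop the numeric prefix of GitHub noreply
    addresses (e.g. '12345+ada@users.noreply.github.com')."""
    email = email.strip().lower()
    local, _, domain = email.partition('@')
    if domain == 'users.noreply.github.com' and '+' in local:
        local = local.split('+', 1)[1]
    return f'{local}@{domain}'


def merge_by_email(commits):
    """Count commits per canonical email, skipping bot authors."""
    counts = {}
    for commit in commits:
        if is_bot(commit['name']):
            continue
        key = canonical_email(commit['email'])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

This only merges addresses that differ in case or in the noreply prefix. Contributors who use unrelated addresses need an explicit mapping, for which git's `.mailmap` file is the usual mechanism.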