Query and analyze recruitment data from the Greenhouse Harvest API v3 conversationally. Handle requests for candidate counts, application statuses, interview schedules, pipeline conversions, and other talent acquisition metrics.
First, install the required Python packages:
pip install -r requirements.txt
Or install manually:
pip install requests
The Greenhouse Harvest API v3 uses OAuth authentication with client credentials. Set the following environment variables:
export GREENHOUSE_CLIENT_ID="your-client-id"
export GREENHOUSE_CLIENT_SECRET="your-client-secret"
export GREENHOUSE_USER_ID="your-user-id" # Optional - defaults to service account
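How the provided client reads these variables is not shown in this document. As a minimal sketch, assuming a hypothetical helper named `load_credentials` (it is not part of `greenhouse_client`), the check could look like this:

```python
import os


def load_credentials(env=None):
    """Read OAuth settings from a mapping (defaults to os.environ).

    Hypothetical helper for illustration; the real client may differ.
    """
    env = os.environ if env is None else env
    client_id = env.get("GREENHOUSE_CLIENT_ID")
    client_secret = env.get("GREENHOUSE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise ValueError(
            "OAuth credentials not provided. Set GREENHOUSE_CLIENT_ID and "
            "GREENHOUSE_CLIENT_SECRET environment variables"
        )
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        # Optional: None here stands for "fall back to the service account"
        "user_id": env.get("GREENHOUSE_USER_ID"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.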
Getting OAuth credentials:
Use the provided Python client for all API interactions:
from greenhouse_client import GreenhouseClient
client = GreenhouseClient()
# List all open jobs
jobs = client.get('/v3/jobs', params={'status': 'open'})
# Get applications for a specific job
applications = client.get_all('/v3/applications', params={'job_id': 12345})
# Get scheduled interviews for this week
interviews = client.get_all('/v3/scheduled_interviews', params={
    'starts_after': '2024-01-15T00:00:00Z',
    'starts_before': '2024-01-22T00:00:00Z'
})
The client handles authentication, pagination (through get_all()), rate limiting, and error handling.
User request: "How many candidates have applied for software engineer this week?"
Steps:
job_id and created_after
See: references/common_queries.md → "Candidate Counting Queries" for complete examples
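As a rough illustration of the counting step, the sketch below works on application records that have already been fetched. It assumes each record carries `job_id` and `created_at` fields, as in the other examples in this document; `count_recent_applications` is a hypothetical helper, not part of the client.

```python
from datetime import datetime, timedelta, timezone


def count_recent_applications(applications, job_id, days=7, now=None):
    """Count applications for one job created within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    count = 0
    for app in applications:
        if app.get("job_id") != job_id:
            continue
        created = app.get("created_at")
        if not created:
            continue  # skip records without a creation timestamp
        created_at = datetime.fromisoformat(created.replace("Z", "+00:00"))
        if created_at >= cutoff:
            count += 1
    return count


# Made-up sample records, for illustration only
sample_apps = [
    {"job_id": 12345, "created_at": "2024-01-14T09:30:00Z"},
    {"job_id": 12345, "created_at": "2024-01-01T09:30:00Z"},
    {"job_id": 999, "created_at": "2024-01-14T09:30:00Z"},
]
fixed_now = datetime(2024, 1, 15, tzinfo=timezone.utc)
recent_count = count_recent_applications(sample_apps, 12345, now=fixed_now)
```

With real data, `applications` would be the list returned by `client.get_all('/v3/applications', ...)`.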
User request: "What status is Jane Smith in?"
Steps:
updated_after)
Note: Greenhouse lacks direct name search; filter locally or use date ranges to limit scope.
See: references/common_queries.md → "Candidate Status Lookup" for optimization strategies
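Because name search has to happen locally, a small matching helper can be useful. The sketch below assumes candidate records expose `first_name` and `last_name` fields; that field naming is an assumption to verify against the data model reference, and `find_candidates_by_name` is a hypothetical helper.

```python
def find_candidates_by_name(candidates, full_name):
    """Return candidates whose first and last name match, ignoring case and extra spaces."""
    wanted = " ".join(full_name.lower().split())
    matches = []
    for candidate in candidates:
        # Assumed field names: first_name and last_name
        first = candidate.get("first_name") or ""
        last = candidate.get("last_name") or ""
        name = " ".join(f"{first} {last}".lower().split())
        if name == wanted:
            matches.append(candidate)
    return matches


# Made-up sample records, for illustration only
sample_candidates = [
    {"id": 1, "first_name": "Jane", "last_name": "Smith"},
    {"id": 2, "first_name": "John", "last_name": "Smith"},
    {"id": 3, "first_name": None, "last_name": "Smith"},
]
jane_matches = find_candidates_by_name(sample_candidates, "jane  smith")
```

More than one match is possible, so a caller would normally ask the user to disambiguate before reporting a status.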
User request: "How many interviews are booked this week?"
Steps:
/v3/scheduled_interviews with starts_after and starts_before
See: references/common_queries.md → "Interview Counting Queries" for examples
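The `starts_after` and `starts_before` values for "this week" have to be computed somewhere. One possible sketch, assuming a week runs from Monday 00:00 UTC to the following Monday 00:00 UTC (`week_bounds` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone


def week_bounds(now=None):
    """Return (start, end) ISO 8601 strings for the current Monday-to-Monday week in UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    monday = now - timedelta(days=now.weekday())
    start = monday.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=7)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


# Fixed reference time (a Wednesday) so the result is reproducible
starts_after, starts_before = week_bounds(datetime(2024, 1, 17, 15, 30, tzinfo=timezone.utc))
print(starts_after, starts_before)  # 2024-01-15T00:00:00Z 2024-01-22T00:00:00Z
```

The two strings can then be passed as the `starts_after` and `starts_before` parameters, and the length of the returned list gives the interview count.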
User request: "What is the passthrough rate from application review to phone screen for the product designer role?"
Steps:
Note: Greenhouse tracks current stage, not full history. For more accurate data, use scorecards as a proxy for stage completion.
See: references/common_queries.md → "Pipeline Conversion Analysis" for detailed approaches
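Since only the current stage is available, one rough approximation is to treat an application as having "reached" a stage if its current stage sits at or after that stage in the job's stage order. The sketch below makes that assumption explicit; the stage names, the `stage_order` list, and the `passthrough_rate` helper are all hypothetical, and the result is an estimate rather than a true historical conversion rate.

```python
def passthrough_rate(applications, stage_order, from_stage, to_stage):
    """Estimate the share of applications that reached from_stage and also reached to_stage.

    Assumes current stage position is a usable proxy for stage history.
    Returns None when no application reached from_stage.
    """
    position = {name: index for index, name in enumerate(stage_order)}
    from_idx = position[from_stage]
    to_idx = position[to_stage]
    reached_from = 0
    reached_to = 0
    for app in applications:
        stage = app.get("current_stage") or {}
        idx = position.get(stage.get("name"))
        if idx is None:
            continue  # unknown or missing stage
        if idx >= from_idx:
            reached_from += 1
        if idx >= to_idx:
            reached_to += 1
    return reached_to / reached_from if reached_from else None


# Made-up sample data, for illustration only
stages = ["Application Review", "Phone Screen", "Onsite"]
sample_apps = [
    {"current_stage": {"name": "Application Review"}},
    {"current_stage": {"name": "Application Review"}},
    {"current_stage": {"name": "Phone Screen"}},
    {"current_stage": {"name": "Onsite"}},
    {"current_stage": None},
]
rate = passthrough_rate(sample_apps, stages, "Application Review", "Phone Screen")
print(rate)  # 0.5
```

Applications still sitting in the earlier stage count against the rate here, so a caller may prefer to exclude active applications before computing it.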
The skill supports many other query types:
See: references/common_queries.md for complete patterns and code examples
Use when: You need to understand what endpoints are available or what parameters/filters they support.
Contents:
Examples:
Use when: You need detailed field definitions or relationships between data models.
Contents:
Examples:
Use when: You're handling a conversational query and need a complete workflow pattern.
Contents:
Examples:
Some endpoints support query filters, while others require client-side filtering:
from datetime import datetime, timedelta, timezone
# Use a timezone-aware UTC cutoff so it can be compared with API timestamps
one_week_ago = datetime.now(timezone.utc) - timedelta(days=7)
# Endpoints that support date filters (candidates, jobs, etc.)
candidates = client.get_all('/v3/candidates',
                            params={'created_after': one_week_ago.strftime('%Y-%m-%dT%H:%M:%SZ')})
# Applications endpoint does NOT support filters - fetch all and filter client-side
all_apps = client.get_all('/v3/applications')
recent_apps = [app for app in all_apps
               if datetime.fromisoformat(app['created_at'].replace('Z', '+00:00')) > one_week_ago]
The Greenhouse API uses cursor-based pagination via Link headers. The client handles this automatically:
# Automatically fetches all pages using cursor pagination
all_jobs = client.get_all('/v3/jobs')
# Limit pages for safety during testing
recent = client.get_all('/v3/candidates', max_pages=5)
Note: The page parameter is not supported by the API. Use get_all() for pagination or manually follow Link headers.
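For the manual route, the next page URL has to be pulled out of the `Link` response header. The sketch below parses the common `<url>; rel="next"` form with the standard library; `next_page_url` is a hypothetical helper, the URL and `cursor` parameter in the sample are placeholders, and the parser only handles the simple case where `rel` directly follows the URL.

```python
import re

# Matches entries such as: <https://host/path?cursor=abc>; rel="next"
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="?([^";,]+)"?')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if there is none."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel.strip() == "next":
            return url
    return None


# Placeholder header value, for illustration only
sample_header = (
    '<https://api.example.invalid/v3/jobs?cursor=abc123>; rel="next", '
    '<https://api.example.invalid/v3/jobs>; rel="first"'
)
print(next_page_url(sample_header))  # https://api.example.invalid/v3/jobs?cursor=abc123
```

When using `requests` directly, the parsed header is also available as `response.links`, so `response.links.get('next', {}).get('url')` gives the same value without a custom parser.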
Jobs, users, departments, and offices change infrequently. Cache them:
# Fetch once
jobs_cache = {job['id']: job for job in client.get_all('/v3/jobs')}
# Reuse
job_name = jobs_cache[application['job_id']]['name']
Always check for null/missing fields:
stage_name = app['current_stage']['name'] if app.get('current_stage') else 'Unknown'
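When several nested fields need the same treatment, a small accessor keeps the checks in one place. This is a minimal sketch; `dig` is a hypothetical helper and the sample records are made up.

```python
def dig(record, *keys, default=None):
    """Walk nested dictionaries, returning `default` if any level is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


with_stage = {"current_stage": {"name": "Phone Screen"}}
without_stage = {"current_stage": None}

print(dig(with_stage, "current_stage", "name", default="Unknown"))     # Phone Screen
print(dig(without_stage, "current_stage", "name", default="Unknown"))  # Unknown
```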
All Greenhouse timestamps are UTC. Convert for user-facing reports if needed.
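A conversion could be sketched as follows, assuming timestamps arrive as ISO 8601 strings ending in `Z`, as in the examples in this document. `to_local` is a hypothetical helper, and the fixed UTC-5 offset is only an illustration.

```python
from datetime import datetime, timedelta, timezone


def to_local(timestamp, tz):
    """Parse a UTC ISO 8601 timestamp and convert it to the given tzinfo."""
    utc_dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return utc_dt.astimezone(tz)


# Fixed offset used for illustration; it does not follow daylight saving time
utc_minus_five = timezone(timedelta(hours=-5))
local = to_local("2024-01-15T00:00:00Z", utc_minus_five)
print(local.isoformat())  # 2024-01-14T19:00:00-05:00
```

For daylight-saving-aware output, a `zoneinfo.ZoneInfo` instance (Python 3.9+) can be passed as `tz` instead of a fixed offset.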
Understanding how objects relate:
Candidate
├─ has many Applications
│ ├─ belongs to Job
│ ├─ has current_stage (Job Stage)
│ └─ has many Scorecards
└─ has Recruiter (User)
Job
├─ has many Stages
├─ belongs to Department
├─ belongs to Office
└─ has Hiring Team (Users)
Scheduled Interview
├─ belongs to Application
├─ has many Interviewers (Users)
└─ uses Interview Kit
Scorecard
├─ belongs to Application
├─ belongs to Interviewer (User)
└─ has ratings and questions
ValueError: OAuth credentials not provided. Set GREENHOUSE_CLIENT_ID and GREENHOUSE_CLIENT_SECRET environment variables
Solution: Set the required environment variables:
export GREENHOUSE_CLIENT_ID="your-client-id"
export GREENHOUSE_CLIENT_SECRET="your-client-secret"
export GREENHOUSE_USER_ID="your-user-id" # Optional
If you receive 401 errors, verify:
The client handles rate limiting automatically with retries and exponential backoff. If you consistently hit limits:
max_pages parameter when testing
For large organizations, fetching all candidates/applications can be slow:
created_after or updated_after filters
max_pages to limit data volume during development
Greenhouse only tracks current stage, not full progression history:
Python API client with authentication, pagination, error handling, and rate limiting.
Import and use:
from greenhouse_client import GreenhouseClient
client = GreenhouseClient()
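The client's internals are not reproduced in this document. To illustrate what retrying with exponential backoff generally means, here is a minimal standalone sketch; `RateLimitError`, `call_with_backoff`, and the delay values are hypothetical and are not taken from `greenhouse_client`.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for a rate-limited (HTTP 429) response."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))


# Demonstration with a function that fails twice and then succeeds
attempts = {"count": 0}
recorded_delays = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


result = call_with_backoff(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the demonstration instant; in real use the default `time.sleep` would apply the delays.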
Comprehensive endpoint documentation organized by resource type.
Detailed data models and field definitions for all major objects.
Complete workflow patterns and code examples for conversational queries.