Check if medicinal chemistry papers are in ChEMBL database to access curated bioactivity data
ChEMBL is a manually curated database of ~99,000 medicinal chemistry papers with extracted, standardized bioactivity data. If a paper is in ChEMBL, you can access structured data without parsing PDFs.
Core principle: Check ChEMBL first for medicinal chemistry papers. Curated data is more reliable than table parsing.
Use this skill when:
When NOT to use:
Base URL: https://www.ebi.ac.uk/chembl/api/data/
No authentication required
CRITICAL: ChEMBL can ONLY be queried by DOI, NOT by PMID
Query format: ?doi=10.1234/example
Two-step process:
Query by DOI (ONLY method that works):
curl -s "https://www.ebi.ac.uk/chembl/api/data/document.json?doi=DOI"
⚠️ IMPORTANT: Must use DOI, not PMID
# ✅ CORRECT - Use DOI
doi="10.1021/jm401507s"
curl -s "https://www.ebi.ac.uk/chembl/api/data/document.json?doi=$doi"
# ❌ WRONG - PMID won't work (will return 0 results)
pmid="24446688"
curl -s "https://www.ebi.ac.uk/chembl/api/data/document.json?pubmed_id=$pmid" # Does NOT work!
If you only have PMID: Fetch DOI from PubMed first, then query ChEMBL with the DOI.
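One possible way to do the PMID-to-DOI lookup is NCBI's E-utilities `esummary` endpoint. The sketch below assumes the JSON response lists identifiers under `result[<pmid>]["articleids"]`, with the DOI marked by an `idtype` of `doi`. The helper names `extract_doi` and `pmid_to_doi` are illustrative and not part of the original workflow.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def extract_doi(esummary_payload, pmid):
    """Return the DOI listed for `pmid` in an esummary JSON payload, or None."""
    # Assumed payload shape: {"result": {"<pmid>": {"articleids": [...]}}}
    record = esummary_payload.get("result", {}).get(str(pmid), {})
    for article_id in record.get("articleids", []):
        if article_id.get("idtype") == "doi":
            return article_id.get("value")
    return None


def pmid_to_doi(pmid, timeout=10):
    """Fetch the esummary record for `pmid` and return its DOI, or None."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    with urllib.request.urlopen(f"{ESUMMARY_URL}?{query}", timeout=timeout) as resp:
        return extract_doi(json.load(resp), pmid)
```

A `None` result means PubMed lists no DOI for that record, in which case the ChEMBL check cannot be run for that paper.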
Response structure:
{
"documents": [
{
"document_chembl_id": "CHEMBL3120156",
"doi": "10.1021/jm401507s",
"title": "Discovery and development of simeprevir (TMC435), a HCV NS3/4A protease inhibitor.",
"abstract": "Hepatitis C virus is a blood-borne infection...",
"pubmed_id": 24446688,
"journal": "J Med Chem",
"year": 2014,
"doc_type": "PUBLICATION"
}
],
"page_meta": {
"total_count": 1
}
}
Key fields:
- document_chembl_id - Use this to retrieve activity data
- doc_type - "PUBLICATION" (from literature) or "DATASET" (deposited)
- pubmed_id - PMID is in the response, but cannot be used to query ChEMBL
- total_count - If 0, the paper is not in ChEMBL
Parse response:
response=$(curl -s "https://www.ebi.ac.uk/chembl/api/data/document.json?doi=$doi")
# Default to 0 so an empty or malformed response is treated as "not found"
total=$(echo "$response" | jq -r '.page_meta.total_count // 0')
if [ "${total:-0}" -gt 0 ]; then
  chembl_id=$(echo "$response" | jq -r '.documents[0].document_chembl_id')
  echo "✓ Found in ChEMBL: $chembl_id"
else
  echo "✗ Not in ChEMBL"
fi
Query activity endpoint:
curl -s "https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=CHEMBL3120156&limit=1"
Extract total count:
activity_url="https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=$chembl_id&limit=1"
activity_count=$(curl -s "$activity_url" | jq -r '.page_meta.total_count')
echo "→ $activity_count bioactivity data points"
Report immediately:
📄 [15/127] Screening: "Discovery and development of simeprevir"
Abstract score: 9 → Fetching full text...
✓ ChEMBL: CHEMBL3120156 (101 activity data points)
→ IC50 data for HCV NS3 protease inhibitors available
Add to SUMMARY.md:
### [Discovery and development of simeprevir (TMC435), a HCV NS3/4A protease inhibitor](https://doi.org/10.1021/jm401507s) (Score: 9)
**DOI:** [10.1021/jm401507s](https://doi.org/10.1021/jm401507s)
**PMID:** [24446688](https://pubmed.ncbi.nlm.nih.gov/24446688/)
**ChEMBL:** [CHEMBL3120156](https://www.ebi.ac.uk/chembl/document_report_card/CHEMBL3120156/) (101 data points)
**Key Findings:**
- IC50 data for HCV NS3/4A protease inhibitors (from ChEMBL)
- Lead compound simeprevir (TMC435) approved for HCV treatment
- Structures and full activity data: [ChEMBL API](https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=CHEMBL3120156)
**ChEMBL Activity Summary:**
- IC50 values for HCV NS3/4A protease
- PK parameters (AUC, Cmax, clearance)
- DMPK assays (metabolic stability, permeability)
Always include ChEMBL status:
Add to papers-reviewed.json:
{
"10.1021/jm401507s": {
"pmid": "24446688",
"status": "relevant",
"score": 9,
"chembl_id": "CHEMBL3120156",
"chembl_activities": 101,
"has_structured_data": true
}
}
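Updating papers-reviewed.json by hand is error-prone, so a small merge helper may be useful. This is a minimal sketch assuming the file is a single JSON object keyed by DOI, as in the example above. The function names are hypothetical, and the rule that `has_structured_data` is true only when at least one activity exists is an assumption, not something the workflow specifies.

```python
import json
from pathlib import Path


def record_chembl_status(entries, doi, chembl_id=None, activity_count=0):
    """Merge ChEMBL fields into the entry for `doi`, keeping existing keys."""
    entry = entries.setdefault(doi, {})
    entry["chembl_id"] = chembl_id
    entry["chembl_activities"] = activity_count
    # Assumption: "structured data" means at least one curated activity record.
    entry["has_structured_data"] = bool(chembl_id) and activity_count > 0
    return entries


def update_file(path, doi, chembl_id=None, activity_count=0):
    """Load the JSON file if it exists, update one entry, and write it back."""
    path = Path(path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    record_chembl_status(entries, doi, chembl_id, activity_count)
    path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return entries
```

For a paper that is not in ChEMBL, calling `update_file(path, doi)` with the defaults records a null `chembl_id` and `has_structured_data: false`, so the negative result is tracked as well.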
For papers with rich ChEMBL data (>20 activities), consider extracting:
# Get all IC50 data
curl -s "https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=CHEMBL3120156&standard_type=IC50&limit=1000" > chembl_CHEMBL3120156_ic50.json
# Summary statistics (records without a standard_value are skipped, since tonumber fails on null)
jq '[.activities[] | .standard_value | select(. != null) | tonumber] | "Min: \(min), Max: \(max), Count: \(length)"' chembl_CHEMBL3120156_ic50.json
Report to user:
📊 ChEMBL data extracted:
- IC50 values for HCV NS3/4A protease
- All structures downloaded
- Data saved to: chembl_CHEMBL3120156_ic50.json
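Minimum and maximum values are only meaningful within one unit, so a per-unit summary can be safer than a single pooled one. The sketch below assumes each activity record carries `standard_value` (a numeric string or null) and `standard_units`. The function name and the `"unitless"` label are illustrative choices.

```python
from collections import defaultdict


def summarize_activities(activities):
    """Return {units: (min, max, count)} for records with a numeric standard_value."""
    by_units = defaultdict(list)
    for record in activities:
        value = record.get("standard_value")
        if value is None:
            continue  # some records carry no standardized value
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # skip values that cannot be parsed as numbers
        by_units[record.get("standard_units") or "unitless"].append(number)
    return {units: (min(vals), max(vals), len(vals)) for units, vals in by_units.items()}
```

Passing the `activities` list from the saved JSON file to `summarize_activities` gives one `(min, max, count)` tuple per unit, which makes mixed nM and µM records visible before they are compared.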
During evaluating-paper-relevance workflow:
Workflow integration point:
Stage 2: Deep Dive
├─ 1. Fetch Full Text (PMC → DOI → Unpaywall)
├─ 1.5. Check ChEMBL ← ADD THIS STEP
│ ├─ Query by DOI
│ ├─ If found: note ChEMBL ID + activity count
│ └─ Report to user
├─ 2. Scan for Relevant Content
└─ 3. Extract Findings
| Type | Description | Units |
|---|---|---|
| IC50 | Half-maximal inhibitory concentration | nM, µM |
| MIC | Minimum inhibitory concentration | µg/mL, nM |
| Ki | Inhibition constant | nM, µM |
| EC50 | Half-maximal effective concentration | nM, µM |
| Kd | Dissociation constant | nM, µM |
| Potency | General potency measurement | Various |
Filter by activity type:
curl "https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=ID&standard_type=MIC"
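Before filtering on one type, it can help to see which activity types a document actually contains. The following sketch tallies the `standard_type` field over a list of activity records that have already been downloaded. The function name and the `"unknown"` label for records without a type are assumptions made for illustration.

```python
from collections import Counter


def count_by_type(activities):
    """Count activity records per standard_type, most common first."""
    counts = Counter(
        record.get("standard_type") or "unknown" for record in activities
    )
    return counts.most_common()
```

The resulting list of `(type, count)` pairs shows whether a filter such as `standard_type=MIC` would return anything for that document.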
~99,000 documents (as of 2025)
Well represented:
Poorly represented:
Typical hit rate:
vs. PDF table parsing:
When to still use PDF:
CRITICAL: Report ChEMBL check for every relevant paper
Example workflow report:
📄 [15/50] Screening: "Novel MmpL3 inhibitors..."
Abstract score: 8 → Checking ChEMBL...
✓ ChEMBL: CHEMBL3456789 (34 data points)
→ Fetching full text...
→ Added to SUMMARY.md with ChEMBL link
For papers not in ChEMBL:
📄 [16/50] Screening: "Another paper..."
Abstract score: 9 → Checking ChEMBL...
✗ Not in ChEMBL (likely too recent or review paper)
→ Fetching full text via Unpaywall...
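To keep these status lines consistent across papers, the formatting can be centralized. This sketch assumes the result dictionary shape returned by `check_chembl` in check_chembl.py (`in_chembl`, `chembl_id`, `activity_count`), with `None` meaning the request failed. The separate failure line is an addition for illustration and does not appear in the reports above.

```python
def format_chembl_status(result):
    """Return a one-line ChEMBL status for a check result dict (None = request failed)."""
    if result is None:
        # Hypothetical extra case: distinguish a failed request from "not found"
        return "? ChEMBL check failed (network or API error)"
    if not result.get("in_chembl"):
        return "✗ Not in ChEMBL"
    count = result.get("activity_count", 0)
    return f"✓ ChEMBL: {result['chembl_id']} ({count} data points)"
```

Printing the returned line for every relevant paper, found or not, keeps the check visible to the user.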
For research sessions with many medicinal chemistry papers:
Create check_chembl.py:
#!/usr/bin/env python3
import sys

import requests

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def check_chembl(doi):
    """Check if DOI is in ChEMBL and return a summary dict (None if the request fails).

    IMPORTANT: Must use DOI, not PMID. ChEMBL API does not accept PMID queries.
    """
    # Query document (ONLY works with DOI); params= URL-encodes the DOI
    try:
        doc_response = requests.get(
            f"{BASE_URL}/document.json", params={'doi': doi}, timeout=10
        ).json()
    except (requests.RequestException, ValueError):
        return None

    # Check if found
    if doc_response.get('page_meta', {}).get('total_count', 0) == 0:
        return {'in_chembl': False}

    doc = doc_response['documents'][0]
    chembl_id = doc['document_chembl_id']

    # Get activity count (limit=1 because only page_meta.total_count is needed)
    try:
        act_response = requests.get(
            f"{BASE_URL}/activity.json",
            params={'document_chembl_id': chembl_id, 'limit': 1},
            timeout=10,
        ).json()
        activity_count = act_response.get('page_meta', {}).get('total_count', 0)
    except (requests.RequestException, ValueError):
        activity_count = 0

    return {
        'in_chembl': True,
        'chembl_id': chembl_id,
        'activity_count': activity_count,
        'doc_type': doc.get('doc_type'),
        'title': doc.get('title')
    }


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("Usage: check_chembl.py DOI")
    result = check_chembl(sys.argv[1])
    if result and result['in_chembl']:
        print(f"✓ {result['chembl_id']} ({result['activity_count']} activities)")
    else:
        print("✗ Not in ChEMBL")
Usage:
python3 check_chembl.py "10.1021/jm401507s"
# Output: ✓ CHEMBL3120156 (101 activities)
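For a session with many DOIs, the single-DOI check can be wrapped in a loop. This is a sketch only: `checker` stands for any callable with the same contract as `check_chembl`, and the default `delay` is an arbitrary courtesy pause rather than a documented ChEMBL limit.

```python
import time


def check_many(dois, checker, delay=0.2):
    """Run `checker` on each unique, non-empty DOI in order, pausing between calls."""
    results = {}
    cleaned = (doi.strip() for doi in dois)
    # dict.fromkeys removes duplicates while preserving first-seen order
    for doi in dict.fromkeys(doi for doi in cleaned if doi):
        results[doi] = checker(doi)
        time.sleep(delay)  # arbitrary pause between requests (assumption)
    return results
```

Passing the checker in as an argument keeps the loop independent of the network, so it can be exercised with a stub before running it against the live API.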
- Querying by PMID: Using PMID instead of DOI → Always returns 0 results, ChEMBL only accepts DOI queries
- Skipping ChEMBL check: Not checking medicinal chemistry papers → Missing structured data that's already extracted
- Checking non-medchem papers: Checking genomics/cell biology papers → Wasting time, won't be in ChEMBL
- Not reporting status: Silent ChEMBL checks → User can't see what's happening
- Not adding to SUMMARY.md: Forgetting to include ChEMBL ID → Harder for user to access data later
- Only using ChEMBL: Not fetching full text when paper in ChEMBL → Missing context, methods, discussion
- Parsing PDFs when in ChEMBL: Manually extracting tables when structured data available → Wasting time and introducing errors
| Task | Command |
|---|---|
| Check if DOI in ChEMBL | curl "https://www.ebi.ac.uk/chembl/api/data/document.json?doi=DOI" |
| Get activity count | curl "https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=ID&limit=1" |
| Get all activities | curl "https://www.ebi.ac.uk/chembl/api/data/activity.json?document_chembl_id=ID&limit=1000" |
| Filter by activity type | curl "...activity.json?document_chembl_id=ID&standard_type=MIC" |
| ChEMBL paper page | https://www.ebi.ac.uk/chembl/document_report_card/CHEMBL_ID/ |
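A single request returns at most one page of results, so a document with more activity records than the page size needs several requests. The sketch below steps an `offset` query parameter until `page_meta.total_count` records have been collected. It assumes the activity endpoint accepts `limit` and `offset`. The function names and the injectable `fetch` argument are illustrative.

```python
import json
import urllib.parse
import urllib.request

ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_all_activities(chembl_id, page_size=1000, fetch=fetch_json):
    """Collect every activity record for a document by stepping `offset`."""
    activities, offset = [], 0
    while True:
        query = urllib.parse.urlencode(
            {"document_chembl_id": chembl_id, "limit": page_size, "offset": offset}
        )
        page = fetch(f"{ACTIVITY_URL}?{query}")
        batch = page.get("activities", [])
        activities.extend(batch)
        offset += len(batch)
        total = page.get("page_meta", {}).get("total_count", 0)
        # Stop on an empty page or once every reported record has been collected
        if not batch or offset >= total:
            return activities
```

The loop also stops on an empty page, which guards against an endless loop if the reported total and the returned records ever disagree.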
Add to .claude/settings.local.json.template:
"Bash(curl*https://www.ebi.ac.uk/chembl/api/data/*)",
"WebFetch(domain:www.ebi.ac.uk)"
ChEMBL check successful when:
After checking ChEMBL:
docs/CHEMBL_INTEGRATION.md