Convert PDFs to Markdown using Mistral OCR API with image extraction...
Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.
Use the conversion script from this skill's directory:
# Convert entire PDF
python scripts/convert_pdf_to_markdown.py input.pdf output.md
# Convert specific pages
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"
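The two `--pages` forms above (a range such as "1-5" and a comma-separated list such as "1,3,5") could be expanded into individual page numbers roughly as follows. This is a minimal sketch, not the script's actual parser: `parse_pages` is a hypothetical helper, and 1-based page numbering is an assumption drawn from the examples.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-5" or "1,3,5" into sorted page numbers.

    Hypothetical helper: assumes 1-based pages and that ranges are inclusive.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_pages("1-5"))    # [1, 2, 3, 4, 5]
print(parse_pages("1,3,5"))  # [1, 3, 5]
```

Returning a sorted, de-duplicated list keeps overlapping specs such as "1-3,2" from requesting the same page twice.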
Output/PDFConversions/
├── document.md        # Markdown with text and image references
└── images/
    ├── img-0.jpeg     # Extracted images
    ├── img-1.jpeg
    └── ...
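Given the layout above, one way to confirm that every image referenced in the generated Markdown was actually written to disk is sketched below. It assumes the references use standard Markdown image syntax (`![alt](images/img-0.jpeg)`), which the source does not state explicitly; `missing_images` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed reference style: standard Markdown images, e.g. ![alt](images/img-0.jpeg)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(markdown_path: Path) -> list[str]:
    """Return referenced image paths that do not exist next to the Markdown file."""
    text = markdown_path.read_text(encoding="utf-8")
    base = markdown_path.parent  # references are resolved relative to the .md file
    return [ref for ref in IMAGE_REF.findall(text) if not (base / ref).is_file()]
```

For example, `missing_images(Path("Output/PDFConversions/document.md"))` returns an empty list when every referenced file exists.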
import subprocess
import sys

# Run the conversion script on pages 1-10
result = subprocess.run(
    [
        "python",
        ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
        "input.pdf",
        "Output/PDFConversions/output.md",
        "--pages", "1-10",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
# Show stderr if the script exited with a non-zero status
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)
Extracted images are saved to the images/ subfolder automatically.
The script requires:
- API key in Notes/.env (line 2: mistral_api_key=...)
- Python packages: mistralai, python-dotenv, pypdf
# Convert a full paper
python scripts/convert_pdf_to_markdown.py \
"Data/papers/research.pdf" \
"Notes/Paper Markdown/research.md"
# Extract pages 10-20 (introduction and methods)
python scripts/convert_pdf_to_markdown.py \
"paper.pdf" \
"Notes/Paper Markdown/intro_methods.md" \
--pages "10-20"
# Extract pages with figures
python scripts/convert_pdf_to_markdown.py \
"paper.pdf" \
"Notes/Paper Markdown/figures.md" \
--pages "25,27,30,35"
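Before running the examples above, it can help to confirm that the API key the script expects is present. The sketch below uses only the standard library to look up `mistral_api_key` in a .env file. It is an approximation: the script itself lists python-dotenv as a dependency, and this sketch does not enforce that the key sits on line 2. `read_api_key` is a hypothetical helper.

```python
from pathlib import Path


def read_api_key(env_path: Path = Path("Notes/.env")) -> str:
    """Return the value of mistral_api_key from a .env file, or raise if absent."""
    if not env_path.is_file():
        raise FileNotFoundError(f"{env_path} does not exist")
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not KEY=VALUE
        name, _, value = line.partition("=")
        if name.strip() == "mistral_api_key":
            value = value.strip().strip("'\"")  # drop optional surrounding quotes
            if value:
                return value
    raise KeyError("mistral_api_key not found or empty")
```

Calling `read_api_key()` from the project root raises a clear exception early, before any API request is made.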
API Key Not Found:
Error: Mistral API key not found in Notes/.env
→ Add mistral_api_key=YOUR_KEY to line 2 of Notes/.env
Page Out of Range:
Warning: Page 100 out of range, skipping
→ Check PDF page count and adjust page selection
API Rate Limit:
→ Wait a moment and retry, or reduce the page count per request
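The "wait and retry" advice for rate limits can be automated with a small wrapper, sketched below. It assumes the conversion script exits with a non-zero status when a request fails, which the source does not confirm. It also retries on any failure, not only on rate limits. `run_with_retry` and its parameters are hypothetical.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=2.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff while it exits non-zero.

    Returns the last CompletedProcess (or None if attempts is 0).
    The run and sleep parameters exist so the function can be tested without
    spawning processes or waiting.
    """
    result = None
    for attempt in range(attempts):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break
        if attempt < attempts - 1:
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return result
```

A call such as `run_with_retry(["python", "scripts/convert_pdf_to_markdown.py", "input.pdf", "output.md", "--pages", "1-5"])` pairs naturally with smaller page ranges, since each retry then repeats less work.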
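- Images are saved to the images/ subfolder
- Image references in the Markdown use images/img-X.jpeg
- For local PDF manipulation without API calls, use the pdf skill instead
pdf skill - For local PDF manipulation without API calls
reference.md - Additional details about the Mistral OCR API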