    FuZhiyu

    mistral-pdf-to-markdown

    FuZhiyu/mistral-pdf-to-markdown
    Mistral PDF to Markdown Converter

    Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.

    When to Use

    • Converting research papers or documents to Markdown
    • Extracting text from scanned PDFs (OCR capability)
    • Preserving document structure with headers and formatting
    • Extracting embedded images from PDFs

    Quick Start

    Use the conversion script from this skill's directory:

    # Convert entire PDF
    python scripts/convert_pdf_to_markdown.py input.pdf output.md
    
    # Convert specific pages
    python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
    python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"
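    The `--pages` values shown above combine ranges ("1-5") and comma-separated lists ("1,3,5"). As a rough sketch of how such a specification could be expanded into individual page numbers: `parse_pages` is a hypothetical helper, not the script's actual parser, and it assumes pages are numbered from 1.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-5" or "1,3,5" into sorted page numbers.

    Hypothetical helper; assumes 1-based page numbers.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers are assumed to start at 1")
    return sorted(pages)


print(parse_pages("1-5"))    # [1, 2, 3, 4, 5]
print(parse_pages("1,3,5"))  # [1, 3, 5]
```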
    

    Output Structure

    Output/PDFConversions/
    ├── document.md          # Markdown with text and image references
    └── images/
        ├── img-0.jpeg      # Extracted images
        ├── img-1.jpeg
        └── ...
    

    Usage in Code

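    import subprocess
    import sys
    
    # Run the conversion script with the current Python interpreter
    result = subprocess.run([
        sys.executable,
        ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
        "input.pdf",
        "Output/PDFConversions/output.md",
        "--pages", "1-10"
    ], capture_output=True, text=True)
    
    print(result.stdout)
    # Show the script's error output if it exited with a failure status
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)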
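    When several PDFs need converting, the argument list from the call above can be built by a small function. The following is only a sketch: `build_command` is a hypothetical helper, and naming each output file after its PDF is an assumption, not something the skill prescribes.

```python
import sys
from pathlib import Path
from typing import Optional

# Script location taken from the example above; adjust if the skill lives elsewhere.
SCRIPT = Path(".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py")


def build_command(pdf: Path, out_dir: Path, pages: Optional[str] = None) -> list[str]:
    """Return the argument list for converting one PDF (hypothetical helper)."""
    # Assumption: the Markdown file is named after the PDF, e.g. research.pdf -> research.md
    output_md = out_dir / f"{pdf.stem}.md"
    command = [sys.executable, str(SCRIPT), str(pdf), str(output_md)]
    if pages:
        command += ["--pages", pages]
    return command


print(build_command(Path("Data/papers/research.pdf"), Path("Output/PDFConversions"), "1-10"))
```

    Each returned list can be passed directly to `subprocess.run`.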
    

    Key Features

    • Markdown formatting: Preserves headers, lists, and structure
    • Image extraction: Saves images to images/ subfolder automatically
    • Page selection: Extract specific pages or ranges
    • Scanned PDF support: True OCR capability for image-based PDFs
    • Relative paths: Image references use ![...](images/img-X.jpeg)
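    Because image references are relative to the Markdown file, moving the `.md` file without its images/ folder breaks them. A minimal sketch of a check for dangling references follows; `missing_images` is a hypothetical helper and the regular expression covers only the plain `![alt](path)` form.

```python
import re
from pathlib import Path

# Matches Markdown image syntax and captures the target path: ![alt](path)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(markdown_path: Path) -> list[str]:
    """Return referenced image paths that do not exist relative to the Markdown file."""
    text = markdown_path.read_text(encoding="utf-8")
    base = markdown_path.parent
    return [ref for ref in IMAGE_REF.findall(text) if not (base / ref).is_file()]


document = Path("Output/PDFConversions/document.md")
if document.is_file():
    print(missing_images(document))
```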

    Requirements

    The script requires:

    • Mistral API key in Notes/.env (line 2: mistral_api_key=...)
    • Python packages: mistralai, python-dotenv, pypdf
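    Before running a conversion, it can help to confirm that the key is actually present. The sketch below uses only the standard library and looks the key up by name; `read_env_value` is a hypothetical helper, and the script itself may read the file differently (for example by line position or via python-dotenv).

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, name: str) -> Optional[str]:
    """Return the value for `name` from a KEY=VALUE file, or None if absent or empty."""
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes
            return value.strip().strip("\"'") or None
    return None


env_file = Path("Notes/.env")
if env_file.is_file() and read_env_value(env_file, "mistral_api_key"):
    print("Mistral API key found")
else:
    print("Add mistral_api_key=YOUR_KEY to Notes/.env")
```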

    Common Use Cases

    Convert Research Paper

    python scripts/convert_pdf_to_markdown.py \
      "Data/papers/research.pdf" \
      "Notes/Paper Markdown/research.md"
    

    Extract Specific Sections

    # Extract pages 10-20 (introduction and methods)
    python scripts/convert_pdf_to_markdown.py \
      "paper.pdf" \
      "Notes/Paper Markdown/intro_methods.md" \
      --pages "10-20"
    

    Convert Only Pages with Figures

    # Extract pages with figures
    python scripts/convert_pdf_to_markdown.py \
      "paper.pdf" \
      "Notes/Paper Markdown/figures.md" \
      --pages "25,27,30,35"
    

    Error Handling

    API Key Not Found:

    Error: Mistral API key not found in Notes/.env
    

    → Add mistral_api_key=YOUR_KEY to line 2 of Notes/.env

    Page Out of Range:

    Warning: Page 100 out of range, skipping
    

    → Check PDF page count and adjust page selection
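    One way to avoid the warning is to filter the requested pages against the page count beforehand. The sketch below is illustrative: `split_by_range` is a hypothetical helper that assumes 1-based page numbers, and the page count would come from pypdf, for instance `len(PdfReader("paper.pdf").pages)`.

```python
def split_by_range(pages: list[int], page_count: int) -> tuple[list[int], list[int]]:
    """Split requested pages into (valid, skipped) for a PDF with `page_count` pages.

    Hypothetical helper; assumes pages are numbered from 1.
    """
    valid = [page for page in pages if 1 <= page <= page_count]
    skipped = [page for page in pages if not 1 <= page <= page_count]
    return valid, skipped


valid, skipped = split_by_range([1, 50, 100], page_count=60)
print(valid)    # [1, 50]
print(skipped)  # [100]
```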

    API Rate Limit:

    → Wait a moment and retry, or reduce page count per request
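    The "wait and retry" advice can be automated with exponential backoff. This is a generic sketch, not part of the skill: `retry_with_backoff` is a hypothetical helper, the delay values are arbitrary, and it retries on any exception, which real code would likely narrow.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 2s, 4s, 8s, ... by default
    raise ValueError("attempts must be at least 1")
```

    For example, `retry_with_backoff(lambda: subprocess.run(command, check=True))` would rerun the conversion after a failure, assuming the script exits with a non-zero status when the API rejects a request, which is not confirmed here.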

    Notes

    • Images are saved as JPEG files in images/ subfolder
    • Markdown image references are automatically updated to images/img-X.jpeg
    • Large PDFs may take longer to process due to API limits
    • For simple text extraction without OCR, consider using the pdf skill instead
    • Scanned PDFs benefit most from this skill's OCR capability

    See Also

    • pdf skill - For local PDF manipulation without API calls
    • reference.md - Additional details about the Mistral OCR API
    Repository
    fuzhiyu/researchprojecttemplate