    nebutra/mineru-skill
    MinerU Document Parser

    Convert PDF, Word, PPT, and images to clean Markdown using MinerU's VLM engine — LaTeX formulas, tables, and images all preserved.

    Setup

    1. Get a free API token at https://mineru.net/user-center/api-token and export it:
    export MINERU_TOKEN="your-token-here"
    

    Limits: 2000 pages/day · 200 MB per file · 600 pages per file

    Supported File Types

    | Type | Formats |
    |------|---------|
    | 📕 PDF | .pdf (papers, textbooks, scanned docs) |
    | 📝 Word | .docx (reports, manuscripts) |
    | 📊 PPT | .pptx (slides, presentations) |
    | 🖼️ Image | .jpg, .jpeg, .png (OCR extraction) |
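
Before submitting a batch, it can help to filter out files the service would reject. The following is a minimal sketch, not part of the skill's scripts: the names `SUPPORTED_EXTENSIONS`, `MAX_FILE_BYTES`, `is_supported`, and `find_candidates` are hypothetical, and the 200 MB cap is assumed to mean 200 × 1024 × 1024 bytes. The page limits are not checked here.

```python
from pathlib import Path

# Extensions copied from the supported-types list; everything else is skipped.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".jpg", ".jpeg", ".png"}

# Assumption: "200 MB per file" is interpreted as 200 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 200 * 1024 * 1024


def is_supported(path: Path) -> bool:
    """Return True if the extension (case-insensitive) is a listed format."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def find_candidates(directory: Path) -> list[Path]:
    """List regular files in `directory` that pass the extension and size checks."""
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file() and is_supported(p) and p.stat().st_size <= MAX_FILE_BYTES
    )
```

Calling `find_candidates(Path("./docs"))` returns only the files worth passing to `--dir` or `--file`.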

    Commands

    Single File

    python3 scripts/mineru_v2.py --file ./document.pdf --output ./output/
    

    Batch Directory with Resume

    python3 scripts/mineru_v2.py \
      --dir ./docs/ \
      --output ./output/ \
      --workers 10 \
      --resume
    

    Direct to Obsidian

    python3 scripts/mineru_v2.py \
      --dir ./pdfs/ \
      --output "$HOME/Library/Mobile Documents/com~apple~CloudDocs/Obsidian/VaultName/" \
      --resume
    

    Chinese Documents

    python3 scripts/mineru_v2.py --dir ./papers/ --output ./output/ --language ch
    

    Complex Layouts (Slow but Most Accurate)

    python3 scripts/mineru_v2.py --file ./paper.pdf --output ./output/ --model vlm
    

    CLI Options

    --dir PATH          Input directory (PDF/Word/PPT/images)
    --file PATH         Single file
    --output PATH       Output directory (default: ./output/)
    --workers N         Concurrent workers (default: 5, max: 15)
    --resume            Skip already processed files
    --model MODEL       Model version: pipeline | vlm | MinerU-HTML (default: vlm)
    --language LANG     Document language: auto | en | ch (default: auto)
    --no-formula        Disable formula recognition
    --no-table          Disable table extraction
    --token TOKEN       API token (overrides MINERU_TOKEN env var)
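
The option list above maps naturally onto `argparse`. The sketch below is a reconstruction from the help text only, not the actual source of `mineru_v2.py`; `build_parser` and `resolve_token` are hypothetical names, and the 15-worker ceiling is not enforced here. It mainly illustrates the documented precedence: `--token` overrides the `MINERU_TOKEN` environment variable.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI options")
    parser.add_argument("--dir", metavar="PATH", help="Input directory")
    parser.add_argument("--file", metavar="PATH", help="Single file")
    parser.add_argument("--output", metavar="PATH", default="./output/")
    parser.add_argument("--workers", metavar="N", type=int, default=5)
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--model", choices=["pipeline", "vlm", "MinerU-HTML"], default="vlm")
    parser.add_argument("--language", choices=["auto", "en", "ch"], default="auto")
    parser.add_argument("--no-formula", action="store_true")
    parser.add_argument("--no-table", action="store_true")
    parser.add_argument("--token", metavar="TOKEN")
    return parser


def resolve_token(args: argparse.Namespace, env=None) -> str:
    """Prefer --token; fall back to MINERU_TOKEN; exit if neither is set."""
    env = os.environ if env is None else env
    token = args.token or env.get("MINERU_TOKEN")
    if not token:
        raise SystemExit("No API token: pass --token or set MINERU_TOKEN")
    return token
```

Passing `env` explicitly keeps the precedence rule easy to check without touching the real environment.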
    

    Model Version Guide

    | Model | Speed | Accuracy | Best For |
    |-------|-------|----------|----------|
    | pipeline | ⚡ Fast | High | Standard docs, most use cases |
    | vlm | 🐢 Slow | Highest | Complex layouts, multi-column, mixed text+figures |
    | MinerU-HTML | ⚡ Fast | High | Web-style output, HTML-ready content |

    Script Selection

    | Script | Use When |
    |--------|----------|
    | mineru_v2.py | Default: async parallel (up to 15 workers) |
    | mineru_async.py | Fast network, need maximum throughput |
    | mineru_stable.py | Unstable network: sequential, maximum retries |
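
"Async parallel" with a worker cap is commonly implemented with a semaphore. The sketch below shows that general pattern under stated assumptions: it is not taken from the skill's scripts, and `process_all` and `fake_parse` are hypothetical names, with `fake_parse` standing in for a real API call.

```python
import asyncio


async def process_all(paths, worker, max_workers=5):
    """Run `worker(path)` for every path with at most `max_workers` in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def guarded(path):
        # Each task waits here until one of the `max_workers` slots is free.
        async with semaphore:
            return await worker(path)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_parse(path):
    """Stand-in for a real parse request; sleeps briefly and echoes the path."""
    await asyncio.sleep(0.01)
    return f"parsed:{path}"


if __name__ == "__main__":
    results = asyncio.run(
        process_all(["a.pdf", "b.pdf", "c.pdf"], fake_parse, max_workers=2)
    )
    print(results)
```

Setting `max_workers=1` reduces this to sequential processing, which is roughly the trade-off the table describes for unstable networks.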

    Output Structure

    output/
    ├── document-name/
    │   ├── document-name.md    # Main Markdown
    │   ├── images/             # Extracted images
    │   └── content.json        # Metadata
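
Given this layout, one plausible way to decide which inputs still need processing is to look for the expected Markdown file. This is only a sketch of the idea: how `--resume` actually detects finished files is not documented here, `expected_markdown` and `pending` are hypothetical helpers, and `document-name` is assumed to be the source file name without its extension.

```python
from pathlib import Path


def expected_markdown(output_dir: Path, source: Path) -> Path:
    """Path where the main Markdown for `source` would land (assumed naming)."""
    return output_dir / source.stem / f"{source.stem}.md"


def pending(sources: list[Path], output_dir: Path) -> list[Path]:
    """Return the sources whose Markdown output does not exist yet."""
    return [s for s in sources if not expected_markdown(output_dir, s).is_file()]
```

For example, `pending([Path("docs/paper.pdf")], Path("output"))` returns the input unchanged until `output/paper/paper.md` exists.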
    

    Performance

    | Workers | Speed |
    |---------|-------|
    | 1 (sequential) | 1.2 files/min |
    | 5 | 3.1 files/min |
    | 15 | 5.6 files/min |
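
The throughput figures above can be turned into rough batch-time estimates. The sketch below only does arithmetic on the three listed data points. `THROUGHPUT`, `estimated_minutes`, and `speedup` are hypothetical names, and real runs will vary with file size and network conditions.

```python
# Files per minute for each listed worker count.
THROUGHPUT = {1: 1.2, 5: 3.1, 15: 5.6}


def estimated_minutes(file_count: int, workers: int) -> float:
    """Estimate wall-clock minutes for `file_count` files at a listed worker count."""
    return file_count / THROUGHPUT[workers]


def speedup(workers: int) -> float:
    """Throughput relative to the sequential (1-worker) figure."""
    return THROUGHPUT[workers] / THROUGHPUT[1]
```

For a 100-file batch this gives about 83.3 minutes sequentially, 32.3 minutes with 5 workers, and 17.9 minutes with 15 workers. Fifteen workers are therefore about 4.67 times faster than one, not 15 times, so the gain per added worker shrinks as the count rises.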

    Error Handling

    • 5x auto-retry with exponential backoff
    • Use --resume to continue interrupted batches
    • Failed files listed at end of run
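
Retry with exponential backoff, as mentioned in the first bullet, generally looks like the sketch below. It is illustrative only: the scripts' actual delays and retry conditions are not documented here, `retry_with_backoff` is a hypothetical helper, and "5x" is assumed to mean up to five retries after the first attempt, with a 1-second base delay that doubles each time.

```python
import time


def retry_with_backoff(operation, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation()`; on failure, retry up to `max_retries` times.

    The wait before retry k (0-based) is base_delay * 2**k seconds,
    i.e. 1, 2, 4, 8, 16 seconds with the assumed defaults.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting can be replaced in tests; production callers would leave it at the default.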

    API Reference

    For detailed API documentation, see references/api_reference.md.
