    dnvriend

    skill-ollama-deepseek-ocr-tool

    dnvriend/skill-ollama-deepseek-ocr-tool
    Data & Analytics
    About

    Batch OCR processing with DeepSeek-OCR via Ollama

    SKILL.md

    When to use

    • Convert textbook/lecture images to markdown notes
    • Batch OCR processing of scanned documents
    • Extract text from image sequences (iPhone photos, screenshots)
    • Create searchable markdown from visual content
    • Process documents privately without cloud services

    ollama-deepseek-ocr-tool Skill

    Purpose

    This skill provides access to ollama-deepseek-ocr-tool, a CLI tool for fast, private batch OCR processing using DeepSeek-OCR via Ollama. It converts sequences of images (textbook pages, slides, scans) into a single coherent markdown document.

    Key capabilities:

    • ⚡ Fast processing (~3s per image on M4)
    • 🔒 Private - runs entirely locally
    • 📝 Clean markdown output (tables, headings, lists)
    • 🔄 Natural sorting (IMG_1 < IMG_2 < IMG_10)
    • 💰 Free - no API costs or rate limits

    When to Use This Skill

    Use this skill when:

    • Converting textbook chapters to Obsidian notes
    • Processing lecture slides or handouts to markdown
    • Extracting text from scanned documents
    • Creating searchable study materials from images
    • You need comprehensive usage examples or troubleshooting guidance for the tool

    Do NOT use this skill for:

    • Cloud-based OCR (this is local-only)
    • Describing image content (extracts text only)
    • Handwritten text recognition (printed text only)
    • Real-time streaming OCR (batch processing only)

    CLI Tool: ollama-deepseek-ocr-tool

    The ollama-deepseek-ocr-tool processes multiple images in sequence and creates a single markdown document with extracted text. Images are sorted naturally and text is appended sequentially for coherent reading.
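A plain lexical sort puts IMG_10 before IMG_2, which would scramble page order. To preview the order before a run, the sketch below compares a byte-wise sort with `sort -V` (version sort, available in GNU coreutils and recent BSD/macOS `sort`). It assumes version sort is a close stand-in for the tool's natural sorting; the tool's exact sort key is not documented here and may differ in edge cases.

```shell
# Preview page order for a list of filenames read from stdin.
# Assumption: `sort -V` approximates the tool's natural sort; edge cases may differ.
natural_order() {
  sort -V
}

# Byte-wise lexical order, for comparison (what a naive sort would produce).
lexical_order() {
  LC_ALL=C sort
}

# Usage:
#   printf '%s\n' IMG_*.png | natural_order
#   printf '%s\n' IMG_10.png IMG_2.png IMG_1.png | lexical_order   # IMG_1, IMG_10, IMG_2
#   printf '%s\n' IMG_10.png IMG_2.png IMG_1.png | natural_order   # IMG_1, IMG_2, IMG_10
```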

    Installation

    # Clone and install
    git clone https://github.com/dnvriend/ollama-deepseek-ocr-tool.git
    cd ollama-deepseek-ocr-tool
    uv tool install .
    

    Prerequisites

    1. Ollama - Local LLM runtime

      brew install ollama
      ollama serve
      
    2. DeepSeek-OCR model (~6GB download)

      ollama pull deepseek-ocr
      
    3. Python 3.14+ and the uv package manager
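Before a long batch run, it can help to confirm the command-line prerequisites are on `PATH`. The helpers below are hypothetical (not shipped with the tool) and only check that the commands exist; they do not check the Python version or whether the model has been pulled.

```shell
# Hypothetical helper: succeed if a command is on PATH, otherwise report it.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing prerequisite: $1" >&2
  return 1
}

# Hypothetical helper: check every command this workflow relies on,
# reporting all missing ones instead of stopping at the first.
check_prereqs() {
  rc=0
  require_cmd ollama || rc=1
  require_cmd uv || rc=1
  require_cmd ollama-deepseek-ocr-tool || rc=1
  return "$rc"
}

# Usage:
#   check_prereqs && echo "prerequisites found"
```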

    Quick Start

    # Example 1: Process textbook chapter from iPhone photos
    ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md
    
    # Example 2: Convert lecture slides to markdown
    ollama-deepseek-ocr-tool "lecture-week5/*.jpg" week5-summary.md
    
    # Example 3: With verbose logging to debug issues
    ollama-deepseek-ocr-tool "*.png" output.md -vv
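The Best Practices section recommends testing with 2-3 images first. Because the tool takes a glob pattern rather than a file list, one way to do that is to copy the first few images into a scratch directory and point the pattern there. `make_sample` is a hypothetical helper, not part of the tool, and uses `sort -V` as an approximation of the tool's natural ordering.

```shell
# Hypothetical helper (not part of the tool): copy the first N files, in
# natural order via `sort -V`, into a scratch directory for a trial run.
make_sample() {
  dest=$1
  n=$2
  shift 2
  mkdir -p "$dest"
  printf '%s\n' "$@" | sort -V | head -n "$n" | while IFS= read -r f; do
    cp "$f" "$dest/"
  done
}

# Usage: the shell expands IMG_*.png into the file list here.
#   make_sample sample 3 IMG_*.png
#   ollama-deepseek-ocr-tool "sample/*.png" sample.md
```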
    

    Main Command - Batch OCR Processing

    Process images matching a glob pattern and create a markdown document.

    Usage:

    ollama-deepseek-ocr-tool GLOB_PATTERN OUTPUT_FILE [OPTIONS]
    

    Arguments:

    • GLOB_PATTERN: Pattern to match images (e.g., "*.png", "dir/*.jpg"); quote it, as in the examples, so the shell passes it to the tool unexpanded
    • OUTPUT_FILE: Path to the output markdown file (overwritten if it exists)

    Options:

    • -v/-vv/-vvv: Verbosity (INFO/DEBUG/TRACE)
    • --help: Show comprehensive help with examples
    • --version: Show version
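Without quotes, the shell expands GLOB_PATTERN before the tool runs and passes one argument per matching file instead of a single pattern. The hypothetical `show_args` helper below, for illustration only, prints what a command actually receives in each case.

```shell
# Hypothetical helper: print how many arguments the shell passes, and each one.
show_args() {
  printf '%s argument(s):' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# In a directory holding IMG_1.png, IMG_2.png and IMG_10.png:
#   show_args *.png     # 3 arguments, one per file (order depends on locale)
#   show_args "*.png"   # 1 argument(s): [*.png]
```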

    Examples:

    # Basic: Process all PNGs in current directory
    ollama-deepseek-ocr-tool "*.png" output.md
    
    # Process specific directory
    ollama-deepseek-ocr-tool "textbook-ch3/*.jpg" chapter-3.md
    
    # With verbose logging
    ollama-deepseek-ocr-tool "*.png" output.md -vv
    
    # Preview help (shows all examples)
    ollama-deepseek-ocr-tool --help
    

    Output Format:

    <!-- Source: IMG_4170.png -->
    
    [extracted text from image 1]
    
    ---
    
    <!-- Source: IMG_4171.png -->
    
    [extracted text from image 2]
    

    ⚙️ Advanced Features

    Multi-Level Verbosity Logging

    Control logging detail with progressive verbosity levels. All logs are written to stderr.

    Logging Levels:

    | Flag | Level | Output | Use case |
    |------|-------|--------|----------|
    | (none) | WARNING | Errors and warnings only | Production, quiet mode |
    | -v | INFO | Above, plus high-level operations | Normal debugging |
    | -vv | DEBUG | Above, plus detailed info and full tracebacks | Development, troubleshooting |
    | -vvv | TRACE | Above, plus library internals | Deep debugging |

    Examples:

    # INFO level - see operations
    ollama-deepseek-ocr-tool "*.png" output.md -v
    
    # DEBUG level - see detailed info
    ollama-deepseek-ocr-tool "*.png" output.md -vv
    
    # TRACE level - see all internals
    ollama-deepseek-ocr-tool "*.png" output.md -vvv
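Because all logs go to stderr, verbose output can be captured in a file instead of scrolling past in the terminal. A minimal sketch with a hypothetical `run_with_log` helper that appends a command's stderr to a log file and passes its exit status through:

```shell
# Hypothetical helper: run a command, appending its stderr (where this tool
# writes its logs) to LOGFILE; stdout and the exit status pass through.
run_with_log() {
  logfile=$1
  shift
  "$@" 2>>"$logfile"
}

# Usage:
#   run_with_log ocr-debug.log ollama-deepseek-ocr-tool "*.png" output.md -vv
#   tail -f ocr-debug.log   # in another terminal, to follow progress
```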
    

    What Can Be Extracted

    Text & Formatting:

    • ✅ Headings (H1, H2, H3)
    • ✅ Body text with bold/italic
    • ✅ Bulleted and numbered lists
    • ✅ Multi-column layouts

    Tables:

    • ✅ Clean markdown table format
    • ✅ Headers and structure preserved
    • ✅ Merged cells handled

    Diagrams & Figures:

    • ✅ Text labels extracted
    • ✅ Figure captions captured
    • ❌ Visual content not described
    • ❌ Flowchart arrows not preserved

    Performance Characteristics

    • Speed: ~3 seconds per image (M4 MacBook)
    • Memory: ~6GB (DeepSeek-OCR model)
    • Throughput: ~20 images per minute
    • Scalability: Sequential processing (no parallel batching)
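The figures above imply a simple estimate: total time is roughly the image count times the per-image cost (40 images at ~3 s is about 2 minutes). A small sketch with a hypothetical `estimate_runtime` helper, assuming strictly sequential processing at a constant per-image cost; pass a different seconds-per-image value for hardware other than an M4.

```shell
# Hypothetical helper: rough run time for COUNT images at SECS per image
# (default 3 s, the M4 figure above), rounded up to whole minutes.
estimate_runtime() {
  images=$1
  secs_per_image=${2:-3}
  total=$(( images * secs_per_image ))
  minutes=$(( (total + 59) / 60 ))
  printf '%s images: ~%s s (~%s min)\n' "$images" "$total" "$minutes"
}

# Usage: count the matching files first, then estimate
#   set -- IMG_*.png; estimate_runtime "$#"
```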

    🔧 Troubleshooting

    Common Issues

    Issue: "No files match pattern"

    # Check your glob pattern and current directory
    ls *.png  # Verify files exist
    
    # Paths in the pattern are relative to the current directory; use an explicit relative or absolute path
    ollama-deepseek-ocr-tool "./images/*.png" output.md
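To check a pattern before a run, the sketch below (hypothetical helper `pattern_matches`) reports whether any file matches as the shell would expand it. The tool expands the pattern itself, so its matching rules may differ slightly from the shell's; treat this as a quick sanity check only.

```shell
# Hypothetical helper: succeed if a glob pattern matches at least one existing
# file, as expanded by the shell (the tool's own matching may differ slightly).
pattern_matches() {
  # $1 is deliberately unquoted so the shell expands it; note that this also
  # splits patterns containing spaces.
  for f in $1; do
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   pattern_matches "./images/*.png" || echo "no files match ./images/*.png"
```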
    

    Issue: "Connection refused" / "OCR extraction failed"

    # Ensure Ollama is running
    ollama serve
    
    # Verify model is installed
    ollama list | grep deepseek-ocr
    
    # Pull model if missing
    ollama pull deepseek-ocr
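The same checks can be made against Ollama's HTTP API, which by default listens on `http://localhost:11434` and lists locally pulled models at `GET /api/tags`. The helpers below are a sketch assuming that default address and an installed `curl`; `has_deepseek_ocr` does a plain text match on the JSON rather than a full parse. Both names are hypothetical.

```shell
# Hypothetical helper: succeed if an Ollama server answers at BASE_URL.
# Assumption: default address http://localhost:11434 and curl on PATH.
ollama_reachable() {
  base=${1:-http://localhost:11434}
  curl -fsS --max-time 3 "$base/api/tags" >/dev/null
}

# Hypothetical helper: read an /api/tags JSON response on stdin and succeed
# if a deepseek-ocr model is listed (text match, not a JSON parse).
has_deepseek_ocr() {
  grep -q '"name": *"deepseek-ocr'
}

# Usage:
#   ollama_reachable || echo "start the server with: ollama serve"
#   curl -fsS http://localhost:11434/api/tags | has_deepseek_ocr \
#     || echo "pull the model with: ollama pull deepseek-ocr"
```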
    

    Issue: Poor quality extraction

    • Use the -vv flag to see word counts and verify extraction
    • Check image quality (resolution, clarity)
    • For complex layouts, results may vary
    • Tables and diagrams work best with clear text

    Issue: Slow processing

    • Expected: ~3 seconds per image on M4
    • Check if Ollama is using GPU acceleration
    • Images are processed one at a time by design, since the model needs ~6GB of memory

    Getting Help

    # Show comprehensive help with examples
    ollama-deepseek-ocr-tool --help
    
    # Use verbose logging to debug
    ollama-deepseek-ocr-tool "*.png" output.md -vv
    

    Exit Codes

    • 0: Success - all images processed
    • 1: Validation error - no files match pattern or invalid arguments
    • 2: Runtime error - Ollama connection failed or model not found
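In scripts, the documented exit codes can drive the next step. A minimal sketch with a hypothetical `describe_exit` helper that turns a status code into a hint:

```shell
# Hypothetical helper: map the documented exit codes to a short hint.
describe_exit() {
  case "$1" in
    0) echo "success: all images processed" ;;
    1) echo "validation error: check the glob pattern and arguments" ;;
    2) echo "runtime error: check that 'ollama serve' is running and deepseek-ocr is pulled" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage:
#   ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md
#   describe_exit "$?"
```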

    Best Practices

    1. Organize images before processing: Name files sequentially (IMG_001, IMG_002) for natural sorting
    2. Use descriptive output names: chapter-3-entrepreneurship.md rather than output.md
    3. Start with small batches: Test with 2-3 images first to verify quality
    4. Enable verbose logging for debugging: Use -vv to see extraction progress and word counts
    5. Review output after processing: OCR may miss formatting or misread complex layouts
    6. Keep images at good resolution: higher-quality images give better extraction
    7. Process similar content together: Keep textbook pages separate from diagrams
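For practice 1, existing photos such as IMG_7.png can be zero-padded so that even a plain lexical sort matches page order. `padded_name` is a hypothetical helper that pads the first run of digits in a name to three places; run the loop with `echo` in place of `mv -n` first to preview the renames.

```shell
# Hypothetical helper: zero-pad the first run of digits in a filename to 3
# places, e.g. IMG_7.png -> IMG_007.png. Names without digits are unchanged.
padded_name() {
  name=$1
  prefix=${name%%[0-9]*}     # text before the first digit
  rest=${name#"$prefix"}
  digits=${rest%%[!0-9]*}    # the first run of digits
  suffix=${rest#"$digits"}
  if [ -z "$digits" ]; then
    printf '%s\n' "$name"
    return 0
  fi
  # Strip leading zeros so printf does not read the number as octal.
  value=$(printf '%s' "$digits" | sed 's/^0*//')
  printf '%s%03d%s\n' "$prefix" "${value:-0}" "$suffix"
}

# Usage (preview first by replacing `mv -n` with `echo`):
#   for f in IMG_*.png; do
#     new=$(padded_name "$f")
#     [ "$f" = "$new" ] || mv -n "$f" "$new"
#   done
```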

    Resources

    • GitHub: https://github.com/dnvriend/ollama-deepseek-ocr-tool
    • Python Package Index: https://pypi.org/project/ollama-deepseek-ocr-tool/
    • Documentation: