    mjunaidca/working-with-documents
    Productivity

    About

    Creates and edits Office documents: Word (.docx), PDF, and PowerPoint (.pptx). Use when working with document creation, PDF manipulation, presentation generation, tracked changes, or converting...

    SKILL.md

    Working with Documents

    Quick Reference

    Format | Read                | Create    | Edit
    DOCX   | pandoc, python-docx | docx-js   | OOXML (unpack/edit/pack)
    PDF    | pdfplumber, pypdf   | reportlab | pypdf (merge/split)
    PPTX   | markitdown          | html2pptx | OOXML (unpack/edit/pack)

    Word Documents (.docx)

    Reading Content

    # Convert to markdown (preserves structure)
    pandoc document.docx -o output.md
    
    # With tracked changes visible
    pandoc --track-changes=all document.docx -o output.md
    

    Creating New Documents

    Use docx-js (JavaScript):

    const fs = require('fs');
    const { Document, Packer, Paragraph, TextRun } = require('docx');
    
    // Build a document with one section containing a single bold paragraph
    const doc = new Document({
      sections: [{
        children: [
          new Paragraph({
            children: [
              new TextRun({ text: "Hello World", bold: true }),
            ],
          }),
        ],
      }],
    });
    
    // Serialize to a buffer, then write it to disk
    Packer.toBuffer(doc).then(buffer => {
      fs.writeFileSync("output.docx", buffer);
    });
    

    Editing Existing Documents (Tracked Changes)

    # 1. Unpack
    python ooxml/scripts/unpack.py document.docx unpacked/
    
    # 2. Edit XML files in unpacked/word/document.xml
    # Key files:
    #   - word/document.xml (main content)
    #   - word/comments.xml (comments)
    #   - word/media/ (images)
    
    # 3. Pack
    python ooxml/scripts/pack.py unpacked/ edited.docx
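
A .docx or .pptx file is a ZIP package, so the unpack and pack steps above can be approximated with the standard library. This is a minimal sketch, not the contents of ooxml/scripts/unpack.py or pack.py (which are not shown here); the function names `unpack_ooxml` and `pack_ooxml` are hypothetical, and the real scripts may do more, such as pretty-printing the XML.

```python
import zipfile
from pathlib import Path


def unpack_ooxml(package_path, out_dir):
    """Extract every part of a .docx/.pptx package into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(package_path) as zf:
        zf.extractall(out)
    # Return the extracted part names, relative to the package root.
    return sorted(p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file())


def pack_ooxml(src_dir, package_path):
    """Zip the (edited) directory back into a single package file."""
    src = Path(src_dir)
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Archive names are relative to the package root and use "/".
                zf.write(path, path.relative_to(src).as_posix())
```

Usage would mirror the commands above: `unpack_ooxml("document.docx", "unpacked/")`, edit the XML, then `pack_ooxml("unpacked/", "edited.docx")`.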
    

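    Tracked changes XML pattern (w:id must be unique per change; the author and date values are placeholders):

    <!-- Deletion -->
    <w:del w:id="1" w:author="Author Name" w:date="2025-01-01T00:00:00Z"><w:r><w:delText>old text</w:delText></w:r></w:del>
    
    <!-- Insertion -->
    <w:ins w:id="2" w:author="Author Name" w:date="2025-01-01T00:00:00Z"><w:r><w:t>new text</w:t></w:r></w:ins>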
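
To review which changes an unpacked document already contains, the pattern above can be read back with the standard library. This is a rough sketch under the assumption that insertions wrap `w:t` runs and deletions wrap `w:delText` runs, as in the pattern; the helper name `tracked_changes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix)
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def tracked_changes(document_xml):
    """Return (kind, text) pairs for each w:ins / w:del element, in document order."""
    root = ET.fromstring(document_xml)
    changes = []
    for el in root.iter():
        if el.tag == f"{{{W}}}ins":
            text = "".join(t.text or "" for t in el.iterfind(".//w:t", NS))
            changes.append(("insert", text))
        elif el.tag == f"{{{W}}}del":
            text = "".join(t.text or "" for t in el.iterfind(".//w:delText", NS))
            changes.append(("delete", text))
    return changes


sample = (
    f'<w:p xmlns:w="{W}">'
    "<w:del><w:r><w:delText>old text</w:delText></w:r></w:del>"
    "<w:ins><w:r><w:t>new text</w:t></w:r></w:ins>"
    "</w:p>"
)
print(tracked_changes(sample))  # [('delete', 'old text'), ('insert', 'new text')]
```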
    

    PDF Documents

    Reading PDFs

    import pdfplumber
    
    # Extract text
    with pdfplumber.open("document.pdf") as pdf:
        for page in pdf.pages:
            print(page.extract_text())
    
    # Extract tables
    with pdfplumber.open("document.pdf") as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                for row in table:
                    print(row)
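
Extracted tables are plain lists of rows, so they can be saved without further dependencies. The sketch below assumes the shape returned by `extract_tables()` above (a list of rows whose cells are strings or `None` for empty cells); `table_to_csv` is a hypothetical helper name.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one extracted table (a list of rows) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        # Empty cells may be None; write them as empty strings and flatten
        # line breaks inside a cell so each row stays on one line.
        writer.writerow([(cell or "").replace("\n", " ").strip() for cell in row])
    return buffer.getvalue()


sample_table = [["Item", "Qty"], ["Widget", "2"], ["Multi\nline", None]]
print(table_to_csv(sample_table))
```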
    

    Creating PDFs

    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Paragraph
    from reportlab.lib.styles import getSampleStyleSheet
    
    doc = SimpleDocTemplate("output.pdf", pagesize=letter)
    styles = getSampleStyleSheet()
    story = [
        Paragraph("Report Title", styles['Title']),
        Paragraph("Body text goes here.", styles['Normal']),
    ]
    doc.build(story)
    

    Merging/Splitting PDFs

    from pypdf import PdfReader, PdfWriter
    
    # Merge: append every page of each input to one writer
    writer = PdfWriter()
    for pdf_file in ["doc1.pdf", "doc2.pdf"]:
        reader = PdfReader(pdf_file)
        for page in reader.pages:
            writer.add_page(page)
    with open("merged.pdf", "wb") as f:
        writer.write(f)
    
    # Split: write each page to its own file
    reader = PdfReader("input.pdf")
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        with open(f"page_{i+1}.pdf", "wb") as f:
            writer.write(f)
    

    Command-Line Tools

    # Extract text
    pdftotext input.pdf output.txt
    pdftotext -layout input.pdf output.txt  # Preserve layout
    
    # Merge with qpdf
    qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
    
    # Split pages
    qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
    

    PowerPoint Presentations (.pptx)

    Reading Content

    # Convert to markdown
    python -m markitdown presentation.pptx
    

    Creating New Presentations

    Use html2pptx workflow:

    1. Create HTML slides (720pt × 405pt for 16:9)
    2. Convert with html2pptx.js library
    3. Validate with thumbnail grid

    # Create thumbnails for validation
    python scripts/thumbnail.py output.pptx --cols 4
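
The slide size in step 1 can be checked with a little arithmetic. The sketch below confirms that 720pt × 405pt is 16:9 and converts it to inches and to EMU, the unit PPTX files use internally for slide dimensions (914400 EMU per inch, so 12700 per point). The variable names are illustrative only.

```python
from fractions import Fraction

POINTS_PER_INCH = 72
EMU_PER_POINT = 12700  # 914400 EMU per inch / 72 points per inch

width_pt, height_pt = 720, 405

aspect = Fraction(width_pt, height_pt)
size_in = (width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH)
size_emu = (width_pt * EMU_PER_POINT, height_pt * EMU_PER_POINT)

print(aspect)    # 16/9
print(size_in)   # (10.0, 5.625)
print(size_emu)  # (9144000, 5143500)
```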
    

    Editing Existing Presentations

    # 1. Unpack
    python ooxml/scripts/unpack.py presentation.pptx unpacked/
    
    # Key files:
    #   - ppt/slides/slide1.xml, slide2.xml, etc.
    #   - ppt/notesSlides/ (speaker notes)
    #   - ppt/media/ (images)
    
    # 2. Edit XML
    
    # 3. Validate
    python ooxml/scripts/validate.py unpacked/ --original presentation.pptx
    
    # 4. Pack
    python ooxml/scripts/pack.py unpacked/ edited.pptx
    

    Rearranging Slides

    # Duplicate, reorder, delete slides
    python scripts/rearrange.py template.pptx output.pptx 0,3,3,5,7
    # Creates: slide 0, slide 3 (twice), slide 5, slide 7
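
The index list is easy to get wrong on larger decks, so it can help to check it before running the script. This is a sketch of one way to read such a list, assuming 0-based indices as in the example above; it is not the logic of scripts/rearrange.py, and both function names are hypothetical.

```python
def parse_slide_order(spec, slide_count):
    """Turn a spec like "0,3,3,5,7" into a validated list of 0-based indices."""
    order = []
    for token in spec.split(","):
        token = token.strip()
        if not token.isdecimal():
            raise ValueError(f"not a slide index: {token!r}")
        index = int(token)
        if index >= slide_count:
            raise ValueError(f"slide {index} out of range (deck has {slide_count} slides)")
        order.append(index)
    return order


def summarize(order, slide_count):
    """Report which source slides are duplicated and which are dropped."""
    duplicated = sorted({i for i in order if order.count(i) > 1})
    dropped = [i for i in range(slide_count) if i not in order]
    return duplicated, dropped


order = parse_slide_order("0,3,3,5,7", slide_count=8)
print(order)                # [0, 3, 3, 5, 7]
print(summarize(order, 8))  # ([3], [1, 2, 4, 6])
```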
    

    Converting Between Formats

    # DOCX/PPTX to PDF
    soffice --headless --convert-to pdf document.docx
    
    # PDF to images
    pdftoppm -jpeg -r 150 document.pdf page
    # Creates: page-1.jpg, page-2.jpg, etc. (numbers are zero-padded for longer PDFs, e.g. page-01.jpg)
    
    # DOCX to Markdown
    pandoc document.docx -o output.md
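
When the PDF conversion needs to run from a script, the soffice command above can be wrapped with `subprocess`. This is a minimal sketch that assumes soffice is on PATH; the helper names are hypothetical, and `--outdir` is added so the output location is explicit.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="pdf", binary="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [binary, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(source)]


def convert(source, out_dir, target="pdf"):
    """Run the conversion and return the expected output path."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(source, out_dir, target), check=True)
    # LibreOffice names the output after the input file's stem;
    # this assumes target is a plain extension such as "pdf".
    return Path(out_dir) / f"{Path(source).stem}.{target}"
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.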
    

    OCR for Scanned Documents

    import pytesseract
    from pdf2image import convert_from_path
    
    images = convert_from_path('scanned.pdf')
    text = ""
    for image in images:
        text += pytesseract.image_to_string(image)
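
OCR is slow, so it is usually worth running only on pages with no usable text layer. The sketch below is one possible heuristic: treat a page as scanned when normal text extraction returns almost nothing. The `needs_ocr` helper and the 20-character threshold are assumptions to tune for your documents.

```python
def needs_ocr(extracted_text, min_chars=20):
    """Guess whether a page is a scan: little or no extractable text."""
    text = (extracted_text or "").strip()
    return len(text) < min_chars


pages = ["Quarterly report, revenue grew 12% year over year.", "", None, "  \n "]
print([needs_ocr(p) for p in pages])  # [False, True, True, True]
```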
    

    Design Guidelines (Presentations)

    Color Palettes

    Pick 3-5 colors that work together:

    Palette      | Colors
    Classic Blue | Navy #1C2833, Slate #2E4053, Silver #AAB7B8
    Teal & Coral | Teal #5EA8A7, Coral #FE4447, White #FFFFFF
    Black & Gold | Gold #BF9A4A, Black #000000, Cream #F4F6F6
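
One way to check that a text color and a background from a palette "work together" is to check their contrast. The sketch below computes the WCAG 2.x contrast ratio between two hex colors. The skill itself does not prescribe this check, so treat it as an optional aid; the function names are illustrative.

```python
def hex_to_rgb(color):
    """Convert "#RRGGBB" to an (r, g, b) tuple of 0-255 integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(color):
    """WCAG 2.x relative luminance of an sRGB color, from 0.0 to 1.0."""
    def channel(value):
        c = value / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(v) for v in hex_to_rgb(color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a, b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(a), relative_luminance(b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Navy text on a white background, from the Classic Blue palette
print(round(contrast_ratio("#1C2833", "#FFFFFF"), 1))
```

WCAG suggests a ratio of at least 4.5 for normal-size body text.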

    Web-Safe Fonts Only

    Arial, Helvetica, Times New Roman, Georgia, Verdana, Tahoma, Trebuchet MS, Courier New, Impact

    Layout Rules

    • Two-column: Use for exactly 2 distinct items
    • Three-column: Use for exactly 3 items
    • Never vertically stack charts below text
    • Full-bleed images with text overlays work well

    Dependencies

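    # Python (pytesseract and pdf2image are used for OCR; markitdown for reading PPTX)
    pip install pypdf pdfplumber reportlab python-docx openpyxl pytesseract pdf2image 'markitdown[pptx]'
    
    # System tools (tesseract-ocr backs pytesseract; qpdf is used under Command-Line Tools)
    apt-get install pandoc poppler-utils libreoffice tesseract-ocr qpdf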
    
    # Node.js (for docx-js)
    npm install docx
    

    Verification

    Run: python scripts/verify.py

    Related Skills

    • working-with-spreadsheets - Excel file handling
    • building-nextjs-apps - Frontend for document uploads
    Repository
    mjunaidca/mjs-agent-skills