    ivanvza/web-scraper
    Data & Analytics
    About

    Web scraping toolkit for extracting content from web pages. Fetch HTML, extract links, parse text content, and download page resources...

    SKILL.md

    Web Scraper

    A toolkit for extracting content from web pages using Python.

    When to Use This Skill

    Activate this skill when the user needs to:

    • Fetch the HTML content of a web page
    • Extract all links from a page
    • Get readable text content from HTML
    • Scrape data from websites
    • Download and analyze web content

    Requirements

    This skill requires external packages:

    pip install requests beautifulsoup4
    

    Available Scripts

    Always run scripts with --help first to see all available options.

    Script             Purpose
    fetch_page.py      Download HTML content from a URL
    extract_links.py   Extract all links from a page
    extract_text.py    Extract readable text from HTML

    Decision Tree

    Task → What do you need?
        │
        ├─ Raw HTML content?
        │   └─ Use: fetch_page.py <url>
        │
        ├─ List of links on a page?
        │   └─ Use: extract_links.py <url>
        │
        └─ Text content (no HTML tags)?
            └─ Use: extract_text.py <url>
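
The decision tree above can be sketched as a small lookup that turns a need into a command line. This is an illustrative helper only: the keys `html`, `links`, and `text` and the function `build_command` are hypothetical names, and the `scripts/` prefix is taken from the Quick Examples.

```python
import sys

# Hypothetical mapping from "what do you need?" to the script that provides it.
SCRIPTS = {
    "html": "scripts/fetch_page.py",
    "links": "scripts/extract_links.py",
    "text": "scripts/extract_text.py",
}


def build_command(need, url, *extra_args):
    """Return an argv list for the script that matches `need`.

    Raises KeyError if `need` is not one of the three branches of the tree.
    """
    script = SCRIPTS[need]
    return [sys.executable, script, url, *extra_args]


command = build_command("links", "https://example.com", "--absolute")
print(command[1:])
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the skill's root directory.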
    

    Quick Examples

    Fetch page HTML:

    python scripts/fetch_page.py https://example.com
    python scripts/fetch_page.py https://example.com --output page.html
    

    Extract all links:

    python scripts/extract_links.py https://example.com
    python scripts/extract_links.py https://example.com --absolute --filter "\.pdf$"
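
To make the `--absolute` and `--filter` options concrete, here is a minimal sketch of what resolving links and filtering them by regular expression usually means. It is not the source of `extract_links.py`; the function `absolute_filtered_links` is a hypothetical name, and the sketch assumes the filter is applied to the resolved URL using only the standard library.

```python
import re
from urllib.parse import urljoin


def absolute_filtered_links(base_url, hrefs, pattern=None):
    """Resolve each href against base_url, then keep those matching pattern."""
    regex = re.compile(pattern) if pattern else None
    results = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # relative -> absolute URL
        if regex is None or regex.search(absolute):
            results.append(absolute)
    return results


links = absolute_filtered_links(
    "https://example.com/docs/",
    ["report.pdf", "/about", "https://other.example/file.pdf?x=1"],
    pattern=r"\.pdf$",
)
print(links)
```

Note that a pattern anchored with `$` does not match a URL that carries a query string after `.pdf`, so the third link above is dropped.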
    

    Extract text content:

    python scripts/extract_text.py https://example.com
    python scripts/extract_text.py https://example.com --paragraphs
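
As a rough illustration of paragraph-level text extraction, the sketch below collects the text of each `<p>` element. It uses the standard library's `html.parser` as a stand-in so that it runs without extra packages; the skill itself lists `beautifulsoup4` as a requirement, and the class `ParagraphText` and function `extract_paragraphs` are hypothetical names, not part of the skill.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the whitespace-normalized text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        # Text outside <p> (including script contents) is ignored.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = (
    "<html><body><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script>"
    "<p>  Second\n paragraph </p></body></html>"
)
paragraphs = extract_paragraphs(sample)
print(paragraphs)
```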
    

    Best Practices

    1. Respect robots.txt - Check if scraping is allowed
    2. Add delays - Don't overwhelm servers with rapid requests
    3. Use appropriate User-Agent - Identify your scraper properly
    4. Handle errors gracefully - Websites may block requests or time out
    5. Cache responses - Don't re-fetch unchanged pages
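
The first three practices can be sketched with the standard library alone. This is a minimal example under stated assumptions: the User-Agent string `example-scraper/0.1`, the helper names `build_robots` and `polite_urls`, and the one-second default delay are all hypothetical, and the robots.txt content is passed in as text rather than downloaded.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper/0.1"  # hypothetical identifier for your scraper


def build_robots(robots_txt):
    """Parse robots.txt content that has already been fetched as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(urls, robots, delay_seconds=1.0, sleep=time.sleep):
    """Yield only the URLs robots.txt allows, pausing between them."""
    first = True
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # disallowed for this user agent
        if not first:
            sleep(delay_seconds)  # spread requests out
        first = False
        yield url


robots = build_robots("User-agent: *\nDisallow: /private/\n")
allowed = list(
    polite_urls(
        [
            "https://example.com/",
            "https://example.com/private/a",
            "https://example.com/b",
        ],
        robots,
        delay_seconds=0.0,
    )
)
print(allowed)
```

In real use, the robots.txt text would come from the target site and each yielded URL would be passed to one of the scripts, with the same User-Agent sent in the request.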

    Common Issues

    • 403 Forbidden: Site may be blocking scrapers. Try again with the --user-agent flag.
    • Timeout: Site may be slow. Increase the --timeout value.
    • Empty content: Page may require JavaScript. These scripts handle static HTML only.
    • Encoding issues: Use --encoding flag if text appears garbled.
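
The encoding issue above is easy to reproduce: the same bytes decoded with the wrong codec produce garbled text. The sketch below shows the effect and a tolerant decode helper. How the scripts apply `--encoding` internally is not documented here, so `decode_body` is a hypothetical illustration rather than the scripts' behavior.

```python
def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode response bytes, replacing undecodable bytes instead of raising."""
    return raw.decode(encoding, errors="replace")


raw = "café".encode("utf-8")

garbled = decode_body(raw, "latin-1")  # wrong codec: multi-byte character splits in two
fixed = decode_body(raw, "utf-8")      # correct codec

print(garbled)
print(fixed)
```

If text looks like the garbled variant, the page was most likely served as UTF-8 but decoded as a single-byte encoding, and naming the right encoding explicitly should resolve it.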

    Reference Files

    See references/selectors.md for CSS selector syntax reference.

    Ethical Considerations

    • Only scrape public data
    • Respect rate limits and robots.txt
    • Don't scrape personal/private information
    • Check website terms of service
    • Consider using official APIs when available
    Repository
    ivanvza/dspy-skills