    vm0-ai/bright-data
    Data & Analytics
    About

    Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn), account management, and usage monitoring.


    Bright Data Web Scraper API

    Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.

    Official docs: https://docs.brightdata.com/


    When to Use

    Use this skill when you need to:

    • Scrape social media - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
    • Extract web data - Posts, profiles, comments, engagement metrics
    • Monitor usage - Track bandwidth and request usage
    • Manage account - Check status and zones

    Prerequisites

    1. Sign up at Bright Data
    2. Get your API key from Settings > Users
    3. Create a Web Scraper dataset in the Control Panel to get your dataset_id

    Then export your API key so the commands below can read it:

    export BRIGHTDATA_TOKEN="your-api-key"
    

    Base URL

    https://api.brightdata.com
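For scripts that need more than a one-off curl call, the same authentication pattern can be reproduced in Python. This is a minimal sketch using only the standard library, assuming the `BRIGHTDATA_TOKEN` variable from Prerequisites; `api_request` and `call_api` are hypothetical helper names, not part of any Bright Data SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.brightdata.com"


def api_request(path, token=None, method="GET", payload=None):
    """Build a Request mirroring the curl examples: Bearer auth plus an optional JSON body."""
    if token is None:
        token = os.environ["BRIGHTDATA_TOKEN"]  # same variable exported in Prerequisites
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # Equivalent of -H "Content-Type: application/json" -d @file
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call_api(path, token=None, method="GET", payload=None, timeout=60):
    """Send the request and return the decoded JSON body (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(api_request(path, token, method, payload), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `call_api("/status")` should correspond to the Check Account Status curl command further down.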
    

    Social Media Scraping

    Bright Data supports scraping these social media platforms:

    Platform    Profiles  Posts  Comments  Reels/Videos
    Twitter/X   ✅        ✅     -         -
    Reddit      -         ✅     ✅        -
    YouTube     ✅        ✅     ✅        -
    Instagram   ✅        ✅     ✅        ✅
    TikTok      ✅        ✅     ✅        -
    LinkedIn    ✅        ✅     -         -

    How to Use

    1. Trigger Scraping (Asynchronous)

    Trigger a data collection job and get a snapshot_id for later retrieval.

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://twitter.com/username"},
      {"url": "https://twitter.com/username2"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Response:

    {
      "snapshot_id": "s_m4x7enmven8djfqak"
    }
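The trigger step can be sketched in Python as follows. It assumes the endpoint and response shape shown above (a JSON object containing `snapshot_id`); `trigger_request` and `trigger` are illustrative names, and `gd_xxxxx` in the usage line is a placeholder dataset ID.

```python
import json
import os
import urllib.parse
import urllib.request

DATASETS_API = "https://api.brightdata.com/datasets/v3"


def trigger_request(dataset_id, inputs, token):
    """POST /datasets/v3/trigger?dataset_id=... with the input list as the JSON body."""
    query = urllib.parse.urlencode({"dataset_id": dataset_id})
    return urllib.request.Request(
        f"{DATASETS_API}/trigger?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def trigger(dataset_id, inputs, token=None):
    """Start an asynchronous collection job and return its snapshot_id."""
    token = token or os.environ["BRIGHTDATA_TOKEN"]
    with urllib.request.urlopen(trigger_request(dataset_id, inputs, token), timeout=60) as resp:
        return json.load(resp)["snapshot_id"]
```

Usage: `snapshot_id = trigger("gd_xxxxx", [{"url": "https://twitter.com/username"}])`.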
    

    2. Trigger Scraping (Synchronous)

    Get results immediately in the response (for small requests).

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
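The body returned by `/scrape` is not shown above, so this sketch parses it defensively: it accepts either a JSON array/object or newline-delimited JSON (one object per line). That tolerance is an assumption, not documented behavior; `parse_records` and `scrape` are hypothetical names.

```python
import json
import os
import urllib.parse
import urllib.request

DATASETS_API = "https://api.brightdata.com/datasets/v3"


def parse_records(body):
    """Turn a response body into a list of records (JSON array/object or NDJSON)."""
    body = body.strip()
    if not body:
        return []
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Several top-level objects in one body: treat it as newline-delimited JSON.
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def scrape(dataset_id, inputs, token=None, timeout=120):
    """Synchronous POST /datasets/v3/scrape; returns the parsed records."""
    token = token or os.environ["BRIGHTDATA_TOKEN"]
    query = urllib.parse.urlencode({"dataset_id": dataset_id})
    req = urllib.request.Request(
        f"{DATASETS_API}/scrape?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_records(resp.read().decode("utf-8"))
```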
    

    3. Monitor Progress

    Check the status of a scraping job (replace <snapshot-id> with your actual snapshot ID):

    curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN"
    

    Response:

    {
      "snapshot_id": "s_m4x7enmven8djfqak",
      "dataset_id": "gd_xxxxx",
      "status": "running"
    }
    

    Status values: running, ready, failed
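Polling can be wrapped in a loop that stops on `ready`, raises on `failed`, and gives up after a deadline. A minimal sketch, assuming the three status values listed above; the 10-second interval and 600-second timeout are arbitrary defaults, and `fetch`/`sleep` are injectable so the loop can be exercised without network access.

```python
import json
import os
import time
import urllib.request

DATASETS_API = "https://api.brightdata.com/datasets/v3"


def fetch_progress(snapshot_id, token=None):
    """GET /datasets/v3/progress/<snapshot-id> and return the decoded JSON."""
    token = token or os.environ["BRIGHTDATA_TOKEN"]
    req = urllib.request.Request(
        f"{DATASETS_API}/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_until_ready(snapshot_id, fetch=fetch_progress, interval=10.0,
                     timeout=600.0, sleep=time.sleep):
    """Poll until status is "ready"; raise on "failed" or once the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(snapshot_id).get("status")
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot {snapshot_id} still {status!r} after {timeout}s")
        sleep(interval)  # still running (or unrecognized status): wait, then poll again
```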


    4. Download Results

    Once status is ready, download the collected data (replace <snapshot-id> with your actual snapshot ID):

    curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN"
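Snapshots can be large, so this sketch streams the download straight to a file instead of holding it in memory. The URL layout follows the curl command above; `snapshot_url`, `download_snapshot`, and the output filename in the usage line are hypothetical.

```python
import os
import shutil
import urllib.parse
import urllib.request

DATASETS_API = "https://api.brightdata.com/datasets/v3"


def snapshot_url(snapshot_id, fmt="json"):
    """URL for GET /datasets/v3/snapshot/<snapshot-id>?format=<fmt>."""
    query = urllib.parse.urlencode({"format": fmt})
    return f"{DATASETS_API}/snapshot/{urllib.parse.quote(snapshot_id)}?{query}"


def download_snapshot(snapshot_id, dest, fmt="json", token=None, timeout=300):
    """Copy the response body to `dest` chunk by chunk and return the path."""
    token = token or os.environ["BRIGHTDATA_TOKEN"]
    req = urllib.request.Request(snapshot_url(snapshot_id, fmt),
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

Usage: `download_snapshot("s_m4x7enmven8djfqak", "results.json")`.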
    

    5. List Snapshots

    Get all your snapshots:

    curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" | jq '.[] | {snapshot_id, dataset_id, status}'
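If jq is not available, the same projection can be applied to the parsed response in Python. This assumes the endpoint returns a JSON array of snapshot objects carrying the fields the jq filter selects; `status_counts` is an extra, hypothetical convenience for spotting stuck jobs.

```python
from collections import Counter


def summarize_snapshots(snapshots):
    """Equivalent of jq '.[] | {snapshot_id, dataset_id, status}' on a parsed list.

    Missing keys map to None, just as jq emits null for them.
    """
    keys = ("snapshot_id", "dataset_id", "status")
    return [{key: snap.get(key) for key in keys} for snap in snapshots]


def status_counts(snapshots):
    """Count snapshots per status value."""
    return Counter(snap.get("status") for snap in snapshots)
```

Save the curl output to a file, load it with `json.load`, and pass the resulting list to either function.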
    

    6. Cancel Snapshot

    Cancel a running job (replace <snapshot-id> with your actual snapshot ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN"
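To cancel only the jobs that are still in progress, the snapshot list can be filtered before any cancel requests are sent. A sketch assuming list entries carry `snapshot_id` and `status` fields (as the jq filter in List Snapshots implies); `running_ids` and `cancel_request` are hypothetical names.

```python
import urllib.parse
import urllib.request

DATASETS_API = "https://api.brightdata.com/datasets/v3"


def running_ids(snapshots):
    """IDs of snapshots whose status is still "running"."""
    return [snap["snapshot_id"] for snap in snapshots if snap.get("status") == "running"]


def cancel_request(snapshot_id, token):
    """POST /datasets/v3/cancel?snapshot_id=..., as in the curl command above."""
    query = urllib.parse.urlencode({"snapshot_id": snapshot_id})
    return urllib.request.Request(
        f"{DATASETS_API}/cancel?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Usage: `for sid in running_ids(snapshots): urllib.request.urlopen(cancel_request(sid, token))`.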
    

    Platform-Specific Examples

    Twitter/X - Scrape Profile

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://twitter.com/elonmusk"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link
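When scraping many profiles, the request file can be generated from a list of handles instead of written by hand. This sketch produces the same `[{"url": ...}]` shape used throughout this skill, with the `twitter.com` URL form from the examples above; `profile_inputs` and `write_request_file` are illustrative names.

```python
import json


def profile_inputs(handles):
    """Map handles such as "@username" or "username" to [{"url": "https://twitter.com/username"}, ...]."""
    return [{"url": f"https://twitter.com/{handle.lstrip('@')}"} for handle in handles]


def write_request_file(inputs, path="/tmp/brightdata_request.json"):
    """Write the input list to the file that the curl examples pass with -d @..."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(inputs, f, indent=2)
    return path
```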

    Twitter/X - Scrape Posts

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://twitter.com/username/status/123456789"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: post_id, text, replies, likes, retweets, views, hashtags, media


    Reddit - Scrape Subreddit Posts

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Parameters: url, sort_by (new/top/hot)

    Returns: post_id, title, description, num_comments, upvotes, date_posted, community
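A small builder can reject an invalid `sort_by` value before any job is triggered. It accepts only the three values listed under Parameters; `subreddit_inputs` is a hypothetical helper.

```python
SORT_OPTIONS = ("new", "top", "hot")  # values listed under Parameters above


def subreddit_inputs(subreddits, sort_by="hot"):
    """Build one {"url", "sort_by"} entry per subreddit name, e.g. "technology"."""
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sort_by must be one of {SORT_OPTIONS}, got {sort_by!r}")
    return [{"url": f"https://www.reddit.com/r/{name}", "sort_by": sort_by} for name in subreddits]
```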

    Reddit - Scrape Comments

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: comment_id, user_posted, comment_text, upvotes, replies


    YouTube - Scrape Video Info

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: title, views, likes, num_comments, video_length, transcript, channel_name

    YouTube - Search by Keyword

    Write to /tmp/brightdata_request.json:

    [
      {"keyword": "artificial intelligence", "num_of_posts": 50}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    YouTube - Scrape Comments

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: comment_text, likes, replies, username, date
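Once comment records come back, ranking them is a plain sort. This sketch assumes `likes` is numeric and treats a missing or null value as 0, which the field list above does not guarantee; `top_comments` is an illustrative name.

```python
def top_comments(comments, n=10):
    """Return the n comment records with the most likes (stable for ties)."""
    return sorted(comments, key=lambda c: c.get("likes") or 0, reverse=True)[:n]
```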


    Instagram - Scrape Profile

    Write to /tmp/brightdata_request.json:

    [
      {"url": "https://www.instagram.com/username"}
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
    

    Returns: followers, post_count, profile_name, is_verified, biography

    Instagram - Scrape Posts

    Write to /tmp/brightdata_request.json:

    [
      {
        "url": "https://www.instagram.com/username",
        "num_of_posts": 20,
        "start_date": "01-01-2024",
        "end_date": "12-31-2024"
      }
    ]
    

    Then run (replace <dataset-id> with your actual dataset ID):

    curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
      -H "Content-Type: application/json" \
      -d @/tmp/brightdata_request.json
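Because `start_date` and `end_date` use MM-DD-YYYY rather than ISO 8601, it is easy to swap month and day when writing the file by hand. This sketch formats Python `date` objects with `strftime("%m-%d-%Y")` and rejects an inverted range; `bd_date` and `instagram_posts_input` are hypothetical names.

```python
from datetime import date


def bd_date(day):
    """Format a date as MM-DD-YYYY, the format start_date/end_date expect."""
    return day.strftime("%m-%d-%Y")


def instagram_posts_input(profile_url, num_of_posts, start, end):
    """Build one Instagram posts entry; raises if start is after end."""
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return {
        "url": profile_url,
        "num_of_posts": num_of_posts,
        "start_date": bd_date(start),
        "end_date": bd_date(end),
    }
```

With `date(2024, 1, 1)` and `date(2024, 12, 31)` this reproduces the request entry shown above.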
    

    Account Management

    Check Account Status

    curl -s "https://api.brightdata.com/status" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN"
    

    Response:

    {
      "status": "active",
      "customer": "hl_xxxxxxxx",
      "can_make_requests": true,
      "ip": "x.x.x.x"
    }
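A script can check `can_make_requests` before starting a batch and fail early. A minimal sketch operating on the parsed `/status` response shown above; treating a missing field as "cannot make requests" is a conservative assumption, and `ensure_can_request` is an illustrative name.

```python
def ensure_can_request(status):
    """Return the parsed /status payload, or raise if it says requests would be refused."""
    # A missing can_make_requests field is treated as False (conservative assumption).
    if not status.get("can_make_requests", False):
        raise RuntimeError(f"Bright Data account cannot make requests (status={status.get('status')!r})")
    return status
```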
    

    Get Active Zones

    curl -s "https://api.brightdata.com/zone/get_active_zones" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN" | jq '.[] | {name, type}'
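The zone list can also be grouped by type in Python, which helps when an account has many zones. This assumes the response is a JSON array of objects with `name` and `type`, as the jq filter implies; `zones_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def zones_by_type(zones):
    """Group zone names by their type, preserving the order zones appear in."""
    grouped = defaultdict(list)
    for zone in zones:
        grouped[zone.get("type")].append(zone.get("name"))
    return dict(grouped)
```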
    

    Get Bandwidth Usage

    curl -s "https://api.brightdata.com/customer/bw" \
      -H "Authorization: Bearer $BRIGHTDATA_TOKEN"
    

    Getting Dataset IDs

    To use the scraping features, you need a dataset_id:

    1. Go to Bright Data Control Panel
    2. Create a new Web Scraper dataset or select an existing one
    3. Choose the platform (Twitter, Reddit, YouTube, etc.)
    4. Copy the dataset_id from the dataset settings

    Dataset IDs can also be found in the bandwidth usage API response under the data field keys (e.g., v__ds_api_gd_xxxxx where gd_xxxxx is your dataset ID).
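Dataset IDs can be pulled out of the bandwidth response programmatically. The exact nesting of the `data` field is not documented here, so this sketch walks the whole JSON document and collects every key of the form `v__ds_api_<dataset-id>`; the regular expression assumes IDs start with `gd_` followed by word characters.

```python
import re

DATASET_KEY = re.compile(r"v__ds_api_(gd_\w+)")


def dataset_ids(obj):
    """Return the sorted, de-duplicated dataset IDs found in any dict key of `obj`."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                match = DATASET_KEY.fullmatch(str(key))
                if match:
                    found.add(match.group(1))
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(obj)
    return sorted(found)
```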


    Common Parameters

    Parameter     Description                       Example
    url           Target URL to scrape              https://twitter.com/user
    keyword       Search keyword                    "artificial intelligence"
    num_of_posts  Maximum number of results         50
    start_date    Start of date range (MM-DD-YYYY)  "01-01-2024"
    end_date      End of date range (MM-DD-YYYY)    "12-31-2024"
    sort_by       Sort order (Reddit)               new, top, hot
    format        Snapshot download format          json, csv

    Rate Limits

    • Batch mode: up to 100 concurrent requests
    • Maximum input size: 1GB per batch
    • Exceeding these limits returns an HTTP 429 (Too Many Requests) error
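One way to stay within these limits is to cap concurrency at 100 and retry on HTTP 429 with exponential backoff. This is a sketch under assumptions: the 1-second base delay, the retry count, and the jitter are arbitrary choices, and each task is a zero-argument callable that sends one API request. `with_backoff` and `run_batch` are hypothetical names.

```python
import random
import time
import urllib.error
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 100  # batch-mode ceiling from Rate Limits


def with_backoff(send, retries=5, base=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429 wait base * 2**attempt seconds (plus jitter) and retry.

    Any other error, or a 429 on the final attempt, is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise
            sleep(base * 2 ** attempt + random.uniform(0, base))


def run_batch(tasks, max_workers=MAX_CONCURRENT, **backoff_kwargs):
    """Run callables with at most max_workers in flight; results keep input order."""
    workers = max(1, min(max_workers, MAX_CONCURRENT))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: with_backoff(task, **backoff_kwargs), tasks))
```

Random jitter spreads retries out so that many workers hitting 429 at once do not all retry at the same instant.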

    Guidelines

    1. Create datasets first: Use the Control Panel to create scraper datasets
    2. Use async for large jobs: Use /trigger for discovery and batch operations
    3. Use sync for small jobs: Use /scrape for single URL quick lookups
    4. Check status before download: Poll /progress until status is ready
    5. Respect rate limits: Don't exceed 100 concurrent requests
    6. Date format: Use MM-DD-YYYY for date parameters
    Repository
    vm0-ai/vm0-skills