Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok), account management, and usage monitoring, all through direct curl calls against the Bright Data API.
Official docs: https://docs.brightdata.com/
Use this skill when you need to:
- Scrape social media profiles, posts, comments, and videos
- Run batch collection jobs and retrieve the results as snapshots
- Check account status, active zones, and bandwidth usage
Set your API key as an environment variable:
export BRIGHTDATA_TOKEN="your-api-key"
All endpoints are served from the base URL https://api.brightdata.com.
Bright Data supports scraping these social media platforms:
| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
Trigger a data collection job and get a snapshot_id for later retrieval.
Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/username"},
{"url": "https://twitter.com/username2"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Response:
{
"snapshot_id": "s_m4x7enmven8djfqak"
}
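When scripting, the snapshot_id can be captured with jq for use in the later /progress and /snapshot calls. A minimal sketch; the sample response below mirrors the shape shown above, and in practice you would pipe the real curl output into jq instead:

```shell
# Sample trigger response (same shape as documented above); in practice,
# pipe the output of the curl trigger call into jq instead.
RESPONSE='{"snapshot_id": "s_m4x7enmven8djfqak"}'

# Extract the snapshot_id for later /progress and /snapshot requests
SNAPSHOT_ID=$(printf '%s' "$RESPONSE" | jq -r '.snapshot_id')
echo "$SNAPSHOT_ID"
```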
Scrape synchronously and get results immediately in the response (suitable for small requests).
Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Check the status of a scraping job (replace <snapshot-id> with your actual snapshot ID):
curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN"
Response:
{
"snapshot_id": "s_m4x7enmven8djfqak",
"dataset_id": "gd_xxxxx",
"status": "running"
}
Status values: running, ready, failed
Once status is ready, download the collected data (replace <snapshot-id> with your actual snapshot ID):
curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN"
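The /progress and /snapshot calls are usually combined into a poll-then-download loop. A sketch of the control flow, with the network call stubbed out so the loop logic is self-contained; in real use, fetch_status would run the /progress curl call and pipe it through `jq -r '.status'`:

```shell
# Hedged sketch: poll until the job leaves "running", then act on the result.
# fetch_status stands in for the real /progress curl call; here it simulates
# a job that reports "running" twice and then "ready".
POLLS=0
fetch_status() {
  POLLS=$((POLLS + 1))
  if [ "$POLLS" -lt 3 ]; then STATUS=running; else STATUS=ready; fi
  # real code: STATUS=$(curl -s ".../datasets/v3/progress/$SNAPSHOT_ID" \
  #   -H "Authorization: Bearer $BRIGHTDATA_TOKEN" | jq -r '.status')
}

STATUS=running
while [ "$STATUS" = "running" ]; do
  fetch_status
  # real code would also sleep a few seconds between polls
done

case "$STATUS" in
  ready)  echo "download the snapshot now" ;;  # curl .../snapshot/<snapshot-id>?format=json
  failed) echo "job failed" ;;
esac
```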
Get all your snapshots:
curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" | jq '.[] | {snapshot_id, dataset_id, status}'
Cancel a running job (replace <snapshot-id> with your actual snapshot ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN"
Scrape a Twitter/X profile. Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/elonmusk"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: x_id, profile_name, biography, is_verified, followers, following, profile_image_link
Scrape a single Twitter/X post. Write to /tmp/brightdata_request.json:
[
{"url": "https://twitter.com/username/status/123456789"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: post_id, text, replies, likes, retweets, views, hashtags, media
Collect posts from a subreddit. Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Parameters: url, sort_by (new/top/hot)
Returns: post_id, title, description, num_comments, upvotes, date_posted, community
Scrape comments on a Reddit post. Write to /tmp/brightdata_request.json:
[
{"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: comment_id, user_posted, comment_text, upvotes, replies
Scrape a YouTube video. Write to /tmp/brightdata_request.json:
[
{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: title, views, likes, num_comments, video_length, transcript, channel_name
Discover content by keyword search. Write to /tmp/brightdata_request.json:
[
{"keyword": "artificial intelligence", "num_of_posts": 50}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Scrape comments on a YouTube video. Write to /tmp/brightdata_request.json:
[
{"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: comment_text, likes, replies, username, date
Scrape an Instagram profile. Write to /tmp/brightdata_request.json:
[
{"url": "https://www.instagram.com/username"}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Returns: followers, post_count, profile_name, is_verified, biography
Collect posts from an Instagram profile, filtered by date range. Write to /tmp/brightdata_request.json:
[
{
"url": "https://www.instagram.com/username",
"num_of_posts": 20,
"start_date": "01-01-2024",
"end_date": "12-31-2024"
}
]
Then run (replace <dataset-id> with your actual dataset ID):
curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" \
-H "Content-Type: application/json" \
-d @/tmp/brightdata_request.json
Check your account status:
curl -s "https://api.brightdata.com/status" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN"
Response:
{
"status": "active",
"customer": "hl_xxxxxxxx",
"can_make_requests": true,
"ip": "x.x.x.x"
}
List your active zones:
curl -s "https://api.brightdata.com/zone/get_active_zones" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN" | jq '.[] | {name, type}'
Check bandwidth usage:
curl -s "https://api.brightdata.com/customer/bw" \
-H "Authorization: Bearer $BRIGHTDATA_TOKEN"
To use the scraping features, you need a dataset_id. Copy it from the dataset settings. Dataset IDs can also be found in the bandwidth usage API response under the data field keys (e.g., v__ds_api_gd_xxxxx, where gd_xxxxx is your dataset ID).
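Based on that key format, dataset IDs can be pulled out of the bandwidth response with jq. A sketch, assuming the response nests those keys under a top-level data object; the sample below is illustrative, not real API output, and the byte counts are made up:

```shell
# Illustrative /customer/bw response fragment; the v__ds_api_ prefix follows
# the description above, and the numbers are invented for this example.
RESPONSE='{"data": {"v__ds_api_gd_abc123": 1024, "v__ds_api_gd_def456": 2048}}'

# List the keys under .data, keep the dataset-usage keys, strip the prefix
DATASET_IDS=$(printf '%s' "$RESPONSE" \
  | jq -r '.data | keys[]' \
  | grep '^v__ds_api_' \
  | sed 's/^v__ds_api_//')
echo "$DATASET_IDS"
```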
| Parameter | Description | Example |
|---|---|---|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |
Tips:
- Use /trigger for discovery and batch operations; use /scrape for single-URL quick lookups.
- Poll /progress until status is ready before downloading a snapshot.
- A 429 error means you are rate limited; slow down and retry with backoff.
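Retrying on 429 can be sketched as exponential backoff around the request. Here the HTTP status is stubbed so the retry logic is self-contained and the two-429s-then-200 sequence is simulated; in real use, capture the code with `curl -w '%{http_code}'`:

```shell
# Hedged sketch of backoff-and-retry on 429 (rate limited).
# try_request stands in for the real curl call; it fails with 429 twice,
# then succeeds with 200. In real use:
#   HTTP_CODE=$(curl -s -o /tmp/out.json -w '%{http_code}' ...)
ATTEMPTS=0
try_request() {
  ATTEMPTS=$((ATTEMPTS + 1))
  if [ "$ATTEMPTS" -lt 3 ]; then HTTP_CODE=429; else HTTP_CODE=200; fi
}

DELAY=1
HTTP_CODE=429
while [ "$HTTP_CODE" = "429" ] && [ "$DELAY" -le 8 ]; do
  try_request
  if [ "$HTTP_CODE" = "429" ]; then
    # real code: sleep "$DELAY" before retrying
    DELAY=$((DELAY * 2))
  fi
done
echo "final HTTP code: $HTTP_CODE after $ATTEMPTS attempts"
```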