
    rawveg/apify-js-sdk
    DevOps


    About

    Apify JS SDK Documentation - Web scraping, crawling, and Actor development

    SKILL.md

    Apify JS SDK Skill

    Comprehensive assistance with Apify JavaScript SDK development for web scraping, crawling, and Actor creation. This skill provides access to official Apify documentation covering the API, SDK, and platform features.

    When to Use This Skill

    This skill should be triggered when:

    • Building web scrapers or crawlers with Apify
    • Working with Apify Actors (creation, management, deployment)
    • Using the Apify JavaScript Client to interact with the Apify API
    • Managing Apify datasets, key-value stores, or request queues
    • Implementing data extraction with Cheerio or other parsing libraries
    • Setting up crawling workflows with link extraction and filtering
    • Debugging Apify code or Actor runs
    • Configuring logging and monitoring for Apify Actors
    • Learning Apify platform best practices
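    The link filtering mentioned above can be sketched in plain JavaScript with no Apify APIs involved; the base URL and hrefs below are invented for illustration:

```javascript
// Reduce raw hrefs to unique, same-domain, crawlable absolute URLs.
const base = 'https://example.com/collections/sales';

const rawHrefs = [
  '/products/shoes',
  'https://example.com/products/hats',
  'https://other-site.com/ads',      // different domain
  '/products/shoes',                 // duplicate after resolution
  'mailto:info@example.com',         // not crawlable
];

function filterLinks(hrefs, baseUrl) {
  const origin = new URL(baseUrl).origin;
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, baseUrl);  // resolve relative hrefs against the page URL
    } catch {
      continue;                      // skip unparseable hrefs
    }
    if (url.origin !== origin) continue;            // same-domain only
    if (!url.protocol.startsWith('http')) continue; // skip mailto:, javascript:, etc.
    if (seen.has(url.href)) continue;               // deduplicate
    seen.add(url.href);
    result.push(url.href);
  }
  return result;
}

const links = filterLinks(rawHrefs, base);
```

    In a real crawler the filtered links would be pushed to a request queue rather than collected in an array.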

    Key Concepts

    Actors

    Serverless cloud programs running on the Apify platform. Actors can perform various tasks like web scraping, data processing, or automation.

    Datasets

    Storage for structured data (results from scraping). Each Actor run can have an associated dataset where scraped data is stored.

    Key-Value Stores

    Storage for arbitrary data like files, screenshots, or configuration. Each Actor run has a default key-value store.

    Request Queue

    Queue for managing URLs to be crawled. Handles URL deduplication and retry logic automatically.
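    The deduplication idea can be sketched as a small in-memory queue; this is an illustration of the concept only, not the platform's actual implementation:

```javascript
// Minimal in-memory sketch of request-queue deduplication: each
// request gets a uniqueKey (here, the normalized URL), and enqueueing
// the same key twice is a no-op.
class SimpleRequestQueue {
  constructor() {
    this.seen = new Set();
    this.pending = [];
  }

  addRequest(url) {
    const uniqueKey = new URL(url).href;        // normalize URL as the key
    if (this.seen.has(uniqueKey)) return false; // already enqueued: skip
    this.seen.add(uniqueKey);
    this.pending.push(url);
    return true;
  }

  fetchNextRequest() {
    return this.pending.shift() ?? null;        // null when the queue is drained
  }
}

const queue = new SimpleRequestQueue();
queue.addRequest('https://example.com/a');
queue.addRequest('https://example.com/a'); // ignored: duplicate
queue.addRequest('https://example.com/b');
```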

    Apify Client

    JavaScript/Python library for interacting with the Apify API programmatically from your code.

    Quick Reference

    Basic Link Extraction with Cheerio

    Extract all links from a webpage using Cheerio:

    import * as cheerio from 'cheerio';
    import { gotScraping } from 'got-scraping';
    
    const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
    
    const response = await gotScraping(storeUrl);
    const html = response.body;
    
    const $ = cheerio.load(html);
    
    // Select all anchor elements
    const links = $('a');
    
    // Extract href attributes
    for (const link of links) {
        const url = $(link).attr('href');
        console.log(url);
    }
    

    Running an Actor with Apify Client

    Call an Actor and wait for results:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    // Run an Actor and wait for it to finish
    const run = await client.actor('some_actor_id').call();
    
    // Get dataset items from the run
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    
    console.log(items);
    

    Creating and Managing Datasets

    Store scraped data in a dataset:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    // Create a new dataset
    const dataset = await client.datasets().getOrCreate('my-dataset');
    
    // Add items to the dataset
    await client.dataset(dataset.id).pushItems([
        { title: 'Product 1', price: 29.99 },
        { title: 'Product 2', price: 39.99 },
    ]);
    
    // Retrieve items
    const { items } = await client.dataset(dataset.id).listItems();
    

    Key-Value Store Operations

    Store and retrieve arbitrary data:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    const store = await client.keyValueStores().getOrCreate('my-store');
    
    // Store a value
    await client.keyValueStore(store.id).setRecord({
        key: 'config',
        value: { apiUrl: 'https://api.example.com' },
    });
    
    // Retrieve a value
    const record = await client.keyValueStore(store.id).getRecord('config');
    console.log(record.value);
    

    Logging Configuration

    Set up logging for Apify Actors using the Python SDK:

    import logging
    from apify.log import ActorLogFormatter
    
    async def main() -> None:
        handler = logging.StreamHandler()
        handler.setFormatter(ActorLogFormatter())
    
        apify_logger = logging.getLogger('apify')
        apify_logger.setLevel(logging.DEBUG)
        apify_logger.addHandler(handler)
    

    Using the Actor Context

    Access the Actor run context and storage using the Python SDK:

    from apify import Actor
    
    async def main() -> None:
        async with Actor:
            # Log messages
            Actor.log.info('Starting Actor run')
    
            # Access input
            actor_input = await Actor.get_input()
    
            # Save data to dataset
            await Actor.push_data({
                'url': 'https://example.com',
                'title': 'Example Page'
            })
    
            # Save to key-value store
            await Actor.set_value('OUTPUT', {'status': 'done'})
    

    Running an Actor Task

    Execute a pre-configured Actor task:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    // Run a task with custom input
    const run = await client.task('task-id').call({
        startUrls: ['https://example.com'],
        maxPages: 10,
    });
    
    console.log(`Task run: ${run.id}`);
    

    Redirecting Logs from Called Actors

    Redirect logs from a called Actor to the parent run:

    import logging

    from apify import Actor
    
    async def main() -> None:
        async with Actor:
            # Default redirect logger
            await Actor.call(actor_id='some_actor_id')
    
            # No redirect logger
            await Actor.call(actor_id='some_actor_id', logger=None)
    
            # Custom redirect logger
            await Actor.call(
                actor_id='some_actor_id',
                logger=logging.getLogger('custom_logger')
            )
    

    Getting Actor Run Details

    Retrieve information about an Actor run:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    // Get run details
    const run = await client.run('run-id').get();
    
    console.log(`Status: ${run.status}`);
    console.log(`Started: ${run.startedAt}`);
    console.log(`Finished: ${run.finishedAt}`);
    

    Listing Actor Builds

    Get all builds for a specific Actor:

    import { ApifyClient } from 'apify-client';
    
    const client = new ApifyClient({
        token: 'YOUR_API_TOKEN',
    });
    
    const { items } = await client.actor('actor-id').builds().list({
        limit: 10,
        desc: true,
    });
    
    for (const build of items) {
        console.log(`Build ${build.buildNumber}: ${build.status}`);
    }
    

    Reference Files

    This skill includes comprehensive documentation in the references/ directory:

    llms-txt.md

    Complete API reference documentation with detailed information on:

    • Actor Management: Creating, updating, and running Actors
    • Builds: Managing Actor builds and versions
    • Runs: Controlling Actor execution and monitoring
    • Tasks: Pre-configured Actor executions
    • Datasets: Structured data storage and retrieval
    • Key-Value Stores: Arbitrary data storage
    • Request Queues: URL queue management
    • Client SDK: JavaScript/Python client libraries
    • Logging: Configuring and managing logs

    llms-full.md

    Extensive documentation covering:

    • Complete Apify API v2 reference
    • All API endpoints with request/response examples
    • Authentication and rate limiting
    • Error handling
    • Webhooks and integrations

    llms.md

    High-level overview and getting started guide with:

    • Platform concepts and architecture
    • Quick start examples
    • Common patterns and workflows
    • Best practices for web scraping

    Working with This Skill

    For Beginners

    Start with these concepts:

    1. Understanding Actors: Review the Actors introduction to learn about the core building block
    2. First Scraper: Use the link extraction examples to build your first web scraper
    3. Data Storage: Learn about Datasets and Key-Value Stores for storing results
    4. API Basics: Get familiar with the Apify Client for programmatic access

    Key reference: llms.md for platform overview and getting started guides

    For Intermediate Users

    Focus on these areas:

    1. Advanced Crawling: Implement request queues and link filtering
    2. Actor Tasks: Set up pre-configured runs with custom inputs
    3. Logging: Configure proper logging with ActorLogFormatter
    4. Error Handling: Implement retry logic and error recovery
    5. Webhooks: Set up notifications for Actor run events

    Key reference: llms-txt.md for detailed API methods and parameters

    For Advanced Users

    Explore these topics:

    1. Actor Builds: Manage versions and deployments
    2. Metamorph: Transform running Actors into different Actors
    3. Custom Integrations: Build complex workflows with the API
    4. Performance Optimization: Tune concurrency and resource usage
    5. Multi-Actor Orchestration: Chain multiple Actors together

    Key reference: llms-full.md for complete API endpoint reference

    Navigation Tips

    • Search by concept: Use keywords like "dataset", "actor", "build" to find relevant sections
    • Check examples: Look for code blocks in the documentation for working examples
    • API endpoints: All endpoints follow the pattern /v2/{resource}/{action}
    • Client methods: SDK methods mirror API endpoints (e.g., client.actor().run())

    Common Patterns

    Web Scraping Workflow

    1. Set up the crawler with initial URLs
    2. Extract links from pages
    3. Filter and enqueue new URLs
    4. Extract data from pages
    5. Store results in a dataset
    6. Handle errors and retries
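    The steps above can be sketched against a stubbed "site" instead of real HTTP; the URLs and link structure here are invented:

```javascript
// Stand-in for the web: maps each URL to the links found on that page.
const site = {
  'https://example.com/': ['https://example.com/p1', 'https://example.com/p2'],
  'https://example.com/p1': ['https://example.com/p2'],
  'https://example.com/p2': [],
};

function crawl(startUrl) {
  const queue = [startUrl];        // 1. seed the crawler with initial URLs
  const visited = new Set();
  const results = [];

  while (queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    const links = site[url] ?? []; // 2. stand-in for fetch + link extraction
    results.push({ url, linkCount: links.length }); // 4-5. "extract" and store

    for (const link of links) {
      if (!visited.has(link)) queue.push(link);     // 3. filter and enqueue
    }
  }
  return results;
}

const pages = crawl('https://example.com/');
```

    A production crawler replaces the `site` lookup with a real fetch and adds the error handling and retries of step 6.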

    Actor Development Workflow

    1. Create Actor locally or in Apify Console
    2. Write scraping logic with Cheerio/Puppeteer
    3. Test locally with sample data
    4. Build and deploy to Apify platform
    5. Create tasks for different configurations
    6. Monitor runs and debug issues

    Data Pipeline Pattern

    1. Run an Actor to scrape data
    2. Store results in a dataset
    3. Call another Actor to process the data
    4. Export final results to external system
    5. Use webhooks to trigger next steps
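    The pipeline shape can be sketched with stub stages; in practice each stage would be an Actor run plus a dataset read, but the functions below are invented placeholders:

```javascript
// Stage 1: scrape raw rows (stub for an Actor run + dataset read).
function scrapeStage() {
  return [{ url: 'https://example.com', price: '29.99' }];
}

// Stage 2: clean and normalize the scraped rows.
function processStage(rows) {
  return rows.map((row) => ({ ...row, price: Number(row.price) }));
}

// Stage 3: hand results off to an external system.
function exportStage(rows) {
  return { exported: rows.length };
}

const pipelineResult = exportStage(processStage(scrapeStage()));
```

    On the platform, the hand-off between stages is typically event-driven: a webhook on the first Actor's run triggers the next stage instead of a direct function call.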

    Resources

    Official Documentation

    • API Reference: Complete API v2 documentation at https://docs.apify.com/api/v2
    • SDK Docs: JavaScript and Python SDK documentation
    • Academy: Web scraping tutorials and best practices
    • Examples: Ready-to-use Actor templates

    Getting Help

    • Check the reference files for detailed API documentation
    • Review code examples for common patterns
    • Use the Apify Console for visual debugging
    • Monitor Actor runs with detailed logs

    Notes

    • This skill was automatically generated from official Apify documentation
    • Reference files preserve structure and examples from source docs
    • Code examples include proper language detection for syntax highlighting
    • Documentation covers both JavaScript and Python SDKs
    • API version 2 is the current stable version

    Best Practices

    1. Always use API tokens for authentication (never hardcode)
    2. Handle rate limits appropriately (respect platform quotas)
    3. Store credentials securely using Actor secrets
    4. Log appropriately (INFO for progress, DEBUG for details)
    5. Clean up resources (close stores/datasets when done)
    6. Use request queues for large-scale crawling
    7. Implement retries for failed requests
    8. Monitor Actor memory usage to prevent crashes
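    The retry practice (item 7) can be sketched as a generic helper with exponential backoff; this is a standalone utility, not an Apify API:

```javascript
// Retry an async operation with exponential backoff.
// Delays are in milliseconds; defaults are arbitrary examples.
async function withRetries(fn, { maxRetries = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;              // out of attempts
      const delay = baseDelayMs * 2 ** attempt;       // 100, 200, 400, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // surface the final failure to the caller
}
```

    Wrap individual page fetches or API calls in `withRetries` so transient failures do not abort the whole run.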

    Updating

    To refresh this skill with updated documentation:

    1. Re-run the documentation scraper with the same configuration
    2. The skill will be rebuilt with the latest information
    3. Review the updated reference files for new features