Cloudflare R2: object storage with an S3-compatible API and zero egress fees.
Comprehensive assistance with Cloudflare R2 development, generated from official documentation.
This skill should be triggered when working with R2 buckets, the S3-compatible API, R2 Data Catalog (Apache Iceberg), R2 SQL, or Workers R2 bindings.
# Enable R2 Data Catalog on an existing bucket (Wrangler CLI)
npx wrangler r2 bucket catalog enable my-bucket
# Alternative: Dashboard path is R2 > Bucket > Settings > R2 Data Catalog > Enable
# After enabling, note the Warehouse and Catalog URI values — Iceberg clients need both
from pyiceberg.catalog.rest import RestCatalog

# Connect to the R2 Data Catalog's Iceberg REST endpoint.
rest_catalog = RestCatalog(
    name="my_catalog",
    warehouse="<WAREHOUSE_ID>",  # from the bucket's catalog settings
    uri="<CATALOG_URI>",         # from the bucket's catalog settings
    token="<API_TOKEN>",         # Admin Read & Write API token
)

# Make sure the target namespace exists before creating tables in it.
rest_catalog.create_namespace_if_not_exists("default")

# Create a table whose schema mirrors the in-memory dataframe `df`
# (df is assumed to be defined by the caller — e.g. a PyArrow table).
table_id = ("default", "people")
people = rest_catalog.create_table(table_id, schema=df.schema)

# Write the dataframe into the table.
people.append(df)

# Read the table contents back as a PyArrow table.
result = people.scan().to_arrow()

# Remove the table from the catalog.
rest_catalog.drop_table(table_id)

# API token setup:
# 1. Navigate to: R2 > Manage API tokens > Create API token
# 2. Select "Admin Read & Write" permission (required for catalog access)
# 3. Save the token for authentication
# The token must grant both R2 and catalog permissions for Iceberg clients.
import boto3

# R2 speaks the S3 API; point boto3 at the account-scoped R2 endpoint.
# The region for R2 is always the literal string 'auto'.
s3_client = boto3.client(
    's3',
    endpoint_url='https://<ACCOUNT_ID>.r2.cloudflarestorage.com',
    aws_access_key_id='<ACCESS_KEY_ID>',
    aws_secret_access_key='<SECRET_ACCESS_KEY>',
    region_name='auto',
)

# Upload an object.
s3_client.put_object(
    Bucket='my-bucket',
    Key='path/to/file.txt',
    Body=b'File contents',
)

# Download an object and read its full body.
obj = s3_client.get_object(Bucket='my-bucket', Key='path/to/file.txt')
data = obj['Body'].read()

# List objects under a prefix ('Contents' is absent when nothing matches).
listing = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='path/')
for entry in listing.get('Contents', []):
    print(entry['Key'])

# Delete an object.
s3_client.delete_object(Bucket='my-bucket', Key='path/to/file.txt')

# Presigned URL allowing an upload, valid for one hour.
presigned_url = s3_client.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-bucket', 'Key': 'uploads/file.txt'},
    ExpiresIn=3600,
)

# Presigned URL allowing a download, valid for one hour.
download_url = s3_client.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'path/to/file.txt'},
    ExpiresIn=3600,
)

# Multipart upload: initiate, upload numbered parts, then complete.
# `file_chunks` is assumed to be an iterable of byte chunks supplied by the caller.
upload_id = s3_client.create_multipart_upload(
    Bucket='my-bucket',
    Key='large-file.bin',
)['UploadId']

# Part numbers start at 1; each part's ETag is needed to complete the upload.
completed_parts = [
    {
        'PartNumber': part_number,
        'ETag': s3_client.upload_part(
            Bucket='my-bucket',
            Key='large-file.bin',
            PartNumber=part_number,
            UploadId=upload_id,
            Body=chunk,
        )['ETag'],
    }
    for part_number, chunk in enumerate(file_chunks, start=1)
]

# Finalize: R2 assembles the parts into the final object.
s3_client.complete_multipart_upload(
    Bucket='my-bucket',
    Key='large-file.bin',
    UploadId=upload_id,
    MultipartUpload={'Parts': completed_parts},
)
export default {
  async fetch(request, env) {
    // env.MY_BUCKET is the R2 bucket binding configured in wrangler.toml.
    // Write an object with both HTTP metadata and custom metadata.
    await env.MY_BUCKET.put('key', 'value', {
      httpMetadata: { contentType: 'text/plain' },
      customMetadata: { user: 'example' },
    });

    // Read it back; get() resolves to null when the key does not exist.
    const stored = await env.MY_BUCKET.get('key');
    if (stored === null) {
      return new Response('Object Not Found', { status: 404 });
    }

    // Stream the body back with its stored content type and ETag.
    const headers = {
      'Content-Type': stored.httpMetadata.contentType,
      'ETag': stored.httpEtag,
    };
    return new Response(stored.body, { headers });
  },
};
# Set a CORS policy on the bucket via the S3 API (boto3).
# NOTE(fix): the original snippet mixed JavaScript syntax (`const`, `await`,
# `//` comments) with Python keyword-argument call syntax, which is valid in
# neither language; this is the working Python/boto3 form.
cors_config = {
    'CORSRules': [
        {
            'AllowedOrigins': ['https://example.com'],
            'AllowedMethods': ['GET', 'PUT', 'POST', 'DELETE'],
            'AllowedHeaders': ['*'],
            'MaxAgeSeconds': 3000,
        },
    ],
}
s3_client.put_bucket_cors(
    Bucket='my-bucket',
    CORSConfiguration=cors_config,
)
# Upload an object together with custom metadata and an explicit content type.
# `file_data` is assumed to be bytes supplied by the caller.
s3_client.put_object(
    Bucket='my-bucket',
    Key='document.pdf',
    Body=file_data,
    Metadata={
        'author': 'John Doe',
        'department': 'Engineering',
        'classification': 'internal',
    },
    ContentType='application/pdf',
)

# HEAD returns metadata and headers without transferring the object body.
head = s3_client.head_object(Bucket='my-bucket', Key='document.pdf')
metadata = head['Metadata']
content_type = head['ContentType']
from pyspark.sql import SparkSession

# Register the R2 Data Catalog as an Iceberg REST catalog named "r2".
r2_catalog_conf = {
    "spark.sql.catalog.r2": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.r2.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
    "spark.sql.catalog.r2.uri": "<CATALOG_URI>",
    "spark.sql.catalog.r2.warehouse": "<WAREHOUSE_ID>",
    "spark.sql.catalog.r2.token": "<API_TOKEN>",
}
builder = SparkSession.builder
for conf_key, conf_value in r2_catalog_conf.items():
    builder = builder.config(conf_key, conf_value)
spark = builder.getOrCreate()

# Create an Iceberg table in the catalog.
spark.sql("""
CREATE TABLE r2.default.events (
event_id STRING,
timestamp TIMESTAMP,
user_id STRING,
action STRING
) USING iceberg
""")

# Insert a sample row.
spark.sql("""
INSERT INTO r2.default.events
VALUES ('evt123', current_timestamp(), 'user456', 'login')
""")

# Query the table and print the matching rows.
login_events = spark.sql("SELECT * FROM r2.default.events WHERE action = 'login'")
login_events.show()
# Query R2 Data Catalog tables with R2 SQL (Wrangler CLI)
# Usage: npx wrangler r2 sql query <warehouse> "<SQL>"
npx wrangler r2 sql query "YOUR_WAREHOUSE_NAME" "SELECT * FROM default.table LIMIT 10"
# Authentication setup (required before querying)
export WRANGLER_R2_SQL_AUTH_TOKEN="YOUR_API_TOKEN"
# The API token needs both: Admin Read & Write + R2 SQL Read permissions
# Create it at: R2 > Manage API tokens > Create API token
# Projection with a WHERE filter
npx wrangler r2 sql query "warehouse-123" \
"SELECT user_id, event_type, product_id, amount
FROM default.ecommerce
WHERE event_type = 'purchase'
LIMIT 10"
# Aggregation with GROUP BY / HAVING
npx wrangler r2 sql query "warehouse-123" \
"SELECT
user_id,
COUNT(*) as transaction_count,
SUM(amount) as total_spent
FROM default.transactions
GROUP BY user_id
HAVING total_spent > 1000
ORDER BY total_spent DESC"
# Time-based filtering (half-open range keeps the month boundary unambiguous)
npx wrangler r2 sql query "warehouse-123" \
"SELECT * FROM default.events
WHERE timestamp >= '2024-01-01'
AND timestamp < '2024-02-01'
ORDER BY timestamp DESC"
# Join across two catalog tables
npx wrangler r2 sql query "warehouse-123" \
"SELECT
u.user_id,
u.name,
t.transaction_id,
t.amount
FROM default.users u
JOIN default.transactions t ON u.user_id = t.user_id
WHERE t.fraud_flag = false"
# Insert from a stream into a sink table (continuous processing)
npx wrangler r2 sql query "warehouse-123" \
"INSERT INTO ecommerce_sink
SELECT * FROM ecommerce_stream"
# Filtered stream transformation (only high-value rows reach the sink)
npx wrangler r2 sql query "warehouse-123" \
"INSERT INTO high_value_transactions
SELECT
transaction_id,
user_id,
amount,
timestamp
FROM transaction_stream
WHERE amount > 10000"
# Window function + QUALIFY: top 5 purchases per user
npx wrangler r2 sql query "warehouse-123" \
"SELECT
user_id,
product_id,
amount,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) as purchase_rank
FROM default.purchases
QUALIFY purchase_rank <= 5"
# Time-series aggregation: hourly event counts and latency over the last 24 buckets
npx wrangler r2 sql query "warehouse-123" \
"SELECT
DATE_TRUNC('hour', timestamp) as hour,
COUNT(*) as event_count,
AVG(response_time_ms) as avg_response_time
FROM default.api_logs
GROUP BY hour
ORDER BY hour DESC
LIMIT 24"
# Fraud-detection pattern: per-user fraud counts over the trailing 7 days
npx wrangler r2 sql query "warehouse-123" \
"SELECT
user_id,
COUNT(*) as transaction_count,
SUM(CASE WHEN fraud_flag THEN 1 ELSE 0 END) as fraud_count,
AVG(amount) as avg_amount
FROM default.transactions
WHERE timestamp >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY user_id
HAVING fraud_count > 0
ORDER BY fraud_count DESC"
This skill includes comprehensive documentation in references/:
Use view to read specific reference files when detailed information is needed.
Start with the getting_started or tutorials reference files for foundational concepts.
Use the appropriate category reference file (api, guides, etc.) for detailed information.
The quick reference section above contains common patterns extracted from the official docs.
The references/ directory contains organized documentation extracted from official sources.
Helper scripts for common automation tasks can be added alongside these references.
Templates, boilerplate, or example projects can also be added here.
To refresh this skill, regenerate it from the current official Cloudflare R2 documentation.