openrouter

rawveg/openrouter

AI & ML

2 installs

About

SKILL.md

openrouter

rawveg/openrouter

AI & ML

2 installs

About

OpenRouter API - Unified access to 400+ AI models through one API

SKILL.md

OpenRouter Skill

Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.

When to Use This Skill

This skill should be triggered when:

Making API calls to multiple AI model providers through a unified interface
Implementing model fallback strategies or auto-routing
Working with OpenAI-compatible SDKs but targeting multiple providers
Configuring advanced sampling parameters (temperature, top_p, penalties)
Setting up streaming responses or structured JSON outputs
Comparing costs across different AI models
Building applications that need automatic provider failover
Implementing function/tool calling across different models
Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)

Quick Reference

Basic Chat Completion (Python)

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
print(completion.choices[0].message.content)

Basic Chat Completion (JavaScript/TypeScript)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{"role": 'user', "content": 'What is the meaning of life?'}],
});
console.log(completion.choices[0].message);

cURL Request

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'

Model Fallback Configuration (Python)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    extra_body={
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Model Fallback Configuration (TypeScript)

const completion = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
    messages: [{ role: 'user', content: 'Your prompt here' }],
});

Auto Router (Dynamic Model Selection)

completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Advanced Parameters Example

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,           # Higher for creativity (0.0-2.0)
    max_tokens=500,            # Limit response length
    top_p=0.9,                 # Nucleus sampling (0.0-1.0)
    frequency_penalty=0.5,     # Reduce repetition (-2.0-2.0)
    presence_penalty=0.3       # Encourage topic diversity (-2.0-2.0)
)

Streaming Response

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')

JSON Mode (Structured Output)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)

Deterministic Output with Seed

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,            # Same seed = same output (when supported)
    temperature=0.0     # Deterministic sampling
)

Key Concepts

Model Routing

OpenRouter provides intelligent routing capabilities:

Auto Router (openrouter/auto): Automatically selects the best model based on your prompt using NotDiamond
Fallback Models: Specify multiple models that automatically retry if primary fails
Provider Routing: Automatically routes across providers for reliability

Authentication

Uses Bearer token authentication with API keys
API keys can be managed programmatically
Compatible with OpenAI SDK authentication patterns

Model Naming Convention

Models use the format provider/model-name:

openai/gpt-4o - OpenAI's GPT-4 Optimized
anthropic/claude-3.5-sonnet - Anthropic's Claude 3.5 Sonnet
google/gemini-2.0-flash-exp:free - Google's free Gemini model
openrouter/auto - Auto-routing system

Sampling Parameters

Temperature (0.0-2.0, default: 1.0)

Lower = more predictable, focused responses
Higher = more creative, diverse responses
Use low (0.0-0.3) for factual tasks, high (0.8-1.5) for creative work

Top P (0.0-1.0, default: 1.0)

Limits choices to percentage of likely tokens
Dynamic filtering of improbable options
Balance between consistency and variety

Frequency/Presence Penalties (-2.0-2.0, default: 0.0)

Frequency: Discourages repeating tokens proportional to use
Presence: Simpler penalty not scaled by count
Positive values reduce repetition, negative encourage reuse

Max Tokens (integer)

Sets maximum response length
Cannot exceed context length minus prompt length
Use to control costs and enforce concise replies

Response Formats

Standard JSON: Default chat completion format
Streaming: Server-Sent Events (SSE) with stream: true
JSON Mode: Guaranteed valid JSON with response_format: {"type": "json_object"}
Structured Outputs: Schema-validated JSON responses

Advanced Features

Tool/Function Calling: Connect models to external APIs
Multimodal Inputs: Support for images, PDFs, audio
Prompt Caching: Reduce costs for repeated prompts
Web Search Integration: Enhanced responses with web data
Zero Completion Insurance: Protection against failed responses
Logprobs: Access token probabilities for confidence analysis

Reference Files

This skill includes comprehensive documentation in references/:

llms-full.md - Complete list of available models with metadata
llms-small.md - Curated subset of popular models
llms.md - Standard model listings

Use view to read specific reference files when detailed model information is needed.

Working with This Skill

For Beginners

Start with basic chat completion examples (Python/JavaScript/cURL above)
Use the standard OpenAI SDK for easy integration
Try simple model names like openai/gpt-4o or anthropic/claude-3.5-sonnet
Keep parameters simple initially (just model and messages)

For Intermediate Users

Implement model fallback arrays for reliability
Experiment with sampling parameters (temperature, top_p)
Use streaming for better UX in conversational apps
Try openrouter/auto for automatic model selection
Implement JSON mode for structured data extraction

For Advanced Users

Fine-tune multiple sampling parameters together
Implement custom routing logic with fallback chains
Use logprobs for confidence scoring
Leverage tool/function calling capabilities
Optimize costs by selecting appropriate models per task
Implement prompt caching strategies
Use seed parameter for reproducible testing

Common Patterns

Error Handling with Fallbacks

try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free"
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}]
    )
except Exception as e:
    print(f"All models failed: {e}")

Cost-Optimized Routing

# Use cheaper models for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}]
)

# Use premium models for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)

Context-Aware Temperature

# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}]
)

Resources

Official Documentation

API Reference: https://openrouter.ai/docs/api-reference/overview
Quickstart Guide: https://openrouter.ai/docs/quickstart
Model List: https://openrouter.ai/docs/models
Parameters Guide: https://openrouter.ai/docs/api-reference/parameters

Key Endpoints

Chat Completions: POST https://openrouter.ai/api/v1/chat/completions
List Models: GET https://openrouter.ai/api/v1/models
Generation Info: GET https://openrouter.ai/api/v1/generation

Notes

OpenRouter normalizes API schemas across all providers
Uses OpenAI-compatible API format for easy migration
Automatic provider fallback if models are rate-limited or down
Pricing based on actual model used (important for fallbacks)
Response includes metadata about which model processed the request
All models support streaming via Server-Sent Events
Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)

Best Practices

Always implement fallbacks for production applications
Use appropriate temperature based on task type (low for factual, high for creative)
Set max_tokens to control costs and response length
Enable streaming for better user experience in chat applications
Use JSON mode when you need guaranteed structured output
Test with seed parameter for reproducible results during development
Monitor costs by selecting appropriate models per task
Use auto-routing when unsure which model performs best
Implement proper error handling for rate limits and failures
Cache prompts for repeated requests to reduce costs

About

SKILL.md

About

OpenRouter API - Unified access to 400+ AI models through one API

SKILL.md

OpenRouter Skill

When to Use This Skill

This skill should be triggered when:

Making API calls to multiple AI model providers through a unified interface
Implementing model fallback strategies or auto-routing
Working with OpenAI-compatible SDKs but targeting multiple providers
Configuring advanced sampling parameters (temperature, top_p, penalties)
Setting up streaming responses or structured JSON outputs
Comparing costs across different AI models
Building applications that need automatic provider failover
Implementing function/tool calling across different models
Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)

Quick Reference

Basic Chat Completion (Python)

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
print(completion.choices[0].message.content)

Basic Chat Completion (JavaScript/TypeScript)

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{"role": 'user', "content": 'What is the meaning of life?'}],
});
console.log(completion.choices[0].message);

cURL Request

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'

Model Fallback Configuration (Python)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    extra_body={
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Model Fallback Configuration (TypeScript)

const completion = await client.chat.completions.create({
    model: 'openai/gpt-4o',
    models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
    messages: [{ role: 'user', content: 'Your prompt here' }],
});

Auto Router (Dynamic Model Selection)

completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Advanced Parameters Example

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,           # Higher for creativity (0.0-2.0)
    max_tokens=500,            # Limit response length
    top_p=0.9,                 # Nucleus sampling (0.0-1.0)
    frequency_penalty=0.5,     # Reduce repetition (-2.0-2.0)
    presence_penalty=0.3       # Encourage topic diversity (-2.0-2.0)
)

Streaming Response

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')

JSON Mode (Structured Output)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)

Deterministic Output with Seed

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,            # Same seed = same output (when supported)
    temperature=0.0     # Deterministic sampling
)

Key Concepts

Model Routing

OpenRouter provides intelligent routing capabilities:

Auto Router (openrouter/auto): Automatically selects the best model based on your prompt using NotDiamond
Fallback Models: Specify multiple models that automatically retry if primary fails
Provider Routing: Automatically routes across providers for reliability

Authentication

Uses Bearer token authentication with API keys
API keys can be managed programmatically
Compatible with OpenAI SDK authentication patterns

Model Naming Convention

Models use the format provider/model-name:

openai/gpt-4o - OpenAI's GPT-4 Optimized
anthropic/claude-3.5-sonnet - Anthropic's Claude 3.5 Sonnet
google/gemini-2.0-flash-exp:free - Google's free Gemini model
openrouter/auto - Auto-routing system

Sampling Parameters

Temperature (0.0-2.0, default: 1.0)

Lower = more predictable, focused responses
Higher = more creative, diverse responses
Use low (0.0-0.3) for factual tasks, high (0.8-1.5) for creative work

Top P (0.0-1.0, default: 1.0)

Limits choices to percentage of likely tokens
Dynamic filtering of improbable options
Balance between consistency and variety

Frequency/Presence Penalties (-2.0-2.0, default: 0.0)

Frequency: Discourages repeating tokens proportional to use
Presence: Simpler penalty not scaled by count
Positive values reduce repetition, negative encourage reuse

Max Tokens (integer)

Sets maximum response length
Cannot exceed context length minus prompt length
Use to control costs and enforce concise replies

Response Formats

Standard JSON: Default chat completion format
Streaming: Server-Sent Events (SSE) with stream: true
JSON Mode: Guaranteed valid JSON with response_format: {"type": "json_object"}
Structured Outputs: Schema-validated JSON responses

Advanced Features

Tool/Function Calling: Connect models to external APIs
Multimodal Inputs: Support for images, PDFs, audio
Prompt Caching: Reduce costs for repeated prompts
Web Search Integration: Enhanced responses with web data
Zero Completion Insurance: Protection against failed responses
Logprobs: Access token probabilities for confidence analysis

Reference Files

This skill includes comprehensive documentation in references/:

llms-full.md - Complete list of available models with metadata
llms-small.md - Curated subset of popular models
llms.md - Standard model listings

Use view to read specific reference files when detailed model information is needed.

Working with This Skill

For Beginners

Start with basic chat completion examples (Python/JavaScript/cURL above)
Use the standard OpenAI SDK for easy integration
Try simple model names like openai/gpt-4o or anthropic/claude-3.5-sonnet
Keep parameters simple initially (just model and messages)

For Intermediate Users

Implement model fallback arrays for reliability
Experiment with sampling parameters (temperature, top_p)
Use streaming for better UX in conversational apps
Try openrouter/auto for automatic model selection
Implement JSON mode for structured data extraction

For Advanced Users

Fine-tune multiple sampling parameters together
Implement custom routing logic with fallback chains
Use logprobs for confidence scoring
Leverage tool/function calling capabilities
Optimize costs by selecting appropriate models per task
Implement prompt caching strategies
Use seed parameter for reproducible testing

Common Patterns

Error Handling with Fallbacks

try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free"
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}]
    )
except Exception as e:
    print(f"All models failed: {e}")

Cost-Optimized Routing

# Use cheaper models for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}]
)

# Use premium models for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)

Context-Aware Temperature

# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}]
)

Resources

Official Documentation

API Reference: https://openrouter.ai/docs/api-reference/overview
Quickstart Guide: https://openrouter.ai/docs/quickstart
Model List: https://openrouter.ai/docs/models
Parameters Guide: https://openrouter.ai/docs/api-reference/parameters

Key Endpoints

Chat Completions: POST https://openrouter.ai/api/v1/chat/completions
List Models: GET https://openrouter.ai/api/v1/models
Generation Info: GET https://openrouter.ai/api/v1/generation

Notes

OpenRouter normalizes API schemas across all providers
Uses OpenAI-compatible API format for easy migration
Automatic provider fallback if models are rate-limited or down
Pricing based on actual model used (important for fallbacks)
Response includes metadata about which model processed the request
All models support streaming via Server-Sent Events
Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)

Best Practices

Always implement fallbacks for production applications
Use appropriate temperature based on task type (low for factual, high for creative)
Set max_tokens to control costs and response length
Enable streaming for better user experience in chat applications
Use JSON mode when you need guaranteed structured output
Test with seed parameter for reproducible results during development
Monitor costs by selecting appropriate models per task
Use auto-routing when unsure which model performs best
Implement proper error handling for rate limits and failures
Cache prompts for repeated requests to reduce costs