    PufferLib - High-Performance Reinforcement Learning

    Overview

    PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.

    When to Use This Skill

    Use this skill when:

    • Training RL agents with PPO on any environment (single or multi-agent)
    • Creating custom environments using the PufferEnv API
    • Optimizing performance for parallel environment simulation (vectorization)
    • Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
    • Developing policies with CNN, LSTM, or custom architectures
    • Scaling RL to millions of steps per second for faster experimentation
    • Multi-agent RL with native multi-agent environment support

    Core Capabilities

    1. High-Performance Training (PuffeRL)

    PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

    Quick start training:

    # CLI training
    puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4
    
    # Distributed training
    torchrun --nproc_per_node=4 train.py
    

    Python training loop:

    import pufferlib
    from pufferlib import PuffeRL

    # Create vectorized environment
    env = pufferlib.make('procgen-coinrun', num_envs=256)

    # Create trainer (my_policy is a standard PyTorch module;
    # see Policy Development below)
    trainer = PuffeRL(
        env=env,
        policy=my_policy,
        device='cuda',
        learning_rate=3e-4,
        batch_size=32768
    )

    # Training loop
    num_iterations = 1000  # set to your compute budget
    for iteration in range(num_iterations):
        trainer.evaluate()      # Collect rollouts
        trainer.train()         # Train on batch
        trainer.mean_and_log()  # Log results
    

    For comprehensive training guidance, read references/training.md for:

    • Complete training workflow and CLI options
    • Hyperparameter tuning with Protein
    • Distributed multi-GPU/multi-node training
    • Logger integration (Weights & Biases, Neptune)
    • Checkpointing and resume training (a plain-PyTorch sketch follows this list)
    • Performance optimization tips
    • Curriculum learning patterns
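
    Of the topics above, checkpointing is the easiest to sketch with plain PyTorch (the actual PuffeRL checkpoint API may differ; checkpoint_dir, policy, and num_iterations are placeholders):

    import os
    import torch

    checkpoint_dir = 'checkpoints'  # hypothetical output directory
    os.makedirs(checkpoint_dir, exist_ok=True)

    for iteration in range(num_iterations):
        trainer.evaluate()
        trainer.train()
        trainer.mean_and_log()

        # Save weights periodically so a crashed run can be resumed
        if iteration % 100 == 0:
            path = os.path.join(checkpoint_dir, f'policy_{iteration:06d}.pt')
            torch.save(policy.state_dict(), path)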

    2. Environment Development (PufferEnv)

    Create custom high-performance environments with the PufferEnv API.

    Basic environment structure:

    import numpy as np
    from pufferlib import PufferEnv
    
    class MyEnvironment(PufferEnv):
        def __init__(self, buf=None):
            super().__init__(buf)
    
            # Define spaces
            self.observation_space = self.make_space((4,))
            self.action_space = self.make_discrete(4)
    
            self.reset()
    
        def reset(self):
            # Reset state and return initial observation
            return np.zeros(4, dtype=np.float32)
    
        def step(self, action):
            # Execute action, compute reward, check done
            obs = self._get_observation()
            reward = self._compute_reward()
            done = self._is_done()
            info = {}
    
            return obs, reward, done, info
    

    Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:

    • Different observation space types (vector, image, dict)
    • Action space variations (discrete, continuous, multi-discrete)
    • Multi-agent environment structure
    • Testing utilities (a minimal smoke test is sketched below)
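
    A rough smoke test for an environment like the one above (assuming the space objects expose a Gymnasium-style sample(), which may differ in PufferLib):

    env = MyEnvironment()
    obs = env.reset()

    # Step with random actions and sanity-check the returned shapes
    for _ in range(100):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        assert obs.shape == (4,), obs.shape
        if done:
            obs = env.reset()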

    For complete environment development, read references/environments.md for:

    • PufferEnv API details and in-place operation patterns
    • Observation and action space definitions
    • Multi-agent environment creation
    • Ocean suite (20+ pre-built environments)
    • Performance optimization (Python to C workflow)
    • Environment wrappers and best practices
    • Debugging and validation techniques

    3. Vectorization and Performance

    Achieve maximum throughput with optimized parallel simulation.

    Vectorization setup:

    import pufferlib
    
    # Automatic vectorization
    env = pufferlib.make('environment_name', num_envs=256, num_workers=8)
    
    # Performance benchmarks:
    # - Pure Python envs: 100k-500k SPS
    # - C-based envs: 100M+ SPS
    # - With training: 400k-4M total SPS
    

    Key optimizations:

    • Shared memory buffers for zero-copy observation passing (pattern illustrated below)
    • Busy-wait flags instead of pipes/queues
    • Surplus environments for async returns
    • Multiple environments per worker
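
    The shared-memory pattern can be illustrated with the standard library alone (a simplified sketch of the idea, not PufferLib's actual implementation):

    import numpy as np
    from multiprocessing import Process, shared_memory

    def worker(shm_name, shape, dtype):
        # Attach to the parent's buffer and write observations in place
        shm = shared_memory.SharedMemory(name=shm_name)
        obs = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        obs[:] = 1.0  # a real worker would fill this from env.step()
        shm.close()

    # (wrap the following in an `if __name__ == '__main__':` guard on
    # spawn-based platforms such as Windows and macOS)
    shape, dtype = (256, 4), np.float32
    shm = shared_memory.SharedMemory(create=True, size=int(np.prod(shape)) * 4)
    obs = np.ndarray(shape, dtype=dtype, buffer=shm.buf)

    p = Process(target=worker, args=(shm.name, shape, dtype))
    p.start(); p.join()

    # The parent reads the same memory: no pickling, no copies
    assert obs.sum() == 256 * 4
    shm.close(); shm.unlink()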

    For vectorization optimization, read references/vectorization.md for:

    • Architecture and performance characteristics
    • Worker and batch size configuration
    • Serial vs multiprocessing vs async modes
    • Shared memory and zero-copy patterns
    • Hierarchical vectorization for large scale
    • Multi-agent vectorization strategies
    • Performance profiling and troubleshooting

    4. Policy Development

    Build policies as standard PyTorch modules with optional utilities.

    Basic policy structure:

    import torch.nn as nn
    from pufferlib.pytorch import layer_init
    
    class Policy(nn.Module):
        def __init__(self, observation_space, action_space):
            super().__init__()

            # Derive sizes from the spaces (flat vector obs, discrete actions)
            obs_dim = observation_space.shape[0]
            num_actions = action_space.n

            # Encoder
            self.encoder = nn.Sequential(
                layer_init(nn.Linear(obs_dim, 256)),
                nn.ReLU(),
                layer_init(nn.Linear(256, 256)),
                nn.ReLU()
            )

            # Actor and critic heads
            self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
            self.critic = layer_init(nn.Linear(256, 1), std=1.0)

        def forward(self, observations):
            features = self.encoder(observations)
            return self.actor(features), self.critic(features)
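
    During rollouts, actions can be sampled from the actor logits with standard PyTorch (PuffeRL's trainer typically handles this internally; this only shows what the two heads return, using a dummy batch):

    import torch
    from torch.distributions import Categorical

    policy = Policy(env.observation_space, env.action_space)
    obs = torch.randn(8, env.observation_space.shape[0])  # dummy batch

    logits, value = policy(obs)
    dist = Categorical(logits=logits)
    action = dist.sample()            # one action per environment
    log_prob = dist.log_prob(action)  # needed for the PPO ratio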
    

    For complete policy development, read references/policies.md for:

    • CNN policies for image observations
    • Recurrent policies with optimized LSTM (3x faster inference; a generic sketch follows this list)
    • Multi-input policies for complex observations
    • Continuous action policies
    • Multi-agent policies (shared vs independent parameters)
    • Advanced architectures (attention, residual)
    • Observation normalization and gradient clipping
    • Policy debugging and testing
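
    For orientation, a recurrent policy has roughly this shape in plain PyTorch (a generic sketch; PufferLib's optimized LSTM wrapper differs, see references/policies.md):

    import torch.nn as nn

    class LSTMPolicy(nn.Module):
        def __init__(self, obs_dim, num_actions, hidden=256):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.actor = nn.Linear(hidden, num_actions)
            self.critic = nn.Linear(hidden, 1)

        def forward(self, obs, state=None):
            # obs: (batch, time, obs_dim); state carries (h, c) between calls
            features = self.encoder(obs)
            features, state = self.lstm(features, state)
            return self.actor(features), self.critic(features), state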

    5. Environment Integration

    Seamlessly integrate environments from popular RL frameworks.

    Gymnasium integration:

    import gymnasium as gym
    import pufferlib
    
    # Wrap Gymnasium environment
    gym_env = gym.make('CartPole-v1')
    env = pufferlib.emulate(gym_env, num_envs=256)
    
    # Or use make directly
    env = pufferlib.make('gym-CartPole-v1', num_envs=256)
    

    PettingZoo multi-agent:

    # Multi-agent environment
    env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
    

    Supported frameworks:

    • Gymnasium / OpenAI Gym
    • PettingZoo (parallel and AEC)
    • Atari (ALE)
    • Procgen
    • NetHack / MiniHack
    • Minigrid
    • Neural MMO
    • Crafter
    • GPUDrive
    • MicroRTS
    • Griddly
    • And more...

    For integration details, read references/integration.md for:

    • Complete integration examples for each framework
    • Custom wrappers (observation, reward, frame stacking, action repeat), as sketched after this list
    • Space flattening and unflattening
    • Environment registration
    • Compatibility patterns
    • Performance considerations
    • Integration debugging
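
    Custom wrappers are ordinary Gymnasium wrappers applied before the environment is handed to PufferLib (a minimal reward-scaling sketch; the scale value is arbitrary):

    import gymnasium as gym

    class ScaleReward(gym.RewardWrapper):
        """Multiply every reward by a constant before training sees it."""
        def __init__(self, env, scale=0.1):
            super().__init__(env)
            self.scale = scale

        def reward(self, reward):
            return reward * self.scale

    wrapped = ScaleReward(gym.make('CartPole-v1'), scale=0.1)
    # Then vectorize as usual, e.g. with pufferlib.emulate(wrapped, ...)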

    Quick Start Workflow

    For Training Existing Environments

    1. Choose environment from Ocean suite or compatible framework
    2. Use scripts/train_template.py as starting point
    3. Configure hyperparameters for your task
    4. Run training with CLI or Python script
    5. Monitor with Weights & Biases or Neptune
    6. Refer to references/training.md for optimization

    For Creating Custom Environments

    1. Start with scripts/env_template.py
    2. Define observation and action spaces
    3. Implement reset() and step() methods
    4. Test environment locally
    5. Vectorize with pufferlib.emulate() or make()
    6. Refer to references/environments.md for advanced patterns
    7. Optimize with references/vectorization.md if needed

    For Policy Development

    1. Choose architecture based on observations:
      • Vector observations → MLP policy
      • Image observations → CNN policy
      • Sequential tasks → LSTM policy
      • Complex observations → Multi-input policy
    2. Use layer_init for proper weight initialization
    3. Follow patterns in references/policies.md
    4. Test with environment before full training

    For Performance Optimization

    1. Profile current throughput in steps per second (see the sketch after this list)
    2. Check vectorization configuration (num_envs, num_workers)
    3. Optimize environment code (in-place ops, numpy vectorization)
    4. Consider C implementation for critical paths
    5. Use references/vectorization.md for systematic optimization
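
    A throughput check needs nothing framework-specific (assuming a vectorized env with a batched step(); PufferLib's exact signatures may differ):

    import time
    import numpy as np

    num_envs = 256
    obs = env.reset()
    actions = np.zeros(num_envs, dtype=np.int64)  # fixed actions suffice for timing

    n_steps = 1_000
    start = time.perf_counter()
    for _ in range(n_steps):
        env.step(actions)
    elapsed = time.perf_counter() - start

    print(f'{n_steps * num_envs / elapsed:,.0f} steps/second')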

    Resources

    scripts/

    train_template.py - Complete training script template with:

    • Environment creation and configuration
    • Policy initialization
    • Logger integration (WandB, Neptune)
    • Training loop with checkpointing
    • Command-line argument parsing
    • Multi-GPU distributed training setup

    env_template.py - Environment implementation templates:

    • Single-agent PufferEnv example (grid world)
    • Multi-agent PufferEnv example (cooperative navigation)
    • Multiple observation/action space patterns
    • Testing utilities

    references/

    training.md - Comprehensive training guide:

    • Training workflow and CLI options
    • Hyperparameter configuration
    • Distributed training (multi-GPU, multi-node)
    • Monitoring and logging
    • Checkpointing
    • Protein hyperparameter tuning
    • Performance optimization
    • Common training patterns
    • Troubleshooting

    environments.md - Environment development guide:

    • PufferEnv API and characteristics
    • Observation and action spaces
    • Multi-agent environments
    • Ocean suite environments
    • Custom environment development workflow
    • Python to C optimization path
    • Third-party environment integration
    • Wrappers and best practices
    • Debugging

    vectorization.md - Vectorization optimization:

    • Architecture and key optimizations
    • Vectorization modes (serial, multiprocessing, async)
    • Worker and batch configuration
    • Shared memory and zero-copy patterns
    • Advanced vectorization (hierarchical, custom)
    • Multi-agent vectorization
    • Performance monitoring and profiling
    • Troubleshooting and best practices

    policies.md - Policy architecture guide:

    • Basic policy structure
    • CNN policies for images
    • LSTM policies with optimization
    • Multi-input policies
    • Continuous action policies
    • Multi-agent policies
    • Advanced architectures (attention, residual)
    • Observation processing and unflattening
    • Initialization and normalization
    • Debugging and testing

    integration.md - Framework integration guide:

    • Gymnasium integration
    • PettingZoo integration (parallel and AEC)
    • Third-party environments (Procgen, NetHack, Minigrid, etc.)
    • Custom wrappers (observation, reward, frame stacking, etc.)
    • Space conversion and unflattening
    • Environment registration
    • Compatibility patterns
    • Performance considerations
    • Debugging integration

    Tips for Success

    1. Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments

    2. Profile early: Measure steps per second from the start to identify bottlenecks

    3. Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points

    4. Read references as needed: Each reference file is self-contained and focused on a specific capability

    5. Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed

    6. Leverage vectorization: PufferLib's vectorization is key to achieving high throughput

    7. Monitor training: Use WandB or Neptune to track experiments and identify issues early

    8. Test environments: Validate environment logic before scaling up training

    9. Check existing environments: Ocean suite provides 20+ pre-built environments

    10. Use proper initialization: Always use layer_init from pufferlib.pytorch for policies

    Common Use Cases

    Training on Standard Benchmarks

    # Atari
    env = pufferlib.make('atari-pong', num_envs=256)
    
    # Procgen
    env = pufferlib.make('procgen-coinrun', num_envs=256)
    
    # Minigrid
    env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
    

    Multi-Agent Learning

    # PettingZoo
    env = pufferlib.make('pettingzoo-pistonball', num_envs=128)
    
    # Shared policy for all agents
    policy = create_policy(env.observation_space, env.action_space)
    trainer = PuffeRL(env=env, policy=policy)
    

    Custom Task Development

    # Create custom environment
    class MyTask(PufferEnv):
        ...  # implement __init__, reset(), and step() as shown earlier
    
    # Vectorize and train
    env = pufferlib.emulate(MyTask, num_envs=256)
    trainer = PuffeRL(env=env, policy=my_policy)
    

    High-Performance Optimization

    # Maximize throughput
    env = pufferlib.make(
        'my-env',
        num_envs=1024,      # Large batch
        num_workers=16,     # Many workers
        envs_per_worker=64  # Optimize per worker
    )
    

    Installation

    uv pip install pufferlib
    

    Documentation

    • Official docs: https://puffer.ai/docs.html
    • GitHub: https://github.com/PufferAI/PufferLib
    • Discord: Community support available