
    ianbarber/strix-halo-setup
    AI & ML · 6 installs


    About

    Complete setup for AMD Strix Halo (Ryzen AI MAX+ 395) PyTorch environments.

    SKILL.md

    Strix Halo Setup

    Set up a new PyTorch project optimized for AMD Strix Halo (Ryzen AI MAX+ 395, gfx1151).

    When Claude Should Use This Skill

    This skill should be invoked when:

    • Setting up PyTorch on AMD Strix Halo (Ryzen AI MAX+ 395, gfx1151) hardware
    • User reports "HIP error: invalid device function" with PyTorch on AMD APU
    • Configuring environments for running LLMs on AMD integrated graphics
    • User mentions needing GTT memory configuration for ML workloads
    • Creating a new ML project specifically for Strix Halo hardware
    • User asks about running 30B parameter models on AMD Ryzen AI MAX+

    What This Skill Does

    1. Verifies system configuration (ROCm, GTT, user groups)
    2. Creates a conda environment with working PyTorch for gfx1151
    3. Sets up proper environment variables
    4. Creates test scripts to verify GPU functionality
    5. Provides a complete project template with best practices

    Critical Information

    PyTorch Installation: Official PyTorch wheels from pytorch.org DO NOT WORK with gfx1151. They detect the GPU but fail on compute with "HIP error: invalid device function". This skill installs community builds that actually work.

    ROCm Installation Note: For Strix Halo APUs, ROCm should be installed with --no-dkms flag to use the inbox kernel driver. If you have amdgpu-dkms installed, it may cause issues when upgrading kernels.

    Prerequisites Check

    Before running setup, verify the system with:

    ./scripts/verify_system.sh
    

    This checks:

    • ROCm installation (6.4.4+ or 7.x required)
    • User in render and video groups
    • GTT memory configuration
    • Python/Conda availability
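If the helper script is unavailable, the group-membership part can be approximated by hand. This is a minimal sketch of just that one check; verify_system.sh also inspects the ROCm install and GTT configuration, which is not reproduced here:

```shell
# Check membership in the groups that grant GPU device access on ROCm systems
for g in render video; do
  if id -nG | tr ' ' '\n' | grep -qx "$g"; then
    echo "OK: member of $g"
  else
    echo "MISSING: $g (fix: sudo usermod -aG $g \$USER, then log out and back in)"
  fi
done
```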

    ROCm 7.x Note: ROCm 7.2+ offers significant performance improvements (~2.5x in BF16 compute). However, hipBLASLt is not fully optimized for gfx1151 yet and falls back to hipBLAS.

    If any checks fail, see .claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md for detailed fix instructions.

    Setup Process

    Step 1: System Verification

    Run the verification script:

    cd .claude/skills/strix-halo-setup
    ./scripts/verify_system.sh
    

    Expected output:

    • ✓ AMD GPU detected
    • ✓ ROCm installed
    • ✓ User in render/video groups
    • ✓ GTT configured (or warning if not)

    If any issues are found, follow the script's instructions to fix them.

    Step 2: Determine Project Name and Backend

    Ask the user for:

    1. Project name: If not specified, use strix-ml-project
    2. Backend choice: PyTorch (training/custom code) or Vulkan (inference only)

    Use the AskUserQuestion tool:

    Question 1: "What would you like to name your project?"
    Question 2: "Which backend do you want to set up?"

    • PyTorch with ROCm: For training, custom code, full ML framework (supports transformers, etc.)
    • Vulkan: For inference only (llama.cpp, Ollama) - simpler setup, often faster

    If PyTorch is chosen, continue with steps below. If Vulkan, skip to Vulkan setup section at the end.

    Step 3: Create Environment

    Using Conda (Recommended):

    # Create new environment with Python 3.14 (or 3.13)
    conda create -n {project_name} python=3.14 -y
    conda activate {project_name}
    

    Using uv (Alternative):

    # Create new environment with Python 3.14 (or 3.13)
    uv venv {project_name} --python 3.14
    source {project_name}/bin/activate
    

    Step 4: Install PyTorch (Community Build)

    CRITICAL: Must use community builds, not official wheels.

    Option 1: AMD Nightlies (Recommended)

    pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
    

    As of January 2026, this provides PyTorch 2.11.0a0+ with ROCm 7.11.0 support.

    Option 2: TheRock Builds (Alternative)

    Pre-built wheels from the official ROCm TheRock project: https://github.com/ROCm/TheRock/releases

    Look for gfx1151 releases and install with pip install <wheel_file>.

    Verify Installation:

    python -c "import torch; print('PyTorch:', torch.__version__); print('HIP:', torch.version.hip)"
    

    Should show PyTorch 2.11+ and HIP 7.2+ with ROCm 7.x.
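A stronger check than printing versions is to look at which GPU architectures the wheel was actually compiled for. `torch.cuda.get_arch_list()` is a standard PyTorch API that also works on ROCm builds, where it lists gfx targets:

```shell
# List the compiled-in GPU targets; a working community build should include
# gfx1151, while official wheels omit it (which is why they fail at compute time)
python -c "import torch; print(torch.cuda.get_arch_list())"
```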

    Step 5: Configure Environment Variables

    Create activation script in the conda environment:

    mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    
    cat > $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh << 'EOF'
    #!/bin/bash
    
    # Core ROCm settings for Strix Halo (gfx1151)
    export HSA_OVERRIDE_GFX_VERSION=11.5.1
    export PYTORCH_ROCM_ARCH=gfx1151
    
    # Unified Memory Configuration - CRITICAL for accessing full memory
    export HSA_XNACK=1
    export HSA_FORCE_FINE_GRAIN_PCIE=1
    
    # Memory allocation settings
    export GPU_MAX_HEAP_SIZE=100
    export GPU_MAX_ALLOC_PERCENT=100
    
    # Device visibility
    export ROCR_VISIBLE_DEVICES=0
    export HIP_VISIBLE_DEVICES=0
    
    # Performance optimizations
    export ROCBLAS_USE_HIPBLASLT=1
    export AMD_LOG_LEVEL=0
    export HSA_CU_MASK=0xffffffffffffffff
    
    # ROCm 7.x stability fixes for APUs
    export HSA_ENABLE_SDMA=0  # Prevents checkerboard artifacts in VAE decodes
    
    # PyTorch memory management (recommended for 32GB+ workloads)
    export PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
    
    echo "✓ Strix Halo environment variables set"
    EOF
    
    chmod +x $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh
    

    Create deactivation script:

    mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
    
    cat > $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh << 'EOF'
    #!/bin/bash
    
    unset HSA_OVERRIDE_GFX_VERSION PYTORCH_ROCM_ARCH HSA_XNACK HSA_FORCE_FINE_GRAIN_PCIE
    unset GPU_MAX_HEAP_SIZE GPU_MAX_ALLOC_PERCENT ROCR_VISIBLE_DEVICES HIP_VISIBLE_DEVICES
    unset ROCBLAS_USE_HIPBLASLT AMD_LOG_LEVEL HSA_CU_MASK HSA_ENABLE_SDMA PYTORCH_HIP_ALLOC_CONF
    EOF
    
    chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh
    

    Step 6: Create Project Structure

    mkdir -p {project_name}/{scripts,notebooks,data,models,tests}
    cd {project_name}
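The brace expansion creates all five subdirectories in one call (it requires bash or zsh, not plain sh). Running the same pattern against a scratch path shows the layout it produces:

```shell
# Same mkdir pattern, pointed at a throwaway directory for illustration
proj=/tmp/strix-demo
mkdir -p "$proj"/{scripts,notebooks,data,models,tests}
ls "$proj"
# → data  models  notebooks  scripts  tests
```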
    

    Step 7: Copy Test Scripts

    Copy the test scripts from the skill directory:

    cp .claude/skills/strix-halo-setup/scripts/*.py scripts/
    chmod +x scripts/*.py
    

    Step 8: Create Project README

    Create a README with project-specific information:

    cat > README.md << 'EOF'
    # {Project Name}

    PyTorch project optimized for AMD Strix Halo (gfx1151).

    ## Environment

    - **Hardware**: AMD Strix Halo (gfx1151)
    - **ROCm**: 6.4.4+ or 7.x
    - **PyTorch**: Community build for gfx1151
    - **Python**: 3.14 (or 3.13)

    ## Setup

    ```bash
    # Activate environment
    conda activate {project_name}

    # Verify GPU
    python scripts/test_gpu_simple.py

    # Test memory capacity
    python scripts/test_memory.py
    ```

    ## Hardware Capabilities

    - Compute: ~7 TFLOPS FP32, ~31 TFLOPS BF16 (with ROCm 7.x)
    - Memory: Up to 113 GB GPU-accessible (with GTT configuration)
    - Model capacity: 30B parameter models in FP16

    ## Best Practices

    1. Use BF16 for a ~1.6x speedup over FP32
    2. Keep batch size small (1-4) for inference
    3. Data in VRAM is faster than GTT memory
    4. Monitor memory: `rocm-smi --showmeminfo gtt`

    ## Troubleshooting

    If compute fails with "HIP error: invalid device function":

    - You are using official PyTorch wheels (they do not work with gfx1151)
    - Reinstall: `pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch`

    For more help, see .claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md

    Created: {date}
    EOF
    

    Step 9: Verify Installation

    Reactivate the environment so the variables load:

    conda deactivate
    conda activate {project_name}
    # Should see: "✓ Strix Halo environment variables set"
    

    Run verification:

    python scripts/test_gpu_simple.py
    

    Expected output:

    ============================================================
    STRIX HALO GPU TEST
    ============================================================
    ✓ GPU detected: AMD Radeon Graphics
      Memory: 113.2 GB
      Compute test successful
    ✓ ALL TESTS PASSED
    ============================================================
    

    Step 10: Final Summary

    Tell the user:

    ✓ Setup complete! Your Strix Halo environment is ready.
    
    Project: {project_name}
    Location: {full_path}
    
    Next steps:
      1. Test GPU: python scripts/test_gpu_simple.py
      2. Test memory: python scripts/test_memory.py
    
    Hardware capabilities:
      - 7 TFLOPS FP32 / 31 TFLOPS BF16 (ROCm 7.x)
      - 113 GB GPU-accessible memory
      - Can run 30B parameter models in FP16
    
    Activate anytime with: conda activate {project_name}
    

    Success Criteria

    All of these should pass:

    • ✓ PyTorch detects GPU
    • ✓ Compute operations succeed (no HIP errors)
    • ✓ Can allocate 30GB+ memory
    • ✓ BF16 operations work
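The 30GB+ allocation and 30B-model criteria follow from simple arithmetic: FP16 stores 2 bytes per parameter, so a 30B-parameter model needs about 60 GB for the weights alone, which fits under the ~113 GB GTT ceiling with room left for KV cache and activations:

```shell
# Back-of-envelope FP16 weight footprint for a 30B-parameter model
python3 -c 'print(f"{30e9 * 2 / 1e9:.0f} GB")'
# → 60 GB
```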

    Common Issues

    Issue: "HIP error: invalid device function"

    Cause: Using official PyTorch wheels (they do not work with gfx1151)

    Solution:

    pip uninstall torch torchvision torchaudio
    pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
    

    Verify it worked:

    python -c "import torch; a=torch.tensor([1.0]).cuda(); print('✓ Works:', (a+1).item())"
    

    Issue: Out of memory below 30GB

    Cause: GTT not configured (limited to ~33GB)

    Solution 1: Upgrade to kernel 6.16.9+ (no configuration needed)

    Solution 2: For older kernels, configure GTT:

    .claude/skills/strix-halo-setup/scripts/configure_gtt.sh
    

    This adds kernel parameters to GRUB for GPU to access more system RAM.
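For orientation only: the knobs involved are the amdgpu and ttm kernel module parameters shown below, but the values here are illustrative (sized for roughly 108 GiB on a 128 GB machine) and may differ from what configure_gtt.sh actually writes — prefer the script to hand-editing GRUB:

```shell
# Illustrative /etc/default/grub fragment -- example values, not prescriptions
# amdgpu.gttsize is in MiB; ttm.pages_limit counts 4 KiB pages (both ~108 GiB here)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.gttsize=110592 ttm.pages_limit=28311552"
# Apply with: sudo update-grub && sudo reboot
```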

    Issue: GPU not detected

    Cause: User not in render/video groups

    Solution:

    sudo usermod -aG render,video $USER
    # Log out and back in (or reboot)
    groups | grep -E "render|video"  # Verify
    

    Vulkan Setup (Alternative to PyTorch)

    If the user chose Vulkan for inference-only workloads:

    Step V1: Install Vulkan Drivers

    sudo apt install mesa-vulkan-drivers vulkan-tools
    

    Step V2: Verify Vulkan

    vulkaninfo | grep "deviceName"
    # Should show: AMD Radeon Graphics or similar
    

    Step V3: Install Inference Tools

    For llama.cpp:

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release
    

    For Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    

    Step V4: Test Vulkan

    # With llama.cpp (a Vulkan build offloads to the GPU automatically via -ngl)
    ./build/bin/llama-cli -m /path/to/model.gguf -ngl 99
    
    # With Ollama
    ollama run llama2
    

    Vulkan Summary

    Tell the user:

    ✓ Vulkan setup complete!
    
    Backend: Vulkan (inference only)
    Use with: llama.cpp, Ollama, other Vulkan-enabled tools
    
    Note: Vulkan often provides better performance for inference than ROCm/HIP.
    For training or custom PyTorch code, set up PyTorch instead.
    

    References

    • Troubleshooting: .claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md
    • GTT Configuration: .claude/skills/strix-halo-setup/docs/GTT_MEMORY_FIX.md
    • Community PyTorch: https://github.com/ROCm/TheRock/releases

    Notes

    • GTT configuration needed for 30B+ models on kernels before 6.16.9 (kernel 6.16.9+ has automatic UMA support)
    • Vulkan backend often provides better performance for inference
    • Use BF16 precision in PyTorch for better performance

    ROCm 7.x Considerations

    Benefits of ROCm 7.2+:

    • Up to 5x performance improvement in image generation (ComfyUI, Flux, SDXL)
    • ~2.5x improvement in BF16 compute (31 TFLOPS vs 12 TFLOPS on ROCm 6.x)
    • Unified Windows/Linux release
    • Better long-term support path
    • JAX 0.8.0 support

    Known Limitations on gfx1151 (as of ROCm 7.2):

    • hipBLASLt: Falls back to hipBLAS (slower) due to unsupported architecture
    • LLM Decode: Memory-copy bound (~92-95% time in hipMemcpyWithStream), limiting token throughput
    • Flash Attention: May require TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 and still has issues

    Deprecations in ROCm 7.2:

    • HIPCC is deprecated; AMD Clang should be used directly for compilation
    • ROCTracer, ROCProfiler, rocprof, rocprofv2 are deprecated; use ROCprofiler-SDK and AMD SMI instead

    Recommended Environment Variables for ROCm 7.x:

    export HSA_ENABLE_SDMA=0  # Prevents artifacts in VAE decodes
    export PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
    
    Repository: ianbarber/strix-halo-skills