Complete setup for AMD Strix Halo (Ryzen AI MAX+ 395) PyTorch environments.
Set up a new PyTorch project optimized for AMD Strix Halo (Ryzen AI MAX+ 395, gfx1151).
This skill should be invoked when:
PyTorch Installation: Official PyTorch wheels from pytorch.org DO NOT WORK with gfx1151. They detect the GPU but fail on compute with "HIP error: invalid device function". This skill installs community builds that actually work.
ROCm Installation Note: For Strix Halo APUs, ROCm should be installed with --no-dkms flag to use the inbox kernel driver. If you have amdgpu-dkms installed, it may cause issues when upgrading kernels.
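With AMD's installer this is typically done as below (hedged: `amdgpu-install` usecase names vary between ROCm releases — check AMD's install guide for your version):

```bash
# Install the ROCm userspace only, keeping the distro's inbox amdgpu kernel driver
sudo amdgpu-install --usecase=rocm --no-dkms
```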
Before running setup, verify the system with:
./scripts/verify_system.sh
This checks:
- ROCm installation
- Kernel version and GTT (GPU-accessible memory) configuration
- User membership in the `render` and `video` groups

**ROCm 7.x Note**: ROCm 7.2+ offers significant performance improvements (~2.5x in BF16 compute). However, hipBLASLt is not yet fully optimized for gfx1151 and falls back to hipBLAS.
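The group-membership portion of that check can be sketched stand-alone in Python (an illustration only, assuming a Linux system; the skill's actual check lives in `verify_system.sh`):

```python
import grp
import os
import pwd

REQUIRED_GROUPS = ("render", "video")

def missing_groups(user_groups, required=REQUIRED_GROUPS):
    """Return the required groups the user does not belong to."""
    return [g for g in required if g not in user_groups]

# Gather the invoking user's group names (supplementary + primary)
user = pwd.getpwuid(os.getuid()).pw_name
groups = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
groups.add(grp.getgrgid(os.getgid()).gr_name)

missing = missing_groups(groups)
print("OK" if not missing else f"Add user to: {', '.join(missing)}")
```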
If any checks fail, see .claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md for detailed fix instructions.
Run the verification script:
cd .claude/skills/strix-halo-setup
./scripts/verify_system.sh
If the script reports issues, follow its instructions to fix them.
Ask the user, via the AskUserQuestion tool:

- Question 1: "What would you like to name your project?" (default: `strix-ml-project`)
- Question 2: "Which backend do you want to set up?" (PyTorch or Vulkan)
If PyTorch is chosen, continue with steps below. If Vulkan, skip to Vulkan setup section at the end.
Using Conda (Recommended):
# Create new environment with Python 3.14 (or 3.13)
conda create -n {project_name} python=3.14 -y
conda activate {project_name}
Using uv (Alternative):
# Create new environment with Python 3.14 (or 3.13)
uv venv {project_name} --python 3.14
source {project_name}/bin/activate
CRITICAL: Must use community builds, not official wheels.
Option 1: AMD Nightlies (Recommended)
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
As of January 2026, this provides PyTorch 2.11.0a0+ with ROCm 7.11.0 support.
Option 2: TheRock Builds (Alternative) Pre-built wheels from the official ROCm TheRock project: https://github.com/ROCm/TheRock/releases
Look for gfx1151 releases and install with pip install <wheel_file>.
Verify Installation:
python -c "import torch; print('PyTorch:', torch.__version__); print('HIP:', torch.version.hip)"
The output should show PyTorch 2.11+ and a HIP version in the 7.x series.
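As a GPU-independent sanity check, those version strings can also be inspected programmatically (a hypothetical helper, assuming version strings of the form shown above):

```python
import re

def is_gfx1151_ready(torch_version: str, hip_version) -> bool:
    """Heuristic: PyTorch >= 2.11 paired with a ROCm/HIP 7.x runtime."""
    m = re.match(r"(\d+)\.(\d+)", torch_version)
    if not m or (int(m.group(1)), int(m.group(2))) < (2, 11):
        return False
    return bool(hip_version) and hip_version.split(".")[0] == "7"

print(is_gfx1151_ready("2.11.0a0+rocm7.11.0", "7.2.41134"))  # True for a nightly build
print(is_gfx1151_ready("2.5.1+rocm6.2", "6.2.41133"))        # False: official wheel
```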
Create activation script in the conda environment:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh << 'EOF'
#!/bin/bash
# Core ROCm settings for Strix Halo (gfx1151)
export HSA_OVERRIDE_GFX_VERSION=11.5.1
export PYTORCH_ROCM_ARCH=gfx1151
# Unified Memory Configuration - CRITICAL for accessing full memory
export HSA_XNACK=1
export HSA_FORCE_FINE_GRAIN_PCIE=1
# Memory allocation settings
export GPU_MAX_HEAP_SIZE=100
export GPU_MAX_ALLOC_PERCENT=100
# Device visibility
export ROCR_VISIBLE_DEVICES=0
export HIP_VISIBLE_DEVICES=0
# Performance optimizations
export ROCBLAS_USE_HIPBLASLT=1
export AMD_LOG_LEVEL=0
export HSA_CU_MASK=0xffffffffffffffff
# ROCm 7.x stability fixes for APUs
export HSA_ENABLE_SDMA=0 # Prevents checkerboard artifacts in VAE decodes
# PyTorch memory management (recommended for 32GB+ workloads)
export PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
echo "✓ Strix Halo environment variables set"
EOF
chmod +x $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh
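A quick way to confirm the activation hook took effect is to compare `os.environ` against the critical settings (minimal sketch; `check_env` is a hypothetical helper and only covers a subset of the variables above):

```python
import os

REQUIRED = {
    "HSA_OVERRIDE_GFX_VERSION": "11.5.1",
    "PYTORCH_ROCM_ARCH": "gfx1151",
    "HSA_XNACK": "1",
}

def check_env(env):
    """Return a human-readable problem for each missing/mismatched variable."""
    return [f"{k} should be {v!r}, got {env.get(k)!r}"
            for k, v in REQUIRED.items() if env.get(k) != v]

problems = check_env(os.environ)
print("OK" if not problems else "\n".join(problems))
```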
Create deactivation script:
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
cat > $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh << 'EOF'
#!/bin/bash
unset HSA_OVERRIDE_GFX_VERSION PYTORCH_ROCM_ARCH HSA_XNACK HSA_FORCE_FINE_GRAIN_PCIE
unset GPU_MAX_HEAP_SIZE GPU_MAX_ALLOC_PERCENT ROCR_VISIBLE_DEVICES HIP_VISIBLE_DEVICES
unset ROCBLAS_USE_HIPBLASLT AMD_LOG_LEVEL HSA_CU_MASK HSA_ENABLE_SDMA PYTORCH_HIP_ALLOC_CONF
EOF
chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh
mkdir -p {project_name}/{scripts,notebooks,data,models,tests}
cd {project_name}
Copy the test scripts from the skill directory:
cp .claude/skills/strix-halo-setup/scripts/*.py scripts/
chmod +x scripts/*.py
Create a README with project-specific information:
cat > README.md << 'EOF'
# {Project Name}
PyTorch project optimized for AMD Strix Halo (gfx1151).
## Environment
- **Hardware**: AMD Strix Halo (gfx1151)
- **ROCm**: 7.x
- **PyTorch**: Community build for gfx1151
- **Python**: 3.13 or 3.14
## Setup
```bash
# Activate environment
conda activate {project_name}
# Verify GPU
python scripts/test_gpu_simple.py
# Test memory capacity
python scripts/test_memory.py
# Check GTT memory
rocm-smi --showmeminfo gtt
```

## Troubleshooting

If compute fails with "HIP error: invalid device function":

```bash
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch
```

For more help, see `.claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md`

Created: {date}
EOF
### Step 9: Verify Installation
Reactivate the environment to load variables:
```bash
conda deactivate
conda activate {project_name}
# Should see: "✓ Strix Halo environment variables set"
```

Run verification:
python scripts/test_gpu_simple.py
Expected output:
============================================================
STRIX HALO GPU TEST
============================================================
✓ GPU detected: AMD Radeon Graphics
Memory: 113.2 GB
Compute test successful
✓ ALL TESTS PASSED
============================================================
Tell the user:
✓ Setup complete! Your Strix Halo environment is ready.
Project: {project_name}
Location: {full_path}
Next steps:
1. Test GPU: python scripts/test_gpu_simple.py
2. Test memory: python scripts/test_memory.py
Hardware capabilities:
- 7 TFLOPS FP32 / 31 TFLOPS BF16 (ROCm 7.x)
- 113 GB GPU-accessible memory
- Can run 30B parameter models in FP16
Activate anytime with: conda activate {project_name}
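The "30B parameter models in FP16" figure follows from simple arithmetic (sketch; the 1.2 overhead factor for activations/KV cache is an assumption, not a measured value):

```python
def model_memory_gb(params_billions: float, bytes_per_param: int, overhead: float = 1.2) -> float:
    """Rough memory estimate: weights * dtype size * overhead for activations/KV cache."""
    return params_billions * 1e9 * bytes_per_param * overhead / 1e9

# 30B parameters in FP16 (2 bytes/param) with 20% overhead
need = model_memory_gb(30, 2)
print(f"{need:.0f} GB needed vs 113 GB available -> fits: {need < 113}")
```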
All of these checks should pass. If not, common failures:

### "HIP error: invalid device function"

**Cause**: Official PyTorch wheels installed (they do not work with gfx1151)

**Solution**:
pip uninstall torch torchvision torchaudio
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
Verify it worked:
python -c "import torch; a=torch.tensor([1.0]).cuda(); print('✓ Works:', (a+1).item())"
### GPU memory limited to ~33 GB

**Cause**: GTT not configured

**Solution 1**: Upgrade to kernel 6.16.9+ (no configuration needed)

**Solution 2**: For older kernels, configure GTT:
.claude/skills/strix-halo-setup/scripts/configure_gtt.sh
This adds kernel parameters to GRUB for GPU to access more system RAM.
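Conceptually, the script raises the TTM page limits on the kernel command line; an illustrative fragment (the parameter values below are placeholders sized for a 128 GB machine — use the values the script computes, not these):

```bash
# /etc/default/grub — illustrative only; 27648000 pages x 4 KiB ~= 105 GiB
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=27648000 ttm.page_pool_size=27648000"
# Apply with: sudo update-grub && sudo reboot
```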
### GPU permission errors

**Cause**: User not in the `render`/`video` groups

**Solution**:
sudo usermod -aG render,video $USER
# Log out and back in (or reboot)
groups | grep -E "render|video" # Verify
If the user chose Vulkan for inference-only workloads:
sudo apt install mesa-vulkan-drivers vulkan-tools
vulkaninfo | grep "deviceName"
# Should show: AMD Radeon Graphics or similar
For llama.cpp:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
For Ollama:
curl -fsSL https://ollama.com/install.sh | sh
# With llama.cpp
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99
# With Ollama
ollama run llama2
Tell the user:
✓ Vulkan setup complete!
Backend: Vulkan (inference only)
Use with: llama.cpp, Ollama, other Vulkan-enabled tools
Note: Vulkan often provides better performance for inference than ROCm/HIP.
For training or custom PyTorch code, set up PyTorch instead.
## Additional Documentation

- `.claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md`
- `.claude/skills/strix-halo-setup/docs/GTT_MEMORY_FIX.md`

## ROCm 7.x Notes

**Benefits of ROCm 7.2+**:
**Known limitations on gfx1151** (as of ROCm 7.2):

- Flash Attention via AOTriton requires `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` and still has issues

**Deprecations in ROCm 7.2**:
Recommended Environment Variables for ROCm 7.x:
export HSA_ENABLE_SDMA=0 # Prevents artifacts in VAE decodes
export PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
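`PYTORCH_HIP_ALLOC_CONF` is a comma-separated list of `key:value` options; a tiny parser makes the structure explicit (illustrative sketch only):

```python
def parse_alloc_conf(conf: str) -> dict:
    """Split 'k1:v1,k2:v2' into a dict, mirroring the allocator option string format."""
    return dict(item.split(":", 1) for item in conf.split(",") if item)

conf = "backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
print(parse_alloc_conf(conf))
# {'backend': 'native', 'expandable_segments': 'True', 'garbage_collection_threshold': '0.9'}
```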