Set up ML environments with PyTorch and auto-detect hardware. Use this when creating new ML projects, setting up PyTorch, or troubleshooting GPU/environment issues...
This skill helps you create and manage isolated ML environments with PyTorch. It auto-detects your hardware (NVIDIA GPU, AMD GPU, or CPU) and installs the appropriate PyTorch build.
I can help you set up a complete ML project with PyTorch in seconds. Here's what I'll do:
- `.gitignore` for ML files

To get started, tell me:
When you ask me to set up a new ML project, I will:
```bash
# 1. Create the project directory
mkdir -p ~/projects/my-ml-project

# 2. Create .gitignore
#    (ignores ml-env/, data/, models/, logs/, etc.)

# 3. Run the setup script
bash ~/.claude/skills/ml-env/scripts/setup-universal.sh

# 4. Show you the results
```
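The `.gitignore` created in step 2 might look like the sketch below. The first four entries come from the list above; the rest are common additions and only suggestions:

```gitignore
# Environment and generated artifacts (from the setup description)
ml-env/
data/
models/
logs/

# Suggested extras (adjust to your project)
__pycache__/
*.ckpt
```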
The setup script will:

- Create an `ml-env/` directory in your project

Once created, activating is simple:
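The hardware auto-detection in `setup-universal.sh` works roughly along these lines. This is a sketch, not the script's actual contents, and the index URLs are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the auto-detection done by setup-universal.sh.
# The index URLs below are assumptions for illustration.
if command -v nvidia-smi >/dev/null 2>&1; then
  INDEX_URL="https://download.pytorch.org/whl/cu128"    # NVIDIA CUDA build
elif command -v rocm-smi >/dev/null 2>&1; then
  INDEX_URL="https://download.pytorch.org/whl/rocm6.2"  # AMD ROCm build
else
  INDEX_URL="https://download.pytorch.org/whl/cpu"      # CPU-only build
fi
echo "Using PyTorch index: $INDEX_URL"
```

The real script then installs PyTorch from the chosen index into `ml-env/`.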
```bash
cd ~/projects/my-ml-project
source ml-env/bin/activate      # Regular environments
# OR
source ml-env/activate-safe.sh  # If you use conda (ignores conda settings)
```
Check that it works:

```bash
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"
```
Add yourself to the `render` and `video` groups:

```bash
sudo usermod -aG render,video $USER && newgrp render
```

⚠️ Strix Halo (gfx1151) requires special handling - official PyTorch wheels do NOT work!
Use one of AMD's gfx1151 wheel indexes instead:

- https://repo.amd.com/rocm/whl/gfx1151/ (~31 TFLOPS BF16)
- https://rocm.nightlies.amd.com/v2/gfx1151/ (~12 TFLOPS BF16)

Requirements: `render` and `video` groups, Linux kernel 6.14+ (6.16.9+ recommended for automatic UMA/GTT behavior).

Reference project: see ~/Projects/amdtest for a working gfx1151 setup example.
See TROUBLESHOOTING.md for complete Strix Halo setup and GTT memory configuration.
If you already have a project and want to verify it works:
```bash
cd ~/your-ml-project
bash ~/.claude/skills/ml-env/scripts/validate.sh
```
This will check:
NVIDIA:
```bash
nvidia-smi  # Check driver is installed
python -c "import torch; print(torch.cuda.is_available())"
```
AMD:
```bash
rocm-smi             # Check ROCm installation
rocminfo | grep gfx  # Check GPU architecture
```
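It can also help to confirm which backend the installed PyTorch wheel was built for: `torch.version.hip` is set on ROCm builds and `None` on CUDA builds. This check is a suggestion, not part of the validate script:

```shell
# Report the GPU backend of the installed torch wheel (ROCm, CUDA, or CPU-only).
# Falls back to a message if PyTorch is not importable in this shell.
python -c "import torch; print('ROCm' if torch.version.hip else 'CUDA' if torch.version.cuda else 'CPU-only')" \
  || echo "PyTorch not importable - activate ml-env first"
```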
Common fixes:

- Free cached GPU memory by calling `torch.cuda.empty_cache()` between batches
- Make sure the environment is active: `source ml-env/bin/activate`
- Upgrade PyTorch if needed: `uv pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu128`
See TROUBLESHOOTING.md for detailed Strix Halo troubleshooting, GTT memory setup, and performance optimization.
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = YourModel().to(device)
optimizer = torch.optim.Adam(model.parameters())

for batch in dataloader:
    x, y = batch
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    loss = model(x, y)
    loss.backward()
    optimizer.step()
```
```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for batch in dataloader:
    x, y = batch
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    with autocast():
        loss = model(x, y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```
```python
import torch

if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated()/1e9:.2f}GB")
    print(f"Reserved:  {torch.cuda.memory_reserved()/1e9:.2f}GB")
    print(f"Total:     {torch.cuda.get_device_properties(0).total_memory/1e9:.2f}GB")
```
All scripts live in `~/.claude/skills/ml-env/scripts/`.
Use me when you:
Ask me anything about: