    chrisvoncsefalvay/funsloth-local
    About

    Training manager for local GPU training: validates CUDA, manages GPU selection, monitors progress, and handles checkpoints.

    SKILL.md

    Local GPU Training Manager

    Run Unsloth training on your local GPU.

    Prerequisites Check

    1. Verify CUDA

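    import torch

    cuda_ok = torch.cuda.is_available()
    print(f"CUDA available: {cuda_ok}")
    if cuda_ok:  # the device queries below raise an error when no CUDA device is present
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")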
    

    If CUDA is not available:

    • Check NVIDIA drivers: nvidia-smi
    • Check CUDA: nvcc --version
    • Reinstall PyTorch with the wheel index that matches your CUDA version (cu121 is CUDA 12.1): pip install torch --index-url https://download.pytorch.org/whl/cu121
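
The first two checks above can be gathered into one small diagnostic. This is a minimal sketch using only the standard library; the function name `find_cuda_tools` is hypothetical, and it only reports whether `nvidia-smi` and `nvcc` are on the PATH, not whether the driver and toolkit versions are compatible with each other.

```python
import shutil
from typing import Callable, Dict, Optional


def find_cuda_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Return the resolved path (or None) of each CUDA command-line tool.

    `which` is injectable so the PATH lookup can be replaced in tests.
    """
    return {tool: which(tool) for tool in ("nvidia-smi", "nvcc")}


def describe_cuda_tools() -> str:
    """Format the lookup result as one line per tool."""
    return "\n".join(
        f"{tool}: {path or 'not found on PATH'}"
        for tool, path in find_cuda_tools().items()
    )
```

Calling `print(describe_cuda_tools())` shows which tool is missing. A missing `nvcc` on its own is often harmless, because the PyTorch wheels bundle their own CUDA runtime.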

    2. Check VRAM

    See references/HARDWARE_GUIDE.md for requirements:

    | VRAM | Recommended Setup |
    |------|-------------------|
    | 8GB  | 7B, 4-bit, batch=1, LoRA r=8 |
    | 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
    | 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
    | 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
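
The table can also be expressed as a lookup, which is handy when a script should pick defaults from the detected VRAM. This is a sketch only: `VRAM_TIERS` and `recommend_setup` are hypothetical names, the tiers are copied from the table above, and the value reported by `torch.cuda.get_device_properties(0).total_memory / 1e9` may differ slightly from a card's nominal size.

```python
from typing import Optional

# (minimum VRAM in GB, recommended setup), copied from the table above
VRAM_TIERS = [
    (24, "7-14B, 4-bit, batch=4, LoRA r=32"),
    (16, "7-13B, 4-bit, batch=2, LoRA r=16-32"),
    (12, "7B, 4-bit, batch=2, LoRA r=16"),
    (8, "7B, 4-bit, batch=1, LoRA r=8"),
]


def recommend_setup(vram_gb: float) -> Optional[str]:
    """Return the setup for the largest tier that fits, or None below 8 GB."""
    for min_gb, setup in VRAM_TIERS:
        if vram_gb >= min_gb:
            return setup
    return None
```

For example, `recommend_setup(12.9)` returns the 12GB row, and anything under 8 GB returns `None` because the table does not cover it.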

    3. Check Dependencies

    pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
    

    Docker Option

    Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):

    docker run -d \
      -e JUPYTER_PASSWORD="unsloth" \
      -p 8888:8888 \
      -v "$(pwd)/work:/workspace/work" \
      --gpus all \
      unsloth/unsloth
    

    Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.

    Environment variables:

    • JUPYTER_PASSWORD - Jupyter auth (default: unsloth)
    • JUPYTER_PORT - Port (default: 8888)
    • USER_PASSWORD - User/sudo password (default: unsloth)

    Run Training

    Option 1: Notebook

    jupyter notebook notebooks/sft_template.ipynb
    

    Option 2: Script

    # Edit configuration in script, then run
    python scripts/train_sft.py
    

    GPU Selection (Multi-GPU)

    import os
    # Set this before importing torch (or anything that initializes CUDA)
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU
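
On a shared multi-GPU machine, the first GPU is not always the idle one. The sketch below picks the GPU with the most free memory by parsing `nvidia-smi` query output. The helper names `pick_freest_gpu` and `use_freest_gpu` are hypothetical; the `--query-gpu` and `--format` flags are standard `nvidia-smi` options.

```python
import os
import subprocess


def pick_freest_gpu(csv_text: str) -> str:
    """Return the index of the GPU with the most free memory.

    `csv_text` is the output of:
      nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
    for example "0, 1024\n1, 20480\n" (free memory in MiB).
    """
    rows = [line.split(",") for line in csv_text.strip().splitlines() if line.strip()]
    if not rows:
        raise ValueError("no GPUs listed")
    index, _free = max(rows, key=lambda row: int(row[1]))
    return index.strip()


def use_freest_gpu() -> str:
    """Query nvidia-smi and expose only the freest GPU to this process."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    chosen = pick_freest_gpu(out)
    os.environ["CUDA_VISIBLE_DEVICES"] = chosen
    return chosen
```

Call `use_freest_gpu()` at the very top of the training script, before `import torch`, so the environment variable takes effect.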
    

    Monitor Training

    Terminal

    # Watch GPU usage
    watch -n 1 nvidia-smi
    
    # Or use nvitop (more detailed)
    pip install nvitop && nvitop
    

    WandB (Optional)

    export WANDB_API_KEY="your-key"
    # Add report_to="wandb" in TrainingArguments
    

    Troubleshooting

    OOM Error

    Try in order:

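    1. Reduce per_device_train_batch_size (down to 1)
    2. Increase gradient_accumulation_steps to keep the effective batch size
    3. Reduce max_seq_length
    4. Reduce LoRA rank
    5. Call torch.cuda.empty_cache() to release cached memory between runs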
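
The first two steps work together: on a single GPU the effective batch size is the per-device batch size times the gradient accumulation steps, so shrinking one and growing the other keeps the optimization behaviour roughly the same while lowering peak memory. A minimal sketch of that arithmetic, with the hypothetical helper name `rebalance_batch`:

```python
import math
from typing import Tuple


def rebalance_batch(
    per_device_batch: int, grad_accum: int, new_per_device_batch: int = 1
) -> Tuple[int, int]:
    """Shrink the per-device batch while preserving the effective batch size.

    effective batch = per_device_batch * grad_accum (single GPU).
    Rounds the new accumulation count up, so the effective batch never shrinks.
    """
    effective = per_device_batch * grad_accum
    new_accum = math.ceil(effective / new_per_device_batch)
    return new_per_device_batch, new_accum
```

For example, `rebalance_batch(4, 2)` returns `(1, 8)`: a batch of 1 with 8 accumulation steps still yields an effective batch of 8.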

    Loss Not Decreasing

    1. Check learning rate (try higher or lower)
    2. Verify chat template matches model
    3. Inspect data format

    Training Too Slow

    1. Enable bf16 if the GPU supports it
    2. Use packing=True for short sequences
    3. Increase logging_steps so metrics are logged less often

    See references/TROUBLESHOOTING.md for more solutions.

    Resume from Checkpoint

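    # Pass resume_from_checkpoint to train(); the Trainer does not read it from TrainingArguments
    trainer.train(resume_from_checkpoint=True)  # Auto-find latest in output_dir
    # Or: trainer.train(resume_from_checkpoint="outputs/checkpoint-500")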
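
To see which folder an automatic resume would pick, the output directory can be scanned for the highest-numbered `checkpoint-<step>` folder. This is a standard-library sketch with the hypothetical name `latest_checkpoint`; Transformers also provides `get_last_checkpoint` in `transformers.trainer_utils` for the same purpose.

```python
import re
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare step numbers as integers so checkpoint-1000 beats checkpoint-500
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return str(best_path) if best_path is not None else None
```

Comparing the step numbers as integers matters: a plain string sort would rank `checkpoint-500` above `checkpoint-1000`.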
    

    Save Model

    The training script automatically saves:

    • outputs/lora_adapter/ - LoRA weights
    • outputs/merged_16bit/ - Merged model (optional)

    Test Inference

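    from unsloth import FastLanguageModel

    # Load the saved LoRA adapter and switch the model to inference mode
    model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
    FastLanguageModel.for_inference(model)

    messages = [{"role": "user", "content": "Hello!"}]
    # add_generation_prompt=True appends the assistant turn marker so the model replies
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    outputs = model.generate(input_ids=inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))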
    

    Handoff

    After training, offer funsloth-upload to upload the model to the Hub with a model card.

    Tips

    1. Close other GPU apps before training
    2. Monitor GPU temperature and keep it under 85°C
    3. Use a UPS (uninterruptible power supply) for long runs
    4. Save frequently with save_steps
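
The temperature tip can be automated by polling `nvidia-smi` and flagging any GPU at or above the limit. This is a sketch under stated assumptions: `hot_gpus` is a hypothetical helper, the 85 default comes from the tip above, and the input is expected in the CSV form that the `--query-gpu=index,temperature.gpu` query produces.

```python
from typing import List, Tuple


def hot_gpus(csv_text: str, limit_c: int = 85) -> List[Tuple[int, int]]:
    """Return (gpu_index, temperature_c) for every GPU at or above `limit_c`.

    `csv_text` is the output of:
      nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits
    for example "0, 62\n1, 87\n".
    """
    hot = []
    for line in csv_text.strip().splitlines():
        index, temp = (int(field) for field in line.split(","))
        if temp >= limit_c:
            hot.append((index, temp))
    return hot
```

An empty result means every GPU is below the limit. A non-empty one is a cue to improve airflow or pause the run.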

    Bundled Resources

    • notebooks/sft_template.ipynb - Notebook template
    • scripts/train_sft.py - Script template
    • references/HARDWARE_GUIDE.md - VRAM requirements
    • references/TROUBLESHOOTING.md - Common issues