Smithery Logo
MCPsSkillsDocsPricing
Login
Smithery Logo

Accelerating the Agent Economy

Resources

DocumentationPrivacy PolicySystem Status

Company

PricingAboutBlog

Connect

© 2026 Smithery. All rights reserved.

    pytorch

    triaging-issues

    pytorch/triaging-issues
    Productivity
    97,267
    22 installs

    About

    SKILL.md

    Install

    Install via Skills CLI

    or add to your agent
    • Claude Code
      Claude Code
    • Codex
      Codex
    • OpenClaw
      OpenClaw
    • Cursor
      Cursor
    • Amp
      Amp
    • GitHub Copilot
      GitHub Copilot
    • Gemini CLI
      Gemini CLI
    • Kilo Code
      Kilo Code
    • Junie
      Junie
    • Replit
      Replit
    • Windsurf
      Windsurf
    • Cline
      Cline
    • Continue
      Continue
    • OpenCode
      OpenCode
    • OpenHands
      OpenHands
    • Roo Code
      Roo Code
    • Augment
      Augment
    • Goose
      Goose
    • Trae
      Trae
    • Zencoder
      Zencoder
    • Antigravity
      Antigravity
    ├─
    ├─
    └─

    About

    Triages GitHub issues by routing to oncall teams, applying labels, and closing questions. Use when processing new PyTorch issues or when asked to triage an issue.

    SKILL.md

    PyTorch Issue Triage Skill

    This skill helps triage GitHub issues by routing issues, applying labels, and leaving first-line responses.

    Contents

    • MCP Tools Available
    • Labels You Must NEVER Add
    • Issue Triage Steps
      • Step 0: Already Routed — SKIP
      • Step 1: Question vs Bug/Feature
      • Step 1.5: Needs Reproduction — External Files
      • Step 2: Transfer
      • Step 2.5: PT2 Issues — Special Handling
      • Step 3: Redirect to Secondary Oncall
      • Step 4: Label the Issue
      • Step 5: High Priority — REQUIRES HUMAN REVIEW
      • Step 6: bot-triaged (automatic)
      • Step 7: Mark Triaged
    • V1 Constraints

    Labels reference: See labels.json for the full catalog of labels suitable for triage. ONLY apply labels that exist in this file. Do not invent or guess label names. This file excludes CI triggers, test configs, release notes, deprecated labels, and labels requiring human decision.

    PT2 triage guide: See pt2-triage-rubric.md for detailed labeling guidance when triaging PT2/torch.compile issues.

    Response templates: See templates.json for standard response messages.


    MCP Tools Available

    Use these GitHub MCP tools for triage:

    Tool Purpose
    mcp__github__issue_read Get issue details, comments, and existing labels
    mcp__github__issue_write Apply labels or close issues
    mcp__github__add_issue_comment Add comment (only for redirecting questions)
    mcp__github__search_issues Find similar issues for context

    Labels You Must NEVER Add

    Prefix/Category Reason
    Labels not in labels.json Only apply labels that exist in the allowlist
    ciflow/* CI job triggers for PRs only
    test-config/* Test suite selectors for PRs only
    release notes: * Auto-assigned for release notes
    ci-*, ci:* CI infrastructure controls
    sev* Severity labels require human decision
    merge blocking Requires human decision
    actionable Requires human decision
    Any label containing "deprecated" Obsolete
    oncall: releng Not a triage redirect target. Use module: ci instead

    If blocked: When a label is blocked by the hook, add ONLY triage review and stop. A human will handle it.

    These rules are enforced by a PreToolUse hook that validates all labels against labels.json.

    Never Override Human Labels

    If a human has already applied labels (especially ci: sev, severity labels, or priority labels), do NOT remove or replace them. Your job is to supplement, not override.


    Issue Triage (for each issue)

    0) Already Routed — SKIP

    If an issue already has ANY oncall: label, SKIP IT entirely. Do not:

    • Add any labels
    • Add triaged
    • Leave comments
    • Do any triage work

    That issue belongs to the sub-oncall team. They own their queue.

    1) Question vs Bug/Feature

    • If it is a question (not a bug report or feature request): close and use the redirect_to_forum template from templates.json.
    • If unclear whether it is a bug/feature vs a question: request additional information using the request_more_info template and stop.

    1.5) Needs Reproduction — External Files

    Check if the issue body contains links to external files that users would need to download to reproduce.

    Patterns to detect:

    • File attachments: .zip, .pt, .pth, .pkl, .safetensors, .onnx, .bin files
    • External storage: Google Drive, Dropbox, OneDrive, Mega, WeTransfer links
    • Model hubs: Hugging Face Hub links to model files

    Action:

    1. Edit the issue body to remove/redact the download links
      • Replace with: [Link removed - external file downloads are not permitted for security reasons]
    2. Add needs reproduction label
    3. Use the needs_reproduction template from templates.json to request a self-contained reproduction
    4. Do NOT add triaged — wait for the user to provide a reproducible example

    1.55) Needs Reproduction — Other Cases

    Also add needs reproduction when:

    • The user reports a hardware-specific issue (e.g., specific GPU model) without a self-contained repro script
    • The user references a specific model/checkpoint/dataset that is not publicly runnable in a few lines
    • The issue describes version-upgrade breakage but only provides a high-level description without a minimal script
    • The repro depends on a specific training setup, distributed environment, or non-trivial infrastructure

    1.6) Edge Cases & Numerical Accuracy

    If the issue involves extremal values or numerical precision differences:

    Patterns to detect:

    • Values near torch.finfo(dtype).max or torch.finfo(dtype).min
    • NaN/Inf appearing in outputs from valid (but extreme) inputs
    • Differences between CPU and GPU results
    • Precision differences between dtypes (e.g., fp32 vs fp16)
    • Fuzzer-generated edge cases

    IMPORTANT — avoid keyword-triggered mislabeling:

    Label based on the root cause, not keywords that appear in the error or title. A keyword tells you what failed, not why.

    • An undefined symbol: ncclAlltoAll error at import torch is a packaging issue (module: binaries), not a distributed training bug — the user never ran distributed code.
    • A nan in a parameter name or tolerance check is not module: NaNs and Infs unless the bug is actually about NaN propagation.
    • A stack trace mentioning autograd does not mean module: autograd — check whether the bug is in autograd itself or just on the call path.
    • A test failure with tolerance thresholds is module: tests, not module: numerical-stability.

    Ask: "Where would the fix need to be made?" That determines the label.

    Action:

    1. Add module: edge cases label
    2. If from a fuzzer, also add topic: fuzzer
    3. Use the numerical_accuracy template from templates.json to link to the docs
    4. If the issue is clearly expected behavior per the docs, close it with the template comment

    2) Transfer (domain library or ExecuTorch)

    If the issue belongs in another repo (vision/text/audio/RL/ExecuTorch/etc.), transfer the issue and STOP.

    2.5) PT2 Issues — Special Handling

    PT2 is NOT a redirect. oncall: pt2 is not like the other oncall labels in Step 3. PT2 issues continue through Steps 4–7 for full triage — add oncall: pt2, then proceed to label with module: labels, mark triaged, etc.

    Every oncall: pt2 issue MUST have at least one module: label. The PT2 oncall queue is too broad without a module label — the team needs to know which component is affected (e.g., module: dynamo, module: inductor, module: helion, module: dynamic shapes). If you cannot determine the specific module, use module: compile ux as a fallback, but always try to be specific first. See pt2-triage-rubric.md for detailed guidance.

    3) Redirect to Secondary Oncall

    CRITICAL: When redirecting issues to a non-PT2 oncall queue, apply exactly one oncall: ... label and STOP. Do NOT:

    • Add any module: labels
    • Mark it triaged
    • Do any further triage work

    The sub-oncall team will handle their own triage. Your job is only to route it to them.

    Oncall Redirect Labels

    Label When to use
    oncall: jit TorchScript issues
    oncall: distributed Distributed training (DDP, FSDP, RPC, c10d, DTensor, DeviceMesh, symmetric memory, context parallel, pipelining). Special handling: after applying this label, invoke the distributed triage sub-skill (/distributed-triage on this issue) for second-level triage — it will route to a sub-oncall, add module labels, and mark triaged.
    oncall: export torch.export issues
    oncall: quantization Quantization issues
    oncall: mobile Mobile (iOS/Android), excludes ExecuTorch
    oncall: profiler Profiler issues (CPU, GPU, Kineto)
    oncall: visualization TensorBoard integration

    Common routing mistakes to avoid:

    • MPS ≠ Mobile. MPS (Metal Performance Shaders) is the macOS/Apple Silicon GPU backend. Do NOT route MPS issues to oncall: mobile. MPS issues stay in the general queue with module: mps.
    • DTensor → oncall: distributed. DTensor issues should always be routed to oncall: distributed, even if they don't mention DDP/FSDP.
    • ONNX → module: onnx. There is no oncall: onnx. Use module: onnx and keep in the general queue.
    • CI/releng → module: ci. Do not use oncall: releng. Use module: ci for CI infrastructure issues.
    • torch.compile + distributed. When torch.compile mishandles a distributed op (e.g., dist.all_reduce), the issue typically needs BOTH oncall: pt2 and oncall: distributed since the fix may span both codebases.

    Note: oncall: cpu inductor is a sub-queue of PT2. For general triage, just use oncall: pt2.

    4) Label the issue (if NOT transferred/redirected)

    Only if the issue stays in the general queue:

    • Add 1+ module: ... labels based on the affected area
    • Prefer specific labels over general ones when both exist. Check labels.json descriptions for guidance on when a specific label supersedes a general one (e.g., module: sdpa instead of module: nn for SDPA issues, module: flex attention instead of module: nn for flex attention).
    • feature — wholly new functionality that does not exist today in any form
    • enhancement — improvement to something that already works (e.g., adding a native backend kernel for an op that already runs via fallback/composite, performance optimization, better error messages). If the enhancement is about performance, also add module: performance.
    • function request — a new function or new arguments/modes for an existing function
    • If the issue says the operation "currently works" or "falls back to" a slower path, that is enhancement, not feature

    Commonly missed labels — always check for these:

    Condition Label
    Segfault, illegal memory access, SIGSEGV module: crash
    Performance issue: regression, slowdown, or optimization request module: performance
    Issue on Windows module: windows
    Previously working feature now broken module: regression
    Broken docs/links that previously worked module: docs + module: regression (NOT enhancement)
    Issue about a test failing (not the underlying functionality) module: tests
    Backward pass / gradient computation bug module: autograd (in addition to the op's module label)
    torch.linalg ops or linear algebra ops (solve, svd, eig, inv, etc.) module: linear algebra
    has workaround Only add when the workaround is non-trivial and non-obvious. If the issue is "X doesn't work for non-contiguous tensors," calling .contiguous() is the tautological inverse of the bug, not a workaround. A real workaround is something like installing a specific package version, adding a synchronization point, inserting gc.collect(), or using a different API that isn't obviously implied by the bug description.

    Label based on the actual bug, not keywords. Read the issue to understand what is actually broken. A bug about broadcasting that happens to mention "nan" in a parameter name is a frontend bug, not a NaN/Inf bug.

    5) High Priority — REQUIRES HUMAN REVIEW

    CRITICAL: If you believe an issue is high priority, you MUST:

    1. Add triage review label and do not add triaged

    Do NOT directly add high priority without human confirmation.

    High priority criteria:

    • Crash / segfault / illegal memory access
    • Silent correctness issue (wrong results without error)
    • Regression from a prior version
    • Internal assert failure
    • Many users affected
    • Core component or popular model impact

    6) bot-triaged (automatic)

    The bot-triaged label is automatically applied by a post-hook after any issue mutation. You do not need to add it manually.

    7) Mark triaged

    If not transferred/redirected and not flagged for review, add triaged.


    V1 Constraints

    DO NOT:

    • Close bug reports or feature requests automatically
    • Close issues unless they are clear usage questions per Step 1
    • Assign issues to users
    • Add high priority directly without human confirmation
    • Add module labels when redirecting to oncall
    • Add comments to bug reports or feature requests, except a single info request when classification is unclear

    DO:

    • Close clear usage questions and point to discuss.pytorch.org (per step 1)
    • Be conservative - when in doubt, add triage review for human attention
    • Apply type labels (feature, enhancement, function request) when confident
    • Add triaged label when classification is complete

    Note: bot-triaged is automatically applied by a post-hook after any issue mutation.

    Recommended Servers
    GitHub
    GitHub
    Linear
    Linear
    Confluence
    Confluence
    Repository
    pytorch/pytorch
    Files