
    openai/transcribe
    AI & ML · 7,309 · 126 installs


    About

    Transcribe audio files to text with optional diarization and known-speaker hints...

    SKILL.md

    Audio Transcribe

    Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

    Workflow

    1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
    2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
    3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
    4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
    5. Save outputs under output/transcribe/ when working in this repo.
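
    The steps above can be sketched as a small pre-flight helper; the `prepare_job` name and the `demo` job id are illustrative, not part of the skill:

```shell
# Pre-flight for a transcription job (illustrative, not part of the CLI).
# Verifies the API key (step 2) and creates the job's output directory (step 5).
prepare_job() {
  local job_id="$1"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it in your shell first." >&2
    return 1
  fi
  local dir="output/transcribe/$job_id"
  mkdir -p "$dir"
  echo "$dir"
}

# Step 3 (needs a live key and real audio, so left commented here):
# python3 "$TRANSCRIBE_CLI" path/to/audio.wav --out "$(prepare_job demo)/transcript.txt"
```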

    Decision rules

    • Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
    • If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
    • If audio is longer than ~30 seconds, keep --chunking-strategy auto.
    • Prompting is not supported for gpt-4o-transcribe-diarize.
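
    The rules above can be captured in a helper that prints the command they select; the function name and its yes/no flag are illustrative:

```shell
# Print the transcription command implied by the decision rules.
# $1: "yes" when the user wants speaker labels, otherwise "no".
# $2: path to the audio file.
select_transcribe_cmd() {
  local diarize="$1" audio="$2"
  if [ "$diarize" = "yes" ]; then
    # Diarization path; prompting is not supported for this model.
    echo "python3 \"\$TRANSCRIBE_CLI\" $audio --model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    # Default path: fast text transcription.
    echo "python3 \"\$TRANSCRIBE_CLI\" $audio --response-format text"
  fi
}
```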

    Output conventions

    • Use output/transcribe/<job-id>/ for evaluation runs.
    • Use --out-dir for multiple files to avoid overwriting.

    Dependencies (install if missing)

    Prefer uv for dependency management.

    uv pip install openai
    

    If uv is unavailable:

    python3 -m pip install openai
    

    Environment

    • OPENAI_API_KEY must be set for live API calls.
    • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
    • Never ask the user to paste the full key in chat.
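
    A guard like the following fails fast without ever echoing the key's value (the helper name is illustrative):

```shell
# Refuse to proceed when OPENAI_API_KEY is unset or empty; never print the key.
require_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; create one in the OpenAI platform UI and export it." >&2
    return 1
  fi
}
```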

    Skill path (set once)

    export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
    export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
    

    User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
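
    Before the first run it is worth confirming the script actually resolved; this check (and the message wording) is a sketch, not part of the skill:

```shell
# Return 0 when the skill script exists at the resolved path.
have_transcribe_cli() {
  [ -f "${TRANSCRIBE_CLI:-}" ]
}

have_transcribe_cli || echo "transcribe skill not found; install it under \$CODEX_HOME/skills" >&2
```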

    CLI quick start

    Single file (fast text default):

    python3 "$TRANSCRIBE_CLI" \
      path/to/audio.wav \
      --out transcript.txt
    

    Diarization with known speakers (up to 4):

    python3 "$TRANSCRIBE_CLI" \
      meeting.m4a \
      --model gpt-4o-transcribe-diarize \
      --known-speaker "Alice=refs/alice.wav" \
      --known-speaker "Bob=refs/bob.wav" \
      --response-format diarized_json \
      --out-dir output/transcribe/meeting
    

    Plain text output (explicit):

    python3 "$TRANSCRIBE_CLI" \
      interview.mp3 \
      --response-format text \
      --out interview.txt
    

    Reference map

    • references/api.md: supported formats, limits, response formats, and known-speaker notes.
    Repository: openai/skills