© 2026 Smithery. All rights reserved.

    davila7/voice-agents
    AI & ML
    19,892
    4 installs

    About

    Voice agents represent the frontier of AI interaction: humans speaking naturally with AI systems...

    SKILL.md

    Voice Agents

    You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency: every component adds milliseconds, and their sum determines whether conversations feel natural or awkward.
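The latency arithmetic above can be made explicit as a per-stage budget. This is a minimal sketch that assumes a pipeline with the hypothetical stage names and millisecond values shown; none of these numbers come from this skill, so substitute measurements from your own stack.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Per-stage latency budget in milliseconds (illustrative values only)."""
    target_ms: float
    stages: dict[str, float]

    def total_ms(self) -> float:
        # End-to-end latency is the sum of every stage on the critical path.
        return sum(self.stages.values())

    def headroom_ms(self) -> float:
        # Positive headroom means the pipeline fits inside the target.
        return self.target_ms - self.total_ms()

    def over_budget(self) -> bool:
        return self.total_ms() > self.target_ms


# Hypothetical numbers for illustration; measure your own components.
budget = LatencyBudget(
    target_ms=800.0,
    stages={
        "vad_endpointing": 200.0,
        "stt_final": 150.0,
        "llm_first_token": 300.0,
        "tts_first_audio": 100.0,
        "network": 50.0,
    },
)

print(budget.total_ms())     # 800.0
print(budget.headroom_ms())  # 0.0
print(budget.over_budget())  # False
```

Any stage that grows must be paid for by shrinking another, which is why the budget is tracked per component rather than only end to end.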

    Your core insight: two architectures exist. Speech-to-speech (S2S) models such as the OpenAI Realtime API preserve emotion and achieve the lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Mos

    Capabilities

    • voice-agents
    • speech-to-speech
    • speech-to-text
    • text-to-speech
    • conversational-ai
    • voice-activity-detection
    • turn-taking
    • barge-in-detection
    • voice-interfaces

    Patterns

    Speech-to-Speech Architecture

    Direct audio-to-audio processing for lowest latency

    Pipeline Architecture

    Separate STT → LLM → TTS for maximum control
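A minimal sketch of the pipeline pattern, timing each stage so the latency cost of the extra control is visible. The stage signatures and the `fake_*` stubs are hypothetical stand-ins so the example runs without any vendor SDK; real STT, LLM, and TTS clients have their own APIs.

```python
import time
from typing import Callable

# Hypothetical stage signatures: real STT/LLM/TTS clients differ by vendor.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_turn(audio: bytes, stt: SpeechToText, llm: LanguageModel,
             tts: TextToSpeech) -> tuple[bytes, dict[str, float]]:
    """Run one user turn through STT -> LLM -> TTS, timing each stage in ms."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    transcript = stt(audio)
    timings["stt"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    reply_text = llm(transcript)
    timings["llm"] = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    reply_audio = tts(reply_text)
    timings["tts"] = (time.perf_counter() - start) * 1000

    return reply_audio, timings


# Stub stages (hypothetical) so the sketch runs on its own.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_out, stage_ms = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
print(audio_out)         # b'You said: hello'
print(sorted(stage_ms))  # ['llm', 'stt', 'tts']
```

The stages run strictly in sequence here for clarity. Streaming each stage into the next (partial transcripts into the LLM, LLM tokens into TTS) is the usual way to shrink end-to-end latency, at the cost of more complex code.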

    Voice Activity Detection Pattern

    Detect when user starts/stops speaking
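A minimal energy-threshold sketch of this pattern, assuming audio already converted to float samples in [-1.0, 1.0] and split into fixed-size frames. The `threshold` and `hangover_frames` defaults are hypothetical; production systems usually use a trained VAD model instead of raw energy, but the start/stop logic with a hangover period is the same idea.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(frames: list[list[float]], threshold: float = 0.02,
                           hangover_frames: int = 3) -> list[tuple[int, int]]:
    """Return (start_frame, end_frame) pairs with an exclusive end.

    Speech starts on the first frame above `threshold`; it ends only after
    `hangover_frames` consecutive quiet frames, so short pauses inside a
    sentence do not split it into two segments.
    """
    segments: list[tuple[int, int]] = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) > threshold
        if start is None:
            if loud:
                start, quiet = i, 0
        elif loud:
            quiet = 0  # a brief pause ended; still the same segment
        else:
            quiet += 1
            if quiet >= hangover_frames:
                # The segment ends at the first frame of the quiet run.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Stream ended mid-segment: close it after the last loud frame.
        segments.append((start, len(frames) - quiet))
    return segments


silence = [0.0] * 160
speech = [0.5, -0.5] * 80
frames = [silence, speech, speech, silence, speech,
          silence, silence, silence, silence]
print(detect_speech_segments(frames))  # [(1, 5)]
```

The single quiet frame at index 3 is absorbed by the hangover, so frames 1 to 4 form one segment instead of two.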

    Anti-Patterns

    ❌ Ignoring Latency Budget

    ❌ Silence-Only Turn Detection

    ❌ Long Responses
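Silence-only turn detection fails because people pause mid-sentence. One way to avoid it is to require a longer silence when the partial transcript looks unfinished. This is a crude word-list heuristic standing in for semantic VAD, not an implementation of it; `CONTINUATION_WORDS` and both wait times are hypothetical values chosen for illustration.

```python
# Hypothetical cues suggesting the speaker has not finished the thought.
CONTINUATION_WORDS = {"and", "but", "so", "because", "or",
                      "um", "uh", "the", "a", "to"}


def looks_incomplete(transcript: str) -> bool:
    """Crude stand-in for semantic end-of-turn detection."""
    words = transcript.lower().rstrip(" .,!?").split()
    if not words:
        return True  # nothing transcribed yet: keep listening
    return words[-1] in CONTINUATION_WORDS


def is_end_of_turn(transcript: str, silence_ms: float,
                   base_wait_ms: float = 500.0,
                   extended_wait_ms: float = 1500.0) -> bool:
    """Decide whether the user has finished speaking.

    Silence alone is not enough: if the transcript looks unfinished,
    wait longer before the agent takes the turn.
    """
    required = extended_wait_ms if looks_incomplete(transcript) else base_wait_ms
    return silence_ms >= required


print(is_end_of_turn("Book a flight to Paris.", 600))  # True
print(is_end_of_turn("Book a flight to", 600))         # False
print(is_end_of_turn("Book a flight to", 1600))        # True
```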

    ⚠️ Sharp Edges

    Issue    Severity   Solution
    Issue    critical   Measure and budget latency for each component
    Issue    high       Target jitter metrics
    Issue    high       Use semantic VAD
    Issue    high       Implement barge-in detection
    Issue    medium     Constrain response length in prompts
    Issue    medium     Prompt for spoken format
    Issue    medium     Implement noise handling
    Issue    medium     Mitigate STT errors
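A minimal sketch of barge-in detection, assuming a per-frame VAD result for the user's microphone and a flag for whether agent audio is currently playing. `BargeInDetector` and its `min_speech_frames` default are hypothetical; requiring several consecutive speech frames is one way to keep a cough or a short noise burst from cutting the agent off.

```python
from dataclasses import dataclass, field


@dataclass
class BargeInDetector:
    """Signal when the user talks over the agent long enough to interrupt."""
    min_speech_frames: int = 3  # hypothetical default; tune per deployment
    _run: int = field(default=0, init=False)  # consecutive user-speech frames

    def update(self, agent_speaking: bool, user_speech: bool) -> bool:
        """Feed one frame of VAD output; return True when playback should stop."""
        if not agent_speaking:
            self._run = 0  # nothing to interrupt
            return False
        self._run = self._run + 1 if user_speech else 0
        return self._run >= self.min_speech_frames


detector = BargeInDetector(min_speech_frames=3)
vad = [False, True, True, False, True, True, True]
decisions = [detector.update(agent_speaking=True, user_speech=v) for v in vad]
print(decisions)  # [False, False, False, False, False, False, True]
```

When `update` returns True, the caller would stop TTS playback and cancel any pending LLM output; how that is done depends on the audio stack in use.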

    Related Skills

    Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend

    Repository
    davila7/claude-code-templates