One-way TTS mode - Claude speaks aloud while user types...
You now have access to text-to-speech via the claude-say MCP server.
| Tool | Description |
|---|---|
| speak(text, voice?, speed?) | Add text to queue, returns immediately (preferred for natural flow) |
| speak_and_wait(text, voice?, speed?) | Speak and block until complete (use when expecting a response) |
| stop_speaking() | Stop immediately and clear the queue |
| start_stop_hotkey(key?) | Enable stop hotkey (default: cmd_r) - press to stop TTS anytime |
| stop_stop_hotkey() | Disable the stop hotkey |
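The difference between the queued and blocking calls can be illustrated with a toy in-memory model. This is only a sketch of the semantics described in the table: the class `SpeechQueueModel` and its `drain` method are hypothetical, and nothing here talks to the real claude-say server.

```python
from collections import deque


class SpeechQueueModel:
    """Toy model of the queue semantics described in the tool table.

    Hypothetical: this does not call the real claude-say MCP server.
    """

    def __init__(self):
        self.pending = deque()  # utterances waiting to be spoken
        self.spoken = []        # utterances that have been "played"

    def speak(self, text):
        # Non-blocking: enqueue and return immediately.
        self.pending.append(text)

    def speak_and_wait(self, text):
        # Blocking: enqueue, then play everything up to and including it.
        self.pending.append(text)
        self.drain()

    def stop_speaking(self):
        # Discard everything that has not been spoken yet.
        self.pending.clear()

    def drain(self):
        # Stand-in for audio playback: move pending items to spoken.
        while self.pending:
            self.spoken.append(self.pending.popleft())


model = SpeechQueueModel()
model.speak("First update.")
model.speak("Second update.")
model.stop_speaking()  # both queued updates are discarded
model.speak_and_wait("Which option do you prefer?")
print(model.spoken)
```

In this model only the final question is played, because `stop_speaking()` cleared the two queued updates before any playback happened.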
The skill supports three voice communication modes:
When activating /speak mode:
# 1. Enable stop hotkey so user can interrupt anytime with Right Command
start_stop_hotkey()
# 2. Confirm activation (brief!)
speak("Voice mode activated.")
The user can press Right Command at any time to stop speech and clear the queue.
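One way to keep activation and cleanup paired is a context manager, sketched below. The three functions are hypothetical stand-ins that only record their calls; in practice they would be the MCP tools of the same names, and `voice_mode` is an illustrative helper, not part of the server.

```python
from contextlib import contextmanager

calls = []  # records tool invocations in order, for illustration only


# Hypothetical stand-ins for the MCP tools; they only log the call.
def start_stop_hotkey(key="cmd_r"):
    calls.append(f"start_stop_hotkey({key})")


def stop_stop_hotkey():
    calls.append("stop_stop_hotkey()")


def speak(text):
    calls.append(f"speak({text})")


@contextmanager
def voice_mode():
    # Enable the hotkey first so the confirmation itself can be interrupted.
    start_stop_hotkey()
    speak("Voice mode activated.")
    try:
        yield
    finally:
        # Always disable the hotkey when leaving voice mode.
        stop_stop_hotkey()


with voice_mode():
    speak("All tests pass.")

print(calls)
```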
speak("I've successfully modified the file.")
speak("All tests pass. 15 tests executed, no failures.")
speak("The function is in the utils.ts file, line 42.")
speak("To solve this performance issue, we could explore several paths. What if we cached frequent results? We could also consider lazy loading. Another approach would be to parallelize the requests. Which one resonates with you?", speed=1.0)
speak("Interesting challenge. What's the main goal here? Are we optimizing for speed, maintainability, or user experience? That will guide our thinking.", speed=1.0)
speak("Let's start with the basics. The MVC pattern, Model View Controller, is an architecture that separates your application into three distinct layers. Each layer has a unique and well-defined responsibility.", speed=1.0)
speak("The Model is the data layer. It handles business logic, validation rules, and database access. It knows nothing about the user interface.", speed=1.0)
speak("The View is what the user sees. It displays data from the Model and captures interactions. It contains no business logic, just presentation.", speed=1.0)
speak("The Controller bridges the two. It receives user actions, calls the Model to process data, then updates the View with the results.", speed=1.0)
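The longer explanations above are split into several short speak calls rather than sent as one long utterance. A minimal sketch of that splitting follows, assuming sentence boundaries are a reasonable place to cut. The helper `chunk_for_speech` and its `max_chars` limit are hypothetical, not part of the claude-say server.

```python
import re


def chunk_for_speech(text, max_chars=220):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


explanation = (
    "The Model is the data layer. It handles business logic. "
    "The View is what the user sees. It contains no business logic."
)
for chunk in chunk_for_speech(explanation, max_chars=70):
    print(chunk)  # each chunk would be passed to one speak(...) call
```

Queuing each chunk with speak keeps individual utterances short, so pressing the stop hotkey discards only what has not been spoken yet.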
stop_speaking() immediately
skip()
stop_stop_hotkey(), then stop using speak

The TTS backend is configured in ~/.mcp-claude-say/.env:
| Backend | Description |
|---|---|
| macos | Native macOS say command (default, instant, offline) |
| kokoro | Kokoro MLX - 54 neural voices, 9 languages, runs locally on Apple Silicon |
| google | Google Cloud TTS - neural voices, requires API key |
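As a rough illustration of how a backend setting could be read from dotenv-style text, here is a small sketch. The key name `TTS_BACKEND` is an assumption made for this example; check the project's own documentation for the actual variable name. The function `read_backend` is likewise hypothetical. Only the three backend names and the macos default come from the table above.

```python
VALID_BACKENDS = {"macos", "kokoro", "google"}


def read_backend(env_text, key="TTS_BACKEND", default="macos"):
    """Return the backend named in dotenv-style text, or the default.

    The key name TTS_BACKEND is an assumption, not taken from the project.
    """
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            backend = value.strip().strip("\"'").lower()
            if backend not in VALID_BACKENDS:
                raise ValueError(f"unknown TTS backend: {backend!r}")
            return backend
    return default


sample = """
# illustrative contents of ~/.mcp-claude-say/.env
TTS_BACKEND=kokoro
"""
print(read_backend(sample))  # kokoro
```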
Pass a voice ID as the voice parameter to use a specific voice:
| Language | Voice Examples |
|---|---|
| American English | af_heart (default), af_nova, am_adam, am_echo |
| British English | bf_emma, bf_alice, bm_george, bm_daniel |
| French | ff_siwis |
| Spanish | ef_dora, em_alex |
| Italian | if_sara, im_nicola |
| Portuguese | pf_dora, pm_alex |
| Japanese | jf_alpha, jm_kumo |
| Chinese | zf_xiaoxiao, zm_yunxi |
| Hindi | hf_alpha, hm_omega |
Example: speak("Bonjour!", voice="ff_siwis") for French.
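Choosing a voice per language can be reduced to a lookup, sketched below. The mapping takes the first voice listed for each language in the table above. The helper `pick_voice` and the fallback to af_heart for unlisted languages are assumptions made for this example.

```python
DEFAULT_VOICE = "af_heart"  # listed as the default in the table above

# First voice listed for each language in the table.
VOICE_BY_LANGUAGE = {
    "American English": "af_heart",
    "British English": "bf_emma",
    "French": "ff_siwis",
    "Spanish": "ef_dora",
    "Italian": "if_sara",
    "Portuguese": "pf_dora",
    "Japanese": "jf_alpha",
    "Chinese": "zf_xiaoxiao",
    "Hindi": "hf_alpha",
}


def pick_voice(language):
    """Return a voice ID for the language, or the default if unlisted."""
    return VOICE_BY_LANGUAGE.get(language, DEFAULT_VOICE)


print(pick_voice("French"))  # ff_siwis
print(pick_voice("German"))  # af_heart (fallback, not in the table)
```

The result would then be passed as the voice argument, for example speak("Bonjour!", voice=pick_voice("French")).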