ElevenLabs MCP Server

Enable your applications to generate speech, clone voices, and transcribe audio effortlessly. Leverage powerful Text to Speech and audio processing APIs to enhance user interaction and create unique audio experiences. Start transforming your text and audio today with seamless integration into your projects.

Tools

text_to_speech

Convert text to speech with a given voice and save the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop. Only one of voice_id or voice_name can be provided. If neither is provided, the default voice is used.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    text (str): The text to convert to speech.
    voice_name (str, optional): The name of the voice to use.
    stability (float, optional): Stability of the generated audio. Determines how stable the voice is and how much randomness there is between generations. Lower values introduce a broader emotional range for the voice. Higher values can result in a monotonous voice with limited emotion. Range is 0 to 1.
    similarity_boost (float, optional): Similarity boost of the generated audio. Determines how closely the AI should adhere to the original voice when attempting to replicate it. Range is 0 to 1.
    style (float, optional): Style exaggeration of the generated audio. This setting attempts to amplify the style of the original speaker. It consumes additional computational resources and might increase latency if set to anything other than 0. Range is 0 to 1.
    use_speaker_boost (bool, optional): Boosts the similarity to the original speaker. This setting requires a slightly higher computational load, which in turn increases latency.
    speed (float, optional): Speed of the generated speech. The default is 1.0. Lower values create slower, more deliberate speech, while higher values produce faster-paced speech. Extreme values can impact the quality of the generated speech. Range is 0.7 to 1.2.
    output_directory (str, optional): Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns:
    Text content with the path to the output file and the name of the voice used.
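The constraints above can be checked on the client side before a paid call is made. The following is a minimal sketch, not the server's own code: the helper names build_tts_arguments and tools_call_request are hypothetical, "Example Voice" is a placeholder voice name, and the request shape assumes the standard MCP JSON-RPC 2.0 `tools/call` method.

```python
import json

# Ranges copied from the tool description above.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def build_tts_arguments(text, voice_id=None, voice_name=None, **settings):
    """Validate text_to_speech arguments against the documented constraints."""
    # The description allows voice_id or voice_name, but not both.
    if voice_id is not None and voice_name is not None:
        raise ValueError("provide only one of voice_id or voice_name")
    for name, value in settings.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
    arguments = {"text": text, **settings}
    if voice_id is not None:
        arguments["voice_id"] = voice_id
    if voice_name is not None:
        arguments["voice_name"] = voice_name
    return arguments


def tools_call_request(request_id, tool_name, arguments):
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    arguments = build_tts_arguments(
        "Hello there.", voice_name="Example Voice", speed=1.1
    )
    print(json.dumps(tools_call_request(1, "text_to_speech", arguments), indent=2))
```

Rejecting out-of-range values locally avoids spending an API call on a request that is likely to fail.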

speech_to_text

Transcribe speech from an audio file and either save the output text file to a given directory or return the text to the client directly.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    file_path: Path to the audio file to transcribe.
    language_code: ISO 639-3 language code for transcription (default: "eng" for English).
    diarize: Whether to diarize the audio file. If True, the transcription is annotated with which speaker is currently speaking.
    save_transcript_to_file: Whether to save the transcript to a file.
    return_transcript_to_client_directly: Whether to return the transcript to the client directly.
    output_directory: Directory where files should be saved. Defaults to $HOME/Desktop if not provided.

Returns:
    TextContent containing the transcription. If save_transcript_to_file is True, the transcription will be saved to a file in the output directory.
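As an illustration of how a client might assemble these arguments, here is a small sketch. The helper names are hypothetical, and the three-lowercase-letter check is only a shape test for ISO 639-3 codes, not a lookup against the registry.

```python
import re
from pathlib import Path

# Optional argument names taken from the tool description above.
DOCUMENTED_OPTIONS = {
    "diarize",
    "save_transcript_to_file",
    "return_transcript_to_client_directly",
    "output_directory",
}


def build_stt_arguments(file_path, language_code="eng", **options):
    """Assemble speech_to_text arguments, rejecting undocumented names."""
    if not re.fullmatch(r"[a-z]{3}", language_code):
        raise ValueError("language_code must be a three-letter ISO 639-3 code")
    unknown = set(options) - DOCUMENTED_OPTIONS
    if unknown:
        raise ValueError(f"undocumented options: {sorted(unknown)}")
    return {"file_path": str(file_path), "language_code": language_code, **options}


def expected_output_directory(arguments):
    """Directory a saved transcript should land in, per the documented default."""
    return Path(arguments.get("output_directory") or Path.home() / "Desktop")


if __name__ == "__main__":
    arguments = build_stt_arguments(
        "meeting.mp3", diarize=True, save_transcript_to_file=True
    )
    print(arguments)
    print(expected_output_directory(arguments))
```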

text_to_sound_effects

Convert a text description of a sound effect into audio of a given duration and save the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop. Duration must be between 0.5 and 5 seconds.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    text: Text description of the sound effect.
    duration_seconds: Duration of the sound effect in seconds.
    output_directory: Directory where files should be saved. Defaults to $HOME/Desktop if not provided.
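The duration limit is easy to enforce before calling the tool. This sketch uses a hypothetical helper, build_sound_effect_arguments, and treats duration_seconds as required, which the description does not state either way.

```python
# Bounds taken from the tool description above.
MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 5.0


def build_sound_effect_arguments(text, duration_seconds, output_directory=None):
    """Assemble text_to_sound_effects arguments after checking the duration."""
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between {MIN_DURATION_SECONDS} "
            f"and {MAX_DURATION_SECONDS}"
        )
    arguments = {"text": text, "duration_seconds": duration_seconds}
    # Leave output_directory out so the documented default applies.
    if output_directory is not None:
        arguments["output_directory"] = output_directory
    return arguments


if __name__ == "__main__":
    print(build_sound_effect_arguments("door creaking open", 2.5))
```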

search_voices

Search for existing voices, that is, voices that have already been added to the user's ElevenLabs voice library. Searches in name, description, labels and category.

Args:
    search: Search term to filter voices by. Searches in name, description, labels and category.
    sort: Which field to sort by. `created_at_unix` might not be available for older voices.
    sort_direction: Sort order, either ascending or descending.

Returns:
    List of voices that match the search criteria.
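To make the matching rule concrete, the sketch below reproduces it locally over an in-memory list. It is an illustration only, not the server's implementation: the Voice dataclass, the case-insensitive substring matching, the dict shape of labels, and the choice to place voices without a sort value last are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Voice:
    """Hypothetical local stand-in for a voice library entry."""
    name: str
    description: str = ""
    category: str = ""
    labels: dict = field(default_factory=dict)
    created_at_unix: Optional[int] = None  # may be missing for older voices


def matches(voice, search):
    """True if the term appears in the name, description, labels or category."""
    needle = search.lower()
    haystack = [voice.name, voice.description, voice.category, *voice.labels.values()]
    return any(needle in text.lower() for text in haystack)


def search_voices(voices, search, sort="name", descending=False):
    """Filter voices by the search term, then sort by the chosen field."""
    hits = [v for v in voices if matches(v, search)]
    # Voices lacking the sort field (e.g. created_at_unix) go last either way.
    present = [v for v in hits if getattr(v, sort) is not None]
    missing = [v for v in hits if getattr(v, sort) is None]
    present.sort(key=lambda v: getattr(v, sort), reverse=descending)
    return present + missing


if __name__ == "__main__":
    library = [
        Voice("Narrator", description="calm audiobook voice", created_at_unix=200),
        Voice("Announcer", labels={"use_case": "audiobook"}),
    ]
    for voice in search_voices(library, "audiobook", sort="created_at_unix"):
        print(voice.name)
```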
