Audio and Voice
Text-to-speech with ElevenLabs, OpenAI, and Edge TTS. Transcription with Deepgram Nova-3.
Text-to-Speech
Your agents can convert text into natural-sounding speech using three providers:
| Provider | What it offers |
|---|---|
| ElevenLabs | High-quality multilingual voices with realistic tone and emotion. Default model: Multilingual v2. |
| OpenAI TTS | Multiple voice options, fast generation. Uses GPT-4o Mini TTS. |
| Edge TTS | Microsoft's neural voices via Edge. Wide language support, no API key needed. |
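To make the difference between the key-based providers concrete, here is a minimal sketch of the kind of HTTP request an agent might build for each. It assumes OpenAI's `/v1/audio/speech` endpoint with the `gpt-4o-mini-tts` model and ElevenLabs' `/v1/text-to-speech/{voice_id}` endpoint with the `eleven_multilingual_v2` model. The function names and the default `alloy` voice are illustrative choices, not necessarily what the agent uses. Edge TTS needs no key and is usually reached through the open-source `edge-tts` Python package, so it is not shown.

```python
import json
import urllib.request

# Assumed endpoints, based on each provider's public REST API.
OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def openai_tts_request(text: str, api_key: str, voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) an OpenAI TTS request using gpt-4o-mini-tts (hypothetical helper)."""
    body = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def elevenlabs_tts_request(text: str, api_key: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs request using Multilingual v2 (hypothetical helper)."""
    body = {"text": text, "model_id": "eleven_multilingual_v2"}
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = openai_tts_request("Welcome to the show.", api_key="sk-...")
    print(req.get_method(), req.full_url)
    # Sending it with a real key would return raw audio bytes, e.g.:
    # with urllib.request.urlopen(req) as resp:
    #     open("intro.mp3", "wb").write(resp.read())
```

Both builders return an unsent `urllib.request.Request`. You can inspect or log a request before spending API credits, and `urllib.request.urlopen` sends it unchanged.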
Use cases: voiceovers for videos, podcast intros, content narration, audio versions of written content, voice messages.
How to Use TTS
Ask your agent:
- "Read this blog post as a voiceover using ElevenLabs"
- "Create an audio intro for our podcast"
- "Generate a voice narration for this video script"
- "Turn this presentation into an audio walkthrough"
You can specify which provider to use, or let the agent pick based on the task.
Transcription
Your agents transcribe audio and video content using Deepgram Nova-3:
- Transcribe uploaded video or audio files
- Pull transcripts from YouTube URLs, podcast links, and recordings
- Extract key points and summaries from transcribed content
This powers workflows like turning a podcast episode into blog posts, converting a meeting recording into action items, or repurposing video content into written social posts.
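At the API level, a hosted transcription request to Deepgram is typically one HTTP call. A minimal sketch, assuming Deepgram's pre-recorded `/v1/listen` endpoint with `model=nova-3` and a JSON body that points at a directly downloadable audio or video file. A YouTube page URL is not itself a media file, so it would generally need to be resolved to audio first. The function names are hypothetical. The response path `results.channels[0].alternatives[0].transcript` follows Deepgram's documented response shape.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for Deepgram's pre-recorded (non-streaming) transcription API.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def deepgram_url_request(media_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Nova-3 request for a hosted media file (hypothetical helper)."""
    # smart_format adds punctuation and formatting to the transcript.
    query = urllib.parse.urlencode({"model": "nova-3", "smart_format": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=json.dumps({"url": media_url}).encode("utf-8"),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Return the top alternative's transcript from the first channel (hypothetical helper)."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


if __name__ == "__main__":
    req = deepgram_url_request("https://example.com/episode.mp3", api_key="DEEPGRAM_KEY")
    print(req.full_url)
    # With a real key: extract_transcript(json.load(urllib.request.urlopen(req)))
```

The extracted text is the starting point for the summaries, posts, and action items described above.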
How to Use Transcription
Ask your agent:
- "Transcribe this video and summarize the key points"
- "Pull the transcript from this YouTube URL"
- "Turn this podcast episode into 5 LinkedIn posts"
- "Transcribe this meeting recording and extract the action items"
Transcription and voice generation work together: record a video, transcribe it, turn it into written content, then generate audio versions for different platforms, all in one conversation.

