VibeyDocs

Audio and Voice

Text-to-speech with ElevenLabs, OpenAI, and Edge TTS. Transcription with Deepgram Nova-3.

Text-to-Speech

Your agents can convert text into natural-sounding speech using three providers:

Provider      What it offers
ElevenLabs    High-quality multilingual voices with realistic tone and emotion. Default voice: Multilingual V2.
OpenAI TTS    Multiple voice options, fast generation. Uses GPT-4o Mini TTS.
Edge TTS      Microsoft's neural voices via Edge. Wide language support, no API key needed.

Use cases: voiceovers for videos, podcast intros, content narration, audio versions of written content, voice messages.
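For readers curious what a provider call might look like under the hood, here is a minimal sketch of a direct request to OpenAI's speech endpoint using the GPT-4o Mini TTS model named above. It only builds the request and does not send it. The helper name, the "alloy" voice choice, and the placeholder key are assumptions for illustration; how your agent actually calls the provider is not documented here.

```python
import json
import urllib.request

# OpenAI's public text-to-speech endpoint.
OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_openai_tts_request(text: str, api_key: str, voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request. Hypothetical helper."""
    payload = {
        "model": "gpt-4o-mini-tts",  # the model this page says OpenAI TTS uses
        "input": text,               # the text to be spoken
        "voice": voice,              # assumed voice; pick any voice the API offers
    }
    return urllib.request.Request(
        OPENAI_SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The key below is a placeholder, not a real credential.
request = build_openai_tts_request("Welcome to the show.", api_key="sk-placeholder")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and a real key would return audio bytes that you could write to a file.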

How to Use TTS

Ask your agent:

  • "Read this blog post as a voiceover using ElevenLabs"
  • "Create an audio intro for our podcast"
  • "Generate a voice narration for this video script"
  • "Turn this presentation into an audio walkthrough"

You can specify which provider to use, or let the agent pick based on the task.
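To make the "specify or let the agent pick" behavior concrete, here is a rough sketch of one way such a choice could work. The rules are assumptions drawn from the provider table (Edge TTS needs no API key, OpenAI TTS is fast, ElevenLabs is the high-quality option). The `TtsTask` fields and the `pick_provider` function are hypothetical and do not describe the agent's real logic.

```python
from dataclasses import dataclass
from typing import Optional

PROVIDERS = ("elevenlabs", "openai", "edge")


@dataclass
class TtsTask:
    """Hypothetical description of a text-to-speech job."""
    text: str
    requested_provider: Optional[str] = None  # set when the user names a provider
    needs_speed: bool = False                 # e.g. a quick draft or short voice message
    has_api_keys: bool = True                 # False when no keyed provider is configured


def pick_provider(task: TtsTask) -> str:
    """Honor an explicit request first, then fall back to simple assumed rules."""
    if task.requested_provider is not None:
        name = task.requested_provider.lower()
        if name not in PROVIDERS:
            raise ValueError(f"unknown provider: {task.requested_provider}")
        return name
    if not task.has_api_keys:
        return "edge"        # Edge TTS works without an API key
    if task.needs_speed:
        return "openai"      # described above as fast generation
    return "elevenlabs"      # default to the high-quality multilingual voices


print(pick_provider(TtsTask("Read this blog post", requested_provider="ElevenLabs")))  # elevenlabs
print(pick_provider(TtsTask("Quick voice message", needs_speed=True)))                 # openai
print(pick_provider(TtsTask("Narration draft", has_api_keys=False)))                   # edge
```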

Transcription

Your agents transcribe audio and video content using Deepgram Nova-3:

  • Transcribe uploaded video or audio files
  • Pull transcripts from YouTube URLs, podcast links, and recordings
  • Extract key points and summaries from transcribed content

This powers workflows like turning a podcast episode into blog posts, converting a meeting recording into action items, or repurposing video content into written social posts.
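As an illustration of the transcription step, the sketch below builds a request to Deepgram's pre-recorded audio endpoint with the Nova-3 model and pulls the transcript out of a response. It assumes a direct link to an audio file; fetching audio from YouTube or podcast pages would need an extra step that is not shown. The helper names, the example URL, and the sample response are made up for illustration, and the request is never sent.

```python
import json
import urllib.parse
import urllib.request

# Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Nova-3 transcription request. Hypothetical helper."""
    query = urllib.parse.urlencode({"model": "nova-3", "smart_format": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response_body: dict) -> str:
    """Return the top transcript alternative from the first audio channel."""
    return response_body["results"]["channels"][0]["alternatives"][0]["transcript"]


# Placeholder URL and key, for illustration only.
request = build_transcription_request("https://example.com/episode-12.mp3", api_key="dg-placeholder")
print(request.full_url)

# A hand-written sample in the shape of a Deepgram response, not real output.
sample_response = {
    "results": {"channels": [{"alternatives": [{"transcript": "Welcome back to the show."}]}]}
}
print(extract_transcript(sample_response))  # Welcome back to the show.
```

Once you have the transcript as plain text, summarizing it or extracting action items is an ordinary text task for the agent.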

How to Use Transcription

Ask your agent:

  • "Transcribe this video and summarize the key points"
  • "Pull the transcript from this YouTube URL"
  • "Turn this podcast episode into 5 LinkedIn posts"
  • "Transcribe this meeting recording and extract the action items"

Transcription and voice generation work together. Record a video, transcribe it, turn it into written content, then generate audio versions for different platforms, all in one conversation.
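The combined workflow can be pictured as a small pipeline: transcribe, write, then synthesize. The sketch below wires the three steps together with stand-in functions so it runs without any provider. Every name in it (`repurpose`, `RepurposeResult`, and the fake step functions) is hypothetical; in practice the agent handles these steps for you in conversation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RepurposeResult:
    transcript: str
    posts: List[str]
    audio: Dict[str, bytes]  # platform name -> generated audio bytes


def repurpose(
    recording: bytes,
    platforms: List[str],
    transcribe: Callable[[bytes], str],
    write_post: Callable[[str, str], str],
    synthesize: Callable[[str], bytes],
) -> RepurposeResult:
    """Transcribe a recording, draft one post per platform, then voice each post."""
    transcript = transcribe(recording)
    posts = [write_post(transcript, platform) for platform in platforms]
    audio = {platform: synthesize(post) for platform, post in zip(platforms, posts)}
    return RepurposeResult(transcript, posts, audio)


# Stand-ins for the real transcription, writing, and TTS steps.
def fake_transcribe(recording: bytes) -> str:
    return "We shipped voice features this week."


def fake_write_post(transcript: str, platform: str) -> str:
    return f"[{platform}] {transcript}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # real TTS would return audio, not encoded text


result = repurpose(
    b"raw video bytes",
    ["linkedin", "podcast"],
    fake_transcribe,
    fake_write_post,
    fake_synthesize,
)
print(result.posts[0])        # [linkedin] We shipped voice features this week.
print(sorted(result.audio))   # ['linkedin', 'podcast']
```

Passing the steps in as functions keeps the pipeline independent of any one provider, so a different TTS or transcription service can be swapped in without changing `repurpose`.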