
Built on a new architecture, Eleven v3 brings unprecedented realism and control to speech generation. It can shift tone mid-sentence, move seamlessly between characters, and respond to cues like [whispers], [laughs], and [sighs], all without breaking flow.
This is an alpha release. It requires more prompt engineering than previous models, but the output is a step-change in expressiveness, nuance, and human realism. We have increased our language support from 33 to 70+ languages, and further fine-tuning will improve reliability and controllability.
Listen to samples made with Eleven v3 at elevenlabs.io/v3. What's new:
70+ languages: Expanding from 33 to over 70 languages—growing our coverage from 60% to 90% of the world’s population.
Dialogue Mode: Handles natural interruptions, tone shifts, and emotional flow across multiple speakers.
Audio tags: Guide delivery with tags like [whispers], [angry], [laughs], or [door creaks], controlling performance in fine detail (see the sketch after this list).
Available now: Eleven v3 (alpha) is available at https://elevenlabs.io
Streaming support: Coming soon for call centers and real-time conversational agents.
Public API: Coming soon for Eleven v3 (alpha). For early access, please contact sales.
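Since the public API for v3 is still to come, the following is only a minimal sketch of how inline audio tags could be sent through the existing ElevenLabs text-to-speech endpoint once v3 is exposed there. The endpoint, header, and request shape follow the current ElevenLabs API; the model identifier ("eleven_v3") and the tag behavior over the API are assumptions, not confirmed details from this announcement.

```python
# Hypothetical sketch: embedding audio tags in a text-to-speech request.
# The v3 public API is not yet released; "eleven_v3" is an assumed model id.
import requests

API_KEY = "your-api-key"    # placeholder: key from your ElevenLabs account
VOICE_ID = "your-voice-id"  # placeholder: any voice from your voice library

# Audio tags like [whispers] and [laughs] are written inline in the text,
# exactly where the delivery should change.
text = (
    "[whispers] I wasn't supposed to tell you this... "
    "[laughs] but the new model is finally here!"
)

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": text, "model_id": "eleven_v3"},  # model id is an assumption
)
response.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

Because the tags ride along inside the text itself, no new request schema is needed: the model, not the API, interprets them.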
v3 is designed for creators, developers, and enterprises creating stories, audiobooks, character dialogue, and interactive media that demand expressive speech. The model rewards experimentation and context-aware prompting.
For real-time, low-latency use cases like Conversational AI, we recommend continuing with our v2.5 Turbo and Flash models for now. A real-time version of v3 is in development.
Eleven v3 brings expressive control to voice generation—enabling true performances instead of simple readings. It can shift emotion, modulate delivery, and move fluidly between characters in a single generation. For the first time, AI speech can follow the rhythm and emotional nuance of human conversation—across more than 70 languages.
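As an illustration of that kind of single-generation performance, here is a sketch of what a tag-annotated, multi-speaker script might look like. The bracketed audio tags are the ones named in this post; the "Speaker 1 / Speaker 2" labeling convention is illustrative, not a documented format.

```
Speaker 1: [whispers] Did you hear that?
Speaker 2: [laughs] It's just the wind. [sighs] You're always so jumpy.
Speaker 1: [angry] I am not jumpy! ... [door creaks] Okay. What was that?
```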
“Eleven v3 is the most expressive Text to Speech model ever - offering full control over emotions, delivery, and nonverbal cues. With audio tags, you can prompt it to whisper, laugh, change accents, or even sing. You can control the pacing, emotion, and style to match any script. And with our global mission, we are happy to extend the model with support for over 70 languages. This release is the result of the vision and leadership of my co-founder Piotr, and the incredible research team he’s built. Creating a good product is hard - creating an entirely new paradigm is almost impossible. I, and all of us at ElevenLabs, feel lucky to witness the magic this team brings to life - and with this release, we're excited to push the frontier once again.”
— Mati Staniszewski, Co-Founder & CEO, ElevenLabs
Eleven v3 is available now at elevenlabs.io/v3 and is 80% off in the UI for the month of June. Explore the new capabilities and listen to samples there.