elevenlabs/v2-multilingual

Generate multilingual text-to-speech audio in 29 languages

ElevenLabs Multilingual v2 is a lifelike, emotionally expressive speech synthesis model that produces natural-sounding speech with rich emotional range across 29 languages. It delivers consistent voice quality and personality across all supported languages while maintaining each speaker’s unique characteristics.

Multilingual v2 is the highest-quality model in the ElevenLabs lineup — ideal for professional content where natural, nuanced speech matters most.

Key features

  • High emotional range: Rich, contextually aware speech with natural expressiveness
  • 29 languages: Broad multilingual support with consistent voice quality
  • 10,000 character limit: Generate up to ~10 minutes of audio per request
  • Stable long-form output: Most consistent quality on longer generations
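
Text longer than the 10,000 character limit has to be split across several requests. A minimal sketch of one way to do that, assuming sentence-ending punctuation (., !, ?) is an acceptable split point; the names `split_for_tts` and `MAX_CHARS` are hypothetical and not part of the model's API:

```python
import re

MAX_CHARS = 10_000  # per-request character limit listed under "Key features"


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        # Sentences are rejoined with a single space.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_tts("One. Two! Three?", max_chars=10):
        print(repr(chunk))
```

The regular expression only recognizes Latin-style punctuation, so text in languages such as Japanese or Mandarin Chinese would fall through to the hard cut and would need a different boundary rule.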

Supported languages (29)

Code  Language          Code  Language
en    English           pl    Polish
ja    Japanese          sv    Swedish
zh    Mandarin Chinese  bg    Bulgarian
de    German            ro    Romanian
hi    Hindi             ar    Arabic
fr    French            cs    Czech
ko    Korean            el    Greek
pt    Portuguese        fi    Finnish
it    Italian           hr    Croatian
es    Spanish           ms    Malay
id    Indonesian        sk    Slovak
nl    Dutch             da    Danish
tr    Turkish           ta    Tamil
fil   Filipino          uk    Ukrainian
ru    Russian

Inputs

Parameter         Type    Default  Description
prompt            string  -        The text to convert to speech
voice             string  Rachel   Voice choice for speech generation
language_code     string  en       Language code (e.g., en, es, fr)
stability         number  0.5      Voice consistency (0.0–1.0)
similarity_boost  number  0.75     Similarity to the original voice (0.0–1.0)
style             number  0        Style exaggeration (0.0–1.0)
speed             number  1        Speed of speech (0.7–1.2)
previous_text     string  -        Previous text for context
next_text         string  -        Next text for context
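
The parameters above can be assembled and checked locally before a request is sent. The following is a sketch only: `build_input`, `RANGES`, `SUPPORTED_LANGUAGES`, and `MAX_PROMPT_CHARS` are hypothetical names, the ranges are assumed to be inclusive, and the 10,000 character limit is assumed to apply to `prompt`. How the resulting dictionary is submitted depends on the hosting platform's client and is not shown.

```python
from typing import Optional

# Language codes copied from the "Supported languages" table.
SUPPORTED_LANGUAGES = {
    "en", "ja", "zh", "de", "hi", "fr", "ko", "pt", "it", "es",
    "id", "nl", "tr", "fil", "ru", "pl", "sv", "bg", "ro", "ar",
    "cs", "el", "fi", "hr", "ms", "sk", "da", "ta", "uk",
}

# Allowed ranges copied from the parameter table (assumed inclusive).
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}

MAX_PROMPT_CHARS = 10_000  # assumption: the character limit applies to `prompt`


def build_input(
    prompt: str,
    voice: str = "Rachel",
    language_code: str = "en",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    speed: float = 1.0,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> dict:
    """Return a validated input dictionary using the documented defaults."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language_code: {language_code!r}")

    payload = {
        "prompt": prompt,
        "voice": voice,
        "language_code": language_code,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Context fields are optional and only included when provided.
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    return payload


if __name__ == "__main__":
    print(build_input("Hola, mundo.", language_code="es", speed=1.1))
```

When a long text is generated in several requests, passing the neighboring segments as `previous_text` and `next_text` gives the model the surrounding context for each segment.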

Use cases

  • Audiobooks and narration: Stable, high-quality output for long-form content
  • Character voiceovers: Rich emotional range for gaming and animation
  • Professional content: Corporate videos, e-learning, and media production
  • Multilingual projects: Consistent voice quality across language switches

Choosing between ElevenLabs models

  • Multilingual v2: Highest quality, best for professional content and audiobooks
  • Turbo v2.5: Balanced quality and speed (~250ms), 32 languages, 40,000 character limit
  • Flash v2.5: Fastest (~75ms), best for real-time and cost-sensitive use cases
  • v3: Most expressive, with 70+ languages and multi-speaker dialogue support