ElevenLabs Multilingual v2 is a lifelike, emotionally expressive speech synthesis model that produces natural-sounding speech with rich emotional range across 29 languages. It delivers consistent voice quality and personality across all supported languages while maintaining each speaker’s unique characteristics.
Multilingual v2 is the highest-quality model in the ElevenLabs lineup, ideal for professional content where natural, nuanced speech matters most.
Key features
- High emotional range: Rich, contextually aware speech with natural expressiveness
- 29 languages: Broad multilingual support with consistent voice quality
- 10,000 character limit: Generate up to ~10 minutes of audio per request
- Stable long-form output: Most consistent quality on longer generations
Supported languages (29)
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | pl | Polish |
| ja | Japanese | sv | Swedish |
| zh | Mandarin Chinese | bg | Bulgarian |
| de | German | ro | Romanian |
| hi | Hindi | ar | Arabic |
| fr | French | cs | Czech |
| ko | Korean | el | Greek |
| pt | Portuguese | fi | Finnish |
| it | Italian | hr | Croatian |
| es | Spanish | ms | Malay |
| id | Indonesian | sk | Slovak |
| nl | Dutch | da | Danish |
| tr | Turkish | ta | Tamil |
| fil | Filipino | uk | Ukrainian |
| ru | Russian | | |
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | (none) | The text to convert to speech |
| voice | string | Rachel | Voice choice for speech generation |
| language_code | string | en | Language code (e.g., en, es, fr) |
| stability | number | 0.5 | Voice consistency (0.0–1.0) |
| similarity_boost | number | 0.75 | Similarity to the original voice (0.0–1.0) |
| style | number | 0 | Style exaggeration (0.0–1.0) |
| speed | number | 1 | Speed of speech (0.7–1.2) |
| previous_text | string | (none) | Previous text for context |
| next_text | string | (none) | Next text for context |
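The parameters above can be assembled and range-checked before a request is sent. The following is a minimal sketch, not an official client: `build_input` is a hypothetical helper, and it assumes that `prompt` is required and that the service accepts the inputs as a flat JSON-style object. The parameter names, defaults, and ranges are taken from the table, and the 10,000-character check comes from the limit listed under Key features.

```python
from typing import Any, Dict, Optional

# Allowed ranges copied from the Inputs table.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}
MAX_CHARS = 10_000  # per-request character limit listed under Key features


def build_input(
    prompt: str,
    voice: str = "Rachel",
    language_code: str = "en",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    speed: float = 1.0,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the inputs and return a JSON-ready dict."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"prompt exceeds {MAX_CHARS} characters")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "voice": voice,
        "language_code": language_code,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }

    # Reject numeric settings that fall outside the documented ranges.
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Optional context fields are only included when provided.
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    return payload
```

For example, `build_input("Hola, mundo", language_code="es", speed=1.1)` returns a dict that uses the table defaults for every other setting, while `speed=1.5` raises a `ValueError` before any request is made.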
Use cases
- Audiobooks and narration: Stable, high-quality output for long-form content
- Character voiceovers: Rich emotional range for gaming and animation
- Professional content: Corporate videos, e-learning, and media production
- Multilingual projects: Consistent voice quality across language switches
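For long-form work such as audiobooks, text longer than the 10,000-character limit has to be split across several requests. The sketch below shows one possible approach: it packs whole sentences into chunks and attaches neighbouring text through `previous_text` and `next_text`. The helper names, the sentence-splitting regex, and the 500-character context window are illustrative assumptions, not documented behaviour. It also assumes the context fields do not count toward the character limit, which the source does not state.

```python
import re
from typing import Dict, List

MAX_CHARS = 10_000    # per-request limit listed under Key features
CONTEXT_CHARS = 500   # assumption: how much neighbouring text to pass as context


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_requests(text: str, limit: int = MAX_CHARS) -> List[Dict[str, str]]:
    """Return one input dict per chunk, with neighbouring text as context."""
    chunks = split_text(text, limit)
    requests: List[Dict[str, str]] = []
    for i, chunk in enumerate(chunks):
        request = {"prompt": chunk}
        if i > 0:
            # Tail of the preceding chunk.
            request["previous_text"] = chunks[i - 1][-CONTEXT_CHARS:]
        if i + 1 < len(chunks):
            # Head of the following chunk.
            request["next_text"] = chunks[i + 1][:CONTEXT_CHARS]
        requests.append(request)
    return requests
```

Each returned dict can be merged with the remaining inputs (`voice`, `stability`, and so on) before sending. Splitting on sentence boundaries keeps each request self-contained; splitting on paragraphs may be a reasonable alternative for narration.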
Choosing between ElevenLabs models
- Multilingual v2: Highest quality, best for professional content and audiobooks
- Turbo v2.5: Balanced quality and speed (~250 ms latency), 32 languages, 40,000 character limit
- Flash v2.5: Fastest (~75 ms latency), best for real-time and cost-sensitive use cases
- v3: Most expressive, with 70+ languages and multi-speaker dialogue support