ElevenLabs Multilingual v2 text to speech API | Replicate

ElevenLabs Multilingual v2 is a speech synthesis model that produces lifelike, emotionally expressive speech in 29 languages. It keeps voice quality and personality consistent across all supported languages while preserving each speaker’s unique characteristics.

Multilingual v2 is the highest-quality model in the ElevenLabs lineup — ideal for professional content where natural, nuanced speech matters most.

Key features

High emotional range: Rich, contextually aware speech with natural expressiveness
29 languages: Broad multilingual support with consistent voice quality
10,000 character limit: Generate up to ~10 minutes of audio per request
Stable long-form output: Most consistent quality on longer generations
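
Text longer than the 10,000 character limit has to be split across several requests. The following is a minimal sketch of one way to do that, preferring sentence boundaries. The helper name `split_text` and the splitting rule are illustrative assumptions, not part of the model's API.

```python
import re

MAX_CHARS = 10_000  # per-request character limit stated in the feature list


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper for illustration only.
    """
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence over the limit is hard-wrapped at max_chars.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request. Splitting on `.`, `!`, and `?` is a simplification and will not suit every language in the supported list.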

Supported languages (29)

Code	Language	Code	Language
`en`	English	`pl`	Polish
`ja`	Japanese	`sv`	Swedish
`zh`	Mandarin Chinese	`bg`	Bulgarian
`de`	German	`ro`	Romanian
`hi`	Hindi	`ar`	Arabic
`fr`	French	`cs`	Czech
`ko`	Korean	`el`	Greek
`pt`	Portuguese	`fi`	Finnish
`it`	Italian	`hr`	Croatian
`es`	Spanish	`ms`	Malay
`id`	Indonesian	`sk`	Slovak
`nl`	Dutch	`da`	Danish
`tr`	Turkish	`ta`	Tamil
`fil`	Filipino	`uk`	Ukrainian
`ru`	Russian
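
A client may want to check a language code against this table before sending a request. The sketch below copies the 29 codes from the table; the function name `normalize_language_code` is a hypothetical helper, and whether the API itself rejects other codes is not stated here.

```python
# Language codes copied from the table above (29 entries).
SUPPORTED_LANGUAGES = {
    "en": "English", "ja": "Japanese", "zh": "Mandarin Chinese",
    "de": "German", "hi": "Hindi", "fr": "French", "ko": "Korean",
    "pt": "Portuguese", "it": "Italian", "es": "Spanish",
    "id": "Indonesian", "nl": "Dutch", "tr": "Turkish",
    "fil": "Filipino", "ru": "Russian", "pl": "Polish",
    "sv": "Swedish", "bg": "Bulgarian", "ro": "Romanian",
    "ar": "Arabic", "cs": "Czech", "el": "Greek", "fi": "Finnish",
    "hr": "Croatian", "ms": "Malay", "sk": "Slovak", "da": "Danish",
    "ta": "Tamil", "uk": "Ukrainian",
}


def normalize_language_code(code: str) -> str:
    """Return the lowercase code if it is in the table, else raise ValueError.

    Hypothetical client-side check for illustration only.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```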

Inputs

Parameter	Type	Default	Description
`prompt`	string	—	The text to convert to speech
`voice`	string	`Rachel`	Voice choice for speech generation
`language_code`	string	`en`	Language code (e.g., `en`, `es`, `fr`)
`stability`	number	`0.5`	Voice consistency (0.0–1.0)
`similarity_boost`	number	`0.75`	Similarity to the original voice (0.0–1.0)
`style`	number	`0`	Style exaggeration (0.0–1.0)
`speed`	number	`1`	Speed of speech (0.7–1.2)
`previous_text`	string	—	Previous text for context
`next_text`	string	—	Next text for context
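
As a rough illustration of how these inputs fit together, the sketch below builds an input dictionary with the defaults and ranges from the table and passes it to `replicate.run` from the Replicate Python client. The model identifier `elevenlabs/v2-multilingual` and the helper names `build_input` and `synthesize` are assumptions; check the model page for the exact identifier.

```python
from typing import Any, Optional

# Assumed model identifier; verify it against the model page before use.
MODEL = "elevenlabs/v2-multilingual"

# (min, max) ranges copied from the Inputs table.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def build_input(
    prompt: str,
    voice: str = "Rachel",
    language_code: str = "en",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    speed: float = 1.0,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> dict[str, Any]:
    """Build and range-check an input payload (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    payload: dict[str, Any] = {
        "prompt": prompt,
        "voice": voice,
        "language_code": language_code,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Optional context fields are sent only when provided.
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    return payload


def synthesize(payload: dict[str, Any]) -> Any:
    """Run the model; needs `pip install replicate` and REPLICATE_API_TOKEN."""
    import replicate  # third-party client, imported lazily

    return replicate.run(MODEL, input=payload)
```

A call such as `synthesize(build_input("Hello, world.", language_code="en"))` would then submit the request. Validating ranges on the client is a design choice that surfaces mistakes before a network call is made.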

Use cases

Audiobooks and narration: Stable, high-quality output for long-form content
Character voiceovers: Rich emotional range for gaming and animation
Professional content: Corporate videos, e-learning, and media production
Multilingual projects: Consistent voice quality across language switches
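
For long-form work such as audiobooks, one plausible approach is to send the text in pieces and pass the neighboring pieces through the `previous_text` and `next_text` inputs. The sketch below assumes those inputs accept a short excerpt of adjacent text; the helper name `with_context` and the 500-character excerpt length are illustrative choices, not documented values.

```python
def with_context(chunks: list[str], context_chars: int = 500) -> list[dict[str, str]]:
    """Pair each chunk with excerpts of its neighbors (hypothetical helper)."""
    requests: list[dict[str, str]] = []
    for i, chunk in enumerate(chunks):
        item = {"prompt": chunk}
        if i > 0:
            # Tail of the preceding chunk.
            item["previous_text"] = chunks[i - 1][-context_chars:]
        if i < len(chunks) - 1:
            # Head of the following chunk.
            item["next_text"] = chunks[i + 1][:context_chars]
        requests.append(item)
    return requests
```

Each dictionary can be merged with the remaining inputs (`voice`, `language_code`, and so on) before it is submitted.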

Choosing between ElevenLabs models

Multilingual v2: Highest quality, best for professional content and audiobooks
Turbo v2.5: Balanced quality and speed (~250 ms latency), 32 languages, 40,000 character limit
Flash v2.5: Fastest (~75 ms latency), best for real-time and cost-sensitive use cases
v3: Most expressive, with 70+ languages and multi-speaker dialogue support
