Transcribe audio to text in multiple languages. Whether you need a quick transcript, speaker labels, or real-time transcription, there's a model here for it.
GPT-4o Transcribe uses GPT-4o to transcribe audio with the best word error rate available. It handles accents, technical terms, and noisy audio better than traditional Whisper models. Supports optional text prompts to guide transcription style.
Incredibly Fast Whisper lives up to its name — it can transcribe 150 minutes of audio in under 2 minutes. Supports 98 languages with optional speaker diarization and word-level timestamps. The go-to choice for high-volume transcription.
Need to know who said what? WhisperX adds speaker diarization and word-level timestamps on top of Whisper Large v3. 70x faster than real-time. Great for meetings, interviews, and podcasts.
GPT-4o Mini Transcribe gives you GPT-4o-level transcription quality at a lower price point. Same improved accuracy for accents and technical vocabulary, just at a more cost-effective tier.
SeamlessM4T handles speech-to-speech translation, speech-to-text translation, text-to-speech, and automatic speech recognition in one model. Use it when you need to translate between languages, not just transcribe.
You can also check out our speaker diarization collection for models that identify speakers from audio and video.
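As a rough illustration of the optional text prompt mentioned for GPT-4o Transcribe, the sketch below builds an input payload and passes it to the Replicate Python client. The input field names `audio_file` and `prompt` are assumptions, as is the injectable `run` parameter (added here only so the function can be exercised without network access); check the model page for the actual input schema.

```python
from typing import Any, Callable, Dict, Optional

MODEL = "openai/gpt-4o-transcribe"  # model name as listed on this page


def build_input(audio_url: str, prompt: Optional[str] = None) -> Dict[str, Any]:
    """Build the model input. Field names are assumptions, not a confirmed schema."""
    payload: Dict[str, Any] = {"audio_file": audio_url}
    if prompt:
        # Optional hint, e.g. domain vocabulary the transcript should spell correctly.
        payload["prompt"] = prompt
    return payload


def transcribe(
    audio_url: str,
    prompt: Optional[str] = None,
    run: Optional[Callable[..., Any]] = None,
) -> Any:
    """Run the model and return whatever the client returns."""
    if run is None:
        # Imported lazily so the helper above works without the client installed.
        # Requires `pip install replicate` and the REPLICATE_API_TOKEN env variable.
        import replicate

        run = replicate.run
    return run(MODEL, input=build_input(audio_url, prompt))
```

A call might look like `transcribe("https://example.com/meeting.wav", prompt="Kubernetes, gRPC")`, where the URL is a placeholder. The shape of the returned value depends on the model, so it is passed through unchanged here.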
Featured models

openai/gpt-4o-transcribe
A speech-to-text model that uses GPT-4o to transcribe audio
Updated 5 months, 2 weeks ago
41.2K runs

victor-upmeet/whisperx
Accelerated transcription, word-level timestamps and diarization with whisperX large-v3
Updated 1 year, 7 months ago
7.2M runs

vaibhavs10/incredibly-fast-whisper
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗
Updated 2 years, 2 months ago
31.6M runs
Recommended Models
If speed is your top priority, vaibhavs10/incredibly-fast-whisper and openai/gpt-4o-transcribe are among the fastest models in the speech-to-text collection. They’re designed for low-latency transcription, which makes them ideal for live or near real-time scenarios like voice notes, quick interviews, or interactive applications.
Keep in mind that some faster models leave out advanced features like speaker labels or word-level timestamps, or offer them only as optional settings, so check the model page before relying on them.
openai/whisper is a reliable general-purpose option that works well with clean audio and single-speaker recordings. It offers multilingual support and solid accuracy for most everyday transcription needs.
If you need more structure—like timestamps or speaker labels—victor-upmeet/whisperx adds those capabilities without a massive jump in runtime.
For clear recordings like lectures, podcasts, or voice memos, vaibhavs10/incredibly-fast-whisper or openai/whisper are great choices. They deliver accurate transcripts quickly and handle common accents well.
If your audio includes multiple speakers—like team meetings, interviews, or panel discussions—victor-upmeet/whisperx is your best bet. It adds speaker diarization and word-level timestamps so you can keep track of who said what.
If you need transcription in multiple languages or want translation built in, cjwbw/seamless_communication is a strong option. It handles speech recognition and translation in one model, and can cope with more complex audio such as mixed-language conversations.
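The recommendations above can be sketched as a small lookup helper. This is only an illustration of the guidance on this page: the function name, its parameters, and the order in which the conditions are checked are assumptions, and real projects may weigh cost or accuracy differently.

```python
def pick_model(
    multiple_speakers: bool = False,
    needs_translation: bool = False,
    speed_first: bool = False,
) -> str:
    """Return a model name from this collection based on simple requirements.

    The priority order (translation, then speakers, then speed) is an
    assumption made for this sketch, not a rule stated by the collection.
    """
    if needs_translation:
        # Translation between languages, not just transcription.
        return "cjwbw/seamless_communication"
    if multiple_speakers:
        # Speaker diarization and word-level timestamps.
        return "victor-upmeet/whisperx"
    if speed_first:
        # High-volume or time-sensitive transcription.
        return "vaibhavs10/incredibly-fast-whisper"
    # General-purpose default for clean, single-speaker audio.
    return "openai/whisper"
```

For a team meeting recording, `pick_model(multiple_speakers=True)` would select victor-upmeet/whisperx, matching the advice for multi-speaker audio.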
Most models produce plain text transcripts. Some also include word-level timestamps or speaker labels.
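Where a model returns timestamped segments with speaker labels, the output usually needs a little formatting before it reads like a transcript. The sketch below assumes a hypothetical segment shape (a list of dictionaries with `start`, `text`, and an optional `speaker` key); actual output schemas differ between models, so treat the field names as placeholders.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_transcript(segments: List[Dict[str, Any]]) -> str:
    """Turn segments into one '[MM:SS] SPEAKER: text' line per segment.

    The segment keys used here are hypothetical, chosen for illustration.
    """
    lines = []
    for seg in segments:
        # Fall back to a placeholder when the model did not label the speaker.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        lines.append(f"[{format_timestamp(seg['start'])}] {speaker}: {text}")
    return "\n".join(lines)
```

Keeping the formatting separate from the model call makes it easy to swap in a different model later and adjust only the key names.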
You can package your own model with Cog and push it to Replicate. This lets you control how it’s run, updated, and shared, whether you’re adapting an open-source model or deploying a fine-tuned one.
Many models in the speech-to-text collection allow commercial use, but licenses vary. Some models have conditions or attribution requirements, so always check the model page before using transcripts in commercial projects.
Recommended Models

Google's most advanced reasoning Gemini model
Updated 4 months, 4 weeks ago
1.1M runs

openai/gpt-4o-mini-transcribe
A speech-to-text model that uses GPT-4o mini to transcribe audio
Updated 5 months, 2 weeks ago
14.5K runs

thomasmol/whisper-diarization
⚡️ Blazing fast audio transcription with speaker diarization | Whisper Large V3 Turbo | word & sentence level timestamps | prompt
Updated 1 year, 2 months ago
7.5M runs

openai/whisper
Convert speech in audio to text
Updated 1 year, 4 months ago
143.8M runs

🗣️ Nvidia + Suno.ai's speech-to-text conversion with high accuracy and efficiency 📝
Updated 2 years, 3 months ago
24.8K runs

adidoes/whisperx-video-transcribe
ASR from video URL based on whisperx using large-v2 model
Updated 2 years, 7 months ago
19.7K runs

cjwbw/seamless_communication
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 7 months ago
107.7K runs

daanelson/whisperx
Accelerated transcription of audio using WhisperX
Updated 2 years, 9 months ago
94.6K runs

m1guelpf/whisper-subtitles
Generate subtitles from an audio file, using OpenAI's Whisper model.
Updated 3 years, 6 months ago
74K runs