Voxtral: The Affordable Open-Source Speech Recognition Model from Mistral Shaking Up the Market

The French AI company Mistral has launched Voxtral, an open-source speech recognition model. It is pitched as an alternative to proprietary solutions at half the cost.

Voxtral comes in two variants: Voxtral Small, a 24B-parameter model aimed at production-scale applications, and Voxtral Mini, a compact 3B model designed for local and edge deployment. Both have a 32,000-token context window, which Mistral says is enough to handle audio of up to 30 minutes for transcription and up to 40 minutes for comprehension.

Unlike traditional transcription tools, Voxtral lets users ask questions about audio content and generate summaries directly from it, without chaining separate speech recognition and language models. Voice commands can also trigger backend functions directly, with the model converting spoken requests into API calls.
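
As a rough illustration of the voice-to-API idea, here is a minimal sketch of the dispatch step only, assuming the model has already turned a spoken request into a structured tool call. The JSON shape, the `set_thermostat` function, and the `TOOLS` registry are hypothetical names chosen for this example. They are not taken from Mistral's API.

```python
import json

def set_thermostat(room: str, celsius: float) -> str:
    """Hypothetical backend function that a voice command might trigger."""
    return f"{room} set to {celsius:.1f} C"

# Registry of functions the application exposes (illustrative names only).
TOOLS = {"set_thermostat": set_thermostat}

def dispatch(tool_call_json: str) -> str:
    """Look up and run the function named in a model-produced tool call."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Assumed model output for the spoken request
# "set the living room to 21 degrees" (the format is an assumption).
example = '{"name": "set_thermostat", "arguments": {"room": "living room", "celsius": 21}}'
print(dispatch(example))
```

In a real integration, the structured call would come from the model's function-calling response rather than from a hard-coded string.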

The models support automatic speech recognition in English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, among other languages, while retaining the text comprehension capabilities of their language model backbone, Mistral Small 3.1.

According to Mistral’s own tests, Voxtral Small outperforms leading open-source models such as Whisper large-v3, as well as GPT-4o mini Transcribe and Gemini 2.5 Flash, across all evaluated tasks. In short-form English transcription and on the Mozilla Common Voice benchmark, it reportedly also surpasses ElevenLabs Scribe, one of the strongest models available.

In multilingual speech recognition tests on the FLEURS benchmark, Voxtral Small is said to beat Whisper in all nine tested languages. In audio comprehension, its results are comparable to those of GPT-4o mini and Gemini 2.5 Flash, and Mistral also reports top-tier speech translation quality.

Mistral positions Voxtral as an affordable option, with API pricing starting at $0.001 per minute. The company claims that Voxtral Mini Transcribe beats OpenAI’s Whisper on quality at half the cost, which makes it a fit for budget-sensitive applications, and that Voxtral Small performs on par with ElevenLabs Scribe while offering similar savings.
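
To put the entry price in perspective, here is a small back-of-the-envelope sketch. The $0.001 per minute rate comes from the article. The workload of 500 hours per month is a made-up example, and the function name is purely illustrative.

```python
# Entry-level API rate quoted in the article, in USD per minute of audio.
PRICE_PER_MINUTE_USD = 0.001

def transcription_cost(audio_minutes: float,
                       price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Return the estimated API cost in USD for the given minutes of audio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * price_per_minute

# Hypothetical workload: 500 hours of recorded calls per month.
monthly_minutes = 500 * 60
print(f"${transcription_cost(monthly_minutes):.2f} per month")
```

At that rate, 30,000 minutes of audio works out to about $30 per month, before any volume-based or model-specific pricing differences.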

Enterprise features include private deployment for regulated industries and fine-tuning for specific domains. Mistral says future updates will add speaker segmentation, audio annotations such as age and emotion, and word-level timestamps.

Both versions of Voxtral can be downloaded from Hugging Face under the Apache-2.0 license, and Mistral also provides API access. The models will power the voice mode in Le Chat, which is due to roll out to all users soon.

Source: [The Decoder](https://the-decoder.com/mistral-unveils-voxtral-an-open-source-speech-model-with-lower-costs-than-proprietary-rivals/).