Mistral Releases Voxtral Transcribe 2 Speech-to-Text Family
Narrative
Next-generation speech-to-text in two variants: Voxtral Mini Transcribe V2 (batch) and Realtime (streaming). Pitched on state-of-the-art speed and accuracy, privacy (on-device/local deployment), affordability ($0.003–$0.006 per minute), precise speaker diarization, and ultra-low latency (under 200 ms for the Realtime variant). Open weights under Apache 2.0 for the Realtime variant; a new audio playground for testing.
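To put the quoted price range in concrete terms, here is a minimal cost-estimate sketch. It assumes billing is strictly linear in audio minutes, with no minimums, rounding, or volume discounts. The brief does not say which variant sits at which end of the $0.003–$0.006 range, so the function simply returns both bounds. The function name `cost_range` is hypothetical and not part of any Mistral API.

```python
# Per-minute price bounds (USD) as quoted in the announcement summary.
# Assumption: linear per-minute billing with no minimums or rounding.
PRICE_LOW_PER_MIN = 0.003
PRICE_HIGH_PER_MIN = 0.006


def cost_range(minutes: float) -> tuple[float, float]:
    """Return the (low, high) USD cost of transcribing `minutes` of audio.

    Hypothetical helper for illustration only.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * PRICE_LOW_PER_MIN, minutes * PRICE_HIGH_PER_MIN


if __name__ == "__main__":
    # 1,000 hours of audio = 60,000 minutes.
    low, high = cost_range(60 * 1000)
    print(f"1,000 hours: ${low:,.2f} to ${high:,.2f}")
```

At these rates one hour of audio works out to roughly $0.18–$0.36, so 1,000 hours lands around $180–$360 under the linear-billing assumption.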
Reality
Released in early February 2026 (around Feb 4–6). Available via the Mistral API, Hugging Face, and the Le Chat playground. Supports 13+ languages; Mistral claims it outperforms competitors on on-device/edge benchmarks.
Implication
Pushes multimodal and edge AI forward; enables privacy-focused voice agents, live captioning, and low-cost disruption of the transcription market. Reinforces Mistral's strength in efficient, open models that punch above their weight against closed giants. Sets the stage for seamless voice integration in apps and agents.