Enterprise AI company Cohere on Thursday launched its first voice model, Transcribe, an open source automatic speech recognition (ASR) model designed for tasks like note-taking and speech analysis.
The launch marks a significant step for Cohere, a company that has primarily been known for its text-based large language models aimed at enterprise customers. With Transcribe, Cohere is now expanding into the rapidly growing voice AI space — a market that has seen explosive demand over the past two years as businesses and consumers alike seek better tools for converting speech into text.
Small But Powerful
One of the most notable aspects of Transcribe is its relatively compact size — just 2 billion parameters. This makes it lightweight enough to run on consumer-grade GPUs for those who want to self-host the model. In an era where AI models are becoming increasingly massive and expensive to operate, Cohere's decision to keep Transcribe small and efficient is a deliberate move aimed at accessibility and practicality.
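To make the "consumer-grade GPU" claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone of a 2-billion-parameter model would occupy. The article does not state the precision Transcribe ships in, so the bytes-per-parameter values below are assumptions for illustration, and the estimate ignores activations, audio buffers, and framework overhead.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# The parameter count comes from the article; the numeric precisions are
# assumptions, since the article does not say which one Transcribe uses.

PARAMS = 2_000_000_000  # "just 2 billion parameters"

# Hypothetical storage formats and their size per parameter, in bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gib(PARAMS, size):.2f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 3.7 GiB, which would fit within the memory of many consumer graphics cards, though actual requirements depend on the runtime and batch size.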
The model currently supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic. This multilingual support makes it a versatile option for global enterprises dealing with diverse customer bases and multilingual teams.
Benchmark Performance
According to Cohere, Transcribe outperforms several competing models on the Hugging Face Open ASR leaderboard, including Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech. The model achieved an average word error rate (WER) of 5.42%, the lowest of any model on the benchmark.
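For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the leaderboard's actual scoring code, which also applies text normalization that is omitted here. The function name and sample sentences are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please schedule the meeting for nine tomorrow"
    hypothesis = "please schedule a meeting for nine tomorrow"
    print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 14.3%
```

Read as a percentage, a score of 5.42 corresponds to roughly five or six word errors per 100 reference words, averaged across the benchmark's test sets.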
In human evaluations, Cohere claims Transcribe had an average win rate of 61% over competing models when assessors judged transcriptions for accuracy, coherence, and usability. These are promising numbers, though the company acknowledged the model has some weaknesses. Transcribe fell behind its rivals when transcribing Portuguese, German, and Spanish content.
On the speed front, Cohere says the model can process 525 minutes of audio in a single minute, a figure that highlights its efficiency and makes it well-suited for high-volume enterprise use cases where large amounts of audio data need to be transcribed quickly.
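As a quick illustration of what that throughput figure implies, the sketch below converts an amount of audio into estimated processing time. It takes Cohere's claimed 525x figure at face value and assumes it scales linearly, which the article does not confirm; the hardware and batch settings behind the number are not stated, so treat the results as rough estimates.

```python
# Audio minutes processed per wall-clock minute, as claimed by Cohere.
CLAIMED_SPEEDUP = 525.0


def estimated_processing_minutes(audio_minutes: float,
                                 speedup: float = CLAIMED_SPEEDUP) -> float:
    """Estimate wall-clock minutes needed to transcribe the given audio.

    Assumes throughput scales linearly with audio length, which is a
    simplification rather than a documented property of the model.
    """
    if audio_minutes < 0 or speedup <= 0:
        raise ValueError("audio_minutes must be >= 0 and speedup must be > 0")
    return audio_minutes / speedup


if __name__ == "__main__":
    one_hour = estimated_processing_minutes(60)
    print(f"1 hour of audio: about {one_hour * 60:.1f} seconds")
    archive = estimated_processing_minutes(10_000 * 60)
    print(f"10,000 hours of audio: about {archive / 60:.1f} hours")
```

Under those assumptions, a one-hour meeting would take about seven seconds to transcribe, and a 10,000-hour call archive about 19 hours on a single instance.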
Enterprise Integration and Availability
Cohere plans to integrate Transcribe into its enterprise agent orchestration platform, North, and is making the model available through its API for free. This free API access is a strategic move that could help Cohere attract developers and businesses who want to test the model before committing to deeper integrations.
The model will also be available on Model Vault, Cohere's managed inference platform. By offering multiple deployment options — self-hosted, API-based, and managed inference — Cohere is giving enterprises the flexibility to choose the setup that best fits their infrastructure and security requirements.
A Growing Market
The timing of this launch is no coincidence. Speech recognition models are becoming increasingly popular as demand surges for note-taking and dictation applications like Granola and Wispr Flow. From meeting transcription and real-time captioning to medical documentation and customer service analytics, the use cases for accurate and fast speech-to-text technology are expanding rapidly.
The open source nature of Transcribe also positions it as a direct competitor to proprietary solutions from major tech companies. By releasing the model openly, Cohere allows researchers and developers to fine-tune it for specialized domains — something that could give it an edge in industries like healthcare, legal, and finance where domain-specific vocabulary is critical.
Cohere's Bigger Picture
Earlier this year, Cohere reportedly told investors that its annual recurring revenue reached $240 million in 2025, and CEO Aidan Gomez has indicated that the startup may go public soon. The launch of Transcribe adds another product to Cohere's growing portfolio as the company positions itself for a potential IPO.
With this move, Cohere is signaling that it intends to be more than just a text-based AI company. By combining its existing language model capabilities with voice transcription, the company is building toward a more complete AI platform — one that can handle everything from understanding documents to transcribing conversations, all within a single enterprise ecosystem.







