
Vogent Turn recently launched!
Founded by Jagath Vytheeswaran & Vignesh Varadarajan
What makes conversation feel natural isn't just what we say; it's knowing when to speak and when to listen. Current voice AI systems struggle with this fundamental challenge, either interrupting users mid-thought or waiting awkwardly long after they've finished. The problem? Existing turn detectors rely on either audio-only signals (missing semantic context) or text-only analysis (losing critical vocal cues). When a user says "I'm flying from San Francisco...", are they pausing to think, or done answering? Audio or text alone can't tell, but humans know instantly.
Vogent-Turn-80M solves this by doing what humans do naturally: processing how you're speaking and what you're saying simultaneously. Built on a streamlined 80M-parameter architecture, the model fuses acoustic features from the audio with conversational context from the text, letting it recognize that "123 456" spoken with rising intonation means more digits are coming, and that "I'm departing from San Francisco" is an incomplete answer to a question about two airports. This multimodal approach achieves 94.1% accuracy while running in just ~7ms on a T4 GPU, fast enough to feel completely natural in real-time conversation.
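To make the multimodal idea concrete, here is a toy heuristic sketch (not the actual Vogent-Turn-80M model or API, which learns these cues end-to-end from data): it combines a prosodic signal (pitch slope over the last stretch of speech) with a crude lexical completeness check, and only declares end-of-turn when both modalities agree the speaker is done. All names and thresholds here are illustrative assumptions.

```python
def end_of_turn(pitch_slope: float, text: str) -> bool:
    """Toy end-of-turn detector fusing a prosodic cue with a lexical cue.

    pitch_slope: slope of fundamental frequency near the utterance end
                 (positive = rising intonation, a 'more is coming' signal).
    text: the transcript of the utterance so far.
    """
    # Prosodic cue: rising intonation suggests the speaker will continue.
    rising = pitch_slope > 0.05  # illustrative threshold

    # Lexical cue: trailing ellipsis or a dangling function word
    # ("from", "and", ...) suggests the sentence is incomplete.
    stripped = text.rstrip()
    last_word = stripped.rstrip(".").split()[-1].lower() if stripped else ""
    incomplete = stripped.endswith("...") or last_word in {"from", "to", "and", "but", "the"}

    # Only yield the turn when neither modality signals continuation.
    return not (rising or incomplete)
```

A real detector replaces both hand-written cues with learned representations, which is why text like "123 456" (lexically ambiguous) can still be classified correctly from its intonation alone.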
The result? Voice agents that finally feel like talking to a human. No more frustrating interruptions. No more awkward pauses. Just natural, flowing conversation that responds at the right moment, every time.
Hugging Face: https://huggingface.co/vogent/Vogent-Turn-80M
GitHub: https://github.com/vogent/vogent-turn
Blog Post: https://blog.vogent.ai/posts/voturn-80m-state-of-the-art-turn-detection-for-voice-agents