Voice AI issues (clunky speech, weird pauses and inaccuracies) are being fixed, execs say
Voice AI technology has been around for years. But clunky voices, awkward pauses, and problems with accuracy have been roadblocks to widespread adoption.
Many of those issues are now being resolved as more startups jump into the voice AI fray, Twilio and Zoom CEOs said recently at the Goldman Sachs Communacopia + Technology conference.
Twilio CEO Khozema Shipchandler said that internal research shows customers would prefer to interact with voice AI rather than humans, especially in healthcare. That’s because customers feel there’s an “asymmetry in knowledge between the two sides” when it comes to human agents, and weird interactions disappear with virtual voice agents, Shipchandler said.
“You don’t have these awkward pauses when you have these interactions take place between a human on one side and then a voice AI agent on the other side,” Shipchandler said.
Latency, the time a voice AI agent takes to respond, has historically been an issue, but it is now close to being resolved, Shipchandler said.
Zoom has invested heavily in its voice AI agents, which are multilingual and have natural voices, said Zoom CEO Eric Yuan. The goal is to make sure those sometimes odd pauses go away.
But real-world experiments have had mixed results. According to reports, restaurant chains such as Taco Bell and McDonald’s have stopped voice AI efforts at drive-throughs as the AI couldn’t interpret vocal orders correctly.
The technology still has a long way to go, as it’s much harder to implement than text-based AI, said Jack Gold, principal analyst at J. Gold Associates. “Voice, even with a single language like English, has a huge amount of variability, with accents — think southern drawl vs. New England ‘ahs’ — and even the same language meaning different things to different people,” he said.
On the plus side, voice is a natural way to handle inquiries, as not everyone types well, Gold said.
In areas such as food delivery, 35% of orders still come in over the phone — and voice AI agents can help make those interactions faster and more efficient. “The voice AI’s capacity is unlimited,” Shipchandler said.
Thousands of venture-backed voice AI companies are now trying to solve these issues, he said.
More people are now talking to ChatGPT instead of using text prompts, which shows the potential of voice AI, Yuan said. “I think pretty sure in the next two to three years, a lot of new solutions will be built upon voice technology,” he said.
Voice spoofing remains a risk that will need to be resolved. If systems can identify a voice signature up front and then do light verification on the back end, customers can get right into the conversation and drive the interaction and outcomes. “You’ve got to take out spoofing, because that is a real thing,” Shipchandler said.
Meanwhile, Zoom is working with chief information security officers and publishing papers on how to deploy its AI technologies.
There will be continuous improvement in voice AI over the next couple of years to eliminate many of the errors being discovered in voice-based AI systems, Gold said. “That will improve especially as the data input to the models gets better,” he said.