In what it calls a “major breakthrough in speech recognition,” Microsoft has built technology that can decipher conversation as well as people can.
A group of researchers and engineers with Microsoft’s Artificial Intelligence and Research team published a paper Monday on a computer system that makes about the same number of errors as professional transcriptionists, or fewer.
That doesn’t mean the system is perfect, just that it made no more mistakes than humans do. Professional transcriptionists have a word error rate of 5.9 percent, and the research team came close to matching that about a year ago. Microsoft reached parity with human transcriptionists by employing “neural language models in which words are represented as continuous vectors in space, and words like ‘fast’ and ‘quick’ are close together,” according to the blog post.
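For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the system’s output, divided by the number of words in the reference. The sketch below is a minimal illustrative implementation of that standard definition, not Microsoft’s actual scoring code; the example sentences are invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word ("his" for "this") out of ten reference words
# gives a 10 percent word error rate.
print(wer("call the office as soon as you get this message",
          "call the office as soon as you get his message"))  # 0.1
```

By this measure, a system at 5.9 percent misrecognizes roughly one word in seventeen, which is why matching that figure puts the software on par with professional transcriptionists.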
Microsoft said this breakthrough has wide-ranging applications across its products, from Xbox to an instant voice-to-text service to a much smarter version of its digital assistant, Cortana.
Microsoft has been researching speech technology for a long time. Before this, its biggest recent breakthrough was Skype Translator, which lets people who speak different languages converse over Skype. The next step, Microsoft says, is to make speech recognition systems work well in real-life settings, such as at parties or while a user is driving on the highway.
Microsoft’s next long-term goal involves moving from recognizing speech to understanding it. That would mean computers could answer questions and react to speakers. While the prevalence of digital assistants like Cortana and Amazon’s Alexa may make it seem like that kind of technology isn’t far off, Microsoft says fully reactive artificial intelligence is a big lift.
“It will be much longer, much further down the road until computers can understand the real meaning of what’s being said or shown,” Microsoft Executive Vice President Harry Shum said in the blog post.