Speech technology from Microsoft’s Tellme group is quickly becoming a bigger part of the company’s products, starting with the upcoming revamps of Xbox Live and Windows Phone. Users will have more opportunities to use voice commands to interact with and control the on-screen experience — listening to text messages in Windows Phone, for example, and responding to them by voice.
And people should find the voice recognition to be considerably better than in the past, said Ilya Bukshteyn, a Microsoft Tellme senior director, when we met up today on the company’s Redmond campus.
Here’s why: The technology has been improved by the diversity of voice searches coming in through applications such as Bing on mobile phones. In addition, Microsoft is using a unified, cloud-based service across its different voice applications. With a larger collection of data to work from, the unified system can learn more quickly.
“We’ve seen more improvement in the last 18 months to two years than we saw in a decade before that,” Bukshteyn said.
In the video above, Bukshteyn shows the expanded voice features for Microsoft’s Kinect sensor on Xbox 360 — including a more seamless approach to the Xbox Live menu, and voice integration into games. Those improvements are rolling out this fall along with the broader Xbox Live upgrade.
The deeper Windows Phone integration will come with the release of the Mango update, also this fall.
It’s part of the broader push toward “natural user interfaces” to supplement the keyboard and mouse.
Long term, Microsoft is aiming to turn voice technology into more of a natural conversation with the machine, as opposed to the commands used today. The company released this “glimpse of the future” video last week to demonstrate where it hopes to go over the next three to five years.
For more, here are excerpts from what Bukshteyn had to say today …
How things have improved: “The science of speech gets better through two things: machine learning, and a massive amount of data. In the cloud, we have built a feedback loop that learns from usage and improves the service right away. We can deliver a better experience tomorrow than we had today.”
How web search helps improve voice recognition: “There’s only so much you’re going to learn from a lot of people saying ‘agent,’ or a limited set of words. The thing that’s so cool with Bing voice search is that you would get a really diverse set of utterances, and we saw that really take off. Across the industry, anywhere from 25 to 30 percent of mobile searches are now done using voice. The interesting thing for us is that on Windows Phone, we’re actually much higher, and we attribute that to speech becoming core to the user interface (in Windows Phone).”
Shift to the cloud: “One cloud for speech is incredibly important and powerful. Our goal is having one feedback loop and one cloud that actually learns across domains. The key there is really having varied utterances. We get about 11 billion utterances a year right now in our cloud. We believe it’s the most-used speech cloud in the industry; it’s a little hard to get stats. So literally that’s multiple utterances a second, where each one is an opportunity to get better and learn.”
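For a sense of scale, here is a rough conversion of Bukshteyn’s figure into a per-second rate. It assumes the roughly 11 billion utterances are spread evenly across a 365-day year, which real traffic certainly is not, so treat the result as an average only:

```latex
\[
\frac{11 \times 10^{9}\ \text{utterances/year}}{365 \times 24 \times 3600\ \text{s/year}}
= \frac{1.1 \times 10^{10}}{3.1536 \times 10^{7}\ \text{s}}
\approx 349\ \text{utterances/s}
\]
```

In other words, “multiple utterances a second” means several hundred per second on average.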
Voice in Internet Explorer: “There’s nothing that we’ve announced. I think you’ve seen, out in the industry, some work that we’re participating in with standards bodies to, at some point in the future, have a voice/speech tag in HTML. It hasn’t been agreed to yet. We’re very active with the standards bodies. Once you have that, it opens up a whole bunch of opportunities. The way it’s being discussed, the tag could point to a local engine or a cloud engine: any HTML5 app could make use of a local speech engine, or could point to a cloud for any part of the app.”
What about Windows? “You’ve probably seen that Windows 8 apps are HTML5-based. Nothing announced or concrete, but you can sort of see your way into the future, how that could play out. That’s certainly our vision in the future: You have a cloud service that’s available to any developer — Microsoft and external — that can be used in applications in Windows marketplaces, like phone or Xbox; could be used in line-of-business applications, Azure; or can be used for web applications, whether those run on the device, if you will, in a Windows 8 type of way, or anywhere.”