Microsoft’s Artificial Intelligence and Research Group, a major new engineering and research division formed last year inside the Redmond company, is debuting a new technology that lets developers customize Microsoft’s speech-to-text engine for use in their own apps and online services.
The new Custom Speech Service is set for release today as a public preview. Microsoft says it lets developers upload a unique vocabulary — such as alien names in Human Interact’s VR game Starship Commander — to produce a sophisticated language model for recognizing voice commands and other speech from users.
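To make the workflow concrete, here is a minimal sketch of how a developer might assemble a custom-vocabulary upload for such a service. The endpoint URL, field names, and the `"LanguageData"` kind are illustrative assumptions, not the service's documented contract; consult Microsoft's official API reference for the real schema.

```python
import json

# Placeholder endpoint -- an assumption for illustration, not the real URL.
ENDPOINT = "https://example.cognitive.microsoft.com/speech/customization"

def build_vocabulary_upload(name, phrases, locale="en-US"):
    """Assemble a JSON body for uploading domain-specific phrases
    (e.g. alien names from a VR game) so the service can adapt its
    language model to recognize them. Field names are hypothetical."""
    return {
        "name": name,
        "locale": locale,
        "kind": "LanguageData",  # hypothetical marker: language-model training data
        "phrases": phrases,      # one candidate utterance per entry
    }

payload = build_vocabulary_upload(
    "starship-commander-vocab",
    ["set course for the outer rim", "hail the alien ambassador"],
)
print(json.dumps(payload, indent=2))
```

In practice the payload would be POSTed to the service with the developer's subscription key, and the resulting adapted language model would back a recognition endpoint for the app.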
It’s the latest in a series of “cognitive services” from Microsoft’s Artificial Intelligence and Research Group, a 5,000-person division led by Microsoft Research chief Harry Shum. The company says it has expanded from four to 25 cognitive services in the last two years, including 19 in preview and six that are generally available.
The company says it will bring two more cognitive services, Content Moderator and the Bing Speech API, out of preview and make them generally available next month. Content Moderator analyzes images and video with technology including optical character recognition and object recognition, helping companies filter out unwanted content. The Bing Speech API converts audio into text, interprets the intent of the language, and converts text back to speech.
Microsoft formed the group to accelerate its artificial intelligence advances, aiming to get more of its technologies out of the labs and into its own products, as well as its services for third-party developers. The AI and Research Group also includes Microsoft’s Cortana voice-based assistant and Bing search engine.
The company is competing against rivals including Amazon and Google in the booming field of artificial intelligence. AI and machine learning are increasingly becoming integral parts of these companies’ cloud platforms, as well.
Microsoft’s new Custom Speech Service also includes an acoustic model that cancels out background noise to improve speech recognition. Microsoft cited the example of using Custom Speech Service at an airport kiosk where the environmental noise would otherwise make speech recognition very difficult.
“The combination of a language model and this acoustic model in a single API that is customizable for your vocabulary is truly unique in the market,” said Irving Kwong, group program manager, in an interview. By moving from a private preview to a public preview, the service will be able to take on tens of thousands of new customers.