Text-to-speech technology isn’t great. I’ve always found the robotic drone of computerized voices a bit grating — a sentiment that came up on a recent episode of GeekWire Radio when I bashed my editor’s favorite reading app.
That’s why Google’s new WaveNet audio generator feels like something of a breakthrough. The program, from Google’s DeepMind artificial intelligence division, learns to mimic recordings of human speech.
Other text-to-speech applications typically work in one of two ways: they either stitch together snippets of recorded human speech or synthesize a computer-generated voice from programmed linguistic rules. WaveNet instead generates the raw audio waveform itself, one sample at a time, based on what it learns from human recordings — allowing it to adopt distinct cadences, male and female voice qualities, even breathing patterns.
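The sample-by-sample idea can be sketched in a few lines of Python. This is only an illustration: `predict_next` below is a hypothetical toy stand-in for WaveNet's deep neural network, which conditions on thousands of past samples. What matters is the loop structure — each generated sample is fed back in as input for predicting the next one.

```python
def predict_next(history):
    """Toy stand-in for WaveNet's network: predict the next audio
    sample from the samples generated so far. Here we just do a
    damped linear extrapolation from the last two samples."""
    if len(history) < 2:
        return 0.5  # arbitrary seed value for the first samples
    return 0.9 * (2 * history[-1] - history[-2])

def generate(num_samples):
    """Autoregressive generation: each output becomes part of the
    input for the next prediction — the core of WaveNet's approach."""
    samples = []
    for _ in range(num_samples):
        samples.append(predict_next(samples))
    return samples

# One "second" of audio at 16 kHz, a sample rate WaveNet commonly uses.
audio = generate(16000)
print(len(audio))
```

Generating audio this way is slow — real WaveNet runs a full neural network pass per sample, tens of thousands of times per second of speech — which is part of why the program is described as being in its infancy.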
“We could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting,” Google’s DeepMind team said in a blog post.
For an in-depth explanation of how WaveNet generates human-like speech, check out Google’s paper on the program.
WaveNet’s machine learning technology can also be applied to music. Researchers trained the program on a dataset of piano music and then let it generate its own eccentric compositions.
The program is still in its infancy, but it could have powerful implications for Google as tech companies race to create more natural-sounding A.I.