An A.I. with language capabilities like ‘Samantha’ may be just around the corner. (Photo via HerTheMovie.com)

Text-to-speech technology isn’t great. I’ve always found the robotic drone of computerized voices a bit grating — a sentiment that came up on a recent episode of GeekWire Radio when I bashed my editor’s favorite reading app.


That’s why Google’s new WaveNet audio generator feels like something of a breakthrough. The program, from Google’s DeepMind artificial intelligence division, learns to mimic recordings of human speech.

Other text-to-speech applications typically play snippets of human speech recordings or use computer-generated voices that have been programmed with language conventions. WaveNet generates a voice based on what it learns from human recordings, allowing it to adopt distinct cadences, male and female qualities, even breathing patterns.

“We could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting,” Google’s DeepMind team said in a blog post.

For an in-depth explanation of how WaveNet generates human-like speech, check out Google’s paper on the program.

WaveNet’s machine learning technology can also be applied to music. Researchers trained the program on a dataset of piano music and then let it generate its own eccentric compositions.

The program is still in its infancy, but it could have powerful implications for Google as tech companies race to create more natural-sounding A.I.

Listen to samples of WaveNet’s speech and music capabilities here.
