How to type Cantonese in a web browser: Seattle Googlers help make it possible


Albert Wong.

Albert Wong had spent his whole life trying, and failing, to type Cantonese characters on a computer.

But now, thanks to time allowed by Google’s 20-percent projects, some terrific teamwork and a whole lot of effort, he’s helped build an answer.

A year’s worth of work has come to fruition today, as Google just released the first tool ever that allows users to easily input Cantonese into a web browser using English phonetics, solving a problem that’s existed for decades. Similar tools exist for Mandarin Chinese, the national dialect used in China, but none for Cantonese until today.

Wong, a Seattle native and software engineer at the Google office here in the Emerald City, was traveling last year with Google software engineer Hannah Tang in Hong Kong, where Cantonese is widely spoken. The two had trouble using a Yelp-equivalent app because they couldn’t figure out how to enter particular Cantonese characters.

“We sat there and thought about how crazy of a problem this was,” Wong recalled. “It should have been solved years ago.”

So for the next 12 months, Wong and Tang rallied fellow Googlers from Beijing, Singapore, Mountain View, and New York City to help them develop a way to fix this difficult and complex problem. Everyone brought different skills that were crucial to the development process, whether it was a linguistics background to pick apart input issues, language analysis A.I. expertise or simply deep knowledge of the Cantonese dialect. And of course it couldn’t have been done without Google’s massive data set.

A big problem for the team was that, unlike Mandarin Chinese, Cantonese is not learned through a formal English romanization system. On top of that, people around the globe, from Malaysia to Quebec, have different Cantonese standards for spelling and tonal pronunciation.

The Googlers had to analyze the linguistic backgrounds of all kinds of people and come up with a model that segments English text and maps each piece to a set of Cantonese characters.

“Basically, we needed to take any random garbage you wrote in English and turn it into Cantonese,” said Wong, a University of Washington graduate and seven-year Google vet.


What they’ve all helped create is somewhat like a spellcheck on steroids, as it allows you to type out Cantonese phrases using your best English-sounding guess. The technology they’ve built into the platform is excellent at predicting what a user is trying to type, much like how Google search works. The demo video above shows this more clearly. 
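To make the "spellcheck on steroids" idea concrete, here is a minimal sketch of segmenting romanized text and mapping each piece to a character. It is not Google's implementation, which the article does not describe in detail. The tiny `LEXICON`, the 0.6 similarity threshold and the length-weighted scoring are all assumptions made for illustration. A real system would presumably rely on a statistical model trained on far more data.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: Jyutping-style spellings (tones omitted) -> character.
LEXICON = {
    "nei": "你",
    "hou": "好",
    "m": "唔",
    "goi": "該",
    "do": "多",
    "ze": "謝",
}


def similarity(a, b):
    """Spelling similarity between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(chunk, threshold=0.6):
    """Return (score, character) for the closest lexicon spelling, or None."""
    closest = max(LEXICON, key=lambda spelling: similarity(chunk, spelling))
    score = similarity(chunk, closest)
    return (score, LEXICON[closest]) if score >= threshold else None


def to_cantonese(text, max_len=5):
    """Segment a romanized guess and map each segment to a character.

    Returns None when no segmentation covers the whole input.
    """
    text = text.lower().replace(" ", "")
    n = len(text)
    # best[i] holds (total score, characters) for the best split of text[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue
            match = best_match(text[start:end])
            if match is None:
                continue
            score, char = match
            # Weight by segment length so that close matches on longer
            # segments are not outvoted by many short, sloppy ones.
            candidate = (best[start][0] + score * (end - start),
                         best[start][1] + char)
            if best[end] is None or candidate[0] > best[end][0]:
                best[end] = candidate
    return best[n][1] if best[n] else None


print(to_cantonese("nei hou"))  # exact spellings
print(to_cantonese("neiho"))    # a sloppier guess for the same phrase
```

Plain spelling similarity is a crude stand-in here. It can absorb a dropped letter, but it has no notion of how a spelling sounds, which is where pronunciation models and usage data would come in.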

Wong says he hopes the 59 million Cantonese speakers around the world and especially those in Hong Kong will find this tool useful, as well as those who may know enough to speak in Cantonese but lack the writing knowledge. He’s also excited about bringing this to the elderly Chinese community who know enough English to guess the phonetics and now can use a computer to do things never possible before.

Wong, who calls this one of the “proudest launches of his career,” plans on making improvements to the technology over the next six months as feedback rolls in. For now, the input tool is available via a Chrome Extension and will soon be integrated into Google Docs and Gmail.

  • http://twitter.com/Vroo Vroo (Bruce Leban)

    This is very cool. Very clever to recognize that this is essentially like the spelling correction problem. The article doesn’t mention it but I’m sure they’re going to use the feedback loop of what users type for particular characters to improve future results, so it’s only going to get better. Congrats, Albert and Hannah.

  • http://www.facebook.com/profile.php?id=510814710 Kohen Chia

    Not to discount their excellent work, but this has been around for several years already: http://www.cantoneseinput.com/?lang=en