Secrets of the $500K Amazon Alexa Prize winner: Inside the Univ. of Washington's 'socialbot'

The Sounding Board team from the University of Washington won first place in the inaugural Alexa Prize competition last year. From left to right: Hao Cheng; Maarten Sap; Elizabeth Clark; Ari Holtzman; and Hao Fang. (GeekWire photos / Taylor Soper)

How long can a robot have an intelligent conversation with a human?

That was the challenge posed to hundreds of university students last year by Amazon as part of its inaugural Alexa Prize competition, which tested the boundaries of the company’s artificial intelligence-powered voice platform, also known as Alexa.

The winning team came right out of Amazon’s backyard in Seattle: five University of Washington students won $500,000 for their Sounding Board “socialbot,” which impressed a panel of judges with its ability to hold a conversation about pop culture, news events, and more. It received an average score of 3.17 on a 5-point scale from the panel of judges and achieved an average conversation duration of 10:22.

Ari Holtzman, a member of the Sounding Board team, interacts with Amazon’s Alexa at a UW event on Tuesday.

The students showed off the insides of Sounding Board at a UW alumni event this week on campus at the Paul G. Allen School of Computer Science & Engineering. The team talked about how it tackled both technical and social challenges related to people interacting with its socialbot, which had to hold a meaningful conversation with a user beyond accomplishing basic tasks like playing music or controlling lights. It was a fascinating look into how engineers are designing artificial intelligence services for human interaction, many of which could have a big impact on society over the next several years and decades.

“We see the socialbot as a two-way interaction,” said Hao Fang, a fifth-year electrical engineering PhD student and Sounding Board team leader.

The Alexa Prize teams used Amazon cloud-based technologies and the Alexa Skills Kit to design their socialbots. The competition is part of the company’s big push into AI and voice interaction, spanning the Alexa platform and products like the Echo speaker device. Amazon opened up its Alexa platform to developers and third-party device manufacturers in 2015; last week it unveiled a new feature that lets just about anyone build their own Alexa skills.

Amazon is battling other tech giants also heavily investing in similar technology. For example, Microsoft has spent years developing XiaoIce, its own socialbot that has more than 200 million users in Asia.

In addition to the Alexa Prize, Amazon has other programs to expand the Alexa platform like its $100 million Alexa Fund, which launched in 2015 and is used by Amazon to invest in companies that will push the boundaries of voice-based interaction. There’s also the Alexa Accelerator, a Seattle-based program supporting early-stage companies that are working on B2C and B2B technologies related to Alexa.

The Alexa Prize teams retain ownership of their socialbots, though Amazon does have a non-exclusive license to any technology or software developed in connection with the competition. There are eight new teams competing for the 2018 Alexa Prize.

Read on to learn more about how the Sounding Board team programmed its socialbot. Check out this Wired story for a deeper look at the Alexa Prize competition. Echo owners can give Sounding Board a try for the next week — tell the device, “let’s chat,” and you’ll get one of the three prize winners from last year’s competition. Keep asking until you get the UW team.

The approach

Sounding Board and each competing Alexa Prize team had access to Amazon’s automatic speech recognition service to get a textual interpretation of what a user said, and to its text-to-speech technology to provide a response. But after that, it was up to the university students to design a robust framework that allowed Alexa to come up with interesting and relevant questions and responses to keep the conversation going.

“You can think of Sounding Board as a conversational gateway that stands between a user and a ton of online content — user-generated content from Reddit, factual content from Wikipedia, media-specific content from IMDb,” said Ari Holtzman, a PhD student studying AI.

Holtzman said his team used a two-pronged design strategy focused on being user-centric and content-driven. “User-centric” means being sensitive to what the user said — are they positive about what the socialbot said? Are they negative, and should the bot switch topics? And who is the user, and what topics are they interested in?

“Are they introverted? Are they extroverted?” Holtzman explained. “That ties to different content.”

Holtzman said the socialbot had to be clever in how it suggested topics and brought up facts. “We can get ourselves into a corner if we have a topic that we don’t have much to say about,” he noted.

Spoken language understanding

The socialbot needed to understand what the user was saying in order to have a conversation, taking speech and extracting meaning out of it.

“This is a fairly easy task for people, but if you’ve ever tried talking to Alexa or Siri and have it misunderstand what you were trying to say, you know that this is still a complicated problem for artificial intelligence systems,” said Elizabeth Clark, a third-year PhD student studying natural language processing.

The Sounding Board team took an approach that mapped speech to text, and then extracted meaning from said text. It built technology that could tell the difference between commands, questions, topics, and user reactions.

“User reactions are important, because if we present a fact to someone and they say, ‘that’s really interesting,’ you will want to respond differently than if they say, ‘that’s really boring,’” Clark explained.

She also noted that one response can involve a combination of a command, question, topic, etc. That’s why the team used a “multi-dimensional representation” to understand both reactions and commands at once.
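To make the idea of a multi-dimensional representation concrete, here is a minimal Python sketch. It is not the team's implementation, which the article does not describe in detail. The keyword lists, the Understanding class, and the understand function are all hypothetical stand-ins for whatever models Sounding Board actually used. The point is only that one utterance can fill several slots at once.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists, invented for illustration only.
COMMANDS = {"next", "stop", "repeat"}
POSITIVE = {"interesting", "cool", "awesome"}
NEGATIVE = {"boring", "dull"}
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how"}


@dataclass
class Understanding:
    """One slot per dimension; several can be filled by a single utterance."""
    command: Optional[str] = None
    is_question: bool = False
    topic: Optional[str] = None
    reaction: Optional[str] = None


def understand(text: str) -> Understanding:
    # Lowercase and split into word tokens, dropping punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    result = Understanding()
    for word in words:
        if word in COMMANDS:
            result.command = word
        if word in POSITIVE:
            result.reaction = "positive"
        if word in NEGATIVE:
            result.reaction = "negative"
    # Crude question test: the utterance opens with a question word.
    if words and words[0] in QUESTION_WORDS:
        result.is_question = True
    # Crude topic test: whatever follows the word "about".
    if "about" in words:
        rest = words[words.index("about") + 1:]
        if rest:
            result.topic = " ".join(rest)
    return result


if __name__ == "__main__":
    print(understand("That's really boring, next"))
    print(understand("What do you know about space travel?"))
```

In the first example the same sentence carries both a negative reaction and a "next" command, which is the situation Clark describes.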

Hierarchical Dialog Manager

Once the system deciphers what a user is saying, it uses a “hierarchical dialog manager” to keep the conversation coherent and the user engaged. The design pairs a “master dialogue manager” with a set of “miniskills”: the master switches between miniskills, each of which handles a subsegment of the conversation.
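The two-level arrangement can be sketched in a few lines of Python. This is an illustration under assumptions, not Sounding Board's code: the MasterDialogManager class, the three miniskill functions, and the switching rule (switch when the user names a known topic, fall back when the user seems bored) are all invented for the example.

```python
from typing import Callable, Dict, Optional

# A miniskill takes the user's text and returns the bot's reply.
Miniskill = Callable[[str], str]


def movie_chat(user_text: str) -> str:
    return "Movies: have you seen anything good lately?"


def news_chat(user_text: str) -> str:
    return "News: want to hear a headline?"


def small_talk(user_text: str) -> str:
    return "Chat: what would you like to talk about?"


class MasterDialogManager:
    """Top level: decides which miniskill owns the current turn."""

    def __init__(self, miniskills: Dict[str, Miniskill], fallback: str):
        self.miniskills = miniskills
        self.fallback = fallback  # must be a key of miniskills
        self.active = fallback

    def respond(self, user_text: str, topic: Optional[str] = None,
                bored: bool = False) -> str:
        if topic in self.miniskills:
            # The user named a topic we have a miniskill for: switch to it.
            self.active = topic
        elif bored:
            # The user lost interest: return to the fallback miniskill.
            self.active = self.fallback
        # Otherwise stay put so the conversation remains coherent.
        return self.miniskills[self.active](user_text)


if __name__ == "__main__":
    manager = MasterDialogManager(
        {"movies": movie_chat, "news": news_chat, "chat": small_talk},
        fallback="chat",
    )
    print(manager.respond("hi"))
    print(manager.respond("let's talk movies", topic="movies"))
    print(manager.respond("that's boring", bored=True))
```

Keeping the switching decision in one place means each miniskill only has to worry about its own stretch of the conversation.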

Maarten Sap, a PhD student studying contextual language modelling and social science NLP applications, noted that one important miniskill the Sounding Board team created was a personality analysis tool inspired by social psychology and personality theory. The point was to provide more relevant content for users.

“We wanted to present content that was more relevant to users with different personalities,” he said. “We had to ask them personality-related questions and mapped them on to five different personality traits.”
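As a rough illustration of mapping answers onto five traits, the Python sketch below scores yes/no answers against a trait table. The article does not say which five traits the team used or how it scored answers. The trait names here follow the widely used Big Five model as an assumption, and the questions and the scoring rule are made up for the example.

```python
from typing import Dict, List

# Assumption: the Big Five model. The article only says "five traits".
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")

# Hypothetical questions: (text, trait, direction).
# direction is +1 if a "yes" raises the trait, -1 if it lowers it.
QUESTIONS = [
    ("Do you like trying new things?", "openness", 1),
    ("Do you prefer a quiet night in over a party?", "extraversion", -1),
    ("Do you like to plan ahead?", "conscientiousness", 1),
]


def score_personality(answers: List[bool]) -> Dict[str, int]:
    """Map yes/no answers (in QUESTIONS order) to per-trait scores."""
    scores = {trait: 0 for trait in TRAITS}
    for (_, trait, direction), said_yes in zip(QUESTIONS, answers):
        # A "yes" moves the trait in the question's direction, a "no" the other way.
        scores[trait] += direction if said_yes else -direction
    return scores


if __name__ == "__main__":
    print(score_personality([True, True, False]))
```

A bot could then use the resulting scores to rank candidate content, for instance favoring different topics for users who score as introverted.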

Another part of dialogue management is making transitions smooth — for example, asking the user if they have read an article.

“If they don’t respond, we can hear a pause, and ask the user to say ‘next’ if they want to move on,” Sap said.

Content manager

The socialbot needed a large collection of information covering various topics in order to carry on a conversation. Sounding Board built its content system around a knowledge graph, populated by crawling the internet and extracting information from various sources, ranging from user-generated data on sites like Reddit to more formally written articles.

Hao Cheng, a PhD student researching natural language processing and machine learning, said that the team spent a lot of time programming the socialbot to filter out inappropriate content. There was also a focus on surfacing content that was positive and uplifting.

The knowledge graph, which had 80,000 entries covering 300,000 topics, was updated daily. One of the team’s slides showed how the socialbot starts with a news article and can take the conversation in different directions, depending on what part of the article a user is interested in.
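One way to picture a content store like this is as entries indexed by the topics they mention, so that a single article links several topics and gives the bot more than one direction to go. The Python sketch below is a toy version under that assumption. The ContentStore class, its method names, and the blocklist are hypothetical, and the real system's filtering was certainly more involved than a word list.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical blocklist standing in for the team's content filtering.
BLOCKLIST = {"gore", "slur"}


class ContentStore:
    """Toy topic index: each entry is filed under every topic it mentions."""

    def __init__(self) -> None:
        self.by_topic: Dict[str, List[str]] = defaultdict(list)

    def add(self, text: str, topics: List[str]) -> bool:
        # Reject entries containing blocked words before indexing them.
        if any(word in BLOCKLIST for word in text.lower().split()):
            return False
        for topic in topics:
            self.by_topic[topic.lower()].append(text)
        return True

    def about(self, topic: str) -> List[str]:
        return list(self.by_topic.get(topic.lower(), []))

    def related_topics(self, topic: str) -> List[str]:
        # Topics that share at least one entry with the given topic.
        key = topic.lower()
        entries = self.by_topic.get(key, [])
        return sorted(
            other for other, items in self.by_topic.items()
            if other != key and any(item in entries for item in items)
        )


if __name__ == "__main__":
    store = ContentStore()
    store.add("NASA announces a new Mars rover", ["NASA", "Mars"])
    store.add("Graphic gore footage leaks", ["movies"])
    print(store.about("mars"))
    print(store.related_topics("mars"))
```

Starting from an entry about Mars, the bot can move to NASA because both topics point at the same article, which mirrors the branching the team described.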

Improvement

One key aspect of Sounding Board’s process that helped it win the Alexa Prize was how the team improved its socialbot over time. The students combed through transcripts of interactions to find where conversations went poorly and figured out how to tweak the system to prevent future slip-ups. They were able to better understand what makes a good topic; how to tell when people were bored with a topic; how to balance different types of content; and more.

“These are just a few of the questions we couldn’t answer well until we saw how people interact with the socialbot,” Clark said. “Once we identified common issues in the conversation logs, we could start brainstorming solutions.”