You know Siri and Alexa. Maybe even Cortana. But have you met Sara?
She’s pretty amazing. Sure, she can do basic tasks, like help you find a movie that you might like to watch, or match you with someone you should meet at an event. But behind the scenes, she’s also watching your expressions, assessing your emotions, determining the strength of your budding relationship, and adjusting her approach to form a lasting bond.
Sara is actually “SARA,” the Socially Aware Robot Assistant, a prototype virtual assistant created by Carnegie Mellon University researchers in a larger quest to use machine learning to better understand human behavior and personalize user experiences.
The idea is “to build systems that remind us of what we care most about — that sustain and scaffold and protect those aspects of being human that are really important, like relationships,” said professor Justine Cassell, an associate dean in CMU’s School of Computer Science and a specialist in human-computer interaction.
It’s one piece of an initiative called “Project InMind.” Yahoo and CMU announced the five-year, $10 million partnership in February 2014, giving CMU researchers the funding and tools to rethink — and potentially reinvent — the relationship between humans and machines.
Project InMind made a splash when it was announced at a news conference. One Yahoo executive said it “feels like the next very large step in a journey towards a grand dream of the far-flung future where computers will work in very close partnership with humans in ways that are very natural.”
Or, as Mashable put it at the time, “Yahoo Wants To Develop A Better Siri.”
However, Project InMind has largely flown under the radar since then, in part because of its distributed approach: funding a variety of projects before doubling down on the most promising ones.
So what has happened in four years? A lot. Nine months after Project InMind launched, Amazon unveiled Alexa as part of its Echo smart speaker, showing how virtual assistants would escape the bounds of the smartphone to populate a much wider range of devices. Verizon completed its $4.5 billion purchase of Yahoo last year, folding the struggling Internet pioneer into its Oath subsidiary, alongside AOL.
The course of Project InMind has changed, as well. The original vision was to operate a prototype system across the Carnegie Mellon campus in Pittsburgh, with students using the Project InMind technologies every day. However, the demands of running a system reliable enough for daily usage outstripped the resources available to the researchers.
But as the initial five-year Project InMind partnership enters its final year, CMU researchers have developed a variety of working prototypes for smartphones. In the process, they’ve zeroed in on a few areas of focus and ambition:
- Giving virtual agents the ability to develop a relationship and rapport with users over time.
- Creating agents with a persistent, long-term knowledge of the user’s tasks and goals.
- Allowing agents to learn from users through explicit instruction.
Taken together, these goals represent a vision for the way virtual agents will work in 2025, said CMU professor Tom Mitchell, who founded and led CMU’s Machine Learning Department for its first 10 years, and is the co-leader of Project InMind along with Cassell.
Apart from building rapport and a long-term relationship, Mitchell said, the ability for a user to teach a virtual agent about his or her own preferences and habits will play a big role in interactions between humans and machines in this future. Up to now, much of machine learning has been focused on computers observing and analyzing huge data sets to figure out what to do.
As it turns out, there’s an obvious shortcut.
“Since we’re in the first decade when computers are no longer blind and deaf, we can actually converse with our phones,” Mitchell explained. “And so nobody has yet picked up on what I think is an inevitable development in conversational agents, which is that, in the next 10 years I’m quite sure we’re going to start seeing people use those conversations to directly instruct their phones — teach the phones what they want them to do, or, if you like, program them in natural language.”
The potential of this approach is evident in this video of a prototype “instructable agent,” as the user teaches the virtual assistant what to do in response to specific words or commands that the assistant wouldn’t otherwise understand.
Teaching a machine in this way might seem a laborious process, but it’s an early glimpse of the long-term potential, and the idea is that once the assistant learns something, it doesn’t need to be taught again.
The next time Alexa says, “Sorry, I’m having trouble understanding right now,” imagine if you could teach Amazon’s agent what you mean, rather than simply giving up and moving on to a more basic task.
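The core idea of explicit instruction can be sketched as a one-shot mapping: when the agent fails to understand a phrase, the user teaches it which known action the phrase should trigger, and the mapping persists. The class, phrases, and primitive names below are purely illustrative assumptions, not the Project InMind implementation:

```python
# Minimal sketch of an "instructable agent" (hypothetical names, not CMU's API).
# When the agent doesn't recognize a command, the user can teach it once,
# and the learned mapping is reused in all future interactions.

class InstructableAgent:
    def __init__(self):
        # learned_commands maps a taught user phrase to a known primitive action
        self.learned_commands = {}
        # primitives stand in for capabilities the agent already has
        self.primitives = {
            "set_alarm": lambda: "Alarm set.",
            "text_contact": lambda: "Message sent.",
        }

    def handle(self, utterance):
        action = self.learned_commands.get(utterance)
        if action is None:
            return None  # i.e. "Sorry, I'm having trouble understanding"
        return self.primitives[action]()

    def teach(self, utterance, primitive_name):
        # One-shot natural-language instruction: after this call,
        # the phrase never needs to be taught again.
        if primitive_name not in self.primitives:
            raise ValueError(f"unknown primitive: {primitive_name}")
        self.learned_commands[utterance] = primitive_name


agent = InstructableAgent()
assert agent.handle("wake me before my flight") is None  # not yet taught
agent.teach("wake me before my flight", "set_alarm")
print(agent.handle("wake me before my flight"))  # -> Alarm set.
```

Real systems would generalize beyond exact phrase matches, but the sketch captures the contrast with today's assistants, which simply fail and forget.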
“We have these devices now that are connected to your world, and speech processing and natural language capabilities that have improved dramatically. And I think now is the time where some of that can be synthesized,” said Igor Labutov, a CMU post-doctoral fellow who is working on the project. “In one sense, these assistants will be more like real human assistants, rather than these general fact-answerers, a Google search with a natural language interface, which is more like what it is right now” in the industry.
So what’s next for Project InMind? Verizon’s Oath, as the new Yahoo parent, remains committed to the initiative, and has funded the project through February 2019. Some of its representatives met with CMU researchers in December for an update. Oath has been looking at the technologies developed through Project InMind to see if there are initiatives that could be commercialized as part of the company’s product offerings. CMU could also commercialize the technologies in other ways.
At this point, it’s not clear if CMU’s Project InMind with Verizon/Oath will be renewed beyond the initial five years. But on the research front, Mitchell said the rise of smart speakers provides an interesting path forward, revisiting some of the project’s initial ambitions.
“It’s absolutely changed my vision of the future,” he said. “We’ve continued to focus mostly on phones for this project. But in year six, one thing we want to do is consider an environment that has those kinds of devices — stationary, non-mobile assistants — scattered through the environment, interacting with your mobile phones on a 5G network, which obviously ought to be a campus-wide 5G experimental testbed for the future for, say, Verizon and other companies.”
SARA, the Socially Aware Robot Assistant, was tested at the 2017 World Economic Forum as a way for attendees to connect with one another, and is also being tested in applications including a movie recommendation assistant for Android. The underlying technology is still a prototype, and not everything works on its own; human intervention is still needed, for example, to classify aspects of conversations that fall outside a predefined context.
One of the next challenges on the list, Cassell said, is to build a dialog manager for SARA to interact socially while completing a task — chit-chatting and ultimately relying on past interactions to inform the discussion and build ongoing rapport.
Whereas the virtual assistants of today are always starting the relationship from scratch, she said, a rapport-building agent might instead say something like, “Yo, did you do your homework or not?”
“Little by little, we want to have SARA be the front end for InMind, so that there’s one single agent that you’re interacting with, who gives you access to all of these programs, and that’s going to take more than a year,” Cassell said. “But it’s a great challenge. And I think it’s a great challenge for Oath/Verizon/Yahoo. It’s going to give an embodiment to this system that’s going to be your counselor, your information source.”
And ultimately, in a surprise twist, this approach could help machines remind humans about the best parts of humanity.
“There’s a lot of technologies out there that don’t do that,” Cassell said. “They go straight to the task. There’s a lot of worry that children are going to start to think you don’t have to say ‘thank you,’ because Siri doesn’t need you to say ‘thank you.’”