'Hello, computer!' UW prof and students search outside the box

Huang and Ritter work in Etzioni’s lab as Ph.D. students on natural-language search tools.

The day is coming soon when, like Scotty in Star Trek, we’ll talk with our computers instead of just typing at them, according to a prominent UW computer scientist and his team of researchers.

Back in August, when it was still sunny in Seattle, and on the 20th anniversary of Tim Berners-Lee’s post to the alt.hypertext newsgroup announcing the World Wide Web, Oren Etzioni, director of the UW’s Turing Center, called for a new way of searching online.

In an essay penned for Nature, Dr. Etzioni reiterated what computer scientists have known for several years: it’s high time we thought outside of the old text box.

Pointing to what he calls a “curious lack of ambition and imagination,” Etzioni argues that the software industry’s momentum has been behind what already works rather than behind new methods.

He has been pushing against this inertia with the UW’s KnowItAll research group since 2003. The group’s work focuses on open-information extraction, an approach that digs into sentences for “syntactic clues” rather than relying on a rules-based, “keyword”-hunting system that tracks down a particular term or set of terms. The method aims, he writes, to “locate the verbs in a sentence, identify entities related by the verb, and use these to create statements of fact.”
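To make the three steps in that quote concrete, here is a toy sketch in Python. It is not the KnowItAll group’s code: it assumes the sentence has already been part-of-speech tagged with Penn Treebank tags, and the helper names `noun_run` and `extract_triple` are hypothetical. It finds the first verb, grabs the nearest nouns on either side, and returns a (subject, relation, object) triple.

```python
from typing import List, Optional, Tuple

# Assumption: the sentence arrives already part-of-speech tagged with
# Penn Treebank tags; a real system would run a tagger first.
Tagged = List[Tuple[str, str]]
Triple = Tuple[str, str, str]

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
# Tokens allowed to extend a relation phrase after its first verb,
# e.g. "was acquired by" (verbs, prepositions, "to", particles).
RELATION_TAGS = VERB_TAGS | {"IN", "TO", "RP"}


def noun_run(tokens: Tagged, indices: range) -> Optional[str]:
    """Walk `indices`, skip non-nouns until a noun appears, then return that
    contiguous run of nouns in sentence order (hypothetical helper)."""
    picked: List[int] = []
    for i in indices:
        if tokens[i][1] in NOUN_TAGS:
            picked.append(i)
        elif picked:
            break  # the run has ended; ignore anything farther away
    if not picked:
        return None
    return " ".join(tokens[i][0] for i in sorted(picked))


def extract_triple(tokens: Tagged) -> Optional[Triple]:
    # 1. Locate the first verb in the sentence.
    for v, (_, tag) in enumerate(tokens):
        if tag in VERB_TAGS:
            break
    else:
        return None  # no verb, so no statement of fact
    # Extend the relation over any following verbs or prepositions.
    end = v + 1
    while end < len(tokens) and tokens[end][1] in RELATION_TAGS:
        end += 1
    relation = " ".join(word for word, _ in tokens[v:end])
    # 2. Identify the entities the verb relates: nearest nouns on each side.
    left = noun_run(tokens, range(v - 1, -1, -1))
    right = noun_run(tokens, range(end, len(tokens)))
    if left is None or right is None:
        return None
    # 3. Emit the statement of fact as a (subject, relation, object) triple.
    return (left, relation, right)


sentence = [("Tim", "NNP"), ("Berners-Lee", "NNP"), ("invented", "VBD"),
            ("the", "DT"), ("World", "NNP"), ("Wide", "NNP"), ("Web", "NNP")]
print(extract_triple(sentence))  # ('Tim Berners-Lee', 'invented', 'World Wide Web')
```

The hand-written tag sets stand in for the far richer syntactic constraints a research system would use; the point is only that the verb, not a keyword, anchors the extraction.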

Etzioni is the brains behind Farecast, Bing’s airfare-predicting service. Much as Farecast aggregates flight prices, open-information extraction tries to stack up many millions of queries to tease out the intuitive meaning behind a question. He explains some of the finer points in a demo from July, and the group’s open-source code is available for anyone who’d like to dig into the nitty-gritty.
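The “stack up many instances” idea can be pictured with a simple frequency count: if the same statement is pulled out of many different pages, that repetition is some evidence it holds. The sketch below is a hypothetical illustration, not Farecast’s or KnowItAll’s method; it assumes the inputs are (subject, relation, object) statements already extracted from many documents, and the function name `rank_by_redundancy`, the `min_count` threshold and the sample statements are all made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def rank_by_redundancy(extractions: Iterable[Triple],
                       min_count: int = 2) -> List[Tuple[Triple, int]]:
    """Count repeated statements across documents and keep those seen often enough.

    Triples are normalized (trimmed, lowercased) so trivially different
    spellings of the same statement are counted together.
    """
    counts: Counter = Counter(
        tuple(part.strip().lower() for part in triple) for triple in extractions
    )
    # most_common() orders by count, highest first.
    return [(triple, n) for triple, n in counts.most_common() if n >= min_count]


# Made-up sample extractions, as if pulled from three different pages.
extractions = [
    ("Farecast", "predicts", "airfares"),
    ("farecast", "predicts", "airfares "),
    ("Farecast", "predicts", "hotel prices"),
]
print(rank_by_redundancy(extractions))  # [(('farecast', 'predicts', 'airfares'), 2)]
```

Raw counting is the crudest version of the idea, and the threshold of two is arbitrary.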

The first step, he says, is getting “the key information out of the documents,” and then building on some of the steps taken by Apple’s Siri (and by the people already tinkering with it to control their home furnaces, for example).

This “machine reading” of the Internet is ambitious, yes. The Defense Advanced Research Projects Agency (DARPA) has gotten in on the action, funding projects on the many challenges associated with getting computers to think like people, even just a little bit. IBM’s Watson, famous for its appearance on Jeopardy!, is another, if narrower, approach.

But if we’re going to avoid drowning in the coming waves of web-based knowledge, “we need more research on this topic by academia, big companies like Google and Bing, and startups as well,” Etzioni says. His own startup, Decide.com, tries to do this for consumer-electronics searches.

Some of his Ph.D. students, including Jeff Huang and Alan Ritter, are also hard at work on the perils and promise of machine reading.

“Search in the past has been about document analysis,” says Huang, but “more recently, it’s been about user behavior.”

A human-level understanding of what it means to ask a question about a local restaurant goes beyond just figuring out where it is and how late it’s open, adds Ritter, who specializes in searching for meaning in the real-time trends found on sites such as Twitter.

“A person can’t just sit down and read the whole web,” he says. And while Watson gets the attention, it’s not necessarily the future for most people, who will be searching for more and more stuff on their tablets and phones, and not just from home.

“Google will be the best for a long time” at text hunting, says Huang, but Bing and others have more to gain by investing heavily in non-keyword search tools as those tools become more refined.

The move will be toward faster, ranked results inferred from natural-language queries, rather than results based simply on “a bag of words” that you toss at an engine in the hope that it helps you, Huang says.
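Huang’s “bag of words” refers to treating a query as an unordered pile of terms. A short Python sketch (the `bag_of_words` helper is hypothetical) shows what gets lost: two trips in opposite directions look identical once word order is thrown away.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Lowercase the query, split on whitespace, and count tokens; order is discarded."""
    return Counter(query.lower().split())


a = bag_of_words("flights from Seattle to Boston")
b = bag_of_words("flights from Boston to Seattle")
print(a == b)  # True: the engine sees the same bag for opposite trips
```

A search tool that infers meaning from the sentence would need to keep track of which city is the origin and which is the destination, which is exactly the information this representation throws away.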

Huang notes that about a quarter of searches online today are not readily answered by traditional keyword queries, and that share will rise as the web expands and deepens. That means the need for better, more intuitive answer-finders will only become more pressing.

So, how far are we from conversations with the data cloud?

“We still have a ways to go,” Etzioni says, “but [we’re] getting much closer!”

GeekWire contributor Will Mari is a Ph.D. student in the UW’s Dept. of Communication, and studies the history of technology and journalism. You can reach him at: wtm2@uw.edu.
