The inspiration for the latest extension of the Semantic Scholar search engine, adding tens of millions of biomedical studies, may well have come from Marie Hagman’s aching stomach.
As senior product manager at Seattle’s Allen Institute for Artificial Intelligence, or AI2, Hagman played a key role in figuring out how to incorporate documents from PubMed and other biomedical databases in the academic search tool.
She drew upon her personal experience from 15 years earlier, when she was a software engineer suffering from two stomach ulcers and gastritis. Her specialist gave her a prescription to deal with the issue, but told her she’d probably have to keep taking pills for the rest of her life.
“I was thinking, ‘Hmm … I’m young and healthy. That just doesn’t sound right,’” Hagman recalled. “They still couldn’t tell me why I had this problem. So I decided to be my own advocate.”
She searched through the medical literature on stomach ulcers, and found a study in which researchers pointed to a type of bacteria known as Helicobacter pylori as a potential cause. Armed with that knowledge, she persuaded another specialist to put her on a two-week round of antibiotics.
“I’ve been cured ever since,” Hagman told GeekWire.
Now her objective is to help researchers, and even regular folks, find the most relevant studies that address the medical questions they want to answer.
“The literature is out there,” Hagman said. “We’re paying for it, right? Our tax dollars go to fund this research. I think that it really makes sense for people to be able to find it — but most importantly, for researchers who are actually investing their careers in these areas to be able to find things more easily.”
With backing from Microsoft co-founder Paul Allen, AI2 launched Semantic Scholar two years ago as a specialized search tool for computer science studies. The software took advantage of textual analysis and machine learning to extract relevant terms that a human indexer might not notice.
Last year, the database was expanded to take in neuroscience research as well. Since then, Hagman and her colleagues have been developing new algorithms and overhauling the database to expand its coverage from 12 million documents to 40 million.
“The challenge for us was actually scaling it up to all 80-plus medical domains. … We had to come up with an entirely new approach,” Hagman said. AI2’s researchers will explain exactly how they did it in a paper to be published in the near future, she said.
Hagman said the upgraded search tool should help novices as well as researchers who are moving into a field whose nomenclature they may not fully know.
“If I type in ‘stomach ulcer,’ ‘gastric ulcer’ will come up, which is apparently the medical term for that condition — which I never would have known,” Hagman said. “We also provide topic summaries so you can drill into that particular topic. You can see what else is related to it, and see what are the best papers to start with if you want to start informing yourself.”
That doesn’t mean AI2 is in competition with other academic search providers. On the contrary: AI2 is teaming up with Google, Microsoft and Baidu on an initiative known as Open Academic Search. The aim is to facilitate data sharing on a basic level so that participants can devote more resources to creating true innovations.
For Semantic Scholar, that means going for depth as well as breadth.
“We would like to go deeper on both computer science and biomedicine, to prove out some hypotheses about the interesting things we can do,” Hagman said. “But there’s also a lot of overlap between computer science and mathematics and physics, or between medicine and chemistry. So over time, we will expand out as well.”