For the past year, Amazon has been cracking down on sellers and third parties offering money for positive reviews. But for one team of data scientists, negative reviews on Amazon are much more interesting than positive ones.
Researchers from the University of Washington’s Data Science for Social Good program have set out to harness Amazon reviews to predict food product recalls. The team has developed a machine learning platform that can mine the text of Amazon reviews to make predictions about the safety of products. While the program is still in progress, it has the potential to revolutionize the process of food recalls.
Elaine Nsoesie, the team’s project lead and an assistant professor of global health at the UW’s Institute for Health Metrics and Evaluation, said the current recall process can be incredibly slow.
“Some of [the cases] were as long as one year, from the time we saw someone write a review saying ‘there is something wrong with this product’ to the time the FDA actually issued a recall,” she said.
This delay is largely because recalls only happen after an official investigative process, which is normally triggered when hospitals report patients with food poisoning. This program could shorten that process by using information from Amazon reviews to trigger investigations in real time.
It might be a simple concept, but the technology required to make it a reality is complex. The idea is to compare the text of reviews with previously recalled products to “learn what in the text is actually indicating that this thing should be recalled, or this thing shouldn’t be recalled,” explained the project’s lead data scientist, Valentina Staneva.
Processing text this way is challenging, and a team of Data Science Fellows from the UW’s Data Science for Social Good program worked with Staneva and Nsoesie to test a variety of approaches to the problem. They flagged words in the reviews, like “mold,” “sick,” and “vomit,” and worked to discover which ones can predict recalls. They have also created an interface to display their data and the connection between reviews and recalls.
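The keyword-flagging step described above can be sketched in a few lines. This is an illustrative toy, not the team's actual pipeline: the keyword list (beyond the three words named in the article) and the sample reviews are invented for demonstration.

```python
import re

# Keywords the article mentions, plus a couple of hypothetical additions.
RECALL_KEYWORDS = {"mold", "sick", "vomit", "spoiled", "expired"}

def flag_review(text: str) -> set:
    """Return the set of recall-related keywords found in a review."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & RECALL_KEYWORDS

reviews = [
    "Tasted great, would buy again!",
    "Opened the bag and found mold everywhere. My kids got sick.",
]
for review in reviews:
    print(flag_review(review))
```

In a real system, flags like these would become features fed to a classifier trained on reviews of previously recalled products, rather than triggering alerts directly.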
But the volume of reviews poses a unique challenge to the program’s predictive abilities.
“We’re dealing with something that occurs very rarely,” Staneva said, which makes predicting recalls difficult. In a nutshell, the program is quick to identify products that were not recalled, but has more difficulty identifying those that were recalled, she said.
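The imbalance Staneva describes is easy to demonstrate. In the hypothetical numbers below (invented for illustration, not from the project), a trivial classifier that always predicts "not recalled" scores high accuracy while catching zero actual recalls, which is why rare-event prediction is hard to evaluate with accuracy alone.

```python
# 1 = recalled, 0 = not recalled; recalls are rare (5 in 1000 here).
labels = [1] * 5 + [0] * 995
# A useless classifier that always predicts "not recalled".
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(f"accuracy = {accuracy:.1%}")  # 99.5%, despite missing every recall
print(f"recall   = {recall:.0%}")    # 0% of recalled products caught
```

This is why such systems are usually judged on recall and precision for the rare class, often with techniques like class reweighting or oversampling during training.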
Although the work by the Data Science for Social Good team has finished, Staneva and Nsoesie will continue to improve the program over the next year, with assistance from UW students, and are interested in expanding their text sources to include tweets and other publicly available social media.
In the meantime, they are looking towards possible partnerships to put the tech in motion.
The Washington State Department of Health has already shown an interest in using the program, Nsoesie said, and the team plans to build a real-time dashboard that could help health officials trigger recalls on a variety of foods and other products.
Staneva pointed out that the program would need to be tested more rigorously if such a partnership were pursued, considering the stakes involved in delaying or missing recalls. “The cost of missed recalled product could be somebody’s death,” she said.
The team is hoping to work with Amazon, as well. “I think it would be amazing if we could have a collaboration with them and they could help us get data and get it faster,” Nsoesie said. “So if we’re going to develop something that is real-time, then we need to have the data in real time.”
So the next time you throw out your salmonella-infected veggies, you might just have an Amazon review to thank.