Breast-cancer screening using mammograms certainly saves lives, but too many women receive false positives from the test and — even worse — some cancers are missed.
Now Seattle researchers and healthcare providers are leading a global, X Prize-style contest that could ultimately result in big improvements in breast-cancer diagnoses. They're helping organize a crowdsourced event that asks commercial and nonprofit organizations to develop deep-learning algorithms that identify breast cancers more accurately.
In addition to providing an opportunity to help combat a leading cause of death in women, the Digital Mammography DREAM Challenge will be awarding $1.2 million in prizes.
The effort pulls together experts in cancer care, computer science and many other disciplines to address the problem.
“It’s the kind of work that is pushing the frontier forward,” said Diana Buist, the director of Research and Strategic Partnership for Seattle’s Group Health Research Institute.
The project originated with the institute and Sage BioNetworks, a Seattle-based nonprofit promoting the use of computational models in health care. The event is organized by DREAM Challenges, a not-for-profit group that for 10 years has been holding open-science, crowdsourced contests to advance science and health research.
Between the cash prizes and the reputation of the DREAM Challenges, “this challenge will be able to attract top teams in the world,” said Justin Guinney, director of Computational Oncology and Data Science at Sage.
Some 372 teams have registered since June. The event unfolds in two phases, one competitive, the other a collaborative “community” phase. The competition officially starts Sept. 7.
The contest uses 640,000 digital mammography images from nearly 87,000 patients. All personally identifiable information is removed from the records.
The dataset also includes information about each woman's risk of having breast cancer, such as whether her mother, sister or other relatives have had the disease. It notes whether she has had a biopsy, as well as her breast density, body-mass index, race and ethnicity. And the records show whether the patient did or did not ultimately have breast cancer.
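As a rough illustration of how such a record might be structured (the field names here are hypothetical, not the challenge's actual schema), each de-identified exam pairs an image reference with these risk factors and the eventual outcome:

```python
from dataclasses import dataclass

@dataclass
class ScreeningRecord:
    """One de-identified screening exam; field names are illustrative only."""
    image_id: str          # points to the digital mammogram, not the patient
    family_history: bool   # mother, sister or other relative with breast cancer
    prior_biopsy: bool
    breast_density: int    # density category, e.g. 1 (fatty) to 4 (dense)
    body_mass_index: float
    race_ethnicity: str
    cancer_outcome: bool   # did the patient ultimately have breast cancer?

# A single hypothetical record as an algorithm would see it.
record = ScreeningRecord(
    image_id="img_000001",
    family_history=True,
    prior_biopsy=False,
    breast_density=3,
    body_mass_index=27.4,
    race_ethnicity="unknown",
    cancer_outcome=False,
)
```

The `cancer_outcome` field is what makes the dataset usable for training: it is the ground-truth label an algorithm learns to predict from everything else.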
Without assistance, it’s a lot of data points for a doctor to assess. “There’s a point where you’re juggling so many variables, it’s taxing and becomes inefficient,” said Eric Fogel, CEO of Radish Medical Solutions, a Seattle startup and one of the event organizers. The creation of an accurate diagnostic tool could relieve some of that pressure for healthcare providers.
Dr. Christoph Lee, a radiologist at Seattle Cancer Care Alliance, agreed.
Lee reviews hundreds of mammograms every day, he said. “I’m human. My eyes can pick up the majority of things, but I’m not perfect.” A tool using deep machine learning could be a great complement to his experience and expertise.
Mammography radiologists already have a computerized tool to aid their work, called computer-aided detection, or CAD, but its performance over the past 14 years has been disappointing. About a year ago, a journal article by Group Health Research Institute's Buist and others showed that CAD didn't improve accuracy in breast-cancer detection. In fact, it sometimes resulted in radiologists missing cancers, perhaps because they relied too heavily on the technology.
“We realized that CAD was leading to a lot more false positives,” Lee said. At this point, “it is fairly discredited.”
Out of 1,000 women screened using mammograms in the U.S., about 100 are recalled for another mammogram, biopsy or other test. The number of women ultimately found to have breast cancer is five out of 1,000.
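Those figures show how weak the signal is: of the roughly 100 women recalled per 1,000 screened, only about 5 turn out to have cancer, so the overwhelming majority of recalls are false alarms. A quick back-of-the-envelope check (assuming, for simplicity, that all 5 cancers were among the women recalled):

```python
screened = 1000
recalled = 100   # called back for another mammogram, biopsy or other test
cancers = 5      # ultimately found to have breast cancer

recall_rate = recalled / screened    # fraction of screened women called back
ppv_of_recall = cancers / recalled   # chance a recall actually means cancer
false_positives = recalled - cancers # recalls that turn out to be false alarms

print(recall_rate)      # 0.1
print(ppv_of_recall)    # 0.05
print(false_positives)  # 95
```

In other words, about 95 of every 100 recalls are false positives, which is the kind of ratio the challenge's algorithms are meant to improve.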
The DREAM challenge “is completely different” from the CAD tool, Lee explained. “It’s not a static tool. It’s changing and getting better every time the computer sees another mammogram.”
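Lee's point that the tool is "getting better every time the computer sees another mammogram" is, in machine-learning terms, incremental training: each new labeled example nudges the model's parameters. A minimal sketch of the idea, using logistic regression updated by stochastic gradient descent on toy stand-in features (a real entry would instead learn from the image pixels with a deep neural network):

```python
import math
import random

def sgd_step(weights, features, label, lr=0.1):
    """Update weights in place from one labeled example (logistic regression)."""
    z = sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))          # predicted probability of cancer
    for i, x in enumerate(features):
        weights[i] -= lr * (p - label) * x  # gradient of the log-loss
    return weights

# Toy stream of exams: one informative feature plus a bias term.
random.seed(0)
weights = [0.0, 0.0]
for _ in range(1000):
    has_cancer = random.random() < 0.3
    x = [random.gauss(2.0 if has_cancer else 0.0, 1.0), 1.0]
    weights = sgd_step(weights, x, 1.0 if has_cancer else 0.0)
# After seeing many examples, the weight on the informative feature is
# positive: the model has learned the feature predicts cancer.
```

Each pass through `sgd_step` is one "mammogram seen"; unlike the frozen rules in CAD, the model's behavior keeps shifting as labeled cases accumulate.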
The de-identified images and patient information used in the challenge come from Group Health through the Breast Cancer Surveillance Consortium, which is part of the National Institutes of Health, and the Icahn School of Medicine at Mount Sinai in New York.
For the most part, contest participants don't receive the images themselves; instead, they submit their algorithms and contest organizers run them against the dataset. During the first month of the contest, participants can test their models and work out major kinks.
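This "model-to-data" arrangement means a team ships a scoring program rather than downloading mammograms. Conceptually, the organizers only need each submission to expose a prediction function with an agreed-upon signature; the interface below is a hypothetical sketch, not the challenge's actual API:

```python
from typing import Protocol, Sequence

class Submission(Protocol):
    """Hypothetical contract a submitted algorithm must satisfy."""
    def predict(self, image_paths: Sequence[str]) -> Sequence[float]:
        """Return a cancer-risk score in [0, 1] for each mammogram."""
        ...

class ConstantBaseline:
    """Trivial entry: scores every exam at the population base rate."""
    def predict(self, image_paths):
        return [0.005 for _ in image_paths]  # ~5 cancers per 1,000 screens

# Organizers, not participants, would run this against the real images.
scores = ConstantBaseline().predict(["exam_a.png", "exam_b.png"])
```

Because only the code travels, the de-identified patient images never leave the organizers' environment, which is what makes a challenge on sensitive medical data feasible.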
On Oct. 4, competitive scoring begins, with winners and prizes awarded every four weeks. The final round of the contest begins in February and ends in March. Teams can continue fine-tuning their models throughout the contest.
After the competitive phase ends, select teams will be invited to join the community phase, in which they’ll work together to develop a final product. The prizes will be larger in the community phase, Sage’s Guinney said, to encourage collaboration that will lead to an algorithm that can be turned into a product suitable for clinical use.
Sponsors of the effort include the Laura and John Arnold Foundation and the White House Office of Science and Technology Policy. The challenge was highlighted this summer as part of President Obama and Vice President Joe Biden’s Cancer Moonshot, an effort to dramatically speed up advances in cancer diagnosis and treatment.
Guinney said this project is unusual compared with other DREAM challenges. While past contests have focused more on genomics and basic research, this challenge has a clear clinical application for doctors and their patients. And it draws on an unusually large database packed with images.
Amazon Web Services and IBM are donating cloud computing time to the effort.
In the spirit of open science, the contest requires that participants publish the results of their algorithms and make their models transparent and reproducible. The idea is not to create a secret, proprietary solution.
No one knows how quickly the project could transition from the contest into the doctor’s office. But there is an FDA employee on the organizing committee who’s providing advice to ensure that the project is robust and structured in a way that would help it clear regulatory hurdles.
Organizers are hopeful.
“There are some great candidates — some large labs and large global companies who are competing,” Lee said. “It will be exciting to see how this plays out.”