At Fred Hutchinson Cancer Research Center, I created and now lead the Hutch Data Commonwealth. Think of it as the place where we tether the amazing science that has always defined us to the accelerating forces of computing and data science in our region. It all ties together in the cloud.
Two years ago, I moved to the state of Washington from the Commonwealth of Massachusetts. Once in charge of computing at the Broad Institute of MIT and Harvard in Cambridge, I now do the same for Fred Hutch in Seattle. Both are stellar biomedical research sites, but what drew me to the Pacific Northwest was its wealth of expertise in cloud computing and data science.
The Hutch Data Commonwealth embraces a concept that is taking root: the idea of the cloud as the foundation of a scientific data commons. The cloud becomes the place where we convene and collaborate, a place where everybody comes together for the common good.
A commons employs a trio of tools for big data research: applied data science, advanced computational techniques and the cloud. In our Commonwealth, a remarkable Fred Hutch group called the Hutchinson Institute for Cancer Outcomes Research, or HICOR, is employing this trio to analyze health records — pooled from more than a dozen providers and insurers throughout the state — to guide medical decision makers toward best practices and lower costs.
At the Hutch, we increasingly work across scientific disciplines on projects with a growing appetite for data. We believe that a data-sharing culture will eventually bridge institutional borders as well. We are already building partnerships with external collaborators — with the Amazons, Microsofts and others working on data at the scale required to answer the kinds of questions our researchers are asking. We want to hear from others.
A move to collaboration in the cloud is virtually inevitable, because nobody in cancer science has enough data on their own. The University of Washington, for example, tracks medical data on more than 4 million lives, including 400,000 cancer patients. But pick a given disease, and those numbers shrink quickly; it becomes difficult to muster enough statistical power to answer our questions.
A wider dragnet for data is needed, but datasets themselves are becoming so large that the simple act of moving information becomes more difficult than the computation itself. The idea of a data commons is to leave it all in one place, in the cloud, and invite people to come and work on it, using common tools. This setup can place powerful resources at the fingertips of all researchers.
The cloud was built by retailers and social networking pioneers. They have learned how to use big data to drive their businesses. We can do something similar in the life sciences and health care. We are already seeing data sharing consortia in the cancer world. The ORIEN Network, originally built out by Moffitt Cancer Center and Merck, now has 16 cancer centers participating. The American Society of Clinical Oncology formed CancerLinQ, focused on other aspects of biomedical data; and Project GENIE, funded by the American Association for Cancer Research, also has people coming together to share their data.
Our ability to generate data with relevance to biomedicine is soaring. That data includes not only traditional molecular and genomic data, but also consumer data, wearable-device data, and increasingly data from image analyses of X-rays, MRIs and pathology slides. To wrangle all this data and sort through it for patterns, we need the common ground and vast capacity of the cloud.
This is a transformative time in our quest to cure cancer faster, and at Fred Hutch, we are making that leap to the cloud. We will not only speed the pace of our experiments; we will vastly expand the scope of what is possible.