Evolution of coronavirus in U.S.
A phylogenetic tree tracks the evolution of SARS-CoV-2, the virus that causes COVID-19, as it spread throughout the United States. An orange dot at lower left indicates WA-1, the first confirmed case in the U.S., which was detected in Washington state. (Nextstrain / GISAID Graphic)

From the early days of the coronavirus pandemic, genetic sleuths have been at the forefront of the global effort to monitor SARS-CoV-2, the virus that causes COVID-19. By comparing the molecular fingerprints of different virus samples collected in Washington state, they were able to track down the first signs of community spread in the U.S.

In a paper published today by Nature Medicine, some of the pioneers of genomic epidemiology have laid out a 10-point plan for creating a well-supported scientific ecosystem — not only to fight COVID-19, but to head off future pandemics as well.

“COVID really accelerated the pace at which this work was happening,” said Trevor Bedford, a computational biologist at Seattle’s Fred Hutchinson Cancer Research Center who’s a co-author of the plan. “Now the community is trying to figure out how to buckle down for the long haul.”

Genomic epidemiology didn’t start with COVID-19: The effort to trace how the virus is spreading and evolving is taking advantage of tools and networks that were created to address other diseases, ranging from polio to the Zika virus.

Bedford and his colleagues were able to track the course of the pandemic in the U.S. and other countries thanks to genomes that were sequenced for pre-existing projects such as the Seattle Flu Study and Nextstrain.

The roots of today’s call to action go back more than a year, to a meeting of bioinformatics experts and public health officials that was convened at the Bill & Melinda Gates Foundation’s Seattle headquarters. That meeting led to the creation of the Public Health Alliance for Genomic Epidemiology, or PHA4GE (pronounced like “phage”), a good two months before COVID-19 emerged in China.

Over the past six months, more than 450 labs around the world have contributed 45,000 SARS-CoV-2 genomes to the GISAID data bank. Tracing the similarities and differences in those genomes can reveal the evolutionary relationships between them, leading to detailed maps of viral flow.
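To give a rough sense of what comparing genomes involves, here is a minimal Python sketch that counts the positions at which pairs of aligned sequences differ. The sample names and the 10-letter sequences are invented for illustration, and counting mismatches is only a simplified stand-in: this is not the method used by Nextstrain or GISAID, and real analyses work with full-length genomes and statistical models of evolution.

```python
from itertools import combinations

# Toy aligned sequences. The names and bases are invented for illustration;
# real SARS-CoV-2 genomes are roughly 30,000 bases long.
samples = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",
    "sample_C": "ACGAACGTAT",
    "sample_D": "TCGAACGGAT",
}


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs):
    """Return a dict mapping each pair of sample names to its mismatch count."""
    return {
        (m, n): hamming(seqs[m], seqs[n])
        for m, n in combinations(sorted(seqs), 2)
    }


distances = pairwise_distances(samples)

# List pairs from most similar to least similar.
for (m, n), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{m} vs {n}: {d} differences")
```

Samples separated by few differences are likely to be closely related, which is the intuition behind grouping them on nearby branches of a tree.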

Much has been learned along the way: For example, the initial analysis suggested that the coronavirus had spread from one person to as many as 1,500 people in Washington state by late February, when experts confirmed community transmission. But a later analysis, based on additional genomic data, favored a scenario with multiple routes of transmission instead.

Bedford said that shift in scientists’ view of the pandemic’s origins highlighted the importance of being able to share data easily and quickly.

“Basically, none of this stuff works if everything is siloed,” he said. “If you only sequence and don’t share your data for Washington state, you’re going to have no idea how it connects with the rest of the world. … Your ability to resolve the epidemiological story really depends upon being able to connect up different dots.”

Another challenge relates to the software that’s been created to analyze genomic data for epidemiological purposes. “It’s not too well-maintained or well-documented, and it’s hard to use,” said principal study author Allison Black, an epidemiologist at Fred Hutch and the University of Washington.

Black conducted dozens of interviews with epidemiologists around the world to get a sense of the challenges they were facing, and what she heard helped shape the 10 recommendations listed in the paper published today:

  • Support data hygiene and interoperability by developing and adopting a consistent data model.
  • Strengthen application programming interfaces.
  • Develop guidelines for management and stewardship of genomic data.
  • Make bioinformatics pipelines fully open-source and broadly accessible.
  • Develop modular pipelines for data visualization and exploration.
  • Improve the reproducibility of bioinformatics analysis.
  • Utilize cloud computing to improve the scalability and accessibility of bioinformatics analyses.
  • Support new infrastructure and software development demands with an expanded technical workforce.
  • Improve the integration of genomic epidemiology with traditional epidemiology.
  • Develop best practices to support open data sharing.

Black acknowledged that finding the resources to develop standardized analytical software and provide cloud-computing firepower is another big challenge. “The traditional funding mechanisms within academia don’t really incentivize that work, so we’re going to need new funding mechanisms to incentivize this work and build the ecosystem,” she said. “I can’t speak to exactly what that looks like right now.”

One possibility might be to enlist the tech industry: In March, for example, the White House Office of Science and Technology Policy assembled a COVID-19 High-Performance Computing Consortium that brought together Amazon, Microsoft, Google and other tech powerhouses to support big-data research projects related to the pandemic.

Another study co-author, Duncan MacCannell of the Centers for Disease Control and Prevention, emphasized the need to train researchers.

“Bringing a bioinformatic workforce into public health is an enormous challenge,” he said. “In doing so, you’re competing against academia, you’re competing against the many aspects of the private sector, biotech industry, the pharmaceutical industry.”

Despite the competition, MacCannell said the CDC’s fellowship program is generally able to hold its own.

“Typically, somewhere around 70% of our graduates stay in some sort of public health career, either at the federal or state level,” he said. “We’re not able to necessarily compete on the basis of salary, but we are able to compete on the basis of fascinating and rewarding questions. And you really feel like you’re making an impact.”

Looking ahead, Bedford said genomic epidemiologists could work hand in hand with traditional epidemiologists to answer some of the crucial questions that have come up just in the past few days. For example, is Arizona’s renewed outbreak primarily due to the state’s reopening, or due to infected visitors reintroducing the virus?

“Genomics could help with that kind of detailed understanding of what’s really driving epidemic spread,” Bedford said. And those insights, in turn, could point to the best policies for curbing a resurging epidemic.

The researchers acknowledge that their 10-point plan has lots of blank spaces yet to be filled in. “I see this paper as a starting point for discussion,” Black said. “I don’t think we’re going to end up with the dream ecosystem overnight. We’re going to find a lot more out … and iterate on it.”

But Bedford said it’s high time to have a plan to work with.

“We’re at this critical moment here,” he said. “COVID has really accelerated things. Everyone is running at this thing so quickly that having this coordination and cat-herding is really necessary.”

In addition to Black, MacCannell and Bedford, Fred Hutch researcher Thomas Sibley is a co-author of the paper published by Nature Medicine, titled “Ten Recommendations for Supporting Open Pathogen Genomic Analysis in Public Health.”
