The pink smear of DNA at the end of this test tube can store incredible amounts of digital data. (Credit: Tara Brown Photography / UW)

Computer scientists from Microsoft and the University of Washington say they’ve set a new standard for DNA storage of digital data – but they acknowledge that the standard won’t last long.

For now, the bar is set at 200 megabytes. That’s how much data the researchers were able to encode in synthetic DNA pairings, and then correctly read out again. The encoded files included a high-definition music video by the band OK Go, titled “This Too Shall Pass”; the Universal Declaration of Human Rights in more than 100 languages; the top 100 books from Project Gutenberg; and the Crop Trust’s global seed database.

But Karin Strauss, the principal Microsoft researcher on the project, acknowledges that so much more is theoretically possible.

“You could pack an exabyte of data in an inch cubed,” she told GeekWire. An exabyte is equal to 8 quintillion bits of information, which is much more information than is contained in the Library of Congress. (Exactly how much more? That’s a matter of debate.)
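To put those figures side by side, here is a back-of-the-envelope sketch of the arithmetic. It assumes decimal (SI) units, with a megabyte as 10^6 bytes and an exabyte as 10^18 bytes; the variable names are purely illustrative.

```python
# Back-of-the-envelope arithmetic, assuming decimal (SI) units.
BITS_PER_BYTE = 8
BYTES_PER_MEGABYTE = 10**6   # assumption: SI megabyte
BYTES_PER_EXABYTE = 10**18   # assumption: SI exabyte

# One exabyte expressed in bits: 8 quintillion.
bits_per_exabyte = BYTES_PER_EXABYTE * BITS_PER_BYTE

# The 200-megabyte demonstration, in bytes.
demo_bytes = 200 * BYTES_PER_MEGABYTE

# How many 200-megabyte demonstrations would fit in one exabyte.
demos_per_exabyte = BYTES_PER_EXABYTE // demo_bytes

print(f"{bits_per_exabyte:,} bits in an exabyte")
print(f"{demos_per_exabyte:,} copies of the 200 MB demo per exabyte")
```

Under those assumptions, an exabyte works out to 5 billion times the 200 megabytes stored so far.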

One thing is certain: The hunger for data storage is growing by leaps and bounds, in large part because of our appetite for data-heavy video and the rise of big-data applications. DNA could help answer that need, because DNA strands can store data much more densely than silicon-based hard drives can. And if the DNA is kept in a cold, dry, protected place, the data could be preserved intact for centuries.

“It’s the ultimate backup media,” said project co-leader Luis Ceze, a UW computer science and engineering professor.

The concept adapts Mother Nature’s pairings of nucleobases in the DNA molecule: Adenine pairs with thymine, guanine pairs with cytosine. A string of such pairings, such as ATGGGGCCAGT, can serve the same function as the binary code of 1’s and 0’s used in a traditional data storage device.
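As a rough illustration of how a string of bases can stand in for binary code, here is a minimal sketch that maps every two bits to one base. The mapping table and the function names are assumptions made up for this example; they are not the UW-Microsoft team’s actual encoding, which also has to handle error correction and file addressing.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A, C, G and T (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"OK")
print(strand)          # CATTCAGT
print(decode(strand))  # b'OK'
```

With this toy mapping, each byte becomes four bases, so the two-letter message “OK” turns into an eight-base strand and decodes back to the same two bytes.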

There are challenges, of course: The researchers had to build in error-correction mechanisms as well as molecular markers that allowed for random access to the encoded files. And once the digital files were converted to ATGC code, they had to be turned into molecular strands by Twist Bioscience in San Francisco and sent back to the UW-Microsoft team in Seattle.

“It’s essentially a test tube, and you can barely see what’s in it,” Strauss said in a Microsoft blog posting about the project. “It looks like a little bit of salt was dried in the bottom.”

The commercial cost of encoding a megabyte’s worth of data is on the order of a few thousand dollars, and it takes on the order of minutes to convert the files, Ceze said. But the researchers expect both the cost and the conversion time to drop dramatically.

“We’re going to create all sorts of incentives for writing DNA and reading DNA,” Ceze said.

Harvard geneticist George Church, who’s also developing DNA data storage systems, agrees that the field is rapidly changing. He and his research group recently reported error-free storage and retrieval of 22 megabytes of data in DNA, and they’re aiming next for the gigabyte range.

“How far we are from doing this in a meaningful, optimally compressed form is not yet clear,” Church said in an email to GeekWire. But when the technology is ready for prime time, there’ll be a market for it.

Technicolor, for example, is interested in DNA data archiving for its large film library. “This, we believe, is what the future of movie archiving will look like,” the company’s vice president for research and innovation, Jean Bolot, said recently in Hollywood as he showed off a vial containing the DNA code for the 1902 silent classic “A Trip to the Moon.”

Church said DNA storage technology would be well-suited for archiving vast video surveillance data sets. Ceze and Strauss’ list of potential applications also includes health records, research data and sensor readings from the rising “Internet of Things.”

In the dawning age of cloud computing, the end users may not know or care that their information is being stored in synthetic DNA molecules. They might notice only that their “cloud” was becoming much more commodious, measured in exabytes rather than mere terabytes.

There’s lots of work still to be done. “We’re really focused as a team on developing an end-to-end system,” Ceze said. That means figuring out more precisely how the mechanisms of DNA molecules work. Ceze compares the task to building a complicated Rube Goldberg machine, like the one featured in the OK Go music video that he and his colleagues encoded.

“If you look at DNA at the nanoscale,” he said, “it looks like an incredible but very reliable Rube Goldberg machine.”
