Researchers at the University of Washington and Microsoft are developing a digital storage system that can archive data in DNA molecules, with the random-access retrieval and error-correction protocols that real-world applications would require.
Once they’ve overcome those hurdles, they just have to figure out how to make the technology affordable. Eventually, such research could help open the way for data storage devices that can pack information millions of times more tightly than current silicon-based methods.
“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works — it’s very, very compact and very durable,” Luis Ceze, UW associate professor of computer science and engineering, said in a news release. “We’re essentially repurposing it to store digital data — pictures, videos, documents — in a manageable way for hundreds or thousands of years.”
Ceze and his colleagues describe their work in a paper being presented this week in Atlanta at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, or ASPLOS.
In a series of experiments, the researchers encoded the digital data from image files into the molecular sequences of synthetic DNA. The “letters” of DNA code — adenine, thymine, cytosine and guanine, or A-T-C-G — stood in for the 1s and 0s of a computer’s binary code. The images included a 5-kilobyte picture of a smiling monkey, a 12KB cat picture and a 24KB scene from Australia’s Sydney Harbor.
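To give a sense of how binary data can map onto DNA letters, here is a minimal Python sketch. It assumes the simplest possible scheme, two bits per base (00 for A, 01 for C, 10 for G, 11 for T), which is not necessarily the encoding the researchers used. The function names `bytes_to_dna` and `dna_to_bytes` are hypothetical.

```python
# Illustrative only: a naive two-bits-per-base mapping (an assumption,
# not the encoding described in the researchers' paper).
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four DNA letters, most significant bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: every four letters become one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

Under this mapping a 5KB image would need roughly 20,000 bases before any addressing or redundancy is added. Real schemes typically avoid long runs of the same letter, which this sketch does not attempt.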
The team inserted file-tagging snippets of DNA that allowed them to retrieve the data with a widely used technology known as polymerase chain reaction, or PCR. That way, the researchers didn’t have to read out the entire set of DNA coding. Instead, they could zero in on the information they were looking for, much like random-access memory in a computer.
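The retrieval idea can be approximated in software as a lookup by prefix. The sketch below is only an analogy: PCR is a chemical process that amplifies strands carrying a matching tag, whereas this code simply filters strings. The tag length, the tag sequences and the function names `add_to_pool` and `retrieve` are all assumptions made for illustration.

```python
# Software analogy for tag-based retrieval (hypothetical names and tags).
# Each stored strand is "tag + payload"; the pool itself has no order.
TAG_LENGTH = 8  # assumed fixed tag length, chosen for this example


def add_to_pool(pool: list, tag: str, payloads: list) -> None:
    """Prefix every payload strand with its file's tag and add it to the pool."""
    if len(tag) != TAG_LENGTH:
        raise ValueError("tag must be exactly TAG_LENGTH letters long")
    pool.extend(tag + payload for payload in payloads)


def retrieve(pool: list, tag: str) -> list:
    """Return only the payloads whose strand starts with the given tag."""
    return [strand[len(tag):] for strand in pool if strand.startswith(tag)]
```

Asking for one tag returns only that file's strands, so the rest of the pool never has to be read, which is the point of the random-access design.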
The researchers also used an error-correction method that took advantage of exclusive-or logic, or XOR. As a result, the images could be reconstructed without losing a single byte of information.
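The general principle behind XOR-based redundancy fits in a few lines of Python. This is a sketch of the textbook technique, assuming two equal-length data blocks plus one parity block. It is not a reproduction of the researchers' exact layout, and the names `xor_bytes` and `recover` are hypothetical.

```python
# Textbook XOR parity: store A, B and A XOR B; any one of the three
# can be rebuilt from the other two. Layout and names are assumptions.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-or of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def recover(a, b, parity):
    """Rebuild a missing data block (passed as None) from the survivors."""
    if a is None and b is None:
        raise ValueError("cannot recover two missing blocks from parity alone")
    if a is None:
        return xor_bytes(b, parity), b
    if b is None:
        return a, xor_bytes(a, parity)
    return a, b
```

The cost is one extra block of storage for every two blocks of data, in exchange for surviving the loss of any single block in the group.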
Another experiment involved encoding and retrieving data to authenticate archival video files from UW’s “Voices From the Rwanda Tribunal” project.
For more than a decade, researchers around the world have been investigating the use of DNA for data storage. One team encoded a 53,000-word book and associated images in DNA molecules in 2012. Another team did something similar with William Shakespeare’s sonnets the following year.
In an email, Ceze told GeekWire that he and his colleagues advanced the state of the art by demonstrating a method for random access, describing an end-to-end system and analyzing the sources of DNA read-write errors in detail.
The cost of synthesizing DNA, which easily runs into thousands of dollars, makes it unlikely that the technology will pose a threat to thumb drives anytime soon. But DNA-based storage systems are still worth pursuing because so much information can be stored in so small a space, and because DNA molecules can remain intact for hundreds or thousands of years under the right conditions.
“We are actively working towards building an actual system and making progress,” Ceze said.
In addition to Ceze, the authors of “A DNA-Based Archival Storage System” include James Bornholt, Randolph Lopez and Georg Seelig of the University of Washington, and Douglas Carmean and Karin Strauss of Microsoft Research. The research was funded by Microsoft Research, the National Science Foundation and the David Notkin Endowed Graduate Fellowship.