Beyond Silicon: Using DNA As The Next Medium For Storage

Imagine if the samples stored in biobanks the size of football fields could be stored in someone’s desk and medicine could be mapped to a person’s unique molecular and genetic profile.

These are just some of the ways that using DNA as the next medium of storage is expected to change the world in the coming years. By 2040, DNA will likely be used to store specific biological information about the entire human race, offering unprecedented insights into the health of the world’s population and helping usher in the age of truly personalized medicine, says James Banal, co-founder of Silicon Valley startup Cache DNA, a spinout from Mark Bathe’s lab at MIT and a National Science Foundation (NSF) grant recipient. “The vision is clear to me,” says Banal. “We are going to be able to understand everything about the human body as well as every living organism, unlocking the keys to life.”

Inspired by nature’s information storage medium, Cache DNA is using a proprietary molecular barcoding and encapsulation technology that allows DNA to be used as a scalable platform for handling nucleic acids critical to areas like biobanking, biopharmaceutical library assets and personalized medicine. The company’s scientific advisors include Mark Bathe, an American biological engineer working on DNA nanotechnology and co-founder of Cache DNA and Kano Therapeutics; Paul Blainey, an American technologist and serial entrepreneur; George Church, an American geneticist, molecular engineer, chemist, serial entrepreneur, and pioneer in personal genomics and synthetic biology; and Jeremiah A. Johnson, an American chemist working on synthetic polymers.

Cache DNA is one of the companies scheduled to speak on a panel about DNA storage at PUZZLE X, a global event about frontier tech that will take place in Barcelona November 15-17. The idea of using DNA as a medium for storing digital information was first suggested in 1959 by the American physicist Richard Feynman, who was awarded the Nobel Prize in Physics in 1965. Breakthroughs announced in the last few months underscore the progress being made to turn the vision into reality.

Why DNA?

DNA has many advantages when it comes to storing data. For one, the same amount of information can be stored in a much smaller physical volume than is possible with conventional technologies. DNA is also very stable, durable enough to last centuries, which makes it suitable for long-term archiving. Using DNA to store data is also a natural fit, since its main function in nature is to store the genetic information for all living organisms. What’s more, DNA is an inherently environmentally friendly medium in terms of power, space, and sustainability, placing significantly lower burdens on the ecosystem than legacy storage technologies.

DNA molecules are composed of long chains of nucleotides. Each DNA nucleotide contains a base (adenine, guanine, cytosine, or thymine), and the order of those bases encodes the information that cellular machinery uses to express an organism’s genome.

Due to dramatic advances over the past few decades, it is now possible to construct a synthetic DNA molecule, base-by-base, with the bases strung together in any order.

By mapping the 1’s and 0’s of a digital object (e.g., file, image, etc.) onto the four DNA bases, synthesized DNA becomes an encoded version of the original digital data. When it’s time to read the data, the DNA is sequenced, extracting the bases and decoding them back into 1’s and 0’s. With the introduction of parallelized chemical and enzymatic synthesis, the large-scale production of DNA for data storage has become viable. But several factors, including cost and complexity, have been holding the sector back.
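The mapping described above can be illustrated with a minimal sketch. It assumes the simplest possible scheme, two bits per base, with an arbitrary assignment of bit pairs to A, C, G and T. The names `BITS_TO_BASE`, `encode` and `decode` are hypothetical, and real systems typically layer error correction and sequence constraints on top of anything this simple.

```python
# Illustrative two-bits-per-base mapping. The assignment of bit pairs to
# bases is an assumption made for this sketch, not a published standard.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this toy version each byte becomes four bases, so "synthesis" is just building the string and "sequencing" is reading it back.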

For instance, today’s approaches for use in health-related applications are limited by the need for energy-intensive cooling to maintain nucleic acid integrity and large warehouses with expensive robotics to efficiently access samples. Cache DNA’s technology aims to maintain the integrity of nucleic acids for decades at room temperature using novel materials while simultaneously allowing search and retrieval operations. The built-in proprietary barcoding strategy enables labeling and retrieval operations similar to an Internet search engine, says Banal.

The NSF grant was to determine if the young startup’s technology could store DNA in the way amber naturally does it over the course of millions of years, but do it in a matter of minutes. “I am happy to say that ‘yes we can.’ This is real,” he says.

Placing All Of The World’s Intelligence In A Shoebox

Although it is currently focusing on handling nucleic acids, Cache DNA’s technology can also be used to create massive DNA-based file systems for archival data storage. The foundational papers from Mark Bathe’s lab at MIT demonstrated an alternative way to label DNA ‘files’ using DNA barcodes. Thanks to this unique barcoding system, they showed for the first time the ability to search DNA files, much as one would look things up using a search engine such as Google.

“We asked if the Internet uses metadata to search efficiently, why don’t we just do the same for DNA data storage? That question haunted us for years with a lot of failures, but we figured out the kinks and demonstrated a unique file system dedicated for molecules,” says Banal. “But those demonstrations are just the tip of an iceberg, we have a few more up our sleeves.”
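As a rough software analogy for the metadata idea described above, the sketch below tags each DNA ‘file’ with a set of barcode labels and retrieves the files that carry every requested label. It is a toy model only: the `DnaFile` class, the `search` function, the labels and the payload strings are all hypothetical, and it does not represent the molecular procedure used by the MIT lab or by Cache DNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaFile:
    payload: str          # the encoded data strand (made-up example values)
    barcodes: frozenset   # metadata labels attached to the file


def search(pool, required):
    """Return the files that carry every barcode in `required`."""
    wanted = frozenset(required)
    return [f for f in pool if wanted <= f.barcodes]


# A tiny hypothetical pool of barcoded files.
pool = [
    DnaFile("ACGTAC", frozenset({"cat", "photo"})),
    DnaFile("TTGACA", frozenset({"dog", "photo"})),
    DnaFile("GGCATA", frozenset({"cat", "text"})),
]

hits = search(pool, {"cat", "photo"})
print([f.payload for f in hits])  # ['ACGTAC']
```

The point of the analogy is that the query touches only the labels, never the payloads, which is what lets a search engine, or a barcoded pool of molecules, find one item without reading everything.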

A growing number of companies have entered the space. They have a big incentive. In 2020, humanity produced 45 zettabytes (ZB) of digital data, according to CNRS, France’s national scientific research organization. This volume should amount to 175 ZB by 2025. Faced with this staggering increase, today’s storage media (optical, magnetic tape, and hard drives) appear to have reached their limits: they are fragile, with a life expectancy of 5-7 years; the energy-hungry data centres housing them now consume nearly 2% of global electricity production; and they take up space, with the ever-growing surface they cover now extending to 167 square kilometers worldwide.

When stored on DNA, all of the world’s intelligence could be contained in a shoebox, according to CNRS. This technology thus represents a potential solution for what is known as cold data (approximately 70% of the material generated each year), which is rarely consulted but nevertheless invaluable, such as archives.

“To put it into perspective, a single gram of DNA can hold over 215,000 terabytes of data – equivalent to storing 45 million DVDs combined,” Associate Professor Poh Chueh Loo from the College of Design and Engineering at the National University of Singapore, and the NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), said in a statement on the university’s site.
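The DVD comparison can be checked with a few lines of arithmetic. The sketch below assumes a single-layer DVD capacity of 4.7 GB and decimal units (1 TB = 1,000 GB); neither figure is given in the statement, so both are assumptions.

```python
TB_PER_GRAM = 215_000   # figure quoted in the statement above
GB_PER_TB = 1_000       # assumption: decimal units
DVD_GB = 4.7            # assumption: single-layer DVD capacity

dvds_per_gram = TB_PER_GRAM * GB_PER_TB / DVD_GB
print(f"{dvds_per_gram / 1e6:.1f} million DVDs per gram")  # 45.7 million DVDs per gram
```

Under those assumptions, the result lines up with the "45 million DVDs" in the quote.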

In a July 3 article in Nature Communications, Poh and his team unveiled a ‘biological camera’ which they say bypasses the constraints of current DNA storage methods, harnessing living cells and their inherent biological mechanisms to encode and store data. “This represents a significant breakthrough in encoding and storing images directly within DNA, creating a new model for information storage reminiscent of a digital camera,” says a statement by the university.

Despite the technology’s immense potential, current research in DNA storage focuses on synthesizing DNA strands outside of cells. This process is expensive and relies on complex instruments, which are also prone to errors.

To overcome this bottleneck, Poh and his team turned to live cells, which contain an abundance of DNA that can act as a ‘data bank’, circumventing the need to synthesize the genetic material externally. The team developed ‘BacCam’ – a novel system that merges various biological and digital techniques to emulate a digital camera’s functions using biological components.

“Imagine the DNA within a cell as an undeveloped photographic film,” Poh explained in a university blog post. “Using optogenetics – a technique that controls the activity of cells with light, akin to the shutter mechanism of a camera – we managed to capture ‘images’ by imprinting light signals onto the DNA ‘film’.”

Next, using barcoding techniques akin to photo labeling, the researchers marked the captured images for unique identification. Machine-learning algorithms were employed to organize, sort, and reconstruct the stored images. These constitute the ‘biological camera’, mirroring a digital camera’s data capture, storage, and retrieval processes. “As we push the boundaries of DNA data storage, there is an increasing interest in bridging the interface between biological and digital systems,” said Poh.

“Our method represents a major milestone in integrating biological systems with digital devices,” says Poh in the blog post. “By harnessing the power of DNA and optogenetic circuits, we have created the first ‘living digital camera,’ which offers a cost-effective and efficient approach to DNA data storage. Our work not only explores further applications of DNA data storage but also re-engineers existing data-capture technologies into a biological framework. We hope this will lay the groundwork for continued innovation in recording and storing information.”

In a paper published in the May 4 issue of Nature Nanotechnology, a team at Eindhoven University of Technology proposed a different technique that promises to make DNA data storage practical, scalable, and highly efficient. To selectively retrieve data encoded in DNA, a polymerase chain reaction (PCR) is used to create millions of copies of the required piece of DNA. The Eindhoven team, led by Tom de Greef, used microreactors whose membranes have temperature-dependent permeability to encapsulate the DNA and enhance the PCR process. The researchers say this method outperforms current DNA storage methods and provides a new approach for repeated random access to archived DNA files.

The researchers anchor one DNA file per capsule. Above 50°C, the capsules seal themselves as their permeability drops, which allows the PCR process to take place separately in each capsule. The temperature is then lowered to room temperature, which increases the membrane’s permeability again and makes the file copies detach from the capsule. Importantly, since the original file remains anchored to its capsule, its quality does not deteriorate, in contrast to what has been observed with previous PCR-based DNA data storage techniques, according to the research team. Indeed, de Greef told Physics World that losses currently stand at 0.3% after three reads, compared with 35% for existing methods.

No Longer Science Fiction

Developments such as these demonstrate that “the science of manipulating DNA at a molecular level, which has been used in medical and other scientific applications for over 30 years, has progressed to where storing digital data in DNA is no longer science fiction, but an emerging reality,” engineer Dave Landsman, Western Digital’s senior director of industry standards and one of the principals in the company’s exploration of DNA data storage, said in a company blog posting.

The DNA Data Storage Alliance was formed in 2020 by Western Digital, Microsoft, Illumina, and Twist Bioscience to create and promote an interoperable storage ecosystem based on DNA as a data storage medium. Since DNA can be preserved indefinitely, research organization Gartner sees archival storage of music, video, and statistical data as potential applications for DNA storage. Indeed, the video streaming industry has already produced a compelling example of the emerging use of DNA for archival data storage. Twist Bioscience worked with Netflix in 2020 to demonstrate the feasibility of DNA for video preservation. Researchers at ETH Zurich encoded the first episode of the Netflix series Biohackers into DNA nucleotides, which were then synthesized into DNA strands using Twist Bioscience’s silicon platform. Raw, uncompressed 4K video runs about 250 MBps, which translates to 750 GB for a 50-minute episode.
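The 750 GB figure follows directly from the data rate, reading "MBps" as megabytes per second and using decimal units. The second half of the sketch below is an extrapolation that is not in the source: it applies the theoretical density of 215,000 TB per gram quoted earlier in this article to estimate how little DNA that episode would occupy, ignoring encoding overhead, redundancy and error correction.

```python
RATE_MB_PER_S = 250     # raw 4K data rate quoted above, in megabytes per second
EPISODE_MIN = 50        # episode length quoted above

total_gb = RATE_MB_PER_S * EPISODE_MIN * 60 / 1_000
print(total_gb)  # 750.0

# Extrapolation (assumption): theoretical density of 215,000 TB per gram,
# with no allowance for redundancy or error-correction overhead.
TB_PER_GRAM = 215_000
micrograms = (total_gb / 1_000) / TB_PER_GRAM * 1e6
print(f"about {micrograms:.1f} micrograms of DNA")
```

On those idealized numbers, a full uncompressed episode would fit in a few millionths of a gram of DNA; practical systems would need considerably more.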

The potential has been established, said Landsman. The challenge now is to scale the science. “This is what the tech industry excels at, and this is where the magic will be over the next 5-10 years,” he was quoted as saying on the company blog.

Tackling Diversity And Ethical Challenges

The intersection of the demand to “digitize everything” with the ability to manipulate synthetic DNA offers an opportunity to create a new layer in the storage hierarchy that could radically change the scale of what we store and for how long we store it, notes the DNA Data Storage Alliance. Preserving the world’s digital legacy in turn opens possibilities to extract, and even create or discover, new knowledge. Today’s biobanks, which require so-called freezer farms to keep nucleic acids at the right temperature, are limited in how many samples they can store. “In the future we won’t have to choose,” says Cache DNA’s Banal. “Diversity is key. Much of the genome information that has been collected pertains to Caucasians. We need to collect a diverse pool of samples, collect everything, and never throw samples away because we don’t know what we don’t know.” He uses the case of COVID swabs as an example. Often people had negative results even though it turned out they had been infected. Because the swabs were thrown away, it took researchers a long time to determine that variations of the virus had developed.

In addition to saving everything, “the idea is to have a diverse pool of samples that investigators can activate anytime so that medicine can finally be developed according to specific genomes,” he says. “Our mission is to store and retrieve information forever. The scary future is that we’ll have to face data shortages or data rationing and that only rich people will be able to buy storage. We don’t want that future. We don’t want ‘datageddon.’ That is why we need alternative media.”

Along with progress come ethical challenges, says Banal. “We need to make sure that businesses sample and source materials in an ethical way. We are now at a point where we can capture DNA from the environment. We need to recognize that and have discussions around informed consent because this will pose the same problems as digital data. The entire industry needs to discuss this early on.”

FutureScope is a series of articles created by The Innovator in partnership with PUZZLE X, a conference about future technologies. The articles give insights to both The Innovator’s subscribers and PUZZLE X’s attendees on topics that will be tackled at the November conference in Barcelona.

About the author

Jennifer L. Schenker

Jennifer L. Schenker, an award-winning journalist, has been covering the global tech industry from Europe since 1985, working full-time at various points in her career for the Wall Street Journal Europe, Time Magazine, International Herald Tribune, Red Herring and BusinessWeek. She is currently the editor-in-chief of The Innovator, an English-language global publication about the digital transformation of business. Jennifer was voted one of the 50 most inspiring women in technology in Europe in 2015 and 2016 and was named by Forbes Magazine in 2018 as one of the 30 women leaders disrupting tech in France. She has been a World Economic Forum Tech Pioneers judge for 20 years. She lives in Paris and has dual U.S. and French citizenship.