Biologists who synthesize DNA take care not to create or spread a dangerous stretch of genetic code that could be used to make a toxin or, worse, an infectious disease. But one group of biohackers has shown that DNA can carry a less expected threat: one designed to infect computers rather than humans or animals.
A group of researchers from the University of Washington has demonstrated for the first time that malicious software can be encoded into physical strands of DNA, so that when a gene sequencer analyzes them, the resulting data becomes a program that corrupts gene-sequencing software and takes control of the underlying computer.
While such an attack is unlikely to be carried out by a real spy or criminal for now, the researchers believe it will become more likely as DNA sequencing grows more common and more powerful, and is increasingly performed by third-party services on sensitive computer systems. It is also an impressive, sci-fi feat of hacker ingenuity, which is arguably more relevant to the cybersecurity community.
Comparing the technique to traditional hacker attacks that package malicious code in a web page or an email attachment, Tadayoshi Kohno, the University of Washington computer science professor who led the project, says, “We know that if an adversary has control over the data a computer is processing, it can potentially take over that computer. That means when you’re looking at the security of computational biology systems, you’re not only thinking about the network connectivity and the USB drive and the user at the keyboard but also the information stored in the DNA they’re sequencing. It’s about considering a different class of threat.”
A Science-Fiction Hack
For now, that threat reads more like a Michael Crichton plot than something that should worry computational biologists. But the DNA-borne malware scenario becomes somewhat more plausible as genetic sequencing is increasingly handled by centralized services, often run by academic labs that own the expensive sequencing equipment, especially when the DNA samples come from outside sources that may be difficult to verify.
If attackers did pull it off, the researchers say, they could gain access to valuable intellectual property or taint genetic analysis such as criminal DNA testing. Companies might even embed malicious code in the DNA of genetically engineered products to protect trade secrets. “There are a lot of interesting, or threatening may be a better word, applications of this coming in the future,” says Peter Ney, a researcher on the project.
Regardless of its practical use, the idea of building a computer attack (known as an “exploit”) from nothing but the information stored in a strand of DNA posed an epic hacker challenge for the University of Washington team. The researchers began by writing a well-known kind of exploit called a “buffer overflow,” which is designed to fill the space in a computer’s memory reserved for a specific piece of data and then spill over into another part of memory to plant its own malicious commands.
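To make the idea of a buffer overflow concrete, here is a minimal sketch in Python. It is only a simulation under stated assumptions: the 16-byte "memory," the 8-byte buffer, and the names `make_memory`, `unchecked_copy`, and `checked_copy` are hypothetical and do not come from the researchers' exploit. Python itself is memory-safe, so a single `bytearray` stands in for adjacent regions of real memory.

```python
BUFFER_SIZE = 8  # hypothetical size of the region reserved for input


def make_memory():
    # Simulated memory: 8 bytes reserved for input, followed directly
    # by 8 bytes of unrelated data that the program relies on.
    return bytearray(b"\x00" * BUFFER_SIZE + b"SAFEDATA")


def unchecked_copy(memory, data):
    """Copy input into the buffer with no length check (the bug)."""
    for i, byte in enumerate(data):
        memory[i] = byte  # writes past BUFFER_SIZE if data is too long
    return memory


def checked_copy(memory, data):
    """Same copy, but reject input that does not fit in the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input larger than buffer")
    memory[:len(data)] = data
    return memory


if __name__ == "__main__":
    # Eight filler bytes fill the buffer; the rest spills into its neighbor.
    overflowed = unchecked_copy(make_memory(), b"AAAAAAAA" + b"EVILCODE")
    print(bytes(overflowed[BUFFER_SIZE:]))
```

The unchecked version silently replaces the neighboring data, which is the "spill" described above. The checked version refuses the oversized input instead.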
However, encoding that attack in actual DNA proved harder than the researchers had anticipated. DNA sequencers work by mixing DNA with chemicals that bind differently to DNA’s basic units of code (the chemical bases A, T, G, and C), each emitting a different color of light that is captured in a photo of the DNA molecules.
To speed up processing, the images of millions of bases are split into thousands of chunks and analyzed in parallel. So all of the data that made up the attack had to fit into just a few hundred of those bases, to improve the chance that it would remain intact through the sequencer’s parallel processing.
When the researchers sent their carefully crafted attack to the DNA synthesis service Integrated DNA Technologies in the form of As, Ts, Gs, and Cs, they discovered that DNA has other physical limits too. For their DNA sample to remain stable, they had to maintain a certain ratio of Gs and Cs to As and Ts, because the natural stability of DNA depends on a regular proportion of A-T and G-C pairs.
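The G-C to A-T balance can be checked with a few lines of code before a sequence is ever sent for synthesis. The sketch below computes the GC fraction of a sequence. The function names and the 40 to 60 percent window are illustrative assumptions, not figures from the researchers or from the synthesis service.

```python
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_in_range(seq, low=0.4, high=0.6):
    # The default window is an assumed example, not a published limit.
    return low <= gc_content(seq) <= high


if __name__ == "__main__":
    for candidate in ("ATGCATGC", "AAAAAAAT", "GGGGCCCC"):
        print(candidate, gc_content(candidate), gc_in_range(candidate))
```

A payload that fails such a check would need to be rewritten into an equivalent form with a more balanced mix of bases.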
And while a buffer overflow often relies on repeating the same strings of data, doing so here caused the DNA strand to fold in on itself. All of this meant the group had to rewrite their exploit code repeatedly to find a form that could also survive as actual DNA, which the synthesis service eventually sent them by mail in a finger-sized plastic vial.
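One simple way to screen a candidate sequence for the kind of repetition described here is to count how often each short substring (a "k-mer") appears. This is only a rough sketch: the function name and the default substring length of 8 are assumptions, and it does not model how DNA actually folds.

```python
from collections import Counter


def repeated_kmers(seq, k=8):
    """Return every length-k substring that occurs more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


if __name__ == "__main__":
    # A highly repetitive sequence is flagged; a varied one is not.
    print(repeated_kmers("ACGTACGTACGT", k=4))
    print(repeated_kmers("ACGTTGCA", k=4))
```

A sequence that returns many repeated substrings is a candidate for rewriting before synthesis.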
The result, finally, was a piece of attack software that could survive the translation from physical DNA to FASTQ, the digital format used to store DNA sequences. When that FASTQ file is compressed with fqzcomp, a common compression program (FASTQ files are often compressed because they can stretch to gigabytes of text), the embedded code uses its buffer overflow exploit to break out of the program and into the memory of the computer running the software, where it can run its own arbitrary commands.
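The translation between bytes and bases can be sketched as follows. With four possible bases, each base can carry two bits, so four bases hold one byte and a few hundred bases hold only a few dozen bytes. The specific mapping used here (A=00, C=01, G=10, T=11) is an assumption for illustration, as are the function names. The four-line FASTQ record layout (an `@` identifier line, the sequence, a `+` separator, and one quality character per base) is the standard format, and the constant quality character is a placeholder.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_bases(data):
    """Encode each byte as four bases, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq):
    """Decode a base string (length divisible by 4) back into bytes."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def fastq_record(read_id, seq, quality_char="I"):
    """Build one FASTQ record with a placeholder quality string."""
    return f"@{read_id}\n{seq}\n+\n{quality_char * len(seq)}\n"


if __name__ == "__main__":
    encoded = bytes_to_bases(b"hi")
    print(fastq_record("read1", encoded), end="")
    print(bases_to_bytes(encoded))
```

Under this assumed mapping, a 200-base strand would carry 50 bytes, which shows how tightly the attack code had to be packed.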
A Remote Threat
Even then, the attack was fully translated only about 37% of the time, because the sequencer’s parallel processing often cut it short or the software decoded it backward, another hazard of writing code in a physical object. (A strand of DNA can be sequenced in either direction, but code can be read in only one. In their paper, the researchers suggest that future, improved versions of the attack could be encoded as a palindrome.)
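The direction problem can be illustrated with a short sketch. When a double-stranded molecule is read from the opposite strand, the sequencer reports the reverse complement of the original sequence: the bases are swapped (A with T, C with G) and the order is reversed. The code below assumes that the "palindrome" in question means a sequence equal to its own reverse complement. That interpretation and the function names are assumptions, not details taken from the paper.

```python
# Translation table that swaps each base with its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq):
    """True if the sequence reads the same from either strand."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for candidate in ("AACG", "GAATTC"):
        print(candidate, reverse_complement(candidate),
              is_reverse_palindrome(candidate))
```

A payload with this property would decode to the same bases whichever way the strand was read, which is why it would remove the risk of a backward read.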
Even with that time-consuming and unreliable process, the researchers admit that their proof of concept took some serious shortcuts that bordered on cheating. Rather than exploiting an existing vulnerability in fqzcomp, as real-world hackers would, they modified the program’s open-source code to insert their own flaw, which made the buffer overflow possible.
But aside from writing the DNA attack code to target their deliberately vulnerable version of fqzcomp, the researchers also reviewed commonly used DNA sequencing software and found three genuine buffer overflow vulnerabilities.
“A lot of this software wasn’t written with security in mind,” Ney says. That suggests a future hacker might be able to pull off the attack in a more realistic setting, particularly as more powerful gene sequencers begin analyzing larger chunks of data, which could better preserve an exploit’s code.
Needless to say, any practical DNA-based hacking is a long way off. Illumina, the leading maker of gene-sequencing equipment, said as much in a statement responding to the University of Washington paper. “We agree with the premise of the study that this does not pose an imminent threat and is not a typical cyber security capability,” writes Jason Callahan, the company’s chief information security officer. “We are vigilant and routinely evaluate the safeguards in place for our software and instruments. We welcome any studies that create a dialogue around a broad future framework and guidelines to ensure security and privacy in DNA synthesis, sequencing, and processing.”
But hacking aside, Seth Shipman, a member of a Harvard team that recently encoded a video in a DNA sample, says that using DNA to handle computer information is slowly becoming a reality. Because DNA can preserve its structure far longer than flash memory or the magnetic encoding on a hard drive, that storage method, while largely theoretical for now, could one day allow data to be kept for hundreds of years. And, he adds, if DNA-based computer storage is on the way, DNA-based computer attacks may not be so far-fetched.