Digital DNA: Hacking DNA Databases – An Interview with Peter Ney
The Digital DNA series will explore the role of large-scale genetic testing in science, industry and society. We aim to understand both the benefits and risks of this emerging technology and see what the future may hold.
In the first of our Digital DNA series, we talked to Dr. Peter Ney, a researcher in the Tadayoshi Kohno Group at the University of Washington. He describes how DNA database systems can be infiltrated and why companies should be tightening their security against cyber-attacks on genetic data.
FLG: What Got Your Research Group Interested in the Cyber-security of Genetic Database Systems?
PN: We have entered a new era where DNA and other biological molecules are used not only for their original biological properties, but also for information storage. There’s a lot of work being done on storing digital data inside DNA, and research groups are using molecular information in new and creative ways. That made us realise that DNA sequencing and other forms of DNA processing are basically a conversion between two different types of information. You start with information in the form of the DNA bases A, C, G and T, you run it through a DNA sequencer, and it outputs digital files. My background is in computer security, and we’ve seen historically that any time digital data comes into a computer system, there’s the possibility that the data could be compromised. We realised the same thing is happening with the DNA analysis pipeline.
FLG: How Did Your Group Successfully Compromise a Computer System using Synthetic DNA?
PN: As the physical DNA is ending up as digital data, we wanted to see whether it was possible to put computer code in DNA.
We wrote a simple piece of computer code that could compromise the system, converted it into a DNA sequence, and had that sequence made as synthetic DNA. You can easily order synthetic DNA with the sequence you want from many companies. When the DNA was run through the sequencing machine, it produced digital data that contained the malware code we wrote. This gave us the ability to complete a remote attack and access the computer. We were looking at this at a proof-of-concept level, to try to understand the different problems and challenges you might run into if you tried to do this in the real world.
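To make the conversion concrete, here is a minimal Python sketch of one way bytes could be written as DNA bases and read back. The two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; the interview does not say which encoding the group used, and the payload here is a harmless two-byte string.

```python
# Assumed mapping for illustration only: two bits per base.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base string (length a multiple of 4) back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


payload = b"hi"  # harmless stand-in for arbitrary bytes
strand = bytes_to_dna(payload)
print(strand)
print(dna_to_bytes(strand))
```

Any fixed mapping of this kind means that whatever software reads the sequencer's output is, in effect, reading bytes chosen by whoever designed the DNA.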
As the entire thing was so new, there were lots of hard questions to figure out. Because of the physical properties of DNA, you can’t order just any sequence you want, and translating computer code directly into DNA bases produces a lot of DNA that is difficult to synthesize. We spent a lot of time figuring out how to put computer code into DNA that would still function as a piece of malware.
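As a rough illustration of why this is hard, consider two properties that commonly make a sequence awkward to synthesize: a very skewed proportion of G and C bases, and long runs of the same base. If each pair of bits were mapped to one base (an assumption here, not something stated in the interview), a run of zero bytes in a program would become a long run of a single base. The sketch below screens a sequence for both properties; the thresholds and function names are hypothetical and do not reflect any particular vendor's limits.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = 0
    run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def looks_hard_to_synthesize(seq: str,
                             min_gc: float = 0.25,
                             max_gc: float = 0.65,
                             max_run: int = 6) -> bool:
    """Flag sequences outside the illustrative GC and run-length limits."""
    gc = gc_fraction(seq)
    return gc < min_gc or gc > max_gc or longest_homopolymer(seq) > max_run


print(looks_hard_to_synthesize("AAAAAAAAAAAA"))  # all one base, no G or C
print(looks_hard_to_synthesize("ACGTACGTACGT"))  # balanced, no long runs
```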
FLG: Are There Any Other Ways in which DNA Analysis Software is Vulnerable to Cyber-Attacks?
PN: Absolutely, yes! I would say the study we did was the hardest possible way to do this. In one sense our study was very future-looking, as we were trying to see what vulnerabilities exist as DNA processing gets more common. But there are lots of more standard ways in which programs on computers can become vulnerable and be compromised, for example if they are connected to the internet and process network data. And it’s not just the bioinformatics data; it’s the instruments themselves. DNA sequencers and other instruments are all computers. So you have this complicated pipeline with the possibility of malware anywhere along that path.
If people are downloading sequencing data files and processing them with their programs, then it’s possible for those files to contain malware, which doesn’t have to originate from physical DNA. Anywhere input data is being passed between programs, there’s the possibility for hackers to compromise the computers.
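One standard defence against this kind of problem is to treat sequencing files as untrusted input and check them before any further processing. The Python sketch below reads records in the common four-line FASTQ layout and rejects anything that is malformed, unexpectedly long, or contains unexpected characters. The length limit, the allowed alphabet and the function name are assumptions for illustration, and the sketch does not handle FASTQ files that wrap sequences over several lines.

```python
import io
from typing import Iterator, TextIO, Tuple

MAX_READ_LENGTH = 1000        # hypothetical upper bound on read length
ALLOWED_BASES = set("ACGTN")  # assumed alphabet for this sketch


def read_fastq_checked(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) and reject suspicious records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        header = header.rstrip("\n")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) > MAX_READ_LENGTH:
            raise ValueError("read is longer than the allowed maximum")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        if not set(seq) <= ALLOWED_BASES:
            raise ValueError("sequence contains unexpected characters")
        yield header[1:], seq, qual


sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
for name, seq, qual in read_fastq_checked(sample):
    print(name, seq, qual)
```

Checks like these do not make downstream tools safe on their own, but they narrow what an attacker-controlled file can feed into them.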
FLG: Can You Explain How Information Leakage Can Create Problems for Genetic Database Systems?
PN: Information leakage is an unintended side effect of processing data, where personal information is sometimes revealed to third parties.
We focused on how multiple DNA samples are sequenced together on the same sequencing machine at the same time. This can sometimes lead to DNA from different samples mixing, so that sensitive information could flow in either direction.
Not too long ago, a research group published a paper showing that you could use the sounds a DNA synthesis machine makes while it’s synthesizing DNA to tell what sequence is being made. If you had a microphone in the same room while the synthesis was happening, you could infer the sequence of the DNA, which could contain sensitive or proprietary information. That would be an example of “side channel access”, where you are using something unexpected like sound to infer something that you wouldn’t normally have access to.
FLG: What Could be the Impact on Patients if their Genetic Data is Intercepted?
PN: If any of the computers processing sensitive patient information are compromised or have vulnerabilities, there’s the risk that remote attackers could steal any data that’s been processed by that machine.
Other than theft, there is also the risk of data manipulation. This could result in data that says patients have a medical condition when they don’t, or vice versa.
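A general technique for detecting this kind of manipulation, not something described in the interview, is to store a keyed checksum alongside each data file and verify it before the file is used. The Python sketch below uses HMAC-SHA256 from the standard library; the function names and the way the key is handled are assumptions for illustration, and in practice the key would need to be kept somewhere an attacker cannot read it.

```python
import hashlib
import hmac


def sign_file_bytes(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the given file contents."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check the contents against a stored tag using a constant-time compare."""
    return hmac.compare_digest(sign_file_bytes(data, key), expected_tag)


key = b"example-key-kept-off-the-analysis-machine"  # placeholder key
original = b"chr1\t12345\tA\tG\n"                   # made-up variant record
tag = sign_file_bytes(original, key)

tampered = b"chr1\t12345\tA\tT\n"
print(is_unmodified(original, key, tag))
print(is_unmodified(tampered, key, tag))
```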
FLG: How Would Data Breaches Affect Scientists and Companies?
PN: Lots of companies doing drug development or other research have DNA and other valuable biological data. I would not be shocked if companies were targeted for their intellectual property. Corporate espionage could pose a problem. Data security is a practical concern that companies definitely should consider.
FLG: What Steps Should Scientists and Companies Take to Safeguard their Data?
PN: It depends on the specific ways each individual company is processing their data. The first step is being aware that these risks are possible so you can start thinking about security. The software doing the DNA analysis should be rigorously tested. Traditional computer security concerns, such as secure networks, strong passwords and up-to-date software, are also important. Usually, the simple things are where a lot of the attacks come from. Having good standard security practices and taking security seriously is the first step.
FLG: What Are You and the Research Group Working on Now?
PN: The research group and I are interested in understanding the role that computers are playing in biotechnology and what impact this has on the industry. We have seen in other technologies that when computers start being put into systems that formerly didn’t have many of them, new security problems come up.
For example, can a car be hacked? A few years ago, there weren’t many computer systems in cars, so the possibility wasn’t there. However, now automobile cyber security is a well-known problem.
The same thing is now happening in biotechnology, and we are interested in studying whether this creates new cyber security risks. We have also started looking into the security practices of the consumer genetic testing industry.
Our Digital DNA series will continue soon with more interviews and articles exploring the benefits, risks and potential of genetic testing. If you or your company would like to contribute to the Digital DNA series, please email email@example.com