We talked to Laxmi Parida, Fellow and Computational Genomics Research Lead at IBM, to discuss her recent ground-breaking research paper on an AI algorithm that can distinguish blood cancer subtypes from dark matter DNA.

Personalised Medicine for Cancer Patients

The future of personalised medicine relies on quick and non-invasive diagnostics. Simple blood tests for tumour DNA are transforming cancer care, making invasive surgical tissue sampling increasingly unnecessary.

‘Lots of tumour tissues shed their cells into the blood, so their cell DNA can also pass into the blood. Tumour DNA can therefore be isolated and collected from the blood and used in diagnostics.’

Although tumour DNA analysis can be used to diagnose which type of cancer a patient has, it is still difficult to establish subtypes within a disease. Establishing a patient’s disease subtype allows treatment to be tailored to their disease profile. For example, if a patient’s tumour is found to be expressing a certain receptor, they can be assigned a drug that targets it.

Probing the Dark DNA

Current tumour DNA analysis has mostly focused on the 2% of the genome that contains the genes that code for every protein in the body. However, patients with the same disease type but different subtypes cannot be distinguished neatly by looking at the coding genome alone. Laxmi and her colleagues wondered if the key to diagnosing disease subtypes could be found in Dark DNA.

‘Dark DNA is the part of the DNA that we – the scientific community – know nothing about. Dark DNA are not genes, they are not transcription factors and they are not binding sites; they are nothing that we know about. There are no descriptions about this type of DNA across the entire research community.’

The possible role of Dark DNA in cancer has largely been ignored. The genetics that drive cancer were presumed to lie only in the coding regions of the genome, also known as the exome. However, recent research has countered this view, suggesting that mutations in both the coding and non-coding regions can influence cancer progression.

Artificial Intelligence as a Diagnostic Tool

Laxmi and her team wanted to use Artificial Intelligence to probe Dark DNA and investigate if it has the potential to distinguish between blood cancer subtypes.

‘And then we ask the question – we take this DNA that is largely considered useless because no one knows anything, knows how to probe them, or is trying to spend time understanding it. And we want to see if using AI as a mathematical tool can classify the subtle blood cancer types that we are dealing with here.’

In the paper, the researchers wanted to apply the AI algorithm only to the Dark DNA. Laxmi describes how they determined which parts of the genome to classify as Dark DNA.

‘In this paper we were extremely conservative about what part of the genome we classified as dark DNA. We scoured the literature and databases and if anything at all was known about a section of DNA in any papers we classified that DNA section as non-dark. We therefore retained only the parts of the genome that we have absolutely no knowledge about; and that we call the Dark DNA.’
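The filtering Laxmi describes can be pictured as interval subtraction: remove every annotated stretch from a chromosome and keep what is left. The sketch below is purely illustrative and is not taken from the paper; the function name `dark_regions`, the coordinates and the chromosome length are all hypothetical.

```python
def dark_regions(chrom_length, annotated):
    """Return half-open (start, end) intervals not covered by any annotation.

    `annotated` is a list of (start, end) intervals that are "known" in
    some database or paper. Hypothetical toy input, not real genome data.
    """
    dark = []
    cursor = 0  # first position not yet covered by an annotation
    for start, end in sorted(annotated):
        if start > cursor:
            # Gap between the previous annotation and this one is "dark".
            dark.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        dark.append((cursor, chrom_length))
    return dark


# Toy example: three annotated stretches, two of which overlap.
annotated = [(100, 250), (200, 400), (700, 800)]
dark = dark_regions(1000, annotated)
print(dark)
```

Being conservative, as in the quote, corresponds to growing the `annotated` list: the more sources that are consulted, the smaller the remaining dark set.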

Laxmi and her team designed an algorithm, known as ReVeaL, that uses machine learning to distinguish blood cancer subtypes using only the Dark DNA of the genome. The ReVeaL algorithm works by capturing the mutational load distributions for subsets of the test patient’s genome. It then compares these genomic features to those of genomes with known blood cancer subtypes to assign the patient’s disease subtype.
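To make the idea of a mutational load distribution concrete, here is a minimal sketch under simplifying assumptions. It is not the ReVeaL implementation: the fixed window size, the cap on the load, the nearest-centroid comparison and all function names and toy data are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist


def window_loads(positions, region_start, region_end, window):
    """Count mutations falling in each fixed-size window of a region."""
    n_windows = (region_end - region_start + window - 1) // window
    loads = [0] * n_windows
    for pos in positions:
        if region_start <= pos < region_end:
            loads[(pos - region_start) // window] += 1
    return loads


def load_distribution(loads, max_load=3):
    """Fraction of windows carrying 0, 1, ..., max_load-or-more mutations."""
    counts = Counter(min(load, max_load) for load in loads)
    return [counts[k] / len(loads) for k in range(max_load + 1)]


def features(positions):
    # Hypothetical region of 1000 bases split into 100-base windows.
    return load_distribution(window_loads(positions, 0, 1000, 100))


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(test_features, labelled):
    """Assign the subtype whose mean feature vector is closest."""
    centroids = {subtype: centroid(vs) for subtype, vs in labelled.items()}
    return min(centroids, key=lambda s: dist(test_features, centroids[s]))


# Toy training data: subtype "A" has sparse mutations, "B" has clusters.
labelled = {
    "A": [features([50, 450]), features([150])],
    "B": [features([10, 20, 30, 40, 510, 520, 530]),
          features([110, 120, 130, 910, 920, 930, 940])],
}
test_patient = features([210, 220, 230, 700, 710, 720, 730])
predicted = classify(test_patient, labelled)
print(predicted)
```

Note that the feature vector records how mutations are distributed across windows, not where they sit, so two patients with mutations at different positions can still look alike.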

Laxmi described the challenges of developing the algorithm:

‘One of the main challenges of developing the algorithm is that the disease is very heterogenous. What that means is that even when you take multiple patients with the same disease type, or the same phenotype, their DNA does not look the same. But there must be some commonality because the overall phenotype of the disease is the same. So, we had to take this heterogeneity of patients into account when training the AI.’

Dark DNA Shouldn’t be Ignored

‘Using the AI algorithm only on the dark DNA we were able to classify these subtypes of haematological cancer with an accuracy of 75%. The conclusion we get from this result is that dark DNA is probably useful and we can use it to classify different cancer types.’

The information found in the Dark DNA may therefore play a role in cancer development and determining disease subtype. Laxmi now wants to use the algorithm to determine if Dark DNA plays a role in other cancers, but she will require more data to investigate this.

‘Data is always a bottleneck – so we need full genome data from other cancer types to test the algorithm on them.’

She also wants to collaborate with biologists to ‘look at the specific dark region segments in the DNA that we think are involved in the disease.’
