Computer scientists at Carnegie Mellon University in Pittsburgh have developed a method that transforms massive amounts of gene expression data into something more image-like. In work published in the Proceedings of the National Academy of Sciences, they applied deep learning, a powerful approach that has revolutionised applications such as facial recognition in recent years.

The approach makes it easier to identify disease-related genes and the developmental or genetic pathways that could be potential drug targets.

Convolutional neural networks (CNNs) can be a tricky topic to understand if you’re not a computer scientist. They were originally inspired by biological processes, specifically the connectivity pattern of neurons in the visual cortex. A network consists of an input layer and an output layer, with multiple “hidden” layers in between. The hidden layers are where the magic happens: each one transforms the data it receives and passes the result on, and the output layer delivers the answer we see.
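For readers who find code easier to follow than prose, the input, hidden, and output layers described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' network: the function names `conv2d` and `forward`, the 8 x 8 input, the single 3 x 3 filter, and the random weights are all assumptions made for the example, and a real CNN would learn its filters and weights from training data.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over the image and record its response at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def forward(image, kernel, weights, bias):
    """Input layer -> one hidden convolutional layer -> single output value."""
    # Hidden layer: convolution followed by ReLU (negative responses set to zero).
    hidden = np.maximum(conv2d(image, kernel), 0.0)
    # Output layer: weighted sum of the hidden responses, squashed into (0, 1).
    score = hidden.ravel() @ weights + bias
    return 1.0 / (1.0 + np.exp(-score))

rng = np.random.default_rng(0)
toy_image = rng.random((8, 8))          # stand-in for an image-like input
kernel = rng.normal(size=(3, 3))        # untrained filter (assumption)
weights = rng.normal(size=36) * 0.1     # 6 x 6 hidden responses -> 36 weights
prob = forward(toy_image, kernel, weights, 0.0)
print(prob)
```

The output is a single number between 0 and 1, which a trained network could interpret as, for example, the probability that the input shows an interacting gene pair.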

In genomics, CNNs can be used to work out which genes interact with each other, once the expression data has been converted into a form that can be visualised and is easier to interpret. The researchers developed a method called convolutional neural network for co-expression (CNNC) to analyse these gene-gene relationships.

The researchers report that CNNC outperforms prior methods. Roughly 20,000 genes in humans work in concert, so understanding how they work together is important for understanding how disease develops. If two genes are active at the same time, it could be a clue that they are interacting or that both are activated by another gene, or it could be a coincidence. CNNC can also be applied to causality inference, functional assignment, and disease gene prediction.

Yuan and Bar-Joseph used single-cell expression data, which comes from experiments that measure the expression of every gene in a single cell. The results of hundreds of thousands of such single-cell analyses were arranged in a matrix, or histogram, for each pair of genes, so that each entry of the matrix represented a different level of co-expression for that pair. This made the data more image-like. With known gene-gene interaction data, the scientists could train the CNNs to recognise which genes were interacting based on the visual patterns in the data matrix.
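The step of turning a gene pair into an image can be sketched as a two-dimensional histogram. The example below is a minimal illustration under stated assumptions, not the authors' code: the function name `coexpression_histogram`, the 32 x 32 grid, the log transform, and the synthetic Poisson counts are all choices made here for demonstration.

```python
import numpy as np

def coexpression_histogram(expr_a, expr_b, bins=32):
    """Turn two genes' per-cell expression values into a bins x bins image."""
    # Log-transform to compress the long tail of expression counts (assumed preprocessing).
    log_a = np.log10(np.asarray(expr_a, dtype=float) + 1.0)
    log_b = np.log10(np.asarray(expr_b, dtype=float) + 1.0)
    # Count how many cells fall into each (level of gene A, level of gene B) bucket.
    counts, _, _ = np.histogram2d(log_a, log_b, bins=bins)
    # Normalise so the entries sum to 1, like pixel intensities in an image.
    return counts / counts.sum()

# Synthetic data: 5,000 cells, with gene B's expression tied to gene A's.
rng = np.random.default_rng(0)
n_cells = 5000
gene_a = rng.poisson(5.0, size=n_cells)
gene_b = gene_a + rng.poisson(1.0, size=n_cells)

pair_image = coexpression_histogram(gene_a, gene_b)
print(pair_image.shape)
```

Each gene pair yields one such matrix. Pairs with a known interaction status would then serve as labelled training examples for the network.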

The tool suits genetics well: our genomes hold a huge amount of data that could now become easier to visualise and understand, which in turn could help identify potential drug targets for managing disease.