Festival of Genomics 2019 – The Move Towards Multi-Omics
As a taste of what to expect at our Festival this year, we thought we’d showcase some of our best and brightest speakers and the fascinating topics they’ll be speaking on. Stay tuned for even more.
Still in its infancy, the study of multi-omics is a rapidly growing area of research for both healthcare and pharma companies. With both our own festival and the wider life sciences sector moving rapidly towards a more integrative and holistic treatment of different -omics in research and drug development, we thought it’d be good to find out more. We got Dr Dennis Wang, Lecturer in Bioinformatics and Genomics Medicine at the NIHR Sheffield Biomedical Research Centre, to talk us through the shift towards multi-omics.
What is the Study of Multi-Omics? What are the Applications of This at a Practical Level?
Multi-omics is the integration of multiple levels of “-omics” data, encompassing any kind of high-throughput molecular data, for instance genomics, proteomics or transcriptomics. The main question in multi-omics analysis right now is how we integrate these different levels of data to describe complex biological systems, clinical phenotypes and patient outcomes, and how we can utilise this multi-level information to design personalised medicines. In clinical diagnostics, where particular mutations or genotypes affecting a patient are identified, we can use multi-omic techniques to confirm that the genetic change has some functional impact on, say, the RNA or protein levels.
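As a minimal sketch of the validation step described above, one common approach is to compare the expression of a gene between mutation carriers and wild-type samples. The data below is entirely made up for illustration; real analyses would use normalised expression matrices from a full cohort.

```python
# Illustrative sketch (hypothetical data): checking whether a genetic change
# has a functional impact by comparing RNA expression between mutation
# carriers and non-carriers with a two-sample t-test.
from scipy import stats

# Hypothetical normalised expression values for one gene
carriers = [8.1, 7.9, 8.4, 8.6, 8.2]       # samples with the mutation
non_carriers = [6.2, 6.5, 6.1, 6.8, 6.4]   # wild-type samples

t_stat, p_value = stats.ttest_ind(carriers, non_carriers)
if p_value < 0.05:
    print(f"Expression differs by genotype (p = {p_value:.2g})")
```

In practice such tests are run genome-wide with multiple-testing correction, but the idea is the same: the -omics layers cross-validate each other.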
Why is the Life Sciences Sector (and Our Festival) Moving Towards Multi-Omics as a Standard?
When you profile biology, you cannot profile one aspect of a biological specimen by itself. The genome does not exist in isolation within an organism, but instead exists in a soup with other “-omes”, which can also be precisely measured (e.g. mRNA expression). Out of necessity, if you as a business or research organisation want to understand how genetics influence clinical phenotypes, you have to consider all the changes in RNA, proteins, epigenetic factors, all the way through to pathways and the connectivity of the whole hierarchical system which determines the clinical phenotype.
Simply put, genetics is not the only factor that determines the patient’s phenotype.
How Far has Integration of Multi-Omics Progressed So Far Within Life Sciences Companies?
Currently, we are only at the start of the path towards full integration. Most recent studies have linked genetics straight to disease phenotypes, without really incorporating disease-related changes in other molecular features, e.g. epigenomics or transcriptomics. Some of that is due to the technical challenges of acquiring the right samples, and some to inconsistencies or disagreements between measurements across these -omics datasets.
As we have seen through our cancer genomics research at the University of Sheffield, changes in DNA copy number of cancer cells do not directly correlate with changes in the RNA or protein expression in the same cells for the same genes. So that kind of fundamental understanding of how genetics is connected to RNA, to proteins in the central dogma and to aggregate biological systems, such as tissues, still needs to be investigated before we can truly harness and integrate these different -omics datasets.
Are There Different Approaches to Integrating Multi-Omics in Healthcare and Pharma? Are Any of These Approaches More Effective Than Others?
Currently there’s no standard best practice for multi-omics integration, but a few different approaches are being developed. One type of approach involves statistical modelling, where integrating genomic and transcriptomic data generally means quantifying statistical relationships between RNA expression changes and genetic mutations. Using this approach often requires making quite strong assumptions about the distributions of the molecular data you are working with, and about how the datasets statistically relate to each other.
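The statistical modelling described here can be sketched as a simple regression of RNA expression on genomic features. Everything below is simulated for illustration (the feature names and effect sizes are assumptions, not from the interview), but it shows the basic shape of quantifying a statistical relationship between -omics layers.

```python
# Illustrative sketch (simulated data): quantifying how genomic features
# relate to RNA expression with ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 100
mutation = rng.integers(0, 2, n)          # genotype: 0 = wild-type, 1 = mutated
copy_number = rng.normal(2, 0.5, n)       # hypothetical DNA copy number
# Simulated expression: depends on both genomic features plus noise
expression = 1.5 * mutation + 0.8 * copy_number + rng.normal(0, 0.3, n)

# Design matrix: intercept plus the two genomic features
X = np.column_stack([np.ones(n), mutation, copy_number])
coef, *_ = np.linalg.lstsq(X, expression, rcond=None)
print(coef)  # estimated intercept and the two genomic effect sizes
```

Note the strong distributional assumptions baked in here (linearity, Gaussian noise); this is exactly the kind of assumption Dr Wang highlights as a limitation of the statistical approach.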
Alternatively, studies have looked at this -omics data in a more mechanistic way. This begins by looking at known interactions between genes, which can be based on co-expression of RNA, protein-protein interactions or pathways, but generally these come out of the literature. At a general level, this approach takes prior knowledge about how genes interact with each other and maps genetic/proteomic/transcriptomic changes onto that network at the pathway level, i.e. how these different -omic changes or aberrations can affect groups of genes. From there, it is usually easier to define a mechanistic relationship to a phenotype.
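A common way to ask whether a set of aberrant genes converges on a known pathway is an over-representation test. The counts below are hypothetical, but the hypergeometric test itself is a standard building block of pathway-level integration.

```python
# Illustrative sketch (hypothetical counts): testing whether multi-omic
# aberrations are enriched in a curated pathway gene set.
from scipy.stats import hypergeom

background = 20000      # total genes measured across the -omics platforms
pathway_genes = 150     # genes annotated to the pathway of interest
aberrant_genes = 300    # genes altered in any -omics layer in our samples
overlap = 12            # aberrant genes that fall inside the pathway

# Probability of seeing >= `overlap` pathway hits by chance alone
p = hypergeom.sf(overlap - 1, background, pathway_genes, aberrant_genes)
print(f"enrichment p-value = {p:.3g}")
```

By chance alone we would expect roughly 300 × 150 / 20000 ≈ 2 overlapping genes, so twelve hits points to pathway-level convergence worth following up mechanistically.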
Those are the two main approaches, but it’s important to add that within each there are many different software and algorithmic options available. Likewise, with the pathway approach there is no one standard set of pathways to use; one must choose the most relevant from the various pathway datasets available to the public.
Is There Any Reason to Use One of These Approaches More Than the Other?
It depends on what the approach is being used for. The statistical approach is mainly used in diagnostics, or in academic studies geared towards discovering genetic linkages to diseases, because these are specifically looking for biomarkers. A biomarker is usually underpinned by a statistical model or correlation metric, so it does not necessarily have to be mechanistic, and it does not have to be explained at the pathway level.
The pathway approach, on the other hand, is used more in drug discovery or drug development. This is particularly true at early stages of the process, because when a company has a genetic linkage (usually to a drug target) they must provide mechanistic evidence for that linkage. They need to not only show that there’s a genetic association between a genotype and a phenotype, but also show mechanistically how that genetic variant might affect downstream RNA and the pathways that drive a certain disease phenotype. So I would say the pathway method is used slightly more in industry.
What Are the Main Problems Slowing or Preventing Full Multi-Omic Integration in Industry?
One of the biggest problems for a pharma company using a pathway approach is working out which pathways and which prior knowledge are best to use. Technology companies have developed pathway databases for people to search through and map multi-omic data onto. But what is the source of the data the pathways are based on? Even when it comes from a published source, it may not be relevant to the particular disease or biological context of interest. Further compounding this problem, a lot of the methods are based on non-disease cases, or on yeast and other model organisms, which are largely not relevant. In drug discovery, it is very important that the underlying data you use to integrate these different -omic datasets is disease-relevant and context-appropriate. That’s quite difficult to ensure.
How Has the Integration of Multi-Omics Affected Pharma and Healthcare Companies and the Work That They Do? What Steps Should Companies Take to Best Integrate Multi-Omics?
The integration of multi-omics has first and foremost impacted company personnel: the educational background and disciplines of research staff have become more varied, because now a genetic diagnostics company may not only need DNA sequencing specialists but also those with experience in RNA, protein, epigenetics, and other specialist areas.
Another impact is the volume of data. Companies are no longer dealing with just one -omic dataset (-omics data here being any kind of molecular data generated from high-throughput technology). Genomes are certainly large, but proteomics and imaging data can also be very large in terms of file sizes. So a lot of industry now faces a big data issue: how do you manage and store all these large data files for every patient or sample collected?
With that, there’s definitely an informatics challenge, and there could be new industries developed to deal with some of these issues. Right now we’re seeing more and more cloud providers who are offering multi-omics solutions to enable companies and researchers to analyse large datasets without having to buy additional computers – one can request computing resources on demand.
The final aspect of this is that multi-omics integration has driven the pharma industry to become more data-driven in its business practices and to build more data science teams, utilising machine learning and AI methods to interpret the multiple types of data and identify the connectivity between them. Because of this, I expect to see more innovation in computational methods that allow further multi-omics integration, and in turn new software tools and a growing demand for data scientists to apply them.
What are the three things a company must do to integrate well? First, build expert knowledge in each of those -omic datasets; second, build data infrastructure to store the different datasets; and third, make use of the powerful software and machine learning tools available to connect the datasets together in a meaningful way.
What Work are You Yourself Currently Undertaking in Multi-Omics?
One example of the work we’re doing in the field of pharmacogenomics in relation to immunotherapies is using multi-omic data from cell lines to train machine learning methods to predict drug response in patients. The challenge of integrating multi-omic data between preclinical models and patients is important for translating new research findings to the clinic.
We believe that many of the drug responses or phenotypes observed in preclinical models like cell lines or animal models, but not reproduced in patients, arise because the multi-omic profiles of preclinical models and patients differ. So we are using machine learning methods to understand the differences between preclinical models and patients, such as mutation burden and neoantigen load, to build better algorithms for predicting the success of drugs in the clinic.
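The prediction task described above can be sketched very roughly as a classifier trained on multi-omic features such as mutation burden and neoantigen load. This is not Dr Wang's actual pipeline; the data is simulated and the model (a small logistic regression fitted by gradient descent) is just one plausible choice for a binary responder/non-responder label.

```python
# Illustrative sketch (simulated data): predicting drug response from
# hypothetical multi-omic features with logistic regression.
import numpy as np

rng = np.random.default_rng(1)
n = 200
mutation_burden = rng.normal(0, 1, n)     # standardised, hypothetical
neoantigen_load = rng.normal(0, 1, n)     # standardised, hypothetical
X = np.column_stack([np.ones(n), mutation_burden, neoantigen_load])

# Simulated labels: higher burden/load -> more likely to respond
logits = 2.0 * mutation_burden + 1.5 * neoantigen_load
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

# Fit logistic regression by plain gradient descent
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))          # predicted response probability
    w -= 0.1 * X.T @ (p - y) / n          # gradient step on log-loss

accuracy = np.mean((1 / (1 + np.exp(-X @ w)) > 0.5) == y)
print(f"training accuracy = {accuracy:.2f}")
```

The real difficulty, as the interview notes, is not fitting such a model on cell lines but making its predictions transfer to patients whose multi-omic profiles differ systematically from the training data.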
Dr Wang will be giving a talk on “Addressing the Combinatorial Nightmare: Getting Answers on Large-Scale Multi-Omics Data Integration and Analysis” at 1pm on the 23rd of January, in theatre 1.