Once thought to be relevant only to rare disease and cancer, genomics is now thought to relate to virtually every disease humans face. According to the World Health Organization’s Genomic Resource Centre, “Most diseases involve many genes in complex interactions, in addition to environmental influences.” Whether through somatic mutations that drive individual cancers or germline genetics that reveal how to more effectively treat or prevent diseases, there is no doubt that improved understanding of human genetics has the potential to benefit every area of medicine.

Recent research has shown that therapies whose targets have human genetic support, rather than support only from mouse and other animal models or in vitro studies in petri dishes, are twice as likely to gain FDA approval. In 2018, more than a dozen approved drugs had genetic support for their targets.

Genetic research, however, is a big data problem, and researchers have generally lacked access to the “big” genomic and medical data it requires. Not only has it been nearly impossible to obtain enough relevant data to support research, but analysing the data has been slow and challenging because of its sheer volume. That is now changing with the recent proliferation of population-scale genomics projects and the advent of new technologies to analyse data.

In recent years, a number of pharmaceutical companies have announced large-scale collaborations around genomic data to uncover novel drug targets, validate existing drug pipelines, predict response, and expand therapeutic use.

The narrative began in 2012, when Amgen purchased Iceland-based deCODE genetics for $415 million, gaining early access to genetic and drug target discoveries. From 1996 until that time, most of the 360,000 people in Iceland had given informed consent for deCODE to sequence their whole genomes and to access their medical records, with personal identifiers encrypted. More projects followed suit, with deals struck between drug developers and population-scale genomics projects (see table below).


| Companies Involved | Population Genomics Project | Details |
|---|---|---|
| Amgen | deCODE genetics | $415 million; 300,000 whole genomes; All diseases in population |
| | | $60 million; DNA chip data focused on common markers, Sequenced patients with Parkinson’s Disease |
| | | DNA chip data with common markers, Longevity Genetics |
| | Human Longevity | Exome sequencing of AZ clinical trial cohorts, Multiple Indications |
| AbbVie | Genomics Medicine Ireland (GMI); WuXi NextCODE | 45,000 whole genomes covering seven indications in neurology, oncology and immunology |
| Regeneron; GSK | UK Biobank | 50,000 exomes with self-report and general ICD codes for diseases and diagnostic tests. |
| AbbVie, AstraZeneca, Biogen, Celgene, Genentech, GSK, MSD, Pfizer, Sanofi | | 10,000 whole genomes and 500,000 DNA chip genotyped; Multiple Indications |
| | | $300 million; DNA chip data, Sequenced patients with common diseases |
| | Flatiron Health | $1.9 billion, focused on oncology outcomes based on medical data only |
| | Foundation Medicine | $2.4 billion, >100,000 tumors with panel sequencing of 400 known tumor genes. |
| Regeneron, AbbVie, Alnylam, AstraZeneca, Biogen, and Pfizer | UK Biobank | 450,000 exomes to be done in partnership with Regeneron with self-report and general ICD codes for diseases and diagnostic tests. |
| | Genomics Medicine Ireland (GMI) | $400 million to whole genome sequence 400,000 Irish who consent to provide detailed medical data covering 60 diseases over next five years. |

These large investments in population-scale genomics projects reveal an appetite among big pharma to better understand human disease biology and its genetic drivers, mitigating the failures that have plagued the market in recent years due to a lack of biological understanding of investigational therapies’ targets.

Early-stage biotech startups are also getting in the game, looking at genomic big data before focusing on specific therapeutic approaches or indications. Earlier this month Maze Therapeutics announced its $191 million series A launch, focusing on “mining human genetic databases for genetic modifiers that reduce the impact of disease-causing mutations.” More details were provided in Nature Biotechnology. The article also mentioned nine related population-scale human genomics projects involved in drug discovery collaborations, including UK Biobank, FinnGen, the NIH’s All of Us, and WuXi NextCODE’s subsidiary, Genomics Medicine Ireland (GMI).

Data from human genomic analysis projects have led to a number of advances in medicine. Statins found their origin in human genomic data, as did two cholesterol blockbusters targeting the PCSK9 pathway for more powerful treatment of high LDL cholesterol in patients who do not respond to statins. The pain target Nav1.7, now the focus of development programs for non-opioid pain drugs, was discovered through gene mutations found in rare families with congenital pain insensitivity. Regeneron used a large cardiovascular disease population at Geisinger Healthcare System to discover another LDL cholesterol drug target, ANGPTL3. Amgen, which continues to announce collaborations and programs around the deCODE whole genome sequence and medical data, discussed one such target for remnant cholesterol, ASGR1. The ASGR1 discovery could not have been made without the whole-genome sequencing data in deCODE’s database: exome sequencing would have missed this gene. Icelandic carriers of loss-of-function mutations had much lower levels of remnant cholesterol and a 34% lower likelihood of developing heart disease than the rest of the population, with no associated adverse health effects. These findings were reproduced in data from 300,000 people from the Netherlands, Denmark, Germany, New Zealand, the UK, and the USA.

The deCODE spin-off WuXi NextCODE, with its new acquisition of Genomics Medicine Ireland, announced a $400 million investment from global investors including the Irish sovereign fund. WuXi NextCODE announced that GMI would whole genome sequence and collect detailed medical data on 400,000 Irish volunteers covering 60 major diseases, partnering with life sciences companies to analyse the data for insights. According to the company's website, the collaborations will focus on understanding the underlying biology of diseases and identifying driver genes and druggable targets within the disease pathways. AbbVie’s ongoing collaboration to whole genome sequence 45,000 Irish volunteers across seven diseases is one example.

“Genomics is transforming the way we understand some of the world’s most devastating diseases and enabling the discovery of new approaches that have the potential to deliver much greater benefit to patients,” said Jim Sullivan, Ph.D., who was at the time vice president, pharmaceutical discovery, AbbVie. “This alliance is an important part of our research strategy and complements our significant footprint here in Ireland.”

We spoke with WuXi NextCODE’s CSO, Jeff Gulcher MD, PhD, about one of these projects to help us understand how the data can benefit research and drug R&D. Dr Gulcher explained that meaningful discovery requires data relating to the entire disease pathway, including genomics, epigenetics, disease classifications and phenotypes, and often blood or tissue specimens for single-cell and deeper pathway analysis. In nonalcoholic steatohepatitis (NASH), WuXi NextCODE is building multiple patient cohorts across multiple countries, including one with several thousand NASH patients to compare high-fibrosis NASH with fatty liver disease that does not usually lead to liver complications due to fibrosis. These global studies will also include thousands of liver specimens for analysis. Each patient has also consented to share phenotypic data that is vital to understanding the disease. Dr Gulcher explained that all this data, unavailable through any other current or planned population project, is necessary for researchers to understand the currently unknown genetic and biological drivers of progression from nonalcoholic fatty liver (NAFL) to NASH. Analysing this data set should reveal druggable targets that are unique to the disease being studied. This comprehensive human genomics approach may be needed for successful drug development in NASH, an area of medicine that has suffered from recent clinical trial failures.

As population-scale programs continue to crop up, we expect to hear more about large deals leading to genomically validated precision medicine pipelines within pharma, as well as a greater degree of partnership between industry, healthcare and the patients they all serve.

One thing is for certain: “big” genomic data is here to stay, laying a foundation for better understanding the genetic and biological drivers of human disease and, hopefully, better precision therapies that target those disease drivers.