The UK Biobank’s 2019 scientific conference concluded this month, showcasing fantastic speakers on topics ranging from use of the UK Biobank, to using genomics to understand individual risk, to strategies for health outcome phenotyping. Among them was Professor Josh Denny, Department of Biomedical Informatics at Vanderbilt University. FLG spoke to him about his talk on phenome-wide association studies (PheWAS): a research technique used to understand what disease associations can be made with a given gene.  

FLG: How Does PheWAS Work? Is it Purely Algorithmic, or are Other Methods Used?

JD: It’s purely algorithmic. A phenome-wide association study, or PheWAS, is basically the phenotype-side analogue of the genome-wide association study (GWAS), which took off once we started getting large-scale genomic information. The reason no-one had done the equivalent phenome-wide approach was that they didn’t yet have a richly phenotyped set of individuals.

PheWAS began in electronic health records (EHRs). Using these, we found lots of phenotypes that had nothing to do with the disease we were originally looking for. Every other cohort study starts with a set of questions that is biased by what a researcher is interested in. In the EHR, by contrast, you get a swathe of diseases and observations driven by the reasons a patient comes to a physician.

PheWAS gives a much broader range of observations. People have called it a “reverse genetics” approach, in that you’re starting with a genotype or some similar observation and asking what it means in the phenome. And there have been many cases in which people have serendipitously discovered something, like a genetic variant that relates to more than one disease. PheWAS lets you regularise or systematise the investigation of the phenomic impact of a variant.
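As an editorial illustration of the "start with a genotype, then scan the phenome" idea described above, here is a minimal Python sketch. It is not Professor Denny's implementation: it swaps in Fisher's exact test on 2x2 tables where real analyses typically use covariate-adjusted regression, and the names `phewas`, `carrier` and the `pheno_*` labels, along with all counts, are invented toy assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers who have the phenotype.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)

def phewas(carrier, phenotypes, alpha=0.05):
    """Test one genotype against every phenotype; return rows sorted by p-value."""
    threshold = alpha / len(phenotypes)  # Bonferroni correction for multiple tests
    rows = []
    for name, has_pheno in phenotypes.items():
        pairs = list(zip(carrier, has_pheno))
        a = sum(1 for g, p in pairs if g and p)          # carrier, has phenotype
        b = sum(1 for g, p in pairs if g and not p)      # carrier, no phenotype
        c = sum(1 for g, p in pairs if not g and p)      # non-carrier, has phenotype
        d = sum(1 for g, p in pairs if not g and not p)  # non-carrier, no phenotype
        # Adding 0.5 to each cell (Haldane-Anscombe) avoids division by zero.
        odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
        p_value = fisher_exact_two_sided(a, b, c, d)
        rows.append({"phenotype": name, "odds_ratio": odds_ratio,
                     "p": p_value, "significant": p_value < threshold})
    return sorted(rows, key=lambda r: r["p"])

# Toy data, invented for illustration: 10 carriers followed by 10 non-carriers.
carrier = [1] * 10 + [0] * 10
phenotypes = {
    "pheno_risk":       [1] * 9 + [0] + [1] + [0] * 9,  # enriched in carriers
    "pheno_protective": [1] + [0] * 9 + [1] * 9 + [0],  # depleted in carriers
    "pheno_null":       ([1] * 5 + [0] * 5) * 2,        # no difference
}

results = phewas(carrier, phenotypes)
for row in results:
    print(row)
```

An odds ratio above 1 suggests the phenotype is more common in carriers, and below 1 less common. The Bonferroni threshold is one simple way to account for testing many phenotypes at once.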

FLG: Tell Us About Your Biobank 2019 Talk. What Were the Main Takeaways?

JD: At Biobank 2019 I talked about the PheWAS method, and how it works. I spoke about one of our early PheWAS studies, where we had a very carefully designed approach to identify a phenotype, autoimmune hypothyroidism. We did a GWAS on it, and then we did a PheWAS on that genetic variant. We found hypothyroidism in the PheWAS just as we did in the GWAS, but we also identified some subtypes of hypothyroidism, and we identified atrial flutter. People who have hyperthyroidism get atrial flutter; people who have hypothyroidism are protected against it. That’s one thing that came out of the PheWAS that you couldn’t see in the GWAS.

After that I spoke about a large-scale validation we did of PheWAS, against previously-known genotype/phenotype association studies. We replicated about two-thirds of those that were adequately powered.

After that we talked about a big PheWAS study of HLA markers, which drive the immune response. In that study we found a couple of hundred associations, managing to replicate around 40 years of biology in a single study. We also found a few new associations.

Importantly, in that study we had a heat map between HLA types and disease associations, colour-coded red or blue based on whether there is a protective or risk factor. With PheWAS, we can see that one HLA type could put you at risk for, say, both rheumatoid arthritis and type 1 diabetes. A similar HLA type, however, may put you at risk for rheumatoid arthritis and be protective against type 1 diabetes.
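To make the risk-versus-protective layout of such a heat map concrete, the sketch below classifies each HLA type and disease pair by the sign of its log odds ratio. This is an editorial illustration only: the labels `HLA_type_1` and `HLA_type_2`, the `tolerance` parameter and every odds ratio are invented and are not results from the study.

```python
from math import log

# Hypothetical odds ratios, invented for illustration (not study results).
odds_ratios = {
    "HLA_type_1": {"rheumatoid arthritis": 2.0, "type 1 diabetes": 3.0},
    "HLA_type_2": {"rheumatoid arthritis": 1.8, "type 1 diabetes": 0.5},
}

def direction(odds_ratio, tolerance=0.05):
    """Classify an association by the sign of its log odds ratio."""
    log_or = log(odds_ratio)
    if log_or > tolerance:
        return "risk"
    if log_or < -tolerance:
        return "protective"
    return "neutral"

def direction_matrix(table):
    """Turn a nested dict of odds ratios into a nested dict of directions."""
    return {hla: {disease: direction(value) for disease, value in row.items()}
            for hla, row in table.items()}

matrix = direction_matrix(odds_ratios)
for hla, row in matrix.items():
    print(hla, row)
```

In a plotted version, each cell's colour would follow the direction and its intensity the size of the log odds ratio.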

If you undertake 50 different GWAS, they won’t all be in the same population, so you can’t properly analyse them together or say whether a given HLA type puts a person at risk for two co-occurring diseases. But because with PheWAS we can analyse both at the same time, we can actually say whether they’re independent features or not. It could be having disease X, not just an individual’s genetics, that causes risk for disease Y.
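One way to picture the independence question is to compare a crude odds ratio with one adjusted for disease X. The sketch below uses a Mantel-Haenszel pooled odds ratio across strata of disease X as a simple stand-in for regression with X as a covariate. It is an editorial illustration, not the method used in the study, and all counts are invented so that the genotype's apparent effect on disease Y is explained entirely by disease X.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Invented counts per stratum of disease X:
# (carriers with Y, carriers without Y, non-carriers with Y, non-carriers without Y)
strata = {
    "has_disease_X": (30, 10, 15, 5),
    "no_disease_X": (2, 8, 10, 40),
}

# Ignoring disease X: add the strata together cell by cell.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))
crude = odds_ratio(*pooled)

# Accounting for disease X: combine the within-stratum comparisons.
adjusted = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
```

In this toy example the crude odds ratio is well above 1 while the adjusted one is 1, which is the pattern expected when disease X, rather than the genotype itself, drives the risk of disease Y.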

Then we covered some examples of PheWAS in huge populations deployed in the UK Biobank, and some modifications we’ve made to cover the ICD-10 codes used in the UK Biobank as well as ICD-9 codes (used in the US, where PheWAS was first developed).
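Supporting two coding systems amounts to mapping codes from each onto a shared set of phenotype labels. The sketch below is an editorial illustration with a tiny hand-written lookup. It is not the mapping used by any PheWAS tool, and the function name `phenotypes_for` and the patient records are invented.

```python
# Hand-written illustrative lookup; NOT the mapping used by any PheWAS package.
CODE_TO_PHENOTYPE = {
    ("ICD9", "244.9"): "hypothyroidism",
    ("ICD10", "E03.9"): "hypothyroidism",
    ("ICD9", "714.0"): "rheumatoid arthritis",
    ("ICD10", "M06.9"): "rheumatoid arthritis",
}

def phenotypes_for(records):
    """Collapse (patient_id, system, code) rows into patient -> set of phenotypes."""
    by_patient = {}
    for patient_id, system, code in records:
        label = CODE_TO_PHENOTYPE.get((system, code))
        if label is not None:  # codes missing from the lookup are skipped
            by_patient.setdefault(patient_id, set()).add(label)
    return by_patient

# Invented records: one ICD-9 patient, one ICD-10 patient, one unmapped code.
records = [
    ("p1", "ICD9", "244.9"),
    ("p2", "ICD10", "E03.9"),
    ("p2", "ICD10", "M06.9"),
    ("p3", "ICD10", "not-in-table"),
]

result = phenotypes_for(records)
print(result)
```

Once both systems resolve to the same labels, patients coded under either one can be analysed together.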

FLG: What Are the Major Limitations of PheWAS? How Easily Can They be Resolved?

JD: The major limitations depend on the country you’re in. In the US we have a highly fragmented healthcare system, and the worry is that some patients might be seen in a couple of different healthcare centres. So a phenome at any one healthcare centre might not be complete. That sort of thing reduces clinicians’ ability to find an effect: you might miss something.

Having said that, in big healthcare centres like Vanderbilt it’s not usually a critical issue for many chronic diseases because even if people see outside doctors they also see ours, so they’ll get diagnosed in both places. And it’s definitely less of a concern in the NHS, since in that instance there’s only one biller.

The other issue is misdiagnosis and coding errors. Improvements are being made in that area too, so in the case of type 2 diabetes being misdiagnosed or inaccurately coded as type 1, we can improve accuracy by undertaking some civil data science approaches.

FLG: Naturally, There’s a Strong Link Between PheWAS and Personalised Medicine. How Do You Expect That Link to Grow in the Future?

JD: Personalised medicine uses things like genetic risk scores to help determine who’s at risk. PheWAS helps tell you what that genetic risk score means for you. It might mean something for a given disease, but it might also relate to other diseases as well.

There’s also the example of medication repurposing and side effects: those are things PheWAS can quickly help determine. Drug companies are starting to use PheWAS for that purpose now.

Over the next few years we’ll hopefully build clusters of these phenotypes to expand our understanding of complex disease and how it merges with rare Mendelian disease. Hopefully soon we’ll better understand this rare variant story as well as the common variant story we’ve been focused on for so long.