Ranking Personal Genomes According to Their Likely Health Cost
Thanks to the growth of genomic and genetic technologies, the day when we can predict an individual’s health costs purely from their genetic ‘score’ is fast approaching. The recent House Republican ‘Preserving Employee Wellness Programs Act’ proposes legislation in the US that would enable companies to require employees to undergo genetic testing and provide genetic information about themselves and their families. Legislative changes like this raise crucial new issues. According to a recent New York Times article by Louise Aronson (March 23), ‘although discriminating against workers with genetic abnormalities would be prohibited, it would be very difficult to prove that discrimination had taken place. Employers might simply invoke other reasons for hiring and firing decisions.’ Indeed, some genetic abnormalities can already be identified by assessing an individual’s genetic profile. As technology improves, I believe we may soon be able to accurately predict an individual’s expected health cost based on their genome.
To illustrate my point
As a thought experiment, if we were to create a genetic score where people are ranked according to their likely cost to an insurer or health system, what type of data would be needed? First, one would need the complete genome sequences of a population large enough to provide sufficient statistical power. Second, one would need to find the genetic correlations while reducing any confounding factors. A large collection of complete genomes could come, for example, from a national sequencing initiative. The population of Cuba (about 11 million), for example, could present a relatively self-contained cohort. The size is about right, the health care system is completely universal and health records are electronic and centralised. Because the Cuban health care system is centralised and its communist roots aim to provide (in theory) equal care to all, health costs per treatment would be comparable: two different people treated for the same condition should cost the same. It could be argued that the environment would have an effect on health costs: if you eat a poor diet or do not take care of your body or mind, you are more prone to illness, even if you have excellent genes. Let’s assume, however, that because we are analysing a whole population, environmental factors would be averaged out. Still, this setup might not take into account the specific features of the population, its environment and culture, which would make comparisons with the populations of other countries challenging. How do we account for the fact that older people have more treatments and cost more? We could stratify the population by age, for instance with a separate group for people above 65 years of age, but, as you will see below, my proposed method for associating genetic variation with treatment should in theory work even if the age factor is not considered.
I assume here that if an individual suffers from bad health, this translates into a greater health care bill, regardless of whether it was caused by the environment, age or genes. Treatments received would need to be encoded in a hierarchical tree of terms (i.e., an ontology) in which the relationships between treatments are captured: the more general the treatment, the closer it sits to the root of the tree, and vice versa. Let’s assume then that we are able to calculate the cost of every person’s treatment in the Cuban population. Figure 1 below gives a graphical representation of how people would be related to treatments and how they would also be related to their genetic variations.
(Feel free to skip the following part, which contains a bit of math and scientific lingo. I will explain it in simple terms in the Discussion section.)
To measure the association between treatments and genetic variations, we apply a Hypergeometric Index (HI) (Figure 2).
The HI yields the log-transformed probability of observing an overlap between a treatment and a genetic variation equal to or greater than the one expected by chance (1). The significance of the association between a treatment and a genetic variation is reflected in the HI value. Scenario 1 in Figure 3 shows a hypothetical example where a treatment is connected to a genetic variation via 3 different people, but this treatment has 4 more connections to other people with other genetic variations (a highly prevalent treatment). The hypergeometric association between the treatment and the genetic variation is low (HI = 0.001; scenario 1, Figure 3). In scenario 2, a treatment is connected to a genetic variation present in 3 people, but the number of connections to other people for that treatment is low. Here a more specific treatment-genetic association occurs, and therefore the HI is very high (HI = 0.942; see scenario 2, Figure 3).
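The tail probability behind an index of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the figures: the function names, the population size of 20 and the choice of a negative base-10 logarithm are all assumptions made for the example, so the numbers it produces are not meant to match the HI values quoted above.

```python
from math import comb, log10


def hypergeom_tail(population, treated, carriers, overlap):
    """P(X >= overlap) when `carriers` people are drawn at random from
    `population`, of whom `treated` received the treatment."""
    upper = min(treated, carriers)
    total = comb(population, carriers)
    favourable = sum(
        comb(treated, k) * comb(population - treated, carriers - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


def hypergeometric_index(population, treated, carriers, overlap):
    """Assumed form of the index: -log10 of the tail probability,
    so a rarer overlap gives a larger value."""
    return -log10(hypergeom_tail(population, treated, carriers, overlap))


# Hypothetical population of 20 people, 3 of whom carry the variant.
# Scenario 1: prevalent treatment (3 carriers plus 4 other people treated).
hi_prevalent = hypergeometric_index(20, treated=7, carriers=3, overlap=3)
# Scenario 2: specific treatment (only the 3 carriers are treated).
hi_specific = hypergeometric_index(20, treated=3, carriers=3, overlap=3)

print(hi_prevalent, hi_specific)
```

With these made-up counts, the specific treatment receives the larger index, which is the qualitative behaviour described for scenarios 1 and 2.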
If we were able to calculate the hypergeometric values for all the treatments associated with genetic variations in the Cuban population, we would be able to predict the likely cost of treatments given a personal genome. In other words, we would be able to associate treatment costs with genetic variations that have a high HI. Based on these associations we would then predict the health costs of individuals for those treatments that can be genetically predicted. Initially, we would not be able to predict many treatments, but as treatment specificity improved (to be expected in a Precision Medicine-based approach), the number of treatments that can be predicted from people’s genomes would increase. It would therefore be possible to rank people according to how much they are likely to cost the health system for those treatments that can be predicted from their genome.
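The ranking step can be sketched as follows. Everything here is hypothetical: the variant and treatment names, the costs, the HI values and the threshold of 2.0 are invented for illustration, and summing the costs of the predicted treatments is just one simple way of turning associations into a score.

```python
def predicted_cost(variants, associations, treatment_costs, hi_threshold):
    """Sum the costs of treatments linked to any carried variant
    by an association whose HI is at or above the threshold."""
    predicted = {
        treatment
        for (variant, treatment), hi in associations.items()
        if variant in variants and hi >= hi_threshold
    }
    return sum(treatment_costs[t] for t in predicted)


def rank_by_predicted_cost(genomes, associations, treatment_costs, hi_threshold):
    """Return (person, predicted cost) pairs, most expensive first."""
    scores = {
        person: predicted_cost(variants, associations, treatment_costs, hi_threshold)
        for person, variants in genomes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example data: (variant, treatment) -> HI, and cost per treatment.
associations = {
    ("var_A", "treatment_X"): 3.1,
    ("var_B", "treatment_Y"): 0.4,  # below threshold, ignored
    ("var_C", "treatment_Y"): 2.5,
}
treatment_costs = {"treatment_X": 1200.0, "treatment_Y": 300.0}
genomes = {
    "person_1": {"var_A", "var_C"},
    "person_2": {"var_B"},
    "person_3": {"var_C"},
}

ranking = rank_by_predicted_cost(genomes, associations, treatment_costs, hi_threshold=2.0)
print(ranking)
```

Only associations above the threshold contribute, which mirrors the point that the ranking covers just those treatments that can be predicted from the genome.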
I have argued here that it would be possible to calculate a genetic score that predicts healthcare costs from a person’s genome, assuming that sufficient genetic and clinical data were available. The computational infrastructure and data volume required for this exercise would need nation-wide funding and a legal framework, including participatory patient consent forms and rich, controlled-vocabulary descriptions of environment and phenotype. A prediction of health costs like this would arguably not differ much from the statistics currently used by banks or insurers to assess mortgages or health insurance coverage. To my knowledge, however, insurance or bank statistics do not systematically incorporate the clinical risk prediction afforded by personal or clinical genetic testing. Yet, in my proposed hypothetical scenario, a genetic-based health cost assessment might be calculated by averaging the cost of health treatments for the same condition in a population, given a particular genetic load.
Current state-of-the-art genome-based clinical risk predictions accurately predict actionable clinical outcomes for only a very few genes. Genes like BRCA, which display strong genetic associations with disease (such as breast cancer), are rare, making most health risk predictions to date unspecific for the individual. This also applies in the personal genomics context (i.e., genetic testing for apparently healthy individuals). For example, my personal top predicted risk by 23andMe is a 28% chance of developing prostate cancer, a figure too unspecific to justify any health treatment in a regular clinical setting. Such health risks may also be the result of many gene interactions, which makes it impossible to obtain treatment-genetic correlations just by examining direct gene-disease causality. Yet a combined genetic score that integrates all my health risks might prove desirable for certain purposes, in the same way credit scores can be used in banking to discriminate ‘good’ customers from ‘bad’ ones. Despite these limitations, profit motives and personal drivers could still encourage the development of algorithms that correlate treatment costs with personal genomes.
On a broader scale, the unwanted side-effects or risks associated with a genetic score that ranks people according to their likely costs have the potential to disrupt society and our species as a whole. Ranking genomes could affect more than how people receive treatment or choose their approach to healthcare. Without appropriate regulation and the educational resources needed to internalise and interpret genetic score results, we could face severe effects on the way employers choose employees, people choose their professions or lifestyle choices are made. In such a hypothetical scenario, current regulatory frameworks such as GINA (the US Genetic Information Nondiscrimination Act) could offer the basis for a much wider protection of people’s right to non-discrimination than exists today. GINA would need to be significantly revised, however, to make sure that the use of people’s genetic data is independently reviewed by third-party ethical review boards. The use of genetic scores, I suggest, should be either banned or strictly controlled, with responsibilities and liabilities properly spelled out and recognised at an international level in conjunction with local data stewardship authorities.
The US Republican Party’s current proposal to allow employers to access genetic data from employees illustrates the risk: even when genetic discrimination is illegal, once the data is accessed there is no way to ensure it is used for the right purposes. Controlled or managed access to genomic data in the research environment has already provided mechanisms to protect the consent framework for sharing patient genome data. I believe the Global Alliance for Genomics and Health could have a tremendous role in establishing protocols for safeguarding society from unwanted scenarios such as the genetic ranking of people. Such protocols would need to be applied to the genetic data of the general public, protecting people from covert discrimination. It is therefore imperative that we, as citizens, now discuss the potential ethical, legal and societal challenges that genetic scores could bring upon health systems and associated health insurance practices. We are all endowed with our own personal genome; what affects us genetically also affects our health and quality of life. How this knowledge affects people’s decision to get treatment, choose a specific profession or even get married should always be the choice of the person and not be guided by third-party interests. Let’s keep it that way.
1. Fuxman Bass JI, Diallo A, Nelson J, Soto JM, Myers CL, Walhout AJM. Using networks to measure similarity between genes: association index selection. Nat Methods. 2013;10: 1169–76.
I am grateful to Fiona Nielsen (Repositive Ltd. CEO) and Daniel Jason Binks (Repositive Marketing Assistant) for useful comments and enhancements.
Figures are adapted from a recent publication I co-authored: Reyes-Palomares A, Bueno A, Rodríguez-López R, Medina MÁ, Sánchez-Jiménez F, Corpas M, et al. Systematic identification of phenotypically enriched loci using a patient network of genomic disorders. BMC Genomics. 2016;17: 232.