We sat down with Jeff Gulcher, Chief Scientific Officer at WuXi NextCODE Genomics, at ASHG in October to find out what the company is working on right now and why ASHG is so important.

What are you exhibiting specifically, and why is this event so important to you?

As the leading global platform for using genomic data at scale, we always find ASHG an important event. This year we’ve shown the unique power and efficiency of our GOR database management system by mining 100,000 genomes from the exhibit hall floor – the biggest demonstration ever of the use of genomic data online. We’ve also showcased the work of our partners in using our platform for many of the largest rare disease diagnostics efforts around the world.

Another thing we’ve highlighted, and one that gives a good view of the breadth of our capabilities, is our work in the area of tumour and cancer analysis. What are the challenges of cancer analysis when trying to figure out if a person has cancer, and how do we best treat that patient? You will have heard of a lot of people sequencing tumours and trying to match drugs to a specific mutation in a specific gene. We have been doing that ourselves as well, but we are now also looking to the future. Can we help to use the genome to find better drug targets? Can we do a better job at diagnosing the type of tumour that a person might have?

Can you describe your recent breakthrough in FFPE sequencing?

Figure: Sequencing reads of a sample prepared by the traditional whole-genome sequencing workflow for fresh-frozen samples and data generated using WuXi NextCODE’s SeqPlus whole-genome FFPE method. Green and purple indicate reads sequenced in the forward and reverse directions, respectively, and yellow represents bases with non-reference sequence. The center of the image shows a C to A mutation in each of the tumor samples.

One of the big challenges in cancer genomics is that when a tumour is resected from a patient it is usually fixed in formalin and embedded in paraffin – a preparation called FFPE – because pathologists want to do a good job looking at the histology. This is the routine way of analysing tumours after biopsy or surgery. The problem is that sequencers don’t really like the DNA that comes from FFPE samples, as it is hard to extract in good condition. As a result, FFPE samples usually yield poor-quality DNA, and you then have to sequence it very intensively to try to make up for the lack of quality. There are groups that do that, but the trouble is that you still can’t look at the whole genome that way. At best you can look at small slivers, maybe 300 to 400 genes, and even then it can be expensive.

We spent the last year developing a way to get larger pieces of DNA, and thus higher-quality whole-genome sequence, out of FFPE samples – and we’ve succeeded. At ASHG we are letting people know that we have this technique available as a service, with very high-quality DNA coming out. For the first time ever you can do whole-genome sequencing on a routine FFPE clinical sample.

This is a big deal, since it’s the way most tumour samples are kept. The average medical centre has a pathology lab that may have kept such samples for 50 years or more. So this technique has the potential of unlocking the data in all those samples. It is in essence liberating most of the cancer data in the world, and means that for the first time we can connect sequence data with the wealth of treatment and outcome data that is contained in laboratories and biobanks around the world.

What would you say are some of your other core areas of interest right now?

We are also very focused on developing and using AI and deep learning methods. Going back to cancer, you can ask: now that you have sequenced the tumour, what’s next? How does that tumour data compare with already sequenced tumours? One of the things we have done recently is go back through a large collection of 11,000 tumours in a very large US-supported tumour bank: The Cancer Genome Atlas, or TCGA. All that data is available if you apply, but it is so much data that it is unwieldy for most groups to use fully and effectively. It is a big data problem that we are solving. We are in fact putting it into a GOR database, to the benefit of our partners and to make some discoveries of our own that can benefit patients.

One of the questions we asked was: ‘Can we analyse all of those samples and come up with genomic signatures that point to interesting driver genes or pathways that may determine the difference between different types of cancer?’ For example, we want to find different genomic signatures that can tell us which breast cancer cases we need to treat very aggressively versus those for which we can use more standard treatment methods. We have done that using our applied artificial intelligence and deep learning methods.

How can you diagnose cancer using AI?

AI is perhaps the only way to systematically look across the genomes of thousands of samples. This is an extension of the sort of pattern recognition and AI that have worked very well for facial recognition. In that case, one is looking at points of eyebrows, or the width of the nose or the distance between the eyes, etc. There are a limited number of dimensions you need to look at and you need only thousands of faces to find the patterns you need.

But fast forward to something more complex. In our case we are now making measurements across the entire genome with tens of thousands of dimensions: 65,000 different genes with different approaches for each individual or sample. You may look at whether the gene is turned on or off at the RNA level; look for certain mutations in the gene itself; look at the amplifications in the genome; and so on. Just imagine trying to merge all that data together and ask and answer those questions. Traditional methods can’t do that, as it’s too complicated.

So we are using AI and deep learning to find these genomic signatures and advance disease recognition. We have developed and applied our proprietary methods to the TCGA collection as a sort of proof of concept of our capabilities. We’ve seen superior results in distinguishing one type of lung cancer from another, as well as in breast cancer. We can also now identify and distinguish between nearly two dozen types of cancer using genomic signatures, something that has never been done before. Along the way we are finding some of the more important pathways to perhaps target in the future. These represent the next generation of sequencing oncology targets, and that’s very exciting.

You mentioned you had received $240 million in recent funding. How are you going to use this within the company?

We have big ambitions, major partners, and many projects and opportunities. We are expanding our teams at all three of our sites, in Shanghai, Cambridge, Massachusetts, and Iceland. We are, for example, already providing the platform for most of the world’s national genome projects, and more are looking to us. But the other thing we are looking for is additional opportunities to bring more data into our data platform. The more data our platform uses, the more we learn from it, and the more we can help our partners to analyse their data. We are also looking at potential acquisition targets that will augment some of the expertise that we already have.

How do you hope to combine your datasets?

We can do this in a variety of ways. One is of course to partner with groups who want to look at very large cohorts. We are helping several paediatric hospitals on different continents to sequence tens of thousands of paediatric cases and to use that to try and accelerate diagnosis and discovery. Another way is by working in parallel with a medical centre, helping them to recruit patients that are willing to be tested for their conditions. We also work with big population groups, for example Genomics Ireland and other groups that are developing big cohorts. We work with such partners to help them partner with large pharma, and in specific disease areas. We also work with a lot of Chinese cohorts.

Finally, we are launching consumer-focused products in China that can help you understand your individual risk of certain diseases but can also allow you to volunteer to participate in research. What is someone’s future risk for cancer, for example, and if someone is at elevated risk, how can they manage that further with physicians or genetic counsellors? The key to being able to answer that question for ever more people is to bring in data and reanalyse it again and again and again. That is our strategy. When it comes to future risk, the question is how important that risk is and, if people and the healthcare system want that type of information, how we can make the understanding more precise and more actionable. That is the sort of thing that only a platform such as ours can deliver, by harnessing vast amounts of data from people around the world to benefit people around the world.

Why is it so essential that you act as a global platform?

I think if you don’t use database architecture like ours then you are limiting yourself, even in your own institution, because you will have to keep your data in data silos. Once you eliminate these silos and connect the data, like deCODE has done in Iceland, then your research becomes much more powerful and in the end benefits more people. Taking that to a global level, if data is stored in the same database management system in Europe and China as it is in the United States, it is also a lot easier to collaborate. When you want to collaborate on rare diseases or in clinical trials, for example, you have the possibility to do so.

Thus for a variety of reasons it is important for there to be a global platform and we are uniquely placed to serve as that platform. It allows us to unlock and really enable our partners to get the most out of their data and the global state of knowledge across the field. It makes it easier to collaborate, expands reference datasets and makes them universally accessible, and so further accelerates research and the delivery of insights to improve patient care.

What are some of the key components of that platform that help you maintain and achieve your goals?

There are three main pieces. The first is what you might call the back office: our core Genomically Ordered Relational (GOR) database management system, which powers our platform. Then we have two sets of analytical tools and interfaces for delivering insights from all that data. One is on the clinical side; the other is for large-scale case-control research.

An example of the clinical application is when you are looking at one child with an undiagnosed rare disease and his or her family. For this task we provide our customers with our clinical sequence analyser, or CSA, which enables clinical geneticists to home in on the variants that cause the disease, based first on an almost instantaneous search of all the known disease genes that have been linked to the child’s symptoms in any reference database in the world. However, if you don’t find something right away, you can also filter by looking at the inheritance pattern – comparing the child’s genome to those of the parents – and also see all de novo variants; you can use the world’s biggest allele frequency database to winnow the number of genes down by looking only at those variants that fit your criteria of rare; and you can further filter by the impact of the mutations themselves. In this way you have the greatest possible chance of diagnosing the child’s disease, using all the world’s data. That is how our CSA serves as a decision support tool for rare disease interpretation around the globe.
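The filtering sequence described above can be illustrated with a minimal Python sketch. It is not the CSA itself: the `Variant` fields, the impact labels, the gene names and the 0.1% rarity cut-off are all hypothetical choices made for this illustration. It first looks for variants in genes already linked to the child's symptoms and, if none are found, falls back to inheritance pattern, allele frequency and predicted impact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    child_alt: int       # alternate-allele count in the child (0, 1 or 2)
    mother_alt: int      # same count in the mother
    father_alt: int      # same count in the father
    allele_freq: float   # population allele frequency from a reference panel
    impact: str          # hypothetical labels: "HIGH", "MODERATE", "LOW"


def is_de_novo(v: Variant) -> bool:
    """Present in the child but absent from both parents."""
    return v.child_alt > 0 and v.mother_alt == 0 and v.father_alt == 0


def fits_recessive(v: Variant) -> bool:
    """Child carries two copies; each parent is a one-copy carrier."""
    return v.child_alt == 2 and v.mother_alt == 1 and v.father_alt == 1


def shortlist(variants, symptom_genes, max_freq=0.001,
              impacts=("HIGH", "MODERATE")):
    """Return candidate variants for one child-and-parents trio."""
    carried = [v for v in variants if v.child_alt > 0]
    # Step 1: variants in genes already linked to the child's symptoms.
    known = [v for v in carried if v.gene in symptom_genes]
    if known:
        return known
    # Step 2: otherwise filter by inheritance, rarity and predicted impact.
    return [
        v for v in carried
        if (is_de_novo(v) or fits_recessive(v))
        and v.allele_freq <= max_freq
        and v.impact in impacts
    ]


# Made-up example data for illustration only.
variants = [
    Variant("GENE_A", 1, 0, 0, 0.0, "HIGH"),         # de novo, rare
    Variant("GENE_B", 1, 1, 0, 0.2, "LOW"),          # inherited, common
    Variant("GENE_C", 2, 1, 1, 0.0005, "MODERATE"),  # recessive fit, rare
    Variant("GENE_D", 1, 0, 0, 0.05, "HIGH"),        # de novo but too common
]
candidates = shortlist(variants, symptom_genes={"GENE_X"})
print([v.gene for v in candidates])
```

Here no variant falls in a symptom-linked gene, so the fallback filters apply and only the rare de novo and rare recessive candidates remain. A real tool would handle more inheritance models and far richer annotation.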

On the research side, our sequence miner enables you to look at the genomes of, say, a thousand patients with a given disease and a hundred thousand healthy controls. You can ask the big question: what genetic variations do patients tend to share that the control subjects don’t? This is the key to discovering new genes and pathways that might be important as drug targets. Those are the three pieces: the back office, a clinical tool and a research tool, which together help us aggregate big data globally and in uniquely efficient ways. Moreover, there is a fourth dimension – AI and deep learning – that we can infuse across the platform, adding a new dimension of power both to clinical interpretation and large-scale research.
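The case-control question above, which variants patients share that controls do not, can be sketched with a one-sided Fisher's exact test on carrier counts. This is a generic statistical illustration, not the sequence miner's actual method; the function names, variant identifiers and counts are invented for the example.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Probability of seeing at least `case_carriers` carriers among the
    cases if carrier status were unrelated to disease (hypergeometric tail)."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / comb(total, carriers)


def rank_variants(counts, n_cases, n_controls):
    """Sort variants by p-value, strongest case enrichment first.

    `counts` maps a variant id to (carriers among cases, carriers among controls).
    """
    scored = [
        (variant_id, fisher_one_sided(a, n_cases, c, n_controls))
        for variant_id, (a, c) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up counts: 100 patients and 1,000 healthy controls.
counts = {
    "var_1": (12, 5),   # 12% of cases vs 0.5% of controls
    "var_2": (2, 20),   # 2% of cases vs 2% of controls
}
ranked = rank_variants(counts, n_cases=100, n_controls=1000)
for variant_id, p_value in ranked:
    print(variant_id, p_value)
```

The variant enriched among patients ranks first. In practice an analysis over a whole genome would also need to correct for multiple testing and for population structure, which this sketch leaves out.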

What would you say are some of your biggest challenges?

I think one of the biggest challenges in the field is that some groups don’t have the resources yet to manage big sequencing data, so what they do instead is work with small panels of genes and the genes that are already well known. That’s great if you can make the diagnosis among what is already known about the genome, but that’s less than a quarter of the time. So seventy-five percent of the time you can’t make the diagnosis, and what’s next? Addressing that question with the best tools and most data available is our strength. We see the trend that eventually, as sequencing pricing comes down more and more, more people will begin to systematically sequence and scan the entire genome, whether that is the whole genome or whole exome. This is great, as we think it will lead to faster diagnosis and discovery.

What are some of your future plans, and how do you hope to tackle your hurdles?

For us, it is more of the same. We will keep making our systems available to more pharma, even more population-specific groups, and even more medical centres and consumers. We will then continue to improve the interpretation of that data – with an ever-growing knowledge base, more powerful and intuitive bioinformatics, and pervasive AI and deep learning methods.