Edico Genome Achieves World Record With DRAGEN Tool
At ASHG, we spoke to Gavin Stone, Vice President of Marketing at Edico Genome, to discuss their World Record achievement in more detail, as well as the advantages offered by their DRAGEN tool.
FLG: Talk me through the World Record attempt
GS: It’s been a collaborative effort with Children’s Hospital of Philadelphia, Amazon Web Services, and Edico Genome. Children’s Hospital of Philadelphia have this cohort of 1,000 paediatric genomes. 60% of the genomes are from African-Americans, making it one of the largest African-American cohorts that’s ever been sequenced. We wanted to see how quickly we could analyse the genomes and see how scalable this solution we developed with Amazon Web Services could be. Initially, we wondered if we could process 100 at the same time, then figured ‘why not 1,000?’ So that’s the challenge we set for ourselves.
DRAGEN runs off Amazon Web Services’ (AWS) EC2 F1 instances, on the AWS Cloud. We had 1,000 genomes, and 1,000 instances – our challenge was figuring out how to stream 100 terabytes of data from Amazon S3 (Simple Storage Service) onto those instances to run the analysis, and manage it all at the same time. Assigning each genome to an instance only takes a few seconds, but it took us around 10-12 minutes to get the full 1,000 assigned and running due to the large volume.
This was all done with a real-world dataset. They weren’t 1,000 uniform genomes. They varied dramatically and each had, on average, 40x coverage. A typical genome usually comes in at 30x coverage. The deepest coverage we had was actually 60x, so it was quite deep sequencing. A synthetic dataset would have taken a much shorter time. Two hours and twenty-five minutes is a fantastic time – World Record setting time. We were also only using their EC2 f1.2xlarge instances. Amazon also have f1.16xlarge instances, which are eight times more powerful. With those, and some optimisation on our side, we could smash this record at some point in the future.
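As a rough back-of-envelope check on the figures quoted above (1,000 genomes, 100 terabytes, two hours and twenty-five minutes), the short sketch below works out the implied data size per genome and the average end-to-end throughput. It assumes decimal units (1 TB = 10^12 bytes) and spreads the data evenly over the full elapsed time, which includes the analysis itself, so the rates are averages rather than measured S3 streaming speeds. The function and variable names are illustrative only.

```python
# Figures quoted in the interview.
GENOMES = 1_000
TOTAL_DATA_TB = 100
ELAPSED_MINUTES = 2 * 60 + 25  # two hours and twenty-five minutes

# Assumption: decimal units, 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12


def summarize(genomes: int, total_tb: float, minutes: float) -> dict:
    """Return average per-genome size and end-to-end throughput figures."""
    seconds = minutes * 60
    total_bytes = total_tb * BYTES_PER_TB
    return {
        "gb_per_genome": total_bytes / genomes / 10**9,
        "aggregate_gb_per_s": total_bytes / seconds / 10**9,
        "per_instance_mb_per_s": total_bytes / genomes / seconds / 10**6,
        "genomes_per_hour": genomes / (minutes / 60),
    }


summary = summarize(GENOMES, TOTAL_DATA_TB, ELAPSED_MINUTES)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions each genome averages about 100 GB of input, and the run as a whole averages roughly 11.5 GB per second across all 1,000 instances.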
FLG: In a neonatal intensive care unit, time is critical, making DRAGEN a spectacular tool. What are its big advantages outside of the clinical setting?
GS: The neonatal unit is the obvious one, where minutes can mean the difference between life and death. Beyond that, once you get into the cloud, speed translates directly into cost. So if you have a research grant to analyse as many genomes as you can, being more cost-efficient like this will allow you to analyse a lot more genomes and have a stronger study. It’s the same thing for clinical diagnostics, where you’re able to offer your test much more cheaply and with a faster turnaround. So it’s not just the speed in a critical environment, it’s the throughput that allows a lot of our customers to offer their products in a different way to what they could before.
FLG: Who’s using it so far?
GS: The biggest sequencing centres in the world are using DRAGEN. That includes HudsonAlpha, Macrogen, BGI, Johns Hopkins, Baylor…you name it and they probably have at least one DRAGEN system. To date, our customers have run about 18 petabytes of data through their DRAGEN systems worldwide, which is equivalent to around 200,000 genomes. We don’t limit people to just genomes, so there’s a mix of genomes, exomes, panels, human, non-human, wheat, horse, pig, cat, dog, mice, bacteria…anything you can name. It’s a flexible platform that lets you do all of these applications. The hardware isn’t fixed; it takes 3 seconds to reconfigure.
FLG: You have the partnerships that went into today’s record attempt, and you also work closely with Illumina and Rady’s – how do these partnerships come about, and what’s been most useful from them?
GS: Our strong portfolio of partnerships is part of what makes DRAGEN so successful. We have a lot of strong partnerships in the tech industry, with Dell and Amazon Web Services, as well as numerous partnerships with industry leaders in the life science space. These partnerships are a testament to the growing interest in genomics from outside of the life sciences industry. Then we also have Illumina, who are leading on the sequencing side. We have a strong partnership with them, with DRAGEN now being made available on BaseSpace as well as through DNAnexus, AWS Marketplace and Seven Bridges. These partnerships enable us to provide DRAGEN as an underlying base technology for the industry. That’s really what genomics is about – everybody helping each other out to make everything better.
FLG: Talking more broadly about what you guys do, it’s using sophisticated processing to eliminate bottlenecks in bioinformatics workflows. What’s next?
GS: By eliminating bottlenecks, we actually create new bottlenecks further downstream. Our platform has the ability to help solve some of those bottlenecks as well. These sorts of reconfigurable processors are very well suited for things like machine learning and deep learning, to facilitate that downstream analysis. We’re throwing so much data at medical geneticists that they can’t easily turn all of that into a diagnosis. We’re able to filter that data and order it, presenting the most likely causes in a way that makes diagnosis easier. I think that goes a long way to furthering the goals of precision medicine. We’re not going in there to do everything in that space, but I think we’ll form more partnerships that will facilitate this and let us provide that underlying technology.
FLG: Have you seen anything at the event this week that’s excited you in particular?
GS: I’ve not had much of a chance to look at anything. Our team has been extremely concentrated on getting this Guinness World Records done! I was interested in the alternative sequencing technologies and catching up with the Oxford Nanopore and Nanostring teams, and seeing what other alternatives are brewing.
For me, just personally, I’m also really interested in what’s happening with CRISPR and being able to write DNA as well as read it. I’m excited to go and chat with a bunch of those guys!
FLG: Anything else you’d like to mention?
GS: We’re really just at the tip of the iceberg at the moment. There are so many discoveries to be made in the field of genomics. We’re really pushing those boundaries, and continuing to make all of what we do better, faster and more accurate, and to perform best in the PrecisionFDA challenge (PrecisionFDA Hidden Treasures – Warm Up Challenge). That’s just on the very first of these new algorithms we’ve developed. Until now we’ve just been developing algorithms that are equal to what is already out there. The next generation, which we’re busy rolling out now, goes so far beyond what is currently achievable. So for the industry as a whole, and Edico Genome, we’re just getting started.