The next generation sequencing revolution: a view from the inside
There is an almost incredible amount of discovery, innovation and application taking place in the field of genomics today. This is a direct result of the rise of next generation sequencing technology. Is it time to look at which technologies are going to take us even further?
In this article I am going to describe my own personal experiences with the excitement that is the next generation sequencing (NGS) revolution. My name is David I Smith and I am a Professor in the Department of Laboratory Medicine and Pathology at the Mayo Clinic. For a ten-year period I was the Chairman of the Research Core Oversight Committee at the Mayo Clinic, and for the past several years I have been the Chairman of the Technology Assessment Committee for the Mayo Clinic Center for Individualized Medicine. In this capacity I have had the pleasure not only of reviewing the exciting revolutions in DNA sequencing but also of visiting the major suppliers of these technologies and many of the centers that are leading the way with this revolution. Finally, I have been to many of the next generation sequencing meetings, including multiple years at what is clearly the Academy Awards of sequencing meetings, the Advances in Genome Biology and Technology (AGBT) meetings in Marco Island.
My first exposure to this revolution occurred in 2006 when I attended a seminar given at the Mayo Clinic by 454, where they were talking about the Genome Sequencer 20 (GS20). It was capable of 20 megabases (Mb) of sequencing per run of the machine. This was astounding at the time. My impression was two-fold. First, I realized that the slides that I had been presenting to students every year about Sanger sequencing were now completely outdated. Second, I realized that massively parallel sequencing was going to be a huge deal which would completely transform how we did research and, more importantly, how research could be rapidly translated into clinical practice. What I could not imagine was how rapidly advances would occur with massively parallel sequencing.
I went to visit 454 in Connecticut several times in 2007 and 2008 and saw the advancements that led to increases in sequence output of the Genome Sequencer from 20 to eventually 500 Mb. However, it became obvious that there were a number of key limitations to this platform. While I originally thought that these were the cumbersome and messy procedure of emulsion PCR and the limitations of using unblocked nucleotides (the homopolymer problem), the real issue with this platform was its inability to increase sequence output beyond 500 Mb. In spite of this, the platform was able to dramatically reduce the cost of whole genome sequencing, and 454 sequenced the second named individual (James Watson) for a cost of one million dollars.
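The homopolymer problem mentioned above can be illustrated with a small, idealized sketch. It assumes that unblocked nucleotides are flowed one type at a time in a fixed cycle and that the signal in each flow is exactly proportional to the number of bases incorporated. The flow order `TACG`, the function names and the noise-free signal are illustrative assumptions, not a description of the actual instrument software.

```python
from itertools import cycle


def flowgram(read, flow_order="TACG", max_flows=400):
    """Idealized, noise-free flowgram for the sequence being synthesized.

    Each entry is (nucleotide flowed, number of bases incorporated).
    Because the nucleotides are unblocked, a whole homopolymer run is
    incorporated in a single flow and must be inferred from signal height.
    """
    flows = []
    pos = 0
    for base in cycle(flow_order):
        # Stop when the read is finished (max_flows guards against
        # characters that never match any flowed nucleotide).
        if pos >= len(read) or len(flows) >= max_flows:
            break
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


def relative_step(n):
    """Relative signal increase when a run grows from n to n + 1 bases."""
    return 1 / n


if __name__ == "__main__":
    for seq in ("TTTAC", "TTTTTTTTAC"):
        print(seq, flowgram(seq))
    # The step that separates n from n + 1 shrinks as the run gets longer.
    for n in (1, 3, 8):
        print(n, relative_step(n))
```

In this toy model, telling one T from two requires detecting a doubling of signal, whereas telling eight from nine requires resolving a 12.5% difference, which is why long homopolymer runs tend to be miscounted once real measurement noise is present.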
The second commercially available NGS machine was originally produced by Solexa, which eventually became part of Illumina. I went with a number of Mayo colleagues to visit Illumina several times, and we were impressed with a platform that had solved many of the 454 problems through the use of bridge amplification to amplify DNA fragments on a flow cell and the use of blocked, fluorescently labeled nucleotides. The original Illumina Genome Analyzer was capable of 1 gigabase (Gb) of DNA sequence output, which was accomplished by very short sequencing (31 base pairs) of 48 million amplified DNA fragments. The very short reads were a severe limitation for sequence assembly, and the lower accuracy of the last few bases only compounded this problem. However, the key strength of this platform was the incredible headroom that it had for dramatic increases in sequence output.
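The arithmetic behind these output figures is simply read length multiplied by the number of fragments sequenced, and the "headroom" comes from being able to grow both factors. The sketch below uses the 31 bp and 48 million figures quoted above; the function names and the 150 bp read length in the second calculation are hypothetical values chosen only for illustration.

```python
def run_output_gb(read_length_bp, reads):
    """Raw bases per run, in gigabases (1 Gb = 1e9 bases)."""
    return read_length_bp * reads / 1e9


def reads_needed(target_bases, read_length_bp):
    """Smallest whole number of reads that reaches the target base count."""
    return -(-target_bases // read_length_bp)  # integer ceiling division


if __name__ == "__main__":
    # Figures quoted in the text for the original Genome Analyzer.
    raw = run_output_gb(31, 48_000_000)
    print(f"31 bp x 48 million fragments = {raw:.2f} Gb raw")

    # Hypothetical: reads required for one terabase at 150 bp per read.
    print(reads_needed(10**12, 150), "reads of 150 bp for 1 Tb")
```

The raw product comes to roughly 1.5 Gb, the same order as the quoted 1 Gb; the gap may reflect rounding or reads discarded by quality filtering, which is an assumption on my part rather than something stated here.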
I was able to obtain funding for a partnership between the Mayo Clinic and the University of Minnesota so that Mayo could obtain its first NGS machine in 2008, the Genome Analyzer (GA), and this was placed in the DNA Sequencing Core at the Mayo Clinic to support our researchers. A year later we obtained our second GA (the GA II). When the HiSeq instrument came out, we began to purchase these for their considerably greater sequence output. The Genome Sequencing Core of the Mayo Clinic has invested heavily in the Illumina sequencing platform and currently has seven of the different HiSeq instruments. We also have several MiSeq instruments within that research core laboratory. The technological advances on the Illumina platform have been astounding, and the major problem with this platform has now been solved by dramatic increases in obtainable sequence length (which really solved the alignment problems). There have also been dramatic increases in the number of simultaneous sequences obtained. The increase in sequence output from 1 Gb to over 1 terabase in such a short time period on this platform has been absolutely astounding and truly ushers in the age of NGS, especially for clinical practice.
A new problem, however, is that the newest Illumina machines utilize patterned flow cell technology, which will quickly make the HiSeq 2000s and 2500s obsolete. Thus the Mayo Genome Sequencing Core, and other places with large investments in those platforms, are finding that their multi-million dollar investments quickly become obsolete. This is a problem with this entire technology; however, with patterned flow cells there is perhaps the capability for a slightly longer instrument life cycle, merely by increasing the number of patterned sequencing centers.
The third commercially available sequencing platform utilized sequencing by ligation (the SOLiD system). I went to Life Technologies in 2009 and obtained training to run that machine. Life Technologies also provided me and the Mayo Clinic with a SOLiD machine, but they never provided any funds for running it; thus we had a machine for over a year but never got to kick its tires in house.
Fortunately Life Technologies did run some of our tumor samples on their platform and we were able to obtain RNAseq data on a number of our tumors which they then analyzed.
There were a number of strengths to the SOLiD platform. It too utilized blocked nucleotides and, because it utilized two-base encoding, offered the capability of higher sequence accuracy than the Illumina platform. Unfortunately it still utilized emulsion PCR, had a very cumbersome and slow sequencing by ligation strategy, and was very limited in the number of bases it could sequence from any amplified fragment. The biggest problem, however, was that it tried, and failed, to play catch-up with the Illumina sequencing platform, and this eventually spelled its doom.
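The accuracy argument for two-base encoding can be made concrete with a short sketch. It uses the commonly described color-space scheme in which each color is determined by a pair of adjacent bases, modeled here as the XOR of two-bit base codes. The function names and the example sequences are my own, and this is a simplified illustration rather than the vendor's analysis pipeline.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq):
    """Translate bases into colors: one color per adjacent pair of bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


def color_diffs(a, b):
    """Positions at which two equal-length color strings disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    reference = "ACGTAC"
    variant = "ACATAC"  # a true single-base change (G to A)
    ref_colors = encode(reference)

    # A real single-base change alters two adjacent colors.
    print("SNP changes colors at", color_diffs(ref_colors, encode(variant)))

    # A single miscalled color changes only one position, so it can be
    # flagged as a likely measurement error rather than a variant.
    miscalled = list(ref_colors)
    miscalled[2] ^= 1
    print("Miscall changes colors at", color_diffs(ref_colors, miscalled))
    print("Naive decode of the miscall:", decode("A", miscalled))
```

Because every base is interrogated twice, an isolated color change is inconsistent with a genuine single-base substitution, which is the basis for the higher accuracy. The naive decode also shows the cost: one uncorrected color error corrupts every base downstream of it.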
Life Technologies invested heavily in the Ion Torrent sequencing platform and eventually dropped the SOLiD platform. There were a number of advantages to the Ion Torrent platform, including the use of computer chip manufacturing to produce the sequencing chips, its use of simple unmodified nucleotides, and much faster run times than the Illumina sequencers. Another key advantage of this platform is the ability to start with lower amounts of input DNA for library construction prior to sequencing. However, this platform still utilizes emulsion PCR to amplify DNA fragments on beads, and the naked nucleotides suffer from the same homopolymer problem that plagued the 454 platform.
The low cost of the Ion Torrent Personal Genome Machine (PGM) made it a very easy machine to purchase, and many of these machines were sold. However, it has become clear that most of these machines sit idle and that we exist in a world where the vast majority of sequencing is done on the Illumina platform. There have been some dramatic increases in sequence output on the Ion Torrent platform and suggestions that it could provide greater sequence accuracy than the Illumina platform. Unfortunately the increases in sequence output have not kept up with expectations, and currently this platform offers considerably less sequence output than the Illumina platform.
One of the key limitations to both the Illumina and Ion Torrent platforms is relatively short read lengths; hence the next generation of NGS machines are those based upon single molecule sequencing. These offer a number of advantages, including removing the need for any PCR amplification of DNA fragments and the capability of extremely long read lengths. The first viable single molecule sequencer was the Pacific Biosciences machine. Unfortunately sequence accuracy has been, and remains, one of the major problems of this platform (producing only 85% accurate sequencing). A potential solution is provided by the use of SMRTbell templates, so that DNA fragments are sequenced multiple times; this produces much more accurate consensus sequences but dramatically decreases overall sequence output. The past several years have seen increases in sequence output on this machine and in the ability to sequence considerably longer DNA fragments (currently in the 10-50 Kb range). However, the promise of a $100 genome in 2014 (originally promised by Dr. Stephen Turner at AGBT in 2009) is little more than a pipe dream.
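The trade-off between consensus accuracy and output can be sketched with a deliberately simple model. It assumes that each pass calls a given base correctly with probability 0.85 (the figure quoted above), that errors in different passes are independent, and that the consensus is a plain majority vote. Real consensus callers are more sophisticated and must also handle insertions and deletions, so the function names and the numbers this produces are illustrative only.

```python
from math import comb


def majority_accuracy(p, passes):
    """Probability that more than half of the passes call a base correctly.

    Assumes independent errors and a simple majority vote; use an odd
    number of passes so that ties cannot occur.
    """
    needed = passes // 2 + 1
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


def unique_output(raw_bases, passes):
    """Unique bases obtained when every fragment is read `passes` times."""
    return raw_bases / passes


if __name__ == "__main__":
    raw = 1_000_000_000  # hypothetical raw bases per run
    for passes in (1, 3, 5, 9):
        acc = majority_accuracy(0.85, passes)
        out = unique_output(raw, passes)
        print(f"{passes} passes: accuracy {acc:.4f}, unique bases {out:,.0f}")
```

Under these assumptions, accuracy climbs quickly with the number of passes, while the amount of unique sequence falls in direct proportion, which is the cost described above.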
A promising single molecule sequencer is the Oxford Nanopore sequencing platform, which produced considerable buzz when presented at AGBT several years ago. The MinION sequencer is now being provided to a large number of laboratories, but the first sequences obtained on this platform were highly inaccurate. There have been some considerable improvements on this platform, and the promise of a very large number of nanopores on the GridION system suggests that this platform could in the future produce sufficient sequence to start to overcome the limitations of first-pass sequence accuracy.
As part of my role in the Technology Assessment Committee, I also visited a number of other companies that are working to develop NGS-based technologies. These include GnuBio, which is developing a fully integrated droplet-based DNA sequencing technology. They were recently acquired by Bio-Rad and offer the capability of sequencing small gene panels with high accuracy. I also visited NabSys, which is using solid-state nanodetectors to analyze single DNA molecules. While not technically a DNA sequencer, it could be useful for the analysis of DNA structural variation and genome mapping. There is also a new sequencing by synthesis platform being developed by Qiagen. One of its strengths is that Qiagen is working on developing an integrated NGS platform with everything from sample prep and automated library construction to data analysis on the Ingenuity platform. The weakness is apparently the sequencing platform itself, which appears not to be ready for primetime.
The current NGS landscape is pretty much defined by the Illumina sequencing platform. They produce a group of machines, from low output MiSeq instruments of various flavors to the highest output HiSeq X Ten machines. On the HiSeq X Ten machines you can sequence the human genome for less than the mythical $1,000 mark. However, this does not factor in two very important things: the cost of sequence assembly and analysis, and the perhaps greater cost of storing all that information. Hence, is this truly a $1,000 genome? Unfortunately it is not!
In October 2014 I was fortunate to be invited to speak at a BGI Conference in Shenzhen. As part of this Conference I was taken on a tour of their facilities and shown the Complete Genomics (CGI) sequencing platform (BGI had purchased CGI a year earlier). The CGI sequencing platform utilizes DNA nanoballs which are packed tightly on a silicon chip. They then utilize combinatorial probe-anchor ligation to sequence the DNA within the packed nanoballs. There are a number of features of this platform that are very attractive. The most attractive feature is the considerable bioinformatics expertise provided by BGI; thus for a single price they can generate a complete human sequence and assemble and annotate that sequence with high accuracy. It is projected that the price will be at the $1,000 mark sometime in 2015. A second attractive feature is that there appears to be considerable headroom for further increases in sequence output. A third is that they can also generate phased genome sequences with their long fragment read technology. However, there are some key limitations to this platform, including the fact that they do not currently sell it; instead you send your DNA to them and they generate the assembled sequences for you. In addition, it is currently only viable for whole genome and whole exome sequencing. The data generated are also not available for further manipulation. There are rumors that BGI will be making the CGI platform available for purchase so that centers can generate their own sequences in house, but these remain rumors at this point in time.
In conclusion, the past 15 years have seen dramatic improvements in sequencing technologies, and the current cost of whole genome sequencing is quickly approaching the $1,000 mark. These improvements have made sequencing viable for the total and complete transformation of clinical practice, a subject for its own discussion. Currently the sequencing landscape for both research and its clinical translation is completely dominated by the Illumina platform, and this decided lack of competition does not benefit the end users. There are a number of new technologies being developed, but many of them are a considerable way from having any impact on the sequencing landscape. However, the complete CGI platform provided by BGI does offer an alternative sequencing solution, and the potential competition between those two platforms should drive further improvements in these technologies and even cheaper sequencing in the future.