Next generation sequencing: How and why we got here
In this article, written for issue 2 of Front Line Genomics magazine, Richard Wintle explores the history of next-generation sequencing, from the Human Genome Project to today's cutthroat, high-throughput industry. Richard is the assistant director of the Centre for Applied Genomics, a core facility of the Hospital for Sick Children in Toronto. He is also a keen photographer, motorsports enthusiast and science writer.
In the early part of the twenty-first century, we were treated to the completion of the first drafts of the human genome sequence. In the decade since, surprising new technologies for DNA sequencing have been developed. Today's highest-throughput sequencers can generate over 20,000 times as much data in a single run as those used in the Human Genome Project, making routine, cost-effective sequencing of individual genomes realistic. Here, we discuss where these sequencing technologies came from and what they have been used for. Even by 2015, however, a number of “next-generation” sequencing approaches have come and gone, leaving very few real contenders for genome sequencing.
In 2015, it is reasonable to say that we are living in yet another “Golden Age” of genomics. Technologies now exist that allow complete human genomes to be sequenced in a matter of days. Coupled with modern high-performance computing and a host of bioinformatics tools developed to analyze the data, whole genome sequences from individual humans can now be generated at costs approaching those of single DNA laboratory tests. Collectively referred to as “next-generation sequencing”, or NGS, these technologies all share several characteristics: massive parallelization of chemical sequencing reactions, micro- to nano-scale reaction volumes, and a hefty dose of computational power to capture the raw data and process it into formats that analysis software running on external computers can interpret.
Since the first publication of a human genome sequence using NGS, that of Nobel Laureate James Watson in 2008 [1], there have been numerous studies of single genomes using various NGS approaches (for relatively early examples, see references 2 and 3). More recently, we have begun to see larger-scale studies applying NGS to whole-genome analysis of larger patient and family cohorts (for example, references 4 and 5). Not surprisingly, cancer biologists have embraced these new technologies: the field is replete with examples, including major international efforts such as the International Cancer Genome Consortium (https://icgc.org), which aims to thoroughly characterize fifty different tumour types, including by whole-genome sequence analysis. Indeed, these are exciting times for genome scientists, far from the early days of automated DNA sequencing.
Good old days – the Human Genome Project
In many ways, the future of NGS was determined by a roadmap laid out during the Human Genome Project (HGP). Both the publicly funded [6] and privately funded [7] human genome reference sequences were created mainly on automated, fluorescent capillary sequencers. First developed as early as 1990 [8], capillary sequencing ultimately took conventional Sanger sequencing and parallelized it into instruments that could run 96 or 384 reactions at once in microplate format. Along the way, radioisotope labels were abandoned in favour of fluorescent detection, polyacrylamide slab gels were exchanged for polymer-filled capillaries, and X-ray film was replaced by laser-based fluorescence detection systems. The development of this technology was an inexorable result of the massive sequencing capacity required to complete the human genome.
Nowadays, HGP-era sequencing looks distinctly low-throughput, and rooms full of hundreds of capillary sequencers have given way to institutes running comparatively small numbers of modern NGS instruments. The last major human genome sequencing paper using capillary sequencing was published in 2007 [9], and even that study used mainly existing data. While this study and the subsequent analysis of the sequence [10] remain the benchmark for thorough analysis of an individual genome, even in 2007 the newer technologies were beginning to take over. A veritable earthquake was starting to shake the sequencing field.
“Next Generation Sequencing”
While Levy and colleagues were re-assembling the first individual, diploid human genome sequence, others were applying NGS approaches, most notably to produce a genome sequence from the DNA of James Watson, the first NGS-sequenced human genome [1]. This study used instruments from 454 Life Sciences, a spin-off of CuraGen that was acquired by Roche in 2007 [11]. If that sounds a bit convoluted, it is: the short history of NGS is remarkably messy, replete with acquisitions, hostile takeovers and mergers, and littered with the skeletons of dead-end technologies. In this fast-moving field, by 2015 we have already seen numerous instruments reach end of life, and others upgraded almost beyond recognition. Still other promising approaches never made it to market and have been abandoned. One or two that had been available for sale never worked as advertised, and have also subsided into the increasingly murky history of the field.
The Roche technology used for Dr. Watson's genome was the first NGS technology to market, in the form of the now long-deceased GS20, so named because it was a Genome Sequencer capable of producing a massive 20 million bases of data per run. Compared with the most popular of the capillary sequencers, Applied Biosystems' 3730xl, it was a monster, producing over 250 times as much data in a single run. Its latest version, the GS-FLX, increased this output another 35-fold to 700 million bases [12], but even this was not enough; Roche announced in 2013 that it was shutting down its 454 sequencing division and would discontinue support in 2016 [13]. In the meantime, the company had other sequencing goals in sight: it attempted a hostile takeover of industry leader Illumina in 2012, which failed, and forged an alliance with single-molecule sequencing pioneer Pacific Biosciences in 2013 [14].
Other players in the field have followed similarly convoluted evolutions. Genotyping heavyweight Illumina began its foray into sequencing by acquiring Solexa, a spin-off company founded by scientists from the University of Cambridge. Late in 2006, Illumina announced the acquisition of Solexa in a takeover that had been a closely guarded secret, immediately catapulting the former genotyping company to the forefront of the NGS field [15,16].
The Solexa 1G, rapidly re-branded as the Illumina Genome Analyzer, promised up to 1 Gigabase of sequence per run, a vast increase over that of the Roche instruments. Because it used four-colour fluorescent sequencing-by-synthesis (i.e., detecting each base by the colour of its fluorescent label as it is incorporated into a growing chain), the instrument also had a familiarity about it that appealed to those used to four-colour capillary sequencing. Later versions, the Genome Analyzer II, GAIIx and the HiSeq series, have increased this output remarkably, culminating in the just-released HiSeq 4000, a single instrument specified to produce up to 1.5 Terabases of sequence in a single run [17]. Sold in bundles of 5 or 10, the HiSeq X instruments provide additional capacity, up to a reported 1.8 Terabases per run. For those keeping score, that's about 90,000 times as much sequence as the venerable 454 GS20 could manage.
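For readers who want to check the arithmetic, the fold-changes quoted above can be reproduced in a few lines of Python. This is a minimal sketch that simply divides the per-run outputs cited in this article (vendor specifications, not measured yields); the dictionary and function names are hypothetical, chosen for illustration.

```python
# Per-run outputs quoted in this article, in bases. These are the vendor
# specifications cited in the text, not independently measured yields.
RUN_OUTPUT_BASES = {
    "454 GS20": 20e6,                      # 20 million bases
    "454 GS-FLX": 700e6,                   # 700 million bases
    "Solexa 1G / Genome Analyzer": 1e9,    # up to 1 Gigabase
    "HiSeq 4000": 1.5e12,                  # up to 1.5 Terabases
    "HiSeq X": 1.8e12,                     # up to 1.8 Terabases
}


def fold_increase(new: str, old: str) -> float:
    """Return how many times more sequence `new` produces per run than `old`."""
    return RUN_OUTPUT_BASES[new] / RUN_OUTPUT_BASES[old]


if __name__ == "__main__":
    baseline = "454 GS20"
    for name in RUN_OUTPUT_BASES:
        print(f"{name}: {fold_increase(name, baseline):,.0f}x the {baseline}")
```

Dividing the GS-FLX and HiSeq X figures by the GS20's 20 million bases recovers the 35-fold and roughly 90,000-fold increases mentioned in the text.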
The third big player, at least initially, was Applied Biosystems. Long the market leader in capillary sequencing, the company bought Agencourt Personal Genomics, developers of a sequencing-by-ligation technology termed SOLiD [18]. SOLiD instruments went through a similar number of iterations as Illumina's have, culminating in the SOLiD 5500xl W, still available in 2015 with an output of up to 360 Gigabases, depending on the specific application [19]. In parallel, the company, which along the way became part of Life Technologies (itself now part of lab supply and instrumentation juggernaut Thermo Fisher), hedged its bets in August 2010 by announcing the acquisition of Ion Torrent, developer of a completely different sequencing technology [20]. The current Ion Torrent Personal Genome Machine (PGM) and Proton instruments form the strongest competition to Illumina at present. The somewhat confusingly named PGM can generate up to 2 Gigabases of sequence per run, or only about two-thirds coverage of a human genome [21]. The Proton, while a capable workhorse for exome sequencing, is specified to produce only 10 Gigabases of sequence per run, although in reality 15 Gb is routinely achievable [22]. While that is enough to cover the roughly 3-billion-base human genome about five times over, it is not nearly enough throughput in a single run to achieve the depth of coverage required for robust whole-genome re-sequencing.
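The coverage figures for the Ion Torrent instruments follow from dividing per-run output by genome size. The sketch below assumes a 3-billion-base genome and, unrealistically, that every base sequenced maps uniquely with no duplicate or unusable reads; the function names and the 30x target used in the example are illustrative assumptions, not vendor figures.

```python
import math

HUMAN_GENOME_BASES = 3e9  # approximate human genome size (~3 billion bases)


def mean_coverage(run_output_bases: float,
                  genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Average depth if every sequenced base mapped uniquely to the genome."""
    return run_output_bases / genome_bases


def runs_needed(target_depth: float, run_output_bases: float,
                genome_bases: float = HUMAN_GENOME_BASES) -> int:
    """Whole runs required to reach target_depth, ignoring wasted reads."""
    return math.ceil(target_depth * genome_bases / run_output_bases)


if __name__ == "__main__":
    for label, output in [("Ion PGM (2 Gb)", 2e9),
                          ("Ion Proton, spec (10 Gb)", 10e9),
                          ("Ion Proton, typical (15 Gb)", 15e9)]:
        print(f"{label}: {mean_coverage(output):.2f}x per run, "
              f"{runs_needed(30, output)} runs for a 30x genome")
```

Even at the routinely achievable 15 Gb, this idealized model needs six Proton runs for a single 30x genome, which is the practical gap the paragraph above describes.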
Along the way, there have been many other sequencing technologies. One interesting example is the “Polonator”, initially distributed by Dover Systems, with a claimed output of 8-10 Gigabases per run [23] and an open-architecture model covering both the instrument and the sequencing applications and chemistries run on it. How successful this has been is difficult to determine; most recently it appears to have been taken over by Azco Biotech, and few, if any, are currently in operation.
Other companies have taken the approach of using single molecules as sequencing templates, thereby avoiding any biases that might be introduced through PCR amplification of the templates. Notable among these, the Helicos Biosciences HeliScope certainly had the capability to sequence whole genomes [24]. The instrument, though, was both huge and expensive, and after a series of press releases promising upcoming sales, Helicos ultimately filed for bankruptcy [25]. More successful single-molecule technologies include those from Pacific Biosciences (another one-time rumoured acquisition target of Applied Biosystems) and Oxford Nanopore Technologies. Pacific Biosciences watches individual polymerase molecules incorporating fluorescently labelled nucleotides in tiny wells, while Oxford Nanopore detects molecules as they pass through nanopores in a membrane (see reference 26 for a recent review of nanopore sequencing). Other companies have either fallen by the wayside or changed focus; one example is BioNanomatrix, now BioNano Genomics, which focuses on genome mapping rather than sequencing per se [27].
One other important option is provided by Complete Genomics, a company built entirely on a service model and using proprietary technology based on DNA “nanoballs” [28]. Complete Genomics was founded to focus solely on human whole-genome sequencing but, importantly for cancer biologists, it integrated matched tumour:normal sample sequencing from very early in its existence. Now owned by Chinese sequencing company BGI [29], it has shifted its focus to clinical projects, but apparently is still completing some legacy research projects. Persistent rumours suggest that Complete Genomics instruments might find their way onto the market at some point, but concrete evidence is lacking.
Of all of these and many others, only Pacific Biosciences and Oxford Nanopore Technologies appear to be well established, with instruments well suited to niche sequencing (targeted gene panels or regions, small genomes, pathogens). No other player is making a serious run at whole-genome sequencing and, with serious questions about whether Ion Torrent will ever achieve instrument outputs that can realistically sequence whole human genomes, that leaves only one company selling instruments that can: Illumina.
So where does this leave us when faced with the problem of sequencing cancer samples today?
Sequencing a human genome and comparing it with the existing reference sequence, a process usually referred to as “re-sequencing”, requires redundant coverage in order to accurately map and overlap all of the individual sequence reads. This, in turn, allows confident “calling” of variants in the test genome relative to the reference. The scientific community seems to have settled on roughly 30- to 40-fold sequence “depth” as the standard for a robust constitutional genome sequence [30]. Since the human genome is about 3 billion bases long, this means that a minimum of 90 Gigabases of sequence (30 × 3 Gb) is required.
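The 90-Gigabase figure is simply target depth multiplied by genome size. The sketch below also adds an idealized Lander-Waterman (Poisson) estimate of how that depth is spread across the genome, which is one way to see why redundancy matters. It assumes reads fall uniformly at random, which real data do not, and the 10-read threshold in the example is an arbitrary illustrative choice rather than a community standard.

```python
import math

HUMAN_GENOME_BASES = 3e9  # "about 3 billion bases", as in the text


def gigabases_required(depth: float,
                       genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Raw sequence (in Gb) needed for a given mean depth, assuming no wasted reads."""
    return depth * genome_bases / 1e9


def fraction_covered_at_least(k: int, depth: float) -> float:
    """Idealized (Poisson) fraction of bases covered by at least k reads.

    Assumes reads land uniformly at random; real coverage is less even.
    """
    below_k = sum(math.exp(-depth) * depth**i / math.factorial(i)
                  for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    for depth in (30, 40):
        print(f"{depth}x: {gigabases_required(depth):.0f} Gb of sequence, "
              f"{fraction_covered_at_least(10, depth):.6f} of bases with >= 10 reads")
```

In practice, uneven library coverage and hard-to-map regions mean real genomes fall short of this Poisson ideal, one reason the consensus target sits at 30x or more rather than lower.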
Tumour sequencing generally requires much greater depth, as tumour samples can contain a mix of cell types, with only some cells harbouring disease-relevant somatic mutations. Exactly how much depth is needed appears to be a somewhat unresolved question, with greater depth revealing rarer mutations. A recent white paper by Illumina suggests that in fairly pure tumour samples (those with >80% cancer cells), good detection of high-frequency somatic events can be achieved with 60x genome coverage [31].
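One way to see why purity and depth interact is to count the reads expected to carry a somatic mutation. The sketch below is a deliberately simplified model, not the method used in the Illumina white paper: it assumes a clonal, heterozygous mutation in a diploid region with no copy-number change (so the variant allele fraction is half the tumour purity), ignores sequencing error, and uses a hypothetical threshold of five supporting reads.

```python
import math


def expected_alt_reads(depth: int, purity: float) -> float:
    """Expected reads carrying a clonal heterozygous somatic mutation.

    Assumes a diploid tumour with no copy-number change at the site, so the
    variant allele fraction (VAF) is purity / 2.
    """
    return depth * purity / 2


def prob_at_least(min_reads: int, depth: int, purity: float) -> float:
    """Binomial probability that >= min_reads of `depth` reads carry the variant."""
    vaf = purity / 2
    below = sum(math.comb(depth, i) * vaf**i * (1 - vaf)**(depth - i)
                for i in range(min_reads))
    return 1.0 - below


if __name__ == "__main__":
    for purity in (0.8, 0.4, 0.2):
        print(f"purity {purity:.0%}: "
              f"{expected_alt_reads(60, purity):.0f} expected variant reads at 60x, "
              f"P(>= 5 reads) = {prob_at_least(5, 60, purity):.3f}")
```

At 60x and 80% purity this model expects 24 variant-supporting reads; halving the purity halves that number, which is one reason lower-purity samples and subclonal mutations call for deeper sequencing.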
So where does this leave prospective genome scientists today? For sequencing the roughly 1.5% of the genome that contains the coding regions of genes (exome sequencing), there are instruments from Illumina and Ion Torrent which, despite a great deal of misinformation in the marketplace, work more or less equally well for this application [32]. For whole-genome analysis, if running such experiments in-house is desirable, Illumina's instruments are the only real option. However, outsourcing is still a viable alternative, with Illumina itself, Illumina-based service providers such as South Korea's Macrogen, and Complete Genomics all offering whole-genome sequencing (WGS) services. Specialized instruments such as PacBio's RS II and the Oxford Nanopore MinION seem useful for targeted analysis of difficult genomic regions, haplotyping (i.e., determining which markers lie together on the same chromosome, either the paternally or maternally inherited homologue) in individual genes, or analysis of small genomes such as those of viruses. Realistically, only ten years after 454 rattled the sequencing world with its GS20, Illumina has essentially cornered the market for whole-genome sequencing, and it appears that only a technological revolution as big as NGS itself might change this.
References
1. Wheeler DA et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008 Apr 17;452(7189):872-6.
2. Ashley EA et al. (2010). Clinical assessment incorporating a personal genome. Lancet. 2010 May 1;375(9725):1525-35.
3. Lupski JR et al. (2010). Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010 Apr 1;362(13):1181-91.
4. Jiang et al. (2013). Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet. 2013 Aug 8;93(2):249-63.
5. Yuen RK et al. (2015). Whole-genome sequencing of quartet families with autism spectrum disorder. Nat Med. 2015 Feb;21(2):185-91.
6. Lander ES et al. (2001). Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.
7. Venter JC et al. (2001). The sequence of the human genome. Science. 2001 Feb 16;291(5507):1304-51.
8. Swerdlow H, Gesteland R (1990). Capillary gel electrophoresis for rapid, high resolution DNA sequencing. Nucleic Acids Res. 1990 Mar 25;18(6):1415-9.
9. Levy S et al. (2007). The diploid genome sequence of an individual human. PLoS Biol. 2007 Sep 4;5(10):e254.
10. Pang AW et al. (2010). Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 2010;11(5):R52.
11. Roche acquires 454 Life Sciences to strengthen presence in ultra-fast gene sequencing. Press release, March 29, 2007, accessed February 14, 2015.
12. 454 Life Sciences product information, accessed February 14, 2015.
13. Six years after acquisition, Roche quietly shutters 454. Bio-IT World, October 16, 2013, accessed February 13, 2015.
14. Roche shutting down 454 sequencing business. GenomeWeb News, October 15, 2013, accessed February 13, 2015.
15. Illumina Signs Definitive Agreement to Acquire Solexa. Press release, November 13, 2006, accessed February 14, 2015.
16. History of Illumina Sequencing, accessed February 14, 2015.
17. Specification Sheet: Sequencing – HiSeq 3000/HiSeq 4000 Sequencing Systems, accessed February 14, 2015.
18. Applied Biosystems to acquire Agencourt Personal Genomics, privately-held developer of genetic analysis technologies; ultra high throughput technology with potential application in DNA sequencing, gene expression, and genotyping. Press release, April 30, 2006, accessed February 14, 2015.
19. Applied Biosystems: v2.0 Specification Sheet – 5500 W Series Genetic Analyzers, accessed February 14, 2015.
20. Life Technologies announces agreement to acquire Ion Torrent. Press release, August 17, 2010.
21. Specification Sheet – The Ion PGM System. Life Technologies, accessed February 13, 2015.
22. Specification Sheet – The Ion Proton System. Life Technologies, accessed February 14, 2015.
23. The Polonator G.007 FAQ, accessed February 15, 2015.
24. Pushkarev D et al. (2009). Single-molecule sequencing of an individual human genome. Nat Biotechnol. 2009 Sep;27(9):847-50. doi: 10.1038/nbt.1561.
25. Carroll J (2012). Troubled sequencing pioneer Helicos shelters in bankruptcy court. Fierce Biotech, November 19, 2012, accessed February 15, 2015.
26. Wang Y et al. (2015). The evolution of nanopore sequencing. Front Genet. 2015 Jan 7;5:449.
27. BusinessWire (2011). BioNanomatrix Completes Name Change to BioNano Genomics. BusinessWire, October 11, 2011, accessed February 15, 2015.
28. Drmanac R et al. (2010). Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science. 2010 Jan 1;327(5961):78-81.
29. BGI-Shenzhen Completes Acquisition of Complete Genomics. Press release, March 18, 2013, accessed February 19, 2015.
30. Sims D et al. (2014). Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet. 2014 Feb;15(2):121-32.
31. White Paper: Sequencing – Evaluating somatic variant calling in tumor/normal studies, accessed February 14, 2015.
32. Boland JF et al. (2013). The new sequencer on the block: comparison of Life Technology's Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Hum Genet. 2013 Oct;132(10):1153-63.