Next-generation sequencing, let me count the ways

So we all know that Sanger sequencing is so last century and next-generation sequencing has taken over the world. But just how many ways are there to perform next-generation sequencing? How is it actually done and which is the best method for your application?

First up, what can you do with next-generation sequencing (NGS) that you could not do with Sanger sequencing? Answer: a lot. NGS is cheaper, faster, and allows you to cover a lot more sequence than Sanger sequencing (6 billion reads/run with Illumina’s NGS technology vs 384 reads/run for automated Sanger sequencing). This yields a lot more information for a fraction of the price in record time. The first human genome (J Craig Venter’s) took 3 years to be Sanger-sequenced and assembled and cost US$100 million, whereas now with NGS, a human genome would cost US$1000 and may be sequenced within 26 h.
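
To put those figures in perspective, here is a minimal sketch that works out the fold differences. It uses only the numbers quoted in the paragraph above; the variable names are illustrative, and a year is assumed to be 365 days.

```python
# Figures quoted in the paragraph above.
sanger_reads_per_run = 384
illumina_reads_per_run = 6_000_000_000

sanger_genome_cost_usd = 100_000_000
ngs_genome_cost_usd = 1_000

sanger_genome_hours = 3 * 365 * 24  # 3 years, assuming 365-day years
ngs_genome_hours = 26

# Fold differences between the two approaches.
reads_fold = illumina_reads_per_run / sanger_reads_per_run
cost_fold = sanger_genome_cost_usd / ngs_genome_cost_usd
time_fold = sanger_genome_hours / ngs_genome_hours

print(f"Reads per run:   {reads_fold:,.0f}x more")
print(f"Cost per genome: {cost_fold:,.0f}x cheaper")
print(f"Time per genome: {time_fold:,.0f}x faster")
```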

How does it work and what are the applications? Here’s the breakdown:

1. Illumina uses a sequencing-by-synthesis approach. Video.

  • DNA is fragmented into 50-300 base long fragments and adaptor sequences are ligated to their ends.
  • These adaptors allow the fragments to hybridize to complementary oligos on the surface of a flow cell, and the oligos get extended by a polymerase.
  • Bridge PCR then occurs, where the now-extended oligo bends over to hybridize to an adjacent surface oligo and a polymerase extends that sequence. This process repeats until a “polony” is formed, i.e. a dense colony of oligos of the same sequence.
  • Reverse strands are cleaved away and primers are added to sequence the remaining strands by a synthesis process. This involves differently fluorescent-labelled dNTPs being added to the growing strand. Each nucleotide is chemically protected to prevent the addition of more than one nucleotide every cycle. After every addition, the sample is excited with a laser and the colour of the emitted fluorescence identifies which base has been incorporated. The fluorescent label and the protecting group are then chemically cleaved so that the next cycle can begin.
  • The same process is repeated with the reverse strands (i.e. a paired read) and all the sequences obtained are then clustered and mapped back to the reference sequence.
  • Probably the most well-established platform, so you can expect good technical support. Illumina also has a variety of machines for different requirements, e.g. the MiSeq, which is a lower-throughput, faster system aimed at smaller laboratories and the clinical diagnostic market.
  • Sequence yield/run: Up to 600 gigabases (1 gigabase = 10^9 bases)
  • Sequencing Cost/million bases: $0.04
  • Instrument Cost: $128K – $654K (depending on instrument)
  • Time: A 300 base-long paired read takes 2 days.
  • Sample requirement: ~100 ng
  • Error rate: 0.1%
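
The one-base-per-cycle readout described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the dye colours are hypothetical (real instruments use their own dye chemistry), strand orientation is ignored, and `sequence_by_synthesis` is a made-up name, not part of any Illumina software.

```python
# Toy model of sequencing by synthesis: one blocked, labelled base per cycle.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye colours, chosen for illustration only.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(template, cycles):
    """Return (read, colours), where colours holds one imaged colour per cycle."""
    colours = []
    for position in range(min(cycles, len(template))):
        # Only the nucleotide complementary to the template base is added,
        # and its protecting group stops any further extension this cycle.
        incorporated = COMPLEMENT[template[position]]
        # Imaging step: record the colour of the incorporated nucleotide.
        colours.append(DYE[incorporated])
        # In the real chemistry the label and protecting group are cleaved
        # here, which lets the next cycle proceed.
    read = "".join(COLOUR_TO_BASE[colour] for colour in colours)
    return read, colours


read, colours = sequence_by_synthesis("ATGGCA", cycles=4)
print(read)     # TACC
print(colours)  # ['red', 'green', 'blue', 'blue']
```

Because exactly one base is read per cycle, the read length in this model is simply the number of cycles run.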

2. Ion Torrent by Thermo Fisher uses pH as a readout instead of fluorescence. Video.

  • Emulsion PCR is used to clonally amplify DNA library fragments on the surface of microbeads which are then placed in microwells.
  • A proton (H+) is released for each nucleotide incorporation event which produces a pH change detected by a miniature pH sensor.
  • Nucleotides are introduced separately and only DNA fragments that incorporated the particular nucleotide will show a signal.
  • Nucleotides are unmodified so no chemical deprotection steps are necessary and pH sensing is also more rapid than fluorescence detection.
  • However, it comes with a higher error rate (repetitive sequences being challenging to read) and higher cost per read.
  • Because of its lower throughput, Ion Torrent is more commonly used for small-scale applications and is popular for microbial sequencing or targeted/amplicon sequencing, where only specific genomic regions are analyzed.
  • Sequence yield/run: up to 1 gigabase
  • Sequencing Cost/million bases: $0.10
  • Instrument Cost: $80K
  • Time: A 300 base-long paired read takes 3 h.
  • Sample requirement: 10 ng
  • Error rate: 1%
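
The flow-by-flow readout described above can be sketched as follows. This is a simplified model, not Thermo Fisher's base caller: the cyclic flow order `TACG` is an assumption chosen for illustration, the signal is taken to be exactly the number of protons released, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are introduced


def flow_signals(template, flow_order=FLOW_ORDER, max_flows=40):
    """Return a list of (nucleotide, protons released) for each flow."""
    signals = []
    position = 0
    flow = 0
    while position < len(template) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # Nucleotides are unmodified, so the polymerase keeps extending
        # through a run of identical bases within a single flow.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
        flow += 1
    return signals


def call_bases(signals):
    """Turn per-flow signals back into the synthesised sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flow_signals("AACGTTT")
print(signals)
print(call_bases(signals))  # TTGCAAA
```

In this model a run of identical bases shows up as one larger signal in a single flow rather than as separate events, so the base caller has to tell a signal of 3 from a signal of 4. That is one plausible reason why repeats are harder to read on this kind of platform.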

3. Pacific Biosciences uses single-molecule real-time (SMRT) sequencing, which is similar to Illumina’s technique but avoids the need for polony generation. Video.

  • The main advantage is that average read lengths are not restricted to 300 bases but can go up to 50,000 bases, making it suitable for genome analysis.
  • A DNA polymerase is attached to the bottom of a zeptoliter-sized (10^-21 l) chamber and polymerizes complementary DNA to the single-stranded DNA template.
  • The detection system is fluorescence-based, but as the fluorescent label is attached at the gamma phosphate position of each nucleotide, it is naturally cleaved off when the nucleotide is incorporated into the growing strand.
  • Fluorescence sensing is also limited to near the surface-bound polymerase, reducing background signal.
  • However, it has worse error rates than the Ion Torrent and is also more expensive.
  • Its long reads enable better de novo assembly of the genome, and it’s also good for sequencing complex/repetitive genomic regions. Its high error rate, however, makes it harder to accurately identify SNPs (single nucleotide polymorphisms), although it is good at detecting SVs (structural variants).
  • When used in conjunction with other sequencing methods with a lower error rate (termed hybrid sequencing), it allows for better detection of gene isoforms.
  • Furthermore, its ability to monitor the time taken between base incorporations allows it to detect epigenetic modifications.
  • Sequence yield/run: 500 megabases to 1 gigabase
  • Sequencing Cost/million bases: $0.40-$0.80
  • Instrument Cost: $695K
  • Time: Does not do paired reads. 2 h / read.
  • Sample requirement: 1 µg
  • Error rate: 13%
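
To get a feel for why a high per-base error rate makes SNP calling harder, here is a rough sketch that estimates how often a simple majority vote across overlapping reads would call a position wrongly. The model is an assumption, not how any real variant caller works: errors are treated as independent, every erroneous read is pessimistically assumed to agree on the same wrong base, and the depths are illustrative. The error rates are the ones quoted in this article.

```python
from math import comb


def majority_wrong_probability(error_rate, depth):
    """P(more than half of `depth` reads are wrong at a single position)."""
    first_losing_count = depth // 2 + 1
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(first_losing_count, depth + 1)
    )


# Error rates quoted in this article for each platform.
platforms = [("Illumina", 0.001), ("Ion Torrent", 0.01), ("PacBio", 0.13)]

for name, error_rate in platforms:
    for depth in (1, 5, 15):  # odd depths avoid tied votes
        p = majority_wrong_probability(error_rate, depth)
        print(f"{name:12s} depth {depth:2d}: P(wrong call) = {p:.2e}")
```

Under these assumptions, more depth helps every platform, but at the same depth the higher-error reads always leave a larger chance of a wrong call, which is consistent with combining long reads with lower-error reads in a hybrid approach.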

4. Nanopore-based sequencing by Oxford Nanopore Technologies is the only sequencing technique that does not rely on a polymerase. Video.

  • DNA is pulled and threaded through a protein pore embedded in a membrane. The electric current across the membrane changes as this happens, and the change is different for each nucleotide type passing through, allowing the sequence to be determined.
  • However, a pore can fit a minimum of 4 nucleotides at once, so the signal reflects a combination of bases rather than a single one, which makes it difficult to deconvolute the sequence.
  • This high number of possible states leads to higher error rates than PacBio, and the even higher costs have resulted in not much market adoption.
  • However, they have a nifty product called the MinION which is a mini-sized sequencer that works when plugged into a laptop via a USB cable. Therefore, one has the option of performing sequencing in a non-lab-based environment. It also has a massive read length of up to 10 kilobases.
  • Sequence yield/run: 22 gigabases (MinION)
  • Sequencing Cost/million bases: $6.44–$17.90
  • Time: 48 h
  • Sample requirement: 10 pg – 1 µg
  • Error rate: 40%
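
The deconvolution problem can be made concrete by counting the states the pore has to distinguish. A minimal sketch, assuming exactly 4 bases sit in the pore at a time and that the strand advances one base per step; the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def pore_states(bases_in_pore):
    """Every base combination that could occupy the pore at one moment."""
    return ["".join(kmer) for kmer in product(BASES, repeat=bases_in_pore)]


def kmers_seen_by_pore(sequence, bases_in_pore):
    """Successive groups of bases in the pore as the strand moves one base at a time."""
    return [
        sequence[i:i + bases_in_pore]
        for i in range(len(sequence) - bases_in_pore + 1)
    ]


print(len(pore_states(1)))               # 4
print(len(pore_states(4)))               # 256
print(kmers_seen_by_pore("GATTACA", 4))  # ['GATT', 'ATTA', 'TTAC', 'TACA']
```

With one base in the pore there would be only 4 current levels to tell apart. With 4 bases there are 256, and each base contributes to several overlapping readings, so the sequence has to be inferred from a series of ambiguous signals.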

So there you have it. Of course there are probably some other methods used out there but they mainly share similar principles to the ones described here. For further tips on planning your NGS experiment, try here, here and here. For a guide on what sequencing depth you should aim for, try Illumina’s site.