Infographic: The Sequencing and Assembly of the Human Genome – The Scientist

Brianna Chrisman is a final-year PhD student in the Department of Bioengineering at Stanford University. In her research, she uses non-reference human DNA sequences to better understand viral, metagenomic…

Jordan Eizenga is a postdoctoral researcher in the University of California Santa Cruz Genomics Institute and a member of the Human Pangenome Reference Consortium. His research focuses on methods development…
After sequencing fragments of DNA to obtain reads, most genomic pipelines follow one of two paths. The reads can be assembled de novo, building longer stretches called contigs from scratch, with overlapping sequences at the ends dictating which reads belong next to each other (below left). Alternatively, reads can be aligned to a reference genome to identify small genetic variations (below right). Where de novo assembly can be thought of as assembling a puzzle without the picture on the box, alignment is the equivalent of piecing together a puzzle by looking at that picture. However, because a single reference genome fails to capture all of the genetic diversity across humans, some sections of DNA might not align well to the reference genome.
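The two routes can be sketched in a few lines of Python. This is a toy illustration only, not how production assemblers or aligners work: it assumes error-free reads from a single strand for assembly, and ungapped placement for alignment. The function names (`overlap`, `greedy_assemble`, `align`) and the example sequences are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """De novo route: repeatedly merge the two reads with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


def align(read: str, reference: str) -> tuple[int, int]:
    """Alignment route: best ungapped placement as (position, mismatches)."""
    best = (0, len(read) + 1)
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches < best[1]:
            best = (pos, mismatches)
    return best


reads = ["CTGAATCGG", "ATGGCGTAC", "CGTACCTGA"]  # made-up reads, shuffled
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAATCGG']
print(align("CGTACGTGA", "ATGGCGTACCTGAATCGG"))  # read carrying one variant base
```

A read that differs from the reference at many positions would score poorly at every placement, which is the toy version of a sequence that does not align well to a single reference.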
Numerous sequencing modalities have been developed in the last quarter century; the major advances include Sanger sequencing, sequencing by synthesis, nanopore long-read sequencing from Oxford Nanopore, and, most recently, high-fidelity single-molecule real-time sequencing from PacBio. These differ in the length of the reads they generate, their efficiency, and their accuracy, with technologies generally evolving to support faster, cheaper, and more precise sequencing.
Sanger sequencing, the first sequencing technology invented, is no longer used in modern projects. It relies on tagging the ends of DNA fragments of various sizes with complementary fluorescent nucleotides. The fragments are then separated by size using gel electrophoresis, and a laser reads the fluorescence of each fragment’s final nucleotide. The full sequence is inferred by piecing together the end nucleotides of the different-sized fragments.
        YEARS IN USE: 1980–2010
        READ LENGTH: ~500–1,000 bases
        CONS: Low throughput, time intensive
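The inference step in Sanger sequencing, reading off the end nucleotide of each fragment in order of size, can be sketched as follows. This is a simplified illustration that assumes exactly one labeled fragment per length and ignores the primer and strand complementarity; `sanger_read` and the example template are hypothetical.

```python
def sanger_read(fragments: list[tuple[int, str]]) -> str:
    """Infer a sequence from (fragment_length, terminal_base) pairs."""
    ordered = sorted(fragments)  # shortest fragment first, as separated on a gel
    lengths = [length for length, _ in ordered]
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("need exactly one fragment for each length 1..n")
    # The labeled base at the end of the k-th shortest fragment is base k.
    return "".join(base for _, base in ordered)


template = "GATTACA"  # made-up sequence for illustration
fragments = [(n, template[n - 1]) for n in range(1, len(template) + 1)]
fragments.reverse()  # fragments arrive in no particular order
print(sanger_read(fragments))
```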
Sequencing by synthesis (SBS) is the most commonly used type of sequencing today. It relies on synthesizing complementary DNA strands using fluorescently tagged nucleotides and capturing the output signal on a high-resolution camera. Hundreds of thousands of DNA fragments can be read at once, but SBS is limited to short lengths of DNA, making it challenging to assemble whole genomes de novo.
        YEARS IN USE: 2002–today
        READ LENGTH: ~100–500 bases
        CONS: Limited to short reads
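One way to see why short reads complicate de novo assembly is to count how many reads of a given length occur only once in a genome. Reads that fall entirely inside a repeated stretch look identical wherever they came from, so an assembler cannot tell where they belong. The sketch below uses a made-up nine-base sequence in which "ACG" occurs twice; `unique_read_fraction` is a hypothetical helper, and the example assumes error-free reads.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose read occurs exactly once in `genome`."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(counts[r] == 1 for r in reads) / len(reads)


toy_genome = "TACGGACGT"  # contains the repeat "ACG" at two positions
for read_len in (3, 5):
    print(read_len, unique_read_fraction(toy_genome, read_len))
```

In this toy case, reads of length 3 can land wholly inside the repeat and become ambiguous, while reads of length 5 always include flanking sequence and are all distinct.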
Oxford Nanopore devices pull DNA through a bioengineered pore to produce electrical current fluctuations that are then translated into a sequence. This approach generates long reads that can be used for de novo genome assembly or to identify larger structural variations that may not be detectable with short reads, but it is less accurate than other sequencing technologies.
        YEARS IN USE: 2002–today
        READ LENGTH: ~10 kb–1 Mb
        CONS: Error-prone
Only recently released by PacBio, high-fidelity (HiFi) single-molecule real-time (SMRT) sequencing relies on fluorescence strategies similar to those of SBS. Like nanopore sequencing, HiFi produces long reads that can be used for de novo genome assembly or to identify structural variants, but it achieves improved accuracy by circularizing a long DNA molecule so that it can be read dozens of times in a single run.
        YEARS IN USE: 2020–today
        READ LENGTH: ~10 kb
        CONS: Currently very expensive
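The accuracy gain from reading the same circular molecule many times can be illustrated with a simple majority vote across passes. This is only a sketch of the idea, not PacBio's actual consensus algorithm: it assumes the passes are already aligned, have equal length, and contain substitution errors only. The `consensus` function and the example passes are hypothetical.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority base at each position across repeated reads of one molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be non-empty and of equal length")
    # zip(*passes) yields one column of bases per position in the molecule.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Five made-up passes over the same molecule, each with at most one error.
passes = ["ACGTTGCA", "ACCTTGCA", "ACGTTGGA", "TCGTTGCA", "ACGTAGCA"]
print(consensus(passes))
```

Because the errors in each pass fall at different positions, the correct base wins the vote at every position even though most individual passes are imperfect.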
Unlike a linear reference genome, a graph genome allows a single region of the genome to take on a diverse set of sequences. For regions with high genetic diversity, a graph genome can better capture the many human DNA sequences that might exist.
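A graph genome can be pictured as a set of sequence segments joined by edges, where each path through the graph spells out one possible version of the region. The sketch below is a minimal illustration using plain dictionaries; the segments, the edges, and the `haplotypes` function are all hypothetical and do not follow any particular pangenome file format.

```python
# Hypothetical region: a single-base difference (C or T) followed by an
# optional insertion (TTAC).
nodes = {1: "ATG", 2: "C", 3: "T", 4: "GGA", 5: "TTAC", 6: "CC"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [6], 6: []}


def haplotypes(node: int, nodes: dict[int, str], edges: dict[int, list[int]]):
    """Yield every sequence spelled by a path from `node` to the end of the graph."""
    if not edges[node]:  # no outgoing edges: this node ends the region
        yield nodes[node]
        return
    for nxt in edges[node]:
        for tail in haplotypes(nxt, nodes, edges):
            yield nodes[node] + tail


for sequence in haplotypes(1, nodes, edges):
    print(sequence)
```

A linear reference would hold just one of these paths; the graph holds all four in the same structure, so a read carrying either variant has a path it can match.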
© 1986–2022 The Scientist. All rights reserved.