Biowulf 20th Anniversary Symposium: Telomere-to-telomere assembly of a complete human X chromosome

Air date: Thursday, February 28, 2019, 10:30:00 AM
Time displayed is Eastern Time, Washington DC Local
Views: Total views: 216 (110 Live, 106 On-demand)
Category: Special
Runtime: 01:14:41
Description: Biowulf Seminar

Release of the first human genome assembly was a landmark achievement, and after nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has yet been finished end to end, and hundreds of gaps persist across the genome. These unresolved regions include segmental duplications, ribosomal RNA (rRNA) gene arrays, and satellite arrays that harbor unexplored variation of unknown consequence. We aim to finish these remaining regions and generate the first truly complete assembly of a human genome.

Here we announce a whole-genome de novo assembly that surpasses the continuity of GRCh38, along with the first complete, telomere-to-telomere assembly of a human X chromosome. In total, we collected 40X coverage of ultra-long Oxford Nanopore sequencing for the CHM13hTERT cell line, including 44 Gb of sequence in reads >100 kb and a maximum read length exceeding 1 Mb. This unprecedented coverage of ultra-long reads enabled the resolution of most repeats in the genome, including large fractions of the centromeric satellite arrays and short arms of the acrocentrics. A de novo assembly combining this nanopore data with 70X of existing PacBio data achieved an NG50 contig size of 75 Mb (compared to 56 Mb for GRCh38), with some chromosomes broken only at the centromere. Using this assembly as a basis, we chose to manually finish the X chromosome. The few unresolved segmental duplications were assembled using ultra-long reads spanning the individual copies, and the ~2.7 Mbp X centromere was assembled by identifying unique variants within the array and using these to anchor overlapping ultra-long reads. These results demonstrate that it is now possible to finish entire human chromosomes without gaps, and our future work will focus on completing and validating the remainder of the genome.
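
For readers unfamiliar with the NG50 figure quoted in the abstract, the following is a minimal sketch of how the metric is commonly defined: the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the estimated genome size. The function name `ng50` and the toy numbers are illustrative assumptions and do not come from the talk or from the tools used to build the assembly.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of a set of contigs, or None if the contigs
    together cover less than half of the estimated genome size.

    Unlike N50, which uses the total assembly length, NG50 uses an
    externally supplied genome size estimate as the denominator.
    """
    half = genome_size / 2
    running_total = 0
    # Walk the contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


# Toy example with made-up numbers (hypothetical, not real assembly data).
toy_contigs = [40, 30, 20, 10, 5]
toy_genome_size = 120
print(ng50(toy_contigs, toy_genome_size))
```

Because the denominator is the genome size rather than the assembly size, NG50 values from different assemblies of the same genome can be compared directly, which is why the abstract can set 75 Mb against 56 Mb for GRCh38.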
NLM Title: Telomere-to-telomere assembly of a complete human X chromosome / Sergey Koren.
Author: Koren, Sergey.
Biowulf 20th Anniversary Symposium
National Institutes of Health (U.S.),
Publisher:
Subjects: Chromosomes, Human, X--genetics
Medical Informatics Computing
Telomere
Whole Genome Sequencing
Publication Types: Congress
Webcasts
NLM Classification: QU 470
NLM ID: 101744872
CIT Live ID: 31558
Permanent link: https://videocast.nih.gov/launch.asp?27344