De novo diploid genome assembly for genome-wide structural variant detection

Lu Zhang, Xin Zhou, Ziming Weng, Arend Sidow*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

7 Citations (Scopus)


Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
Original language: English
Article number: lqz018
Number of pages: 10
Journal: NAR Genomics and Bioinformatics
Issue number: 1
Publication status: Published - 1 Mar 2020

Scopus Subject Areas

  • Genetics
  • Structural Biology
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

