TY - JOUR
T1 - De novo diploid genome assembly for genome-wide structural variant detection
AU - Zhang, Lu
AU - Zhou, Xin
AU - Weng, Ziming
AU - Sidow, Arend
N1 - Funding Information: This research was supported by training and research grants from the National Institute of Standards and Technology, Gaithersburg, MD, USA. Publisher Copyright: © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
UR - http://www.scopus.com/inward/record.url?scp=85101132942&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqz018
DO - 10.1093/nargab/lqz018
M3 - Journal article
AN - SCOPUS:85101132942
SN - 2631-9268
VL - 2
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 1
M1 - lqz018
ER -