An ensemble deep learning framework to refine large deletions in linked-reads

Yunfei Hu, Sanidhya V Mangal, Lu Zhang, Xin Zhou

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

The detection of structural variants (SVs) remains challenging due to inconsistencies in detected breakpoints and biological complexity of some rearrangements. Linked-reads have demonstrated their superiority in diploid genome assembly and SV detection. Recently developed tools Aquila and Aquila_stLFR use a reference sequence and linked-reads to generate a high quality diploid genome assembly, using which they then detect and phase personal genetic variations. However, they both produce a substantial proportion of false positive deletion SV calls. To take full advantage of linked-reads, an effective downstream filtering and refinement framework is needed pressingly. In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach by integrating several state-of-the-art CNN backbones. The filtering of deletion SVs is formulated as a binary classification task on image data that are generated through the extraction of multiple alignment signals, including read depth, split reads and discordant read pairs. Three linked-reads libraries sequenced from the well-studied sample NA24385 and the gold standard of GiaB benchmark were used to perform thorough experiments on our proposed method. The results demonstrated that AquilaDeepFilter could increase the precision rate of Aquila while the recall rate of Aquila decreased only slightly, and the overall F1 improved by 20%. Furthermore, AquilaDeepFilter outperformed another deep learning based method for SV filtering, DeepSVFilter. Even though we designed AquilaDeepFilter for linked-reads, the framework could also be used to improve SV detection on short reads.
Original languageEnglish
Title of host publication2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
PublisherIEEE
Pages288-295
Number of pages8
ISBN (Electronic)9781665401265
ISBN (Print)9781665429825
DOIs
Publication statusPublished - 12 Dec 2021
Event2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 - Houston, United States
Duration: 9 Dec 202112 Dec 2021
https://ieeexplore.ieee.org/xpl/conhome/9669261/proceeding

Conference

Conference2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Country/TerritoryUnited States
CityHouston
Period9/12/2112/12/21
Internet address

User-Defined Keywords

  • Deep learning
  • ensemble method
  • structural variants
  • linked-reads
  • convolutional neural networks

Fingerprint

Dive into the research topics of 'An ensemble deep learning framework to refine large deletions in linked-reads'. Together they form a unique fingerprint.

Cite this