Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity

Zhenmiao Zhang, Jin Xiao, Hongbo Wang, Chao Yang, Yufen Huang, Zhen Yue, Yang Chen, Lijuan Han, Kejing Yin, Aiping Lyu, Xiaodong Fang, Lu Zhang*

*Corresponding author for this work

Research output: Contribution to journalJournal articlepeer-review

Abstract

Although long-read sequencing enables the generation of complete genomes for unculturable microbes, its high cost limits the widespread adoption of long-read sequencing in large-scale metagenomic studies. An alternative method is to assemble short-reads with long-range connectivity, which can be a cost-effective way to generate high-quality microbial genomes. Here, we develop Pangaea, a bioinformatic approach designed to enhance metagenome assembly using short-reads with long-range connectivity. Pangaea leverages connectivity derived from physical barcodes of linked-reads or virtual barcodes by aligning short-reads to long-reads. Pangaea utilizes a deep learning-based read binning algorithm to assemble co-barcoded reads exhibiting similar sequence contexts and abundances, thereby improving the assembly of high- and medium-abundance microbial genomes. Pangaea also leverages a multi-thresholding algorithm strategy to refine assembly for low-abundance microbes. We benchmark Pangaea on linked-reads and a combination of short- and long-reads from simulation data, mock communities and human gut metagenomes. Pangaea achieves significantly higher contig continuity as well as more near-complete metagenome-assembled genomes (NCMAGs) than the existing assemblers. Pangaea also generates three complete and circular NCMAGs on the human gut microbiomes.

Original languageEnglish
Article number4631
Number of pages18
JournalNature Communications
Volume15
Issue number1
DOIs
Publication statusPublished - 31 May 2024

Scopus Subject Areas

  • Physics and Astronomy(all)
  • Chemistry(all)
  • Biochemistry, Genetics and Molecular Biology(all)

Fingerprint

Dive into the research topics of 'Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity'. Together they form a unique fingerprint.

Cite this