TY - JOUR
T1 - Benchmarking genome assembly methods on metagenomic sequencing data
AU - Zhang, Zhenmiao
AU - Yang, Chao
AU - Veldsman, Werner Pieter
AU - Fang, Xiaodong
AU - Zhang, Lu
N1 - Funding Information:
This research was partially supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011046 and No. 2021A1515012226); the Hong Kong Research Grant Council Early Career Scheme (HKBU 22201419), HKBU IRCMS (No. IRCMS/19-20/D02) and a grant from Shenzhen Science and Technology Innovation Commission (SZSTI) – Shenzhen Virtual University Park (SZVUP) Special Fund Project (No. 2021Szvup135). This project is also supported by open project of BGI-Shenzhen, Shenzhen 518000, China. The design of the study and collection, analysis and interpretation of data were partially supported by the Science Technology and Innovation Committee of Shenzhen Municipality, China (SGDX20190919142801722).
Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press. All rights reserved.
PY - 2023/3
Y1 - 2023/3
AB - Metagenome assembly is an efficient approach to reconstruct microbial genomes from metagenomic sequencing data. Although short-read sequencing has been widely used for metagenome assembly, linked- and long-read sequencing have shown their advancements in assembly by providing long-range DNA connectedness. Many metagenome assembly tools were developed to simplify the assembly graphs and resolve the repeats in microbial genomes. However, there remains no comprehensive evaluation of metagenomic sequencing technologies, and there is a lack of practical guidance on selecting the appropriate metagenome assembly tools. This paper presents a comprehensive benchmark of 19 commonly used assembly tools applied to metagenomic sequencing datasets obtained from simulation, mock communities or human gut microbiomes. These datasets were generated using mainstream sequencing platforms, such as Illumina and BGISEQ short-read sequencing, 10x Genomics linked-read sequencing, and PacBio and Oxford Nanopore long-read sequencing. The assembly tools were extensively evaluated against many criteria, which revealed that long-read assemblers generated high contig contiguity but failed to reveal some medium- and high-quality metagenome-assembled genomes (MAGs). Linked-read assemblers obtained the highest number of overall near-complete MAGs from the human gut microbiomes. Hybrid assemblers using both short- and long-read sequencing were promising methods to improve both total assembly length and the number of near-complete MAGs. This paper also discussed the running time and peak memory consumption of these assembly tools and provided practical guidance on selecting them.
KW - genome assembly tools
KW - linked-read sequencing
KW - long-read sequencing
KW - metagenome-assembled genome
KW - metagenomic sequencing
KW - short-read sequencing
UR - http://www.scopus.com/inward/record.url?scp=85150666314&partnerID=8YFLogxK
U2 - 10.1093/bib/bbad087
DO - 10.1093/bib/bbad087
M3 - Journal article
SN - 1467-5463
VL - 24
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 2
M1 - bbad087
ER -