TY - JOUR
T1 - A comprehensive investigation of metagenome assembly by linked-read sequencing
AU - Zhang, Lu
AU - Fang, Xiaodong
AU - Liao, Herui
AU - Zhang, Zhenmiao
AU - Zhou, Xin
AU - Han, Lijuan
AU - Chen, Yang
AU - Qiu, Qinwei
AU - Li, Shuai Cheng
N1 - Funding Information:
LZ is supported by General Research Fund No. 22201419 HKSRA, IRCMS No. IRCMS/19-20/D02 HKBU, Guangdong Basic and Applied Basic Research Foundation, No. 2019A1515011046.
Acknowledgements:
We first thank Arend Sidow for his informative comments and revisions to the paper. We also thank the Research Committee of Hong Kong Baptist University and the Interdisciplinary Research Clusters Matching Scheme for their kind support of this project.
PY - 2020/11/11
Y1 - 2020/11/11
N2 - Background: The human microbiota are complex systems with important roles in our physiological activities and diseases. Sequencing the microbial genomes in the microbiota can help in our interpretation of their activities. The vast majority of the microbes in the microbiota cannot be isolated for individual sequencing. Current metagenomics practices use short-read sequencing to simultaneously sequence a mixture of microbial genomes. However, this approach leads to ambiguities during genome assembly, resulting in unsatisfactory microbial genome completeness and contig continuity. Linked-read sequencing is able to remove some of these ambiguities by attaching the same barcode to the reads from a long DNA fragment (10–100 kb), thus improving metagenome assembly. However, it is not clear how the choices for several parameters in the use of linked-read sequencing affect the assembly quality. Results: We first examined the effects of read depth (C) on metagenome assembly from linked-reads in simulated data and a mock community. The results showed that C positively correlated with the length of assembled sequences but had little effect on their quality. The latter observation was corroborated by tests using real data from the human gut microbiome, where C had a minor impact on the sequence quality as well as on the proportion of bins annotated as draft genomes. On the other hand, metagenome assembly quality was susceptible to read depth per fragment (C_R) and DNA fragment physical depth (C_F). For the same C, deeper C_R resulted in more draft genomes while deeper C_F improved the quality of the draft genomes. We also found that average fragment length (μ_FL) had a marginal effect on assemblies, while fragments per partition (N_F/P) impacted the off-target reads involved in local assembly; namely, lower N_F/P values would lead to better assemblies by reducing the ambiguities of the off-target reads. In general, the use of linked-reads improved the assembly for contig N50 when compared to Illumina short-reads, but not when compared to PacBio CCS (circular consensus sequencing) long-reads. Conclusions: We comprehensively investigated the influence of linked-read sequencing parameters on metagenome assembly. While the quality of genome assembly from linked-reads cannot rival that from PacBio CCS long-reads, the case for using linked-read sequencing remains persuasive due to its low cost and high base quality. Our study revealed that the probable best practice in using linked-reads for metagenome assembly was to merge the linked-reads from multiple libraries, where each had sufficient C_R but a smaller amount of input DNA.
KW - Linked-reads
KW - Metagenome assembly
KW - PacBio CCS long-reads
KW - Parameter space
KW - Short-reads
UR - http://www.scopus.com/inward/record.url?scp=85095997061&partnerID=8YFLogxK
U2 - 10.1186/s40168-020-00929-3
DO - 10.1186/s40168-020-00929-3
M3 - Journal article
C2 - 33176883
AN - SCOPUS:85095997061
SN - 2049-2618
VL - 8
JO - Microbiome
JF - Microbiome
IS - 1
M1 - 156
ER -