TY - JOUR
T1 - LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome
AU - Yang, Chao
AU - Zhang, Zhenmiao
AU - Huang, Yufen
AU - Xie, Xuefeng
AU - Liao, Herui
AU - Xiao, Jin
AU - Veldsman, Werner Pieter
AU - Yin, Kejing
AU - Fang, Xiaodong
AU - Zhang, Lu
N1 - This research was partially supported by the open project of BGI-Shenzhen, Shenzhen 518000, China (BGIRSZ20220012); the Hong Kong Research Grant Council Early Career Scheme (HKBU 22201419); Young Collaborative Research Grant (C2004-23Y); Health and Medical Research Fund (11221026); HKBU Start-up Grant Tier 2 (RC-SGT2/19–20/SCI/007); HKBU IRCMS (No. IRCMS/19–20/D02); the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515012226); and the Science Technology and Innovation Committee of Shenzhen Municipality, China (SGDX20190919142801722).
© The Author(s) 2024. Published by Oxford University Press GigaScience.
PY - 2024/6/13
Y1 - 2024/6/13
N2 - BACKGROUND: Linked-read sequencing technologies generate high-base quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform. FINDINGS: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots. CONCLUSIONS: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.
AB - BACKGROUND: Linked-read sequencing technologies generate high-base quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform. FINDINGS: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots. CONCLUSIONS: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.
KW - 10x Genomics
KW - Humans Genome
KW - Metagenome
KW - TELL-Seq
KW - linked-read sequencing
KW - stLFR
KW - human genome
UR - http://www.scopus.com/inward/record.url?scp=85196075493&partnerID=8YFLogxK
U2 - 10.1093/gigascience/giae028
DO - 10.1093/gigascience/giae028
M3 - Journal article
C2 - 38869148
SN - 2047-217X
VL - 13
JO - GigaScience
JF - GigaScience
M1 - giae028
ER -