TY - JOUR
T1 - PhytoCluster: a generative deep learning model for clustering plant single-cell RNA-seq data
AU - Wang, Hao
AU - Fu, Xiangzheng
AU - Liu, Lijia
AU - Wang, Yi
AU - Hong, Jingpeng
AU - Pan, Bintao
AU - Cao, Yaning
AU - Chen, Yanqing
AU - Cao, Yongsheng
AU - Ma, Xiaoding
AU - Fang, Wei
AU - Yan, Shen
N1 - We would like to thank the anonymous reviewers for their valuable suggestions. This work was supported by the National Natural Science Foundation of China (32371996 and 62372158), the National Key R&D Program of China (2022YFF0711802), the STI 2030-Major Projects (2022ZD04017), and the National Key Research and Development Program of China (2019YFA0802202 and 2020YFA0803401).
Publisher Copyright:
© The Author(s) 2025.
PY - 2025/6
Y1 - 2025/6
AB - Single-cell RNA sequencing (scRNA-seq) technology enables a deep understanding of cellular differentiation during plant development and reveals heterogeneity among the cells of a given tissue. However, the computational characterization of such cellular heterogeneity is complicated by the high dimensionality, sparsity, and biological noise inherent to the raw data. Here, we introduce PhytoCluster, an unsupervised deep learning algorithm that clusters scRNA-seq data by extracting latent features. We benchmarked PhytoCluster on four simulated datasets and five real scRNA-seq datasets with varying protocols and data quality levels. A comprehensive evaluation indicated that PhytoCluster outperforms other methods in clustering accuracy, noise removal, and signal retention. Additionally, we evaluated the latent features extracted by PhytoCluster using four machine learning models. The computational results highlight the ability of PhytoCluster to extract meaningful information from plant scRNA-seq data, with machine learning models trained on the latent features achieving accuracy comparable to that obtained with raw features. We believe that PhytoCluster will be a valuable tool for disentangling complex cellular heterogeneity based on scRNA-seq data.
KW - Cellular heterogeneity
KW - Clustering
KW - Deep learning
KW - Latent features
KW - scRNA-seq
UR - http://www.scopus.com/inward/record.url?scp=85218245999&partnerID=8YFLogxK
U2 - 10.1007/s42994-025-00196-6
DO - 10.1007/s42994-025-00196-6
M3 - Journal article
C2 - 40641652
AN - SCOPUS:85218245999
SN - 2096-6326
VL - 6
SP - 189
EP - 201
JO - aBIOTECH
JF - aBIOTECH
IS - 2
ER -