TY - JOUR
T1 - Pathway-Based Single-Cell RNA-Seq Classification, Clustering, and Construction of Gene-Gene Interactions Networks Using Random Forests
AU - Wang, Hailun
AU - Sham, Pak
AU - Tong, Tiejun
AU - Pang, Herbert
N1 - Funding Information:
Manuscript received March 1, 2019; revised June 17, 2019 and September 2, 2019; accepted September 18, 2019. Date of publication October 1, 2019; date of current version June 5, 2020. This work has been partially supported by the Research Grants Council - General Research Fund no. 17157416. (Corresponding author: Herbert Pang.) H. Wang and H. Pang are with the School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China (e-mail: u3005316@connect.hku.hk; herbpang@hku.hk).
PY - 2020/6
Y1 - 2020/6
N2 - Single-cell RNA-Sequencing (scRNA-Seq), an advanced sequencing technique, enables biomedical researchers to characterize cell-specific gene expression profiles. Although studies have adapted machine learning algorithms to cluster different cell populations for scRNA-Seq data, few existing methods have utilized machine learning techniques to investigate functional pathways in classifying heterogeneous cell populations. As genes often work interactively at the pathway level, studying the cellular heterogeneity based on pathways can facilitate the interpretation of biological functions of different cell populations. In this paper, we propose a pathway-based analytic framework using Random Forests (RF) to identify discriminative functional pathways related to cellular heterogeneity as well as to cluster cell populations for scRNA-Seq data. We further propose a novel method to construct gene-gene interactions (GGIs) networks using RF that illustrates important GGIs in differentiating cell populations. The co-occurrence of genes in different discriminative pathways and 'cross-talk' genes connecting those pathways are also illustrated in our networks. Our novel pathway-based framework clusters cell populations, prioritizes important pathways, highlights GGIs and pivotal genes bridging cross-talked pathways, and groups co-functional genes in networks. These features allow biomedical researchers to better understand the functional heterogeneity of different cell populations and to pinpoint important genes driving heterogeneous cellular functions.
AB - Single-cell RNA-Sequencing (scRNA-Seq), an advanced sequencing technique, enables biomedical researchers to characterize cell-specific gene expression profiles. Although studies have adapted machine learning algorithms to cluster different cell populations for scRNA-Seq data, few existing methods have utilized machine learning techniques to investigate functional pathways in classifying heterogeneous cell populations. As genes often work interactively at the pathway level, studying the cellular heterogeneity based on pathways can facilitate the interpretation of biological functions of different cell populations. In this paper, we propose a pathway-based analytic framework using Random Forests (RF) to identify discriminative functional pathways related to cellular heterogeneity as well as to cluster cell populations for scRNA-Seq data. We further propose a novel method to construct gene-gene interactions (GGIs) networks using RF that illustrates important GGIs in differentiating cell populations. The co-occurrence of genes in different discriminative pathways and 'cross-talk' genes connecting those pathways are also illustrated in our networks. Our novel pathway-based framework clusters cell populations, prioritizes important pathways, highlights GGIs and pivotal genes bridging cross-talked pathways, and groups co-functional genes in networks. These features allow biomedical researchers to better understand the functional heterogeneity of different cell populations and to pinpoint important genes driving heterogeneous cellular functions.
KW - cellular heterogeneity
KW - functional pathways
KW - gene-gene interactions (GGIs) networks
KW - Random Forests (RF)
KW - single-cell RNA-Sequencing
UR - http://www.scopus.com/inward/record.url?scp=85086236259&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2019.2944865
DO - 10.1109/JBHI.2019.2944865
M3 - Article
C2 - 31581101
AN - SCOPUS:85086236259
SN - 2168-2194
VL - 24
SP - 1814
EP - 1822
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 6
M1 - 8854213
ER -