Pathway-Based Single-Cell RNA-Seq Classification, Clustering, and Construction of Gene-Gene Interactions Networks Using Random Forests

Hailun Wang, Pak Sham, Tiejun Tong, Herbert Pang*

*Corresponding author for this work

Research output: Contribution to journal, Journal article, peer-review

16 Citations (Scopus)


Single-cell RNA-Sequencing (scRNA-Seq), an advanced sequencing technique, enables biomedical researchers to characterize cell-specific gene expression profiles. Although studies have adapted machine learning algorithms to cluster different cell populations for scRNA-Seq data, few existing methods have utilized machine learning techniques to investigate functional pathways in classifying heterogeneous cell populations. As genes often work interactively at the pathway level, studying the cellular heterogeneity based on pathways can facilitate the interpretation of biological functions of different cell populations. In this paper, we propose a pathway-based analytic framework using Random Forests (RF) to identify discriminative functional pathways related to cellular heterogeneity as well as to cluster cell populations for scRNA-Seq data. We further propose a novel method to construct gene-gene interactions (GGIs) networks using RF that illustrates important GGIs in differentiating cell populations. The co-occurrence of genes in different discriminative pathways and 'cross-talk' genes connecting those pathways are also illustrated in our networks. Our novel pathway-based framework clusters cell populations, prioritizes important pathways, highlights GGIs and pivotal genes bridging cross-talked pathways, and groups co-functional genes in networks. These features allow biomedical researchers to better understand the functional heterogeneity of different cell populations and to pinpoint important genes driving heterogeneous cellular functions.
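
The pathway-prioritization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one plausible reading of the framework: fit a separate Random Forest on the genes of each pathway and rank pathways by out-of-bag (OOB) accuracy in separating cell populations. The function name `rank_pathways`, the pathway names, the gene indices, and the synthetic expression matrix are all hypothetical, and scikit-learn is used only as a convenient RF implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_pathways(expr, labels, pathways, n_trees=200, seed=0):
    """Fit one Random Forest per pathway and rank pathways by OOB accuracy.

    expr     : (n_cells, n_genes) expression matrix
    labels   : (n_cells,) cell-population labels
    pathways : dict mapping a pathway name to a list of gene column indices
    """
    scores = {}
    for name, gene_idx in pathways.items():
        rf = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        # Only the genes belonging to this pathway are used as features.
        rf.fit(expr[:, gene_idx], labels)
        scores[name] = rf.oob_score_
    # Highest OOB accuracy first: the most discriminative pathway leads.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): 120 cells, 10 genes, two cell populations.
rng = np.random.default_rng(0)
n_cells = 120
labels = np.repeat([0, 1], n_cells // 2)
expr = rng.normal(size=(n_cells, 10))
# Genes 0-4 are shifted in population 1; genes 5-9 are pure noise.
expr[labels == 1, :5] += 3.0

# Hypothetical pathway definitions for illustration only.
pathways = {"pathway_A": [0, 1, 2, 3, 4], "pathway_B": [5, 6, 7, 8, 9]}

ranking = rank_pathways(expr, labels, pathways)
for name, acc in ranking:
    print(f"{name}: OOB accuracy = {acc:.2f}")
```

In this toy setting the pathway carrying the population-specific shift should rank above the noise-only pathway. With real data, the pathway gene sets would come from a curated pathway database and the labels from known or inferred cell populations.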

Original language: English
Article number: 8854213
Pages (from-to): 1814-1822
Number of pages: 9
Journal: IEEE Journal of Biomedical and Health Informatics
Issue number: 6
Publication status: Published - Jun 2020

Scopus Subject Areas

  • Biotechnology
  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Health Information Management

User-Defined Keywords

  • cellular heterogeneity
  • functional pathways
  • gene-gene interactions (GGIs) networks
  • Random Forests (RF)
  • single-cell RNA-Sequencing

