Abstract
Single-cell RNA-Sequencing (scRNA-Seq), an advanced sequencing technique, enables biomedical researchers to characterize cell-specific gene expression profiles. Although studies have adapted machine learning algorithms to cluster cell populations in scRNA-Seq data, few existing methods use machine learning to investigate the functional pathways that distinguish heterogeneous cell populations. Because genes often work interactively at the pathway level, studying cellular heterogeneity based on pathways can facilitate the interpretation of the biological functions of different cell populations. In this paper, we propose a pathway-based analytic framework using Random Forests (RF) to identify discriminative functional pathways related to cellular heterogeneity and to cluster cell populations in scRNA-Seq data. We further propose a novel method that uses RF to construct gene-gene interaction (GGI) networks, which illustrate the GGIs that are important in differentiating cell populations. Our networks also illustrate the co-occurrence of genes in different discriminative pathways and the 'cross-talk' genes connecting those pathways. Our novel pathway-based framework clusters cell populations, prioritizes important pathways, highlights GGIs and pivotal genes bridging cross-talking pathways, and groups co-functional genes in networks. These features allow biomedical researchers to better understand the functional heterogeneity of different cell populations and to pinpoint important genes driving heterogeneous cellular functions.
Original language | English |
---|---|
Article number | 8854213 |
Pages (from-to) | 1814-1822 |
Number of pages | 9 |
Journal | IEEE Journal of Biomedical and Health Informatics |
Volume | 24 |
Issue number | 6 |
Publication status | Published - Jun 2020 |
Scopus Subject Areas
- Biotechnology
- Computer Science Applications
- Electrical and Electronic Engineering
- Health Information Management
User-Defined Keywords
- cellular heterogeneity
- functional pathways
- gene-gene interactions (GGIs) networks
- Random Forests (RF)
- single-cell RNA-Sequencing