TY - JOUR
T1 - Multi-study Integration of Brain Cancer Transcriptomes Reveals Organ-Level Molecular Signatures
AU - Sung, Jaeyun
AU - Kim, Pan Jun
AU - Ma, Shuyi
AU - Funk, Cory C.
AU - Magis, Andrew T.
AU - Wang, Yuliang
AU - Hood, Leroy
AU - Geman, Donald
AU - Price, Nathan D.
N1 - Funding: This work was supported by the National Institutes of Health/National Cancer Institute Howard Temin Pathway to Independence Award in Cancer Research (NDP), a Young Investigator Grant from the Roy J. Carver Charitable Trust (NDP), the Camille Dreyfus Teacher-Scholar Awards Program (NDP), Basic Science Research Program through the National Research Foundation of Korea (NRF-2012R1A1A2008925) (PJK), a National Science Foundation Graduate Research Fellowship (SM), the National Institutes of Health/National Center for Research Resources Grant UL1 RR 025005 (DG), and the Grand Duchy of Luxembourg-Institute for Systems Biology Program (LH, NDP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2013/7/25
Y1 - 2013/7/25
N2 - We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. These signatures were based on a new method reported herein - Identification of Structured Signatures and Classifiers (ISSAC) - that resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses on large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies, for it was observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences, as is typical. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy, or the accuracy of identifying a particular brain cancer from the background of all phenotypes, was found. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.
UR - http://www.scopus.com/inward/record.url?scp=84880851558&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1003148
DO - 10.1371/journal.pcbi.1003148
M3 - Journal article
C2 - 23935471
AN - SCOPUS:84880851558
SN - 1553-734X
VL - 9
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 7
M1 - e1003148
ER -