TY - JOUR
T1 - LSMM: a statistical approach to integrating functional annotations with genome-wide association studies
AU - Ming, Jingsi
AU - Dai, Mingwei
AU - Cai, Mingxuan
AU - Wan, Xiang
AU - Liu, Jin
AU - Yang, Can
N1 - This work was supported in part by the National Science Funding of China [61501389]; the Hong Kong Research Grant Council [22302815, 12316116 and 12301417]; The Hong Kong University of Science and Technology [startup grant R9405]; Innovative Technology Funding of Hong Kong [ITF391/15FX (P0162)]; Duke-NUS Medical School WBS [R-913-200-098-263]; Ministry of Education, Singapore [MOE2016-T2-2-029]; Shenzhen Fundamental Research Fund [KQTD2015033114415450].
Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.
PY - 2018/8
Y1 - 2018/8
AB - Motivation: Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, two major challenges remain in deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding regions and their biological interpretation is still unclear. Second, accumulating evidence from GWAS suggests the polygenicity of complex traits, i.e. a complex trait is often affected by many variants with small or moderate effects, yet a large proportion of risk variants with small effects remain unknown. Results: The availability of functional annotation data enables us to address the above challenges. In this study, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. Not only does it increase the statistical power of identifying risk variants, but it also offers more biological insights by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. We then applied it to analyze 30 GWAS of complex phenotypes integrated with nine genic category annotations and 127 cell-type specific functional annotations from the Roadmap project. The results demonstrate that our method possesses more statistical power than conventional methods and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes.
UR - http://www.scopus.com/inward/record.url?scp=85055132870&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty187
DO - 10.1093/bioinformatics/bty187
M3 - Journal article
C2 - 29608640
AN - SCOPUS:85055132870
SN - 1367-4803
VL - 34
SP - 2788
EP - 2796
JO - Bioinformatics
JF - Bioinformatics
IS - 16
ER -