A computational framework to prioritize disease-associated low frequency variants from Identity-By-Descent regions

Project: Research project

Project Details


Many efforts have been made to investigate the role of single nucleotide variants (SNVs) in human complex diseases. The common and rare disease-susceptibility SNVs identified so far explain only a small proportion of disease heritability. Low allele frequency (LAF) SNVs are another class of variant that is difficult to investigate, because they require either a large sample size for association analysis or multiple families with similar manifestations for linkage analysis. Disease-associated LAF SNVs could be located in Identity-By-Descent (IBD) regions that are inherited from a recent common ancestor and carried by sporadic patients. Extensive work has been done on IBD detection using common SNVs, but this forgoes the power that LAF SNVs offer to improve the sensitivity of detecting short IBD regions. In addition, previous tools consider only the effects of SNVs on gene function rather than on particular diseases. Identifying which SNV in an IBD region is truly associated with the disease poses a substantial challenge.

In this proposal, we plan to design a computational framework to prioritize disease-associated LAF SNVs from IBD regions. A novel statistical model is designed to detect IBD regions by making use of LAF SNVs from sporadic patients. For LAF SNVs, the calculation of haplotype frequency is substantially influenced by sequencing errors. Instead of relying on that frequency directly, the model assumes that the haplotype is sampled from a Bernoulli distribution whose parameter can be estimated and updated from observations in the training and test sets. The model further removes the influence of sequencing errors from parameter estimation to avoid double counting them. Our preliminary study showed that shared LAF SNVs tend to be observed in IBD regions and are unlikely to be observed by random chance.
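The idea of a Bernoulli parameter that is updated from training and test observations can be illustrated with a minimal sketch. This is not the project's model: the Beta prior, the conjugate update, the single symmetric error rate, and all function names below are assumptions chosen only to make the idea concrete.

```python
def update_beta(alpha, beta, carriers, non_carriers):
    """Conjugate update of a Beta(alpha, beta) prior on a Bernoulli parameter.

    Assumption: each sampled chromosome either carries the haplotype or not.
    """
    return alpha + carriers, beta + non_carriers


def posterior_mean(alpha, beta):
    """Posterior mean of the Bernoulli parameter under Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def correct_for_error(observed_p, error_rate):
    """Remove a symmetric sequencing error rate e from an observed frequency.

    Assumption: the observed frequency q relates to the true frequency p by
    q = p * (1 - e) + (1 - p) * e, which is inverted and clamped to [0, 1].
    """
    if not 0.0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    p = (observed_p - error_rate) / (1.0 - 2.0 * error_rate)
    return min(max(p, 0.0), 1.0)


# Hypothetical counts: start from a flat prior, update with the training set,
# then update again with the test set.
alpha, beta = update_beta(1.0, 1.0, carriers=2, non_carriers=198)
alpha, beta = update_beta(alpha, beta, carriers=1, non_carriers=99)
observed = posterior_mean(alpha, beta)
corrected = correct_for_error(observed, error_rate=0.001)
print(observed, corrected)
```

Applying the error correction once, after the counts have been pooled, is one simple way to avoid counting the same sequencing errors twice; the project may handle this differently.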

The disease-specific impacts of LAF SNVs are evaluated by integrating gene functional weights and SNV pathogenicity scores. For each IBD region, we apply a bi-clustering algorithm to identify the patients who share the same disease manifestations; these patients are then used to calculate the susceptibility of the IBD region by comparison with the number of carriers among controls. Disease seed genes are identified by examining patients' manifestations and are used to calculate the gene functional weight, which is based on the functional similarity between the genes altered by SNVs and the disease seed genes. The gene functional weight is then integrated with the SNV pathogenicity score to produce a disease-specific SNV pathogenicity score.
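How a gene functional weight might be combined with an SNV pathogenicity score can be sketched as follows. The description above does not specify the combination rule, so taking the maximum similarity to any seed gene and multiplying it by the pathogenicity score are assumptions, as are the function names, gene labels, and similarity values.

```python
def gene_functional_weight(gene, seed_genes, similarity):
    """Weight of an SNV-altered gene relative to the disease seed genes.

    Assumption: the weight is the highest functional similarity between the
    altered gene and any seed gene. `similarity` maps an unordered gene pair
    (a frozenset) to a value in [0, 1]; unknown pairs count as 0, and a gene
    compared with itself counts as 1.
    """
    if not seed_genes:
        return 0.0
    return max(
        similarity.get(frozenset((gene, seed)), 1.0 if gene == seed else 0.0)
        for seed in seed_genes
    )


def disease_specific_score(weight, pathogenicity):
    """Combine the gene weight and the SNV pathogenicity score.

    Assumption: a simple product, so either factor being near zero
    suppresses the combined score.
    """
    return weight * pathogenicity


# Hypothetical similarity values between one altered gene and two seed genes.
similarity = {
    frozenset(("GENE_A", "SEED_1")): 0.8,
    frozenset(("GENE_A", "SEED_2")): 0.3,
}
weight = gene_functional_weight("GENE_A", ["SEED_1", "SEED_2"], similarity)
score = disease_specific_score(weight, pathogenicity=0.5)
print(weight, score)
```

A product is only one possible choice; a weighted sum or a rank-based combination would fit the same description.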
Effective start/end date: 1/09/19 to 28/02/23

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being

