Uncovering Spatiotemporal Patterns of Disease Diffusion through Data-Driven Phylogeographic Inference

Project: Research project

Project Details


The geographical spread of infectious diseases has been and will continue to be a serious public health concern in both Hong Kong and mainland China. Notable examples include the influenza A(H5N1) outbreak in 1997, the swine flu (H1N1) pandemic in 2009, and the influenza A(H7N9) epidemic in 2013.

Computationally, disease diffusion networks can be used to characterize how these diseases spread from one geographic location to another. In these networks, nodes represent locations and edges represent the diffusion dynamics between the locations. From a particular disease diffusion network, we can identify underlying disease hotspots and critical diffusion paths, information that can help public health authorities to achieve active surveillance and efficient control of the disease. However, in practice, such networks are often hidden; we can observe only the locations and times or genome sequences of disease incidences. In this project, we will develop and evaluate an integrated Bayesian inference approach to uncovering real-world disease diffusion networks. The uniqueness of this approach is that it mines and incorporates informative priors from heterogeneous data sources. Examples of such priors include the spatiotemporal distribution of the disease hosts and the evolutionary relationships of the viral sequences.
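As a rough illustration of the network representation described above, the following Python sketch stores a diffusion network as a mapping from directed location pairs to diffusion rates and ranks locations by their total outgoing rate. The location names, the rate values, and the use of outgoing rate as a hotspot score are all assumptions made for this example; they are not taken from the project.

```python
# Hypothetical diffusion network: keys are (source, destination) location
# pairs, values are assumed diffusion rates. All values are illustrative.
rates = {
    ("A", "B"): 0.8,
    ("A", "C"): 0.5,
    ("B", "C"): 0.3,
    ("C", "A"): 0.1,
}


def out_strength(network):
    """Return the total outgoing diffusion rate of every location."""
    totals = {}
    for (src, dst), rate in network.items():
        totals[src] = totals.get(src, 0.0) + rate
        # Make sure locations that only receive the disease are listed too.
        totals.setdefault(dst, 0.0)
    return totals


def hotspots(network, k=1):
    """Return the k locations with the largest total outgoing rate.

    Using outgoing rate as the hotspot score is an assumption of this
    sketch; other centrality measures could be used instead.
    """
    totals = out_strength(network)
    return sorted(totals, key=lambda loc: totals[loc], reverse=True)[:k]


if __name__ == "__main__":
    print(out_strength(rates))
    print(hotspots(rates, k=2))
```

In practice the rates are exactly what is hidden, so a structure like this would be the output of the inference rather than its input.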

Taking the influenza A(H5N1) viruses in China as a case study, this project contributes to the interdisciplinary field of phylogeographic inference, which aims to reveal the geographic spread of disease viruses from their evolutionary relationships (i.e., phylogenetic trees). First, we propose a Bayesian inference approach to mapping disease diffusion networks based on a reconstructed phylogenetic tree. Second, we develop a novel clustering method that characterizes the spatiotemporal distributions of disease hosts from sparse and biased observation data, in order to avoid the general network inference problem of estimating a large number of free parameters (i.e., n(n-1) unknown diffusion rates for n locations). Third, we design novel Markov chain Monte Carlo algorithms to account for the uncertainties in generating phylogenetic trees from the genome sequences of disease viruses. We evaluate the proposed methods and algorithms by comparing them with those from existing studies, using both synthetic and real-world datasets.
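To give a feel for the kind of computation involved, the sketch below runs a plain Metropolis-Hastings sampler for a single diffusion rate. It is a deliberately simplified stand-in, not the project's algorithm: it ignores phylogenetic trees entirely, assumes exponentially distributed waiting times between diffusion events, and uses a Gamma prior in place of an informative prior mined from host data. The waiting times, prior parameters, and function names are hypothetical.

```python
import math
import random


def log_posterior(rate, waits, prior_shape, prior_rate):
    """Unnormalized log posterior of one diffusion rate.

    Assumes exponential waiting times and a Gamma(prior_shape, prior_rate)
    prior; both are assumptions of this sketch.
    """
    if rate <= 0:
        return -math.inf
    log_prior = (prior_shape - 1) * math.log(rate) - prior_rate * rate
    log_lik = len(waits) * math.log(rate) - rate * sum(waits)
    return log_prior + log_lik


def metropolis(waits, prior_shape=2.0, prior_rate=1.0,
               steps=20000, burn_in=2000, step_size=0.5, seed=0):
    """Sample the rate with a random walk on the log scale."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, waits, prior_shape, prior_rate)
    samples = []
    for i in range(steps):
        proposal = rate * math.exp(step_size * rng.gauss(0.0, 1.0))
        candidate = log_posterior(proposal, waits, prior_shape, prior_rate)
        # The multiplicative proposal is not symmetric in the rate, so the
        # acceptance ratio carries the Hastings correction proposal / rate.
        log_accept = candidate - current + math.log(proposal) - math.log(rate)
        if rng.random() < math.exp(min(0.0, log_accept)):
            rate, current = proposal, candidate
        if i >= burn_in:
            samples.append(rate)
    return samples


if __name__ == "__main__":
    # Hypothetical waiting times between observed diffusion events.
    waits = [0.5, 1.2, 0.8, 2.0, 1.5]
    samples = metropolis(waits)
    print(sum(samples) / len(samples))
```

With these assumptions the posterior is available in closed form as Gamma(7, 7), with mean 1, which gives a simple check on the sampler. A full network over n locations would need n(n-1) such rates, which is the parameter burden the clustering step is meant to address.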

As far as we know, this project is one of the first attempts to integrate spatial ecology and viral evolution in a phylogeographic inference study. The computational results can offer new insights into the diffusion networks of other diseases whose geographic spread is driven by host mobility and migration.
Effective start/end date: 1/11/15 to 30/04/19

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being
  • SDG 11 - Sustainable Cities and Communities
  • SDG 13 - Climate Action

