Improved Metabolic Network Reconstruction and Metabolite Profiles Prediction using Complete and Strain-Resolved Microbial Genomes

Project: Research project

Project Details


The study of metabolic networks and metabolite profiling is crucial for understanding the role of microbial communities in human health and disease. However, accurate results in microbial metabolomics depend on the availability of high-quality genomes and genome annotations. Metagenome-assembled genomes (MAGs) generated from short-read sequencing are often fragmented and incomplete, resulting in overlooked essential genes and incomplete functional annotation of genes involved in metabolic pathways. Long-read sequencing presents an opportunity to reconstruct complete genomes and build accurate metabolic networks.
This project aims to develop a computational framework to reconstruct metabolic networks and to predict metabolite profiles using complete and strain-resolved microbial genomes generated from long-read metagenomic sequencing. Our approach can be divided into four tasks:
Task 1. Generate complete and strain-resolved metagenome-assembled genomes using long-read metagenomic sequencing and available reference genomes. We will design a novel maximum bipartite matching algorithm to traverse the conjugate graph and use sequences from reference genomes as support to link contigs. We will further develop a novel strain phasing algorithm to extend our previous work by simultaneously considering linkage disequilibrium and long-read connectedness of genomic variants.
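As a rough illustration of the matching idea named in Task 1, the sketch below applies the textbook augmenting-path method for maximum bipartite matching (Kuhn's algorithm) to a toy contig-linking problem. It is not the project's novel algorithm and does not model the conjugate graph. The function name, the contig names, and the candidate links are all hypothetical, and the sketch assumes each contig's outgoing end may be joined to at most one other contig's incoming end.

```python
from typing import Dict, List, Set


def max_bipartite_matching(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """Textbook augmenting-path (Kuhn) maximum bipartite matching.

    `candidates` maps each left node (here: a contig's outgoing end) to the
    right nodes (incoming ends of other contigs) it could be linked to.
    Returns a dict left -> right containing a maximum set of one-to-one links.
    """
    match_of_right: Dict[str, str] = {}  # right node -> left node holding it

    def try_assign(left: str, visited: Set[str]) -> bool:
        for right in candidates.get(left, []):
            if right in visited:
                continue
            visited.add(right)
            # Take the right node if it is free, or if its current holder
            # can be re-assigned to a different right node.
            if right not in match_of_right or try_assign(match_of_right[right], visited):
                match_of_right[right] = left
                return True
        return False

    for left in candidates:
        try_assign(left, set())

    return {left: right for right, left in match_of_right.items()}


# Hypothetical candidate links, e.g. pairs of contigs that a reference
# genome suggests are adjacent (assumed input, not real data).
links = {
    "contigA": ["contigB", "contigC"],
    "contigB": ["contigC"],
    "contigC": ["contigD"],
}
matching = max_bipartite_matching(links)
print(matching)
```

In this toy input the matching chains the four contigs into a single path. A real implementation would also need to weight links by the strength of their read and reference support, which this unweighted sketch ignores.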
Task 2. Taxonomic annotation and gene function prediction using genome-specific deep learning language models. Due to a lack of microbial genomic information, current algorithms used to annotate genomes with taxonomic classifications and gene functions rely on searching for marker genes and orthologous proteins, respectively. We will mitigate this limitation by developing two language models, LMax and LMfun, based on pre-training and fine-tuning procedures. LMax will be designed to understand the connections between taxonomic ranks and microbial genomic sequences by considering coding and noncoding sequences as well as gene order. LMfun will focus on modeling the relationship between orthologous protein sequences and gene clusters on the genome to learn how to annotate gene functions.
Task 3. Predict metabolomic profiles using well-characterized microbiomes and reconstruct high-quality species and multispecies metabolic networks. We will integrate current tools and databases to reconstruct microbial metabolic networks and their accompanying metabolite profiles using complete genomic sequences and high-quality annotations.
Task 4. Application to identify novel biomarkers associated with diarrhea-predominant irritable bowel syndrome (IBS-D). We will carry out long-read sequencing of the gut microbiome of 50 individuals with IBS-D and 50 healthy individuals and generate untargeted and targeted metabolomic profiles for their samples. We will then identify novel biomarkers associated with IBS-D and validate them using an independent cohort.
Status: Not started
Effective start/end date: 30/06/24 to 31/05/27

