TY - JOUR
T1 - Resolution-Adaptive Binning Enhances Machine Learning Modeling by Interbatch and Multiplatform Orbitrap-Based Shotgun Mass Spectrometry Data Integration
AU - Ngan, Hiu-Lok
AU - Zhang, Jialing
AU - Kwan, Kenneth Kin-Leung
AU - Cheu, Jacinth Wing-Sum
AU - Zhong, Li
AU - Guo, Yike
AU - Yang, Xian
AU - Wong, Carmen Chak-Lui
AU - Yan, Hong
AU - Cai, Zongwei
N1 - J.Z. thanks the Collaborative Research Fund from the Research Grants Council of Hong Kong SAR (C2011-21GF) and the Tier2 Fund (RC-OFSGT_20_21_SCI_007) from Hong Kong Baptist University (HKBU). H.Y. thanks the Start-up Grant for New Academics – YAN Hong (165520). The authors from HKBU appreciate the contributions made by C.C.-L.W.’s team in developing and preparing the animal models. They also acknowledge Beijing Viktor Technology Co., Ltd. for their instrument support. Special thanks are extended to Mrs. Junru Xia for her valuable discussions in algorithm development.
N1 - Publisher copyright: © 2025 The Authors. Published by American Chemical Society.
PY - 2025/12/9
Y1 - 2025/12/9
N2 - Machine learning (ML) modeling on mass spectrometry (MS)-based shotgun data facilitates feature selection and disease modeling. However, batch-specific models often struggle with limited transferability and generalizability, necessitating data integration from multiple batches and platforms. Traditional binning methods can either disintegrate or aggregate m/z features, making data combination unreliable. In this study, we introduce a mass resolution-adaptive binning and integration strategy to overcome these challenges. This approach recovers 88-99% of ground truth features in a low mass region (70-434 m/z) from 49 mixed standard solutions at 250, 500, and 1000 ppb. Compared to conventional methods, it demonstrates stable binning and integration across low (100-450 m/z), mid (450-900 m/z), and high (900-1500 m/z) mass regions, resulting in superior predictive models. Using a mouse model of hepatocellular carcinoma as a proof-of-concept study, we identify 10 generic metabolites that showcase advancements in using ambient MS imaging (MSI) data for modeling and deploy the attained model to shotgun data. This facilitates disease detection via various sample introduction methods, including MSI on liver cryosections (F1 score = 0.87) and glass smears (F1 score = 0.80), as well as rapid direct infusion analysis (recall = 0.89 and precision = 0.63). This novel mass resolution-adaptive binning and integration strategy offers a promising approach for integrating different data sets, potentially improving disease detection accuracy in MS applications.
UR - https://www.scopus.com/pages/publications/105024909471
U2 - 10.1021/acs.analchem.5c05874
DO - 10.1021/acs.analchem.5c05874
M3 - Journal article
C2 - 41288337
SN - 0003-2700
VL - 97
SP - 26877
EP - 26885
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 48
ER -