TY - JOUR
T1 - Machine Learning for Enhanced Identification Probability in RPLC/HRMS Nontargeted Workflows
AU - Ngan, Hiu-Lok
AU - Turkina, Viktoriia
AU - van Herwerden, Denice
AU - Yan, Hong
AU - Cai, Zongwei
AU - Samanipour, Saer
N1 - H.-L.N. acknowledges the sponsorship from the Graduate School of Hong Kong Baptist University for overseas research attachment, the Environmental Monitoring and Computational Mass Spectrometry (EMCMS, www.emcms.info) group for their insights and feedback, and Prof. S.S. for hosting the overseas research experience program. S.S. and V.T. thank ChemistryNL for financial support. H.Y. thanks the Start-up Grant for New Academics–YAN Hong (165520).
Publisher copyright:
© 2025 The Authors. Published by American Chemical Society.
PY - 2025/8/26
Y1 - 2025/8/26
N2 - In HRMS-based nontargeted analysis (NTA), spectral matching is crucial for chemical identification, particularly in the absence of retention information. This study introduces the class probability of true positives (P(TP)) as a novel approach that leverages MS/MS spectra and calibrant-free predicted retention time indices (RTIs) through three machine learning (ML) models to enhance identification probability (IP). The first model is a molecular fingerprint (MF)-to-RTI model trained on 4713 calibrants. The second, a cumulative neutral loss (CNL)-to-RTI model, used 485,577 experimental spectra. The third, a binary classification model, was trained using 1,686,319 TP and semisynthetic true negative (TN) spectral matches. High correlations between MF-derived and CNL-derived RTI values (R² = 0.96 for training; 0.88 for testing) suggest reduced RTI errors in TP spectral matches. Incorporating reference spectral library search results and RTI errors, the k-nearest neighbors algorithm achieved a weighted F1 score of 0.65 and a Matthews correlation coefficient of 0.30 for pesticides at concentrations of 1 to 1000 ppb in blank samples, and a recall of 0.60 in black tea matrices. Compared with library matching alone, the average IPs for pesticides increased by 54.5, 52.1, and 46.7% when spiked in blank, 10× diluted, and 100× diluted tea matrices, respectively. This work demonstrates the effectiveness of ML in enhancing the chemical IPs of annotated compounds within complex matrices.
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=hkbuirimsintegration2023&SrcAuth=WosAPI&KeyUT=WOS:001548396500001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1021/acs.analchem.5c01873
DO - 10.1021/acs.analchem.5c01873
M3 - Journal article
C2 - 40791078
SN - 0003-2700
VL - 97
SP - 18028
EP - 18035
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 33
ER -