TY - JOUR
T1 - ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery
AU - Shi, Shaohua
AU - Fu, Li
AU - Yi, Jiacai
AU - Yang, Ziyi
AU - Zhang, Xiaochen
AU - Deng, Youchao
AU - Wang, Wenxuan
AU - Wu, Chengkun
AU - Zhao, Wentao
AU - Hou, Tingjun
AU - Zeng, Xiangxiang
AU - Lyu, Aiping
AU - Cao, Dongsheng
N1 - National Key Research and Development Program of China [2021YFF1201400]; National Natural Science Foundation of China [22173118, 22220102001]; Hunan Provincial Science Fund for Distinguished Young Scholars [2021JJ10068]; Science and Technology Innovation Program of Hunan Province [2021RC4011]; Natural Science Foundation of Hunan Province [2022JJ80104]; 2020 Guangdong Provincial Science and Technology Innovation Strategy Special Fund [2020B1212030006, Guangdong-Hong Kong-Macau Joint Lab]. Funding for open access charge: HKBU Strategic Development Fund project [SDF19-0402-P02].
Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2024/7/5
Y1 - 2024/7/5
N2 - High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, leading to wastage of time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemical reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combining uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.
UR - https://academic.oup.com/nar/article/52/W1/W557/7688985
UR - http://www.scopus.com/inward/record.url?scp=85197807279&partnerID=8YFLogxK
U2 - 10.1093/nar/gkae424
DO - 10.1093/nar/gkae424
M3 - Journal article
SN - 0305-1048
VL - 52
SP - W439
EP - W449
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - W1
ER -