TY - JOUR
T1 - Lack of methodological rigor and limited coverage of generative artificial intelligence in existing artificial intelligence reporting guidelines
T2 - a scoping review
AU - Luo, Xufei
AU - Wang, Bingyi
AU - Shi, Qianling
AU - Wang, Zijun
AU - Lai, Honghao
AU - Liu, Hui
AU - Qin, Yishan
AU - Chen, Fengxian
AU - Song, Xuping
AU - Ge, Long
AU - Zhang, Lu
AU - Bian, Zhaoxiang
AU - Chen, Yaolong
AU - ADVANCED Working Group
AU - He, Hongfeng
AU - Wang, Ye
AU - Li, Haodong
AU - Zhang, Huayu
AU - Zhu, Di
AU - Yao, Yuanyuan
AU - Peng, Dongrui
AU - Li, Zhewei
AU - Zhang, Jie
AU - Qin, Yishan
AU - Wang, Fan
AU - Tang, Zhenyu
AU - Li, Yueyan
AU - Liu, Hanxiang
AU - Zhao, Jungang
N1 - The author gratefully acknowledges the support of K.C.Wong Education Foundation, Hong Kong (Yaolong Chen).
Publisher Copyright:
© 2025 Elsevier Inc.
PY - 2025/10
Y1 - 2025/10
N2 - Objectives: This study aimed to systematically map the development methods, scope, and limitations of existing artificial intelligence (AI) reporting guidelines in medicine and to explore their applicability to generative AI (GAI) tools, such as large language models (LLMs). Study Design and Setting: We conducted a scoping review, reported in adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews. Five information sources were searched: MEDLINE (via PubMed), the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, China National Knowledge Infrastructure, FAIRsharing, and Google Scholar, from inception to December 31, 2024. Two reviewers independently screened records and extracted data using a predefined Excel template. Data included guideline characteristics (eg, development methods, target audience, AI domain), adherence to EQUATOR Network recommendations, and consensus methodologies. Discrepancies were resolved by a third reviewer. Results: Sixty-eight AI reporting guidelines were included; 48.5% focused on general AI, whereas only 7.4% addressed GAI/LLMs. Methodological rigor was limited: 39.7% described development processes, 42.6% involved multidisciplinary experts, and 33.8% followed EQUATOR recommendations. Significant overlap existed, particularly in medical imaging (20.6% of guidelines). GAI-specific guidelines (14.7%) lacked comprehensive coverage and methodological transparency. Conclusion: Existing AI reporting guidelines in medicine have suboptimal methodological rigor, redundancy, and insufficient coverage of GAI applications. Future and updated guidelines should prioritize standardized development processes, multidisciplinary collaboration, and expanded focus on emerging AI technologies such as LLMs.
KW - Artificial intelligence
KW - Generative artificial intelligence
KW - Large language models
KW - Methodological quality
KW - Reporting guidelines
KW - Scoping review
UR - http://www.scopus.com/inward/record.url?scp=105012972746&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2025.111903
DO - 10.1016/j.jclinepi.2025.111903
M3 - Journal article
C2 - 40684889
AN - SCOPUS:105012972746
SN - 0895-4356
VL - 186
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
M1 - 111903
ER -