Abstract
Objective: To evaluate RAPID, an automated reporting checklist generation tool built on large language models (LLMs) and retrieval-augmented generation (RAG).
Materials and Methods: This study used LLMs to develop a retrieval-augmented generation architecture. To assess its performance, 91 published journal articles were collected and manually annotated against the CONSORT and CONSORT-AI medical reporting guidelines: 50 randomized controlled trials (RCTs) conducted without AI intervention and 41 RCTs that incorporated AI tools. A minimal sketch of how such a pipeline could work follows.
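The abstract does not detail RAPID's implementation. The sketch below illustrates, under stated assumptions, how a retrieval-augmented checklist pipeline of this kind could be structured: embed or score article paragraphs against each checklist item, retrieve the best-matching evidence, and ask an LLM to judge whether the item is reported. The paraphrased checklist items, the bag-of-words retriever, and the `judge_with_llm` stub are illustrative assumptions, not the authors' code.

```python
# Minimal, illustrative RAG-style checklist checker.
# The checklist items, the toy similarity function, and the judge stub
# are assumptions for illustration, not RAPID's actual implementation.
from collections import Counter
import math

CHECKLIST = [  # a few CONSORT items, paraphrased for illustration
    ("1a", "Identification as a randomised trial in the title"),
    ("8a", "Method used to generate the random allocation sequence"),
    ("17a", "For each outcome, results for each group and effect size"),
]

def cosine_bow(a: str, b: str) -> float:
    """Toy retrieval score: cosine similarity over bag-of-words counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(item_text: str, paragraphs: list[str], k: int = 2) -> list[str]:
    """Return the k article paragraphs most similar to a checklist item."""
    ranked = sorted(paragraphs, key=lambda p: cosine_bow(item_text, p), reverse=True)
    return ranked[:k]

def judge_with_llm(item_text: str, evidence: list[str]) -> str:
    """Placeholder for an LLM call deciding 'reported' vs 'not reported'.
    A real system would send a prompt like the one built here to a model API."""
    prompt = (
        f"Checklist item: {item_text}\n"
        "Evidence:\n" + "\n".join(evidence) +
        "\nIs this item reported? Answer 'reported' or 'not reported'."
    )
    # Stub: a production system would return the model's answer to `prompt`.
    return "reported" if evidence else "not reported"

def generate_checklist(paragraphs: list[str]) -> dict[str, str]:
    """Run retrieval + LLM judgment for every checklist item."""
    return {
        item_id: judge_with_llm(text, retrieve(text, paragraphs))
        for item_id, text in CHECKLIST
    }

if __name__ == "__main__":
    article = [
        "This randomised trial compared drug A with placebo.",
        "Allocation was generated by computer using permuted blocks.",
    ]
    print(generate_checklist(article))
```

Because the guideline lives in data (here, `CHECKLIST`) rather than in model weights, swapping in a different set of reporting items would adapt the same pipeline to another guideline without retraining, which is consistent with the scalability claim in the conclusion.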
Results: The 50 RCT articles without AI-tool intervention and the 41 RCT articles with AI-tool intervention formed the CONSORT and CONSORT-AI datasets, respectively. All 37 CONSORT reporting items were included in the tool; on the CONSORT dataset, RAPID achieved an average accuracy of 92.11% and a content consistency score of 81.14%. Of the CONSORT-AI reporting items, the 11 items specific to AI-tool interventions were included; on the CONSORT-AI dataset, RAPID achieved an average accuracy of 83.81% and a content consistency score of 72.51%.
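The abstract does not give the exact definitions of accuracy and content consistency. The toy sketch below shows one plausible reading (item-level agreement with manual annotations, and token overlap between extracted and reference text); both metric definitions are assumptions for illustration, not the paper's formulas.

```python
# Illustrative scoring against manual annotations. The metric definitions
# below (item-level agreement; Jaccard token overlap) are assumptions
# for demonstration, not the paper's formulas.

def item_accuracy(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of checklist items where the tool's reported/not-reported
    judgment matches the manual annotation."""
    matches = sum(predicted[i] == gold[i] for i in gold)
    return matches / len(gold)

def content_consistency(pred_text: str, gold_text: str) -> float:
    """Toy consistency score: Jaccard overlap between the text the tool
    extracted for an item and the annotators' reference text."""
    a, b = set(pred_text.lower().split()), set(gold_text.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

# Example: one article, three items.
pred = {"1a": "reported", "8a": "reported", "17a": "not reported"}
gold = {"1a": "reported", "8a": "not reported", "17a": "not reported"}
print(f"accuracy = {item_accuracy(pred, gold):.2%}")  # 66.67%
print(content_consistency("permuted blocks were used",
                          "computer-generated permuted blocks"))  # 0.4
```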
Discussion: RAPID may save time and improve efficiency for user groups such as medical authors, researchers, editors, and reviewers.
Conclusion: RAPID is highly scalable and can be adapted to different medical reporting guidelines without transfer learning on a large dataset. Compared with other methods, it achieved state-of-the-art performance on 2 datasets covering 2 different checklists.
| Original language | English |
|---|---|
| Pages (from-to) | 1340-1349 |
| Number of pages | 10 |
| Journal | Journal of the American Medical Informatics Association |
| Volume | 32 |
| Issue number | 8 |
| Early online date | 17 Jun 2025 |
| DOIs | |
| Publication status | Published - Aug 2025 |
User-Defined Keywords
- large language model
- medical reporting guidelines
- randomized clinical trial
- retrieval-augmented generation