DOI: 10.1055/s-0045-1809430
Comment on “Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions”
Funding None.
Dear Editor,
The publication “Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions”[1] is noteworthy for its evaluation of ChatGPT-4's capacity to answer radiological anatomy questions, a critical component of the FRCR Part 1 Anatomy examination. However, the research approach has significant drawbacks that affect the reliability and interpretation of the findings. For example, drawing questions from a free website without explicitly naming the source may introduce bias in question quality. It is also unclear how diverse the question sets are in terms of organs, organ systems, and imaging modalities such as magnetic resonance imaging, computed tomography, or ultrasound, all of which may influence the model's performance.
The statistical analysis in this study is limited. No inter-rater reliability is reported for the assessment of the model's descriptions, which is critical when the assessment is subjective. Furthermore, no comparative statistical test (e.g., t-test or ANOVA [analysis of variance]) was used to compare the two test conditions (with and without context), and no significance level (p-value) or confidence interval was reported. This precludes any statistically supported conclusion on whether providing context truly improves model performance.
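To illustrate the kind of analysis suggested above, the following is a minimal Python sketch, not the original authors' method. It computes Cohen's kappa for two raters, a Wilson score confidence interval for an accuracy proportion, and an exact McNemar test for paired correct/incorrect outcomes on the same questions. The McNemar test is our assumption of a suitable paired alternative to the t-test or ANOVA named above. All function names and the toy values at the bottom are hypothetical and are not data from the study.

```python
import math


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: questions correct only without context
    c: questions correct only with context
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy ratings, for illustration only.
    rater_1 = ["correct", "correct", "wrong", "wrong"]
    rater_2 = ["correct", "wrong", "wrong", "wrong"]
    print("kappa:", cohen_kappa(rater_1, rater_2))
    print("95% CI:", wilson_interval(successes=50, n=100))
    print("McNemar p:", mcnemar_exact(b=0, c=5))
```

Reporting these three quantities alongside the raw accuracies would show how consistent the raters were, how precise each accuracy estimate is, and whether the difference between the two conditions exceeds what chance would explain.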
A major limitation of the AI (artificial intelligence) model, as reflected in the results, is its very low ability to interpret visual data. A prompt such as “Identify the structure where the arrow is pointing” requires direct image processing, and natural language processing (NLP) models like ChatGPT are not truly designed for image reading. Although the model could correctly identify all imaging modalities, this shows only that it is effective at evaluating the surrounding text or query; it lacks the fundamental capacity to evaluate images, which is at the heart of radiology.
The question that should be asked is: if a language model like ChatGPT cannot directly perform clinical imaging tasks, what role should AI play in the radiology profession? Should AI serve as a supplementary linguistic tool, for example in report writing, or should it be integrated with computer vision models to develop hybrid systems in the future? As a next step, the combination of NLP models with image-based models should be investigated, including multimodal models such as CLIP or GPT-4V, which can analyze both text and images. Furthermore, comparisons with groups of doctors or interns would help determine AI's genuine capabilities and may serve as a foundation for using AI effectively in specialized medical education.
Conflict of interest
None declared.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author, H.D., upon reasonable request.
Declaration of Generative AI
The authors used a computational language-editing tool in the preparation of the article.
Authors' Contributions
H.D.: 50% ideas, writing, analyzing, and approval.
V.W.: 50% ideas, supervision, and approval.
Reference
- 1 Sarangi PK, Datta S, Panda BB, Panda S, Mondal H. Evaluating ChatGPT-4's performance in identifying radiological anatomy in FRCR Part 1 examination questions. Indian J Radiol Imaging 2024; 35 (02) 287-294
Publication History
Article published online:
13 June 2025
© 2025. Indian Radiological Association. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India