J Neurol Surg B Skull Base 2025; 86(S 01): S1-S576
DOI: 10.1055/s-0045-1803695
Presentation Abstracts

Assessing the Diagnostic Power and Management Recommendations of ChatGPT-4 Vision on Orbital Fracture in CT Scan

Omar Sadat¹, Kareem Ibrahim-Bacha¹, Diane Wang¹, Richard Cui¹, John N. Nguyen¹
¹ West Virginia University, Morgantown, West Virginia, United States

Purpose: Facial CT is essential in the evaluation and management of fractures. ChatGPT-4 Vision (GPT-4v), a multimodal large language model that allows a user to upload an image as input and engage in a conversation with the model, has shown promising results when used for diagnosing distal radius fractures, providing patient information, and assisting with the decision-making process. Herein, we evaluate the performance of ChatGPT-4 Vision in analyzing facial CT for the diagnosis of orbital fracture, along with its recommended management, compared with the assessments and recommendations of oculofacial plastic surgeons.

Materials and Methods: Nineteen cases of various orbital floor fractures with CT images were obtained from an open-source online image search; only cases with two or more views (axial and coronal) were included. Each case was assessed for identification of fractures, laterality, size of fractures, likelihood of extraocular muscle entrapment, and treatment recommendations, including medical and surgical management. ChatGPT-4 Vision was given a prompt to assess the CT images and make recommendations ([Fig. 1]). Separately, an attending physician and a fellow, both blinded to the image collection, were asked to assess the images and make recommendations based purely on the CT images. The performance of GPT-4v in CT analysis was compared with that of the surgeons, with the radiologist’s interpretation serving as the gold standard.

Fig. 1

Results: GPT-4v and the surgeons correctly identified the presence of an orbital fracture in all nineteen cases. GPT-4v identified fracture laterality correctly in 47.37% of cases, significantly lower than the surgeons at 100% (χ² = 24.26, p < 0.01). Both GPT-4v and the surgeons identified the fractured bone in 100% of cases. For inferior rectus entrapment (present in 5/19 cases), GPT-4v’s assessment was accurate in 73.68% of cases, significantly lower than the surgeons at 100% (χ² = 8.60, p < 0.01). GPT-4v accurately described fracture size in 63.16% of cases, also significantly lower than the surgeons at 100% (χ² = 15.96, p < 0.01). Lastly, GPT-4v’s recommendation regarding surgical management was accurate 68.42% of the time, significantly lower than the surgeons at 94.75% (χ² = 7.27, p < 0.01).
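The comparisons in the Results can be illustrated with a Pearson chi-square test on a 2×2 table of correct versus incorrect assessments. The sketch below is not the authors' analysis code. It assumes that the counts can be inferred from the reported percentages (for example, 47.37% of 19 cases is 9 correct), that the surgeon group pools two readers over 19 cases (38 assessments), and that no continuity correction was applied. None of these assumptions is stated in the abstract, and the function and variable names are hypothetical. Under these assumptions the statistics for laterality, fracture size, and surgical recommendation come out close to the reported values.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Critical value of the chi-square distribution for p = 0.01 with 1 degree of freedom.
CRITICAL_P01_DF1 = 6.635

# (GPT-4v correct, GPT-4v total, surgeons correct, surgeons total).
# ASSUMPTION: counts are inferred from the reported percentages, and the
# surgeon total of 38 assumes two readers x 19 cases.
comparisons = {
    "laterality": (9, 19, 38, 38),
    "fracture size": (12, 19, 38, 38),
    "surgical recommendation": (13, 19, 36, 38),
}

for name, (gpt_ok, gpt_n, surg_ok, surg_n) in comparisons.items():
    stat = chi_square_2x2(gpt_ok, gpt_n - gpt_ok, surg_ok, surg_n - surg_ok)
    significant = stat > CRITICAL_P01_DF1
    print(f"{name}: chi2 = {stat:.2f}, p < 0.01: {significant}")
```

With 1 degree of freedom, any statistic above 6.635 corresponds to p < 0.01, which is consistent with the significance levels reported for these three comparisons.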

Conclusion: GPT-4v is able to accurately identify the presence of fractured bone on CT images, but has significant deficiencies compared with oculofacial trauma surgeons in its ability to identify fracture laterality, assess the likelihood of muscle entrapment, and correctly recommend surgical intervention.



Publication History

Article published online:
07 February 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany