DOI: 10.1055/a-2628-8408
Comparing the Performances of a 54-Year-Old Computer-Based Consultation to ChatGPT-4o
Authors
Funding: None.
Abstract
Objectives
This study aimed to evaluate and compare the diagnostic responses generated by two artificial intelligence (AI) models developed 54 years apart, and to encourage physicians to explore the use of large language models (LLMs) such as GPT-4o in clinical practice.
Methods
A clinical case of metabolic acidosis was presented to GPT-4o, and the model's diagnostic reasoning, data interpretation, and management recommendations were recorded. These outputs were then compared with the responses from Schwartz's 1970 AI model built with a decision-tree algorithm using Conversational Algebraic Language (CAL). Both models were given the same patient data to ensure a fair comparison.
Results
GPT-4o generated an advanced analysis of the patient's acid–base disturbance, correctly identifying likely causes and suggesting relevant diagnostic tests and treatments. It provided a detailed, narrative explanation of the metabolic acidosis. The 1970 CAL model, while correctly recognizing the metabolic acidosis and flagging implausible inputs, was constrained by its rule-based design. CAL offered only basic stepwise guidance and required sequential prompts for each data point, reflecting a limited capacity to handle complex or unanticipated information. GPT-4o, by contrast, integrated the data more holistically, although it occasionally ventured beyond the provided information.
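The rule-based design described above can be illustrated with a short sketch. This is not Schwartz's original CAL code (which is not reproduced in this abstract); it is a hypothetical Python rendering of the kind of decision-tree logic such a system encodes, including the implausible-input check the CAL model performed. The reference ranges used here are common textbook values, not values taken from the study.

```python
# Hypothetical sketch of 1970s-style rule-based acid-base classification.
# Not the original CAL program; thresholds are illustrative textbook values.

def classify_acid_base(ph: float, hco3: float, pco2: float) -> str:
    """Classify a simple acid-base disturbance from pH, HCO3- (mEq/L), and PCO2 (mm Hg)."""
    # Flag physiologically implausible inputs, as the CAL system did.
    if not (6.8 <= ph <= 7.8) or not (0 < hco3 < 60) or not (10 <= pco2 <= 130):
        return "implausible input - please re-enter"
    # Fixed branching rules: each value is checked in a predetermined order.
    if ph < 7.35:
        if hco3 < 22:
            return "metabolic acidosis"
        if pco2 > 45:
            return "respiratory acidosis"
    if ph > 7.45:
        if hco3 > 26:
            return "metabolic alkalosis"
        if pco2 < 35:
            return "respiratory alkalosis"
    return "no simple disturbance identified"
```

Each branch tests one variable against a fixed threshold, which is why such a system must be fed data points sequentially and cannot integrate unanticipated information the way an LLM can.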
Conclusion
This comparison illustrates substantial advances in AI capabilities over five decades. GPT-4o's performance demonstrates the transformative potential of modern LLMs in clinical decision-making, showcasing abilities to synthesize complex data and assist diagnosis without specialized training, yet necessitating further validation, rigorous clinical trials, and adaptation to clinical contexts. Although innovative for its era and offering certain advantages over GPT-4o, the rule-based CAL system had technical limitations. Rather than viewing one as simply “better,” this study provides perspective on how far AI in medicine has progressed while acknowledging that current AI tools remain supplements to—not replacements for—physician judgment.
Keywords
artificial intelligence - computer-assisted decision-making - GPT-4o - large language models
Protection of Human and Animal Subjects
No human or animal subjects were involved in this project.
Publication History
Submitted: December 20, 2024
Accepted: June 5, 2025
Accepted Manuscript online: June 6, 2025
Article published online: November 7, 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
References
- 1 Akbilgic O, Davis RL. The promise of machine learning: When will it be delivered? J Card Fail 2019; 25 (06) 484-485
- 2 Das S, Ghoshal A. Can artificial intelligence ever develop the human touch and replace a psychiatrist? A letter to the editor of the Journal of Medical Systems regarding "Artificial Intelligence in Medicine & ChatGPT: De-Tether the Physician". J Med Syst 2023; 47 (01) 72
- 3 Turing AM. Computing machinery and intelligence. Mind 1950; 59 (236) 433-460
- 4 McCarthy J, Minsky ML, Rochester N, Shannon CE. A proposal for the Dartmouth Summer Research Project on artificial intelligence. AI Mag 1955; 27 (04) 12
- 5 Ledley RS, Lusted LB. Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science 1959; 130 (3366) 9-21
- 6 Warner HR, Toronto AF, Veasey LG, Stephenson R. A mathematical approach to medical diagnosis. Application to congenital heart disease. JAMA 1961; 177: 177-183
- 7 Schwartz WB. Medicine and the computer. The promise and problems of change. N Engl J Med 1970; 283 (23) 1257-1264
- 8 Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. Paper presented at: 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL; 2009: 248-255
- 9 OpenAI. ChatGPT Model (GPT-4o) [Large Language Model]. 2024. Accessed September 2024 at: https://chat.openai.com/
- 10 Safi S, Thiessen T, Schmailzl KJ. Acceptance and resistance of new digital technologies in medicine: Qualitative study. JMIR Res Protoc 2018; 7 (12) e11072
- 11 American Medical Association. AMA Augmented Intelligence Research: Physician Sentiments Around the Use of AI in Healthcare: Motivations, Opportunities, Risks, and Use Cases. 2025
- 12 Hicks MT, Humphries J, Slater J. ChatGPT is bullshit. Ethics Inf Technol 2024; 26 (02) 1-10
- 13 Palm V, Rengier F, Rajiah P, Heussel CP, Partovi S. Acute pulmonary embolism: Imaging techniques, findings, endovascular treatment and differential diagnoses. Rofo 2020; 192 (01) 38-49
- 14 Griot M, Hemptinne C, Vanderdonckt J, Yuksel D. Large language models lack essential metacognition for reliable medical reasoning. Nat Commun 2025; 16 (01) 642
- 15 Haugeland J. Artificial intelligence: The very idea. MIT Press; 1985
- 16 Confalonieri R, Coba L, Wagner B, Besold TR. A historical perspective of explainable artificial intelligence. Wiley Interdiscip Rev Data Min Knowl Discov 2021; 11 (01) e1391
- 17 Zahid IA, Joudar SS, Albahri AS, et al. Unmasking large language models by means of OpenAI GPT-4 and Google AI: A deep instruction-based analysis. Intelligent Systems with Applications 2024; 23: 200431
- 18 Winter SD, Pearson JR, Gabow PA, Schultz AL, Lepoff RB. The fall of the serum bicarbonate level in chronic metabolic acidosis: A quantitative approach. Arch Intern Med 1967; 119 (05) 496-502
- 19 Mehandru N, Miao BY, Almaraz ER, Sushil M, Butte AJ, Alaa A. Evaluating large language models as agents in the clinic. NPJ Digit Med 2024; 7 (01) 84
- 20 Mahowald K, Ivanova AA, Blank IA, Kanwisher N, Tenenbaum JB, Fedorenko E. Dissociating language and thought in large language models. Trends Cogn Sci 2024; 28 (06) 517-540
- 21 Gerke S, Minssen T, Cohen IG. Ethical and legal challenges of artificial intelligence-driven healthcare. In: Artificial Intelligence in Healthcare. Elsevier; 2020: 295-336
- 22 OpenEvidence. AI literature search engine for evidence-based medicine. Accessed June 10, 2025 at: https://www.openevidence.com/
