Open Access
CC BY 4.0 · Eur J Dent
DOI: 10.1055/s-0045-1812064
Original Article

Comparative Benchmark of Seven Large Language Models for Traumatic Dental Injury Knowledge

Authors

  • Kittipat Termteerapornpimol

    1   Department of Occlusion, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  • Sirinya Kulvitit

    2   Department of Operative Dentistry, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  • Sasiprapa Prommanee

    3   Clinical Research Center, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  • Zohaib Khurshid

    4   Department of Prosthodontics and Dental Implantology, College of Dentistry, King Faisal University, Hofuf, Kingdom of Saudi Arabia
    5   Center of Excellence in Precision Medicine and Digital Health, Geriatric Dentistry and Special Patients Care International Program, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  • Thantrira Porntaveetus

    5   Center of Excellence in Precision Medicine and Digital Health, Geriatric Dentistry and Special Patients Care International Program, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand

Funding: T.P. was supported by the Health Systems Research Institute (68–032, 68–059), the Faculty of Dentistry (DRF69_005), and the Thailand Science Research and Innovation Fund Chulalongkorn University (HEA_FF_68_008_3200_001).

Abstract

Objectives

Traumatic dental injuries (TDIs) are complex clinical conditions that require timely and accurate decision-making. With the rise of large language models (LLMs), there is growing interest in their potential to support dental management. This study evaluated the accuracy and consistency of DeepSeek R1's responses across all categories of TDIs and benchmarked its performance against other common LLMs.

Materials and Methods

DeepSeek R1 and six other LLMs (ChatGPT-4o mini, ChatGPT-4o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Flash, and Gemini 1.5 Advanced) were assessed with a specific prompt on a validated set of 125 questions covering five subtopics (25 items each): general introduction, fractures, luxations, avulsions of permanent teeth, and TDIs in the primary dentition. Each model answered every item in five repetitions.
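With 125 items and five repetitions, each model yields 625 graded responses. The following is a minimal sketch of how such responses could be summarized. It assumes that each answer is graded as correct or incorrect and that the ± values in this abstract are sample standard deviations of accuracy across the five repetitions; the abstract does not state this explicitly. The function names and toy data are hypothetical and are not the authors' code or results.

```python
from statistics import mean, stdev

# Hypothetical layout (not the authors' data format): responses[r][i] is True
# when repetition r of one model answered question item i correctly.

def accuracy_per_repetition(responses):
    """Percentage of correctly answered items in each repetition."""
    return [100.0 * sum(rep) / len(rep) for rep in responses]

def summarize(responses):
    """Mean and sample standard deviation of accuracy across repetitions."""
    accs = accuracy_per_repetition(responses)
    return mean(accs), stdev(accs)

def subtopic_accuracy(responses, subtopic_of):
    """Accuracy per subtopic, pooled over all repetitions.

    subtopic_of[i] is the subtopic label of item i (e.g. "luxation").
    """
    correct, total = {}, {}
    for rep in responses:
        for i, ok in enumerate(rep):
            s = subtopic_of[i]
            correct[s] = correct.get(s, 0) + int(ok)
            total[s] = total.get(s, 0) + 1
    return {s: 100.0 * correct[s] / total[s] for s in total}

if __name__ == "__main__":
    # Toy data: 5 repetitions x 4 items (the study used 5 x 125 per model).
    toy = [
        [True, True, True, False],
        [True, True, False, False],
        [True, True, True, False],
        [True, True, True, True],
        [True, True, True, False],
    ]
    labels = ["fracture", "fracture", "luxation", "luxation"]
    m, sd = summarize(toy)
    print(f"{m:.1f}% ± {sd:.1f}%")  # prints 75.0% ± 17.7%
    print(subtopic_accuracy(toy, labels))  # {'fracture': 100.0, 'luxation': 50.0}
```

Pooling by subtopic is what would expose a weak area such as luxation, even when overall accuracy is high.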

Statistical Analysis

Accuracy was calculated as the percentage of correct responses, and consistency was measured using Fleiss' kappa. The Kruskal–Wallis H test, followed by Dunn's post-hoc test, was used to compare three or more independent groups.
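Fleiss' kappa fits this design by treating each question item as a subject and each of the five repetitions as a rater. The following is a minimal standard-library sketch. The abstract does not specify the rating categories, so the sketch assumes the ratings are the labels a model returned (for example, the chosen answer option). The function name `fleiss_kappa` and the toy data are hypothetical.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of n category labels.

    Here each item is one question and its n labels are the answers a model
    gave in n repetitions; every item must have the same n >= 2.
    """
    if not ratings:
        raise ValueError("no items to rate")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N = len(ratings)
    totals = Counter()  # how often each category was used overall
    p_bar = 0.0         # mean observed agreement across items
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # P_i: proportion of agreeing rater pairs for this item
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= N
    # P_e: agreement expected by chance from the overall category proportions
    p_e = sum((t / (N * n)) ** 2 for t in totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same")
    return (p_bar - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Toy data: 3 items x 5 repetitions, answer options labelled "A"/"B".
    toy = [list("AAAAA"), list("BBBBB"), list("AABBB")]
    print(round(fleiss_kappa(toy), 3))  # prints 0.598
```

A value of 1 means a model gave the same answer in every repetition of every item. Values near 0 mean agreement no better than chance, given the overall category frequencies.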

Results

DeepSeek R1 achieved the highest overall accuracy (86.4% ± 2.5%) despite producing the least consistent responses (κ = 0.694). Its accuracy was significantly higher than that of ChatGPT-4o mini (74.7% ± 0.9%), Claude 3 Opus (75.2% ± 1.0%), and Gemini 1.5 Flash (73.85% ± 2.3%) (p < 0.0001). Across all models, accuracy was notably lower for questions on luxation injuries (68.3% ± 3.2%).
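The pairwise comparisons in these results follow the Kruskal–Wallis H test and Dunn's post-hoc test named under Statistical Analysis. The following is a minimal standard-library sketch that assumes the groups are models and each group holds that model's scores. The abstract states neither the unit of observation (for example, per-item or per-repetition scores) nor any multiple-comparison adjustment, so this sketch returns unadjusted p-values. The names `kruskal_dunn` and `average_ranks` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H plus Dunn's pairwise z-tests.

    groups maps a group name (e.g. a model) to its list of scores.
    Returns (H, df, pairs) where pairs[(a, b)] = (z, unadjusted two-sided p).
    """
    names = list(groups)
    pooled = [x for g in names for x in groups[g]]
    N = len(pooled)
    ranks = average_ranks(pooled)
    size, mean_rank, start = {}, {}, 0
    for g in names:
        size[g] = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size[g]]) / size[g]
        start += size[g]
    # Tie term: sum of (t^3 - t) over groups of tied scores
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    if ties == N ** 3 - N:
        raise ValueError("all scores are tied; H is undefined")
    h = 12 / (N * (N + 1)) * sum(size[g] * mean_rank[g] ** 2 for g in names)
    h = (h - 3 * (N + 1)) / (1 - ties / (N ** 3 - N))
    # Dunn's variance of a mean-rank difference, with the same tie correction
    var = N * (N + 1) / 12 - ties / (12 * (N - 1))
    pairs = {}
    for a, b in combinations(names, 2):
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(var * (1 / size[a] + 1 / size[b]))
        pairs[(a, b)] = (z, math.erfc(abs(z) / math.sqrt(2)))  # two-sided normal p
    return h, len(names) - 1, pairs

if __name__ == "__main__":
    # Toy scores for three hypothetical groups (not study data).
    h, df, pairs = kruskal_dunn({"m1": [1, 2, 3], "m2": [4, 5, 6], "m3": [7, 8, 9]})
    print(round(h, 2), df)  # prints 7.2 2
```

To obtain the p-value of H itself, compare H with a chi-square distribution on `df` degrees of freedom, for example with `scipy.stats.chi2.sf(h, df)`. `scipy.stats.kruskal` computes the same tie-corrected H directly. A Bonferroni correction would multiply each pairwise p by the number of pairs, capped at 1.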

Conclusions

The LLMs achieved moderate to high accuracy, but this was tempered by varying degrees of inconsistency, particularly in the top-performing model, DeepSeek R1. Their difficulty with complex scenarios such as luxation injuries highlights current limitations in the diagnostic reasoning of artificial intelligence (AI). AI should be viewed as a valuable adjunct for knowledge acquisition and analysis in dental education and clinical practice, not as a replacement for clinical expertise.

Data Availability

Data available on request from the authors.


Author Contributions

K.T.: conceptualization, methodology, investigation, data curation, formal analysis, visualization, writing—original draft. S.K.: conceptualization, methodology, validation, writing—original draft (discussion), writing—review and editing. S.P.: data curation. Z.K.: formal analysis, writing—review and editing. T.P.: conceptualization, writing—review and editing.


Supplementary Material



Publication History

Article published online:
22 October 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India