DOI: 10.1055/a-2689-2685
Evaluating AI Responses to Postoperative Questions in Mohs Reconstruction

Abstract
Introduction
Patients frequently have questions after Mohs facial reconstruction. Artificial intelligence (AI) tools, particularly large language models (LLMs), may help streamline this communication.
Objectives and Hypotheses
We evaluated four LLMs—Claude AI, ChatGPT, Microsoft Copilot, and Google Gemini—on responses to postoperative questions, hypothesizing variation in quality, accuracy, comprehensiveness, and readability.
Study Design
Prospective observational study following STROBE guidelines.
Methods
A total of 31 common postoperative questions were developed, and each was submitted to all four LLMs using a standardized prompt. Responses were evaluated by blinded facial plastic surgeons using validated instruments: the Ensuring Quality Information for Patients (EQIP) tool, Likert scales, and readability formulas. IRB exemption was granted.
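The abstract does not specify which readability formulas were applied. As an illustration only, the following Python sketch computes the widely used Flesch-Kincaid grade level, one common way to check responses against the 6th-grade threshold reported below; the formula choice and the simple syllable counter are assumptions for illustration, not the study's published method.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * (len(words) / max(len(sentences), 1))
        + 11.8 * (syllables / max(len(words), 1))
        - 15.59
    )

# Hypothetical example: an LLM-style postoperative instruction
sample = ("Keep the surgical site clean and dry for forty-eight hours, "
          "then gently wash it with mild soap and water.")
print(f"Estimated grade level: {flesch_kincaid_grade(sample):.1f}")
```

A score above 6.0 would indicate text harder to read than the recommended 6th-grade level for patient materials.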
Results
Claude AI outperformed the other models in quality (EQIP score: 90.3), accuracy (4.55/5), and comprehensiveness (4.60/5). All LLMs produced responses above the recommended 6th-grade reading level.
Conclusion
LLMs show potential for supporting postoperative communication, but variation in readability and content depth highlights the continued need for physician oversight.
Declaration of GenAI Use
During the writing process of this article, the authors used ChatGPT-4o and Claude AI to assist with editing and formatting. The authors have reviewed and edited the text and take full responsibility for the content of the article.
Publication History
Received: 22 July 2025
Accepted: 25 August 2025
Accepted Manuscript online: 25 August 2025
Article published online: 04 September 2025
© 2025. Thieme. All rights reserved.
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
- 27 Robinson MA, Belzberg M, Cai C, Lim J, Liu TYA, Ng E. Patients prefer artificial intelligence large language model-generated responses to those prepared by the American College of Mohs Surgery: a double-blind comparative study using ChatGPT and Google Gemini. JAAD Int 2025; 21: 52-54