DOI: 10.1055/a-2735-0527
Physician Perspectives on Large Language Models in Health Care: A Cross-Sectional Survey Study
Authors
Funding
None.
Abstract
Objectives
This study aims to evaluate physicians' practices and perspectives regarding large language models (LLMs) in health care settings.
Methods
A cross-sectional survey study was conducted between May and July 2024, comparing physician perspectives at two major academic medical centers (AMCs), one with institutional LLM access and one without. Participants included both clinical faculty and trainees recruited through departmental leadership and snowball sampling. Primary outcomes were current LLM use frequency, ranked importance of evaluation metrics, liability concerns, and preferred learning topics.
Results
Among 306 respondents (217 attending physicians [70.9%], 80 trainees [26.1%]), 197 (64.4%) reported using LLMs. The AMC with institutional LLM access reported significantly lower liability concerns (49.2% vs. 66.7% reporting high concern; difference, 17.5 percentage points [95% CI, 6.8–28.2]; p = 0.0082). Accuracy was prioritized across all specialties (median rank 1.0 [interquartile range (IQR), 1.0–2.0]). Of the respondents, 287 physicians (94%) requested additional training. Key learning priorities were clinical applications (206 [71.9%]) and risk management (181 [63.1%]). Despite widespread personal use, only 8 physicians (2.6%) recommended LLMs to patients. Notable specialty and demographic variations emerged, with younger physicians showing higher enthusiasm but also elevated legal concerns.
Conclusion
This survey study provides insights into physicians' current usage patterns and perspectives on LLMs. Liability concerns appear to be lessened in settings with institutional LLM access. The findings suggest opportunities for medical centers to consider when developing LLM-related policies and educational programs.
Keywords
cross-sectional survey - large language model - physician - academic medical centers - machine learning
Protection of Human and Animal Subjects
This study was done in adherence to the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and reviewed by the Institutional Review Boards at both participating institutions.
Authors' Contributions
H.J.H., N.H.S., and L.S.L. conceived the study and designed the methodology. H.J.H. wrote the first draft of the manuscript. L.S.L., N.H.S., and M.A.P. provided critical review and substantially contributed to the manuscript.
Data Availability
The de-identified survey data and analysis code used in this study will be made available by the corresponding author upon reasonable request after publication.
Publication History
Received: 07 June 2025
Accepted: 29 October 2025
Accepted Manuscript online:
30 October 2025
Article published online:
14 November 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany