Curriculum for Early Exposure to Clinical Informatics and Data Science for Noninformatics Trainees to Promote Interest and Inclusion in Informatics

Abstract
Background: Curricula aimed at increasing exposure to informatics and practical data analytics among medical trainees could increase their effectiveness in clinical research, quality improvement, and clinical operations.
Objectives: The Clinical Informatics Data Science (CI-DS) pathway is a cross-disciplinary curriculum aimed at improving informatics exposure among medical trainees. We describe the development of this novel curriculum, the inaugural cohort, and lessons learned.
Methods: The CI-DS pathway is framed around upfront informatics didactics followed by a longitudinal, experiential training focused on mentorship, clinical data extraction/machine learning, and health technology governance. The curriculum was evaluated based on pre- and postpathway surveys completed by learners and logs of the elective activities selected by learners.
Results: The CI-DS pathway attracted 19 learners across 12 medical subspecialties, from medical students to fellows. Baseline surveys showed limited exposure to informatics across learners. The top three longitudinal activities completed were participating in electronic health record (EHR) governance meetings, data science supplemental courses, and designated mentorship meetings. Comparison of baseline with postpathway surveys demonstrated significant improvements in learner self-reported confidence in appraising an EHR modification ticket, accessing UCSF's deidentified data, exploring a database with basic structured query language (SQL), extracting data using SQL, and interpreting machine learning models.
Conclusion: An early exposure curriculum in clinical informatics with training in data extraction and governance can successfully recruit a diverse array of learners and improve confidence in practical informatics skills. We reflect on the strengths and weaknesses of this curriculum, and summarize the lessons learned to guide others in creating similar curricula for noninformatics clinicians.


Background and Significance
Advanced training programs such as clinical informatics fellowships [4,5], master's degree programs [6], and certificate programs exist but are geared toward those seeking careers in informatics. However, courses aimed at engaging medical trainees with unexplored interests in clinical informatics are still early in development. These types of curricula may serve two purposes: first, to introduce the breadth of the field of informatics to trainees and encourage them to pursue advanced training, and second, to provide those who may not be interested in advanced training with the practical skills and informatics proficiencies needed by noninformaticists.
Several institutions have implemented elective clinical informatics curricula at varying levels of training [8,9,10]. These curricula vary in format and duration, ranging from 2- to 4-week electives to multimonth longitudinal programs. However, across all of these curricula there have been no learning activities oriented toward EHR data extraction and data science competencies besides predictive analytics [6,9,10]. While these EHR-focused curricula may be useful for aspiring informaticists, they may not be ideal for noninformatics clinicians. For these physicians, skills in practical data analytics are vital in the era of the modern EHR because of the centrality of EHR data in several key nonclinical roles for physicians, including clinical research [11], quality improvement [12], and clinical operations. Many scholarly and operational endeavors of noninformaticists would benefit from practical skills in how to query, interpret, and use EHR data and in understanding its limitations.

Objectives
We describe the development and delivery of a Clinical Informatics Data Science (CI-DS) curriculum aimed at undergraduate medical education (UME) and graduate medical education (GME) learners that incorporates practical data science competencies, and we reflect on lessons learned from the inaugural cohort.

Methods
The CI-DS curriculum was tailored for a tertiary, multisite academic medical center with a clinical informatics fellowship but no other informatics curricula at the institution. The curriculum was developed based on the Kern six-step framework [13], with three major needs identified through observations and informal discussions: (1) appeal of the curriculum among informaticists and noninformaticists; (2) hands-on experiential training that enables learners to access and extract real-world data from EHR databases; and (3) accessible mentorship to promote diversity in levels of training and professional discipline.
We refined and specified competencies that would have broad utility for both informatics- and noninformatics-bound trainees based on a list of knowledge and skills compiled from surveying clinical informatics-certified physicians [3]. Existing clinical informatics curricula for UME and GME learners were referenced for the competencies focused on health information systems, while we created original learning activities for the data science competencies [1,10]. These competencies are described in ►Table 1.
The overarching educational strategy was one initial week of structured didactics followed by a longitudinal program spanning 5 to 9 months (►Fig. 1). This structure fit within the framework of an existing GME pathways program, which is a multidisciplinary program of supplementary curricula aimed at building specialized knowledge and skills [14]. Enrollment in the CI-DS pathway was open to medical students, residents, and fellows to afford broad accessibility of the curriculum.
Based on our targeted needs assessment, the structured didactics portion of the curriculum was designed to consolidate foundational concepts in informatics and establish a common baseline for learners across levels of training and prior experience in informatics [13]. Didactics consisted of a combination of readings, lectures, workshops, and panels facilitated by practicing clinical informaticians at the University of California, San Francisco (UCSF). Topics ranged from the Impact of Informatics on the Field of Medicine to Electronic Health Records and Clinical Decision Support to panels on training in clinical informatics and industry, as well as introductory lectures on data science concepts (►Table 2).
The CI-DS pathway's longitudinal component was developed using behaviorist and cognitivist theories of education emphasizing two primary loci for learning: skill acquisition through practice and real-world experience paired with self-reflection [15]. Activities were categorized as "Clinical Informatics" or "Data Science." "Clinical Informatics" emphasized learning and understanding the workings of EHR design and governance. Learners were encouraged to attend EHR governance meetings and to cap the experience by shepherding an EHR modification through the governance process. "Data Science" offered activities designed by faculty and clinical informatics fellows that walked learners through gaining access to and directly querying a deidentified clinical data warehouse at UCSF. We created self-guided tutorials for learners to perform structured query language (SQL) tasks and extract clinical data. Learners were also encouraged to define a question related to an ongoing or new quality improvement or research project that they could answer using the clinical data warehouse. The full list of lectures and activities is shown in ►Table 2.
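As an illustration of the kind of introductory SQL extraction task described above, the following is a minimal sketch only. The table names (`encounters`, `lab_results`), column names, threshold, and rows are hypothetical and do not reflect the schema or contents of the UCSF deidentified clinical data warehouse; an in-memory SQLite database stands in for the warehouse so that the example runs on its own.

```python
import sqlite3

# Hypothetical stand-in for a deidentified clinical data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE encounters (
        encounter_id INTEGER PRIMARY KEY,
        patient_id   INTEGER,
        department   TEXT
    );
    CREATE TABLE lab_results (
        encounter_id INTEGER,
        lab_name     TEXT,
        value        REAL
    );
    -- Made-up rows for illustration only.
    INSERT INTO encounters VALUES
        (1, 101, 'Primary Care'),
        (2, 102, 'Primary Care'),
        (3, 103, 'Endocrinology');
    INSERT INTO lab_results VALUES
        (1, 'HbA1c', 9.4),
        (2, 'HbA1c', 6.1),
        (3, 'HbA1c', 10.2),
        (3, 'Creatinine', 1.1);
    """
)

# Example question: how many distinct patients per department had an
# HbA1c result above 9.0? (The threshold is an arbitrary illustration.)
query = """
    SELECT e.department, COUNT(DISTINCT e.patient_id) AS n_patients
    FROM encounters AS e
    JOIN lab_results AS l ON l.encounter_id = e.encounter_id
    WHERE l.lab_name = 'HbA1c' AND l.value > 9.0
    GROUP BY e.department
    ORDER BY e.department;
"""
rows = conn.execute(query).fetchall()
for department, n_patients in rows:
    print(department, n_patients)
```

The join, filter, and grouped count in this sketch are the basic building blocks a learner would likely combine when turning a quality improvement or research question into a query.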
The CI-DS pathway was advertised to learners through emails from UCSF GME and through online information. Learners completed baseline surveys about their knowledge of clinical informatics and exposure to informatics mentorship before the start of the curriculum. During the longitudinal pathway, learners submitted informal reflections for each foundational or elective activity completed. After pathway completion, longitudinal learners were asked to complete a postsurvey to evaluate for changes in knowledge and confidence. Survey data were compared using Fisher's exact test given the small sample size. All learner surveys and responses were collected through Qualtrics, and data were analyzed using R, version 4.1.2.
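To make the choice of test concrete, the sketch below computes a two-sided Fisher's exact test for a 2x2 table of survey responses. It is an illustration only: the study's analysis was run in R, the function name `fisher_exact_two_sided` is introduced here, and the counts are hypothetical rather than taken from the study's tables. The two-sided p-value is computed by summing the hypergeometric probabilities of all tables no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (not study data):
# rows = baseline vs. postpathway respondents,
# columns = confident vs. not confident in a given skill.
p_value = fisher_exact_two_sided(2, 16, 4, 1)
print(f"two-sided p = {p_value:.4f}")
```

Because the test enumerates exact table probabilities rather than relying on a large-sample approximation, it remains valid when cell counts are as small as those expected from a cohort of this size.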

Results
The inaugural cohort of the CI-DS pathway included 19 learners, 11 of whom completed the longitudinal curriculum. The 19 learners ranged from fourth-year medical students to clinical fellows, approached gender parity with approximately 47% women enrolled (►Table 3), and represented 12 subspecialties of medicine.
Baseline survey response rate was 95% (►Table 4). Learners reported limited knowledge of and exposure to EHR governance, data querying and extraction, and machine learning prior to enrollment in the pathway.
Postpathway surveys were completed by 5 out of 11 learners (45%; ►Table 4). Learners reported a shift in their confidence about all activities, ranging from EHR governance to data querying and extraction to machine learning. Specifically, there was a significant difference in confidence in appraising an EHR modification ticket, accessing UCSF's deidentified data, exploring a database with basic SQL, extracting data using SQL, and interpreting machine learning models. There was no statistically significant difference in the percentage of learners working on an informatics project before and after the pathway, nor a difference in interest in clinical informatics fellowship training. In unstructured comments, learners reported a strong interest in learning more about querying data through SQL and applying machine learning to datasets.
Finally, the elective activities completed by longitudinal learners are summarized in ►Table 5. The top three activities completed by learners were the EHR governance committee activity, the UCSF Library Data Science Initiative course (synchronous virtual workshops offered quarterly on topics including SQL, Python, R, natural language processing, and machine learning), and monthly mentorship meetings.

Discussion
We describe the development of a new informatics curriculum for noninformatics clinicians at a tertiary care academic medical center as well as lessons learned from its first cohort. Unique elements of the curriculum include a strong emphasis on practical skills tailored to physicians in clinical data extraction/modeling from the EHR, availability of the curriculum across disciplines and UME and GME levels, and longitudinal mentorship. The CI-DS pathway attracted an initial cohort diverse in training level, specialty, and gender. We reflect on the strengths and weaknesses of this curriculum and summarize the lessons learned to continue improving the CI-DS pathway and to guide others in creating similar clinical informatics and data science curricula for noninformatics clinicians.
First, we found that learners reported a significant difference in their ability to interpret the results of machine learning models but did not show a significant difference in using machine learning on a clinical dataset. This is supported by elective activity completion logs, which show that learners completed supplemental data science online didactics but did not complete the SQL and machine learning exercises. This is notable given that learners at the end of the pathway reported an interest in learning more about the data science and machine learning curricula. While these survey data are limited by response rate and sampling bias, these differences may still be useful in deriving insight into the best ways to teach similar curricula to physicians. When designing the data science exercises, we followed a flipped classroom approach [16], allowing independent, adult learners to review and complete the activity at their own pace, with several interactive, explanatory examples for learners to work with in the deidentified data warehouse. However, this structure may not have been adequate for medical trainees who may not have a computer science background.
Accompanying didactics or other asynchronous audiovisual material could provide the framing and structure needed for future interactive data science curricula aimed at clinicians. This is further supported by the popularity of the UCSF Library Data Science Initiative courses, which allow learners to enroll in additional didactic material about specific data science skills. Independent learning activities like these, which rely on flipped-classroom-style homework, may also benefit from small group discussion or office hour sessions to consolidate knowledge, address questions, and clarify the material.
While EHR ticket appraisal and submission may be more useful for informaticists than noninformatics clinicians, having insight, context, and expectations around the governance process may still be useful for these learners. Among these skills, learners reported a significant difference in skill with appraising an EHR modification ticket but showed no significant difference in ability to submit their own ticket. This is further supported by elective activity completion logs, which confirm high rates of governance committee meeting attendance but limited EHR modification ticket submissions. This discrepancy may reflect the availability and frequency of EHR governance committee meetings compared with the number of opportunities for learners to submit a novel EHR modification request. Others considering similar curricula may consider inverting this idea: instead of submitting a new modification request, learners can apply their governance knowledge and skills to remove unnecessary or harmful elements in the EHR, akin to a "Clickbusters" program [17]. Such programs could allow noninformaticists to leverage their clinical domain expertise and to better understand the importance of the governance process in vetting future requests.
We also noted attrition from seminar learners (n = 19) to longitudinal learners (n = 11). As others have found that protecting research time for trainees may be correlated with academic output [18,19], one hypothesis for this discrepancy is that our trainees, who did not have protected time for this elective curriculum, may have been limited in their ability to participate because of their clinical workload.
Although some residency programs afford flexibility through X + Y systems, in which residents alternate between 4 weeks of an inpatient service and 2 to 4 weeks of an elective rotation with flexibility for protected academic time, this may not be true for all residencies or fellowships, particularly procedural subspecialties [20]. To reach these learners, others designing similar curricula for noninformaticists may need to target advertisement and outreach to medical students, residents, and fellows who have a period of protected time in their training, for example, fourth-year medical students or third-year general surgery or cardiology learners starting their research years. More broadly, this issue will require demonstrating the curriculum's value to these clinical departments and coordinating with them to facilitate participation.
Our study has several limitations. First, the small initial cohort limits comparative power. Second, given the limited response rate of the postpathway survey (95% response rate before and 45% response rate after), there may be sampling bias from respondents. Future studies could include reevaluating the CI-DS pathway or similar curricula after further iteration as well as a follow-up of learners' ultimate career decisions and involvement in informatics.

Conclusion
A novel clinical informatics and data science pathway with experiential training in SQL for clinical data extraction, machine learning models, and EHR governance, aimed at noninformatics learners across levels of training and professional discipline, can successfully recruit a diverse array of learners. Similar curricula may be important for training both a modern workforce of clinical informatics leaders and informatics-fluent noninformaticists.

Clinical Relevance Statement
Our study describes the development and implementation of a targeted informatics curriculum for noninformaticists. Key aspects of this curriculum include focused core didactics and experiential training in SQL for clinical data extraction, machine learning models, and EHR governance. This curriculum can attract a diverse group of learners, although important lessons to consider for future curricula include generating further buy-in from clinical departments for additional learner time and adding more structure for aspects of the curriculum less familiar to clinicians, such as computer science or health IT governance.

Protection of Human and Animal Subjects
The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was reviewed and exempted by the UCSF Institutional Review Board.

Fig. 1
Schema depicting core elements of CI-DS pathway. All learners participate in 1 week of core didactics. Seminar-only learners stop here, but longitudinal learners continue in the pathway for 5 to 6 months. This is followed by two required activities: an electronic health record governance activity and introductory data science exercises. Learners are then required to complete 10 to 15 other elective activities driven by personalized interests. Finally, mentorship is incorporated throughout each phase of the curriculum. CI-DS, Clinical Informatics Data Science.

Table 1
Distilled clinical informatics competencies for Clinical Informatics Data Science pathway

Table 2
Seminar lectures, foundational activities, and elective activities in the Clinical Informatics Data Science pathway
Complete SQL data science and machine learning exercises
Write Independent SQL Query for Research or Quality Improvement Project
Attend Monthly Mentorship Check-in meetings

Table 4
Learner self-assessment of skills and interest in informatics before and after Clinical Informatics Data Science pathway
a p < 0.05. b p < 0.01.

Table 5
Longitudinal pathway learner-reported elective activity completion
Abbreviation: SQL, structured query language.