Keywords
electronic health records - hemorrhage - artificial intelligence - decision support systems - natural language processing
Background and Significance
The electronic health record (EHR) contains important information on patients' medical
history.[1] It is used by medical doctors (MDs) to obtain knowledge on patient history before
and during patient contact to diagnose and guide their treatment. However, critical
information may be embedded in unstructured, narrative text,[2] from which it can be time-consuming and complex to extract.[3] Diagnoses are registered using codes from the International Classification of Diseases, 10th revision (ICD), but detailed information is lost when coding. Furthermore, ICD codes are known to be error-prone[4][5][6][7][8] and are only used for primary diagnoses leading to admission. Complications such as hemorrhage are not coded. As a result, important information can easily be missed in clinical practice and research, and there is a risk of degraded patient treatment.
Artificial intelligence (AI) methods can be used for fast, automatic identification of information in EHR text,[9][10][11][12][13][14][15][16] for example, to retrieve information from the EHR in an acute clinical setting. However, clinical evaluation of these methods is currently lacking.[17]
A patient's hemorrhage history is one piece of information that often exists only as unstructured, narrative text. Hemorrhage history is a cornerstone of hemorrhage risk assessment, as previous hemorrhage is a prominent risk factor for future hemorrhage.[18][19] It is recommended to assess hemorrhage risk in all patients admitted to hospital, and for some applications a risk score is calculated. Assessment of hemorrhage risk can also be relevant in other situations, for example prior to surgical procedures or when initiating antithrombotic or anticoagulant therapy, as the therapy itself increases hemorrhage risk.[18] Previous studies on risk scores show a lack of adherence to guidelines.[20][21][22] A main cause of nonadherence was that the MDs did not have time for the task;[21] in the acute clinical setting, they prioritized direct patient care. Automatically directing the MD to the text passages relevant for reading during chart review would greatly reduce the time needed for the task.
Objectives
The aim of this study is to evaluate whether MDs identify more relevant information on hemorrhage events in a clinical setting when assisted by the AI model, and to measure the MDs' perception of using the AI model.
Methods
Data and Study Population
The data were extracted from the electronic health care information system COSMIC
(Cambio Healthcare Systems, Søborg, Denmark) from a 5-year period at the Odense University
Hospital (OUH), Denmark. Information regarding comorbidities was based on ICD codes. ICD codes and dates of death were retrieved from administrative registers.
The AI model was developed using 900 randomly sampled EHRs with a registered ICD code
for hemorrhage to ensure representation of hemorrhages from all organs (ICD codes
in [Supplementary Appendix A], available in the online version). The data consist of 25,862 sentences labeled
as positive or negative with respect to indicating a hemorrhage event in the patient.
Part of the data was established earlier.[16] In the current study, we increased the size and quality of the dataset to improve model performance and robustness across different patient characteristics and hemorrhage locations.
After development, the AI model was evaluated in a test cohort consisting of 566 admissions.
We sampled admissions both with and without ICD codes for hemorrhage. The evaluation
data did not contain any text that had previously been seen by the AI model during
training.
Development of Artificial Intelligence Model for Hemorrhage Identification
We developed a model that can analyze Danish EHR note text and find sentences indicative
of hemorrhage.
For developing the AI model, we used the sentences from the 900 EHRs labeled as either
positive or negative for hemorrhage. The data were reviewed and labeled by MDs with
experience in hemorrhage disorders. All sentences were reviewed by one MD. All sentences
that the MD found to mention a current, prior, or possible hemorrhage event were reviewed
by three MDs. Moreover, hemorrhage events were categorized into one of 12 groups based on the anatomical location of the hemorrhage: Central nervous system (CNS), Eye, Ear–nose–throat, Airway, Gastrointestinal, Internal, Gynecological, Urological, Muscle and joint, or Dermatological. Sentences that could not be assigned to an anatomical location were categorized as Unknown. Sentences that mentioned hemorrhages at multiple anatomical locations were assigned to the category Multiple locations.
The AI model takes a sentence as input and creates a numerical representation that is used to classify the sentence as either positive or negative for hemorrhage. The sentences classified as positive are then highlighted in the text. The 25,862 annotated
sentences were split into a training (80%), validation (10%), and test set (10%).
The training set was used to train the AI model, the validation set was used to guide
and tune hyperparameters during training, and the test set was used to evaluate the
AI model. The training, validation, and test sets each contained 50% positive sentences
and 50% negative sentences. Positive sentences of the validation and test sets were
balanced using 130 sentences from each anatomical location, except for muscle and joint hemorrhage and hemorrhages from multiple locations, which had too few samples.
The test set was used to compare the performances of three models: a logistic regression
model, a gated recurrent unit (GRU) cell combined with a convolutional neural network
(CNN), and a transformer-based ELECTRA model. Further details can be found in [Supplementary Appendix B] (available in the online version).
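As a rough sketch of this setup (not the study's actual code), the snippet below performs a class-stratified 80/10/10 split and trains the simplest of the three compared models, a logistic regression classifier, on TF-IDF sentence features; the placeholder sentences and all identifiers are illustrative assumptions.

```python
# Minimal sketch of a baseline sentence classifier with an 80/10/10 split.
# Placeholder data; the study used 25,862 labeled Danish EHR sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

sentences = ["blood in urine noted"] * 10 + ["no acute findings"] * 10
labels = [1] * 10 + [0] * 10  # 1 = hemorrhage, 0 = no hemorrhage

# 80% train, 20% held out; stratification keeps the 50/50 class balance.
X_train, X_hold, y_train, y_hold = train_test_split(
    sentences, labels, test_size=0.2, stratify=labels, random_state=0)
# Split the held-out 20% evenly into validation (10%) and test (10%).
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=0)

vectorizer = TfidfVectorizer()
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(X_train), y_train)
print("validation accuracy:", model.score(vectorizer.transform(X_val), y_val))
```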
After development, we evaluated the AI model on a test cohort consisting of 566 full
admissions reviewed by the MDs. To detect potential bias and address the clinical
usefulness, we compared the performance of the AI model between selected groups: minor
versus major hemorrhage, young (<70 years) versus elderly (≥70 years), males versus
females, and between hemorrhage locations. Minor and major hemorrhage were defined as in Decousus et al.[18]
User Studies
For the user studies, we included a total of 16 MDs with 1 to 25 years of clinical
experience. The participants worked in departments of clinical biochemistry, genetics, gynecology, hematology, infectious medicine, orthopaedic surgery, pathology, radiology, and urology. Patient types and the roles of the MDs varied, but hemorrhage was relevant in all departments.
For each task, MDs were provided with an EHR text and given 10 minutes to complete the task, a moderate time constraint intended to imitate work in a clinical setting.
Participants were given a thorough description of the task before inclusion. The evaluation
was conducted in June and July 2022.
Evaluation of Manual Information Extraction with Eye Tracking
Six MDs participated in a workshop with the purpose of investigating their reading
workflow when doing manual chart review. The participants were told to extract information
regarding hemorrhage events from a fictive EHR containing 79 clinical notes. The fictive
EHR contained 21 sentences describing hemorrhage events.
We used Pupil Center Corneal Reflection[23] eye tracking to analyze their gaze during the task. The EHR was a fictive patient case written by an experienced MD and made available in the COSMIC EHR system, which was the system used in clinical practice at the time, to ensure familiarity with the user interface. We defined a mention of a hemorrhage event as having been read and
identified if the MD's gaze was fixed on it for at least 1 second.
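A minimal sketch of this criterion is shown below; it treats a mention as identified once the fixation time recorded on it totals at least 1 second. The (mention_id, duration) record format is an assumption for illustration, since eye-tracker exports vary by vendor, and whether the study summed fixations or required one continuous fixation is not specified here.

```python
# Sketch: flag hemorrhage mentions as "read and identified" from fixation data.
# Fixations are assumed to be (mention_id, duration_seconds) pairs; real
# eye-tracking exports differ, so this format is an illustrative assumption.
from collections import defaultdict

FIXATION_THRESHOLD_S = 1.0  # the 1-second criterion described above

def identified_mentions(fixations):
    """Return the ids of mentions whose accumulated fixation time is >= 1 s."""
    total = defaultdict(float)
    for mention_id, duration_s in fixations:
        total[mention_id] += duration_s
    return {m for m, t in total.items() if t >= FIXATION_THRESHOLD_S}

# Mention 3 accumulates 1.1 s over two fixations and counts as identified.
print(identified_mentions([(3, 0.6), (7, 0.4), (3, 0.5)]))  # {3}
```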
Clinical Use Study of Artificial Intelligence-Assisted Chart Review
To compare manual chart review with AI-assisted chart review, we performed a clinical use study with two admissions from the EHR system of the OUH. Thirteen MDs participated in the study. Seven participants were given admission A and six participants were given
given admission B. The two admissions (A and B) used to evaluate the MDs' chart review
performance with and without AI model assistance contained 63 and 51 hemorrhages,
respectively.
First, participants reviewed the admission without AI model assistance and then with
AI model assistance. When assisted by the AI model, sentences that mentioned hemorrhage
according to the AI model had been highlighted with a yellow background color. Between the nonassisted and assisted review, participants were given a random admission to review with AI model assistance, both to practice using the AI model and to reduce their recall of the location of hemorrhages from the first review.
The participants were informed about the performance of the AI model and given a definition
of what constituted a relevant hemorrhage event. They were told to review the admissions as they would in clinical practice and, to simulate clinical conditions, that they had a maximum of 10 minutes per admission. MDs reported hemorrhage events
by highlighting the sentence with red color.
The admissions were anonymized and presented in Microsoft Word (Microsoft, Redmond, WA) in black text, font size 10, formatted to resemble the electronic health care information system of the OUH.
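To make the presentation concrete, the sketch below shows one way model-positive sentences could be given a yellow highlight; it renders HTML purely for illustration, whereas the study applied the highlighting in Microsoft Word, and the classify function stands in for the trained AI model.

```python
# Sketch: give model-positive sentences a yellow background for review.
# `classify` is a stand-in for the trained model; HTML output is illustrative
# only, as the study presented the highlighted text in Microsoft Word.
import html

def highlight_note(sentences, classify):
    parts = []
    for sentence in sentences:
        text = html.escape(sentence)
        if classify(sentence):  # True = sentence indicates hemorrhage
            parts.append(f'<mark style="background: yellow">{text}</mark>')
        else:
            parts.append(text)
    return " ".join(parts)

# Toy keyword classifier, for demonstration only.
print(highlight_note(
    ["Patient reports melena.", "Vitals are stable."],
    classify=lambda s: "melena" in s.lower(),
))
```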
After completion, participants were asked to rate the AI model's usefulness in a clinical
setting on a 5-point scale from 1 to 5 where 5 was the most useful. Furthermore, participants
were interviewed in a semistructured format. They were for example asked to describe
advantages and disadvantages of the AI model and if they preferred chart review with
or without AI model assistance.
Statistics
For developing the AI model, we used Python 3.6 and TensorFlow 2.0.
For evaluation of the AI model, we reported descriptive statistics, sensitivity, and
specificity.
For the user studies, we reported sensitivity and participants' ratings. Means were reported with standard deviation (SD). Frequencies were reported as counts and percentages.
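For reference, the two reported metrics follow directly from confusion-matrix counts, as in the sketch below; the counts used in the example are invented for illustration.

```python
# Sketch: sensitivity and specificity from confusion-matrix counts.
def sensitivity(tp, fn):
    """Share of true hemorrhage sentences that the model flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of non-hemorrhage sentences that the model leaves unflagged."""
    return tn / (tn + fp)

# Invented counts for illustration only.
print(f"sensitivity = {sensitivity(tp=950, fn=50):.1%}")  # 95.0%
print(f"specificity = {specificity(tn=980, fp=20):.1%}")  # 98.0%
```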
Results
Artificial Intelligence Model for Hemorrhage Identification
We trained three AI models and found that a transformer-based architecture, ELECTRA,[24] performed best with a sensitivity and specificity of 95.8% on the balanced test
set. See [Supplementary Appendix B] (available in the online version) for further details regarding the model development
and hyperparameter tuning.
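For readers who want a concrete picture of how such a classifier is applied, the sketch below scores sentences with a fine-tuned ELECTRA model through the Hugging Face transformers TensorFlow API; the checkpoint path is a placeholder and the preprocessing is simplified, so this should be read as an illustration rather than the study's pipeline.

```python
# Sketch: sentence-level inference with a fine-tuned ELECTRA classifier.
# "path/to/finetuned-electra" is a placeholder; the study's checkpoint and
# exact preprocessing are not reproduced here.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-electra")
model = TFAutoModelForSequenceClassification.from_pretrained("path/to/finetuned-electra")

def classify_sentences(sentences):
    """Return a 0/1 hemorrhage prediction for each input sentence."""
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="tf")
    logits = model(**inputs).logits  # shape: (batch, 2)
    return tf.argmax(logits, axis=-1).numpy().tolist()

# Danish example sentence ("Fresh bleeding from the nose is seen.").
print(classify_sentences(["Der ses frisk blødning fra næsen."]))  # e.g. [1]
```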
Evaluation of Artificial Intelligence Model for Hemorrhage Identification
[Table 1] shows the patient characteristics of the evaluation population consisting of 566
admissions. Patients had a median age of 71 years and 47.2% were women. Admissions
had a median of 39 EHR notes, ranging from 2 to 542 notes. Essential hypertension was the most frequently coded symptom or disease, registered in 7.1% of admissions.
Table 1 Patient characteristics of the 566 admissions for evaluation

| Patient characteristic | Value |
| Admissions, n | 566 |
| EHR notes, n | 37,058 |
| EHR notes per admission, median (range) | 39 (2–542) |
| Sex, % women | 47.2% |
| Age (y), median (range) | 71 (0–97) |
| Registered diagnoses during admissions (≥2% of admissions) | Essential hypertension (7.1%); Atrial fibrillation or atrial flutter (4.2%); Pneumonia (3.5%); Chronic obstructive pulmonary disease (2.8%); Retention of urine (2.1%); Chronic kidney disease (2.1%); Anemia (2.1%) |

Abbreviations: EHR, electronic health record; ICD, International Classification of Diseases.
Notes: Registered diagnoses were based on ICD codes during admission. ICD codes for hemorrhage were omitted.
The most frequent hemorrhage location was urological, with 1,019 sentences from 143 admissions indicating hemorrhage. The least frequent was muscle and joint, with 16 sentences from nine admissions indicating hemorrhage. Patients
experienced a total of 637 hemorrhages (range: 1–7) in different anatomical locations
during 385 admissions. The patients experienced hemorrhage in a single anatomical
location during 226 admissions and in two or more anatomical locations during 159
admissions. Admissions had a median of three sentences indicating hemorrhage, ranging
from 0 to 96. The majority of admissions with sentences indicating hemorrhage in an
unknown location (43/47) had other sentences indicating hemorrhage in a specific location.
The 566 admissions included 4,413 sentences indicating hemorrhage and 421,798 sentences
not indicating hemorrhage. See [Supplementary Appendix C] for further results (available in the online version).
The AI model had a sensitivity of 93.7% and specificity of 98.1% on sentences in the
566 admissions.
[Fig. 1] shows the sentence-level sensitivity of the AI model by anatomical location on the
566 evaluation admissions. The AI model had the lowest sensitivity for dermatological
hemorrhage at 87.6%, whereas it had the highest sensitivity for muscle and joint hemorrhage
and eye hemorrhage at 100%.
Fig. 1 Sentence-level sensitivity of the artificial intelligence model by anatomical location on the 566 evaluation admissions. "Unknown" denotes sentences that could not be assigned to an anatomical location.
We compared the performance of the AI model on the 566 evaluation admissions between selected patient groups to detect potential bias. There were no major differences. The largest difference in sensitivity was between minor hemorrhage (95.0%) and major hemorrhage (91.1%). There were no noteworthy differences in sensitivity between males and females (93.6 vs. 93.8%) or young and elderly patients (93.0 vs. 94.3%).
On average, the AI model processed all sentences of an admission in approximately 0.5 seconds using an NVIDIA Tesla V100 GPU.
User Studies
Eye Tracking
The study showed that MDs overlooked 39% of possible hemorrhage events during manual
chart review. On average, MDs identified 8.5 of the 10 hemorrhage events described
in bullet-pointed text, whereas they only identified 5.3 of the 11 hemorrhage events
described in paragraphs. Moreover, MDs identified more hemorrhage events that were
described in the beginning of a note (7 out of 7) or paragraph (6.3 out of 7) compared
with hemorrhage described in the middle of a note (6.8 out of 14) or paragraph (7.5
out of 14). See [Supplementary Appendix D] for further results (available in the online version).
Clinical Use Study
[Fig. 2] shows an example of EHR text when being reviewed with AI model assistance.
Fig. 2 An example of electronic health record text when being reviewed with artificial intelligence
model assistance.
[Fig. 3] shows the performance of the participants when reviewing with and without AI model
assistance for admission A and B. All participants increased the absolute number of
identified hemorrhages when reviewing with AI model assistance.
Fig. 3 Visualization of change in identified hemorrhages between reviewing without and with artificial intelligence model assistance. A.1: absolute number of identified hemorrhages for admission A; A.2: percent identified hemorrhages in text reviewed for admission A; B.1: absolute number of identified hemorrhages for admission B; B.2: percent identified hemorrhages in text reviewed for admission B. Participants 1 to 7 reviewed admission A, and participants 8 to 13 reviewed admission B. MD, medical doctor.
For admission A, the participants identified on average 45% (SD: ± 8) of hemorrhages
on the full admission when reviewing without AI model assistance. With AI model assistance,
it increased to 93% (SD: ± 13). Without AI model assistance, participants missed 33%
(SD: ± 16) of hemorrhage sentences in the text reviewed. With AI model assistance,
participants missed no hemorrhage sentences.
For admission B, the participants identified on average 26% (SD: ± 17) of hemorrhages
on the full admission when reviewing without AI model assistance. With AI model assistance,
it increased to 75% (SD: ± 10). Without AI model assistance, participants missed 46%
(SD: ± 24) of hemorrhage sentences in the text reviewed. With AI model assistance,
participants only missed 11% (SD: ± 6) of hemorrhage sentences.
On average, participants who reviewed admission A rated the usefulness as 4.0 (SD: ± 0.9)
on a scale from 1 to 5 (additional information in [Supplementary Appendix E], available in the online version). They described the AI model as useful for fast
detection of relevant information and for sorting information. Six out of seven preferred
AI model assistance over no AI model assistance and one did not know.
Participants who reviewed admission B rated the usefulness as 3.7 (SD: ± 0.5) on a
scale from 1 to 5. They described that the AI model was useful, but that it did not
find all relevant information, leading to a risk of overlooking important information
if basing decisions solely on the AI model's findings. Five out of six preferred AI
model assistance over no AI model assistance.
In the semistructured interviews, participants stated that the AI model should save time and resources and be useful from a medical perspective. It was also important that it spared the MD a task.
Participants noted that the AI model could be of assistance in an emergency department
where MDs must retrieve information about the patient's medical history from many
different specialties. They stated that it is difficult to decide what is useful and to define search terms for specialties in which they are not well-versed, and that they would prefer the AI model to find the relevant information.
They expected that an AI model would perform better than a busy MD on the task of
extracting information from EHRs.
The participants generally expressed that the AI model's output in the form of yellow markings was an advantage. They reported that if they trusted the AI model, they would be more likely to look only at the highlighted text and thus perform chart review faster than without AI model assistance. The participants did not specify what level of performance they would require of the AI model in order to trust it.
Participants expressed concern about blindly trusting the AI model and noted that one runs the risk of losing focus in the assessment of clinical information. One participant
stated that it did not aid in the understanding of the text content. Another reported
being taught (or reminded) by the AI model that the phrase "grade 1" pertains to a scale for measuring blood in the urine and therefore indicates hematuria. Thus,
the AI model had helped with learning and understanding.
[Supplementary Appendix F] (available in the online version) contains relevant quotes from the semistructured
interviews.
Discussion
This study evaluated an AI model for finding hemorrhage events. It had a sensitivity of 93.7% and acceptable performance in relevant clinical settings. MDs missed more than 33% of relevant sentences when doing chart review without AI assistance and identified more hemorrhages when reviewing the EHR with assistance from the AI model. User satisfaction with the AI assistance was high.
This study found that the transformer-based model performed better than a logistic
regression and a GRU–CNN model. This is in line with recent developments in language
technology where the transformer-based models have achieved state-of-the-art performance
on many benchmarks.[25][26]
Automatic hemorrhage identification has been investigated in previous studies.[9][10][11][12][13][14][15][16] Pedersen et al used an AI model to classify Danish clinical text as positive or negative for hemorrhage and achieved an accuracy of 90% on a balanced test set.[16] For English EHRs, Li et al used an AI model to detect hemorrhage events in EHR sentences with an F1 score of 94%,[11] and Taggart et al detected hemorrhage events at a note level using a rule-based approach with an F1 score of 74%.[10] Mitra et al used an AI model to extract single words that indicated hemorrhage and achieved an F1 score of 75%.[13] However, most studies used small test sets during model development (<1,000 samples)[10][11][12][13][14][16] and did not investigate model performance between different hemorrhage types or patient characteristics.[9][10][11][12][13][14][16] In contrast to previous studies on hemorrhage identification, we performed an evaluation in a test cohort.
Properly investigating AI model performance is important, since studies have found that AI models for text are subject to many sources of bias, e.g., gender bias, that could influence the performance of the AI model in clinical settings.[27] In this regard, it is of clinical importance that the AI model has high performance in all settings and patient groups. In this study, we found that the AI model performed similarly on males versus females, major versus minor hemorrhage, and patients aged <70 versus ≥70 years. We also showed that the model performed similarly across hemorrhage locations. Evaluating performance across hemorrhage locations is important because the phrasing used to describe hemorrhage varies greatly depending on the location,[28] e.g., epistaxis is a word used specifically for nose hemorrhage. If the model had
not been able to detect hemorrhages for all locations, it would be reflected as bias
toward specific patient groups in clinical practice. Overall, the present study shows
that the AI model for hemorrhage identification has an acceptable performance for
clinical use.
Previous studies did not investigate MDs' performance with, or perception of using, an AI model for finding hemorrhage events in a clinical setting. Our study provides
a clinical use evaluation, which showed that important hemorrhage information can
easily be overlooked when reviewing clinical text without assistance. MDs did not
register up to 46% of hemorrhage sentences when reviewing an admission under moderate
time pressure. Although they may have reached the right conclusion about the patients' hemorrhage history, this suggests that critical information can be missed. Few hemorrhage
events were missed when assisted by an AI model for hemorrhage identification, showing
a potential role for such tools in clinical practice. Congruently, MDs were positive
toward using the AI model as decision support and stated that it was easier to review
with AI model assistance and found it useful for providing a fast overview of hemorrhage
events. The MDs did not request any explanation of the internal mechanisms of the
AI model. Instead, they reported being satisfied with the transparent output of the
model in terms of highlighting of relevant text. The characteristics of the innovation
that participants highlighted are factors that are also positively associated with
adoption of innovation. These perspectives include trialability (the ability to try
out the innovation) and observability (the ability to observe the functionality),
which were achieved in the study.[29] Ease of use is also relevant for the adoption of innovation; it concerns both the specific application itself and the context of its use. Thus, the implementation
of an innovation should not introduce an extra task but make medical decisions and
workflow more efficient. Further, users expressed a preference for reviewing text
with AI assistance, most likely because they find it easier. This is consistent with
psychological theories as automated decisions made by “system 1” require less cognitive
effort than “system 2” decisions that involve processing and judgment of information.[30] A risk of solely focusing on highlighted text is overlooking important information.
However, there is also potential for learning by highlighting text that a health care
professional may not have realized was relevant to the topic, as also demonstrated
in this study.
An AI model that highlights sentences helps MDs capture the relevant
information. However, it still requires cognitive effort to condense and evaluate
the highlighted content, and the procedure does not eliminate manual work. On the
other hand, there is no loss of information, and the approach ensures that the data
can be processed in various contexts since nothing has been filtered out. In specific
clinical scenarios, such as when the MD is interested in a specific previous hemorrhage,
all the output from the AI model might not have to be reviewed, which would reduce
manual workload.
Limitations
The generalizability of the study is limited in that the data were sampled using ICD codes, and therefore are not indicative of a typical distribution of patients at the hospital, and in that all data came from a single hospital.
The user studies may not fully reflect the potential value of the AI model as it was
tested in a mock-up, and thus, the setting may have influenced the exact results in
terms of number of hemorrhages identified and text reviewed. This study evaluated
the AI model by having the participants review an admission with and without AI assistance.
While this methodology provides a straightforward means of comparing manual chart
review with and without AI assistance, it has a disadvantage regarding potential recall
bias if participants were able to remember the exact position of hemorrhage events
in the admission. We mitigated this by using long admissions with many scattered sentences
indicative of bleeding and by having participants review a random admission in between
the nonassisted and assisted review. The time constraint is also expected to have reduced their recollection of the text. An alternative approach to accurately represent
a clinical scenario could be to ask MDs targeted questions regarding hemorrhage events
and compare the performances of two groups of MDs, one group utilizing AI assistance
and the other not. Nonetheless, the user ratings and the results clearly indicate a positive, clinically relevant effect.
For the eye-tracking study, we defined a hemorrhage event as being identified if the
MDs' gaze remained fixed on the event for at least 1 second. Future work should conduct
a sensitivity analysis of this time threshold to assess its impact.
Conclusion
We developed an AI model for hemorrhage identification that correctly identifies 93.7%
of sentences indicating hemorrhage in an evaluation on a test cohort. Moreover, we
found that MDs identified more hemorrhages during chart review when assisted by the
AI model that highlights sentences with hemorrhage, compared with manual extraction, where MDs missed more than 33% of relevant sentences. MDs were positive toward using
an AI model for hemorrhage identification in clinical practice. Overall, the study
shows that the technology is clinically useful for information extraction from EHRs.
Clinical Relevance Statement
We developed an AI model for hemorrhage identification that can support MDs during chart review. The implications are a less time-consuming chart review in which MDs find more hemorrhages, improving patient treatment.
Multiple-Choice Questions
1. What advantages are there to AI-assisted chart review for hemorrhage events?

a. An AI removes the need for manual chart review
b. MDs find more relevant information when assisted
c. MDs have more time to do chart review
d. People without medical knowledge can perform the chart review

Correct Answer: The correct answer is option b. When assisted by an AI, the MDs found more relevant information in the form of hemorrhage events than when doing nonassisted manual chart review.
2. Which opinion did some MDs express about AI-assisted chart review?

a. Assisted chart review will lead to worse patient treatment
b. Assisted chart review is confusing to the patients
c. Some MDs are not comfortable with the technology
d. Blindly trusting the AI assistance is a concern

Correct Answer: The correct answer is option d. Some MDs stated concerns about blindly trusting the AI and that one runs the risk of losing focus in assessment of clinical information. However, there is also potential for learning by highlighting text that a health care professional may not have realized was relevant to the topic.