A Scoping Review of Artificial Intelligence Algorithms in Clinical Decision Support Systems for Internal Medicine Subspecialties

Abstract

Objectives: Artificial intelligence (AI)-based clinical decision support systems (CDSS) have been developed to solve medical problems and enhance health care management. We aimed to review the literature to identify trends and applications of AI algorithms in CDSS for internal medicine subspecialties.

Methods: A scoping review was conducted in PubMed, IEEE Xplore, and Scopus to identify articles related to CDSS using AI algorithms based on deep learning, machine learning, and pattern recognition. This review synthesized the main purposes of CDSS, types of AI algorithms, and overall accuracy of algorithms. We searched for original research published in English between 2009 and 2019.

Results: Given the volume of articles meeting inclusion criteria, the results of 218 of the 3,467 articles were analyzed and presented in this review. These 218 articles were related to AI-based CDSS for internal medicine subspecialties: neurocritical care (n = 89), cardiovascular disease (n = 79), and medical oncology (n = 50). We found that the main purposes of CDSS were prediction (48.4%) and diagnosis (47.1%). The five most common algorithms were support vector machine (20.9%), neural network (14.6%), random forest (10.5%), deep learning (9.2%), and decision tree (8.8%). The accuracy ranges of algorithms were 61.8 to 100% in neurocritical care, 61.6 to 100% in cardiovascular disease, and 54 to 100% in medical oncology. Only 20.1% of those algorithms were explainable, that is, they provided results that humans can understand.

Conclusion: More AI algorithms are applied in CDSS and are important in improving clinical practice. Supervised learning still accounts for a majority of AI applications in internal medicine. This study identified four potential gaps: the need for AI explainability, the lack of ubiquity of CDSS, the narrow scope of target users of CDSS, and the need for AI in health care report standards.


Background and Significance

Clinical Decision Support Systems
According to the Office of the National Coordinator for Health Information Technology, "clinical decision support (CDS) provides clinicians, staff, patients, or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care." 1 CDS can be delivered through a variety of tools and systems for clinical decision-making. Examples of CDS tools include alerts, reminders, clinical guidelines, recommendations, condition-specific order sets, data reports, documentation templates, diagnostic support, and databases. 2 CDS systems (CDSS) are computerized tools to help clinicians make clinical decisions and manage information. 3 Examples of CDSS include automated laboratory alerting systems that help the user focus on key messages such as highlighting abnormal laboratory values, 4 and pharmacy information systems that provide alerts for drug allergies or interactions. 5 Advanced CDSS delivers more accurate information to clinicians, for instance, personalized drug dosage calculators, case-based recommendations, and suggestions for laboratory testing based on diseases. Because of the rapid growth of electronic health records (EHR), CDSS has been increasingly integrated into the EHR system and the existing workflow so that clinicians can efficiently receive and act on system-generated recommendations. 6 To manage a large amount of clinical data and effectively transform health care systems, artificial intelligence (AI) and machine learning (ML) have been applied to computerized CDSS. [7][8][9]

Artificial Intelligence
AI was defined in 1955 by John McCarthy as "the science and engineering of making intelligent machines," which has been designed to resolve complex challenges and hopefully someday will be as intelligent as humans. 10 The first introduction of AI in health care was in the 1970s at Stanford University, California.
Researchers there developed the MYCIN rule-based system to advise physicians regarding antimicrobial therapy. MYCIN suggested possible pathogens and recommended a dosage of antibiotics based on body weight. 11,12 ML is a subset of AI defined by Arthur Samuel as "the field of study that gives computers the ability to learn without being explicitly programmed." 13 There are four types of ML algorithms: supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning. Using data containing both inputs and target outcomes, supervised learning algorithms build a model. Conversely, unsupervised learning algorithms use data that contain only inputs to find the structure or pattern of the data. Semisupervised learning combines supervised and unsupervised learning to improve the accuracy of the model. 14 Reinforcement learning does not require input/output pairs, and it focuses on a tradeoff between exploration and exploitation. 15 ML models learn from training data to detect or predict outcomes with high accuracy. ML supports clinical work in prognosis, diagnosis, treatment, and clinical workflow. 14 For example, ML was widely used in studies predicting hospital readmission to reduce the payment for patients readmitted within 30 days of discharge. The most utilized algorithms in these studies were decision tree (DT)-based methods and support vector machine (SVM). 16 Deep learning (DL) is a subset of ML that consists of layered sets of algorithms to progressively extract higher-level features from the raw input, inspired by neural networks (NN) of the human brain. Starting with the raw data input, the representation at one layer is fed into and transformed by the next layer, which enables learning highly complex functions. 17 DL works very well at discovering complex structures in high-dimensional data in medicine.
For example, DL has been used to identify malignancy from pictures of skin lesions, 18 detect pneumonia from chest radiographs, 19,20 and diagnose diabetic retinopathy based on retinal photographs. 21 These studies demonstrate that combining advanced computational methodologies with CDSS may reduce medical errors and improve care processes. 6,[22][23][24]

Explainable Artificial Intelligence
Explainable AI (XAI) was defined by Matt Turek from the Defense Advanced Research Projects Agency XAI program.
Turek claims, "XAI proposes creating a suite of ML techniques that (1) produces more explainable models while maintaining a high level of learning performance (prediction accuracy) and (2) enables human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners." 25 Many ML algorithms cannot explain how and why a specific decision has been made. This raises the question: how can we make ML algorithms explainable? In 2018, the European Union General Data Protection Regulation discussed how to explain AI algorithms, and this discussion led to a debate among AI researchers regarding the "right to explanation." 26 The right to explanation is a right to be given an explanation for the output of the algorithm. Because the outputs of many AI algorithms, such as deep NN, are not easily explainable, XAI becomes more important and seeks to provide an explanation from AI algorithms. The explainability of AI could help to enhance medical professionals' trust in AI-based systems. 27 Thus, AI-based CDSS requires not only good performance but also explainability that is trustworthy, transparent, and interpretable. 28 To analyze the explainability of AI-based CDSS, we can consider four perspectives from a multidisciplinary approach: technological, legal, medical, and patient perspectives. 29 The technological perspective considers the explainability of the model by characteristics of the algorithm. From the legal perspective, three issues need to be considered for explainability: (1) informed consent, (2) certification and approval as medical devices from the Food and Drug Administration (FDA), 30 and (3) liability. The use of unexplainable AI algorithms in CDSS for medical purposes is regulated by the FDA; hence, regulation could affect the future trend of using XAI and AI.
From the medical perspective, AI-based CDSS can be considered at two levels of explainability: understanding the output from the system and identifying feature importance. Last, from the patient perspective, explainability can provide personalized recommendations based on the patient's characteristics and risk factors. XAI-based CDSS could enhance patient engagement and provide an accurate risk perception. 31,32
ACI Open Vol. 5 No. 2/2021 © 2021. The Author(s).

AI in Internal Medicine
In medicine, AI is widely used to understand medical conditions, to predict diagnoses, to process extensive health data, and to aid physicians in making clinical decisions. 33 Examples of the current systems include IBM's Watson Health solutions 34 for the field of Clinical Medicine and MeVis medical solutions 35 for oncological radiology. Internal medicine is a medical specialty dealing with diagnosis, treatment, and prevention of adult diseases. 36 Internal medicine specialty has 20 subspecialties 37 and has the largest number of active physicians in the United States. 38 For the field of neurovascular disorders, Murray et al 39 reviewed the literature on acute stroke diagnostic-focused AI from 2014 to 2019 using the search terms: "artificial intelligence" or "machine learning or deep learning" and "ischemic stroke" or "large vessel occlusion." A total of 20 studies were identified, and the results show that random forest (RF) learning was used for the Alberta Stroke Program Early Computerized Tomography (CT) Score. In contrast, convolutional NN were used for detecting large vessel occlusions. The authors also identified platforms, including Brainomix, General Electric, iSchemaView, and Viz.ai. The authors suggested that AI improves stroke detection; however, the standardization of performance assessment is required.
For the field of cardiovascular diseases, Kilic 40 reviewed articles related to AI, ML, and cardiovascular health care that were published up to 2019. The author categorized ML algorithms into two major types, namely, supervised and unsupervised learning algorithms. Supervised learning algorithms include naïve Bayes (NB), k-nearest neighbors, SVM, RF, extreme gradient boosting, and DT. Unsupervised learning algorithms include k-means clustering, hierarchical clustering, principal component analysis, and singular value decomposition. The author summarized the potential applications of ML in cardiovascular health care into three groups: (1) automated imaging interpretation, (2) natural language processing from EHR, and (3) predictive analytics. The author mentioned the challenges of implementing ML into clinical practice, including unexplainable results, privacy and ethical issues, validation and long-term evaluation, and the need for a large amount of data.
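The supervised/unsupervised distinction described above can be illustrated with a minimal sketch. It is not taken from any reviewed study: the one-dimensional values, the labels, and the helper names `nearest_neighbor_predict` and `two_means` are all hypothetical, and the two functions are deliberately simplified stand-ins for k-nearest neighbors (with k = 1) and k-means clustering (with k = 2).

```python
# Toy 1-D data: hypothetical measurements, not drawn from the reviewed articles.
values = [1.0, 1.2, 0.9, 4.8, 5.1, 5.3]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]


def nearest_neighbor_predict(x, train_x, train_y):
    """Supervised: needs both inputs (train_x) and target outcomes (train_y)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def two_means(xs, iterations=10):
    """Unsupervised: sees inputs only and returns two cluster centers."""
    lo, hi = min(xs), max(xs)  # initialize centers at the extremes
    for _ in range(iterations):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo = sum(near_lo) / len(near_lo)
        if near_hi:  # guard against an empty cluster when all values are equal
            hi = sum(near_hi) / len(near_hi)
    return lo, hi


print(nearest_neighbor_predict(5.0, values, labels))  # uses the labels
print(two_means(values))                              # never sees the labels
```

The supervised function cannot run without `labels`, whereas the clustering function recovers two groups from `values` alone; interpreting what those groups mean is left to the analyst.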
As an example of applied AI in the field of oncology, Jin et al 41 conducted a systematic review on AI in gastric cancer using the search terms: "artificial intelligence" and "gastric cancer," and a total of 68 studies were included. The study reported that AI was used for omic data analyses, the identification of Helicobacter pylori infection and chronic atrophic gastritis, endoscopic diagnosis for gastric cancer, invasion depth prediction, digital pathology, bleeding detection, surgery (preoperative, intraoperative, and postoperative procedures), metastases and staging prediction, and prognosis prediction. The authors also grouped the AI applications in gastric cancer mentioned above into detection, treatment, and prognosis. The authors suggested that large randomized controlled trials (RCTs) are required to validate the AI models. However, it is difficult to conduct large RCTs in the rapidly changing environment of an EHR due to costs, interoperability, quality of data, and privacy and data security considerations. [42][43][44] After reviewing several systematic reviews of AI in medicine, we concluded that AI applications in medicine could be grouped as prognosis/prediction, diagnosis/detection, treatment, and clinical workflow. Current ML implementation in clinical practice lacks the explainability of AI. Last, there is a need for standardization to validate the clinical performance of AI applications.

Objectives
Many systematic reviews have addressed AI in medicine. However, few studies have reported the frequency and explainability of AI algorithms used in CDSS. We aimed to extract key information to identify potential gaps for further study.
In this study, we conducted a scoping review of literature in the past decade to analyze the implementation of applied AI in CDSS for subspecialties in internal medicine. Subspecialties in this study refer to the additional training to "subspecialize" in additional areas of internal medicine. 37 We aimed to answer three research questions (RQs), which are:
(RQ1) What is the frequency of applications regarding purposes of CDSS among prediction, diagnosis, treatment optimization, and clinical workflow optimization?
(RQ2) What is the frequency of applications regarding AI algorithms used in CDSS?
(RQ3) What is the overall accuracy of those algorithms?

Inclusion and Exclusion Criteria
Articles were included if they met the following criteria: (1) addressed CDSS using AI algorithms; (2) the AI algorithms studied include DL, ML, or automated pattern recognition; (3) they were related to the internal medicine specialty; (4) they were published between January 1, 2009 and December 31, 2019; (5) they were published in English; and (6) were original research. We excluded articles using natural language or text processing that did not use AI algorithms. We also excluded articles proposing a new platform of CDSS without reporting results, technical reports of new algorithms without applications in medical research, and review papers.
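The criteria above can be read as a single yes/no rule applied to each candidate article. The sketch below is illustrative only: the `Article` record and its field names are hypothetical, and in the review itself these judgments were made by human reviewers rather than by code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical screening record; field names are assumptions."""
    uses_ai_cdss: bool        # criterion 1: addresses CDSS using AI algorithms
    algorithm_family: str     # criterion 2: e.g., "DL", "ML", "pattern recognition"
    internal_medicine: bool   # criterion 3
    published: date           # criterion 4
    language: str             # criterion 5
    article_type: str         # criterion 6: "original", "review", ...
    reports_results: bool     # exclusion: platform proposals without results


ELIGIBLE_FAMILIES = {"DL", "ML", "pattern recognition"}
START, END = date(2009, 1, 1), date(2019, 12, 31)


def is_included(a: Article) -> bool:
    """Return True only if every inclusion criterion holds."""
    return (
        a.uses_ai_cdss
        and a.algorithm_family in ELIGIBLE_FAMILIES
        and a.internal_medicine
        and START <= a.published <= END
        and a.language == "English"
        and a.article_type == "original"
        and a.reports_results
    )


eligible = Article(True, "ML", True, date(2015, 6, 1), "English", "original", True)
too_early = Article(True, "ML", True, date(2008, 12, 31), "English", "original", True)
review_paper = Article(True, "DL", True, date(2018, 3, 1), "English", "review", True)
print(is_included(eligible), is_included(too_early), is_included(review_paper))
```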

Search Strategy
We searched three databases, PubMed, IEEE Xplore, and Scopus, using the combination of search terms: "Clinical Decision Support Systems" AND ("Artificial Intelligence" OR "Deep Learning" OR "Machine Learning" OR "Automated Pattern Recognition") and limited results to January 1, 2009 through December 31, 2019. We included automated pattern recognition in our search terms because pattern recognition is used interchangeably with ML. 45 We limited results to 2009 through 2019 because Meaningful Use, introduced in 2009 in the United States, promoted the electronic exchange of health information via certified EHR technology. 46,47
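The Boolean structure of the search string can be made explicit with a small sketch. The helper `build_query` is hypothetical, and each database has its own field tags and date-filter syntax, which are not modeled here.

```python
def build_query(concept: str, ai_terms: list[str]) -> str:
    """Combine one required concept with a group of OR-ed AI terms."""
    ored = " OR ".join(f'"{term}"' for term in ai_terms)
    return f'"{concept}" AND ({ored})'


AI_TERMS = [
    "Artificial Intelligence",
    "Deep Learning",
    "Machine Learning",
    "Automated Pattern Recognition",
]

query = build_query("Clinical Decision Support Systems", AI_TERMS)
print(query)
```

Keeping the AI terms in a list makes it easy to see that any one of them is sufficient, while the CDSS concept is always required.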

Study Selection
First, we reviewed the literature by screening the titles and abstracts and classified each paper as relevant, not relevant, or unclear. Second, the unclear category was revisited by reading the full-text and re-categorizing it as relevant or not relevant. Third, the full-text articles were read and key information was extracted. Those articles that met the inclusion criteria were included in the final set of articles. Last, we categorized all included articles into different internal medicine subspecialties including neurocritical care, cardiovascular disease, medical oncology, infectious disease, endocrinology, diabetes, and metabolism, critical care medicine, nephrology, gastroenterology, pulmonary disease, hematology, rheumatology, allergy and immunology, and geriatric medicine. We excluded articles related to other medical areas, including anesthesiology, dermatology, emergency medicine, obstetrics and gynecology, ophthalmology, orthopedic surgery, otolaryngology-head and neck surgery, pathology, pediatrics, physical medicine and rehabilitation, preventive medicine, psychiatry and neurocritical care, radiology, surgery, thoracic surgery, urology, orthodontics, and pharmacology from our review. Disagreements on inclusion, exclusion, and information extraction were resolved by consensus-based discussion among three authors (P.N., M.S.K. and S.A.B).

Data Extraction and Analysis
Key information was extracted from all articles by P.N. (►Appendix A). The characteristics of articles included publication year, author, journal title, article title, study design (observational and experimental studies), purpose, decision, input data (type of data, number of cases, and period of study), primary algorithms, comparison methods, balancing technique, explainability, accuracy, users, and ubiquity. The primary purposes of the CDSS were categorized into four groups: prediction, diagnosis, treatment optimization, and clinical workflow optimization. 14 Explainability was determined from the included articles. If the methodology used an AI algorithm that maintained a high level of learning performance (prediction accuracy) and enabled human users to understand, appropriately trust, and effectively manage the emerging generation of AI partners, 25 we classified it as "explainable." Otherwise, it was categorized as "unexplainable." In ►Table 1, P.N. and M.S.K. categorized those AI algorithms into four types: supervised ML, semisupervised ML, unsupervised ML, and DL.

Identification of Eligible Articles
Our systematic searches identified 4,101 articles. There were 634 duplicate articles removed. The remaining 3,467 articles were screened using the inclusion criteria by titles, abstracts, and keywords. We excluded 1,973 articles based on exclusion criteria, which are articles proposing a new platform of CDSS without reporting results, technical reports of new algorithms without applications in medical research, and review papers. A full-text article assessment was conducted of 1,261 articles for eligibility. We removed 820 articles that were not related to the internal medicine specialty (►Fig. 1). Out of 441 eligible articles, we considered the top three subspecialties, which composed 49.4% of internal medicine-related articles: neurocritical care (n = 89), cardiovascular disease (n = 79), and medical oncology (n = 50) (►Table 2). A total of 218 articles for these three subspecialties were further analyzed, and information was extracted to answer our RQs.
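Several of the counts reported above follow from one another by simple arithmetic, which the short sketch below reproduces. Only figures stated in the text are used; the variable names are illustrative.

```python
identified = 4101
duplicates = 634
screened = identified - duplicates                  # articles screened by title/abstract

full_text_assessed = 1261
not_internal_medicine = 820
eligible = full_text_assessed - not_internal_medicine

top_three = {
    "neurocritical care": 89,
    "cardiovascular disease": 79,
    "medical oncology": 50,
}
analyzed = sum(top_three.values())
share = round(100 * analyzed / eligible, 1)         # percentage of eligible articles

print(screened, eligible, analyzed, share)
```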
AI algorithms applied to CDSS for subspecialties in medicine had a wide range covering supervised ML, semisupervised ML, unsupervised ML, and DL. Although DL is a part of ML, we separated DL into a specific category because we wanted to compare the prevalence of DL applications to other types of algorithms. Of the 18 AI algorithms in ►Table 3, 85.8% were supervised ML, of which 79.5% were unexplainable AI. The majority of CDSS were developed for physician use (218, 96.9%), followed by patient use (4, 1.8%) and nurse use (3, 1.3%).
The trend of using AI algorithms has been changing over time, as shown in ►Figs.

Answers to Research Questions
After synthesizing findings from 218 included articles, we attempted to answer our RQs as follows:
• RQ1: What is the frequency of applications regarding purposes of CDSS among prediction, diagnosis, treatment optimization, and clinical workflow optimization?
We grouped the purposes of CDSS into four categories: prediction, diagnosis, treatment optimization, and clinical workflow optimization. This review showed that the majority of CDSS were developed for prediction (48.4%) and diagnosis (47.1%) purposes.
• RQ2: What is the frequency of applications regarding AI algorithms used in CDSS?
There were wide ranges of AI algorithms used in medical research. After categorization, we found 18 different types of algorithms and the top five common algorithms among all subspecialties were SVM (20.9%), NN (14.6%), RF (10.5%), DL (9.2%), and DT (8.8%).
As the breadth of these data demonstrates, each model has its pros and cons and is potentially suited for different subspecialties (►Table 4). From our results, we found that SVM and NN were common among those three subspecialties. The reason could be that SVM can handle multiple-class classification and small datasets. Moreover, SVM and NN are easier to use for prediction or classification and more stable than DT. However, the results from SVM and NN can be hard to explain. We also found that DL is more prevalent in neurocritical care and medical oncology than in cardiovascular disease. After further examination of the data modalities used in the original studies, we found that, in neurocritical care, several frequently applied data types are suitable for using DL, such as intracranial electroencephalogram, 48 facial video clips, 49 electroencephalogram, 50-58 and magnetic resonance imaging (MRI). [59][60][61][62][63][64][65][66][67][68][69][70] Similarly, in medical oncology, the DL method is mostly applied to the image data. 71,72 This is reasonable as images are used more in diagnosis in these two subspecialties than cardiovascular disease, and DL is suited to the analysis of image data, such as MRI, CT, positron emission tomography scans, and ultrasound images.
• RQ3: What is the overall accuracy of those algorithms?
Accuracy is the percentage of correct predictions for the input data, calculated as the number of correct predictions divided by the total number of predictions made. Put simply, accuracy is the percentage of predictions that the model got right. 73 The accuracy of CDSS should be tested because inaccurate recommendations can endanger the safety or well-being of patients. 9 It is challenging to report the average accuracy of AI algorithms because various metrics have been used to measure accuracy in these articles. For the articles reporting accuracy scores, we found that the accuracy ranges of AI algorithms in neurocritical care, cardiovascular disease, and medical oncology were 61.8 to 100%, 61.6 to 100%, and 54 to 100%, respectively.
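The definition of accuracy given above (correct predictions divided by total predictions) can be written out as a minimal sketch. The function name and the example labels are hypothetical and are not drawn from any reviewed article.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical example: 8 cases, 6 predicted correctly.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"{accuracy(y_true, y_pred):.1%}")  # 6 of 8 correct, i.e., 75.0%
```

Because this single number ignores which kind of error was made, studies often report additional metrics alongside it, which is one reason the accuracy figures across the included articles are difficult to pool.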
Because of the inconsistency in reporting results of individual articles, it is particularly challenging to synthesize and report the results from included articles. To address this issue, Hernandez-Boussard et al 74 presented MINimum Information for Medical AI Reporting or MINIMAR to standardize the report on AI in health care. The standard report should satisfy four essential requirements: (1) study population and setting, (2) patient demographic characteristics, (3) model architecture, and (4) model evaluation. The study population and setting include population, study setting, data source, and cohort selection. The patient demographic characteristics are age, sex, race, ethnicity, and socioeconomic status. For the model architecture, researchers should report model output, target user, data splitting, gold standard, model task, model architecture, features, and missingness. The report should include optimization, internal model validation, external validation, and transparency for the model evaluation. This standard would help provide an accurate and responsible report on AI in health care.
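One way to picture the four MINIMAR requirements is as a checklist against which a report can be compared. The sketch below only restates the items listed in the paragraph above; the dictionary layout and the `missing_items` helper are assumptions made for illustration and are not part of the MINIMAR proposal itself.

```python
# Items as listed in the text; the data structure is an illustrative assumption.
MINIMAR = {
    "study population and setting": [
        "population", "study setting", "data source", "cohort selection",
    ],
    "patient demographic characteristics": [
        "age", "sex", "race", "ethnicity", "socioeconomic status",
    ],
    "model architecture": [
        "model output", "target user", "data splitting", "gold standard",
        "model task", "model architecture", "features", "missingness",
    ],
    "model evaluation": [
        "optimization", "internal model validation",
        "external validation", "transparency",
    ],
}


def missing_items(reported: set[str]) -> dict[str, list[str]]:
    """Return, per requirement, the checklist items a report did not cover."""
    gaps = {}
    for section, items in MINIMAR.items():
        absent = [item for item in items if item not in reported]
        if absent:
            gaps[section] = absent
    return gaps


# Hypothetical report that documents only a few items.
reported = {"population", "data source", "age", "sex", "model output"}
for section, absent in missing_items(reported).items():
    print(f"{section}: missing {len(absent)} item(s)")
```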

Discussion
We conducted a scoping review to find evidence of applied AI algorithms in CDSS for internal medicine subspecialties. Accordingly, our study found that neurocritical care,    [75][76][77] Cancer is a major health problem worldwide and was the second leading cause of death in the United States in 2019. 78 This review provided significant value regarding CDSS using AI algorithms in internal medicine and globally major health problems. The volume of applied AI algorithms to solve medical problems has continuously increased from 2009 to 2019, with a substantial change in 2018 and 2019. We also observed a significantly growing number of articles involving DL from 2016 to 2019.

Explainability of AI Algorithms
This review shows that most articles have used unexplainable algorithms (79.5%). The use of unexplainable AI models has been debated and discussed in many articles, with an ongoing controversy in current medical practices. We believe that in the future, researchers should move toward applying XAI algorithms, which are AI algorithms that provide results that human experts can understand. 25 AI explainability is examined primarily from a clinical point of view, highlighting the ability of humans to understand which clinical characteristics drive the prediction. This is important, as the main objective of clinical predictive modeling is the development of CDSS, assisting health professionals in their clinical decision-making, predicting diagnoses, risks, and results. 27,79 It is important to keep in mind that the requirements for CDSS go far beyond the performance of the model. 80 It is established that CDSS for the clinical environment needs to exhibit proven safety and accuracy. 80 The explainability of AI systems is crucial to understand why they do what they do, but more importantly, to understand why and when they may not do what is planned. This transparency is important in light of the growing awareness of potential biases in models that may lead to discrimination in health care. An XAI system is essential to provide (1) a safe interpretation and verification of the results acquired during development; (2) better evaluation of the safety and fairness of medical products, especially concerning bias, during the regulatory process; and (3) interpretation supported by domain knowledge, leading to increased confidence on the part of doctors, other health professionals, and patients. The explainability of AI can help to increase the confidence of medical professionals in future AI systems.

Ubiquity and Usability
We examined whether the developed CDSS were ubiquitous, i.e., whether they were made available anytime and anywhere. Some articles delivered their CDSS as software, Web-based tools, or mobile apps. These include the neuroQWERTY platform, 81 Heart Failure Manager tool, 82 Chest Pain Rule Out (CPRO) Calculator, 83 the HEARTFAID platform, 84 PaDEL-Survival, 85 OncoMortality, 86 PrediWeb, 87 and The-Optimal-Lymph-Flow (TOLF). 88 Most of the included articles did not report on model applications.
In a CDSS, the outcome of the system can be directly related to the user interface. A successful CDSS should offer clinicians an efficient user interface so that they obtain the most appropriate consultation results. Miller et al 89 described simplification as including only the elements that are most important for communication. Use of consistent terminology, concise and unambiguous language, and effective visualization improved usability and reduced information density. To improve usability, they suggested using appropriate font sizes, using meaningful colors, ensuring acceptable contrast between the text and background, and making the icons bold or larger. Space-filling techniques help to maximize the amount of information that can be displayed in the available display space. Visibility factors consider human factors and cognitive computing. A user-centered design process should also be considered during CDSS development. User-centered design aims to create the system based on user characteristics using interdisciplinary approaches of cognitive science, psychology, and computer science. 90,91 User-centered design helps identify the potential deficiencies of CDSS, such as substantial variability in the usability, efficacy, and safety of CDSS. [92][93][94]

Study Limitations
Our study has several limitations. First, we conducted a scoping review, which did not require an assessment of methodological limitations or risk of bias of the evidence 95 ; however, we collected study design (►Appendix A), which can provide the level of evidence of individual studies. Second, we excluded non-English papers, which may constitute a selection bias. Last, we limited the year of publications based on EHR implementation in the United States and associated applications of AI-based CDSS, which may lead to publication bias. However, we believe that the findings of our review were able to answer our RQs.

Conclusion
With the continued advancement of medical techniques and devices, the size, variety, and complexity of data also continue to increase. Many ML and data mining methods have been used in the medical field to help with disease diagnosis, prediction, and treatment optimization. This demonstrates that AI can provide more accurate diagnostic results. We identified four potential research gaps from this study. First, we found that only 44 (20.1%) of the included articles used XAI algorithms, a shortfall that can result in distrust from clinicians because the effectiveness and learning performance of the models are not transparent to them. We suggest future CDSS should increase the utilization of XAI algorithms, which can help to enhance trust and confidence in using the CDSS among clinicians. Second, we found that there was a lack of ubiquity among the reviewed articles. The CDSS should be available for users anytime and anywhere to make clinical decisions at the point of care; however, only 21 articles (9.6%) developed platforms (i.e., software, web-based tools, and mobile apps) that clinicians and patients can access. Most of the articles did not report the platform development or implementation. We suggest future CDSS should consider not only model performance but also improved ubiquity. Ubiquity will increase accessibility for clinicians and patients and lead to timely use of CDSS in clinical practice. Third, the majority of CDSS were developed for physician users (96.9%). Developers should consider expanding the scope of target users and enhancing engagement in shared decision-making among health care providers and patients to achieve the delivery of patient-centered care. Last, we observed a lack of a standardized reporting structure in AI-based CDSS, which resulted in inconsistent data extraction. The reviewed articles did not follow the MINIMAR standards when they reported information and failed to provide an accurate, unbiased, and meaningful report.
We suggest future articles related to AI in health care should report information following the MINIMAR standards.
Although there are many studies showing the success of using CDSS in health care management, implementation is a significant challenge because of unreliability and inability to exchange EHR data between systems, unfriendly user interfaces, limited choices of implementation and workflow, and technical issues. 96,97 Moreover, in the real world, EHR data can be inaccurate, unreliable, transformed, and insufficient. [98][99][100][101] Hence, the quality of data is an important challenge for applied AI in medicine.

Clinical Relevance Statement
This scoping review showed the trends of utilizing AI algorithms in CDSS for subspecialties in internal medicine between 2009 and 2019. The internal medicine subspecialties with the most articles related to CDSS using AI algorithms were neurocritical care, cardiovascular disease, and medical oncology. This review showed a substantial change in utilizing DL in published articles in 2018 and 2019. This review indicated four potential gaps in CDSS development: the need for AI explainability, the lack of ubiquity of CDSS, the narrow scope of target users, and the need for AI in health care report standards.

Funding
None.