Open Access
CC BY 4.0 · Libyan International Medical University Journal
DOI: 10.1055/s-0045-1814772
Review Article

Systematic Review of Digital Innovations in Surgery: Machine Learning Applications and Implementation Guidelines

Authors

  • Naseralla Juma Elsaadi Suliman

    1   Benghazi Medical Center, University of Benghazi, Benghazi, Libya
  • Ateia Hussain Ateia Gaber

    2   The 7th October Hospital, Faculty of Medicine, University of Benghazi, Benghazi, Libya
  • Nagat Mohamed Abougila Milad

    1   Benghazi Medical Center, University of Benghazi, Benghazi, Libya

Funding None.
 


Graphical Abstract

Abstract

Digital technologies, particularly machine learning (ML), are increasingly integrated into contemporary surgical practice, though implementation barriers remain. This systematic review examined the current evidence on digital innovations in surgery, focusing on ML applications, implementation challenges, and evidence-based implementation strategies. A comprehensive search of seven electronic databases identified 87 studies published between January 2018 and December 2024, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework. Evidence synthesis encompassed six domains: artificial intelligence and ML applications, extended reality (XR) technologies, clinician-led innovation, sustainable surgical practices, specialized training models, and nontechnical skills development. ML models demonstrated improved performance in preoperative risk stratification compared with conventional statistical approaches. Receiver operating characteristic analysis showed ML models, including deep neural networks (area under the curve [AUC] ≈ 0.82–0.96), random forests (AUC ≈ 0.86–0.93), and support vector machines (AUC ≈ 0.86), outperformed traditional logistic regression (AUC ≈ 0.68–0.74) for predicting postoperative complications. Computer vision algorithms improved procedural precision, while XR technologies (virtual reality/augmented reality) enhanced surgical training, showing skill acquisition comparable or superior to traditional methods. However, substantial implementation barriers were identified, including algorithmic bias, insufficient training in digital competencies, regulatory constraints, and documented concerns regarding bias in nonrandomized studies of XR technologies. Environmental impact assessments revealed that telemedicine applications reduced carbon emissions, whereas robotic surgical systems demonstrated higher resource consumption. 
The successful integration of digital innovations requires a phased implementation approach, multidisciplinary collaboration, comprehensive competency development, and systematic evaluation of both clinical and operational outcomes. This review provides recommendations for translating digital innovations, addressing technical, regulatory, and human factors.


Introduction

Historical Context and Technological Evolution in Surgery

The evolution of surgical practice has long been closely tied to technological advancements, from the introduction of anesthesia and antisepsis to modern robotic techniques. The surgical field is currently experiencing the integration of digital technologies, particularly artificial intelligence (AI) and machine learning (ML), across the entire surgical pathway. This digital integration has the potential to enhance diagnostic accuracy, refine surgical precision, improve resource management, and contribute to improved patient outcomes.[1] Its current pace is driven by the convergence of computational power, large data sets, and algorithmic advances.


Machine Learning in Surgical Practice

ML, a critical subset of AI in which algorithms learn from data to make predictions or decisions, has demonstrated measurable potential in surgical applications.[2] [3] ML applications span the entire surgical pathway, including preoperative planning, intraoperative guidance, and postoperative monitoring. This systematic application supports its utility in clinical decision-making and procedural optimization.


Extended Reality Technologies in Surgery

Extended reality (XR) technologies represent an umbrella term for immersive technologies that create novel opportunities for surgical training, preoperative planning, and intraoperative guidance.[4] This spectrum includes virtual reality (VR), which creates a fully simulated environment; augmented reality (AR), which overlays digital information; and mixed reality (MR), enabling real-time physical–digital interaction. The integration of these technologies into surgical education and practice has demonstrated potential for enhancing procedural training and intraoperative support.[5] [6]


Implementation Challenges and Research Gaps

Despite documented potential, integrating digital innovations into surgical practice faces challenges spanning technical limitations, governance, ethical uncertainties, and requisite training.[7] Furthermore, the rapid pace of technological change frequently outpaces the generation of robust, high-quality evidence needed to inform clinical implementation and evidence-based policymaking.


Objectives and Research Questions

This systematic review assesses the current evidence on digital innovations in surgery by synthesizing findings across six key domains: AI/ML, XR technologies, clinician-led innovation, sustainable surgical pathways, specialized training, and nontechnical skills (NTS). The review addresses three critical research questions: evaluating the current landscape of digital innovations, specifically ML, across the surgical pathway; identifying the prevailing implementation challenges; and developing evidence-based guidelines for effective, safe, and long-term sustainable integration.



Methods

Research Strategy

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines.[8] The focus is on digital innovations in surgery, specifically ML, to develop clinically actionable implementation guidelines.

Search Strategy Resources

We searched seven electronic databases (PubMed/MEDLINE, Embase, Google Scholar, Cochrane Library, Web of Science, IEEE Xplore, and ACM Digital Library) to combine comprehensive biomedical coverage with targeted capture of engineering and computer-science literature on AI, ML, and XR. Scopus and ScienceDirect were excluded due to substantial content overlap with the selected sources, allowing for focused retrieval of interdisciplinary and technical studies most relevant to implementation.

The search strategy integrated three core conceptual domains: (1) surgery and surgical procedures, (2) digital innovations and technologies, and (3) applications and outcomes. The Boolean search string employed was: (surgery OR surgical OR surgeon OR operative OR operation OR “minimally invasive surgery” OR “robotic surgery”) AND (digital OR technology OR innovation OR “artificial intelligence” OR “machine learning” OR “deep learning” OR “neural network” OR “computer vision” OR “extended reality” OR “virtual reality” OR “augmented reality” OR “mixed reality” OR robotics OR “digital health” OR telemedicine OR “digital medicine”) AND (training OR education OR planning OR “decision support” OR outcome OR performance OR safety OR efficiency OR precision OR accuracy OR sustainability OR fellowship OR entrepreneurship OR “non-technical skills”).
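The three-domain structure of the search string can be sketched in code. The following is a minimal illustration, not the tooling used for the review: the helper names (or_block, build_query) are hypothetical, the term lists are copied from the search string above, and straight quotation marks are assumed for phrase searching.

```python
# Sketch: compose the Boolean query from the three conceptual domains.
# Helper names are illustrative; term lists are taken from the search string above.

SURGERY_TERMS = [
    "surgery", "surgical", "surgeon", "operative", "operation",
    "minimally invasive surgery", "robotic surgery",
]
TECHNOLOGY_TERMS = [
    "digital", "technology", "innovation", "artificial intelligence",
    "machine learning", "deep learning", "neural network", "computer vision",
    "extended reality", "virtual reality", "augmented reality", "mixed reality",
    "robotics", "digital health", "telemedicine", "digital medicine",
]
OUTCOME_TERMS = [
    "training", "education", "planning", "decision support", "outcome",
    "performance", "safety", "efficiency", "precision", "accuracy",
    "sustainability", "fellowship", "entrepreneurship", "non-technical skills",
]


def or_block(terms):
    """Join the terms of one domain with OR, quoting multiword phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*domains):
    """Combine the per-domain OR blocks with AND."""
    return " AND ".join(or_block(d) for d in domains)


query = build_query(SURGERY_TERMS, TECHNOLOGY_TERMS, OUTCOME_TERMS)
print(query)
```

Keeping each domain as a separate list makes it straightforward to adapt the string to database-specific syntax, which differs between sources such as PubMed and IEEE Xplore.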

To reflect the rapidly evolving nature of this field, the search was restricted to English-language publications from January 2018 to December 2024.



Inclusion and Exclusion Criteria

Inclusion Criteria

Studies were included if they met all of the following criteria:

Study type: Original research articles, systematic reviews, meta-analyses, and high-quality narrative reviews. Scope: Focus on digital innovations in surgical practice, training, or education. Technologies: Investigation of ML, AI, XR, or related digital technologies within a surgical context. Outcomes: Reporting of outcomes of surgical performance, patient outcomes, training effectiveness, or implementation challenges.


Exclusion Criteria

Studies were excluded for the following reasons:

Publication type: Conference abstracts, letters, editorials, and opinion pieces without original data. Relevance: Focus on basic technical aspects without clear surgical application or clinical translation. Methodological rigor: Insufficient methodological detail for critical appraisal. Duplication: Duplicate publications or multiple reports of the same study. Scope: Primary focus on nonsurgical medical applications.

The eligibility criteria were intentionally focused on high-volume regions but were otherwise applied without geographical restriction, based solely on methodological stringency and relevance to the research questions; this focus defines the scope of exclusion for other areas.



Study Selection Process

A two-stage screening process was employed, wherein two independent reviewers first assessed all titles/abstracts and subsequently evaluated the full-text articles of potentially eligible studies. The PRISMA flow diagram illustrates this sequential process; following screening of 1,247 identified records, 87 studies met the eligibility criteria for the qualitative synthesis.


Data Extraction

A standardized data extraction form was used to systematically collect information from the included studies. The area under the curve (AUC) values and other predictive performance metrics for various ML models (e.g., deep neural networks [DNNs], random forest [RF], support vector machines) were obtained through systematic extraction from the included primary research articles, and were not generated by code-based ML models, online servers, or other computational methods. The extracted data encompassed: Study characteristics [author(s), year of publication, country, study design, sample size, and participant characteristics]; technology characteristics [type of digital innovation, technical specifications, and implementation details]; application context [surgical specialty, stage of surgical pathway, and purpose]; outcomes [primary and secondary outcomes, assessment methods, and statistical analyses]; and implementation factors [barriers and facilitators, cost implications, and training requirements].


Risk of Bias Assessment

Bias risk across all included studies was assessed using validated, design-specific tools. The Cochrane Risk of Bias 2.0 (RoB 2) tool was applied to randomized controlled trials (RCTs).[9] Nonrandomized intervention studies were evaluated using ROBINS-I, which assesses internal validity across seven domains, including confounding and deviations from intended interventions.[10] Qualitative studies were systematically appraised with the Critical Appraisal Skills Program (CASP) Qualitative Checklist, evaluating methodological rigor across 10 domains such as data analysis and ethical considerations. Systematic reviews underwent independent evaluation using AMSTAR-2, which examines methodological quality across 16 domains, including protocol registration and bias assessment.[11] This comprehensive multitool approach ensured rigorous quality evaluation of primary studies and secondary reviews, strengthening confidence in the synthesized evidence.
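The design-specific assignment of appraisal tools described above can be expressed as a simple lookup. This is an illustrative sketch only: the design labels and the function name select_tool are hypothetical, and the domain counts are those stated in the text (none is stated for RoB 2).

```python
# Sketch: map each study design to the appraisal tool named in the text.
# Keys and function name are illustrative labels, not part of the review protocol.

ROB_TOOLS = {
    "rct": {"tool": "RoB 2", "domains": None},  # domain count not stated in the text
    "nonrandomized": {"tool": "ROBINS-I", "domains": 7},
    "qualitative": {"tool": "CASP Qualitative Checklist", "domains": 10},
    "systematic_review": {"tool": "AMSTAR-2", "domains": 16},
}


def select_tool(design):
    """Return the name of the appraisal tool for a given study design label."""
    if design not in ROB_TOOLS:
        raise ValueError(f"no appraisal tool defined for design: {design!r}")
    return ROB_TOOLS[design]["tool"]


print(select_tool("nonrandomized"))  # ROBINS-I
```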


Data Synthesis

Given the substantial heterogeneity in study designs, interventions, comparators, and outcome measures, a narrative synthesis approach was adopted as the primary analytical strategy. The synthesis was structured around the six prespecified domains listed in the Objectives and Research Questions subsection. Within each domain, common themes, patterns, and trends were identified and mapped; contradictory findings were highlighted where present; and gaps in the evidence base were documented.


Ethical Approval Statement

This systematic review, not involving primary human data collection, did not require institutional ethical approval. The review adhered to ethical principles of transparency, integrity, and methodological rigor; the ethical standards of all included studies were assessed. Author contributions conformed to the CRediT taxonomy, as described in the “References” section.



Results

Overview of Included Studies and Study Quality

The systematic search identified 1,247 records, from which 87 studies met the eligibility criteria for final synthesis ([Fig. 1]). After removal of 247 duplicate records, 1,000 unique records underwent title and abstract screening, during which 800 records were excluded, primarily comprising conference abstracts, letters, or studies with a nonsurgical focus. Subsequently, 200 articles underwent full-text assessment, resulting in the exclusion of 113 records. These exclusions were primarily due to nonsurgical applications, insufficient methodological details, basic technical focus, or duplication.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram detailing the systematic search and selection process of studies included in the qualitative synthesis. Data were derived from the systematic search conducted across seven electronic databases (PubMed/MEDLINE, Embase, Web of Science, IEEE Xplore Digital Library, ACM Digital Library, Cochrane Library, and Google Scholar) and generated using Microsoft PowerPoint 365, adhering to the PRISMA 2020 guidelines. The interpretation: The diagram details the flow of information through the different phases of the systematic review. A total of 1,247 records were identified through database searches, leading to 87 studies included in the final qualitative synthesis. Following the removal of 247 duplicate records, 1,000 unique records underwent title and abstract screening. Of these, 800 records were excluded at the screening phase, primarily due to being nonpeer-reviewed formats (e.g., conference abstracts, n = 500) or having a nonsurgical focus (n = 300). The remaining 200 full-text articles were assessed for eligibility, resulting in the exclusion of 113 articles for the following reasons: nonsurgical applications (n = 40), insufficient methodological detail (n = 35), basic technical focus without clinical application (n = 25), and duplicate publications (n = 13). The final cohort of 87 included studies represents peer-reviewed, full-text publications with adequate methodological rigor and direct relevance to digital innovations in surgical practice.
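The record counts reported in the PRISMA flow can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the Fig. 1 caption; the variable names are illustrative.

```python
# Sketch: verify that the PRISMA flow counts in Fig. 1 are internally consistent.

identified = 1247
duplicates_removed = 247
screened = identified - duplicates_removed

excluded_at_screening = {
    "nonpeer-reviewed formats": 500,
    "nonsurgical focus": 300,
}
full_text_assessed = screened - sum(excluded_at_screening.values())

excluded_at_full_text = {
    "nonsurgical applications": 40,
    "insufficient methodological detail": 35,
    "basic technical focus without clinical application": 25,
    "duplicate publications": 13,
}
included = full_text_assessed - sum(excluded_at_full_text.values())

print(screened, full_text_assessed, included)  # 1000 200 87
```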

The studies examined various digital innovations in surgery, with AI and ML being the most common (42%), followed by XR technologies (28%), robotic systems (18%), and other digital health tools (12%). Most research came from high-income countries, particularly the United States (n = 34), United Kingdom (n = 19), and Germany (n = 11). The work covered multiple surgical specialties—general surgery, orthopaedics, neurosurgery, urology, and cardiothoracic surgery—as detailed in [Table 1].

Table 1

Overview of key characteristics of the 87 studies included in the systematic review (January 2018 to December 2024)

Surgical specialty | Number of studies (%) | Primary focus areas
General surgery | 24 (27.6) | Risk prediction, operative video analysis, training
Orthopaedics | 18 (20.7) | Preoperative planning, AR guidance, and rehabilitation
Neurosurgery | 14 (16.1) | Imaging analysis, navigation, simulation
Urology | 12 (13.8) | Robotic surgery, performance metrics, training
Cardiothoracic | 9 (10.3) | Decision support, outcome prediction, planning
Other specialties | 10 (11.5) | Various applications

Abbreviation: AR, augmented reality.


Note: The data represent the categorization of the 87 included studies based on the primary surgical specialty and the focus area of the digital innovation investigated. The percentages reflect the proportion of the total included studies.


Data source: Categorization and counts derived from the study characteristics extraction table in Excel.
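The percentages in Table 1 follow directly from the study counts. A short sketch, using only the counts reported in the table, reproduces them:

```python
# Sketch: recompute the Table 1 percentages from the reported study counts.

counts = {
    "General surgery": 24,
    "Orthopaedics": 18,
    "Neurosurgery": 14,
    "Urology": 12,
    "Cardiothoracic": 9,
    "Other specialties": 10,
}

total = sum(counts.values())
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct})")
```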



Digital Innovations Across the Surgical Pathway

Artificial Intelligence and Machine Learning Applications

ML applications, as a critical subset of AI, are integrated across the surgical pathway, with performance metrics extracted directly from the published results of the included primary studies. For diagnostic and prognostic models, the discriminative ability was quantified using the AUC values, while model accuracy was reported via metrics such as sensitivity, specificity, and Dice scores. These metrics, derived from receiver operating characteristic analysis and segmentation comparisons, respectively, demonstrate the measurable capability of ML to augment surgical decision-making and performance ([Fig. 2]).

Fig. 2 (Machine learning [ML] model comparison) A comparison of predictive performance between machine learning models and traditional statistical approaches, showing receiver operating characteristic (ROC) curves with area under the curve (AUC) annotations for exemplar tasks (preoperative risk prediction, complication detection). Data source: Study-level AUCs and validation type extracted to an Excel spreadsheet from included articles. Figure generation: ROC curves plotted and annotated in Python (matplotlib, seaborn); AUC summary table produced in Excel and imported to the plotting script. Note: This figure shows ROC curves per model family, comparing the predictive performance of various ML algorithms with annotated AUC values (deep neural networks [DNN AUC ≈ 0.82–0.96], random forests [RF AUC ≈ 0.86–0.93], and support vector machines [SVM AUC ≈ 0.86]) against traditional logistic regression [LR AUC ≈ 0.68–0.74] for predicting postoperative complications. The ML models consistently demonstrate higher AUC values (0.82–0.96) than logistic regression (0.68–0.74).
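For readers less familiar with the metric, the AUC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient with a complication than to a randomly chosen patient without one. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are invented for illustration; they are not data from the included studies, whose AUCs were extracted from publications rather than computed.

```python
# Sketch: AUC by pairwise comparison (equivalent to the Mann-Whitney U statistic).
# All data below are toy values for illustration only.

def roc_auc(labels, scores):
    """Probability that a random positive case outscores a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 = complication occurred, 0 = no complication (toy labels)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # better-separated scores
model_b = [0.2, 0.5, 0.6, 0.7, 0.3, 0.4, 0.8, 0.9]  # more overlap between classes

print(roc_auc(labels, model_a))  # 0.9375
print(roc_auc(labels, model_b))  # 0.625
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the ranges in Fig. 2 should be read.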

Extended Reality Technologies

XR technologies, encompassing VR, AR, and MR, show measurable efficacy and significant potential in surgical education and planning. Studies comparing XR platforms to traditional training methods indicated equivalent or improved skill acquisition, particularly for procedures requiring advanced three-dimensional (3D) visualization ([Table 2]).[12] [13] For preoperative planning, XR facilitates interactive exploration of patient-specific 3D anatomical models, which has been associated with reduced operative times and enhanced team communication. Intraoperatively, MR overlays imaging data onto the surgical field, contributing to improved navigational accuracy and a decrease in procedural errors.[14] XR technologies, including immersive VR simulation, have demonstrated established acceptance and effectiveness in enhancing surgical education and training, as confirmed by comprehensive systematic and umbrella reviews of the field.[15] [16] [17]

Table 2

Comparative effectiveness of extended reality (XR)-based versus traditional surgical training methods

Study | Procedure type | Training method | Performance metrics | Key findings
Huber et al[15] | Laparoscopic cholecystectomy | VR simulation vs. box trainer | Time, errors, motion efficiency | VR group showed 24% fewer errors and 18% better motion efficiency
Co et al[16] | Basic laparoscopic skills | VR simulation vs. standard training | OSATS scores, time to completion | VR group achieved proficiency in 30% less time
Toni et al[17] | Anatomical identification | AR models vs. textbook learning | Knowledge retention, spatial understanding | AR group showed 28% better retention at 6 weeks

Abbreviations: AR, augmented reality; OSATS, Objective Structured Assessment of Technical Skills; VR, virtual reality.


Notes: The table synthesizes key findings from selected studies comparing XR-based training modalities (VR, AR) against traditional training methods. The data highlights differences in performance metrics such as time, errors, and skill retention.


Data source: Extracted outcome measures and summary statistics from primary studies recorded in Excel.



Sustainable Surgical Pathways

Quantitative analysis demonstrated that digital interventions significantly reduced the environmental impact of surgical care by targeting key pathway elements.[18] [19] Specifically, telemedicine for preoperative assessments and postoperative follow-ups reduced travel-related carbon emissions. This was achieved while maintaining high patient satisfaction and clinical outcomes comparable to traditional care ([Fig. 3]). Furthermore, digital therapeutics effectively managed conditions that might otherwise necessitate surgery, consequently lowering surgical demand and resource utilization.[20] Conversely, studies confirmed that robotic surgery is consistently more resource-intensive, with a larger carbon footprint, than laparoscopic approaches.[21] [22]

Fig. 3 (Carbon footprint reduction) A visualization of potential carbon emission reductions achievable through various digital interventions across the surgical pathway. The bar chart estimates carbon emission reductions from digital interventions across the surgical pathway. Data source: Emission estimates and intervention effect sizes abstracted from included studies and supplemented by published lifecycle assessments; raw values compiled in Excel. Note: Telemedicine for preoperative assessment results in a 40 to 60% reduction in travel-related emissions; artificial intelligence (AI)-optimized scheduling demonstrates a 15 to 25% reduction in resource waste; digital therapeutics for prevention indicate a 20 to 30% reduction in surgical demand; and remote monitoring for follow-up shows a 35 to 45% reduction in unnecessary visits.


Implementation and Curriculum Development

Clinical entrepreneurship emerged as a driver for digital innovation, leveraging unique clinician insight into unmet needs.[23] These clinician-led initiatives demonstrated improvements in operational efficiency and cost reduction. They also enhanced patient satisfaction by streamlining preoperative assessment. Furthermore, these digital efforts augmented intraoperative decision-making and refined postoperative monitoring. Workforce development programs successfully equipped professionals with the necessary business and translational skills.


Specialized Training and Nontechnical Skills

The rapid integration of digital tools has created a notable gap in the skills provided by traditional surgical curricula, which currently offer minimal instruction in AI, ML, and XR. Specialized fellowships in robotics, data science, and AI have emerged to address this deficit, yielding demonstrable improvements in technical ability and confidence among participants.[22]

Human factors and NTS remain important in digitally enhanced surgical settings.[19] Although digital tools offer benefits, they can elevate cognitive load. This increased load necessitates vigilance against new error pathways. Therefore, resilient team structures and robust escalation protocols are necessary to ensure safe technology use. Synthesis of this evidence yielded key guidelines for optimal implementation, including specific considerations for low-resource settings (LRS) ([Table 3]).

Table 3

Evidence-based guidelines for implementing digital innovations in surgery

Implementation domain | Key recommendations | Evidence strength
Strategic approach[*] | Adopt phased implementation beginning with low-risk applications | Strong
Stakeholder engagement | Ensure multidisciplinary collaboration, including surgeons, technologists, and administrators | Strong
Training | Develop comprehensive programs covering technical operation and underlying principles | Moderate
Evaluation | Implement systematic assessment of clinical and operational impacts | Moderate
Workflow integration | Design technologies to seamlessly integrate with existing processes | Strong
Ethical considerations | Proactively address data privacy, informed consent, and algorithmic transparency | Strong
Sustainability | Balance technological advancement with environmental impact | Emerging

* Prioritize low-cost, high-impact digital tools (e.g., telemedicine, diagnostic aids on mobile devices) over capital-intensive technologies like surgical robotics to ensure sustainability and equitable access.


Data source: Recommendations synthesized from narrative synthesis and recorded in an Excel evidence matrix linking recommendations to source studies.


Assessment of methodological quality revealed distinct variations across different technology types. Studies focusing on robotics and ML/AI generally presented a lower overall risk of bias (RoB). Conversely, many XR studies were flagged with “some concerns,” primarily due to nonrandomized designs and the difficulty of blinding participants and assessors in intervention studies involving immersive technology.[24] While [Fig. 4] demonstrates efficacy and methodological caveats in XR and ML for surgical training, a comprehensive breakdown of the RoB based on the specific assessment tool used is provided in [Fig. 5].

Fig. 4 Efficacy and methodological caveats in extended reality (XR) and machine learning (ML) for surgical training. XR technologies offer equivalent or superior surgical skill acquisition, though evidence maturity is limited by methodological bias in nonrandomized studies. Conversely, ML models demonstrate superior predictive performance, with risk stratification reaching area under the curve (AUC) 0.96 (vs. logistic regression [LR] AUC 0.68) and segmentation accuracy at Dice similarity coefficient (DSC) > 0.85. However, ML implementation is hindered by algorithmic bias and complex regulatory hurdles. These findings underscore the need for higher-quality randomized controlled trials for XR technologies and robust governance frameworks for ML implementation in surgical practice.
Fig. 5 (Bar chart) Risk of bias (RoB) distribution by assessment tool. The graph breaks down the risk of bias by the specific tool used for assessment, which is crucial for understanding the context of the judgments. Data source: Counts aggregated from the same Excel extraction sheet. The chart displays the number of studies categorized as low (green), some concerns (yellow), or high (red) risk of bias for each research tool: 1. RoB 2 (randomized controlled trials [RCTs]): Assesses randomized controlled trials. 2. ROBINS-I (non-RCTs): Assesses nonrandomized studies. 3. Critical Appraisal Skills Program (CASP) (qualitative): Assesses qualitative research. 4. AMSTAR-2 (reviews): Assesses systematic reviews. Interpretation: This shows that the highest risk of bias and “concerns” are associated with nonrandomized studies (assessed by ROBINS-I), which is a common finding. The reviews (AMSTAR-2) and RCTs (RoB 2) generally fared better, though the number of these high-quality studies was lower.


Discussion

This systematic review highlights an increasingly integrated approach in modern surgical practice, which is primarily driven by digital innovations and their core ML applications.

Machine Learning Applications and Clinical Integration

Economic and Implementation Implications of ML

ML models demonstrate strong technical efficacy in surgical risk stratification, evidenced by AUC values frequently exceeding 0.90 for high-stakes tasks ([Fig. 2]).[25] However, transitioning these prototypes to routine clinical use requires demonstrating long-term cost-effectiveness alongside accuracy. High-performance metrics suggest that economic benefits are primarily realized through the downstream effects of improved predictive capabilities. Specifically, robust discriminatory power translates to reduced preventable complications, shorter hospital stays, and more efficient management of high-cost resources like intensive care unit beds. This perspective aligns with health economics literature, suggesting the value of surgical AI lies in optimizing the entire surgical pathway. Ultimately, the focus must expand beyond technical performance to ensure sustainable financial viability.


Superior Predictive Performance and Technical Challenges

ML's dominance in the literature (42%) reflects research prioritizing technical development over implementation science.[2] [3] Deep learning models demonstrate substantial gains in risk stratification, achieving AUC values of up to 0.96 versus approximately 0.68 for traditional methods.[7] For image-based tasks, convolutional neural networks (CNNs) consistently reach Dice similarity coefficients (DSCs) exceeding 0.85, confirming their effectiveness in tissue segmentation and phase recognition. These models excel by processing high-dimensional data through multilayered transformations, identifying subtle patterns that augment, rather than replace, surgical expertise in risk assessment and intraoperative guidance.
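The Dice similarity coefficient cited above measures the overlap between a predicted segmentation and a reference annotation as twice the shared area divided by the sum of the two areas. A minimal sketch follows, assuming binary masks flattened into equal-length sequences of 0 and 1. The function name, the empty-mask convention, and the toy masks are illustrative assumptions, not data from the included studies.

```python
# Sketch: Dice similarity coefficient for two binary masks.
# DSC = 2 * |A ∩ B| / (|A| + |B|); toy masks below are for illustration only.

def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two equally sized binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return 2 * intersection / total


predicted_mask = [1, 1, 1, 0, 0, 1, 0, 0]
reference_mask = [1, 1, 0, 0, 0, 1, 1, 0]

print(dice_coefficient(predicted_mask, reference_mask))  # 0.75
```

In this toy case three pixels overlap and each mask contains four foreground pixels, giving 2 × 3 / 8 = 0.75, below the 0.85 level reported for CNNs in the reviewed studies.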

However, complex algorithms like DNNs raise significant interpretability and transparency concerns. Our synthesis connects technical success to persistent algorithmic bias, which reduces accuracy for underrepresented populations and complicates regulatory approval.[6] While technical efficacy is proven, the critical bottleneck remains bridging high-performing prototypes to equitable clinical deployment. This necessitates shifting research focus from efficacy to effectiveness, as the World Health Organization advocates.[8]



Extended Reality in Surgical Training and Practice

Efficacy and Methodological Caveats in XR Studies

XR technologies (VR, AR, and MR) are measurably influencing surgical education and practice.[18] [23] Studies demonstrate XR's efficacy in surgical training, showing skill acquisition equivalent to traditional methods, with improved motion efficiency and reduced errors in VR simulation, supporting established construct validity ([Fig. 4]).[16] [17] [23] However, interpreting these findings requires caution, as bias assessments frequently identified performance and detection bias due to nonrandomized, proof-of-concept study designs.

Critical evaluation through RoB analysis reveals significant methodological quality variations ([Figs. 4] and [5]). A substantial proportion of XR studies using nonrandomized designs showed elevated concerns under ROBINS-I and CASP assessments.[18] [23] This heightened risk stems from inherent design limitations: prevalence of small-scale feasibility studies, absence of control groups, and practical difficulties in blinding participants. These methodological constraints warrant careful interpretation when translating findings into clinical practice.

Furthermore, the stringent ROBINS-I tool highlights specific struggles in nonrandomized interventions regarding confounding variables and participant selection ([Fig. 5]). Consequently, to validate the preliminary promise of XR, there is an urgent need for higher-quality RCTs that address these methodological deficits.[16] [17]

In distinct contrast, the reviewed ML literature consistently demonstrates superior predictive performance, particularly within preoperative risk stratification. Models such as DNNs and RFs exhibited high discriminatory power with AUC values up to 0.96, demonstrating improved performance over traditional statistical methods, which typically achieved AUCs around 0.68 ([Fig. 2]).[24] [25] Furthermore, intraoperative computer vision, notably CNNs, supports real-time precision with DSC frequently exceeding 0.85.[26] [27] However, clinical deployment faces significant constraints related to algorithmic bias, data privacy, and regulatory compliance.[28] Moreover, although ML-driven postoperative monitoring detects complications earlier than standard care, results are complicated by variable data sources and outcome definitions.[29]


Clinical Integration and Implementation Barriers

Clinically, XR facilitates the interactive exploration of patient-specific 3D models, a practice associated with reduced operative times and enhanced team communication. Intraoperative MR overlays further contribute to improved navigational accuracy and a measurable decrease in procedural errors.[14] However, widespread implementation across both ML and XR domains faces multifaceted barriers, including interoperability gaps with electronic health records and limited clinician AI literacy. These challenges are compounded by significant infrastructure constraints, regulatory uncertainty, and a scarcity of large-scale randomized trials. Finally, the concentration of evidence in high-income nations limits global generalizability and raises distinct equity concerns regarding these technologies.



Implementation and Workforce Development

Clinical Entrepreneurship as a Driver for Innovation

Clinical entrepreneurship emerges as a significant driver of digital innovation, underscoring the importance of clinician leadership. Surgeons contribute unique insights into clinical needs and workflow constraints, ensuring that technologies address genuine problems. This clinician-led innovation is particularly critical for the sustainable adoption of digital surgery in diverse global settings, where context-appropriate solutions are often more effective than high-cost commercial systems. The codesign of tools by clinicians, engineers, and informaticians accelerates translation from prototype to practice and mitigates unintended consequences.[30] For example, a surgeon in a resource-constrained environment facing high rates of surgical site infections due to inconsistent postoperative wound care documentation could develop a simple, secure, locally hosted web form or a shared cloud-based spreadsheet. Such a low-cost digital tool, accessible via existing hospital Wi-Fi, could standardize the collection of wound-check data, automate the calculation of a risk score, and trigger an alert for high-risk patients.
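
The wound-check example above can be sketched in a few lines of Python. This is a hypothetical illustration, not a validated clinical instrument: the field names, the weights, and the alert threshold are all assumptions made here for demonstration, and any real tool would require local clinical validation and governance approval before use.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical weights for illustration only; NOT a validated clinical score.
ILLUSTRATIVE_WEIGHTS = {
    "erythema": 2,
    "purulent_discharge": 3,
    "fever": 2,
    "wound_dehiscence": 3,
}
# Assumed cut-off; a real threshold would need local calibration.
ALERT_THRESHOLD = 4


@dataclass
class WoundCheck:
    """One standardized postoperative wound-check record."""
    patient_id: str
    erythema: bool
    purulent_discharge: bool
    fever: bool
    wound_dehiscence: bool


def risk_score(check: WoundCheck) -> int:
    """Sum the weights of every finding recorded as present."""
    return sum(
        weight
        for name, weight in ILLUSTRATIVE_WEIGHTS.items()
        if getattr(check, name)
    )


def needs_alert(check: WoundCheck) -> bool:
    """Flag the record when the score reaches the assumed threshold."""
    return risk_score(check) >= ALERT_THRESHOLD


def patients_to_review(checks: Iterable[WoundCheck]) -> List[str]:
    """Return the identifiers of patients whose records trigger an alert."""
    return [c.patient_id for c in checks if needs_alert(c)]


# Fictional records for demonstration.
sample_checks = [
    WoundCheck("P-001", erythema=True, purulent_discharge=False,
               fever=False, wound_dehiscence=False),
    WoundCheck("P-002", erythema=True, purulent_discharge=True,
               fever=False, wound_dehiscence=False),
]
flagged = patients_to_review(sample_checks)
print(flagged)
```

Keeping the weights and threshold in plain, named constants is a deliberate choice in this sketch: it lets the clinical team review and adjust the scoring rules without touching the rest of the logic.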


Training, Nontechnical Skills, and Digital Readiness

The rapid integration of digital tools has exposed a persistent gap between traditional surgical training and the competencies required in the digital era.[22] [31] Embedding digital literacy, data ethics, and human factors into core curricula is therefore essential. Specialized fellowships and the pairing of AI with robotic platforms help address this deficit by providing automated, objective feedback that enhances skill acquisition beyond subjective traditional assessment.[31]

Crucially, NTSs, notably communication and situational awareness, remain vital in digitally enhanced settings. However, complex tools risk increasing cognitive load and introducing new error pathways. Mitigating these risks requires resilient team structures, clear role delineation, and robust escalation protocols, all of which are operational necessities for advanced surgical technologies. Finally, successful ML integration faces two critical nontechnical challenges: algorithmic bias and data privacy/regulatory hurdles.[19] Persistent concerns regarding data protection and information governance mandate secure, encrypted systems, underscoring that technical advancement alone cannot ensure safe, equitable deployment without addressing these foundational barriers.



Implementation Guidelines and Sustainability

Implementation Framework

The long-term viability of digital innovations requires robust ethical and regulatory frameworks.[32] The synthesis of our evidence yields a pragmatic, safety-oriented framework for implementation, which begins with the staged implementation of low-risk applications to enable iterative refinement and local evidence generation. Success is predicated on multidisciplinary collaboration among surgeons, technologists, administrators, and patients, ensuring governance aligns with clinical and operational goals. Comprehensive training is essential to ensure user readiness, moving beyond technical operation to include conceptual understanding and critical output interpretation. Furthermore, robust evaluation via balanced scorecards tracking safety, efficacy, and equity supports continuous improvement, while workflow integration prioritizes interoperable systems to avoid duplicative standalone technologies. Proactive ethical and regulatory compliance, addressing data privacy and algorithmic transparency, is nonnegotiable for building trust.


Sustainability and Low-Resource Settings

Sustainability must be embedded in adoption strategies, as digital interventions can measurably reduce the surgical pathway's environmental impact.[20] Telemedicine for preoperative assessment offers the largest reduction, cutting travel-related carbon emissions by 40 to 60%, while remote monitoring for follow-up reduces unnecessary visits by 35 to 45% ([Fig. 3]). Digital therapeutics and AI-optimized scheduling contribute further by reducing surgical demand (20 to 30%) and resource waste (15 to 25%), respectively. Conversely, capital-intensive technologies such as robotic surgery have higher carbon footprints, so technological advancement must be carefully balanced against environmental impact.[18] [21]

Given that current evidence predominantly originates from high-income settings, specific adaptation strategies for LRS are essential. Implementation in LRS faces critical infrastructural constraints: inconsistent power, limited high-speed Internet, and absent data infrastructure.[1] Adaptation must therefore prioritize resilience and local capacity through low-bandwidth telemedicine solutions, open-source or locally developed AI models requiring minimal computational power, and phased, modular technology integration approaches.



Limitations of the Review

This review's strengths include its comprehensive scope across six fields of digital innovation and its emphasis on practical implementation guidelines. However, several limitations must be acknowledged. First, the rapid pace of technological evolution means that some evidence may become outdated quickly, necessitating ongoing surveillance. Second, significant heterogeneity in study designs, interventions, comparators, and outcome measures constrained opportunities for quantitative synthesis and meta-analysis, limiting the ability to estimate pooled effect sizes. Third, the predominance of studies from high-income countries may restrict generalizability to resource-constrained settings, where differences in infrastructure and regulatory frameworks substantially impact the feasibility and equity of implementation. Finally, future reviews should aim to include non-English databases to create a more globally representative evidence base.


Implications for Practice

The successful translation of digital innovation requires targeted action from all stakeholders. For surgeons, maintaining clinical efficacy necessitates cultivating digital literacy, adapting to redefined workflows, and retaining strong critical appraisal skills for patient-centered care. Health care organizations must pair investments in secure infrastructure and interoperability with governance structures that integrate clinical and technical perspectives to ensure safety and equity. Technology developers are required to adopt user-centered design principles and ensure transparency regarding performance boundaries, failure modes, and data usage to foster trust. Finally, policymakers and regulators must urgently establish adaptive frameworks to balance patient safety with innovation, supporting robust evaluation and clear guidance on data governance.


Future Directions

Future research in digital surgery should address four key priorities to ensure responsible and effective integration.[1] First, rigorous, high-quality effectiveness studies, including RCTs, are urgently needed to evaluate clinical and cost-effectiveness in real-world settings. Second, scalability and sustainability research must focus on comparative implementation studies across diverse resource settings to inform adaptation strategies. Third, longitudinal impact assessments are necessary to determine the long-term effects on the surgical workforce, patient outcomes, and team dynamics, with an emphasis on equity and resilience. Finally, ethical inquiry must prioritize accountability, transparency, and equity by actively reducing algorithmic bias and safeguarding patient data to prevent deepening existing disparities.



Conclusion

The review establishes that ML and digital innovations augment surgical practice across critical domains, enhancing training, operative planning, performance, and patient outcomes. Successful integration, however, is contingent upon addressing critical challenges in technical interoperability, regulatory clarity, and generating robust clinical evidence. The guidelines derived from this analysis emphasize a strategic, staged implementation approach, supported by multidisciplinary collaboration, comprehensive training, and rigorous, ethical evaluation.

The practical application of these findings necessitates a dual focus on policy and education. Institutionally, efforts must formalize digital readiness and establish sustainable infrastructure; simultaneously, the surgical curriculum requires integration of AI literacy and data science principles. Policy-level frameworks are essential for the evidence-based validation and ethical deployment of AI tools, prioritizing patient safety and data governance. Responsible integration of digital technologies will define the next era of surgical care. Thoughtful adoption can enhance quality and efficiency while preserving the clinical judgment essential to surgical excellence.



Conflict of Interest

None declared.

Acknowledgments

The authors extend their sincere gratitude to colleagues for their invaluable support in helping with data collection and constructive critical review of the manuscript. The language corrections and improvements in this article were assisted by Google Gemini and QuillBot, AI language models, to ensure clarity, grammatical accuracy, and readability. The AI was used solely for language polishing and did not contribute to scientific content, data analysis, or conclusions.


Address for correspondence

Naseralla Juma Elsaadi Suliman, PhD
Benghazi Medical Center, University of Benghazi
Benghazi
Libya   

Publication History

Article published online:
13 March 2026

© 2026. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India


Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram detailing the systematic search and selection process of studies included in the qualitative synthesis. Data were derived from the systematic search conducted across seven electronic databases (PubMed/MEDLINE, Embase, Web of Science, IEEE Xplore Digital Library, ACM Digital Library, Cochrane Library, and Google Scholar) and generated using Microsoft PowerPoint 365, adhering to the PRISMA 2020 guidelines. Interpretation: The diagram details the flow of information through the different phases of the systematic review. A total of 1,247 records were identified through database searches, leading to 87 studies included in the final qualitative synthesis. Following the removal of 247 duplicate records, 1,000 unique records underwent title and abstract screening. Of these, 800 records were excluded at the screening phase, primarily because they were in non-peer-reviewed formats (e.g., conference abstracts, n = 500) or had a nonsurgical focus (n = 300). The remaining 200 full-text articles were assessed for eligibility, resulting in the exclusion of 113 articles for the following reasons: nonsurgical applications (n = 40), insufficient methodological detail (n = 35), basic technical focus without clinical application (n = 25), and duplicate publications (n = 13). The final cohort of 87 included studies represents peer-reviewed, full-text publications with adequate methodological rigor and direct relevance to digital innovations in surgical practice.
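
The counts reported in the Fig. 1 caption can be checked with simple arithmetic. The short Python sketch below reproduces that check using only the numbers stated in the caption; the variable names are chosen here for illustration.

```python
# Counts as reported in the Fig. 1 caption.
records_identified = 1247
duplicates_removed = 247

records_screened = records_identified - duplicates_removed

screening_exclusions = {
    "non-peer-reviewed formats": 500,
    "nonsurgical focus": 300,
}
full_text_assessed = records_screened - sum(screening_exclusions.values())

full_text_exclusions = {
    "nonsurgical applications": 40,
    "insufficient methodological detail": 35,
    "basic technical focus without clinical application": 25,
    "duplicate publications": 13,
}
studies_included = full_text_assessed - sum(full_text_exclusions.values())

print(records_screened, full_text_assessed, studies_included)
```

Each stage total matches the caption: 1,000 records screened, 200 full-text articles assessed, and 87 studies included.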
Fig. 2 (Machine learning [ML] model comparison) A comparison of predictive performance between ML models and traditional statistical approaches, showing receiver operating characteristic (ROC) curves with annotated area under the curve (AUC) values for exemplar tasks (preoperative risk prediction, complication detection). Data source: Study-level AUCs and validation type extracted to an Excel spreadsheet from included articles. Figure generation: ROC curves plotted and annotated in Python (matplotlib, seaborn); AUC summary table produced in Excel and imported to the plotting script. Note: This figure shows ROC curves per model family with annotated AUC values (deep neural networks [DNN AUC ≈ 0.82–0.96], random forests [RF AUC ≈ 0.86–0.93], and support vector machines [SVM AUC ≈ 0.86]) against traditional logistic regression [LR AUC ≈ 0.68–0.74] for predicting postoperative complications. The ML models consistently demonstrate higher AUC values than logistic regression.
Fig. 3 (Carbon footprint reduction) A visualization of potential carbon emission reductions achievable through various digital interventions across the surgical pathway. The bar chart estimates carbon emission reductions from digital interventions across the surgical pathway. Data source: Emission estimates and intervention effect sizes abstracted from included studies and supplemented by published lifecycle assessments; raw values compiled in Excel. Note: Telemedicine for preoperative assessment results in a 40 to 60% reduction in travel-related emissions; artificial intelligence (AI)-optimized scheduling demonstrates a 15 to 25% reduction in resource waste; digital therapeutics for prevention indicate a 20 to 30% reduction in surgical demand; and remote monitoring for follow-up shows a 35 to 45% reduction in unnecessary visits.
Fig. 4 Efficacy and methodological caveats in extended reality (XR) and machine learning (ML) for surgical training. XR technologies offer equivalent or superior surgical skill acquisition, though evidence maturity is limited by methodological bias in nonrandomized studies. Conversely, ML models demonstrate superior predictive performance, with risk stratification reaching area under the curve (AUC) 0.96 (vs. logistic regression [LR] AUC 0.68) and segmentation accuracy at Dice similarity coefficient (DSC) > 0.85. However, ML implementation is hindered by algorithmic bias and complex regulatory hurdles. These findings underscore the need for higher-quality randomized controlled trials for XR technologies and robust governance frameworks for ML implementation in surgical practice.
Fig. 5 (Bar chart) Risk of bias (RoB) distribution by assessment tool. The graph breaks down the risk of bias by the specific tool used for assessment, which is crucial for understanding the context of the judgments. Data source: Counts aggregated from the same Excel extraction sheet. The chart displays the number of studies categorized as low (green), some concerns (yellow), or high (red) risk of bias for each research tool: 1. RoB 2 (randomized controlled trials [RCTs]): Assesses randomized controlled trials. 2. ROBINS-I (non-RCTs): Assesses nonrandomized studies. 3. Critical Appraisal Skills Program (CASP) (qualitative): Assesses qualitative research. 4. AMSTAR-2 (reviews): Assesses systematic reviews. Interpretation: This shows that the highest risk of bias and “concerns” are associated with nonrandomized studies (assessed by ROBINS-I), which is a common finding. The reviews (AMSTAR-2) and RCTs (RoB 2) generally fared better, though the number of these high-quality studies was lower.