J Neurol Surg B Skull Base 2025; 86(S 01): S1-S576
DOI: 10.1055/s-0045-1803348
Presentation Abstracts
Podium Presentations
Oral Presentations

Enhancing Surgical Video Phase Recognition with Advanced AI Models for Endoscopic Pituitary Tumor Surgery

Jack Cook (1), Jonathan Chainey (2), Ruth Lau (2), Margaux Masson-Forsythe (1), Ayesha Syeda (1), Kaan Duman (1), Daniel Donoho (1), Dhiraj A. Pangal (3), Juan Vivanco Suarez (4)

1   Surgical Data Science Collective, Washington, District of Columbia, United States
2   Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
3   Department of Neurosurgery, Stanford University, Stanford, California, United States
4   University of Iowa, Iowa City, Iowa, United States
 

Introduction: Operative videos are widely used for surgical education and demonstration. With improvements in computer vision, surgical video analytics has revolutionized the analysis of surgical performance. However, surgical videos, particularly those of skull base procedures, are lengthy, poorly delineated, and require significant manual effort to prepare for downstream use. Detecting and identifying the phases of surgery lets surgeons skip directly to the parts of a case relevant to education and demonstration, and provides useful, targeted analytical insights. We introduce an artificial intelligence model designed to segment pituitary tumor surgery into four distinct, essential phases: nasal, sphenoid, sellar, and closure. This was achieved by collaborating with a global team of surgeons to create an extensive dataset of phase-labeled videos.

Method: Our total dataset comprises 127 video clips drawn from 38 case videos contributed by 3 centers. We split the dataset into 80% training data and 20% validation and test data. We developed two deep-learning pipelines to segment the phases of pituitary tumor surgery. The first employs a state-of-the-art video transformer model to predict surgical phases directly from video input. The second generates frame-by-frame embeddings, which are then processed by an MSTCN++ (Multi-Stage Temporal Convolutional Network) model to predict phases. A post-processing stage using an accumulator smooths erratic predictions, improving the accuracy and consistency of the output. Our approach is validated on a comprehensive dataset of phase-labeled videos provided by a global team of surgeons, supplemented with remapped and adapted data from the PitVis dataset (PitVis dataset [data set]. Synapse. https://www.synapse.org/Synapse:syn51232283/wiki/621581).
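The abstract does not specify how the accumulator post-processing is implemented. The following is a minimal sketch, assuming it behaves like a sliding majority vote over per-frame phase predictions; the names PHASES and accumulator_smooth and the window size are illustrative choices, not details from the study.

```python
import numpy as np

# The four phases described in the abstract, mapped to integer indices.
PHASES = ["nasal", "sphenoid", "sellar", "closure"]

def accumulator_smooth(frame_preds: np.ndarray, window: int = 31) -> np.ndarray:
    """Suppress erratic single-frame predictions with a sliding majority vote.

    frame_preds: integer phase index per frame, shape (n_frames,).
    window: odd window size in frames (a tunable assumption, not from the abstract).
    """
    half = window // 2
    smoothed = np.empty_like(frame_preds)
    for t in range(len(frame_preds)):
        lo, hi = max(0, t - half), min(len(frame_preds), t + half + 1)
        counts = np.bincount(frame_preds[lo:hi], minlength=len(PHASES))
        smoothed[t] = counts.argmax()  # most frequent phase in the window
    return smoothed

# Example: a stray "sellar" frame inside a run of "nasal" frames is voted away.
raw = np.array([0] * 50 + [2] + [0] * 49 + [1] * 100)
print(accumulator_smooth(raw)[45:56])  # all zeros: the outlier is removed
```

Any smoothing of this kind trades responsiveness at true phase boundaries against robustness to single-frame noise, which is consistent with the smoothing effect visible in the timelines below.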

Results: The performance of the two deep-learning pipelines was evaluated using accuracy, precision, and F1 score as the primary metrics, providing a comprehensive assessment of each model's ability to segment the surgical phases accurately and precisely. The embeddings pipeline achieved an accuracy of 77.7% on the test set, whereas the video transformer achieved an accuracy of 72%. In addition to these quantitative metrics, visual segmentation timelines were generated for qualitative performance analysis ([Fig. 1]); they also show the additional smoothing effect of the accumulator in post-processing. These timelines illustrate the effectiveness of the phase predictions and help identify discrepancies and areas for improvement in the segmentation process.

Fig. 1 Visual timelines of predicted phases.
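As a concrete illustration of the reported metrics, the sketch below computes accuracy, precision, and F1 over per-frame phase labels with scikit-learn. The label sequences and the choice of macro averaging are assumptions for illustration only; the abstract does not describe the exact evaluation protocol.

```python
from sklearn.metrics import accuracy_score, precision_score, f1_score

# Hypothetical per-frame labels (0=nasal, 1=sphenoid, 2=sellar, 3=closure);
# not the study's actual data.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 3]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.3f}")
# Macro averaging weights all four phases equally; the abstract does not
# state which averaging the authors used.
print(f"precision: {precision_score(y_true, y_pred, average='macro', zero_division=0):.3f}")
print(f"F1:        {f1_score(y_true, y_pred, average='macro'):.3f}")
```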

Conclusion: Our study demonstrates the effectiveness of two deep-learning pipelines in segmenting pituitary tumor surgery into four distinct phases. By leveraging a video transformer model and a combination of frame-by-frame embeddings with the MSTCN++ model, we achieved high accuracy, precision, and F1 scores. The post-processing stage using an accumulator further refined the predictions, yielding coherent and reliable phase segmentations. The inclusion of remapped and adapted data from the PitVis dataset, together with the visual segmentation timelines, supported a robust performance analysis and provided valuable insights. This approach not only enhances surgical training and performance but also has the potential to be adapted to other types of surgery, contributing to the advancement of surgical analytics and education.



Publication History

Article published online:
07 February 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany