J Neurol Surg B Skull Base 2025; 86(S 01): S1-S576
DOI: 10.1055/s-0045-1803348
Presentation Abstracts
Podium Presentations
Oral Presentations

Enhancing Surgical Video Phase Recognition with Advanced AI Models for Endoscopic Pituitary Tumor Surgery

Jack Cook (1), Jonathan Chainey (2), Ruth Lau (2), Margaux Masson-Forsythe (1), Ayesha Syeda (1), Kaan Duman (1), Daniel Donoho (1), Dhiraj A. Pangal (3), Juan Vivanco Suarez (4)

1   Surgical Data Science Collective, Washington, District of Columbia, United States
2   Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
3   Department of Neurosurgery, Stanford University, Stanford, California, United States
4   University of Iowa, Iowa City, Iowa, United States
 

Introduction: Operative videos are widely used for surgical education and demonstration. With improvements in computer vision, surgical video analytics has revolutionized the analysis of surgical performance. However, surgical videos, particularly those of skull base procedures, are lengthy, poorly delineated, and require significant manual effort to prepare for downstream use. Detecting and identifying the phases of surgery lets surgeons skip directly to the parts of a case relevant to education and demonstration, and provides useful, targeted analytical insights. We introduce an artificial intelligence model designed to segment pituitary tumor surgery into four distinct, essential phases: nasal, sphenoid, sellar, and closure. This was achieved by collaborating with a global team of surgeons to create an extensive dataset of phase-labeled videos.

Method: Our total dataset comprises 127 video clips drawn from 38 case videos contributed by 3 centers. We split the dataset into 80% training data and 20% validation and test data. We developed two deep-learning pipelines to segment the phases of pituitary tumor surgery. The first employs a state-of-the-art video transformer model to predict surgical phases directly from video input. The second generates frame-by-frame embeddings, which are then processed by an MSTCN++ (Multi-Stage Temporal Convolutional Network) model to predict phases. A post-processing stage using an accumulator smooths erratic predictions, improving the accuracy and consistency of the output. Our approach is validated on a comprehensive dataset of phase-labeled videos provided by a global team of surgeons, supplemented with remapped and adapted data from the PitVis dataset (PitVis dataset [data set]. Synapse. https://www.synapse.org/Synapse:syn51232283/wiki/621581).
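The abstract does not specify how the accumulator post-processing is implemented. The following is a minimal sketch, assuming it behaves like a sliding majority vote over per-frame phase predictions; the names PHASES and accumulator_smooth and the window size are illustrative choices, not details from the study.

```python
import numpy as np

# The four phases described in the abstract, mapped to integer indices.
PHASES = ["nasal", "sphenoid", "sellar", "closure"]

def accumulator_smooth(frame_preds: np.ndarray, window: int = 31) -> np.ndarray:
    """Suppress erratic single-frame predictions with a sliding majority vote.

    frame_preds: integer phase index per frame, shape (n_frames,).
    window: odd window size in frames (a tunable assumption, not from the abstract).
    """
    half = window // 2
    smoothed = np.empty_like(frame_preds)
    for t in range(len(frame_preds)):
        lo, hi = max(0, t - half), min(len(frame_preds), t + half + 1)
        counts = np.bincount(frame_preds[lo:hi], minlength=len(PHASES))
        smoothed[t] = counts.argmax()  # most frequent phase in the window
    return smoothed

# Example: a stray "sellar" frame inside a run of "nasal" frames is voted away.
raw = np.array([0] * 50 + [2] + [0] * 49 + [1] * 100)
print(accumulator_smooth(raw)[45:56])  # all zeros: the outlier is removed
```

Any smoothing of this kind trades responsiveness at true phase boundaries against robustness to single-frame noise, which is consistent with the smoothing effect visible in the timelines below.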

Results: The performance of the two deep-learning pipelines was evaluated using accuracy, precision, and F1 score as the primary metrics, providing a comprehensive assessment of each model's ability to segment the surgical phases accurately and precisely. The embeddings pipeline achieved an accuracy of 77.7% on the test set, whereas the video transformer achieved an accuracy of 72%. In addition to these quantitative metrics, visual segmentation timelines were generated for qualitative performance analysis ([Fig. 1]); they also show the additional smoothing effect of the accumulator in post-processing. These timelines illustrate the effectiveness of the phase predictions and help identify discrepancies and areas for improvement in the segmentation process.

Fig. 1 Visual timelines of predicted phases.
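As a concrete illustration of the reported metrics, the sketch below computes accuracy, precision, and F1 over per-frame phase labels with scikit-learn. The label sequences and the choice of macro averaging are assumptions for illustration only; the abstract does not describe the exact evaluation protocol.

```python
from sklearn.metrics import accuracy_score, precision_score, f1_score

# Hypothetical per-frame labels (0=nasal, 1=sphenoid, 2=sellar, 3=closure);
# not the study's actual data.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 3]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.3f}")
# Macro averaging weights all four phases equally; the abstract does not
# state which averaging the authors used.
print(f"precision: {precision_score(y_true, y_pred, average='macro', zero_division=0):.3f}")
print(f"F1:        {f1_score(y_true, y_pred, average='macro'):.3f}")
```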

Conclusion: Our study demonstrates the effectiveness of two deep-learning pipelines in segmenting pituitary tumor surgery into four distinct phases. By leveraging a video transformer model and a combination of frame-by-frame embeddings with the MSTCN++ model, we achieved high accuracy, precision, and F1 scores. The post-processing stage using an accumulator further refined the predictions, yielding coherent and reliable phase segmentations. The inclusion of remapped and adapted data from the PitVis dataset, together with the visual segmentation timelines, supported a robust performance analysis and provided valuable insights. This approach not only enhances surgical training and performance but also has the potential to be adapted to other types of surgery, contributing to the advancement of surgical analytics and education.



Publication History

Article published online:
07 February 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany