Zentralbl Chir 2025; 150(S 01): S97
DOI: 10.1055/s-0045-1809785
Abstracts
Innovative Technologies

SurgiMind: Next-Generation Surgical Image Segmentation leveraging Transformers for Lung Cancer Surgery

F Ponholzer¹, D Barnes², L Sugic¹, L Mayr², S Schneeberger¹, J Piater², D Öfner¹, F Augustin¹, K Grossmann², A Rodríguez-Sánchez²

1   Medical University of Innsbruck, Department of Visceral, Transplant and Thoracic Surgery, Innsbruck, Austria
2   University of Innsbruck, Department of Computer Science, Innsbruck, Austria
 

Background The aim of this study was to develop and evaluate a transformer-based deep learning model for real-time anatomical structure segmentation in video-assisted thoracoscopic surgery (VATS) for right upper lobe lobectomy in lung cancer patients.

Methods & Materials A retrospective cohort study was conducted using thoracoscopic video recordings from 81 patients who underwent anatomical VATS right upper lobe resection between 2009 and 2024. A total of 1539 frames were extracted and manually annotated for eight anatomical classes: right upper pulmonary vein, azygos vein, right upper lobe bronchus, phrenic nerve, middle lobe vein, A2 segmental artery, truncus anterior, and pulmonary main artery. Three deep learning architectures (U-Net, Fully Convolutional Transformer [FCT], and the novel Surgi-FCT) were trained and evaluated. Surgi-FCT was optimized by removing the Wide Focus layer and increasing the network depth to improve feature extraction and reduce computational overhead. Evaluation metrics included Dice coefficient, Intersection over Union (IoU), and precision, with separate analyses for class-present (CP) and class-absent (CA) scenarios.
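The Dice coefficient and IoU used for evaluation, including the class-present (CP) versus class-absent (CA) distinction, can be sketched as below. This is an illustrative implementation only, not the authors' code; the convention of scoring 1.0 when a class is absent from both prediction and ground truth is an assumption consistent with the separate CA analysis.

```python
import numpy as np

def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice coefficient and IoU for a single binary class mask.

    If the class is absent from both masks (the class-absent, CA, case),
    both metrics are defined here as 1.0 (assumed convention).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    if union == 0:  # class absent in both prediction and ground truth
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred.sum() + target.sum() + eps)
    iou = intersection / (union + eps)
    return dice, iou

# Toy 4x4 masks: prediction covers 4 pixels, ground truth 6, overlap 4.
pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:3] = 1
gt = np.zeros((4, 4), dtype=int); gt[1:3, 1:4] = 1
d, j = dice_iou(pred, gt)  # Dice = 8/10 = 0.8, IoU = 4/6 ≈ 0.67
```

In the abstract's per-class reporting, scores like these would be averaged over frames separately for CP and CA frames.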

Results The Surgi-FCT model with seven encoder-decoder layers (Surgi-FCT 7), trained on 640×640 images, achieved the best segmentation performance, with an average Dice coefficient of 0.69 (CP) and 0.88 (CA), yielding an overall Dice of 0.82. This outperformed U-Net (Dice: 0.56 CP, 0.79 CA) and FCT (Dice: 0.68 CP, 0.84 CA). Surgi-FCT 7 was particularly effective at segmenting frequently occurring classes such as the pulmonary main artery and phrenic nerve. Classes with fewer examples, such as the A2 artery and middle lobe vein, had lower Dice scores (0.40 and 0.62, respectively) but still performed better under multi-class training than in single-class models. Class co-occurrence, as observed in correlation matrices, improved segmentation accuracy (e.g., co-detection of the azygos vein and main artery). Higher image resolution and a deeper model architecture also yielded performance gains, though at increased computational cost.
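The class co-occurrence analysis mentioned above can be sketched as a correlation over per-frame class-presence indicators. The table below is entirely hypothetical toy data (the class names are taken from the Methods, the values are invented for illustration); the study's actual matrices were computed from the annotated frames.

```python
import numpy as np

# Hypothetical per-frame class-presence table: rows = frames,
# columns = anatomical classes (1 = class annotated in that frame).
classes = ["azygos_vein", "main_artery", "phrenic_nerve", "A2_artery"]
presence = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Pearson correlation between class-presence indicators: high
# off-diagonal values flag classes that tend to appear in the same
# frames (e.g., azygos vein and main artery in this toy example).
corr = np.corrcoef(presence, rowvar=False)
```

A positive entry `corr[i, j]` indicates that classes `i` and `j` co-occur more often than chance, which is the kind of signal the network could exploit for co-detection.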

Conclusion The Surgi-FCT 7 model enables accurate segmentation of complex anatomical structures in thoracic surgery videos. Leveraging transformer attention and class co-occurrence, it outperforms conventional CNN-based architectures and provides a scalable foundation for AI-powered visual assistance tools in minimally invasive thoracic surgery.



Publication History

Article published online:
25 August 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany