Laryngorhinootologie 2023; 102(S 02): S198
DOI: 10.1055/s-0043-1767088
Abstracts | DGHNOKHC
Surgical assistance procedures/Robotics/Navigation

Image- and text-based semi-automatic generation of surgery reports in paranasal sinus surgery

Martin Sorge (1), Richard Bieck (2), Markus Pirlich (3), Andreas Dietz (3), Viktor Kunz (3), Valentina Wildfeuer (3), Thomas Neumuth (2)

1   Univ.-Klinikum Leipzig, Klinik und Poliklinik f. HNO-Heilkunde
2   Innovation Center Computer Assisted Surgery (ICCAS)
3   Univ.-Klinikum Leipzig, Klinik und Poliklinik f. HNO-Heilkunde

Introduction The aim of this project is to further develop an existing documentation tool that generates a surgical report from voice input and selected individual frames of endoscopic video recordings of paranasal sinus surgery. The goal is to shorten documentation time and improve report quality.

Material and methods A previously introduced language model was extended to process relevant individual frames from paranasal sinus surgeries in addition to textual surgery reports. This "vision language model" is based on an artificial neural network architecture and generates OR reports recursively, sentence by sentence, with each new sentence conditioned on the report sentences generated so far. Sets of 15 to 60 relevant frames were selected both by experts and by an automatic clustering algorithm. The generated OR reports were evaluated for specificity, sensitivity and semantics using the text metrics ROUGE, BLEU and METEOR.
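To illustrate what overlap-based text metrics of this kind measure, the following is a minimal sketch of unigram recall (the core of ROUGE-1) and clipped unigram precision (the unigram component of BLEU). It is not the evaluation code used in the study. The function names, the whitespace tokenizer, and both example sentences are assumptions made up for this illustration. Full BLEU additionally combines higher-order n-grams with a brevity penalty, and METEOR adds stem and synonym matching, neither of which is reproduced here.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count tokens shared by both texts, clipped to the smaller count per token."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference tokens recovered by the candidate (recall-oriented)."""
    ref_len = len(tokenize(reference))
    return unigram_overlap(candidate, reference) / ref_len if ref_len else 0.0


def bleu1_precision(candidate: str, reference: str) -> float:
    """Share of candidate tokens that occur in the reference (precision-oriented)."""
    cand_len = len(tokenize(candidate))
    return unigram_overlap(candidate, reference) / cand_len if cand_len else 0.0


# Hypothetical sentences, invented for illustration only (not study data).
reference = "the uncinate process was resected"
generated = "the uncinate process was removed completely"

print(round(rouge1_recall(generated, reference), 2))    # 0.8  (4 of 5 reference tokens)
print(round(bleu1_precision(generated, reference), 2))  # 0.67 (4 of 6 generated tokens)
```

Under this reading, the recall-oriented score relates to the sensitivity of a generated report and the precision-oriented score to its specificity, while a metric such as METEOR is needed to credit semantically equivalent wording like "removed" for "resected".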

Results An OR report was generated in 350 ms. The best result was achieved with manually selected image data (ROUGE 0.66, BLEU 0.40, METEOR 0.58). Adding specifically selected, OR-relevant image data yielded an objective improvement in report quality of 14% over text-only processing and of 3% over automatic image selection.

Conclusion The results show the benefit of combining image and text data for text generation use cases. The semi-automatic approach, in which relevant image data are selected and processed in addition to text, achieves better results than an automatic alternative. In the target scenario, simultaneous intraoperative recording of a keyword and the corresponding image sequence can be expected to yield high-quality surgical documentation efficiently.



Publication History

Article published online:
12 May 2023

Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany