Image- and text-based semi-automatic generation of surgery reports in paranasal sinus surgery

Martin Sorge; Richard Bieck; Markus Pirlich; Andreas Dietz; Viktor Kunz; Valentina Wildfeuer; Thomas Neumuth

doi:10.1055/s-0043-1767088


Laryngorhinootologie 2023; 102(S 02): S198

Abstracts | DGHNOKHC

Surgical assistance procedures/Robotics/Navigation


Martin Sorge ¹, Richard Bieck ², Markus Pirlich ³, Andreas Dietz ³, Viktor Kunz ³, Valentina Wildfeuer ³, Thomas Neumuth ²

¹ Univ.-Klinikum Leipzig, Klinik und Poliklinik f. HNO-Heilkunde
² Innovation Center Computer Assisted Surgery (ICCAS)
³ Univ.-Klinikum Leipzig, Klinik und Poliklinik f. HNO-Heilkunde


Introduction The aim of the project is to further develop an existing documentation tool that generates a surgical report from voice input and selected single frames of endoscopic video recordings of paranasal sinus surgeries. This is intended to shorten documentation time and improve report quality.

Material and methods A previously introduced language model was extended to process relevant single frames from paranasal sinus surgeries in addition to textual surgery reports. This "vision language model" is based on an artificial neural network architecture and generates operating room (OR) reports recursively, sentence by sentence, with each new sentence based on the previously generated report sentences. In each case, 15-60 relevant frames were selected, once by experts and once by an automatic clustering algorithm. The generated OR reports were evaluated for specificity, sensitivity and semantics using the text metrics ROUGE, BLEU and METEOR.
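The link between the text metrics and the notions of sensitivity and specificity can be illustrated with a minimal sketch. It assumes plain whitespace tokenization and unigrams only: recall of reference words is the idea behind ROUGE-1, and clipped precision of generated words is the idea behind BLEU-1 without its brevity penalty. METEOR additionally combines both and matches beyond exact word forms, which is not shown here. The function name `unigram_overlap` and the two example sentences are hypothetical and are not taken from the study, which does not state how the metrics were implemented.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> tuple[float, float]:
    """Return (precision, recall) of clipped unigram overlap.

    Precision approximates BLEU-1 (no brevity penalty);
    recall approximates ROUGE-1. Illustration only.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall


# Hypothetical report sentences, invented for illustration only.
reference = "the uncinate process was resected and the maxillary sinus was opened"
generated = "the uncinate process was resected"

precision, recall = unigram_overlap(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every generated word appears in the reference, so precision is high, while the missing second half of the reference sentence lowers recall. A short but accurate report can therefore score well on a precision-oriented metric and poorly on a recall-oriented one.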

Results An OR report is generated in 350 ms. The best result was achieved with manually selected image data (ROUGE 0.66, BLEU 0.40, METEOR 0.58). Adding specific OR-relevant image data improved the objectively measured report quality by 14% compared with text-only processing and by 3% compared with automatic image selection.

Conclusion The results show the benefit of combining image and text data for text generation use cases. The semi-automatic approach of selecting and processing relevant image data in addition to text achieves better results than an automatic alternative. In the target scenario, simultaneous intraoperative recording of a keyword and the corresponding image sequence can be expected to yield high-quality, efficient surgical documentation.

Publication History

Article published online:
12 May 2023

Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany