DOI: 10.1055/s-0043-1767088
Image- and text-based semi-automatic generation of surgery reports in paranasal sinus surgery
Introduction The aim of this project is to further develop an existing documentation tool that generates a surgical report from voice input and selected still images taken from endoscopic video recordings of paranasal sinus surgery. The goal is to shorten documentation time and improve report quality.
Material and methods A previously introduced language model was extended to process relevant single images from paranasal sinus surgeries in addition to textual surgery reports. This "vision language model" is based on an artificial neural network architecture and generates OR reports recursively, sentence by sentence, with each new sentence conditioned on the report sentences generated so far. In each case, 15 to 60 relevant frames were selected, both by experts and by an automatic clustering algorithm. The generated OR reports were evaluated for specificity, sensitivity and semantics using the text metrics ROUGE, BLEU and METEOR.
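The automatic frame selection step can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the image features used, so everything here is an assumption: plain k-means is swapped in as a stand-in, `select_key_frames` is a hypothetical helper, and each frame is assumed to be already reduced to a numeric feature vector (for example a colour histogram or a network embedding).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_key_frames(features, k, iterations=20, seed=0):
    """Return sorted indices of representative frames found by plain k-means.

    features: one float vector per video frame (hypothetical descriptors).
    k: number of clusters; the authors report 15-60 frames, the method used
       to pick that number is not stated in the abstract.
    The result can hold fewer than k indices if two centroids end up
    nearest to the same frame.
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of frames")
    rng = random.Random(seed)
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]

    for _ in range(iterations):
        # Assignment step: attach each frame to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, vec in enumerate(features):
            nearest = min(range(k), key=lambda c: squared_distance(vec, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(centroids[c])
                centroids[c] = [
                    sum(features[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]

    # Represent each cluster by the real frame closest to its centroid.
    chosen = {
        min(range(len(features)), key=lambda i: squared_distance(features[i], centroids[c]))
        for c in range(k)
    }
    return sorted(chosen)
```

Choosing the frame nearest each centroid, rather than the centroid itself, keeps the output an actual image from the recording, which is what a downstream vision language model would need as input.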
Results Generating an OR report takes 350 ms. The best result was achieved with manually selected image data (ROUGE 0.66, BLEU 0.40, METEOR 0.58). Adding specific OR-relevant image data yielded an objective improvement in report quality of 14% compared with text-only processing and of 3% compared with automatic image selection.
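To give a feel for what such overlap scores measure, the following is a simplified sketch of unigram overlap between a generated and a reference report. It is not the evaluation code used by the authors: the abstract does not state which ROUGE or BLEU variants were applied, real BLEU adds higher-order n-grams and a brevity penalty, and METEOR additionally matches stems and synonyms. The function name `unigram_overlap_scores` and the example sentences are invented for illustration.

```python
from collections import Counter


def unigram_overlap_scores(candidate, reference):
    """Return (precision, recall, f1) of clipped unigram overlap.

    Recall corresponds to the idea behind ROUGE-1 (how much of the reference
    is covered); precision corresponds to the unigram part of BLEU (how much
    of the generated text is supported by the reference).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each word count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example sentences, not taken from the study data.
generated = "the maxillary sinus was opened"
reference = "the maxillary sinus was opened and rinsed"
precision, recall, f1 = unigram_overlap_scores(generated, reference)
```

In this example every generated word appears in the reference, so precision is 1.0, while recall is 5/7 because two reference words are missing from the generated sentence.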
Conclusion The results show the benefit of combining image and text data for text generation use cases. The semi-automatic approach of selecting and processing relevant image data in addition to text achieves better results than a fully automatic alternative. In the target scenario, simultaneous intraoperative recording of a keyword and the corresponding image sequence can be expected to yield high-quality, efficient surgical documentation.
Publication History
Article published online: 12 May 2023
Georg Thieme Verlag
Rüdigerstraße 14, 70469 Stuttgart, Germany