Aims Precise surgical phase recognition and evaluation may improve our understanding of
complex endoscopic procedures. Furthermore, quality control measurements and endoscopy
training could benefit from objective descriptions of surgical phase distributions.
Therefore, we aimed to develop an artificial intelligence algorithm for frame-by-frame
operational phase recognition during endoscopic submucosal dissection (ESD).
Methods Full-length ESD videos from 31 patients, comprising 6,297,782 single images, were collected
retrospectively. Videos were annotated on a frame-by-frame basis for the operational
macro-phases: diagnostics, marking, injection, dissection and bleeding. Additional subphases
covered the application of electrical current, visible injection of fluid into the submucosal
space and scope manipulation, leading to 11 phases in total. We trained a Video Swin
Transformer on 4,975,699 frames (21 patients), using uniform frame sampling to capture
temporal information. Hyperparameter tuning was performed with 897,325 further
frames (6 patients), while 424,758 frames (4 patients) were held out for testing.
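
Uniform frame sampling can be illustrated with a minimal sketch. The abstract does not state the clip length or sampling interval, so the function name `uniform_clip_indices` and the parameters `num_frames` and `stride` below are hypothetical, as is the choice to end each clip at the frame being classified.

```python
def uniform_clip_indices(current: int, num_frames: int = 8, stride: int = 4) -> list[int]:
    """Return `num_frames` frame indices spaced `stride` apart, ending at `current`.

    Assumption: the clip looks only backwards in time, and indices before the
    start of the video are clamped to frame 0. Neither detail is given in the
    abstract; both are illustrative choices.
    """
    if current < 0 or num_frames < 1 or stride < 1:
        raise ValueError("current must be >= 0; num_frames and stride must be >= 1")
    start = current - (num_frames - 1) * stride
    # Evenly spaced indices; early frames repeat frame 0 instead of going negative.
    return [max(0, start + i * stride) for i in range(num_frames)]


if __name__ == "__main__":
    print(uniform_clip_indices(100, num_frames=4, stride=10))
```

A wider stride covers more temporal context for the same number of frames, at the cost of missing short events between the sampled frames.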
Results The overall F1 scores on the test dataset for the macro-phases and all 11 phases
were 0.96 and 0.90, respectively. The recall values for diagnostics, marking, injection,
dissection and bleeding were 1.00, 1.00, 0.95, 0.96 and 0.93, respectively.
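
For reference, per-phase recall and F1 can be computed from frame-level labels as in the sketch below. The abstract does not say how the overall F1 score was averaged across phases; the unweighted (macro) average used here is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def per_class_recall_f1(y_true, y_pred):
    """Return {label: (recall, f1)} from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # true phase t was missed
            fp[p] += 1  # phase p was predicted wrongly
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        support = tp[label] + fn[label]
        predicted = tp[label] + fp[label]
        recall = tp[label] / support if support else 0.0
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, not stated in the abstract)."""
    scores = per_class_recall_f1(y_true, y_pred)
    return sum(f1 for _, f1 in scores.values()) / len(scores)
```

With imbalanced phase durations, a frame-weighted average would favour long phases such as dissection, while the macro average treats every phase equally.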
Conclusions The algorithm recognized operational phases during ESD with high F1 scores and
recall. Precise evaluation of phase distribution may allow the development of objective
metrics for quality control and training.