Aims The quality of tip manipulation in endoscopy (tip-control) is akin to hand steadiness
in surgery, for which a correlation with surgical complications and morbidity is established.
Subjective ratings of tip-control by blinded human raters can accurately
classify endoscopy provider profile and therapeutic experience [1]. However, to remain unbiased, videos must be read by multiple human observers in
a blinded environment, adding time, cost, and delay to obtaining results. We aimed
to train an artificial intelligence (AI) model to assess tip-control and compare its
performance to evaluations by blinded human raters on the same endoscopic videos.
Methods Thirteen endoscopists performed margin ablation with snare-tip soft coagulation (STSC) on four shapes, twice each, printed onto a
ham in an ex-vivo model [1]. Procedural videos were evaluated blindly, anonymously, and in random order, three times each,
by a pool of seven human raters using an online tool. Rating outcomes were accuracy
(%), speed (mm/s), and subjective tip-control score (%). An AI model was trained
on video data to detect the proximity between the endoscope and the printed
line (validation accuracy 88.5%). An unsupervised computer vision model [2] analyzed motion data, generating variables related to speed, online/offline periods,
and motion peaks. We fitted two generalized linear mixed models to predict the subjective
tip-control score. The first model was based on performance metrics including speed,
hit frequency, hit density, and accuracy. The second model utilized twelve AI-derived
metrics. We selected predictors by ranking all possible combinations using the Akaike
Information Criterion (AIC), then retained in the final models those predictors
selected in at least 4 of 5 cross-validation folds (sketched below). Model performance was evaluated
by comparing predictions to the subjective tip-control ratings using Spearman rank
correlation.
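
For illustration, here is a minimal sketch of how AI-derived motion metrics of the kind named above (counts and durations of offline periods, median height and prominence of motion-speed peaks) could be computed from a one-dimensional tip-speed trace. The speed signal and the online/offline segmentation are taken as given (in the study they came from the computer vision model [2]); the function names, sampling rate, and synthetic data are illustrative assumptions, not the authors' implementation.

```python
# Sketch: motion-peak and offline-period metrics from a 1-D tip-speed trace.
# All names and the toy data are hypothetical, not the study's code.
import numpy as np
from scipy.signal import find_peaks

def peak_metrics(speed, mask):
    """Median height and prominence of speed peaks falling inside a boolean mask."""
    peaks, props = find_peaks(speed, prominence=0.0)  # all local maxima, with prominences
    inside = mask[peaks]
    heights = speed[peaks][inside]
    prominences = props["prominences"][inside]
    return {
        "median_peak_height": float(np.median(heights)) if heights.size else np.nan,
        "median_peak_prominence": float(np.median(prominences)) if prominences.size else np.nan,
    }

def offline_period_metrics(online_mask, fps):
    """Count and total duration (s) of contiguous offline periods."""
    offline = ~online_mask
    # A 0 -> 1 transition in the offline indicator marks the start of an offline period.
    starts = np.flatnonzero(np.diff(offline.astype(int)) == 1)
    n_periods = int(offline[0]) + starts.size  # also count a period starting at frame 0
    return {"n_offline_periods": n_periods, "offline_duration_s": float(offline.sum() / fps)}

# Toy example: 60 s of tip speed sampled at an assumed 25 fps.
rng = np.random.default_rng(0)
fps = 25
speed = np.abs(rng.normal(2.0, 1.0, 60 * fps))   # mm/s, synthetic signal
online_mask = rng.random(60 * fps) > 0.3         # True = tip near the printed line

features = {
    **{f"online_{k}": v for k, v in peak_metrics(speed, online_mask).items()},
    **{f"offline_{k}": v for k, v in peak_metrics(speed, ~online_mask).items()},
    **offline_period_metrics(online_mask, fps),
}
print(features)
```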
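Next, a minimal sketch of the predictor-selection and evaluation procedure described above: all predictor subsets ranked by AIC within each of five cross-validation folds, predictors retained if selected in at least four folds, and agreement with the subjective ratings measured by Spearman rank correlation. For brevity this fits a plain binomial GLM (logit link, so coefficients exponentiate to odds ratios) rather than the study's generalized linear mixed model, so random effects (e.g. per endoscopist) are omitted; the data frame, column names, and synthetic data are hypothetical.

```python
# Sketch: AIC-ranked subset selection across 5 CV folds, then Spearman evaluation.
# A plain binomial GLM stands in for the study's GLMM; data are synthetic.
from itertools import combinations
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import spearmanr
from sklearn.model_selection import KFold

def best_subset_by_aic(df, outcome, candidates):
    """Return the predictor subset whose binomial GLM has the lowest AIC."""
    best, best_aic = None, np.inf
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            formula = f"{outcome} ~ " + " + ".join(subset)
            fit = smf.glm(formula, data=df, family=sm.families.Binomial()).fit()
            if fit.aic < best_aic:
                best, best_aic = subset, fit.aic
    return best

# Hypothetical data: subjective tip-control score as a proportion in (0, 1)
# plus candidate performance metrics per rated video.
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "speed": rng.normal(5, 2, n),
    "accuracy": rng.uniform(50, 100, n),
    "hits_per_min": rng.normal(10, 3, n),
    "hit_density": rng.normal(1, 0.3, n),
})
logit = -6 + 0.03 * df["speed"] + 0.07 * df["accuracy"] + 0.05 * df["hits_per_min"]
df["tip_control"] = (1 / (1 + np.exp(-logit)) + rng.normal(0, 0.05, n)).clip(0.01, 0.99)

candidates = ["speed", "accuracy", "hits_per_min", "hit_density"]
counts = {c: 0 for c in candidates}
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(df):
    for c in best_subset_by_aic(df.iloc[train_idx], "tip_control", candidates):
        counts[c] += 1
retained = [c for c, k in counts.items() if k >= 4] or candidates  # keep if in >= 4/5 folds

final = smf.glm("tip_control ~ " + " + ".join(retained),
                data=df, family=sm.families.Binomial()).fit()
rho, _ = spearmanr(final.predict(df), df["tip_control"])
print("retained:", retained,
      "| odds ratios:", np.exp(final.params.drop("Intercept")).round(2).to_dict(),
      "| Spearman r = %.2f" % rho)
```

The same selection loop would apply unchanged to the twelve AI-derived metrics of the second model; only the candidate list changes.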
Results Human raters agreed on subjective tip-control scores (intraclass correlation coefficient
0.86 [0.76-0.92]). The model based on performance metrics, which included speed (OR: 1.03 [1.01-1.05],
p=0.002), accuracy (OR: 1.07 [1.04-1.09], p<0.001), and number of correct hits per minute
(OR: 1.05 [1.02-1.07], p<0.001), predicted subjective tip-control scores with high accuracy
(r=0.88 [0.83-0.92]). The AI-based model included five predictors: number of offline
periods (OR: 1.01 [1-1.3], p=0.041), duration of offline periods (OR: 0.99 [0.99-0.99]),
median height of motion-speed peaks during offline periods (OR: 1.61 [0.94-2.76], p=0.082)
and during online periods (OR: 0.48 [0.2-1.15], p=0.101), and median prominence of peaks during
the simulation (OR: 0.01 [0.001-0.18], p=0.002). The model based on AI-derived metrics
also predicted subjective tip-control scores with good accuracy (r=0.78 [0.68-0.84]).
Conclusions An AI model reliably predicted the subjective impression of endoscopist
tip-control given by blinded human raters in a standardized ex-vivo simulator. The AI model was comparable to
structured human performance-metric rating, which requires significantly more time
and effort. The AI approach offers the potential for real-time, unbiased assessment of endoscopic
technical skills.