PIXEL VS PATCH-BASED DEEP LEARNING MODELS, PAVING THE WAY TOWARDS REAL-TIME AI-ASSISTED DETECTION OF BARRETT’S NEOPLASIA

M Abdelrahim; M Saiko; Y Masaike; S Arndtz; E Hossain; P Bhandari

doi:10.1055/s-0040-1704074

Endoscopy 2020; 52(S 01): S22-S23
ESGE Days 2020 oral presentations

Friday, April 24, 2020 11:00 – 13:00 Artificial Intelligence in GI-endoscopy: Is the future here? Wicklow Meeting Room 3

Authors

M Abdelrahim¹, M Saiko², Y Masaike³, S Arndtz¹, E Hossain¹, P Bhandari¹

¹Queen Alexandra Hospital, Endoscopy, Portsmouth, United Kingdom
²NEC Corporation, Biometrics Research Laboratories, Kanagawa, Japan
³NEC Europe Ltd., London, United Kingdom

Abstract

Aims Early detection of Barrett’s neoplasia is challenging. Deep learning (DL) has been proposed as an aid, and a limited number of recent reports show encouraging results. However, comparative data on the best methods to develop and implement this technology are lacking. We aimed to compare two DL models for detection of Barrett’s neoplasia: a classical patch-based model and a pixel-based model.

Methods We collected 76 anonymised, histologically confirmed high-definition white-light endoscopy (HDWLE) images from our database, including adenocarcinoma, high-grade dysplasia (HGD) and low-grade dysplasia (LGD). For the patch-based model, the LeNet-5 architecture was used: each image was divided into patches of 48x48 pixels, and each patch was assigned a label (neoplastic or non-neoplastic) and a confidence score. For the pixel-based model, we used the SegNet architecture: each pixel in the image was assigned a label and a confidence score. Validation was performed using 4-fold leave-one-out cross-validation. The graphics processing unit used was a GeForce RTX 2080 Ti. Processing speed, global accuracy (how often the model prediction is right), F-score (harmonic mean of sensitivity and precision), and IoU (intersection over union, the overlap between model prediction and expert marking) were calculated and compared using a paired-sample t-test.
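The three agreement metrics named above can be illustrated with a minimal sketch. It assumes binary masks (1 = neoplastic, 0 = non-neoplastic) compared pixel by pixel, and a confidence score counted as positive when it is at or above the threshold. The abstract does not state whether the metrics were computed per pixel or per image, nor how ties at the threshold were handled, so the function names `binarise` and `pixel_metrics` and these conventions are illustrative assumptions, not the authors' implementation.

```python
def binarise(scores, threshold=0.8):
    """Turn a 2D list of confidence scores into a 0/1 mask.

    Assumption: a score >= threshold counts as neoplastic.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def pixel_metrics(pred, truth):
    """Compare a predicted mask with an expert mask (2D lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # predicted neoplastic, expert agrees
            elif p:
                fp += 1      # predicted neoplastic, expert disagrees
            elif t:
                fn += 1      # missed neoplastic pixel
            else:
                tn += 1      # correctly left unmarked

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    union = tp + fp + fn
    return {
        # Global accuracy: fraction of pixels labelled correctly.
        "accuracy": (tp + tn) / total,
        # F-score: harmonic mean of sensitivity and precision.
        "f_score": 2 * precision * sensitivity / denom if denom else 0.0,
        # IoU: overlap divided by union of prediction and expert marking.
        # Two empty masks are treated as perfect agreement (an assumed convention).
        "iou": tp / union if union else 1.0,
    }


# Toy 2x3 example with made-up scores, not data from the study.
scores = [[0.90, 0.85, 0.10],
          [0.70, 0.95, 0.20]]
truth = [[1, 1, 0],
         [1, 1, 0]]
metrics = pixel_metrics(binarise(scores, threshold=0.8), truth)
```

In this toy example one neoplastic pixel scores 0.70 and falls below the 0.8 threshold, so it is counted as a miss; raising or lowering the threshold trades sensitivity against precision.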

Results Average processing time was 33 ms/image with the pixel-based model, compared with 102.6 ms/image with the patch-based model. At a score threshold of 0.8, the pixel-based and patch-based models showed mean global accuracy of 88% and 84% (P = 0.00002), mean IoU of 0.40 and 0.21 (P < 0.0001), and mean F-score (for correctly predicted images, at IoU 0.5) of 0.81 and 0.69 (P < 0.0001), respectively.

Conclusions The pixel-based model was significantly faster and performed better than the patch-based model. Given that average human visual response latency is estimated at 70-100 ms, these data suggest that our pixel-based model could potentially detect neoplasia faster than the human eye, making it best suited for real-time detection. To our knowledge, this is the first report comparing these two approaches in Barrett’s neoplasia, and it suggests that all future work should be done with the pixel-based model.
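The timing argument can be checked with simple arithmetic. The sketch below converts the reported per-image processing times into frames per second and compares them with the quoted 70-100 ms human latency range. It assumes images are processed one at a time with no other overhead (capture, display, data transfer), which the abstract does not address; the function and constant names are illustrative.

```python
def frames_per_second(ms_per_image: float) -> float:
    """Convert a per-image processing time in milliseconds to throughput."""
    if ms_per_image <= 0:
        raise ValueError("processing time must be positive")
    return 1000.0 / ms_per_image


# Figures reported in the abstract.
PIXEL_MS = 33.0                    # pixel-based (SegNet) model
PATCH_MS = 102.6                   # patch-based (LeNet-5) model
HUMAN_LATENCY_MS = (70.0, 100.0)   # quoted human visual response latency

pixel_fps = frames_per_second(PIXEL_MS)
patch_fps = frames_per_second(PATCH_MS)
speedup = PATCH_MS / PIXEL_MS

# Is each model faster than the lower bound of the human latency range?
pixel_beats_human = PIXEL_MS < HUMAN_LATENCY_MS[0]
patch_beats_human = PATCH_MS < HUMAN_LATENCY_MS[0]
```

Under these assumptions the pixel-based model handles about 30 images per second against just under 10 for the patch-based model, roughly a threefold difference, and only the pixel-based model's per-image time falls below the 70 ms lower bound.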