Background & Aims Screening for colorectal cancer (CRC) relies on colonoscopy and/or fecal occult blood
testing, whereas other non-invasive risk-stratification systems have not been incorporated
into European guidelines. Here, we evaluated the potential of Machine Learning (ML)
methods to optimize prediction of advanced adenoma (AA).
Patients & Methods A total of 5862 individuals participating in a screening program for colorectal cancer were
included after excluding patients with a history of CRC, symptomatic patients and those
with insufficient colonoscopy. Adenomas were diagnosed histologically; AA was defined as an
adenoma ≥1 cm in size or with high-grade dysplasia or villous features. Clinical, laboratory
and lifestyle parameters were assessed at the time of colonoscopy. Logistic regression
(LR) and extreme gradient boosting (XGBoost) were evaluated for AA prediction
based on readily available laboratory, clinical and lifestyle parameters. The dataset was
divided into a derivation cohort (for model development and internal cross-validation)
and an external validation cohort.
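The AA definition and the cohort split described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the `Adenoma` record, the function names, the 30 % validation fraction and the stratification by AA status are all assumptions introduced here, since the abstract does not report how the split was performed.

```python
import random
from dataclasses import dataclass


@dataclass
class Adenoma:
    """Hypothetical histology record; field names are illustrative."""
    size_mm: float
    high_grade_dysplasia: bool
    villous_features: bool


def is_advanced(adenoma: Adenoma) -> bool:
    """AA as defined above: >=1 cm in size, or high-grade dysplasia, or villous features."""
    return (
        adenoma.size_mm >= 10.0
        or adenoma.high_grade_dysplasia
        or adenoma.villous_features
    )


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Split indices into derivation and validation cohorts.

    Stratifying by label keeps AA prevalence similar in both cohorts.
    The validation fraction is an assumed value, not taken from the study.
    """
    rng = random.Random(seed)
    derivation, validation = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_val = round(len(idx) * validation_fraction)
        validation.extend(idx[:n_val])
        derivation.extend(idx[n_val:])
    return sorted(derivation), sorted(validation)
```

Model development and internal cross-validation would then use only the derivation indices, with the validation indices held back until the final evaluation.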
Results The mean age was 58.7±9.7 years, and 2811 participants (48.0 %) were male. Obesity (BMI ≥30 kg/m²)
was present in 1404 (24.0 %), diabetes in 871 (14.9 %), and the metabolic syndrome in 2095 (39.1 %). Any
adenoma was detected in 1884 (32.1 %) and any AA in 437 (7.5 %). 659 individuals (11.2 %)
had a first-degree relative with a history of CRC. Models based on 36 laboratory parameters,
8 clinical parameters and data on 8 food types/dietary patterns predicted AA with moderate
accuracy, both with XGBoost (AUC of 0.66-0.68) and with LR (AUC of 0.65-0.66).
Limiting variables to established risk factors for AA did not significantly improve
performance. Subgroup analyses in subjects without genetic predisposition and
gender-specific analyses showed similar results.
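To make the reported AUC values concrete, the sketch below computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen individual with AA receives a higher predicted risk than a randomly chosen individual without AA, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions and not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for AA, 0 for no AA. scores: predicted risk, higher means more likely AA.
    Pairwise comparison is quadratic in cohort size; adequate for a sketch.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC requires at least one positive and one negative case")
    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this reading, an AUC of 0.66 means that in roughly two of three such pairs the individual with AA is ranked higher, compared with 0.5 for a model that ranks at random.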
Conclusion ML based on laboratory and clinical information from a single time point does not accurately
predict AA. Non-invasive risk prediction seems insufficient to replace current CRC
screening programs. However, the potential for sequential application before colonoscopy
to increase pre-test probability warrants further investigation.