TRIPOD Checklist: Prediction Model Development and Validation

ESD procedure
In the Dutch ESD cohort, all assessments and ESDs were performed by 6 experienced interventional endoscopists (WdG, ADK, PD, LMG, JCH, JJB) who had followed extensive training in ESD (tutorial courses and animal in vivo training). None of the endoscopists had prior experience in performing ESD. In the early phase of the study period (2011 until approximately 2014-2015), ESD was performed according to the conventional method as described previously [8]. Thereafter, ESD was mostly carried out according to the pocket-creation [9] or tunneling method [10]. The perimeter of the lesion was not routinely marked before starting the dissection. The adopted ESD method was not routinely recorded in the endoscopy reports.
In the Swedish validation cohort, all ESDs were performed by 2 experienced interventional endoscopists (FBS, MO) who had followed extensive training in ESD (tutorial courses and animal in vivo training). Both endoscopists had limited experience in ESD before their first procedure in this series.
Predicted probabilities could not be calculated because the intercept of the Eastern prediction model was not reported in the original paper. Consequently, calibration of the model could not be evaluated with a calibration plot (which presents the predicted probability against the observed risk).
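The point above can be illustrated with a small sketch: a rank-based c-statistic depends only on the ordering of the scores, so it can be computed without the intercept, whereas absolute predicted probabilities (and hence a calibration plot) cannot. The sketch assumes a score or linear predictor per lesion and a binary outcome; the function name, variable names, and all numbers are hypothetical, and this is not the implementation used in the study.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher
    score; tied pairs count as 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical linear predictors (sum of coefficient * predictor, no intercept)
linear_predictor = [0.2, 1.1, 0.7, 2.3, 1.6, 0.4]
# Hypothetical binary outcome (1 = event)
outcome = [0, 1, 1, 1, 0, 0]

c_without_intercept = c_statistic(linear_predictor, outcome)

# Adding any constant (a hypothetical intercept) shifts every score equally,
# so the ranking, and therefore the c-statistic, is unchanged.
hypothetical_intercept = -1.5
shifted = [lp + hypothetical_intercept for lp in linear_predictor]
c_with_intercept = c_statistic(shifted, outcome)

print(c_without_intercept, c_with_intercept)
```

Calibration, in contrast, compares absolute predicted risks with observed risks, and those predicted risks change with the intercept.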

Statistical analyses
All statistical analyses were performed using R v4.1.2. Nominal and ordinal variables were expressed as frequencies and percentages, and continuous variables as means and standard deviations (SD).
Pearson's chi-square test was used to compare categorical data. Continuous variables were compared using a one-way analysis of variance. A p-value of <0.05 was considered statistically significant.
Multiple imputation by chained equations (mice package, 10 datasets) was used to address missing data while respecting the correlation structure. Details on the extent of missing data per variable are provided in Supplementary Table 7. After imputation, the performance of current time planning practice was quantified by calculating the proportion of explained variance (R²). The R² values were pooled according to Rubin's Rules [12] using the pool.r.squared function. The performance of the Eastern prediction model for ESD duration [11] was quantified by calculating the c-statistic. The c-statistics of the 10 imputed datasets were calculated using the ROCit package and pooled according to Rubin's Rules [12] using the pool_auc function. Visualizations of the receiver operating characteristic (ROC) curve and of the probabilities per scoring category were created from a single stochastically imputed dataset (the first of the series of datasets generated through multiple imputation).
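As an illustration of the pooling step, the sketch below applies the generic form of Rubin's Rules to one estimate per imputed dataset: the pooled estimate is the mean of the per-dataset estimates, and the total variance combines the within-imputation and between-imputation variance. The function name and all numbers are hypothetical. Dedicated pooling functions may transform a statistic before pooling; that step is omitted here, so this is not a reproduction of the functions named above.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one estimate per imputed dataset with Rubin's Rules.

    Returns (pooled_estimate, total_variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need >= 2 estimates and one variance per estimate")
    pooled_estimate = mean(estimates)       # average of per-dataset estimates
    within = mean(variances)                # average sampling variance
    between = variance(estimates)           # variance across imputations (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical estimates and squared standard errors from 3 imputed datasets
estimates = [0.70, 0.72, 0.71]
variances = [0.0004, 0.0004, 0.0004]

pooled, total_var = pool_rubin(estimates, variances)
print(pooled, sqrt(total_var))  # pooled estimate and its standard error
```

The factor (1 + 1/m) inflates the between-imputation variance to account for the finite number of imputations.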
For the development of the cESD-TIME formula, no formal sample size calculation was performed. This was because the number of subjects per predictor (SPP) was a priori expected to be much larger (>20-30 SPP) than rules of thumb (2 SPP) that have been proposed for adequate estimation of regression coefficients [13,14].
The cESD-TIME formula was developed using the rms package. Continuous variables were winsorized at the 1st and 99th percentile before imputation [15]. The cESD-TIME formula was built using multivariable linear regression with backward selection based on p<0.20 [15]. Uniform shrinkage of regression coefficients was applied when the mean bootstrapped shrinkage factor was <0.99 [15]. Nonlinearity of continuous predictors was assessed using polynomials, variable restrictions and transformations, and restricted cubic splines. R² was used as the outcome measure for model performance. Internal bootstrap validation (validate function, 1000 replications) was used to evaluate the risk of overfitting. Internal-external cross-validation by omission of each of the 3 centers in turn [16] was used to mimic the situation in which the model is applied in a new center. This cross-validation strategy was also performed with omission of each of the ESD endoscopists in turn. The final model was based on the full analysis cohort [16] and was assessed for overall performance and calibration in fully external validation. If the R² of the developed model in external validation was lower than the optimism-corrected R², simple recalibration methods such as calibration-in-the-large and slope recalibration of the linear predictor were to be considered first, before more advanced methods such as model revision and extension.
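The two simple recalibration methods mentioned above can be sketched for a linear model as follows. Calibration-in-the-large shifts all predictions by the mean prediction error (slope fixed at 1), whereas slope recalibration regresses the observed outcome on the original prediction and updates both intercept and slope. Function names and the durations below are hypothetical and serve only as an illustration under these assumptions; they are not study data.

```python
from statistics import mean


def calibration_in_the_large(observed, predicted):
    """Intercept-only update: mean observed minus mean predicted."""
    return mean(observed) - mean(predicted)


def slope_recalibration(observed, predicted):
    """Least-squares fit of observed on predicted: (intercept, slope)."""
    x_bar, y_bar = mean(predicted), mean(observed)
    sxx = sum((x - x_bar) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predictions must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; can be negative for poorly calibrated predictions."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical predicted and observed durations (minutes) in a new cohort
predicted = [60, 90, 120, 150, 180]
observed = [75, 95, 135, 150, 195]

shift = calibration_in_the_large(observed, predicted)
intercept, slope = slope_recalibration(observed, predicted)

shifted = [p + shift for p in predicted]
rescaled = [intercept + slope * p for p in predicted]

print(r_squared(observed, predicted),
      r_squared(observed, shifted),
      r_squared(observed, rescaled))
```

Both updates keep the original predictors and their relative weights, which is why they are usually tried before revising or extending a model.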
The online calculator was built using the shiny package, and Bayesian estimates and prediction intervals were created using the brms package and the predict function. Non-informative priors were used for all model parameters.

Transformations of a previous Eastern prediction model
Linear regression showed that the original Eastern model explained 45% of the variance of the ESD durations (95%-CI: 38-52%). After transformation of the dichotomized quantitative variables of the Eastern model to continuous variables and non-linear fitting of these variables, the model's performance increased to 61% (full model with regression coefficients shown in Supplementary Table 3). Including all locations from the Eastern model (cecum, dentate line, flexure) as 3 separate yes/no variables did not further increase the R² (61%, 95%-CI: 54-68%). Moreover, the performance remained unchanged when categorizing tumor location into rectum, left and right hemicolon (R²=61%, 95%-CI: 54-67%). Lastly, ungrouping the combined item "morphology" into its 2 separate components (gross morphology: protruding, sessile or flat; granularity: granular or non-granular surface) also did not considerably improve the model's performance (R²=62%, 95%-CI: 55-67%).

Endoscopic maneuverability as potential predictor in the cESD-TIME formula
Endoscopic maneuverability has been proposed as a crucial determinant of ESD complexity and duration [17-19]. In the analysis cohort, maneuverability was subjectively evaluated and reported in 49% of cases. After imputation and backward selection, maneuverability was included as a predictor in the resulting model (β=19, p<0.001; Supplementary Table 8), together with the other 6 variables of the cESD-TIME formula. The R² of this exploratory model was 65% (95%-CI: 59-71%).

Table 2.
Reasons
*The number of patients is lower than the number of procedures because two patients underwent 2 single ESDs at 2 different time points.
**Numbers of missing values per center are shown in Supplementary Table 7.
Values are n (%) unless otherwise defined. ASA: American Society of Anesthesiologists physical status classification system, BMI: body mass index, CRC: colorectal cancer, ESD: endoscopic submucosal dissection, SD: standard deviation.

Supplementary Table 3.
Transformed Eastern model to predict ESD duration in minutes

Table 4.
Associations between all candidate predictors and ESD duration in univariable linear regression

Table 5.
Associations between all candidate predictors and ESD duration in multivariable linear regression

Table 6.
Key tumor and ESD characteristics of the independent Swedish validation cohort

Dang H et al. Predicting procedure duration of colorectal e... Endosc Int Open 2023; 11 | © 2023 The Author(s).

Table 7.
Number of missing values per variable

Table 1.
Lesion characteristics of the analysis cohort

Table 8.
cESD-TIME formula with endoscopic maneuverability to predict ESD duration in minutes