A Novel BERT-Based Machine Learning Approach for Enhanced CSF Leak Prediction in Endoscopic Endonasal Skull Base Surgery

Rayan Alfallaj; Yakoub Bazi; Mohamad M. A. Rahhal; Mansour Zuair; Ashwaq Alqurashi; Ahmad Alroqi; Abdulrazag Ajlan; Saad Alsaleh; Abdulaziz S. Alrasheed

doi:10.1055/a-2719-8970

Journal of Neurological Surgery Part B: Skull Base

CC BY 4.0 · J Neurol Surg B Skull Base

Original Article


Authors

Rayan Alfallaj

¹Department of Otolaryngology - Head and Neck Surgery, College of Medicine, King Saud University, Riyadh, Saudi Arabia
Yakoub Bazi

²Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Mohamad M. A. Rahhal

³Applied Computer Science Department, College of Applied Computer Science, King Saud University, Riyadh, Saudi Arabia
Mansour Zuair

²Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Ashwaq Alqurashi

⁴Division of Neurosurgery, King Saud University, Riyadh, Saudi Arabia
Ahmad Alroqi

¹Department of Otolaryngology - Head and Neck Surgery, College of Medicine, King Saud University, Riyadh, Saudi Arabia
Abdulrazag Ajlan

⁴Division of Neurosurgery, King Saud University, Riyadh, Saudi Arabia
Saad Alsaleh

¹Department of Otolaryngology - Head and Neck Surgery, College of Medicine, King Saud University, Riyadh, Saudi Arabia
Abdulaziz S. Alrasheed

¹Department of Otolaryngology - Head and Neck Surgery, College of Medicine, King Saud University, Riyadh, Saudi Arabia


Keywords

AI - CSF - machine learning - NLP - skull base - BERT - reconstruction - rhinology

Introduction

The endoscopic endonasal approach has become the dominant and widely accepted method for removing skull base tumors in the sellar and parasellar regions.[1] Skull base reconstruction plays a pivotal role in mitigating the risk of cerebrospinal fluid (CSF) leakage and its associated complications, including the development of meningitis.[2] Achieving successful skull base reconstruction and preventing CSF leaks has significant positive implications for reducing hospitalization duration and alleviating the burden on healthcare costs.[3] CSF leak is a recognized risk during endoscopic sellar surgery, especially when managing large tumors or craniopharyngiomas. Evidence indicates that intraoperative CSF leaks occur in about 30.1% of cases, while postoperative leaks are reported in roughly 3.7 to 9%,[4] [5] influenced by various factors such as surgeon experience, tumor characteristics, and individual patient-related variables. The risk of CSF leak following endoscopic skull base surgery is further affected by tumor size and consistency, patient age, BMI, and surgical factors[6]; prevention depends on appropriate reconstruction techniques, with autologous grafts used for small, low-flow defects and vascularized tissue, particularly the nasoseptal flap, serving as the gold standard for larger or high-flow leaks in anterior skull base reconstruction.[7]

Despite established reconstruction techniques and known risk factors, accurately predicting which patients will develop postoperative CSF leaks remains challenging, necessitating more sophisticated predictive methodologies.

The advancement of artificial intelligence in healthcare enables the development of machine learning (ML) models that can predict CSF leak risk factors.[7] AI models can integrate multiple patient, tumor, and surgical variables to predict intraoperative CSF leaks. In a cohort of 238 pituitary adenoma cases, machine-learning methods were tested; random forest performed best, accurately identifying most leaks and outperforming traditional statistics and other ML models.[8]

Such predictive capability could inform pre-, intra-, and postoperative management decisions, including the choice of reconstruction method.

In natural language processing (NLP), BERT has become a benchmark model because it is first pretrained on vast amounts of raw, unlabeled text and can process context from both directions across relatively long passages. BERT, short for bidirectional encoder representations from transformers, uses only the encoder part of the transformer architecture to learn word meanings that depend on the words that come before and after them, producing a highly adaptable, context-aware language representation. Although the original BERT was trained on general-purpose English text, researchers have since built tailored versions for specialized domains. For example, BioBERT, BlueBERT, SciBERT, and PubMedBERT have been trained on biomedical and scientific literature so that the model recognizes technical vocabulary and clinical phrasing more accurately. These domain-specific variants preserve BERT's key strengths (bidirectional context and the ability to process long sequences) while improving performance on biomedical text-mining tasks.[9]

To our knowledge, this is the first study to compare a BERT-based model with conventional statistical techniques for detecting CSF-leak risk factors in endoscopic skull-base surgery. The approach leverages qualitative (nonnumeric) inputs to simplify the handling of large, variable datasets, with the aim of improving predictive accuracy and guiding reconstruction decisions more effectively.

Materials and Methods

This study was conducted at King Saud University Medical City, in collaboration with the College of Computer Science at King Saud University. We developed a traditional logistic regression model and compared its performance with that of an NLP model, specifically a BERT model. Ethical approval for the study was obtained from the ethics committee at the College of Medicine at King Saud University (no. 23-8232). [Fig. 1] shows the overall methodological process of this project.

Fig. 1 Methodological overview of the study process.

Data Categorization

We extracted information from healthcare records within the study setting, including all cases of skull base pathologies managed by a multidisciplinary team specializing in rhinology, skull base surgery, and neurosurgery at King Saud University Medical City. The available data, which served as input for both the BERT model and the traditional statistical model, were categorized into the following groups:

Demographics: age, gender, and BMI.

Perioperative clinical CSF leak indicators: postoperative clinical symptoms or signs raising suspicion, need for postoperative CT scans and their timing, and length of hospital stay.

Pathology-related factors: tumor size, presence of hydrocephalus, tumor location, and histopathology.

Surgical factors: type and extent of surgical approach, preoperative CSF diversion, intraoperative CSF leak presence, CSF leak flow rate, nasal flap reconstruction, use of nasal packing, and types of reconstruction materials used.

Perioperative CT scan features: the timing, presence, and location of pneumocephalus; the presence and size of any hematoma; graft characteristics; signs of air communication; and findings from repeat CT scans. These features were evaluated by a skull-base team comprising a rhinologist, neurosurgeon, and radiologist, and the resulting radiological findings were then entered manually into a text-based datasheet.

Development of the Traditional Logistic Regression Model

Descriptive statistics were generated using the compareGroups package, with missing data imputed (Hmisc), categorical variables factorized, and class imbalance addressed via upsampling (caret). Continuous data were summarized as mean ± SD or median/IQR, and categorical data as counts and percentages. Univariate analysis used Somers' D and the C statistic (AUC) to assess variable discrimination. Clinically relevant variables with high C statistics were included regardless of significance. Upsampling outperformed SMOTE and ROSE, preserving data structure and improving model performance. A bootstrap approach reduced overfitting, and nonsignificant variables were excluded. The final logistic regression model, validated by ROC and kappa statistics, showed improved predictive accuracy and robustness.
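
The upsampling step described above can be illustrated with a minimal sketch. The study itself used the caret package in R; the Python below is only an illustration of random oversampling with replacement, and the function name `upsample`, the `csf_leak` key, and the toy cohort are hypothetical. With the class sizes reported in this study (101 no-leak, 15 leak), balancing the classes this way yields 202 rows, which would be consistent with the 202 observations listed in Table 2.

```python
import random
from collections import Counter

def upsample(rows, label_key="csf_leak", seed=0):
    """Resample minority-class rows with replacement until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows at random, with replacement.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Hypothetical toy cohort using the class sizes from Table 1.
cohort = [{"id": i, "csf_leak": 0} for i in range(101)]
cohort += [{"id": 101 + i, "csf_leak": 1} for i in range(15)]

balanced = upsample(cohort)
counts = Counter(row["csf_leak"] for row in balanced)
print(len(balanced), counts[0], counts[1])
```

Because the duplicated minority rows are exact copies, any resampling of this kind needs to be applied only to training data; otherwise copies of the same patient can appear on both sides of a train/test split.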

Development of the BERT Model

Categorical medical data input: the variables described above were collected as input data, forming the foundation of the predictive model.

Tabular Dataset

Each patient record is represented as a sequence x = [c₁, …, cₖ, y], where c₁, …, cₖ are the k categorical attributes encoded as their text labels (e.g., “male,” “Craniopharyngioma”) and y ∈ {0,1} denotes the presence of a CSF leak. The attribute text string is passed to the standard BERT tokenizer (bert-base-uncased), which automatically converts the words into WordPiece tokens, adds the special [CLS] and [SEP] markers, and pads or truncates the sequence to a fixed length.
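
As a rough illustration of how a tabular record might be turned into a single text string before tokenization, consider the sketch below. The exact template used in the study is not stated here, so the "field: value" layout, the separator, the "unknown" placeholder for missing values, and the field names are all assumptions made for illustration.

```python
def serialize_record(record, field_order):
    """Join the categorical text labels of one patient record into a
    single string, using a fixed field order so every record has the
    same layout. The label y is kept separate and is not serialized."""
    parts = []
    for field in field_order:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            value = "unknown"  # assumed placeholder for missing entries
        parts.append(f"{field}: {value}")
    return " ; ".join(parts)

# Hypothetical record; the values echo the examples given in the text.
FIELDS = ["gender", "histopathology", "pneumocephalus"]
record = {"gender": "male", "histopathology": "Craniopharyngioma"}

text = serialize_record(record, FIELDS)
print(text)
```

The resulting string would then be passed to the tokenizer; with the Hugging Face transformers library, for instance, this is typically done by loading `BertTokenizer.from_pretrained("bert-base-uncased")` and calling it with `padding="max_length"` and `truncation=True`.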

The resulting token IDs are fed to the BERT encoder. Within each layer, multi-head self-attention computes contextualized representations using the canonical scaled dot-product formulation (see [Supplementary Material S1], available in the online version only, for formula details), followed by residual connections, layer normalization, and feed-forward sublayers. Because the [CLS] token is prepended to each sequence at tokenization, its final hidden state after the last layer is extracted as a pooled representation and passed through a linear layer to classify CSF leak versus no leak, as illustrated in [Fig. 2].

Fig. 2 Overview of the classification pipeline based on BERT for CSF leak detection.

All mathematical derivations, hyperparameters, and the training loss are presented in [Supplementary Material S1] (available in the online version only).
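
For orientation, the self-attention step and the classification head described above take the following standard form in a BERT-style encoder. This is the textbook formulation rather than a reproduction of the supplementary derivations; the single-logit sigmoid head and the symbols h_[CLS], w, and b are assumptions for illustration (a two-way softmax head would be equivalent for binary classification).

```latex
% Scaled dot-product attention for one head, with queries Q, keys K,
% values V, and key dimension d_k:
\[
\operatorname{Attention}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

% Classification from the final hidden state of the [CLS] token,
% h_{[CLS]} \in \mathbb{R}^{d}, with learned weights w and bias b:
\[
\hat{p}(y = 1 \mid x) = \sigma\!\left(w^{\top} h_{[\mathrm{CLS}]} + b\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
\]
```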

BERT Model Training and Optimization

The BERT classifier was fine-tuned using twofold cross-validation (50% training/50% testing in each fold). Optimization used AdamW (learning rate 1 × 10⁻⁵, weight decay 1 × 10⁻²) for 20 epochs with a binary cross-entropy objective; focal weighting was applied to mitigate class imbalance (see [Supplementary Material S2], available in the online version only, for the exact loss formula). Performance was reported as the mean of the two folds using accuracy, precision (positive predictive value), recall (sensitivity), negative predictive value, F1 score (the harmonic mean of precision and recall, ranging from 0 to 1), and AUC.
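
To make the idea of focal weighting concrete, the sketch below implements the commonly used binary focal loss for a single prediction. The exact loss used in the study is given only in the supplementary material, so this should be read as the generic form; the values gamma = 2.0 and alpha = 0.25 are the defaults from the original focal-loss paper and are assumptions here, not the study's settings.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example.

    p: predicted probability of the positive class (CSF leak)
    y: true label, 1 for leak and 0 for no leak
    gamma: focusing parameter; larger values down-weight easy examples
    alpha: weight given to the positive class
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.9, 1)   # confident and correct
hard = binary_focal_loss(0.1, 1)   # confident and wrong
print(easy < hard)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted binary cross-entropy, which is why focal weighting can be viewed as an extension of the cross-entropy objective rather than a replacement for it.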

Results

Descriptive Statistics for the Study Subjects

A total of 116 skull-base surgery patients were included. No statistically significant differences were found between the CSF-leak and no-leak groups in age (median: 29 years vs. 42 years, p = 0.526), gender (60.0% vs. 57.4% female, p = 1.000), preoperative CSF diversion (53.3% vs. 50.5%, p = 1.000), or hydrocephalus (26.7% vs. 8.9%, p = 0.065). Weight, tumor location, pathology size, reconstruction materials (fat, fascia, hard graft, flap, nasal pack, glue, gasket), and surgical approach also showed no significant differences. Moreover, within both the gross-total resection (GTR) and subtotal resection (STR) groups, the proportions of CSF-leak versus no-leak cases were not significantly different (all p > 0.05) as detailed in [Table 1].

Table 1
Descriptive statistics for the study sample
	No CSF leak	CSF leak	p-Value
	n = 101	n = 15
Age	42.0 [24.0; 54.0]	29.0 [23.5; 49.0]	0.526
Gender			1.000
Female	58 (57.4%)	9 (60.0%)
Male	43 (42.6%)	6 (40.0%)
Preoperative diversion			1.000
No	51 (50.5%)	7 (46.7%)
Yes	50 (49.5%)	8 (53.3%)
Preoperative hydrocephalus			0.065
No	92 (91.1%)	11 (73.3%)
Yes	9 (8.91%)	4 (26.7%)
Weight
Underweight	5 (4.95%)	0 (0.00%)	1.000
Normal	31 (30.7%)	5 (33.3%)	1.000
Overweight	21 (20.8%)	6 (40.0%)	0.112
Obese	44 (43.6%)	4 (26.7%)	0.338
Location (anterior skull base)			0.580
No	52 (51.5%)	6 (40.0%)
Yes	49 (48.5%)	9 (60.0%)
Location (posterior fossa)			0.633
No	92 (91.1%)	13 (86.7%)
Yes	9 (8.91%)	2 (13.3%)
Location (sellar/suprasellar)			0.413
No	59 (58.4%)	11 (73.3%)
Yes	42 (41.6%)	4 (26.7%)
Size of the pathology
(1–2) cm	16 (15.8%)	3 (20.0%)	0.710
(2–3) cm	28 (27.7%)	4 (26.7%)	1.000
>3 cm	44 (43.6%)	8 (53.3%)	0.666
Type of reconstruction material
Fat	77 (76.2%)	13 (86.7%)	0.515
Fascia	34 (33.7%)	6 (40.0%)	0.849
Hard (bone)	42 (41.6%)	10 (66.7%)	0.122
Naso-septal flap	65 (64.4%)	11 (73.3%)	0.695
Nasal pack	74 (73.3%)	9 (60.0%)	0.358
Glue	67 (66.3%)	10 (66.7%)	1.000
Gasket	21 (20.8%)	3 (20.0%)	1.000
Endoscopic approach			0.238
Extended approach	61 (60.4%)	12 (80.0%)
Standard approach (transsphenoidal transsellar)	40 (39.6%)	3 (20.0%)
GTR (gross total resection)			0.532
No	53 (52.5%)	6 (40.0%)
Yes	48 (47.5%)	9 (60.0%)
STR (subtotal resection)			1.000
No	64 (63.4%)	10 (66.7%)
Yes	37 (36.6%)	5 (33.3%)

Performance and Results of Logistic Regression Model

Odds Ratio of Different Variables

Preoperative hydrocephalus (odds ratio [OR] = 5.15, p = 0.016), overweight status (OR = 7.15, p < 0.001), fat use in reconstruction (OR = 19.65, p < 0.001), and gross total resection (OR = 3.83, p = 0.001) were all associated with increased odds of CSF leak. In contrast, nasal pack (OR = 0.03, p < 0.001), gasket use (OR = 0.20, p = 0.003), and sellar/suprasellar tumor location (OR = 0.10, p < 0.001) significantly reduced leak risk as shown in [Table 2].

Table 2
Logistic regression analysis results
Dependent variable: CSF leak	Odds ratios	Confidence interval	p-Value
(Intercept)	0.11	0.03–0.40	0.001
Preoperative hydrocephalus	5.15	1.46–21.61	0.016
Overweight	7.15	2.98–18.60	<0.001
Nasal pack	0.03	0.01–0.10	<0.001
Gasket	0.20	0.07–0.56	0.003
Fat	19.65	5.52–82.96	<0.001
Tumor location (sellar/suprasellar vs. other)	0.10	0.03–0.29	<0.001
Gross total resection (GTR vs. other)	3.83	1.77–8.79	0.001
Observations	202
R ² Tjur	0.401
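
As a worked illustration of how an odds ratio is read, the sketch below computes the crude (unadjusted) odds ratio for preoperative hydrocephalus directly from the counts in Table 1, with a Woolf (log-based) 95% confidence interval. This is not the model reported in Table 2: the adjusted value there (OR = 5.15) comes from a multivariable model with 202 observations, so the two numbers are not expected to match. The function name and argument names are hypothetical.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf confidence interval for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Table 1 counts: hydrocephalus yes -> 4 leak, 9 no leak;
#                 hydrocephalus no  -> 11 leak, 92 no leak.
crude_or, lower, upper = odds_ratio_with_ci(a=4, b=9, c=11, d=92)
print(f"crude OR = {crude_or:.2f}")  # crude OR = 3.72
```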

Intraoperative CSF leaks occurred in 20.0% of the leak group versus 16.8% of the nonleak group (p = 0.721). Among leak cases, high-flow leaks were more common (13.3% vs. 5.95%), while low-flow leaks were less frequent (6.67% vs. 14.3%) compared to nonleak cases (p = 0.367). The type of solid reconstruction varied between groups but showed no significant difference (p = 0.167). Postoperative CT revealed significantly more frequent and higher-grade pneumocephalus in the leak group (p = 0.004). Excessive fat was more frequent in the leak group (58.3% vs. 12.7%, p = 0.001), and fat graft size was more often inadequate (p = 0.048). Hematoma requiring evacuation was slightly more common in the leak group, but the difference was not significant (p = 0.492). Postoperative CT also revealed other factors associated with CSF leaks, including more frequent displacement of the fat graft outside the surgical cavity (p = 0.002) and less favorable positioning of the solid reconstruction (p = 0.047). Flap adherence around the bone defect was less frequent in the leak group, although this did not reach significance (p = 0.060), and air continuity between the nasal and surgical cavities was observed in only one leak case (p = 0.153). Although the surgical approach varied (standard transsphenoidal transsellar vs. extended), the difference was not statistically significant (p = 0.862), as shown in [Table 3].

Table 3
Intraoperative and postoperative findings
	No leak	Leak	p-Value overall	n
	n = 101	n = 15
Presence of Intraoperative CSF leak			0.721	116
No	84 (83.2%)	12 (80.0%)
Yes	17 (16.8%)	3 (20.0%)
If yes, the CSF flow was			0.367	99
High flow		2 (13.3%)
Low flow		1 (6.67%)
None	(100%)	12 (80.0%)
Type of solid reconstruction			0.167	52
Bone	5 (11.9%)	3 (30.0%)
Medpore	34 (81.0%)	6 (60.0%)
Mesh titanium	1 (2.38%)	0 (0.00%)
Omnipore plate	0 (0.00%)	1 (10.0%)
Plastic plate	2 (4.76%)	0 (0.00%)
Presence of pneumocephalus in the postoperative CT scan			0.004	86
Grade 0 (none)	42 (57.5%)	2 (15.4%)
Grade 1 (dots [<1 mm of air])	3 (4.11%)	0 (0.00%)
Grade 2 (bubbles [<1 cm of air])	9 (12.3%)	1 (7.69%)
Grade 3 (1–3 cm air)	4 (5.48%)	4 (30.8%)
Grade 4 (>3 cm of air)	15 (20.5%)	6 (46.2%)
Presence of a big hematoma: a need for evacuation?			0.492	85
No	69 (95.8%)	12 (92.3%)
Yes	3 (4.17%)	1 (7.69%)
Presence of excessive fat			0.001	83
No	62 (87.3%)	5 (41.7%)
Yes	9 (12.7%)	7 (58.3%)
Size of the fat graft			0.048	82
All fat outside the surgical cavity	0 (0.00%)	1 (7.69%)
Part of the fat is outside	3 (4.35%)	2 (15.4%)
Good size	66 (95.7%)	10 (76.9%)
The location of fat graft?			0.002	77
Started to go outside the surgical cavity	9 (13.8%)	4 (33.3%)
Within the sinus	0 (0.00%)	2 (16.7%)
Within the surgical cavity	56 (86.2%)	6 (50.0%)
Sign of air continuity between the surgical cavity and the nasal cavity			0.153	85
No	72 (100%)	12 (92.3%)
Yes	0 (0.00%)	1 (7.69%)
Solid reconstruction, is it in a good location?			0.047	64
No	20 (37.7%)	8 (72.7%)
Yes	33 (62.3%)	3 (27.3%)
Is the location of the septal flap adherent around all bone defects?			0.060	73
No	14 (22.6%)	6 (54.5%)
Yes	48 (77.4%)	5 (45.5%)

Reconstruction materials, including fat, fascia lata, hard grafts, nasoseptal flap, nasal packing, and bone, varied between CSF leak and nonleak groups, but none showed statistically significant differences (all p > 0.05), as shown in [Table 1]. Histopathological tumor types showed no significant association with CSF leak incidence, with pituitary adenoma being the most common in leak cases (26.7%), followed by meningioma (20.0%), chordoma and craniopharyngioma (13.3% each), and all comparisons yielded nonsignificant p-values as shown in [Table 4].

Table 4
Histopathology type and CSF leak
Histopathology diagnosis	No leak	Leak	p-Value overall
	n = 101	n = 15
Atypical cartilaginous neoplasm	1 (0.99%)	0 (0.00%)	1.000
Chordoma	7 (6.93%)	2 (13.3%)	0.328
Craniopharyngioma	6 (5.94%)	2 (13.3%)	0.276
Dermoid cyst	1 (0.99%)	0 (0.00%)	1.000
Epidermoid cyst	0 (0.00%)	1 (6.67%)	0.129
Fibrous dysplasia	2 (1.98%)	0 (0.00%)	1.000
Fungal sinusitis	4 (3.96%)	0 (0.00%)	1.000
Germ cell tumor	1 (0.99%)	0 (0.00%)	1.000
Juvenile nasopharyngeal angiofibroma	3 (2.97%)	0 (0.00%)	1.000
Meningioma	24 (23.8%)	3 (20.0%)	1.000
Meningoencephalocele	3 (2.97%)	1 (6.67%)	0.430
Pilocytic astrocytoma	2 (1.98%)	1 (6.67%)	0.342
Pituitary Adenoma	43 (42.6%)	4 (26.7%)	0.374
Rathke	1 (0.99%)	0 (0.00%)	1.000
Sarcoma	0 (0.00%)	1 (6.67%)	0.129
Spontaneous CSF leak	2 (1.98%)	0 (0.00%)	1.000
Temporal bone fracture	1 (0.99%)	0 (0.00%)	1.000

Logistic Regression Model Performance

The logistic regression model achieved an AUC of 0.847, with high specificity (90.6%) but low sensitivity (21.4%) and a Kappa of 0.1447, indicating slight agreement. McNemar's test (p < 2e-16) showed significant prediction bias; positive predictive value (PPV) was 44.7%, negative predictive value (NPV) 76.5%, with CSF leak prevalence at 26.2%, detection rate 5.6%, and detection prevalence 12.6%. The balanced accuracy of 56.0% reflects poor performance in detecting true CSF leaks due to class imbalance.
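
The balanced accuracy quoted above, and the F1 score listed for the logistic regression model in Table 5, follow directly from the reported sensitivity, specificity, and PPV. The short sketch below reproduces that arithmetic using the four-decimal values from Table 5; the function names are illustrative.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2

def f1_score(ppv, sensitivity):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)

sensitivity, specificity, ppv = 0.2143, 0.9060, 0.4471

bal_acc = balanced_accuracy(sensitivity, specificity)
f1 = f1_score(ppv, sensitivity)
print(f"balanced accuracy = {bal_acc:.3f}, F1 = {f1:.4f}")
```

A balanced accuracy near 0.56 is only slightly above the 0.50 expected from uninformative predictions, which is why the high specificity alone gives a misleadingly favorable picture of this model.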

Performance and Results of the BERT Model

Based on the confusion matrix (50:2), the BERT model achieved an AUC of 1.0000, an accuracy of 0.9833, a PPV of 0.8889, an NPV of 1.0000, a recall of 0.9808, and an F1 score of 0.9657.

The grouped attention weights, averaged across the two folds of model training, revealed the following findings: perioperative CT scans received the highest attention weight by far, followed by surgical factors. The other three groups (demographics, suspicion of CSF leak, and pathology-related factors, including histopathology type) had substantially lower attention weights, as shown in [Fig. 3].
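
One plausible way to obtain grouped attention weights of this kind is to sum the attention assigned to each token within its variable group, normalize, and then average over folds. The sketch below shows that aggregation only; how the attention was actually extracted in the study (which layer, which heads, sum versus mean) is not stated here, and all names and numbers in the example are hypothetical.

```python
def grouped_attention(token_weights, token_groups):
    """Sum per-token attention weights within each variable group and
    normalize so the group weights add up to 1."""
    totals = {}
    for weight, group in zip(token_weights, token_groups):
        totals[group] = totals.get(group, 0.0) + weight
    norm = sum(totals.values())
    return {group: value / norm for group, value in totals.items()}

def average_over_folds(fold_results):
    """Average the group weights across cross-validation folds."""
    groups = set().union(*fold_results)
    n_folds = len(fold_results)
    return {g: sum(r.get(g, 0.0) for r in fold_results) / n_folds for g in groups}

# Hypothetical attention weights for two folds.
fold_1 = grouped_attention([0.1, 0.2, 0.5, 0.2], ["demographics", "surgical", "ct", "ct"])
fold_2 = grouped_attention([0.05, 0.35, 0.6], ["demographics", "surgical", "ct"])

averaged = average_over_folds([fold_1, fold_2])
top_group = max(averaged, key=averaged.get)
print(top_group)
```

Attention weights indicate where the model looks rather than a causal effect size, so a ranking of this kind is best read as descriptive.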

Fig. 3 Categorical groups predictors for CSF leak.

Comparison of the Logistic Regression Model versus the BERT Model

The BERT model surpassed the traditional logistic regression model in predicting CSF leaks, achieving an accuracy of 98%, a PPV of 89%, an NPV of 100%, a recall of 98%, and an F1 score of 96%. In contrast, the logistic regression model had an accuracy of 72%, sensitivity of 20%, specificity of 90%, detection rate of 5%, and PPV of 44%, as illustrated in [Table 5] and [Fig. 4].

Table 5
Comparison of the logistic regression model versus the BERT model
Metric	Logistic regression model	BERT model
AUC	0.847	1.0000
Accuracy	0.725	0.9833
95% CI	(0.7189, 0.731)
Sensitivity	0.2143	1.0000
Specificity	0.9060	0.9808
Positive predictive value	0.4471	0.8889
Negative predictive value	0.7649	1.0000
F1 score	0.2897	0.9657
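
The BERT column of Table 5 can be checked for internal consistency by deriving every metric from a single confusion matrix. The counts used below (8 true positives, 1 false positive, 51 true negatives, 0 false negatives) are hypothetical: they were chosen because they reproduce the tabulated values, and the actual per-fold matrices are not given here and may differ, since the study reports the mean of two folds. Under these assumed counts, the reported F1 of 0.9657 matches the macro-averaged F1 (the mean of the per-class F1 scores) rather than the F1 of the leak class alone.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    f1_positive = 2 * tp / (2 * tp + fp + fn)
    f1_negative = 2 * tn / (2 * tn + fn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1_positive": f1_positive,
        "f1_macro": (f1_positive + f1_negative) / 2,
    }

# Hypothetical counts that reproduce the BERT column of Table 5.
m = confusion_metrics(tp=8, fp=1, tn=51, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

An NPV of 1.0000 implies zero false negatives, which in turn implies a sensitivity of 1.0000, so the value 0.9808 corresponds to specificity under these counts.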

Fig. 4 Comparison of performance of the BERT model versus the logistic regression model in the prediction of CSF leak.

Discussion

In this study, we developed a BERT-based NLP model that transforms qualitative medical data into numerical form for machine learning analysis to predict CSF leaks after skull base surgery. The model significantly outperformed traditional logistic regression, achieving 98% accuracy, 89% precision, 98% recall, and a 96% F1 score, compared to logistic regression's 72% accuracy, 20% sensitivity, 90% specificity, 5% detection rate, and 44% PPV, highlighting the potential of NLP-based models for superior clinical outcome prediction.

Our BERT model analysis revealed that perioperative CT scan features suggestive of CSF leakage carried the highest attention weight for prediction, followed by surgical factors. These findings align with a large multicenter study that evaluated three predictive models (logistic regression, decision tree, and neural network) for CSF leak prediction. Their research identified intraoperative CSF leakage as the most significant risk factor for CSF rhinorrhea, with elevated BMI and revision surgery also contributing significantly in transsphenoidal approaches.[10] The consistency between their conclusions and our results reinforces the superior performance of machine learning methodologies compared to traditional statistical approaches in predicting this critical surgical complication. The BERT model processed the full dataset without manual feature selection, revealing that perioperative CT findings and intraoperative factors were stronger predictors of CSF leak than preoperative variables. Including early postoperative CT data allowed the model to identify key indicators of CSF leak at the time of clinical suspicion, highlighting the value of focusing on perioperative and surgical data for more precise prediction models. A neural network study for predicting intraoperative CSF leaks during pituitary surgery demonstrated 88% classification accuracy (AUC: 0.84), outperforming conventional statistical methods, which identified no significant risk factors. The neural network achieved high sensitivity (83%) and specificity (89%), with high suprasellar Hardy grade, prior surgery, and older age as the primary predictive factors.[11] A systematic review of seven AI studies for CSF leak prediction in pituitary surgery found performance metrics ranging from 0.73 to 0.98 (AUC) and 0.70 to 0.97 (accuracy). Random Forest was the most frequently used algorithm, with k-fold cross-validation as the predominant validation method. 
Notably, deep learning models demonstrated significantly higher pooled sensitivity than machine learning models (99% vs. 86.2%, p < 0.01), while specificity remained comparable between approaches (90.6% vs. 92.1%, p = 0.87).[12] Our 12.9% CSF leak rate (15 of 116 patients) aligns with previous studies. Overweight status was a significant risk factor in our model (OR = 7.15), consistent with Ivan et al,[13] while Fraser et al,[14] specifically identified BMI > 25 kg/m² and posterior fossa tumors as predictors of higher postoperative leak rates in their 615-patient EEA study. While our data showed no significant correlation between histopathological types or tumor location and CSF leak risk, Zhang et al,[15] found clival tumors associated with higher leak rates in their 100 extra-sellar tumor cases. Our results identified preoperative hydrocephalus as a significant predictor (OR: 5.15, CI: 1.46–21.61, p = 0.016), consistent with Patel et al,[16] who found only BMI and hydrocephalus as significant predictors in their 806-case analysis. Our analysis found no statistical significance for various reconstruction techniques, including solid reconstruction. A multicenter analysis of 706 patients reported a 7.8% postoperative CSF leak rate and found that rigid reconstruction and older age were protective factors against postoperative sellar leaks, whereas BMI was not linked to increased risk,[17] contrary to our findings. Kuan et al,[18] reported that only intraoperative CSF leak was associated with recurrence in their 300 consecutive repair cases. A large systematic review and meta-analysis of risk factors for postoperative CSF leakage after endonasal endoscopic skull-base surgery reported that overweight and obesity were associated with an OR of 1.88 (95% CI, 1.35–2.63; p < 0.01), a result that aligns with our findings.
Regarding reconstruction, 16 studies (total n = 3,579) assessed pedicled vascularized flaps, showing a pooled OR of 0.62 for CSF leakage compared with free grafts.[19] In contrast, our findings demonstrated an unexpectedly high OR of 19.65 for CSF leak when fat grafts were used. This finding diverges from prior literature: in a systematic review of fat graft use in transsphenoidal surgery, postoperative CSF leak requiring intervention was significantly lower in the fat-in-sphenoid-sinus repair group (4.4%) than in the multilayer (20.3%) or no-repair (12.6%) groups (p < 0.01).[20] One plausible explanation for this discrepancy is that fat grafts in our series were preferentially applied to high-flow defects, which inherently carry a greater leak risk.

Our study introduces a novel application of an NLP model for CSF leak prediction in skull base surgery. We believe that the BERT model holds considerable promise for advancing both clinical practice and research. Its implementation in future studies with larger sample sizes may lead to improved clinical decision-making, as the model has demonstrated the ability to identify patterns and associations that traditional statistical methods may overlook. Furthermore, the application of this model in research settings could offer significant advantages in handling large and complex datasets, particularly those involving qualitative variables. Unlike conventional approaches, BERT can process qualitative data directly, without manual numerical encoding, preserving contextual information while streamlining analysis. Despite promising results, the exceptionally high performance metrics suggest potential overfitting or data leakage. As a single-center study, our findings also have limited generalizability. Future validation should include a critical review of CT features, rigorous verification of the train/test methodology, validation on independent datasets, and optimization of model complexity. We recommend multicenter implementation of BERT for CSF leak analysis to increase sample size and potentially identify risk factors undetectable through conventional statistical approaches.

Conclusion

In this cohort, the BERT NLP model outperformed traditional logistic regression in predicting CSF leaks after endoscopic skull base surgery, demonstrating superior accuracy with qualitative clinical data and the potential to enhance risk stratification and decision-making.

References

1 Conger A, Zhao F, Wang X. et al. Evolution of the graded repair of CSF leaks and skull base defects in endonasal endoscopic tumor surgery: trends in repair failure and meningitis rates in 509 patients. J Neurosurg 2019; 130 (03) 861-875
2 Clavenna MJ, Turner JH, Chandra RK. Pedicled flaps in endoscopic skull base reconstruction: review of current techniques. Curr Opin Otolaryngol Head Neck Surg 2015; 23 (01) 71-77
3 Hannan CJ, Almhanedi H, Al-Mahfoudh R, Bhojak M, Looby S, Javadpour M. Predicting post-operative cerebrospinal fluid (CSF) leak following endoscopic transnasal pituitary and anterior skull base surgery: a multivariate analysis. Acta Neurochir (Wien) 2020; 162 (06) 1309-1315
4 Karnezis TT, Baker AB, Soler ZM. et al. Factors impacting cerebrospinal fluid leak rates in endoscopic sellar surgery. Int Forum Allergy Rhinol 2016; 6 (11) 1117-1125
5 Khan DZ, Ali AMS, Koh CH. et al. Skull base repair following endonasal pituitary and skull base tumour resection: a systematic review. Pituitary 2021; 24 (05) 698-713
6 Xue H, Wang X, Yang Z, Bi Z, Liu P. Risk factors and outcomes of cerebrospinal fluid leak related to endoscopic pituitary adenoma surgery. Br J Neurosurg 2020; 34 (04) 447-452
7 Ein L, Sargi Z, Nicolli EA. Update on anterior skull base reconstruction. Curr Opin Otolaryngol Head Neck Surg 2019; 27 (05) 426-430
8 Tariciotti L, Fiore G, Carrabba G. et al. A supervised machine-learning algorithm predicts intraoperative CSF leak in endoscopic transsphenoidal surgery for pituitary adenomas. J Neurosurg Sci 2023; 67 (04) 393-407
9 Su P, Vijay-Shanker K. Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction. BMC Bioinformatics 2022; 23 (01) 120
10 CRANIAL Consortium. Machine learning driven prediction of cerebrospinal fluid rhinorrhoea following endonasal skull base surgery: a multicentre prospective observational study. Front Oncol 2023; 13: 1046519
11 Staartjes VE, Zattra CM, Akeret K. et al. Neural network-based identification of patients at high risk for intraoperative cerebrospinal fluid leaks in endoscopic pituitary surgery. J Neurosurg 2019; 133 (02) 329-335
12 Hajikarimloo B, Sabbagh Alvani M, Koohfar A. et al. Clinical application of artificial intelligence in prediction of intraoperative cerebrospinal fluid leakage in pituitary surgery: a systematic review and meta-analysis. World Neurosurg 2024; 191: 303-313.e1
13 Ivan ME, Iorgulescu JB, El-Sayed I. et al. Risk factors for postoperative cerebrospinal fluid leak and meningitis after expanded endoscopic endonasal surgery. J Clin Neurosci 2015; 22 (01) 48-54
14 Fraser S, Gardner PA, Koutourousiou M. et al. Risk factors associated with postoperative cerebrospinal fluid leak after endoscopic endonasal skull base surgery. J Neurosurg 2018; 128 (04) 1066-1071
15 Zhang H, Peeters S, Vengorivich G. et al. Risk factors associated with postoperative CSF leak in extrasellar tumors. J Neurol Surg B Skull Base 2023; 85 (01) 15-20
16 Patel PN, Stafford AM, Patrinely JR. et al. Risk factors for intraoperative and postoperative cerebrospinal fluid leaks in endoscopic transsphenoidal sellar surgery. Otolaryngol Head Neck Surg 2018; 158 (05) 952-960
17 Abiri A, Bitner BF, Nguyen TV. et al. Clinical and technical factors in endoscopic skull base surgery associated with reconstructive success. Rhinology 2024; 62 (03) 330-341
18 Kuan EC, Yoo F, Patel PB, Su BM, Bergsneider M, Wang MB. An algorithm for sellar reconstruction following the endoscopic endonasal approach: a review of 300 consecutive cases. J Neurol Surg B Skull Base 2018; 79 (02) 177-183
19 Kim JS, Hong SD. Risk factors for postoperative CSF leakage after endonasal endoscopic skull base surgery: a meta-analysis and systematic review. Rhinology 2021; 59 (01) 10-20
20 Baig Mirza A, Boardman T, Okasha M. et al. Fat in the fossa and the sphenoid sinus: a simple and effective solution to CSF leaks in transsphenoidal surgery. Cohort study and systematic review. J Neurol Surg B Skull Base 2022; 84 (02) 143-156
