
DOI: 10.1055/s-0046-1816061
From Clinic to Community: An Interpretable Artificial Intelligence Framework for Enamel Caries Detection to Support Public Health Dentistry
Abstract
Objectives
Dental enamel caries is among the most prevalent oral diseases worldwide. Early detection is essential, as incipient lesions can be managed with noninvasive therapies. Conventional methods, such as visual-tactile inspection and radiography, remain limited by examiner variability and reduced sensitivity for early lesions. This study aimed to develop an efficient and interpretable deep learning framework for automated classification of enamel caries at multiple severity levels, while ensuring clinical applicability and transparency.
Materials and Methods
A dataset of 2,000 clinical dental images categorized as advanced enamel caries, early-stage enamel caries, and no enamel caries was curated and expanded to 12,000 images using preprocessing and augmentation. Two transfer learning models, Modified EfficientNetB0 and Modified MobileNetV2, were trained individually, then combined using an attention-guided fusion mechanism. Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to provide visual interpretability.
Statistical Analysis
Performance was evaluated using accuracy, precision, sensitivity, specificity, F1 score, and ROC AUC. Comparative analysis was performed across models and classifiers, with inference time assessed for clinical feasibility.
Results
The Modified EfficientNetB0 and MobileNetV2 models achieved accuracies of 96.33 and 96.25%, respectively. The fused model with Random Forest demonstrated superior performance, achieving 96.92% accuracy, an F1 score of 96.92%, and an ROC AUC of 99.34%. Misclassifications were limited to adjacent disease stages, with no severe diagnostic errors.
Conclusion
The proposed framework provides accurate, interpretable, and efficient enamel caries detection. Its low inference time supports real-time clinical use, enhancing diagnostic confidence and enabling early, minimally invasive interventions. Future research should focus on multicenter validation and multimodal datasets to improve generalizability.
Keywords
enamel caries classification - attention-guided feature fusion - Modified EfficientNetB0 - Modified MobileNetV2 - Grad-CAM - dental public health

Introduction
Dental caries is one of the most widespread oral health problems worldwide. At the earliest stage of dental enamel caries, damage is restricted to the enamel surface. This stage is critical; at this point, the process can be reversed through preventive strategies such as dietary modification.[1] Early intervention prevents the need for restorative procedures, reduces treatment costs, and preserves natural tooth structure. The initial clinical sign of enamel caries is a small white spot that indicates demineralization of the enamel.[2] Without early management, the lesion progresses through the enamel into the dentin, leading to pain, pulp involvement, and possible tooth loss.[2] Therefore, accurate detection of enamel caries at an early stage is essential for prevention and minimally invasive dentistry. Traditional diagnostic methods, such as visual-tactile examination, are often guided by the International Caries Detection and Assessment System (ICDAS), as well as radiographic techniques such as bitewing or panoramic imaging.[3] Although widely used, these approaches have limitations. Early enamel lesions are often radiographically invisible, and visual inspection depends heavily on the clinician's expertise.[4] [5] Dental caries persists as a major global public health crisis, exerting heavy social and financial burdens despite its preventability.[6] [7] While the caries process was once viewed simply as cavitation, it is now recognized as a dynamic continuum of demineralization and remineralization, influenced by ecological shifts in biofilm composition.[8] Recently, artificial intelligence (AI) has emerged as a promising tool for dental diagnosis.
Deep learning models can analyze radiographs and clinical images to identify subtle features that may be invisible to the human eye.[9] [10] [11] [12] Generative adversarial networks (GANs) have also been employed to synthesize realistic dental images, thereby augmenting limited datasets and improving the training of deep learning models.[13] Emerging AI tools, particularly those based on deep learning, offer transformative potential: image-based detection of enamel lesions, personalized caries risk prediction, and even virtual training to enhance clinical decision-making.[7] [14] [15] Kühnisch et al[16] applied a CNN model to 2,417 standardized single-tooth photographs and reported 92.5% accuracy, a sensitivity of 89.6%, specificity of 94.3%, and an AUC of 0.96. However, inference speed was only qualitatively discussed, with no quantitative reporting of per-image runtime. Zhang et al[17] employed a single-shot multi-box detector (SSD)-based ConvNet model trained on 3,932 oral images and achieved an AUC of 85.65%, with image-level sensitivity of 81.90% but a substantial drop in localization sensitivity to 64.6%, highlighting the difficulty of precise lesion detection compared with general classification. Generalizability concerns were raised by Frenkel et al,[18] who externally validated an AI model on 718 internet-sourced images, achieving 92.0% detection accuracy and classification AUCs of 0.702 to 0.909, reflecting reduced performance on real-world heterogeneous images. Beyond photographs, deep learning has also been applied to radiographs, though most studies focus on more advanced lesions rather than early enamel changes. Li et al[19] utilized a modified deep learning model on 4,129 periapical radiographs, achieving an F1 score of 82.9% for caries and 82.8% for periapical periodontitis, though the model requires further validation.
Estai et al[20] applied Faster R-CNN and Inception-ResNet-v2 to 2,468 bitewing radiographs, achieving an F1 score of 87% and a recall of 89%, though further validation is needed. Tan et al[21] used CNNs on 9,478 quantitative light-induced fluorescence (QLF) images obtained via handheld devices, achieving an AUC of 88% but noting limited sensitivity (64%) for caries staging. Chaves et al[22] employed Mask R-CNN with a Swin Transformer backbone trained on 425 bitewings, achieving an F1 score of 71.9% for secondary caries, though the model requires further clinical validation. Explainable AI addresses the "black box" limitation of deep learning systems. Oztekin et al[23] applied Grad-CAM visualizations to panoramic radiographs, showing that heatmaps highlight decision-relevant regions to enhance clinical trust, and trained EfficientNetB0, DenseNet121, and ResNet50 models on 562 panoramic images; the ResNet50 model achieved 92.00% label-wise accuracy and a 91.61% F1 score, though evaluation was limited to internal validation. A summary of the literature is shown in [Table 1].
| Author | Dataset/Modality | Methodology | Accuracy/Performance | Limitations |
|---|---|---|---|---|
| Kühnisch et al[11] | 2,417 single-tooth photographs | CNN | 92.5% | No quantitative reporting of inference time or hardware efficiency |
| Zhang et al[12] | 3,932 oral photographs | SSD-based ConvNet | AUC: 85.65%; sensitivity: 81.90% | Localization sensitivity dropped to 64.6%; false-positive predictions |
| Frenkel et al[13] | 718 internet-sourced images | External validation of an AI model | 92.0% | Reduced performance on heterogeneous images; partially correct segmentation (44.1%) |
| Li et al[14] | 4,129 periapical radiographs | Modified deep learning model | F1 score: 82.9% | Requires further validation on diverse clinical data |
| Estai et al[15] | 2,468 bitewing radiographs | Faster R-CNN and Inception-ResNet-v2 | F1 score: 87% | Model requires further clinical validation |
| Tan et al[16] | 9,478 QLF images (handheld device) | CNN | AUC: 88%; validation sensitivity: 64% | Limited sensitivity for early caries staging |
| Chaves et al[17] | 425 bitewing radiographs | Mask R-CNN (Swin Transformer) | Secondary caries F1 score: 71.9% | Requires further clinical validation for primary caries |
| Öztekin et al[10] | 562 panoramic radiographs | ResNet-50 (with Grad-CAM) | Accuracy: 92.00% | Model requires further external validation |
To address these limitations, this study introduces a lightweight attention-guided fusion mechanism that combines the complementary strengths of the Modified EfficientNetB0 and MobileNetV2 feature representations. Although both models are well-established architectures, our contribution lies in the way their features are combined: a simple attention-based weighting approach that selectively highlights the most informative features from each network without increasing the overall model size. This results in a more discriminative and efficient representation compared with using either model alone. In addition, Grad-CAM is incorporated to provide clear visual explanations, improving the interpretability and clinical trust of the system. By evaluating the model using multiple performance metrics and inference time, the proposed framework aims to deliver a precise, efficient, and clinically practical tool for early enamel caries classification.
Materials and Methods
This study developed an automated diagnostic framework for the classification of dental enamel caries through a multistage computational pipeline, as illustrated in [Fig. 1]. The workflow commenced with the curation of a clinical image dataset comprising 2,000 low-resolution dental images, categorized into advanced enamel caries, early-stage enamel caries, and no enamel caries. To enhance feature visibility and ensure efficiency, the dataset underwent a rigorous preprocessing stage employing histogram equalization, CLAHE, and adaptive thresholding, followed by an augmentation protocol using the Albumentations Library to increase data diversity and prevent overfitting. Two deep learning architectures, Modified EfficientNet-B0 and MobileNetV2, were developed via transfer learning. Each model featured a custom classification head with a 512-unit dense layer for deep feature extraction and was trained in a two-stage process involving initial frozen backbone training and subsequent fine-tuning. The discriminative features extracted from both models were then synergistically integrated using an attention-guided fusion mechanism, which adaptively weighed the most salient features from each network to form a unified, highly discriminative representation without increasing dimensionality. The final fused features were classified using a suite of machine learning classifiers. For model evaluation, only the original test images were used without any augmentation to ensure unbiased performance assessment. To ensure clinical transparency and interpretability, the decision-making process of the models was elucidated using gradient-weighted class activation mapping (Grad-CAM), which generated visual heatmaps highlighting the specific image regions most influential for the classification of each caries label. 
The selection of reporting checklists was informed by a recent integrative review consolidating major AI reporting frameworks, including CONSORT-AI, TRIPOD-AI, PRISMA-AI, CLAIM, STARD-AI, and DECIDE-AI, with specific relevance to dental research.[24] Based on this guidance, the present study adheres to the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) 2024[25] and the STARD 2015 guidelines[26] to ensure methodological rigor, transparency, and reproducibility.


Dataset Collection
In this experiment, a publicly available dataset of enamel caries images was utilized. The images were acquired from several medical clinics located in Rajshahi and Dhaka, Bangladesh.[27] These images represent a heterogeneous population of patients, ensuring variability in dental characteristics and enhancing the generalizability of the dataset. The dataset is organized into three categories: advanced enamel caries, early-stage enamel caries, and no enamel caries. A total of 2,000 original images in JPG format are described in [Table 2]. It is a valuable resource for the development and evaluation of automated diagnostic frameworks in dental image analysis, and it does not require ethical approval since it is publicly available for research purposes.[27] The dataset was divided into three partitions: a training set, a validation set, and a test set. A total of 1,200 original images were reserved as the test set, and the remaining 800 images were used for training and internal validation. From the 800 training images, 20% were held out as the validation set to monitor model performance during training.
| Labels | Number of images (JPG) |
|---|---|
| Advanced enamel caries | 800 |
| Early-stage enamel caries | 800 |
| No enamel caries | 400 |
| Total | 2,000 |
Inclusion and Exclusion Criteria
To ensure consistency, only images that met clear clinical and visual standards were included.
Inclusion Criteria
- Intraoral photographs showing teeth and enamel surfaces clearly.
- Images belonging to one of the three predefined categories.
- JPG images of 224 × 224 × 3 resolution.
- Fully anonymized images without any identifiable information.
Exclusion Criteria
- Blurred, dark, overexposed, or low-quality images.
- Images containing restorations, orthodontic wires, or artifacts covering enamel surfaces.
- Damaged or incomplete image files.
- Duplicate images or images appearing across multiple data subsets.
Dataset Preprocessing Techniques
In this study, image preprocessing techniques were applied to enhance the visibility of dental enamel caries and to reduce variability caused by illumination differences. Three enhancement methods were employed. First, histogram equalization was used to improve global contrast by redistributing pixel intensities, enhancing the visibility of enamel defects.[28] Second, CLAHE was applied to provide localized contrast enhancement while preventing noise over-amplification, which is advantageous in medical images.[29] Finally, adaptive thresholding was employed to segment structural regions by computing pixel-level thresholds adaptively over localized areas, ensuring reliability against nonuniform illumination.[30] The effects of the applied preprocessing techniques are illustrated in [Fig. 2].


Dataset Augmentation
The augmentation pipeline was implemented using the Albumentations Library,[31] which provides efficient and diverse image transformations widely used in medical image analysis. The applied transformations included geometric operations (horizontal and vertical flipping, random rotation) and intensity-based adjustments (brightness and contrast modification, gamma correction, histogram equalization, and CLAHE). Additionally, morphological operations and adaptive thresholding were employed to enhance structural features,[32] and random cropping was performed, followed by resizing to a fixed resolution of 224 × 224 pixels to maintain uniformity. A summary of the augmentation process is presented in [Table 3].
Note: The test set was separated from the original dataset without augmentation, while the training and validation sets were created using augmentation.
Modified EfficientNet-B0
For the classification of dental enamel caries, a transfer learning-based strategy was developed using EfficientNetB0 as the feature extractor. The pretrained EfficientNetB0 network was adopted as the base architecture, with its convolutional backbone frozen during the initial training stage. A custom classification head was then appended to adapt the model to the specific multiclass dental caries classification task. The architecture consisted of the following components: the EfficientNetB0 convolutional base, excluding its original fully connected layers; a global average pooling layer to reduce feature dimensionality;[33] dropout regularization; and a custom dense feature extraction layer with 512 units and ReLU activation, followed by a batch normalization layer to improve generalization and stabilize training.[34] Finally, a fully connected SoftMax layer was used for multiclass probability prediction.[35] [Table 4] summarizes the architecture of the proposed model along with its associated hyperparameters.
During Stage 1, the backbone layers were frozen and the custom classification head was trained to learn high-level discriminative features. In Stage 2, the last 60 layers of EfficientNetB0 were unfrozen for fine-tuning at a lower learning rate to enable domain-specific feature adaptation.[36] The training process was monitored using accuracy and loss curves, which are presented in [Fig. 3]. These curves demonstrate the convergence behavior of the model and the effectiveness of the staged training procedure in reducing overfitting.
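A Keras sketch of this architecture and the two-stage schedule is given below. The dropout rate, the `feature_dense` layer name, the optimizer, and the learning rate are illustrative assumptions rather than the study's exact hyperparameters.

```python
import tensorflow as tf

def build_modified_efficientnetb0(num_classes=3, weights="imagenet"):
    """Modified EfficientNetB0: backbone (frozen in Stage 1) plus a custom
    head: global average pooling -> dropout -> 512-unit ReLU dense feature
    layer -> batch normalization -> softmax."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False                        # Stage 1: frozen backbone
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)           # rate is an assumption
    x = tf.keras.layers.Dense(512, activation="relu", name="feature_dense")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base

def fine_tune(model, base, n_layers=60, lr=1e-5):
    """Stage 2: unfreeze the last n_layers of the backbone and recompile
    at a lower learning rate for domain-specific adaptation."""
    base.trainable = True
    for layer in base.layers[:-n_layers]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

In practice, Stage 1 trains only the head with the backbone frozen, and Stage 2 then calls `fine_tune` before continuing training.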


Modified MobileNetV2 Architecture
A second deep learning model was developed using MobileNetV2 as the feature extractor. Similar to the EfficientNetB0-based pipeline, we applied a transfer learning strategy with a two-stage training process, as shown in [Table 5]. The pretrained MobileNetV2 weights were used as the convolutional backbone, with a custom classification head designed to enhance discriminative feature learning for enamel caries detection. The model consisted of the following major components: the MobileNetV2 base, excluding its top fully connected layers; a global average pooling layer to aggregate features; dropout layers for regularization;[37] a custom dense feature extraction layer with 512 units, batch normalization, and ReLU activation; and a final SoftMax layer for multiclass classification.[38] Compared with the standard MobileNetV2 classifier head, the following modifications were introduced:
- Inserted a custom dense feature extraction layer (512 units) before the classification layer to improve feature representation.
- Applied double dropout regularization (0.3 and 0.4) at different stages to reduce overfitting.[39]
- Added a batch normalization with ReLU block after the feature extraction layer to stabilize training.[40]
- Performed frozen-backbone training (Stage 1), followed by fine-tuning of the last 60 layers (Stage 2).
- Applied advanced data augmentation, such as rotation, shift, zoom, and flipping, to improve model generalization.[41]
The model's training and validation performance are shown in [Fig. 4].
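The listed modifications suggest a head along these lines (a Keras sketch; the exact placement of the two dropout stages and the `feature_dense` layer name are our assumptions):

```python
import tensorflow as tf

def build_modified_mobilenetv2(num_classes=3, weights="imagenet"):
    """Modified MobileNetV2 head with the listed changes: a 512-unit dense
    feature layer before the classifier, double dropout (0.3 and 0.4), and
    a batch normalization + ReLU block."""
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights=weights, input_shape=(224, 224, 3))
    base.trainable = False                        # Stage 1: frozen backbone
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)           # first dropout stage
    x = tf.keras.layers.Dense(512, name="feature_dense")(x)
    x = tf.keras.layers.BatchNormalization()(x)   # BN + ReLU block
    x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.Dropout(0.4)(x)           # second dropout stage
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Stage 2 fine-tuning of the last 60 backbone layers then proceeds exactly as for the EfficientNetB0 model.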


Feature Extraction
Deep feature extraction was conducted using both the Modified MobileNetV2 and Modified EfficientNet-B0 models by isolating the 512-dimensional dense layer before classification. For MobileNetV2, the feature matrix shapes are 9,600 × 512 for the training set, 160 × 512 for the validation set, and 1,200 × 512 for the test set. Similarly, for EfficientNetB0, the feature dimensions are 9,600 × 512 for training, 160 × 512 for validation, and 1,200 × 512 for the test set. All features were generated with shuffling disabled to maintain alignment between features and labels.
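Extraction of the penultimate dense-layer activations can be sketched as below, assuming the 512-unit layer is addressable by name (here `feature_dense`, a naming convention introduced for illustration):

```python
import numpy as np
import tensorflow as tf

def extract_features(model, images, batch_size=32):
    """Return the activations of the 512-unit dense layer that precedes
    the softmax classifier, giving an (N, 512) feature matrix."""
    feature_model = tf.keras.Model(
        model.input, model.get_layer("feature_dense").output)
    # shuffle is disabled implicitly: predict preserves input order,
    # keeping features aligned with their labels.
    return feature_model.predict(images, batch_size=batch_size, verbose=0)
```

Running this over the 9,600 augmented training images yields the 9,600 × 512 matrices described above.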
Attention-Guided Feature Fusion
To integrate the features extracted from the Modified MobileNetV2 and EfficientNetB0 models, we applied a simple attention-guided fusion mechanism. In this approach, each feature vector is given a small learnable weight that indicates how useful that feature set is for the final prediction. The mechanism assigns slightly higher weight to the model that provides more relevant information for a particular image, allowing the combined feature representation to benefit from the strengths of both networks.
Unlike simple concatenation or averaging, this fusion strategy selectively highlights the more informative features while reducing overlapping or redundant information. Importantly, the final fused feature size remains the same 512 dimensions, so no extra computational cost is added. This lightweight attention-based fusion helps create a more balanced and discriminative representation, which improves the overall reliability of the enamel caries classification.
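Conceptually, the fusion reduces to a softmax-weighted sum of the two 512-dimensional feature sets; a NumPy sketch (with the learnable attention scores exposed as plain inputs rather than trained parameters) is:

```python
import numpy as np

def attention_fuse(feat_a, feat_b, score_a=0.0, score_b=0.0):
    """Attention-guided fusion sketch: a softmax over two learnable scores
    yields weights that blend the two (N, 512) feature sets, so the fused
    representation keeps the same 512 dimensions. In training, the scores
    would be learned (e.g., by a small dense layer); here they are inputs."""
    w = np.exp([score_a, score_b])
    w = w / w.sum()                              # softmax attention weights
    return w[0] * feat_a + w[1] * feat_b         # shape preserved: (N, 512)
```

Because the output dimensionality equals that of either input, downstream classifiers incur no extra computational cost relative to a single-model feature set.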
Explainable AI with Grad-CAM
In this study, Grad-CAM was applied to images from three categories: advanced, early, and no caries. For each image, a heatmap was superimposed onto the original image, producing an intuitive visualization of the model's focus. For example, in cases of early-stage enamel caries, the Grad-CAM visualization highlighted localized regions of enamel discoloration, whereas in advanced caries, larger lesion areas were emphasized, as shown in [Fig. 5]. In the absence of caries, the model predominantly focused on intact enamel structures.
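A minimal Grad-CAM implementation in the canonical Keras style is sketched below; it assumes the target convolutional layer is reachable by name in the given model.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Minimal Grad-CAM: weight the chosen conv layer's feature maps by the
    spatially pooled gradients of the predicted class score, apply ReLU,
    and normalize the map to [0, 1] for heatmap overlay."""
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis].astype("float32"))
        class_idx = int(tf.argmax(preds[0]))     # predicted caries label
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)       # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2)) # pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)
    cam = cam / (tf.reduce_max(cam) + 1e-8)      # normalize for overlay
    return cam.numpy()
```

The returned map is then resized to the input resolution and blended with the original image to produce the visualizations in [Fig. 5].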


Results
This experiment was conducted on the Kaggle cloud-based platform. The computational workload was executed in a Kaggle-provided environment equipped with an NVIDIA P100 GPU accelerator, 32 GB of RAM, and an Intel Core i7-1065G7 CPU (1.30–1.50 GHz). This hardware configuration significantly reduces the time required for processing large datasets and complex computations when evaluating deep learning models. All models and analyses were implemented in the Python programming language.
Performance Evaluation Parameters
The quantitative evaluation of framework performance used accuracy (Acc), precision, recall, F1 score, and AUC-ROC, all derived from the constituent elements of the confusion matrix: true positives, false positives, true negatives, and false negatives. The formal definitions and mathematical formulations of these performance parameters are provided in [Table 6].
(TP: True Positives, TN: True Negatives, FP: False Positives, FN: False Negatives)
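As a sanity check, computing these metrics from the fused model's confusion matrix (the per-class counts reported with [Fig. 8]) reproduces the reported 96.92% accuracy and F1 score. A NumPy sketch:

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy plus macro-averaged precision, recall (sensitivity), and F1
    from a confusion matrix with rows = true class, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                     # predicted as class, wrongly
    fn = cm.sum(axis=1) - tp                     # missed members of class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": tp.sum() / cm.sum(),
            "precision": precision.mean(),
            "recall": recall.mean(),
            "f1": f1.mean()}

# Fused-model confusion matrix reported with [Fig. 8]
# (rows/columns: advanced, early-stage, no caries):
cm = [[383, 17, 0],
      [8, 385, 7],
      [0, 5, 395]]
scores = metrics_from_confusion(cm)              # accuracy and F1 ~ 0.9692
```

The macro-averaged F1 and the overall accuracy both come out at approximately 0.9692, matching the reported values.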
Class-Wise Classification
The diagnostic performance of the two modified deep learning architectures, EfficientNetB0 and MobileNetV2, was evaluated on a test set of 1,200 dental images. Both models demonstrated exceptional and highly comparable efficacy in the automated classification of enamel caries. The class-wise breakdown of precision, recall, and F1 score, along with aggregate metrics, is presented in [Table 7].
As shown in [Table 7], both Modified EfficientNetB0 and MobileNetV2 achieved excellent accuracy (96.33 and 96.25%, respectively). With the highest F1 scores in the no-caries class and reliable performance on early-stage lesions, both models demonstrated balanced, unbiased, and highly effective capabilities for automated enamel caries detection and classification.
As presented in [Fig. 6], both Modified MobileNetV2 and EfficientNetB0 models achieved high diagnostic accuracy. MobileNetV2 correctly classified 381 advanced, 380 early-stage, and 394 no-caries cases, while EfficientNetB0 showed slightly superior performance for advanced lesions, with comparable results across other classes, confirming reliability in enamel caries detection.


Deep Feature Extraction Results
The performance of various classifiers utilizing deep features extracted from the Modified EfficientNetB0 and MobileNetV2 networks is summarized in [Table 8].
As evidenced in [Table 8], the KNN Medium classifier with Modified EfficientNetB0 features and the KNN Fine and AdaBoost models with Modified MobileNetV2 features achieved the highest classification accuracy. All evaluated classifiers attained an F1 score greater than 95%, indicating reliable and balanced precision and recall characteristics. The ROC AUC values consistently exceeded 97%, confirming excellent model discriminative ability. The AdaBoost classifier achieved high accuracy, correctly classifying 382 advanced enamel caries, 389 early-stage cases, and 390 no-caries cases, with all errors conservatively assigned to adjacent categories. Similarly, the KNN Medium model correctly identified 384 advanced, 383 early-stage, and 393 no-caries cases, with minimal misclassifications mainly between adjacent stages. Both models demonstrated strong reliability, with no severe diagnostic errors, supporting their suitability for early enamel caries detection in clinical settings, as shown in [Fig. 7].
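The classifier stage can be sketched with scikit-learn; the random feature matrices below are placeholders standing in for the real 512-dimensional deep features, and the specific hyperparameters (e.g., the k values for the "Medium" and "Fine" KNN variants) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the extracted deep features
# (real shapes: 9,600 x 512 train, 1,200 x 512 test).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 512))
y_train = rng.integers(0, 3, size=300)
X_test = rng.normal(size=(60, 512))

classifiers = {
    "KNN Medium (k=10)": KNeighborsClassifier(n_neighbors=10),
    "KNN Fine (k=1)": KNeighborsClassifier(n_neighbors=1),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
predictions = {name: clf.fit(X_train, y_train).predict(X_test)
               for name, clf in classifiers.items()}
```

In the actual pipeline, each classifier is fitted on the extracted (or fused) feature matrices and evaluated on the 1,200 held-out test features.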


Attention-Guided Feature Fusion Results
The implementation of an attention-guided feature fusion mechanism, integrating deep features from the Modified EfficientNetB0 and MobileNetV2 architectures, yielded a significant enhancement in diagnostic performance. The complete results of various classifiers operating on these fused features are presented in [Table 9].
As shown in [Table 9], the attention-guided fusion approach significantly enhanced diagnostic performance, with the random forest classifier achieving an accuracy of 96.92%, an F1 score of 0.9692, and a ROC AUC of 0.9934. This improvement was obtained without added computational cost, as the fused feature vector remained at 512 dimensions. The confusion matrix in [Fig. 8] further validated the model's robust classification across all enamel caries categories, confirming its reliability and clinical applicability.


As illustrated in [Fig. 8], the model demonstrated exceptional proficiency. For advanced enamel caries, 383 cases were correctly identified, with the remaining 17 misclassifications all conservatively predicted as the less severe early-stage caries. In classifying early-stage enamel caries, the model achieved 385 correct predictions, with errors distributed as 8 misclassifications as advanced and 7 as no caries. The performance for the no enamel caries class was near perfect, with 395 correct identifications; all five errors were predicted as the most conservative early-stage condition.
Statistical Significance Analysis
To quantitatively validate that the performance improvement of the proposed attention-guided feature fusion model was statistically significant and not due to random chance, two rigorous statistical tests were employed: the paired Student's t-test and McNemar's test. These tests compare the proposed model against the best-performing baseline models, Modified EfficientNetB0 and MobileNetV2, to ascertain the significance of the observed differences in classification outcomes.
T-Test Results
A paired t-test was conducted to compare the accuracy distributions obtained from 10-fold cross-validation of the proposed model and the baseline models. The null hypothesis (H0) stated that there was no significant difference in mean accuracy between the models,[42] while the alternative hypothesis (H1) stated that a significant difference existed.[42] Results of the paired Student's t-test for model accuracy are shown in [Table 10].
As shown in [Table 10], the comparisons yielded p-values well below the significance level of α = 0.05. This provides strong evidence to reject the null hypothesis, confirming that the difference in mean accuracy between the proposed attention-guided fusion model and each baseline model is statistically significant.
McNemar's Test Results
McNemar's test was performed on the prediction outcomes of the proposed model and the best baseline, MobileNetV2 with AdaBoost, to evaluate the significance of the disagreement in their classifications. This test is particularly suited for paired nominal data and is based on a chi-squared (χ2) statistic derived from the counts of discordant pairs. The contingency table for the test is shown in [Table 11].
The resulting p-value is 0.0231. A statistically significant difference in error rates between the two models, p < 0.05, led to the rejection of the null hypothesis of marginal homogeneity. This result suggests that the performance disparity is unlikely to have occurred by chance. The McNemar test revealed a markedly greater number of discordant pairs,[43] where the proposed model was correct and the baseline model incorrect (n = 45) than the reverse (n = 25). This asymmetry in misclassifications demonstrates that the proposed model's improvement is both systematic and statistically significant.
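With the reported discordant-pair counts (n = 45 vs. n = 25), the continuity-corrected McNemar statistic and its p-value can be reproduced with only the standard library (for one degree of freedom, the chi-squared survival function equals erfc(√(x/2))):

```python
import math

# Discordant pairs from the contingency table: proposed correct while
# baseline wrong (b) versus proposed wrong while baseline correct (c).
b, c = 45, 25
stat = (abs(b - c) - 1) ** 2 / (b + c)        # continuity-corrected chi-squared
p_value = math.erfc(math.sqrt(stat / 2))      # chi-squared survival, 1 df
print(round(stat, 3), round(p_value, 4))      # -> 5.157 0.0231
```

This matches the reported p-value of 0.0231, confirming that the asymmetry in misclassifications is statistically significant at α = 0.05.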
Discussion
The present study introduced an attention-guided fusion framework that integrates features from the Modified EfficientNetB0 and MobileNetV2 models to improve enamel caries detection. The framework demonstrated superior diagnostic performance, with an overall accuracy of 96.92% and an ROC AUC of 0.9934. These results are highly competitive compared with previously published work on enamel and dental caries detection, as shown in [Table 12]. Kühnisch et al[16] applied MobileNetV2 on intraoral images and reported an accuracy of 92.5% with an AUC of 0.96, but their study did not provide model efficiency details. Zhang et al[17] used SSD on consumer-grade images, achieving a lower AUC of 0.856 with a significant reduction in localization sensitivity to 64.6%. Frenkel et al[18] validated an AI-based photographic system and achieved an accuracy of 92%, though class-wise AUC varied widely, from 0.70 to 0.91, showing inconsistency across lesion types. Similarly, Estai et al[20] and Chaves et al[22] achieved 87% accuracy on radiographs, but their focus was mainly on dentinal or secondary dental caries ([Fig. 9]).
| Author | Dataset | Methodology | Accuracy | Precision | Recall | F1 score | AUC | Prediction time (s) |
|---|---|---|---|---|---|---|---|---|
| Kühnisch et al[11] | 2,417 intraoral photos | MobileNetV2 (CNN) | 92.5% | – | 89.6% | – | 0.96 | – |
| Zhang et al[12] | 3,932 consumer-grade photos | SSD-based ConvNet | – | – | 81.90% | – | 0.856 | – |
| Frenkel et al[13] | 718 web images | AI-based model | 92.0% | – | – | – | 0.702–0.909 | – |
| Li et al[14] | 4,129 periapical radiographs | Modified CNN | – | 0.82 | 0.83 | 0.829 | 0.88 | – |
| Estai et al[15] | 2,468 bitewing radiographs | Inception-ResNet-v2 | 0.87 | 0.86 | 0.89 | 0.87 | – | – |
| Chaves et al[17] | 425 bitewing radiographs | Mask R-CNN (Swin-T) | – | – | – | 0.689 | – | – |
| Proposed study | Caries-Spectra 2000 low-resolution enamel caries images | Attention-guided fusion | 96.9% | 96.9% | 96.9% | 96.9% | 99.34% | 0.0149 |


Beyond these benchmarks, recent literature highlights the broader clinical and public health potential of deep learning.[44] Deep learning models have demonstrated strong accuracy in detecting dental caries from images, such as smartphone images, with promising sensitivity and specificity for cavitated lesions.[15] Furthermore, machine learning algorithms applied to survey and demographic data have reliably identified individuals, including adolescents, at high risk of developing caries. This enables targeted early interventions and optimizes resource allocation in public health systems.[45] [46]
Policy-maker benefits:
- Enables data-driven resource distribution by predicting high-risk populations and focusing preventive care where it is most needed.
- Supports evidence-based policymaking and informs national strategies for early detection programs and preventive dentistry.
- Advances a paradigm shift in dental practice models, moving from reactive "drill and fill" approaches toward proactive, risk-based, and minimally invasive care protocols.
Clinical Significance and Implications for Practice
The clinical significance of this study lies in its ability to detect early enamel caries with high accuracy at the initial stage. By integrating explainable AI, our framework shows visual maps of the regions that influence the prediction, making the system transparent and easy to trust. The model is efficient and accurate, with low prediction time. This makes it suitable for real-time chairside use where dentists need quick support without heavy computing resources. These features increase diagnostic confidence for general practitioners, promote early preventive interventions, and extend access to caries detection in resource-limited settings.
Conclusion
This study proposed and validated a novel attention-guided feature fusion framework for the automated classification of enamel caries. By synergistically integrating deep features from the Modified EfficientNetB0 and MobileNetV2 architectures, the model achieved superior diagnostic performance, with a peak accuracy of 96.92% and an ROC AUC of 0.9934. Statistical significance testing confirmed that this improvement over strong baseline models was not due to random chance. Embedding deep learning in dental enamel caries management has immense potential to improve diagnostic accuracy, facilitate early intervention, and reshape public health policy through predictive analytics and targeted care delivery. By supporting early detection and minimally invasive treatment strategies, frameworks such as this one can help reduce the global burden of caries while providing policymakers with robust evidence to guide preventive health initiatives.
Limitations
Despite the promising results, this study has several limitations. The primary limitation is the constraint imposed by the current availability of public datasets. There is a notable absence of a large, publicly available, expertly annotated enamel caries dataset that includes the precise clinical classifications used in this work, namely advanced, early-stage, and no caries. Consequently, the proposed framework was trained and validated on a single dataset of clinical images. Furthermore, the model's applicability is inherently limited to visible, surface-level enamel changes; it cannot be generalized to subsurface or proximal lesions, which are typically diagnosed through radiographic evaluation. A significant methodological limitation is the lack of external validation on an independent, multicenter dataset, which is currently not available for the specific class definitions used in this study. This absence limits the assessment of the model's generalizability and real-world reliability.
Future Work
Future research directions will focus on addressing these limitations and expanding the model's clinical utility. The foremost priority is rigorous external validation of the model. Given that no publicly accessible dataset with compatible clinical classifications is currently available, a key immediate step will be to prospectively collect a new, multicenter clinical image dataset to serve as an external test set. This will allow a thorough assessment of the model's reliability and generalizability beyond the internal validation performed in this study. Concurrently, we will pursue collaborations with dental institutions to assemble a larger and more diverse multimodal dataset encompassing both clinical and radiographic images with expert annotations. This will enable the development of a next-generation model capable of fused multimodal analysis (clinical + radiographic), which would represent a significant advancement toward a comprehensive automated diagnostic system. Beyond technical development, future work should also emphasize the following:
-
Validation of AI tools across diverse populations and imaging devices to ensure generalizability and equity.
-
Integration with public dental health systems to support risk-based preventive models, such as caries management by risk assessment (CAMBRA), guiding policymakers toward more efficient and patient-centered care.
-
Formulation of regulatory frameworks and guidelines to evaluate the efficacy, equity, and ethical deployment of AI systems in dentistry.[47]
Acknowledgments
The author expresses sincere gratitude to the OralAI Research Group for their valuable technical support and mentorship throughout the development of the models (https://oralai.org).
Declaration of GenAI Use
During the revision phase of this article, the authors employed ChatGPT-4 for the purpose of enhancing the clarity and quality of the English language in select paragraphs. The tool was not used to generate scientific content. All revisions made by using the tool were subsequently reviewed and edited by the authors to ensure accuracy and integrity of the article. The authors take full responsibility for the final content of the article.
References
- 1 Walsh T, Macey R, Ricketts D. et al. Enamel caries detection and diagnosis: an analysis of systematic reviews. J Dent Res 2022; 101 (03) 261-269
- 2 Daruich PM, Brizuela M. Remineralization of initial carious lesions. StatPearls [Internet]. StatPearls Publishing; 2023
- 3 Macey R, Walsh T, Riley P. et al. Visual or visual-tactile examination to detect and inform the diagnosis of enamel caries. Cochrane Database Syst Rev 2021; 6 (06) CD014546
- 4 Walsh T, Macey R, Riley P. et al. Imaging modalities to inform the detection and diagnosis of early caries. Cochrane Database Syst Rev 2021; 3 (03) CD014545
- 5 Tuygunov N, Samaranayake L, Khurshid Z. et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview part 2: the promise and perils, and the international dental federation communique. Int Dent J 2025; 75 (02) 397-404
- 6 Lai Y, Li Y, Liu X. et al. The impact of social and commercial determinants on the unequal increase of oral disorder disease burdens across global, regional, and national contexts. BMC Oral Health 2025; 25 (01) 1308
- 7 Liang Y, Li D, Deng D. et al. AI-driven dental caries management strategies: from clinical practice to professional education and public self care. Int Dent J 2025; 75 (04) 100827
- 8 Takahashi N, Nyvad B. Caries ecology revisited: microbial dynamics and the caries process. Caries Res 2008; 42 (06) 409-418
- 9 Mohammad-Rahimi H, Motamedian SR, Rohban MH. et al. Deep learning for caries detection: a systematic review. J Dent 2022; 122: 104115
- 10 Albano D, Galiano V, Basile M. et al. Artificial intelligence for radiographic imaging detection of caries lesions: a systematic review. BMC Oral Health 2024; 24 (01) 274
- 11 Samaranayake L, Tuygunov N, Schwendicke F. et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview. Part 1: Fundamentals of AI, and its contemporary applications in dentistry. Int Dent J 2025; 75 (02) 383-396
- 12 Khurshid Z. Digital dentistry: transformation of oral health and dental education with technology. Eur J Dent 2023; 17 (04) 943-944
- 13 Waqas M, Hasan S, Ghori AF, Alfaraj A, Faheemuddin M, Khurshid Z. Synthetic orthopantomography image generation using generative adversarial networks for data augmentation. Int Dent J 2025; 75 (06) 103878
- 14 Sreekumar R, Naveen SN. Application of artificial intelligence technologies for the detection of early childhood caries. Discover Artificial Intelligence 2025; 5 (01) 1-16
- 15 Thanh MTG, Van Toan N, Ngoc VTN, Tra NT, Giap CN, Nguyen DM. Deep learning application in dental caries detection using intraoral photos taken by smartphones. Appl Sci (Basel) 2022; 12 (11) 5504
- 16 Kühnisch J, Meyer O, Hesenius M, Hickel R, Gruhn V. Caries detection on intraoral images using artificial intelligence. J Dent Res 2022; 101 (02) 158-165
- 17 Zhang X, Liang Y, Li W. et al. Development and evaluation of deep learning for screening dental caries from oral photographs. Oral Dis 2022; 28 (01) 173-181
- 18 Frenkel E, Neumayr J, Schwarzmaier J. et al. Caries detection and classification in photographs using an artificial intelligence-based model - an external validation study. Diagnostics (Basel) 2024; 14 (20) 2281
- 19 Li S, Liu J, Zhou Z. et al. Artificial intelligence for caries and periapical periodontitis detection. J Dent 2022; 122: 104107
- 20 Estai M, Tennant M, Gebauer D. et al. Evaluation of a deep learning system for automatic detection of proximal surface dental caries on bitewing radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol 2022; 134 (02) 262-270
- 21 Tan R, Zhu X, Chen S. et al. Caries lesions diagnosis with deep convolutional neural network in intraoral QLF images by handheld device. BMC Oral Health 2024; 24 (01) 754
- 22 Chaves ET, Vinayahalingam S, van Nistelrooij N. et al. Detection of caries around restorations on bitewings using deep learning. J Dent 2024; 143: 104886
- 23 Oztekin F, Katar O, Sadak F. et al. An explainable deep learning model to prediction dental caries using panoramic radiograph images. Diagnostics (Basel) 2023; 13 (02) 226
- 24 Khurshid Z, Osathanon T, Shire MA, Schwendicke F, Samaranayake L. Artificial intelligence in dentistry: a concise review of reporting checklists and guidelines. Int Dent J 2025; 76 (01) 109322
- 25 Tejani AS, Klontzas ME, Gatti AA. et al; CLAIM 2024 Update Panel. Checklist for artificial intelligence in medical imaging (CLAIM): 2024 update. Radiol Artif Intell 2024; 6 (04) e240300
- 26 Cohen JF, Korevaar DA, Altman DG. et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 2016; 6 (11) e012799
- 27 Himel GMS, Islam MM, Hannan UH. Caries-spectra: a dataset of enamel caries. Mendeley Data 2023; 1
- 28 Somal S. Image Enhancement Using Local and Global Histogram Equalization Technique and Their Comparison. Springer; 2019: 739-753
- 29 Nia SN, Shih FY. Medical X-ray image enhancement using global contrast-limited adaptive histogram equalization. Int J Pattern Recognit Artif Intell 2024; 38 (12) 2457010
- 30 Salih AAM, Al-Khannaq M, Hasikin K, Isa NAM. Adaptive local exposure based region determination for non-uniform illumination and low contrast images. Alex Eng J 2022; 61 (12) 11185-11195
- 31 Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA. Albumentations: fast and flexible image augmentations. Information (Basel) 2020; 11 (02) 125
- 32 Bobin J, Starck J-L, Fadili JM, Moudden Y, Donoho DL. Morphological component analysis: an adaptive thresholding strategy. IEEE Trans Image Process 2007; 16 (11) 2675-2681
- 33 Gholamalinezhad H, Khosravi H. Pooling methods in deep neural networks, a review. arXiv 2020 https://doi.org/10.48550/arXiv.2009.07485
- 34 Santurkar S, Tsipras D, Ilyas A, Madry A. How does batch normalization help optimization? Advances in Neural Information Processing Systems 2018; 31. arXiv:1805.11604
- 35 Li X, Li X, Pan D, Zhu D. On the learning property of logistic and softmax losses for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence 2020; 34 (04) 4739-4746
- 36 Ali H, Shifa N, Benlamri R, Farooque AA, Yaqub R. A fine tuned EfficientNet-B0 convolutional neural network for accurate and efficient classification of apple leaf diseases. Sci Rep 2025; 15 (01) 25732
- 37 Sinha D, El-Sharkawy M. Thin mobilenet: an enhanced mobilenet architecture. IEEE; 2019: 0280-0285
- 38 Imak A, Celebi A, Siddique K, Turkoglu M, Sengur A, Salam I. Dental caries detection using score-based multi-input deep convolutional neural network. IEEE Access 2022; 10: 18320-18329
- 39 Salehin I, Kang D-K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics (Basel) 2023; 12 (14) 3106
- 40 Jung W, Jung D, Kim B, Lee S, Rhee W, Ahn JH. Restructuring batch normalization to accelerate CNN training. Proceed Machine Learn Syst 2019; 1: 14-26
- 41 Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data 2019; 6 (01) 1-48
- 42 Simón-Soro A, Belda-Ferre P, Cabrera-Rubio R, Alcaraz LD, Mira A. A tissue-dependent hypothesis of dental caries. Caries Res 2013; 47 (06) 591-600
- 43 Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent 2013; 41 (05) 385-392
- 44 Khurshid Z, Waqas M, Hasan S, Kazmi S, Faheemuddin M. Deep learning architecture to infer Kennedy classification of partially edentulous arches using object detection techniques and piecewise annotations. Int Dent J 2025; 75 (01) 223-235
- 45 Bomfim RA. Machine learning to predict untreated dental caries in adolescents. BMC Oral Health 2024; 24 (01) 316
- 46 Ramos-Gomez F, Marcus M, Maida CA. et al. Using a machine learning algorithm to predict the likelihood of presence of dental caries among children aged 2 to 7. Dent J 2021; 9 (12) 141
- 47 Surdilovic D, Abdelaal HM, D'Souza J. Using artificial intelligence in preventive dentistry: a narrative review. J Datta Meghe Inst Med Sci Univ 2023; 18 (01) 146-151
Address for correspondence
Publication History
Article published online:
20 February 2026
© 2026. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India