Introduction
Basal cell carcinoma (BCC) is the most common skin cancer worldwide, accounting for up to 80% of nonmelanoma skin cancer cases. Early and accurate diagnosis is essential to minimize local invasion, cosmetic damage, and healthcare burden. Dermatoscopy is a noninvasive imaging tool that enhances visualization of subsurface skin structures, improving lesion assessment. Yet interobserver variability and qualitative interpretation can limit accuracy. In recent years, deep learning, especially convolutional neural networks (CNNs), has shown promise in analyzing dermatoscopic images to detect BCC more consistently than human observers. This article summarizes a systematic review and meta-analysis that evaluates the diagnostic performance of deep learning models on dermatoscopy and compares them with human experts.
Methods in Brief
The analysis followed PRISMA-DTA guidelines with a registered protocol in PROSPERO (CRD42025633947). A literature search of PubMed, Embase, and Web of Science covered studies through 2024, with an update in 2025. Eligible studies focused on dermatoscopy-based deep learning for BCC detection and reported enough data to construct diagnostic 2×2 tables (true positives, false positives, false negatives, true negatives) or provided ROC metrics suitable for reconstruction. Internal validation (cross-validation or split-sample tests) was common, while external validation remained limited.
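The 2×2 tables mentioned above are the basic unit of diagnostic meta-analysis. As a minimal sketch (not code from the review), the standard metrics can be derived from the four counts like this; the example counts are hypothetical:

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy metrics from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts for illustration only (100 BCC, 100 non-BCC lesions)
metrics = diagnostic_metrics(tp=96, fp=2, fn=4, tn=98)
```

Studies reporting only ROC metrics can sometimes be mapped back to such counts when the operating point and sample sizes are known, which is what "suitable for reconstruction" refers to.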
Two independent reviewers extracted study characteristics, including data type (image sets or patients), imaging type, reference standards (histopathology with or without expert consensus), and model details (CNN vs. non-CNN). A bivariate random-effects model synthesized sensitivity and specificity, with AUC as a primary summary metric. Heterogeneity was explored using meta-regression and subgroup analyses. Publication bias was assessed with Deeks’ test.
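To illustrate the pooling idea, here is a simplified univariate DerSimonian-Laird random-effects sketch on logit-transformed sensitivities. The review itself used a bivariate model that pools sensitivity and specificity jointly (accounting for their correlation); this sketch, with hypothetical inputs, only shows how studies are weighted by within-study variance plus an estimated between-study variance:

```python
import math

def dersimonian_laird(estimates, variances):
    """Univariate DerSimonian-Laird random-effects pooling (sketch).

    Returns the pooled estimate and the between-study variance tau^2.
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance (truncated at 0)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2

# Hypothetical logit-sensitivities and their variances, for illustration
logits = [3.2, 2.9, 3.5, 3.1]
vars_ = [0.10, 0.15, 0.12, 0.08]
pooled_logit, tau2 = dersimonian_laird(logits, vars_)
pooled_sens = 1.0 / (1.0 + math.exp(-pooled_logit))  # back-transform to [0, 1]
```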
Key Findings
Across 15 dermatoscopy-based studies (predominantly retrospective), deep learning models achieved remarkable internal validation performance: pooled sensitivity ~0.96, specificity ~0.98, and an AUC near 0.99. In comparison, dermatologists showed lower sensitivity (~0.75) but high specificity (~0.97), with an AUC around 0.96 on internal data. These results suggest that when evaluated on internal validation sets, AI models can outperform human experts in identifying BCC from dermatoscopic images.
External validation data were sparse, with only one study contributing a validation set (sensitivity ~0.88, specificity ~0.99). This gap highlights concerns about generalizability to real-world clinical settings and emphasizes the need for more multi-center prospective validation before routine adoption.
Subgroup analyses revealed that both CNN and non-CNN approaches performed well, with no statistically significant superiority of one architecture over the other in most comparisons. The choice of reference standard was also examined: studies using histopathology alone and those adding clinical follow-up yielded broadly similar accuracy, though heterogeneity was substantial (I² often above 50%). Internal validation type (cross-validation vs. random splits) and imaging details also contributed to variability. No strong publication bias was detected overall.
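The I² statistic cited above quantifies the share of total variation across studies attributable to between-study heterogeneity rather than chance. A minimal sketch (not from the review) of its standard definition from Cochran's Q, using hypothetical inputs:

```python
def i_squared(estimates, variances):
    """I^2 (%) from Cochran's Q with fixed-effect inverse-variance weights."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    # I^2 = (Q - df) / Q, truncated at 0, expressed as a percentage
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

# Hypothetical study estimates with equal variances, for illustration
het = i_squared([0.5, 1.5, 2.5], [0.05, 0.05, 0.05])
```

Values above roughly 50%, as reported in several subgroups here, are conventionally read as substantial heterogeneity.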
Implications for Practice
These findings indicate that deep learning models can offer high diagnostic accuracy for BCC on dermatoscopy, potentially supporting clinicians by reducing missed cancers and easing workload. The superior performance on internal datasets suggests AI could assist in triage, documentation, and second-reading workflows, especially in high-volume or resource-limited settings. However, the limited external validation calls for caution: models must be tested across diverse populations, devices, and imaging conditions to ensure reliable performance in everyday practice.
Moreover, the field should address transparency and interpretability concerns inherent to deep learning, establish standardized reference standards, and evaluate cost-effectiveness and integration strategies into dermatology clinics. Prospective studies that mimic real-world conditions are essential to verify whether AI-assisted dermatoscopy translates into improved patient outcomes.
Limitations and Future Directions
The analyzed literature was largely retrospective, with variable gold standards and heavy reliance on public image datasets. These factors contribute to heterogeneity and may overestimate performance in routine care. Future work should emphasize rigorous external validation across centers, standardized imaging protocols, and head-to-head comparisons with dermatologists in prospective trials. Exploring how AI tools fit into clinical workflows, patient pathways, and economic analyses will be critical for sustainable adoption.
Conclusion
Deep learning applied to dermatoscopic images shows strong diagnostic potential for detecting basal cell carcinoma, often outperforming dermatologists on internal validation. External validation remains the key hurdle toward clinical deployment. With further prospective testing and standardized methods, AI-assisted dermatoscopy could become a valuable adjunct in early and accurate BCC diagnosis, ultimately improving patient care and reducing treatment delays.