Robust machine learning model for classifying lung biopsy samples using general and tissue-specific feature extractors

Viviane Teixeira Loiola de Alencar; Felipe Navarro Balbino Alves; Guilherme de Sousa Velozo; Luiz Edmundo Lopes Mizutani; Vladmir Cláudio Cordeiro de Lima; Fábio Távora

doi:10.1055/s-0045-1807874

CC BY 4.0 · Brazilian Journal of Oncology 2025; 21
DOI: 10.1055/s-0045-1807874

INNOVATION IN HEALTHCARE

1831

POSTER PRESENTATION

Keywords

lung cancer - artificial intelligence - histological subtype - deep learning

Introduction: Lung cancer is the leading cause of cancer-related deaths and the most commonly diagnosed malignancy worldwide. Accurate histological subtyping is crucial for diagnosis and treatment planning, yet variability among pathologists can affect up to 25% of cases. Diagnoses often rely on small biopsy samples, which are difficult to interpret and must be conserved for subsequent molecular analyses. Diagnostic tools that improve accuracy without requiring additional tissue sampling would therefore be highly beneficial.

Objective: This study aimed to evaluate the potential of a machine learning tool to accurately classify hematoxylin and eosin (H&E) stained lung biopsy samples from a real-world dataset into four categories: adenocarcinoma, squamous cell carcinoma, small cell carcinoma, and benign tissue.

Methods: The training dataset included 412 adenocarcinomas, 323 squamous cell carcinomas, 41 small cell carcinomas, and 532 benign tissue samples, sourced from The Cancer Genome Atlas (TCGA) and a private dataset. To address class imbalance, oversampling techniques were applied to the minority classes. We developed a proprietary model architecture and trained a foundation model (LungDine) for feature extraction using DinoV2 on a 1.2 terabyte dataset of 1,935,106 patches from 3,215 H&E lung images from TCGA; its features were compared with ResNet50 features. A second feature extractor (OncoDino) was also trained with DinoV2, on a 6.5 terabyte dataset of 10,212,976 patches from 21,479 histology images. The test dataset consisted of 79 biopsy images from a private real-world dataset, with diagnoses validated by immunohistochemistry.
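The class-balancing step can be illustrated with a minimal sketch. The abstract does not say which oversampling technique was used, so random duplication of minority-class samples up to the size of the largest class is an assumption here, and the function name `random_oversample` and the placeholder sample IDs are hypothetical. Only the four class counts come from the Methods.

```python
import random
from collections import Counter

# Class counts reported in the Methods paragraph.
TRAIN_COUNTS = {
    "adenocarcinoma": 412,
    "squamous_cell_carcinoma": 323,
    "small_cell_carcinoma": 41,
    "benign": 532,
}


def random_oversample(samples, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class.

    `samples` is a list of (sample_id, label) pairs. Random duplication
    is an assumption; the abstract does not name the technique used.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Placeholder sample IDs standing in for the real training images.
train = [(f"{label}_{i}", label)
         for label, n in TRAIN_COUNTS.items() for i in range(n)]
balanced = random_oversample(train)
balanced_counts = Counter(label for _, label in balanced)
```

Under this assumption every class is raised to 532 samples, so the 41 small cell carcinoma samples are each repeated about 13 times on average. Oversampling of this kind would normally be applied to the training set only, leaving the 79 test images untouched.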

Results: The LungDine model achieved area under the curve (AUC) values of 97% for adenocarcinoma (LUAD), 96% for squamous cell carcinoma (LUSC), 94% for benign tissue, and 96% for small cell carcinoma (SCC), an average AUC improvement of 13.5 percentage points over ResNet50 features. The OncoDino model achieved AUC values of 94% for LUAD, 92% for LUSC, 96% for benign tissue, and 99% for SCC.
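To make the per-class figures concrete, the sketch below shows one common way a per-class AUC can be computed in a four-class setting (one class against the rest) and takes the unweighted mean of the reported values for each model. The abstract does not state how its AUCs were computed or averaged, so both the one-vs-rest formulation and the unweighted mean are assumptions, and the toy scores are invented purely for illustration.

```python
from statistics import mean


def one_vs_rest_auc(scores, labels, positive):
    """AUC for one class against all others: the probability that a
    random positive sample scores higher than a random negative sample,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Per-class AUC values reported in the Results paragraph (percent).
REPORTED_AUC = {
    "LungDine": {"LUAD": 97, "LUSC": 96, "benign": 94, "SCC": 96},
    "OncoDino": {"LUAD": 94, "LUSC": 92, "benign": 96, "SCC": 99},
}

# Unweighted (macro) mean over the four classes; an assumption about
# how an "average AUC" might be formed.
macro_auc = {model: mean(aucs.values()) for model, aucs in REPORTED_AUC.items()}

# Toy, hypothetical LUAD scores for five samples (not study data).
toy_scores = [0.9, 0.4, 0.3, 0.6, 0.2]
toy_labels = ["LUAD", "LUAD", "LUSC", "benign", "SCC"]
toy_auc = one_vs_rest_auc(toy_scores, toy_labels, "LUAD")
```

With the reported values, the unweighted means are 95.75% for LungDine and 95.25% for OncoDino, so the two extractors differ by half a percentage point on average while trading places across individual classes.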

Conclusion: These findings demonstrate the efficacy of both models in accurately classifying lung tissue samples. The OncoDino results suggest that effective classification can be achieved without tissue-specific feature extractors, indicating potential for broader and scalable applications in histopathological image analysis. The next step will be to validate these findings in a larger real-world dataset.

Funding: FAPESP 2023/11600-0.

Corresponding author: Viviane Teixeira Loiola de Alencar (e-mail: vivianetlalencar@gmail.com).

No conflict of interest has been declared by the author(s).

Publication History

Article published online: 06 May 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution 4.0 International License, permitting copying and reproduction so long as the original work is given appropriate credit (https://creativecommons.org/licenses/by/4.0/)

Thieme Revinter Publicações Ltda.
Rua Rego Freitas, 175, loja 1, República, São Paulo, SP, CEP 01220-010, Brazil

Bibliographical Record
Viviane Teixeira Loiola de Alencar, Felipe Navarro Balbino Alves, Guilherme de Sousa Velozo, Luiz Edmundo Lopes Mizutani, Vladmir Cláudio Cordeiro de Lima, Fábio Távora. Robust machine learning model for classifying lung biopsy samples using general and tissue-specific feature extractors. Brazilian Journal of Oncology 2025; 21.
DOI: 10.1055/s-0045-1807874
