DOI: 10.1055/a-2753-9631
Clustering Breast Cancer Patients Based on Their Treatment Courses Using German Cancer Registry Data
Authors
Funding Information: This work was supported by the Federal Joint Committee German Innovation Fund within the joint research project SePaMiM (grant number: 01VSF20018).
Abstract
Background
Cancer registries collect extensive data on cancer patients, including diagnoses, treatments, and disease progression. These data offer valuable insights into cancer care, but they are challenging to analyze because of their complexity. Machine learning techniques, particularly clustering, enable the exploration of treatment data to uncover previously unknown patterns and relationships.
Objectives
This work aimed to develop a method for clustering breast cancer patients in cancer registries based on their treatment courses and to demonstrate the usefulness of such clustering for gaining insights, improving data quality, and identifying clinically relevant patterns.
Methods
We developed a similarity measure adapted from the Levenshtein distance to compare treatment courses, incorporating cancer diagnosis, surgeries, radiotherapies, and systemic therapies. The method was evaluated on 17,822 breast cancer cases diagnosed in 2019 from the cancer registry of North Rhine-Westphalia. Evaluation involved two stages. First, domain experts reviewed the clustering results to assess clinical relevance and interpretability. Second, an intercluster survival analysis was performed to identify clinically relevant differences between treatment patterns.
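To make the idea of comparing treatment courses by edit distance concrete, the following is a minimal sketch of the plain Levenshtein distance applied to sequences of treatment events. It is not the adapted measure developed in this work, which also incorporates the cancer diagnosis and treatment details. The event labels, the patient identifiers, and the length-based normalization in `similarity` are assumptions chosen for illustration only.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain edit distance between two sequences of treatment events."""
    # prev[j] is the distance between the first i-1 events of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, event_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to an empty course
        for j, event_b in enumerate(b, start=1):
            cost = 0 if event_a == event_b else 1
            curr.append(min(
                prev[j] + 1,         # delete event_a
                curr[j - 1] + 1,     # insert event_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the distance to [0, 1]; this normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty courses are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical treatment courses, encoded as ordered event labels.
courses = {
    "patient_1": ["SURGERY", "RADIOTHERAPY", "ENDOCRINE"],
    "patient_2": ["SURGERY", "ENDOCRINE"],
    "patient_3": ["CHEMOTHERAPY", "SURGERY", "RADIOTHERAPY"],
}

# Pairwise distance matrix, the usual input to a clustering algorithm.
ids = sorted(courses)
distances = [[levenshtein(courses[p], courses[q]) for q in ids] for p in ids]
```

A matrix such as `distances` could then be passed to any clustering method that accepts precomputed distances. Which method and parameters fit registry data is a separate design choice that this sketch does not address.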
Results
Expert evaluations confirmed that clustering produced clinically plausible groups while also uncovering unexpected treatment patterns and potential data inconsistencies. The survival analysis showed differences in survival between clusters in both prognostically favorable and unfavorable subgroups. These results demonstrate that treatment-course clustering can identify patient groups with differing survival outcomes. However, registry data incompleteness and unmeasured confounders may influence these findings.
Conclusion
Clustering treatment courses in cancer registries can reveal data quality issues, distinguish groups with different prognostic profiles, and support exploratory analyses of treatment patterns. While these findings are not intended to guide clinical decision making or evaluate treatment effectiveness, they can help generate hypotheses, identify unexpected care pathways, and support quality monitoring within cancer registries. Future work should focus on improving treatment data completeness, incorporating additional clinical variables, and refining clustering methods for broader applicability.
Keywords
breast neoplasms - cancer registries - cluster analysis - machine learning - treatment courses
Publication History
Received: 13 May 2025
Accepted: 15 November 2025
Article published online: 10 December 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany