Open Access
CC BY 4.0 · Gesundheitswesen 2025; 87(S 03): S357-S364
DOI: 10.1055/a-2633-5848
Original Article

Diagnostic and Therapeutic Consequences of Thyroid Ultrasound: A Retrospective Explorative Cohort Study using Claims Data

Diagnostische und therapeutische Konsequenzen des Schilddrüsen-Ultraschalls: Eine retrospektive explorative Kohortenstudie anhand von GKV-Routinedaten

Authors

  • Lisette Warkentin

    1   Allgemeinmedizinisches Institut Erlangen, Uniklinikum Erlangen, Erlangen, Germany
  • Thomas Kühlein

    1   Allgemeinmedizinisches Institut Erlangen, Uniklinikum Erlangen, Erlangen, Germany
  • Johanna Tomandl

    1   Allgemeinmedizinisches Institut Erlangen, Uniklinikum Erlangen, Erlangen, Germany
  • Valeria Biermann

    2   Chair of Health Management, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nürnberg, Germany
  • David Klemperer

    3   Fakultät Angewandte Sozial- und Gesundheitswissenschaften, Ostbayerische Technische Hochschule Regensburg, Regensburg, Germany
  • Katharina Sutter

    4   GWQ ServicePlus AG, Gesellschaft für Wirtschaftlichkeit und Qualität bei Krankenkassen, Düsseldorf, Germany
  • Jan Steffen

    4   GWQ ServicePlus AG, Gesellschaft für Wirtschaftlichkeit und Qualität bei Krankenkassen, Düsseldorf, Germany
  • Angela Schedlbauer

    1   Allgemeinmedizinisches Institut Erlangen, Uniklinikum Erlangen, Erlangen, Germany
  • Susann Hueber

    1   Allgemeinmedizinisches Institut Erlangen, Uniklinikum Erlangen, Erlangen, Germany
 

Abstract

Background

Ultrasound (US) screening for thyroid cancer and other non-indicated US of the thyroid lead to the detection of mostly benign nodules or small papillary carcinomas. Although US screening for thyroid cancer is discouraged by guidelines, it continues to be performed. We aimed to explore the effects of thyroid US early in the diagnostic process on further diagnostic and therapeutic procedures in Germany.

Methods

In a retrospective observational cohort study, we analysed claims data from 2012 to 2016 of patients without history of thyroid disease. After propensity score matching for sociodemographic characteristics, a selection of symptoms and diagnoses (e. g. fatigue, hypertension) and morbidity in the last year, patients with an initial TSH test and a thyroid US within 28 days after the test were compared to patients receiving an initial TSH test but no early US. Patients with hypo- or hyperthyroidism diagnosed directly after the initial TSH were excluded. Thyroid-specific morbidity, follow-up tests and therapeutic pathways were analysed.

Results

In total, 5,390 patients remained in each group after data selection and matching (mean age: 46.5 years (SD=15.0), 58% female). Early US was associated with higher thyroid-specific morbidity, especially regarding thyroid nodules. Additionally, more patients in the observation group received thyroid-related follow-up tests. The utilization of ambulatory healthcare services in the first year was higher in this group, especially for internists working in general practice, internal medicine, nuclear medicine, and radiology.

Conclusion

Early use of US was associated with increased thyroid-specific morbidity, even after excluding patients with hypo- or hyperthyroidism after the initial TSH. Subsequently more diagnostic and therapeutic procedures were performed. Increased morbidity may be due to overtesting with diagnostic cascades resulting in overdiagnosis and overtreatment. Further research is needed to estimate the true number of overdiagnoses.


Summary

Background

Ultrasound (US) screening for thyroid cancer and other non-indicated US examinations of the thyroid mainly lead to the detection of benign nodules or small papillary carcinomas. Because of an unfavourable benefit-harm ratio, US screening for thyroid cancer is regarded as overtesting, and guidelines advise against it. Nevertheless, it is still performed. Our aim was to investigate the effects of an early thyroid US during the diagnostic work-up on further diagnostic and therapeutic procedures in Germany.

Methods

In a retrospective observational cohort study, we analysed statutory health insurance claims data from 2012 to 2016 of patients without a history of thyroid disease. After propensity score matching on sociodemographic characteristics, a selection of symptoms and diagnoses (e. g. fatigue, hypertension) and morbidity in the previous year, patients with an initial TSH test and a thyroid US within 28 days after the test (observation group, OG) were compared with patients who received an initial TSH test but no early US (control group, CG). Patients with hypo- or hyperthyroidism diagnosed directly after the first TSH test were excluded. Thyroid-specific morbidity and subsequent diagnostic and therapeutic measures were analysed.

Results

In total, 5,390 patients remained in each group after data selection and matching (mean age: 46.5 years (SD=15.0), 58% female). Early US was associated with higher thyroid-specific morbidity, especially with regard to thyroid nodules. In addition, more patients in the OG received thyroid-related examinations or thyroid-related treatment. The utilization of ambulatory healthcare services in the first year was higher in the OG, especially for internists working in general practice, internists, nuclear medicine and radiology.

Conclusion

Early use of US was associated with an increase in thyroid-specific morbidity, even after excluding patients with hypo- or hyperthyroidism after the first TSH test. Subsequently, more diagnostic and therapeutic measures were performed. We assume that part of the increased morbidity is attributable to overtesting with diagnostic cascades that lead to overdiagnosis and overtreatment. Further research should examine the reasons for and the appropriateness of early US in order to determine the actual extent of overdiagnosis.


Background

Overdiagnosis is the diagnosis of a health condition that was never going to cause any symptoms or harm [1]. As a consequence, overdiagnosis can turn people unnecessarily into patients. The two major causes of overdiagnosis are overdetection and overdefinition of disease. Overdetection means that abnormalities are identified that would not have harmed the patient [1]. This often leads to overtreatment, a treatment without benefit but with potential harm for the patient. In the wake of the implementation of a nationwide cancer-screening program in South Korea, increasingly performed ultrasound examinations (US) led to a steeply rising incidence of thyroid cancer. With disease-specific mortality persisting at a very low level, this seeming ‘thyroid cancer epidemic’ was shown to be mainly due to overdiagnosis [2] [3]. The vast majority of cancer cases first detected as nodules by US could be related to papillary thyroid cancer, a neoplasm with an excellent prognosis, in most cases even without therapy [2] [4]. It was assumed that non-recommended US and increased detection of thyroid abnormalities have been the trigger for diagnostic and therapeutic cascades [5]. Due to the low mortality of most thyroid cancers, the benefit of these cascades can only be limited. However, they are accompanied by harms such as needless visits to physicians, anxieties arising from cancer diagnosis and adverse effects of surgery [6] [7].

The German ‘Choosing wisely’ (Klug entscheiden) initiative recommends that patients older than 60 years should not be screened for thyroid cancer [8]. According to international guidelines, a thyroid US should be performed in patients with clinical risk factors, a palpable nodule or neck lymphadenopathy, but, due to a negative net benefit, not as a screening test for thyroid cancer for all patients [9] [10]. However, some physicians still offer thyroid cancer screening, maybe because it was recommended in the scientific literature some years ago [11] [12]. An US performed early in the diagnostic work-up without clear indication is likely to lead to overdiagnosis and overtreatment.

The aim of this study was to determine the effects of an US performed early after an initial thyroid-stimulating hormone (TSH) test in patients without previously known thyroid disease on morbidity and subsequent clinical pathways. Our study was performed within the project PRO PRICARE (Preventing Overdiagnosis in Primary Care) that investigated overdiagnosis and overtreatment in outpatient care [13].


Methods

In a retrospective cohort study, data from 2010 to 2016 of German statutory health insurance funds, provided by the Corporation for Efficiency and Quality in Health Insurance (GWQ ServicePlus AG), were analysed. LW, SH, JT, VB, JS, KS performed the data analysis and had access to the anonymous dataset through a secure web server using two-factor authentication to ensure data protection. The longitudinal data included sociodemographic information (age, sex, region of residence), inpatient and outpatient diagnoses (ICD-10-GM codes (International Classification of Diseases version 10, German Modification)), information on medication (Anatomical Therapeutic Chemical (ATC) Classification (ATC codes)) as well as on inpatient and outpatient diagnostic and therapeutic procedures. Outpatient measures could be identified in the data via the outpatient billing codes (“Einheitlicher Bewertungsmaßstab” (EBM), https://ebm.kbv.de/). Inpatient procedures were coded via the OPS (Operationen- und Prozedurenschlüssel [14]). Since the original use of the data was for billing, it did not contain data on laboratory results or medical history. In Germany, the processing of personal health data is regulated by Book V of the Social Code (SGB V) [15]. The reporting of the study is based on the German GPS (Good Practice Secondary Data Analysis) [16], the RECORD statement (The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) Statement) [17] and the STROSA2 (Standard for Secondary Data Analyses, Version 2) [18].

Data of the years 2010–2011 were used as baseline and 2012–2016 for observation. Exclusion criteria were inconclusive age or sex, discontinuous insurance status throughout 2010 and 2011, no insurance status on 1 January or 31 December of the years 2010 and 2011 and/or on 1 January 2012, age younger than 18 years and implausible age and/or diagnoses (Table S1, supplementary material, online). Data of patients with thyroid-specific outpatient and inpatient diagnostic or therapeutic measures as well as thyroid-specific diagnoses in 2010 or 2011 were excluded. We used a quasi-experimental design to explore and compare clinical pathways of patients with an early US to patients without this early US. We included data of patients who received a TSH test in 2012 but had no prior history of thyroid diseases and/or no thyroid diagnostic or therapeutic intervention (Table S2, supplementary material, online) in the previous two years (2010–2011) ([Fig. 1]). We included only patients with a TSH test in 2012 in order to allow a similar observation period for all persons. The inclusion criterion for the observation group (OG) was an initial TSH test in 2012 followed by a thyroid ultrasound within 28 days. Patients in the control group (CG) received an initial TSH test in 2012 but no thyroid ultrasound within 28 days.

Fig. 1 Data selection and data preparation process. InBA-grouper: classification-system of the Institute of Evaluation Committee, GP (general practitioner) centred health care: special payment scheme in outpatient care, TSH: thyroid-stimulating hormone.

The OG and CG were matched by propensity score. The terminology emphasizes the quasi-experimental design which we aimed to achieve through the matching. Patients with an US within four weeks after the initial TSH test without a diagnosis of hyper- or hypothyroidism were assigned to the OG, those without an early US were attributed to the CG. The rationale for the cut-off of four weeks is based on a guideline of the German Society of General Practitioners and Family Physicians (DEGAM) on increased TSH values. It recommends that for most patients an abnormal TSH test should be verified by a second TSH test [19]. An US might therefore become indicated only with some delay after the first TSH test, once a second test has confirmed the first one.
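The group assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SQL implementation: the function name `assign_group`, its arguments, and the decision to count a same-day US as "early" are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Optional

# Cut-off of four weeks (28 days), as stated in the text.
EARLY_US_WINDOW = timedelta(days=28)


def assign_group(first_tsh: date, first_us: Optional[date]) -> str:
    """Return "OG" if a thyroid US followed the initial TSH test within
    28 days, otherwise "CG".

    Assumption: a US on the same day as the TSH test counts as early.
    """
    if first_us is not None:
        delay = first_us - first_tsh
        if timedelta(0) <= delay <= EARLY_US_WINDOW:
            return "OG"
    return "CG"


# Hypothetical patients: US after 19 days, after 62 days, and no US at all.
print(assign_group(date(2012, 3, 1), date(2012, 3, 20)))
print(assign_group(date(2012, 3, 1), date(2012, 5, 2)))
print(assign_group(date(2012, 3, 1), None))
```

In claims data only billing dates are available, so the dates in such a rule would be billing dates rather than the clinical dates of the examinations.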

Propensity score matching (PSM)

Covariates for the PSM were sociodemographic characteristics (age, sex and place of residence by area code [20]), morbidity in the twelve months before the initial TSH test and a selection of symptoms/diagnoses (based on clinical expertise) in the quarter of the initial TSH test, serving as a proxy for reasons for encounter associated with potential thyroid pathologies (Table S3, supplementary material, online). Morbidity was measured by a classification system of the German Institute of the Evaluation Committee (Institut des Bewertungsausschusses (InBA)) which defines the framework for remuneration in outpatient care [21]. Propensity scores were estimated via logistic regression, and matching was performed as nearest neighbour matching without replacement. Matching quality was evaluated by chi-square test for categorical variables and t-test for independent samples for continuous variables. The results indicate a similarity in terms of observable variables across OG and CG after matching ([Table 1]) [22].
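To illustrate the matching step, the following Python sketch pairs each OG patient with the CG patient whose propensity score is closest, removing that CG patient from the pool afterwards (matching without replacement). It assumes the scores have already been estimated; the greedy processing order, the absence of a caliper, and all names and values are assumptions for the example, since the text does not specify these details.

```python
def match_nearest(treated: dict, controls: dict) -> dict:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    `treated` and `controls` map a patient id to a propensity score.
    Returns a dict {treated_id: matched_control_id}.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # no controls left to match
        # Control with the smallest absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        del available[c_id]  # "without replacement": each control is used once
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.32}
controls = {"c1": 0.31, "c2": 0.10, "c3": 0.35}
pairs = match_nearest(treated, controls)
print(pairs)
```

Greedy matching depends on the order in which treated patients are processed; other orders or optimal matching could yield slightly different pairs.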

Table 1 Characteristics of variables for propensity score matching

| Variable | Control (unmatched) | Observation (unmatched) | Control (matched) | Observation (matched) | Statistic | p-value (unmatched) | p-value (matched) |
|---|---|---|---|---|---|---|---|
| Age (M (SD)) | 48.01 (17.06) | 45.75 (14.87) | 45.73 (16.57) | 45.81 (14.89) | t-test | <0.001 | 0.694 |
| Female (%) | 51.81 | 59.61 | 59.57 | 59.27 | chi-square | <0.001 | 0.644 |
| No indication (%) | 61.14 | 62.26 | 62.73 | 62.47 | chi-square | 0.019 | 0.679 |
| Morbidity (M (SD)) | 7.27 (5.41) | 6.50 (4.90) | 6.53 (4.90) | 6.51 (4.91) | t-test | <0.001 | 0.769 |
| N | 121,126 | 11,452 | 11,233 | 11,233 | | | |

Mean with standard deviation and proportion of age, gender, identified indication for first TSH test and morbidity before (unmatched) and after (matched) propensity score matching. Morbidity was measured by the InBA-classification system of the Institute of Evaluation Committee. Identified indications for first TSH test include symptoms and diagnoses in the quarter of the first TSH test (Table S3, supplementary file 1). T-test for independent samples and chi-square test were used to evaluate matching quality.

After matching the cohorts, the following exclusion criteria were added due to plausibility and medical context: (1) diagnosis of hypo- or hyperthyroidism as a result of the initial TSH test in the OG, because an early US especially in patients with hyperthyroidism might be indicated; (2) discontinuous insurance status after the index date; (3) registration in the German so-called ‘Hausarztzentrierte Versorgung’ (‘general practitioner-centered health care program’, a special payment scheme in German outpatient care), because of lack of data for these patients. Patients were excluded along with their matching partner. After the exclusion process, each group consisted of 5,390 patients.
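Excluding patients "along with their matching partner" means that a pair is dropped as soon as either member meets an exclusion criterion, which keeps the two groups equal in size. A minimal Python sketch of this rule follows; the pair and id values are hypothetical.

```python
def drop_pairs(pairs, excluded_ids):
    """Keep only matched (OG, CG) pairs in which neither partner is excluded."""
    return [
        (og, cg)
        for og, cg in pairs
        if og not in excluded_ids and cg not in excluded_ids
    ]


# Hypothetical matched pairs and post-matching exclusions.
pairs = [("og1", "cg1"), ("og2", "cg2"), ("og3", "cg3")]
excluded = {"og2", "cg3"}  # e.g. discontinuous insurance status after the index date
remaining = drop_pairs(pairs, excluded)
print(remaining)
```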

We compared the groups in terms of sociodemographic variables, overall morbidity using the Charlson Comorbidity Index (CCI) [23] and thyroid-specific morbidity. A diagnosis was considered confirmed if it was verified in at least two quarters of outpatient data and/or once in inpatient data during the observation [24]. Additionally, we analysed the use of inpatient and outpatient care services and differences in selected medical specialties who were involved in the first year of the observation period.
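The rule for a confirmed diagnosis can be expressed compactly. The sketch below assumes that, for one patient and one ICD-10 code, the outpatient quarters in which the code was billed and the number of inpatient records with that code are already available; the function and argument names are illustrative only.

```python
def is_confirmed(outpatient_quarters, inpatient_count):
    """Apply the confirmation rule from the text: the diagnosis must appear
    in at least two distinct outpatient quarters and/or at least once in
    inpatient data.

    `outpatient_quarters` is an iterable of (year, quarter) tuples.
    """
    distinct_quarters = set(outpatient_quarters)  # repeated billing in one quarter counts once
    return len(distinct_quarters) >= 2 or inpatient_count >= 1


print(is_confirmed([(2012, 2), (2012, 4)], 0))  # two distinct outpatient quarters
print(is_confirmed([(2012, 2), (2012, 2)], 0))  # same quarter twice
print(is_confirmed([], 1))                      # one inpatient record
```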

To analyse differences between groups chi-square tests were used for categorical variables with Phi φ as effect sizes. As Levene's tests of variance homogeneity were significant, Welch’s t-test with Cohen’s d was used for continuous variables. P-values were adjusted by Bonferroni correction separately for the number of tests in each section of the analysis.
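The statistics named above can be reproduced from summary data with the Python standard library. The sketch below is an illustration under stated assumptions, not the authors' PSPP workflow: the chi-square test is computed without continuity correction, Cohen's d uses the pooled standard deviation for equal group sizes, and the event counts (22 and 164 thyroid surgeries) are back-calculated from the percentages in Table 2 rather than taken from the raw data.

```python
import math


def chi_square_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) for two groups,
    returning the statistic, the effect size phi and the p-value."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = math.sqrt(chi2 / n)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, phi, p


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom and
    Cohen's d (pooled SD, appropriate here because n1 == n2)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    d = (m1 - m2) / math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return t, df, d


def bonferroni_alpha(alpha, n_tests):
    """Significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Thyroid surgery, OG vs. CG (counts back-calculated from 3.04% and 0.41% of 5,390).
chi2, phi, p = chi_square_2x2(164, 5390, 22, 5390)
# Quarters with a TSH test, CG vs. OG (means and SDs from Table 2).
t, df, d = welch_t(2.84, 2.14, 5390, 3.45, 2.71, 5390)
# Seven tests in the section on diagnostic procedures.
threshold = bonferroni_alpha(0.05, 7)
print(round(chi2, 2), round(phi, 2), p < threshold)
print(round(t, 2), round(df), round(d, 2))
```

With these inputs the chi-square statistic and phi should agree with the values reported for thyroid surgery in Table 2, while the Welch results are only close to the reported ones because the published means and standard deviations are rounded.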

For interpretation it is important to explain specific details about the data: The outpatient billing code ‘02340’ for puncture includes not only fine needle aspirations (FNA) of the thyroid but also of haematomas, ascites and ten others (Table S2, supplementary material, online). Thus, the number of FNAs of the thyroid will be lower than the results of this analysis suggest. However, for data selection, no patients should be included that possibly had a puncture of the thyroid. Therefore, the code had been included in the exclusion criteria. For the same reason we considered the inpatient billing codes for US and (positron emission tomography-) computed tomography, although they are not thyroid-specific. Data analysis was performed using Microsoft SQL Server Management Studio 2012 (Microsoft Corporation, Redmond, WA) and PSPP (GNU PSPP [version 1.4.1], Free Software Foundation, Boston).

Within the PRO PRICARE project, Hafner et al. analysed cost effects of presumably non-indicated thyroid US using in part the same data and same method as used in the present study [25]. Hueber et al. performed a cluster analysis using only data from ambulatory care in one specific region in Germany in order to identify groups that may be particularly affected by cascading effects after early use of US [26].


Results

Out of 132,578 patients with an initial TSH test, 8.6% received an early US in outpatient care and were thus eligible for the OG (n=11,452). After the matching and exclusion process, datasets of 5,390 patients remained in each group ([Fig. 1]).

After matching and exclusion, 59.13% of patients in the CG and 57.63% in the OG were female. Mean age was 46.08 years (SD=15.67) in the CG and 46.96 years (SD=14.23) in the OG. Approximately two-thirds of patients had none of the diseases considered for the CCI. About half of the patients were living in urban, almost one third in suburban and the rest in rural regions ([Table 2]).

Table 2 Comparison of control and observation group regarding sociodemographic characteristics, thyroid-specific morbidity and measurements

| | Control | Observation | All patients | Statistic (χ²(1), if not specified) | p-value | Effect size |
|---|---|---|---|---|---|---|
| Sociodemographic characteristics (adjusted p-value [a] = 0.013 (n=4)) | | | | | | |
| N | 5,390 | 5,390 | 10,780 | | | |
| Age (M (SD)) | 46.08 (15.67) | 46.96 (14.23) | 46.52 (14.97) | t(10,679.32)=−3.05 | 0.002 | d=0.059 |
| Sex: female patients (%) | 59.13 | 57.63 | 58.38 | 2.50 | 0.113 | n. a. |
| Place of residence (n. a.=1.5%) | | | | χ²(2)=0.14 | 0.934 | n. a. |
| Urban regions (%) | 55.16 | 55.06 | 55.11 | | | |
| Suburban regions (%) | 28.69 | 28.97 | 28.83 | | | |
| Rural regions (%) | 16.15 | 15.96 | 16.06 | | | |
| CCI in uptake quarter: CCI=0 (%) | 67.05 | 66.49 | 66.77 | 0.38 | 0.540 | n. a. |
| Thyroid-specific morbidity (%) (adjusted p-value [a] = 0.004 (n=12)) | | | | | | |
| Multinodular goiter (E04.2, E01.1) | 1.04 | 8.35 | 4.69 | 321.90 | <0.001 | φ=0.17 |
| Uninodular goiter (E04.1) | 1.08 | 8.68 | 4.88 | 335.98 | <0.001 | φ=0.18 |
| Diffuse goiter (E04.0, E01.0) | 0.82 | 4.23 | 2.52 | 127.69 | <0.001 | φ=0.11 |
| Hypothyroidism (E02*, E03*) | 4.64 | 7.79 | 6.22 | 45.99 | <0.001 | φ=0.07 |
| Hyperthyroidism (E05*) | 1.15 | 2.43 | 1.79 | 25.12 | <0.001 | φ=0.05 |
| Thyroiditis (E06*) | 1.74 | 6.05 | 3.90 | 133.35 | <0.001 | φ=0.11 |
| Other disorders of the thyroid (E07*) | 1.32 | 4.21 | 2.76 | 83.99 | <0.001 | φ=0.09 |
| Malignant neoplasm (C73*) | 0.07 | 0.30 | 0.19 | 7.21 | 0.007 | n. a. |
| Benign neoplasm (D34*) | 0.09 | 1.09 | 0.59 | 45.83 | <0.001 | φ=0.07 |
| Neoplasm of unknown behaviour (D44.0) | 0.02 | 0.15 | 0.08 | 5.45 | 0.020 | n. a. |
| Paralysis of vocal cords or larynx (J38.0) | 0.11 | 0.24 | 0.18 | 2.58 | 0.108 | n. a. |
| Post-procedural hypothyroidism (E89.0) | 0.20 | 1.34 | 0.77 | 45.18 | <0.001 | φ=0.06 |
| Diagnostic procedures [b] (%) (adjusted p-value [a] = 0.007 (n=7)) | | | | | | |
| Outpatient: fT4 | 27.55 | 46.60 | 37.08 | 419.38 | <0.001 | φ=0.20 |
| Outpatient: fT3 | 25.66 | 43.02 | 34.34 | 360.43 | <0.001 | φ=0.18 |
| Outpatient: calcitonin test | 1.99 | 9.91 | 5.95 | 302.43 | <0.001 | φ=0.17 |
| In- or outpatient: puncture/thyroid biopsy | 2.12 | 4.12 | 3.12 | 35.83 | <0.001 | φ=0.06 |
| In- or outpatient: radioiodine therapy | 0.13 | 0.69 | 0.41 | 20.54 | <0.001 | φ=0.04 |
| In- or outpatient: scintigraphy | 3.12 | 20.30 | 11.71 | 769.55 | <0.001 | φ=0.27 |
| Inpatient: thyroid surgery | 0.41 | 3.04 | 1.73 | 110.31 | <0.001 | φ=0.10 |
| Accounting quarters (intervals of 3 months) with at least one of the following outpatient examinations (M (SD)) | | | | | | |
| TSH test | 2.84 (2.14) | 3.45 (2.71) | 3.15 (2.46) | t(10,221.84)=−13.01 | <0.001 | d=−0.257 |
| Thyroid ultrasound | 0.22 (0.64) | 1.90 (1.39) | 1.06 (1.37) | t(7,592.79)=−80.50 | <0.001 | d=−1.848 |
| Selected medical specialties involved in the first year of the observation period (%) (adjusted p-value [a] = 0.006 (n=9)) | | | | | | |
| General practice | 77.48 | 72.10 | 74.79 | 41.37 | <0.001 | φ=0.06 |
| Internal medicine (working in GP) | 33.02 | 49.20 | 41.11 | 291.35 | <0.001 | φ=0.16 |
| Internal medicine | 26.75 | 34.32 | 30.54 | 72.80 | <0.001 | φ=0.08 |
| Gynaecology | 39.94 | 38.79 | 39.37 | 1.49 | 0.22 | φ=0.01 |
| Psychiatry/Neurology | 13.88 | 12.28 | 13.08 | 6.03 | 0.014 | φ=0.02 |
| Nuclear medicine | 3.06 | 21.89 | 12.48 | 875.16 | <0.001 | φ=0.28 |
| Radiology | 26.18 | 30.20 | 28.19 | 21.58 | <0.001 | φ=0.04 |
| Surgery | 35.29 | 34.56 | 34.93 | 0.62 | 0.43 | φ=0.01 |
| Radiooncology | 0.63 | 0.52 | 0.58 | 0.58 | 0.45 | φ=0.01 |

[a] p-value adjustment with Bonferroni method. [b] Codes for procedures are depicted in Table S1, supplemental file 1. CCI: Charlson Comorbidity Index. fT3: free triiodothyronine. fT4: free thyroxine. TSH: thyroid-stimulating hormone. GP: general practice. District types were defined according to settlement structure referring to the Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR): urban: >300,000 inhabitants or density around 300 inhabitants/km²; suburban: >150 inhabitants/km² or region with centre >100,000 inhabitants and ≥100 inhabitants/km²; rural: <150 inhabitants/km² without regional centre or region with centre >100,000 inhabitants and <100 inhabitants/km². Patients in the observation group received an initial TSH test and an outpatient ultrasound of the thyroid within 28 days. Patients in the control group received a TSH test but no outpatient ultrasound within 28 days. Chi-square test with effect size Phi (φ), Welch’s t-test with effect size Cohen’s d.

After the early US, patients in the OG had a statistically significantly higher overall thyroid-specific morbidity than patients in the CG without the early US. The highest effect sizes (φ=0.11 to 0.18) were found for the differences in thyroiditis, multinodular, uninodular and diffuse goitre. The differences regarding malignant neoplasm of the thyroid and neoplasm of unknown behaviour were not statistically significant once p-values were adjusted with the Bonferroni method ([Table 2]).

Patients in the OG received statistically significantly more thyroid-specific tests and examinations. The effect was highest for scintigraphy and laboratory tests. In the OG 20.30% received a scintigraphy while this was the case in only 3.12% of the CG (χ²(1)=769.55, p<0.001, φ=0.27). Patients in the OG were seven-fold more likely to receive any kind of thyroid surgery (OG=3.04% vs. CG=0.41%, χ²(1)=110.31, p<0.001, φ=0.10). An FNA was conducted rarely: in 4.12% of patients in the OG versus 2.12% in the CG (χ²(1)=35.83, p<0.001, φ=0.06). In the CG 15.51% received at least one US in outpatient care during the observation period. In the OG, patients had on average 1.90 (SD=1.39) quarters (3-month intervals) with at least one thyroid US versus 0.22 (SD=0.64) in the CG ([Table 2]).

Both groups differed regarding medical specialties involved in the first year of the observation period. For most specialties effect sizes were small. With the highest effect sizes, patients in the OG were more likely to visit a physician specialised in internal medicine working in general practice, and nuclear medicine (OG=49.20% vs. CG=33.02%, χ²(1)=291.35, p<0.001, φ=0.16; and OG=21.89% vs. CG=3.06%, χ²(1)=875.16, p<0.001, φ=0.28 respectively).


Discussion

We aimed to determine the effects of an US performed early after an initial TSH test in patients without previously known thyroid disease on morbidity and subsequent clinical pathways. A noticeable part of the patients having received an early US of the thyroid was diagnosed with thyroid-specific diagnoses. Further thyroid-specific tests and examinations, especially scintigraphies, were more frequent in patients with an early US. The likelihood of undergoing surgery was statistically significantly higher in these patients.

The Choosing Wisely Campaign Canada states that screening for thyroid disorders in asymptomatic patients is not recommended due to insufficient evidence for the patients’ benefit [27]. However, a recent analysis showed that TSH testing is often performed in patients without appropriate indication and that results were abnormal in only five percent of tests in a group without identified indication for TSH testing [28]. Wintemute et al. reported that in more than one third of patients receiving a TSH test no indication could be identified [29]. About 9% of patients in our study (before matching) with an initial TSH test received an early thyroid US. This number might not seem particularly high. Considering that even the initial TSH test may not have been performed for diagnostic work-up of suspected thyroid abnormalities, the proportion of early US after an initial TSH test for suspected thyroid abnormalities is presumably considerably higher. Due to the use of claims data, reasons such as medical history or results of clinical tests cannot be included in the data analysis. Thus, plausible reasons that would have justified the diagnostic tests in some of the patients cannot be taken into account. Nevertheless, overuse of US is probable, and recent data support this [30]. In a retrospective analysis of the modes of detection of thyroid nodules in patients who underwent thyroid-directed surgery, about 40% of patients did not have thyroid-related symptoms, and their mode of detection was either an abnormal clinician screening examination, an incidentaloma, patient-requested screening, or a diagnostic cascade. The latter two accounted for about 18% and were classified as a rather inappropriate clinical evaluation [30] [31]. Incidentalomas accounted for most diagnoses in asymptomatic patients. Computed tomography of the chest was the most common imaging modality [30].

Once a nodule is detected, physicians and patients face the dilemma of overdiagnosis with all its consequences and the uncertainty regarding the classification of the nodule. In our study, we showed that patients after an US receive more follow-up examinations and thyroid-related therapies. The cascade-initiating effect of thyroid US was evaluated by qualitative studies from the USA and Austria [7] [32]. Physicians described the clinical pathway from incidental diagnosis to a surgical procedure as a predictable cascade, automatic and habitual [32]. They felt forced to proceed in this cascade and patients expected it. Our data suggest that these cascades are present in Germany. We considered examinations, diagnoses and procedures after a thyroid US. However, it can be assumed that for some patients the cascade started even earlier and that the US was already part of the cascade. Patients reported that the initiating triggers of these cascades were incidental findings during imaging tests, but diagnoses through screening and the diagnostic work-up of symptoms or findings in physical examinations were also named [7].

Triggered by these events, the sequences of examinations and therapy, with doubtful benefit in many cases, are accompanied by considerable psychological distress, possible complications and costs. To acknowledge the psychological and physical harms induced by the cancer diagnosis for patients with small papillary neoplasms, even a terminology change for this diagnosis has been discussed recently [33]. In South Korea the increase in thyroid cancer incidence and thyroidectomies reversed after physicians were urged to stop screening, again without affecting disease-specific mortality. This finding underpins the statement that US screening leads to overdiagnosis and hence overtreatment almost inevitably [3].

Limitations

The original purpose of claims data is billing of medical services. Therefore, reasons for encounter are rarely available and physicians often use unspecific ICD-10 codes instead of more specific ones. For matching we considered possible confounders represented in the data. We used a selection of diagnoses and symptoms for propensity score matching. However, there are other conditions that trigger TSH tests. As we had no information on results of physical examinations or laboratory tests, those could not be considered.



Conclusions

About every tenth patient with an initial TSH test received an early US of the thyroid. This US was associated with an increase in thyroid-specific morbidity. Subsequently, more thyroid-specific diagnostic and therapeutic procedures were performed than without US. Although the amount of overuse cannot be determined exactly, the results indicate that at population level overdiagnosis of thyroid abnormalities with presumably cascade-like follow-up examinations and overtreatment is prevalent in Germany. Recommendations of international guidelines and campaigns such as ‘Choosing wisely’ should be considered more carefully in clinical decision-making. Clearly defining and promoting the indications and especially non-indications of thyroid US could help to reduce this phenomenon. Further research is needed to understand physicians’ and patients’ motives for early US and to estimate the extent of overdiagnosis resulting from unnecessary tests.


Ethics statement

As we analysed retrospective anonymous data, informed consent and ethical approval were not required. German law allows analysing anonymous data for research purposes without patients’ consent. Therefore, the appropriate processes regarding ethical approval and informed patient consent have been followed.


Availability of data and materials

Because of the confidential nature of in- and outpatient claims data, the data cannot be made publicly available. Permission to access the data is restricted to research and subject to the consent of the health insurance funds.


Authorsʼ contributions

Conceptualization: TK, SH, AS, JT. Methodology: TK, SH, AS, JT. Formal analysis: LW, SH, JT, VB, JS, KS. Resources (data) and Data curation: KS, JS. Funding acquisition: TK, SH. Project administration: SH. Supervision: TK. Writing – original draft: LW. Writing – review & editing: all authors.


Financial resources

This study was performed within the PRO PRICARE (Preventing Overdiagnosis in Primary Care, www.propricare.de) research network.


Funding Information

Bundesministerium für Bildung und Forschung — http://dx.doi.org/10.13039/501100002347; 01GY1605. Supported by the Interdisciplinary Center for Clinical Research (IZKF) at the University Hospital of the University of Erlangen-Nuremberg (Clinician Scientist Programm).



Conflict of Interest

KS and JS worked for GWQ ServicePlus AG, founded and owned by a group of health insurance companies. Besides that, the authors declare that they have no competing interests.

Acknowledgements

We thank the student assistants Lukas Worm, Lena Pachsteffl and Carolin Nürnberger for contributing to data preparation. The present work was performed in (partial) fulfilment of the requirements for obtaining the degree “Dr. rer. biol. hum.”.


Correspondence

Dr. Lisette Warkentin
Uniklinikum Erlangen
Allgemeinmedizinisches Institut Erlangen
Universitätsstraße 29
91054 Erlangen
Germany   

Publication History

Received: 30 July 2024

Accepted: 31 March 2025

Article published online:
15 August 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

