Open Access
CC BY-NC-ND 4.0 · Rev Bras Ortop (Sao Paulo) 2022; 57(02): 321-326
DOI: 10.1055/s-0041-1729581
Original Article
Spine

Evaluation of the Reliability and Reproducibility of the Roussouly Classification for Lumbar Lordosis Types


Authors

  • Camila Oda Yamazato

    1   Departamento de Ortopedia e Traumatologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brasil
  • Gustavo Ribeiro

    2   Hospital Israelita Albert Einstein, São Paulo, SP, Brasil
  • Fabio Chaud de Paula

    3   Complexo Hospitalar Universitário Professor Edgar Santos, São Paulo, SP, Brasil
  • Ramon Oliveira Soares

    3   Complexo Hospitalar Universitário Professor Edgar Santos, São Paulo, SP, Brasil
  • Paulo Santa Cruz

    4   Ambulatório de Coluna do Centro de Traumatologia do Esporte, Departamento de Ortopedia e Traumatologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brasil
  • Michel Kanas

    4   Ambulatório de Coluna do Centro de Traumatologia do Esporte, Departamento de Ortopedia e Traumatologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo, SP, Brasil
 

Abstract

Objective The present study aims to determine the intra- and inter-rater reliability and reproducibility of the Roussouly classification for lumbar lordosis types.

Methods A database of 104 panoramic, lateral radiographs of the spine of male individuals aged between 18 and 40 years old was used. Six examiners with different expertise levels measured spinopelvic angles and classified lordosis types according to the Roussouly classification using the Surgimap software (Nemaris Inc., New York, NY, USA). After a 1-month interval, the measurements were repeated, and the intra- and inter-rater agreement were calculated using the Fleiss Kappa test.

Results The study revealed positive evidence regarding the reproducibility of the Roussouly classification, with reasonable to virtually perfect (0.307–0.827) intra-rater agreement, and moderate (0.43) to reasonable (0.369) inter-rater agreement according to the Fleiss kappa test. The most experienced examiners showed greater inter-rater agreement, ranging from substantial (0.619) to moderate (0.439).

Conclusion The Roussouly classification demonstrated good reliability and reproducibility, with intra- and inter-rater agreements that were at least reasonable and that reached substantial to virtually perfect levels in some situations. Evaluators with the highest expertise levels showed greater intra- and inter-rater agreement.


Introduction

During the last 2 decades, the study of spinopelvic angles and sagittal balance has become increasingly relevant in spinal surgery, mainly for the correction of adult deformities. Reportedly, these measures vary according to age, ethnicity, and biotype in asymptomatic patients, and according to the etiology of the sagittal imbalance in symptomatic patients.[1] [2] [3]

Spinal deformities are clinically and radiologically different in adults and adolescents due to the association with degenerative processes, dissimilar patterns, and natural history. Furthermore, spinal imbalance at the sagittal plane may result from fractures or postoperative complications.[4]

Lumbar lordosis severity and pain are inversely correlated; in addition, spinopelvic parameters and lumbar lordosis types are associated with disc degeneration, facet overload, spondylolisthesis, chronic lumbar pain, disc herniation, and functional disability.[5] [6] [7] [8] [9] [10]

The body uses mechanisms such as increased kyphosis/lordosis of adjacent segments, trunk hyperextension, pelvic anteversion or retroversion, knee flexion, and ankle extension to compensate for sagittal imbalance.[11] [12] These mechanisms, along with spinal anatomical parameters and sagittal alignment, must be considered for surgical indication and planning because they affect the postoperative prognosis.[13]

Since this assessment is deemed critical, Roussouly et al.[14] proposed a classification system that divides the lumbar spine into four types according to the lordotic apex and sacral tilt angle. Although this classification has been used for both research and clinical purposes since its introduction, few studies have validated it.

The present study aimed to evaluate the reliability and reproducibility of lumbar lordosis classification using the Roussouly et al.[14] system, and to verify if the inter-rater agreement is affected by the expertise level.


Materials and Methods

A total of 104 panoramic, lateral radiographs of the spine of men aged between 18 and 40 years old were used.

These radiographs belonged to a database that had been used in previous studies for other evaluations. Because the subjects had been anonymized and could not be contacted, the present study was exempted from obtaining informed consent; it was approved by the Research Ethics Committee under protocol number 3.828.093.

All radiographs were obtained with the same equipment. Patients were asked to stand with a straight trunk, upper limbs resting on a support, shoulders at 30° flexion, slightly flexed elbows, and extended knees. Panoramic, lateral radiographs covered from the base of the skull to the proximal region of the femur. Low-quality images that did not allow measurements were excluded from the study.

Using the Surgimap software, version 2.3.1.5 (Nemaris Inc., New York, NY, USA), 6 evaluators measured the spinopelvic angles and the sagittal vertical axis ([Figure 1]): 2 spine surgeons with > 5 years of experience (A1 and A2), 2 residents in Spine Surgery (B1 and B2), and 2 residents in Orthopedics and Traumatology (C1 and C2). These data were used to classify the type of lumbar lordosis according to Roussouly et al.[15] ([Figure 2]). After 1 month, the measurements were repeated by the same evaluators to assess intra- and inter-rater agreement.

Fig. 1 Example of spinopelvic angles and vertical sagittal axis measurement using the Surgimap software.
Fig. 2 Lumbar lordosis classification according to Roussouly et al. Type 1: The sacral tilt (ST) is < 35°, and the apex of lumbar lordosis is located at the center of the L5 vertebral body. Type 2: The ST is < 35°, and the apex of lumbar lordosis is located at the base of the L4 vertebral body. Type 3: The ST ranges from 35° to 45°. Type 4: The ST is > 45°. (adapted from Roussouly et al.[15]).
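The decision rule in the Fig. 2 caption can be sketched in a few lines of Python. This is only an illustration of the caption's thresholds, not the procedure implemented in Surgimap; the function name and the apex labels "L5_center" and "L4_base" are hypothetical, and ST values of exactly 35° or 45° are assigned to type 3, following "ranges from 35° to 45°".

```python
from typing import Literal, Optional

# Hypothetical labels for the lordotic apex location described in Fig. 2.
Apex = Literal["L5_center", "L4_base"]


def roussouly_type(sacral_tilt_deg: float, apex: Optional[Apex] = None) -> int:
    """Return the Roussouly lordosis type (1 to 4) using the Fig. 2 thresholds."""
    if sacral_tilt_deg > 45:
        return 4
    if sacral_tilt_deg >= 35:
        return 3
    # ST < 35 degrees: types 1 and 2 differ only by the position of the apex.
    if apex == "L5_center":
        return 1
    if apex == "L4_base":
        return 2
    raise ValueError("the apex location is required when ST < 35 degrees")


if __name__ == "__main__":
    print(roussouly_type(40.0))               # ST between 35 and 45 degrees
    print(roussouly_type(30.0, "L4_base"))    # low ST, apex at the base of L4
```

Because types 3 and 4 depend only on ST while types 1 and 2 also require the apex, the sketch raises an error rather than guessing when the apex is missing.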

Statistical Methods

Initially, the results were descriptively analyzed to obtain graphs and frequency tables to characterize the participants of the research. Categorical variables were expressed as absolute frequency and percentage values. Graphs assessed the frequency of variables of interest.

Intra- and inter-rater agreement of the Roussouly classification were determined using the Fleiss kappa test (1981), a generalization of the kappa test used when several people evaluate the same sample on a scale with different categories, such as the Roussouly classification, which consists of types 1 to 4. The kappa agreement coefficient ranges from +1 (perfect agreement) through 0 (agreement equal to that expected by chance) to −1 (complete disagreement).[16]
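For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of the standard Fleiss kappa formula. It is not the R code used in the study; the function name and input layout (one row per radiograph, one entry per rater) are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss kappa for ratings[i][r]: category given to subject i by rater r.

    Every subject must be rated the same number of times (at least twice).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_subjects          # mean observed agreement
    total = n_subjects * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    # Undefined (division by zero) if every rating falls in a single category.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical data: 4 radiographs, each classified by 3 raters.
    demo = [[3, 3, 3], [4, 4, 3], [1, 2, 2], [3, 3, 4]]
    print(round(fleiss_kappa(demo), 3))
```

Under this layout, intra-rater agreement would be obtained by passing the two measurements of one evaluator as the two "raters" of each radiograph, and inter-rater agreement by passing the six evaluators' classifications from a single measurement.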

The kappa coefficient value was classified according to Landis et al.,[17] as shown in [Table 1].

Table 1

| Kappa coefficient | Strength of agreement |
|---|---|
| < 0.00 | Poor |
| 0.00–0.20 | Weak |
| 0.21–0.40 | Reasonable |
| 0.41–0.60 | Moderate |
| 0.61–0.80 | Substantial |
| 0.81–1.00 | Virtually perfect |
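The bands in Table 1 can be expressed as a small lookup, sketched below. The function name is hypothetical, and one assumption is needed because the table leaves gaps between bands (for example, between 0.20 and 0.21): here a value is assigned to the first band whose upper limit it does not exceed.

```python
# Upper limit of each band and its label, in the order given in Table 1.
BANDS = [
    (0.20, "Weak"),
    (0.40, "Reasonable"),
    (0.60, "Moderate"),
    (0.80, "Substantial"),
    (1.00, "Virtually perfect"),
]


def agreement_strength(kappa: float) -> str:
    """Map a kappa coefficient to the strength-of-agreement label of Table 1."""
    if kappa < 0.0:
        return "Poor"
    for upper_limit, label in BANDS:
        if kappa <= upper_limit:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    for value in (0.827, 0.307, 0.430, 0.369):
        print(value, agreement_strength(value))
```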

All analyses were performed with the statistical software R version 3.3.1 (R Foundation, Vienna, Austria), and the level of significance was set at 5%.[18]



Results

[Table 2] presents the frequency distribution of the lordosis types attributed by each evaluator at both measurements. Type 3 was the most frequently assigned type (> 50%) for all evaluators except B1, who most often assigned type 4.

Table 2

Lumbar lordosis type according to the Roussouly classification, n (%)

| Measurement | Evaluator | Type 1 | Type 2 | Type 3 | Type 4 |
|---|---|---|---|---|---|
| Measurement 1 | A1 | 9 (9%) | 11 (11%) | 59 (57%) | 25 (24%) |
| | A2 | 7 (7%) | 12 (12%) | 57 (55%) | 28 (27%) |
| | B1 | 14 (13%) | 10 (10%) | 23 (22%) | 57 (55%) |
| | B2 | 8 (8%) | 8 (8%) | 57 (55%) | 31 (30%) |
| | C1 | 7 (7%) | 8 (8%) | 58 (56%) | 31 (30%) |
| | C2 | 7 (7%) | 10 (10%) | 59 (57%) | 28 (27%) |
| Measurement 2 | A1 | 6 (6%) | 9 (9%) | 64 (62%) | 25 (24%) |
| | A2 | 10 (10%) | 9 (9%) | 57 (55%) | 28 (27%) |
| | B1 | 19 (18%) | 8 (8%) | 30 (29%) | 47 (45%) |
| | B2 | 10 (10%) | 5 (5%) | 57 (55%) | 32 (31%) |
| | C1 | 3 (3%) | 8 (8%) | 61 (59%) | 32 (31%) |
| | C2 | 6 (6%) | 14 (13%) | 62 (60%) | 22 (21%) |

Intra-rater Agreement

The intra-rater agreement analysis revealed that A2 had the best level of agreement, with a virtually perfect coefficient (0.827). On the other hand, the lowest level of agreement was obtained by B1, with a reasonable coefficient (0.307). A1 and B2 showed substantial agreement (0.601 and 0.710, respectively), whereas C1 and C2 presented moderate agreement (0.580 and 0.557, respectively). The average intra-rater agreement was 0.597, which was deemed moderate. All values had a p-value < 0.001 ([Figure 3] and [Table 3]).

Table 3

| Evaluator | Fleiss kappa coefficient | 95%CI | p-value |
|---|---|---|---|
| A1 | 0.601 | (0.462–0.740) | < 0.001* |
| A2 | 0.827 | (0.738–0.915) | < 0.001* |
| B1 | 0.307 | (0.163–0.452) | < 0.001* |
| B2 | 0.710 | (0.586–0.833) | < 0.001* |
| C1 | 0.580 | (0.440–0.720) | < 0.001* |
| C2 | 0.557 | (0.407–0.708) | < 0.001* |

Fig. 3 Fleiss kappa coefficient for intra-rater agreement.
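As a quick arithmetic check, the average intra-rater coefficient and the best and worst evaluators can be recomputed from the six values in Table 3. The snippet below is only a verification sketch; the variable names are illustrative.

```python
from statistics import mean

# Intra-rater Fleiss kappa coefficients as reported in Table 3.
intra_rater_kappa = {
    "A1": 0.601,
    "A2": 0.827,
    "B1": 0.307,
    "B2": 0.710,
    "C1": 0.580,
    "C2": 0.557,
}

average = mean(intra_rater_kappa.values())
best = max(intra_rater_kappa, key=intra_rater_kappa.get)
worst = min(intra_rater_kappa, key=intra_rater_kappa.get)

print(f"average = {average:.3f}")  # average = 0.597
print(best, worst)                 # A2 B1
```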

Inter-rater Agreement

In the inter-rater agreement analysis, the general coefficient for the first measurement was 0.43 (moderate); for the second measurement, this value was slightly lower, at 0.369 (reasonable) ([Table 4]).

Table 4

| Measurement | Fleiss kappa coefficient | 95%CI | p-value |
|---|---|---|---|
| Measurement 1 | 0.430 | (0.344–0.516) | < 0.001 |
| Measurement 2 | 0.369 | (0.288–0.451) | < 0.001 |

Among evaluators with the same level of expertise, there was a statistically significant agreement (p < 0.001) between all groups, and spine surgeons with > 5 years of experience presented the highest level of inter-rater agreement, ranging from substantial at the first measurement (0.619) to moderate at the second measurement (0.439). Spine surgery and orthopedics and traumatology residents showed reasonable levels of inter-rater agreement within their classes ([Table 5]).

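Table 5

| Measurement | Evaluator | A1 | A2 | B1 | B2 | C1 | C2 |
|---|---|---|---|---|---|---|---|
| Measurement 1 | A1 | 1.000 | | | | | |
| | A2 | 0.619 | 1.000 | | | | |
| | B1 | 0.192 | 0.261 | 1.000 | | | |
| | B2 | 0.488 | 0.412 | 0.236 | 1.000 | | |
| | C1 | 0.565 | 0.583 | 0.218 | 0.434 | 1.000 | |
| | C2 | 0.597 | 0.584 | 0.196 | 0.516 | 0.496 | 1.000 |
| Measurement 2 | A1 | 1.000 | | | | | |
| | A2 | 0.439 | 1.000 | | | | |
| | B1 | 0.222 | 0.138 | 1.000 | | | |
| | B2 | 0.483 | 0.458 | 0.283 | 1.000 | | |
| | C1 | 0.515 | 0.539 | 0.168 | 0.449 | 1.000 | |
| | C2 | 0.440 | 0.404 | 0.166 | 0.496 | 0.325 | 1.000 |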
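To make the pattern in Table 5 easier to see, the first-measurement pairwise coefficients can be split into pairs that include evaluator B1 and pairs that do not. The means below are simple averages computed here from the tabulated values for illustration; they were not reported by the authors and are not themselves kappa statistics.

```python
from statistics import mean

# Pairwise kappa coefficients at the first measurement, copied from Table 5.
pairwise_m1 = {
    ("A1", "A2"): 0.619,
    ("A1", "B1"): 0.192, ("A2", "B1"): 0.261,
    ("A1", "B2"): 0.488, ("A2", "B2"): 0.412, ("B1", "B2"): 0.236,
    ("A1", "C1"): 0.565, ("A2", "C1"): 0.583, ("B1", "C1"): 0.218,
    ("B2", "C1"): 0.434,
    ("A1", "C2"): 0.597, ("A2", "C2"): 0.584, ("B1", "C2"): 0.196,
    ("B2", "C2"): 0.516, ("C1", "C2"): 0.496,
}

with_b1 = [k for pair, k in pairwise_m1.items() if "B1" in pair]
without_b1 = [k for pair, k in pairwise_m1.items() if "B1" not in pair]

mean_with_b1 = mean(with_b1)
mean_without_b1 = mean(without_b1)

print(f"pairs with B1:    {mean_with_b1:.3f}")     # 0.221
print(f"pairs without B1: {mean_without_b1:.3f}")  # 0.529
```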



Discussion

In addition to dictating treatment or providing prognosis, an adequate classification must be reproducible for professionals with different expertise levels. The Roussouly classification for lumbar lordosis was introduced as a tool to analyze the sagittal alignment of the spine while considering pelvic orientation, characterizing an individual biotype.[14]

Roussouly types 1 and 2 have lower sacral tilt (ST) (< 35°) and lower angular lordosis, increasing the load at the anterior spine, with a potentially higher association with disc degeneration[3] [6] [7] and chronic low lumbar pain.[9] Type 3 is the most frequent type in asymptomatic populations, even among different ethnicities and age groups.[1] [2] Type 4 has the highest amount of ST (> 45°) and lumbar lordosis, and it is more related to spondylolisthesis and facet overload.[8]

In our study, type 3 lordosis was the most commonly found by the evaluators, which is consistent with previous studies in the asymptomatic population.[1] [2] Roussouly considered this type of lordosis as more physiological.[14]

The classification requires the measurement of spinopelvic angles, which can be performed manually using a goniometer on panoramic, lateral radiographs of the spine including the pelvis and femoral heads; however, the Surgimap software has been validated to facilitate measurement.[19]

Even though the Roussouly classification describes objective criteria depending mainly on measurable, well-defined references, a variation of a single degree in ST may change the type of lordosis. As a result, patients with values close to the cutoffs (∼ 35° or 45°) can receive different ratings from different observers or at different measurements from the same observer. In addition, the definition of the lordotic apex may be ambiguous, allowing for divergences between types 1 and 2. It is therefore plausible that the proportion of spines with these characteristics in a sample affects classification reproducibility.
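The sensitivity to the cutoffs can be illustrated by flagging ST measurements that lie within a chosen tolerance of 35° or 45°. This is a sketch only: the function name and the default tolerance of 1° are assumptions for illustration, not values taken from the study.

```python
# ST cutoffs (degrees) that separate the Roussouly types.
CUTOFFS_DEG = (35.0, 45.0)


def is_borderline(sacral_tilt_deg: float, tolerance_deg: float = 1.0) -> bool:
    """Return True if ST lies within tolerance_deg of either cutoff."""
    return any(abs(sacral_tilt_deg - cutoff) <= tolerance_deg
               for cutoff in CUTOFFS_DEG)


if __name__ == "__main__":
    # Hypothetical ST measurements in degrees.
    for st in (34.5, 40.0, 45.8, 47.0):
        print(st, is_borderline(st))
```

A case flagged this way is one where a measurement difference no larger than the tolerance could move the radiograph into an adjacent type.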

The present study revealed the good reproducibility of the Roussouly classification, since both the intra- and inter-rater agreements were at least reasonable (> 0.20) according to Fleiss kappa coefficients. The intra-rater agreement ranged from reasonable to virtually perfect, whereas the inter-rater agreement ranged from reasonable to moderate.

Evaluator B1 stood out with the lowest intra- and inter-rater agreement; in addition, he was the only one to find a higher prevalence of type 4 lordosis. These differences may be explained by some divergence in the interpretation of the classification, technical measurement errors, or be inherent to the fact that measurements with close values can be classified as different types.

Experience seems to affect the reproducibility of the classification, since the most experienced evaluators (A1 and A2) showed greater intra- and inter-rater agreement. This finding may be explained by the fact that spinal surgeons have greater familiarity with these measurements and understanding of the spinopelvic angles than residents in training.


Conclusion

The Roussouly classification demonstrated good reliability and reproducibility. Intra- and inter-rater agreements were at least reasonable, reaching substantial to virtually perfect levels in some situations. Evaluators with a higher level of experience showed greater intra- and inter-rater agreement.



Conflict of Interests

The authors have no conflict of interests to declare.

Financial Support

There was no financial support from public, commercial, or non-profit sources.



Address for correspondence

Camila Oda Yamazato
Departamento de Ortopedia e Traumatologia, Escola Paulista de Medicina, Universidade Federal de São Paulo
Rua Botucatu, 740, Térreo, São Paulo, SP, 04023-900
Brazil

Publication History

Received: 28 September 2020

Accepted: 01 December 2020

Article published online:
13 December 2021

© 2021. Sociedade Brasileira de Ortopedia e Traumatologia. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Thieme Revinter Publicações Ltda.
Rua do Matoso 170, Rio de Janeiro, RJ, CEP 20270-135, Brazil

