The Mystery of the Z-Score

Abstract
Reliable methods for measuring the thoracic aorta are critical for determining treatment strategies in aneurysmal disease. Z-scores are a pragmatic alternative to raw diameter sizes commonly used in adult medicine. They are particularly valuable in the pediatric population, who undergo rapid changes in physical development. The advantage of the Z-score is its inclusion of body surface area (BSA) in determining whether an aorta is within normal size limits. Therefore, Z-scores allow us to determine whether true pathology exists, which can be challenging in growing children. In addition, Z-scores allow for thoughtful interpretation of aortic size in different genders, ethnicities, and geographical regions. Despite the advantages of using Z-scores, there are limitations. These include intra- and inter-observer bias, measurement error, and variations between alternative Z-score nomograms and BSA equations. Furthermore, it is unclear how Z-scores change in the normal population over time, which is essential when interpreting serial values. Guidelines for measuring aortic parameters have been developed by the American Society of Echocardiography Pediatric and Congenital Heart Disease Council, which may reduce measurement bias when calculating Z-scores for the aortic root. In addition, web-based Z-score calculators have been developed to aid in efficient Z-score calculations. Despite these advances, clinicians must be mindful of the limitations of Z-scores, especially when used to demonstrate beneficial treatment effect. This review looks to unravel the mystery of the Z-score, with a focus on the thoracic aorta. Here, we will discuss how Z-scores are calculated and the limitations of their use.


Introduction
Z-scores are a means of expressing the deviation of a given anatomic or physical measurement from a size- or age-specific population mean. Z-scores can be applied to echocardiographic measurements, height, weight, and blood pressure, and thus may assist in clinical assessment and decision-making [1].
In diseases that affect the aortic diameter, serial diameter measurements of the aortic root are useful for monitoring disease progression. Z-scores of the aortic diameter are also useful aids in diagnosis and in determining therapeutic effects. The use of Z-scores facilitates the detection of pathological increases in aortic root diameter beyond those expected from normal growth, which appear as an increase in Z-score over time [2]. We discuss Z-scores in detail in the attached audio-visual presentation.
Centiles (also called percentiles) are a common alternative to Z-scores. They are easy to interpret and have been used to monitor development in pediatrics, including aortic root dilatation. However, centiles are less sensitive to changes in the aortic root diameter, particularly at the extremes [2]. For example, if a hypothetical patient (with a body surface area (BSA) of 1.87 m²) has an aortic root that increases from 3.56 to 3.69 cm (a 1.3 mm difference), the percentile increases from the 99th to the 99.7th. This difference sounds small, but it corresponds to a Z-score increase from +2.33 to +2.75, which is a more visually obvious difference. Z-scores can therefore quantify growth status outside of the percentile ranges [3]. Z-scores also provide (i) a standardized measure that allows comparison across different ages, genders, and measures and (ii) a continuous variable that allows generation of summary statistics such as the mean and standard deviation (SD).
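The percentile and Z-score values in this example can be checked numerically. The following is a minimal sketch, assuming the measurement is normally distributed around the population mean. It uses `statistics.NormalDist` from the Python standard library; the helper names `percentile_to_z` and `z_to_percentile` are our own and do not come from any of the calculators discussed in this review.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1).
# Assumption: the underlying measurement is normally distributed.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_to_z(percentile: float) -> float:
    """Convert a percentile (strictly between 0 and 100) to a Z-score."""
    return STANDARD_NORMAL.inv_cdf(percentile / 100.0)


def z_to_percentile(z: float) -> float:
    """Convert a Z-score to a percentile (0 to 100)."""
    return STANDARD_NORMAL.cdf(z) * 100.0


if __name__ == "__main__":
    # The two percentiles quoted in the example above.
    for p in (99.0, 99.7):
        print(f"{p}th percentile -> Z = {percentile_to_z(p):+.2f}")
    # The upper limit of the usual normal range.
    print(f"Z = +2.00 -> {z_to_percentile(2.0):.1f}th percentile")
```

A step of 0.7 percentile points at the upper extreme corresponds to a step of roughly 0.4 standard deviations, which is why the Z-score shows the change more clearly.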
In adult practice, Z-scores are less commonly used. Instead, aortic root diameter is often reported with respect to a single "normal range." However, this approach is inaccurate in growing children because the normal range of measurements depends on patient size and age. Therefore, the interpretation of these measurements during childhood presents a unique challenge, specifically in determining whether a given measurement is within the expected range. One approach to the description of clinical and echocardiographic variables is to express measurements in terms of Z-scores. In current practice, there is a lack of understanding of how Z-scores are calculated and interpreted. Here, we review the literature on Z-scores, focusing on application in thoracic aortic aneurysms.

What is a Z-Score?
The Z-score describes how many standard deviations a given measurement lies above or below a size- or age-specific population mean (Figure 1) [2]. Z-scores are calculated as follows:
$$Z = \frac{\chi - \mu}{\sigma} \tag{1}$$
where χ = the observed measurement, μ = the expected measurement (population mean), and σ = the population standard deviation (adapted from [2]).
A measurement above the population mean will have a positive Z-score, whereas a measurement below the population mean will have a negative Z-score. The greater the deviation of the Z-score from zero (in a positive or negative direction), the greater the magnitude of deviation from the mean [2]. A value that is 2 standard deviations above the mean (the 97.7th percentile) will have a Z-score of +2.0. Z-scores make clinical interpretation simple because of the mean of 0 and normal range of -2.0 to +2.0. A change in Z-score value over time is interpreted as a change in the size of the cardiovascular structure beyond what would be expected from the normal growth of that person [4].
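The definition and interpretation above (observed measurement minus expected measurement, divided by the population standard deviation, with a normal range of -2.0 to +2.0) can be sketched in a few lines. The function names and the numerical values for the expected mean and standard deviation below are hypothetical, chosen only for illustration; they are not taken from any published nomogram.

```python
def z_score(observed: float, expected_mean: float, sd: float) -> float:
    """Number of standard deviations the observation lies from the expected mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - expected_mean) / sd


def within_normal_range(z: float, limit: float = 2.0) -> bool:
    """True if the Z-score lies inside the conventional -2.0 to +2.0 range."""
    return -limit <= z <= limit


if __name__ == "__main__":
    # HYPOTHETICAL reference values for one body size (not from a real nomogram).
    expected_mean_cm = 2.8
    sd_cm = 0.3

    for observed_cm in (2.5, 3.1, 3.7):
        z = z_score(observed_cm, expected_mean_cm, sd_cm)
        status = "within" if within_normal_range(z) else "outside"
        print(f"{observed_cm:.1f} cm -> Z = {z:+.2f} ({status} normal range)")
```

In practice the expected mean and standard deviation are not fixed numbers but depend on the patient's body size, so they must be looked up or computed for each patient before the Z-score is calculated.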
For a Z-score to be calculated, the mean and standard deviation for that body structure (e.g., aortic root diameter) must be determined in the population. The mean and standard deviation have been calculated in many individual studies of varying sample sizes. These are empiric observations that are not "written in stone," but rather vary somewhat among different studies. The individual studies can be used to generate nomograms [5]. This is achieved by selecting a cohort of individuals and calculating their BSA based on one of the available BSA equations. A parameter of interest (e.g., aortic root diameter) is then recorded for each individual, allowing generation of a scatterplot (Figure 2A) and calculation and plotting of a regression equation and confidence intervals. This scatterplot can then be transformed into a nomogram (Figure 2B), allowing one to determine the Z-score for an individual patient given their BSA and parameter of interest (e.g., aortic root diameter) [5].

Calculation of Z-scores
There are a number of web-based calculation tools for Z-score measurement. The largest is http://zscore.chboston.org, which has collected baseline data over the past 12 years, while www.parameterz.com offers Z-score measurements based on a large number of smaller individual publications. There is also a Z-score calculator available on the Marfan Foundation website (www.marfan.org/dx/zscore) to aid in the detection of a dilated aortic root in an individual with suspected or confirmed Marfan Syndrome. Recently, the Cardio Z App for the iPad/iPhone was made available, revolutionizing the ease with which Z-scores can be calculated in the clinical environment. Z-score values representing the size of the aorta can be determined from the aortic annulus, sinuses of Valsalva, sinotubular junction, and ascending aorta.
BSA has been found to be more useful than age, height, or weight alone for the accurate measurement of the size of different cardiovascular structures [6]. There are a number of different formulas that have been established for the measurement of BSA. The most commonly used formulas include: Haycock, Du Bois, Boyd, Gehan and George, and Mosteller. The Haycock formula [7] (BSA (m²) = weight (kg)^0.5378 × height (cm)^0.3964 × 0.024265) has been recognized as the most accurate method of calculating BSA [8]. This formula was generated from only 81 subjects, including a variety of ages (infants to adults) and ethnic groups (Black, Hispanic, and White). Because BSA is used in determining the normal distribution of aortic sizes for different ages and body sizes, variations and uncertainties in BSA calculations can have a major impact on the accuracy of Z-scores.
Z-scores are usually calculated using BSA; however, a weight-only equation also exists for the calculation of BSA (BSA = 0.1023 × weight^0.68) [9]. This may be a more convenient tool, but it lacks the valuable adjustment for height in patients, which is a sensitive factor to consider when assessing the aortic diameter. Therefore, clinicians must be mindful of which BSA formula is used when interpreting Z-scores. Furthermore, it is important to be consistent in the choice of Z-score calculator, while also being aware that the accuracy of the specific BSA equation utilized in each Z-score calculation will be affected by changes in body mass and age. The user must keep in mind these limitations in the evidence base of Z-scores.

Limitations
Z-scores have significant advantages over alternative methods of measuring aortic diameter, especially in the pediatric population. However, sources of limitations include measurement error, validity of nomograms, inconsistent use of BSA equations (at different ages in a child's development), and our uncertainty of the natural history of Z-scores. These limitations may significantly influence Z-score values and may falsely indicate changes in the size of a structure where true variability does not exist.
There are several formulas available for calculating BSA, which have marked discrepancies in the values they produce and therefore are limited in their accuracy. Furthermore, the validity of the studies used to develop these formulas may be questionable. Often, the studies utilize small sample sizes and do not indicate which patient demographic they represent (see Table 1 for a comparison of the most widely used BSA formulas). In addition, many BSA equations tend to over- or underestimate BSA in certain populations.
The introduction of web-based Z-score calculators, such as http://www.parameterz.com, has revolutionized the ease with which we can calculate Z-scores in the clinical environment. However, these Z-score calculating programs stratify their data using geographically specific nomograms. Such geographical studies are not available worldwide, and therefore care must be taken to ensure the most accurate geographical region is used for analysis. One must also remember that these nomograms do not take ethnic diversity into account. Despite recent efforts to improve the accuracy of nomograms, there are still numerical and interpretative uncertainties [4,10-13]. Such nomograms may produce widely different Z-score values. This is because many nomograms utilize a small sample size, with an underrepresentation of information across age groups (particularly neonates and premature infants) [14]. There is a lack of complete information on certain cardiovascular structures and racial and gender differences in the literature [14-16]. In addition, the use of formalin-fixed pathological specimens to determine base data for nomograms is limited by their availability and may significantly underestimate the dimensions of cardiac structures in vivo, thus producing inappropriate clinical tools [17,18].
To maintain statistical confidence in Z-scores with extreme values, nomograms must adequately represent the heteroscedasticity (change in variance) across body sizes of individuals [2]. Inappropriate averaging of variance may lead to under- or overestimation of Z-score values for children at the extremes of body size [2]. In addition, obesity may skew Z-score data and therefore produce measurement bias when interpreting Z-scores. This is a particular problem in patients with cardiovascular disease. Consequently, an obese patient's Z-score may be an underestimation of the true value. Dallaire et al. [1] explored this problem and suggested that the use of multivariable models with weight and height as independent predictors of Z-scores should be explored to reduce this potential pitfall. Van Kimmenade et al. [19] concluded that, because we are facing an obesity epidemic, the use of Z-scores that correlate with height rather than BSA/weight may be more accurate in evaluating aortic root measurements in those with Marfan Syndrome.
Measurement error can be a significant limiting factor [20]. These factors may contribute to intra- and inter-observer bias and affect the reliability of earlier studies [14]. Even small changes in aortic diameter can represent significant disease progression in Z-score calculations. Together, these factors may lead to inappropriate treatment strategies such as lifelong medical therapy, which can expose patients to unnecessary side effects and financial burden, or high-risk surgical interventions. Furthermore, data on the non-pathological natural history of Z-scores are limited. Should the aortic Z-score remain identical in a normal or aneurysmal child from infancy to young adulthood? We simply do not know. Currently, randomized controlled trials (RCTs) investigating aneurysmal pathology rely on Z-score changes as a measure of therapeutic efficacy [21,22]. The natural history of Z-scores in normal and pathological states remains largely unknown, therefore limiting the meaningful interpretation of Z-scores.
Z-scores are commonly used in pediatric settings to evaluate the diameter of the ascending aorta and aortic root. However, raw values of aortic root sizes are usually used in adults. The rationale for this is that height stabilizes in adulthood and is unlikely to change over time. However, this assumption is inaccurate, especially in elderly patients who lose height from their young adult maximum. In addition, there is considerable variability in size among the population, which suggests that gender and height may be significant confounding factors when interpreting aortic root values in these patients.
Knowing these limitations, careful interpretation of Z-scores in relation to patients and recognition of information gaps in the literature are essential to improve the clinical interpretation of Z-scores.
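The effect of the choice of BSA formula can be made concrete with a short sketch. The Haycock and weight-only coefficients are those quoted in this review; the Du Bois and Mosteller coefficients are the commonly cited forms and should be checked against the original publications before any clinical use. The linear "nomogram" (intercept, slope, and SD) is entirely hypothetical and serves only to show how a difference in BSA propagates into the Z-score; it is not a published reference model.

```python
import math


def bsa_haycock(weight_kg: float, height_cm: float) -> float:
    """Haycock formula, coefficients as quoted in this review."""
    return 0.024265 * weight_kg ** 0.5378 * height_cm ** 0.3964


def bsa_du_bois(weight_kg: float, height_cm: float) -> float:
    """Du Bois formula (commonly cited coefficients)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def bsa_mosteller(weight_kg: float, height_cm: float) -> float:
    """Mosteller formula (commonly cited form)."""
    return math.sqrt(weight_kg * height_cm / 3600.0)


def bsa_weight_only(weight_kg: float) -> float:
    """Weight-only formula, coefficients as quoted in this review."""
    return 0.1023 * weight_kg ** 0.68


# HYPOTHETICAL linear nomogram: expected diameter (cm) = intercept + slope * BSA.
# These three numbers are illustrative assumptions, not published values.
HYPOTHETICAL_INTERCEPT_CM = 1.0
HYPOTHETICAL_SLOPE_CM_PER_M2 = 1.0
HYPOTHETICAL_SD_CM = 0.3


def hypothetical_z(diameter_cm: float, bsa_m2: float) -> float:
    """Z-score of a diameter under the hypothetical nomogram above."""
    expected_cm = HYPOTHETICAL_INTERCEPT_CM + HYPOTHETICAL_SLOPE_CM_PER_M2 * bsa_m2
    return (diameter_cm - expected_cm) / HYPOTHETICAL_SD_CM


def compare(weight_kg: float, height_cm: float, diameter_cm: float) -> dict:
    """Return {formula name: (BSA in m^2, hypothetical Z-score)}."""
    bsa_values = {
        "Haycock": bsa_haycock(weight_kg, height_cm),
        "Du Bois": bsa_du_bois(weight_kg, height_cm),
        "Mosteller": bsa_mosteller(weight_kg, height_cm),
        "Weight-only": bsa_weight_only(weight_kg),
    }
    return {name: (bsa, hypothetical_z(diameter_cm, bsa))
            for name, bsa in bsa_values.items()}


if __name__ == "__main__":
    # An obese patient: 100 kg, 160 cm, measured diameter 3.5 cm.
    for name, (bsa, z) in compare(100.0, 160.0, 3.5).items():
        print(f"{name:12s} BSA = {bsa:.2f} m^2, hypothetical Z = {z:+.2f}")
```

For the same patient and the same measured diameter, the formulas return noticeably different BSA values, and the Z-score shifts accordingly. This is one reason to keep the BSA formula and the Z-score calculator consistent across serial measurements.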

Conclusion
In light of the evidence base, Z-scores are a convenient tool for diagnosing and monitoring cardiovascular disease. In addition, they are widely used in RCTs to determine treatment efficacy in aortic aneurysmal disease.
However, there are some notable limitations to the use of Z-scores. All varieties of BSA calculation directly and substantially impact aortic Z-score determination. Some of these limitations can be overcome by calculating Z-scores using consistent and generalizable nomograms. This may require consistent use of specific Z-score nomograms to accurately reflect the structure measured (e.g., aorta) and the gender, race, height, and weight of the patient. Additionally, measurement bias is a contributing factor to inaccuracies when determining aortic root size. To reduce the impact of intra- and inter-observer bias, consistent reporting of aortic root measurements, ideally by experienced technicians, is required, with abnormal measurements reviewed and confirmed by the interpreting cardiologist/cardiothoracic surgeon. As we face an obesity epidemic, it is also important to consider the accuracy of BSA-based Z-score calculations, and whether height-based calculations should be implemented for obese individuals.
We recommend that further investigation be performed into the natural history of Z-scores in non-pathological states, to ensure that current interpretations of therapeutic strategies in RCTs are accurate. Specifically, we feel that clear-cut evidence is needed to show that a decreasing Z-score as a pediatric patient ages truly represents a positive therapeutic (pharmacological) effect, and not simply a normal Z-score progression with increasing body size. We have investigations underway on this specific quandary.

Conflict of Interest
We have read and understood the AORTA policy on declaration of interests and declare that we have no competing interests.