Int J Sports Med 2009; 30(11): 773-774
DOI: 10.1055/s-0029-1241216
Editorial

© Georg Thieme Verlag KG Stuttgart · New York

The ‘So What’ Factor: Statistical versus Clinical Significance

C. Stapleton, M. A. Scott, G. Atkinson
Publication Date: 29 October 2009 (online)

One of the most common editorial requests to IJSM authors is to add descriptive information to a study abstract so that readers can judge the practical or applied value of a study finding, traditionally known as clinical significance [7]. Prospective authors are not yet formally advised to cite confidence intervals routinely, but this stipulation is likely to be made in the near future. Therefore, the purpose of this ‘Statistical Note’, the first of a series to be published over the next few volumes of IJSM, is to encourage prospective authors to interpret the practical significance of their findings via the confidence interval approach.

In order to establish whether an intervention or treatment has been responsible for an observed effect, authors traditionally employ statistical significance tests. Used alone, such tests might not offer any information about how important the study findings are. It is therefore desirable, before an investigation begins, to identify the most important primary outcome variable and the value of that variable which would reflect clinical significance. Confidence intervals can then be calculated, and these statistics can be more informative than probability values alone when deciding whether a treatment has clinical significance.

Statistical significance makes no reference to the clinical value of an intervention. A significance test indicates the “rareness” of an event given that the null hypothesis is true [3]. In other words, using the example of a cross-sectional study on two groups, it tells us the probability with which an observed mean difference would occur if we were simply taking two random samples from the same population. A large mean difference between two samples taken at random from the same population is less likely to occur than a small mean difference. If a probability of less than 5% is obtained, then it is common practice to reject the null hypothesis and conclude that the two groups have different population means, with a 5% chance that a type I error has occurred (i.e., stating that there is a difference between the two populations when in reality there is none).
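
As a brief illustration, consider the following Python sketch of a two-sample significance test. The data and variable names are hypothetical; the point is that the resulting p-value alone conveys nothing about the magnitude or clinical value of the difference.

import numpy as np
from scipy import stats

# Hypothetical outcome data for two independent groups
control = np.array([48.2, 51.0, 49.5, 50.3, 47.8, 52.1, 49.9, 50.6])
treated = np.array([52.4, 54.1, 51.8, 53.5, 52.9, 55.0, 53.2, 54.4])

# Probability of observing a mean difference at least this large
# if both samples were drawn from the same population
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")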

One of the major weaknesses of significance tests is that they give little indication of the actual value of the population parameter of interest. In the example just provided, this would have been the actual “real” magnitude of difference (if there was one) between the two population means. It is entirely possible to have a “real” difference between groups that is, in clinical terms, trivial [2]. It is also possible to have a real difference between groups that a significance test fails to identify; this is known as a type II error. The probability of a type II error can be estimated through a power analysis conducted before the investigation, and the chance of such an error can be reduced by ensuring an adequate sample size.
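
A power analysis of this kind can be sketched in Python with the statsmodels library; the effect size, alpha and power values below are illustrative assumptions only, not recommendations.

from statsmodels.stats.power import TTestIndPower

# A priori sample size estimation for an independent-samples t-test
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # assumed standardized (Cohen's d) difference
    alpha=0.05,       # accepted type I error rate
    power=0.80,       # 1 - beta, i.e., a 20% chance of a type II error
)
print(f"Required sample size: about {n_per_group:.0f} per group")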

A confidence interval indicates the “(im)precision” with which a sample value estimates the population value [5]. In the example given previously, the parameter of interest is the difference between the two population means. The difference between sample means is referred to as a point estimate of this parameter. The confidence interval (or interval estimate) has both an upper value and a lower value, referred to as bounds or limits. The confidence interval includes all the possible values between the limits [2]. Usually, confidence intervals are reported at the 95% level of confidence, which means there is a 95% chance that the real mean difference is encapsulated by the upper and lower limits. A confidence interval which includes zero could be interpreted as evidence that the real difference between population means is zero and the treatment reported as having no effect. However, as stated in the American Physiological Society Guidelines for Reporting Statistics [4], “if either bound of the confidence interval is important from a scientific perspective, then the experimental effect may be large enough to be relevant”.
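
Such an interval can be calculated directly. The Python sketch below computes an equal-variance t interval for the difference between two hypothetical sample means; the data and function name are illustrative.

import numpy as np
from scipy import stats

def mean_diff_ci(a, b, confidence=0.95):
    # Point estimate: difference between sample means
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    # Pooled variance and standard error of the difference
    df = len(a) + len(b) - 2
    sp2 = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / df
    se = np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    # Interval estimate: point estimate +/- critical t value x SE
    t_crit = stats.t.ppf((1 + confidence) / 2, df)
    return diff, diff - t_crit * se, diff + t_crit * se

treated = [52.4, 54.1, 51.8, 53.5, 52.9, 55.0, 53.2, 54.4]
control = [48.2, 51.0, 49.5, 50.3, 47.8, 52.1, 49.9, 50.6]
diff, lower, upper = mean_diff_ci(treated, control)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")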

Hopkins and Batterham [6] provide a more detailed approach to using confidence intervals to determine whether a treatment is clinically beneficial. Prior to conducting the investigation, the researcher needs to identify two values for the primary outcome variable: one indicating that the treatment caused a beneficial change, the other indicating a harmful change. The region between these two values would be considered to reflect trivial differences between means. The location of the confidence interval in relation to these three regions can help the researcher draw conclusions about clinical significance, as sketched below. For example, if both the upper and lower bounds of the confidence interval are beyond the value deemed to be beneficial, the decision that the treatment is “almost certainly beneficial” can be made. In a similar manner, if the lower bound of the confidence interval does not go beyond zero and the upper bound does not go beyond the value identified as beneficial, then a real difference between population means may exist but it is trivial in a clinical context.
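
The following Python sketch illustrates this decision logic; the thresholds and verbal labels are illustrative choices, not the published decision rules of Hopkins and Batterham [6].

def classify_effect(ci_lower, ci_upper, harmful, beneficial):
    # Locate a confidence interval for a mean difference relative to
    # the harmful / trivial / beneficial regions (positive differences
    # are assumed to be beneficial; thresholds are study-specific).
    if ci_lower > beneficial:
        return "almost certainly beneficial"
    if ci_upper < harmful:
        return "almost certainly harmful"
    if harmful < ci_lower and ci_upper < beneficial:
        if ci_lower > 0:
            return "real difference likely, but clinically trivial"
        return "effect trivial or absent"
    return "unclear: interval crosses a clinically important threshold"

# Example: a CI of [0.4, 1.8] units, with benefit defined as > 2.0 units
print(classify_effect(0.4, 1.8, harmful=-2.0, beneficial=2.0))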

Although confidence intervals are superior to p-values alone, there is an important interpretational caveat in their use. As Bland [2] points out, the real mean difference between the two populations is not a random variable; it does not change between samples. What will change between samples are the values of the 95% confidence interval, but in the long run 95% of the samples taken at random will result in an interval that includes the true difference between the population means [2]. Therefore, there is a 5% chance that a given confidence interval does not include the population parameter. The width of the 95% confidence interval can be reduced by increasing the sample size. As with significance tests, the sample size required to achieve a specific level of precision can be estimated prior to conducting the investigation (see Batterham and Atkinson [1]). However, there will still be a 5% chance of the interval not containing the true population value.
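
As a rough Python sketch of such precision-based planning (using a normal approximation and illustrative numbers, not a substitute for the fuller treatment in Batterham and Atkinson [1]):

from scipy import stats

def n_per_group_for_precision(sd, half_width, confidence=0.95):
    # Sample size per group so that the confidence interval for a
    # difference between two means has the target half-width.
    # SE of the difference is sd * sqrt(2/n); solve z * SE = half_width.
    z = stats.norm.ppf((1 + confidence) / 2)
    return 2 * (z * sd / half_width) ** 2

# Assumed SD of 3.0 units and a desired 95% CI half-width of 1.0 unit
print(f"about {n_per_group_for_precision(3.0, 1.0):.0f} per group")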

In summary, interpreting a p-value in isolation may not always indicate a clinically important difference. We strongly encourage potential authors of IJSM manuscripts to identify, at the design stage of the investigation, the magnitude of difference that would be meaningful in the clinical setting. The location of the confidence interval in relation to this meaningful difference should then be considered when drawing a conclusion.

References

  • 1 Batterham AM, Atkinson G. How big does my sample need to be? A primer on the murky world of sample size estimation. Phys Ther Sport 2005; 6: 153-163
  • 2 Bland M. An Introduction to Medical Statistics. Oxford: Oxford University Press, 2000
  • 3 Carver RP. The case against statistical significance testing. Harv Educ Rev 1978; 48: 378-399
  • 4 Curran-Everett D, Benos DJ. Guidelines for reporting statistics in journals published by the American Physiological Society. Physiol Genomics 2004; 18: 249-251
  • 5 Gardner MJ, Altman DG. Estimating with confidence. In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence. Bristol: BMJ Books, 2000: 3-5
  • 6 Hopkins W, Batterham AM. Making meaningful inferences about magnitudes. Int J Sports Physiol Perform 2006; 1: 50-57
  • 7 Houle TT, Stump DA. Statistical significance versus clinical significance. Semin Cardiothorac Vasc Anesth 2008; 12: 5-6