Int J Sports Med 2010; 31(12): 841-842
DOI: 10.1055/s-0030-1268491
Editorial

© Georg Thieme Verlag KG Stuttgart · New York

Exploring Data Distribution Prior to Analysis: Benefits and Pitfalls

G. Atkinson¹, C. Pugh¹, M. A. Scott¹
  • ¹Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, Liverpool, United Kingdom
Publication History

Publication Date: 16 December 2010 (online)

The choice of appropriate statistics is important for all research and should be based on the study design, the research question and whether assumptions about the data are upheld or not. Authors commonly select so-called ‘parametric’ tests, which are associated with several assumptions about the data, e.g. that data are normally distributed in the population of interest. Thankfully, many authors who submit manuscripts to IJSM check that their data meet the necessary assumptions for use of parametric statistical analyses. Nevertheless, some authors do not complete this important step, and are, therefore, asked by editors to justify their choice of statistics. Those authors who do check underlying statistical assumptions sometimes do so using an inappropriate aspect of the data and/or believe that the only solution to violation of parametric assumptions is the adoption of non-parametric ‘ranked’ tests. The aim of this statistical note is to illustrate the importance of checking assumptions with the correct aspect of the data and arriving at the correct choice of analysis, which may involve a mathematical transformation of the data prior to analysis.

In [Table 1], some hypothetical data are presented to represent the salivary melatonin concentrations of 9 healthy people measured before and after a bright light intervention in the evening. It is hypothesised that melatonin concentration is reduced following bright light exposure [2]. Most readers will recognise that the most appropriate parametric test for this single-group pre/post design is a paired t-test. However, it is important to explore whether the underlying assumptions for this test are upheld or not. In my experience, it is often believed that the baseline and follow-up datasets should each be normally distributed for the paired t-test to be appropriate. This is not so. The necessary assumption for the paired t-test is that the change scores or ‘residuals’ are sampled from a normally distributed population [6], and this is the case for many statistical tests of the repeated measures type. These change scores (baseline minus follow-up data) are presented in the final column of [Table 1].

Table 1 Salivary melatonin concentration (pg/ml) measured at baseline and following exposure to bright light in 9 healthy participants (hypothetical data).

Subject      Baseline      Follow-up     Change (baseline – follow-up)
1            13.60         7.30          6.30
2            15.40         11.70         3.70
3            21.20         27.10         −5.90
4            6.40          3.10          3.30
5            37.20         30.80         6.40
6            34.60         33.10         1.50
7            15.90         7.70          8.20
8            4.20          1.90          2.30
9            6.10          2.90          3.20
mean (SD)    17.2 (12.0)   14.0 (12.7)   3.2 (4.1)

The exploration of the underlying distribution of data is itself a thorny issue, especially when sample sizes are small [3] [4]. One approach is to examine, using the Shapiro-Wilk test for example, the null hypothesis that the change scores have been sampled from a normally distributed population. For the change scores presented in [Table 1], the Shapiro-Wilk test is not statistically significant (P=0.14). Therefore, it seems reasonable to assume a normally distributed population of change scores and proceed to examine the difference between the mean baseline and follow-up salivary melatonin concentration with a parametric paired t-test. This test is statistically significant (P=0.044) for the data in [Table 1], indicating that there is a change in mean salivary melatonin concentration, the 95% confidence interval for this change being 0.10–6.34 pg/ml.
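The paired t-test result quoted above can be checked with a short sketch that works directly on the change scores in [Table 1]. It uses only the Python standard library; the helper names (t_pdf, two_sided_p, t_critical, paired_t) are our own illustrative choices, and the t distribution is integrated numerically with Simpson's rule rather than taken from a statistics package, so the output is approximate.

```python
import math
import statistics

# Change scores (baseline minus follow-up, pg/ml) copied from Table 1.
changes = [6.30, 3.70, -5.90, 3.30, 6.40, 1.50, 8.20, 2.30, 3.20]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * area * h / 3


def t_critical(df, alpha=0.05):
    """Value of t whose two-sided P-value equals alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t(diffs, alpha=0.05):
    """Paired t-test on change scores: returns t, P and the confidence limits."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean change
    t = mean / se
    margin = t_critical(n - 1, alpha) * se
    return t, two_sided_p(t, n - 1), mean - margin, mean + margin


t_stat, p_value, ci_low, ci_high = paired_t(changes)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}, 95% CI {ci_low:.2f} to {ci_high:.2f} pg/ml")
```

Note that only the change scores enter the calculation, which is why it is their distribution, and not that of the baseline or follow-up columns, that matters for this test.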

The distribution of real data is often not Gaussian, and it is interesting to examine what happens when an ‘outlier’ is present in the sample. For example, let us assume that data have been collected from an additional participant ([Table 2]). This tenth participant shows a very large change in melatonin concentration between baseline and follow-up measurements. Therefore, it would be tempting to think that this would make the mean change of the sample greater than before and the P-value for statistical significance smaller (and the confidence interval narrower). However, the fundamental signal-to-noise nature of statistical hypothesis tests means that this is not the case at all. When the data in [Table 2] are analysed with a paired t-test as before, the result is no longer statistically significant (P=0.09) and the confidence interval of −1.20 to 13.84 pg/ml is now much wider than before and overlaps zero, even though the mean change has almost doubled to 6.3 pg/ml. Why is this?

Table 2 Salivary melatonin concentration (pg/ml) measured at baseline and following exposure to bright light in 10 healthy participants (hypothetical data). Sample now includes one ‘outlier’ participant 10.

Subject      Baseline      Follow-up     Change (baseline – follow-up)
1            13.60         7.30          6.30
2            15.40         11.70         3.70
3            21.20         27.10         −5.90
4            6.40          3.10          3.30
5            37.20         30.80         6.40
6            34.60         33.10         1.50
7            15.90         7.70          8.20
8            4.20          1.90          2.30
9            6.10          2.90          3.20
10           59.90         25.70         34.20
mean (SD)    21.5 (17.6)   15.1 (12.6)   6.3 (10.5)
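The signal-to-noise nature of the paired t-test can be illustrated with a minimal sketch that compares the two samples. Here the ‘signal’ is the mean change and the ‘noise’ is its standard error (SD divided by the square root of n); the function name signal_to_noise is a hypothetical label of ours, and the change scores are copied from the final columns of [Table 1] and [Table 2].

```python
import math
import statistics

# Change scores (baseline minus follow-up, pg/ml) for the original 9 participants.
changes_9 = [6.30, 3.70, -5.90, 3.30, 6.40, 1.50, 8.20, 2.30, 3.20]
# The same sample with the 'outlier' participant 10 added.
changes_10 = changes_9 + [34.20]


def signal_to_noise(diffs):
    """Return (mean change, standard error, t ratio) for a list of change scores."""
    signal = statistics.mean(diffs)
    noise = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return signal, noise, signal / noise


signal_9, noise_9, t_9 = signal_to_noise(changes_9)
signal_10, noise_10, t_10 = signal_to_noise(changes_10)

for label, (signal, noise, t) in (
    ("n = 9 ", (signal_9, noise_9, t_9)),
    ("n = 10", (signal_10, noise_10, t_10)),
):
    print(f"{label}: mean change {signal:.2f}, standard error {noise:.2f}, t = {t:.2f}")
```

With these data the mean change roughly doubles when participant 10 is added, but the standard error rises by a factor of about 2.5, so the t ratio falls from about 2.38 to about 1.90.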

It can be seen in [Table 2] that, although the mean change is larger than before, the standard deviation of the change scores is also much larger than before (10.5 vs. 4.1 pg/ml), i.e. the inclusion of the outlier subject has essentially increased the error variance in the statistical analysis so that statistical significance is not reached and the confidence interval is less precise. However, it would be erroneous to conclude from the results of the paired t-test that the mean change is not statistically significant in this sample of 10 participants. Essentially, the wrong test has been selected for the distribution of data that are being analysed. To explain, the Shapiro-Wilk test of normal distribution is now statistically significant for the change scores in [Table 2] (P=0.001). Inclusion of the outlier subject has skewed the dataset, so much so that it is no longer likely to have come from a normally distributed population. One solution to this problem is to employ non-parametric analyses, a ‘mirror’ non-parametric test to a paired t-test being the Wilcoxon test. This test indicates that there has in fact been a significant change in the data between baseline and follow-up measurements (P=0.028).
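As a rough check on the Wilcoxon result, the signed-rank test can be sketched with the standard library alone. The sketch below assumes no zero differences and no tied magnitudes (true for the change scores in [Table 2]), and reports both an exact P-value, obtained by enumerating every possible pattern of signs, and the large-sample normal approximation without continuity correction. We do not know which variant produced the P-value quoted above, so the function names and the choice of variants are our assumptions.

```python
import math
from itertools import product

# Change scores (baseline minus follow-up, pg/ml) copied from Table 2.
changes = [6.30, 3.70, -5.90, 3.30, 6.40, 1.50, 8.20, 2.30, 3.20, 34.20]


def signed_rank_sums(diffs):
    """Return (W+, W-), the rank sums of positive and negative differences.

    Assumes no zero differences and no tied magnitudes.
    """
    ordered = sorted(diffs, key=abs)  # rank 1 = smallest absolute change
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus


def exact_p(diffs):
    """Exact two-sided P-value: under H0 each rank is + or - with equal chance."""
    n = len(diffs)
    w = min(signed_rank_sums(diffs))
    as_extreme = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(rank for rank, s in zip(range(1, n + 1), signs) if s) <= w
    )
    return min(1.0, 2 * as_extreme / 2 ** n)


def normal_approx_p(diffs):
    """Two-sided P-value from the normal approximation (no continuity correction)."""
    n = len(diffs)
    w = min(signed_rank_sums(diffs))
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


w_plus, w_minus = signed_rank_sums(changes)
p_exact = exact_p(changes)
p_normal = normal_approx_p(changes)
print(f"W+ = {w_plus}, W- = {w_minus}")
print(f"exact P = {p_exact:.3f}, normal approximation P = {p_normal:.3f}")
```

Because the test uses only the ranks of the changes, participant 10 contributes the largest rank (10) regardless of how extreme the 34.2 pg/ml change is, which is why the outlier does not inflate the ‘noise’ here as it does in the t-test.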

There are several disadvantages associated with non-parametric tests. First, the fundamental lack of a statistical parameter makes it difficult to describe the magnitude of effect [1], and it is for this reason that we are unable to present a confidence interval with the Wilcoxon test. As discussed in a previous statistical note in IJSM, effect size and confidence interval calculations are important for appraising the practical significance of study findings [5]. Second, non-parametric tests are associated with very low statistical power when small samples are studied [1] [3].

An often overlooked approach to analysing non-Gaussian data is to mathematically ‘transform’ the data to a normal distribution [6]. For example, the data in [Table 2] can be logarithmically transformed, which often ‘corrects’ a skewed dataset. When these data are logged, the Shapiro-Wilk test is no longer significant (P=0.08). When the paired t-test is applied to these transformed data, the mean change is found to be statistically significant (P=0.003). This P-value is lower than that obtained from the non-parametric Wilcoxon test, which is consistent with reports that transforming data prior to analysis is generally a more powerful approach than non-parametric testing, especially with small samples [1] [3]. Using the back-transformation methods described by Bland and Altman [7], the geometric mean change and associated confidence interval can also be calculated as 1.6 pg/ml (95% CI: 1.22–2.10). Note how this geometric mean and the associated confidence interval are smaller than those calculated from the raw data. This is because the outlier in the raw data exerted a large influence on the arithmetic mean and sampling error [7].
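The transformation approach can be sketched as follows, assuming natural logarithms are taken of the baseline and follow-up values in [Table 2] before differencing, and that the mean log difference and its confidence limits are then exponentiated. The constant T_CRIT_DF9 is the tabulated two-sided 5% critical value of t for 9 degrees of freedom, hard-coded here for brevity; all variable names are our own. Strictly, back-transforming a difference between logs yields a ratio of geometric means (baseline relative to follow-up), so the sketch reports the result without units.

```python
import math
import statistics

# Baseline and follow-up melatonin concentrations (pg/ml) copied from Table 2.
baseline = [13.60, 15.40, 21.20, 6.40, 37.20, 34.60, 15.90, 4.20, 6.10, 59.90]
follow_up = [7.30, 11.70, 27.10, 3.10, 30.80, 33.10, 7.70, 1.90, 2.90, 25.70]

# Tabulated two-sided 5% critical value of t for n - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262

# Paired differences on the natural-log scale.
log_diffs = [math.log(b) - math.log(f) for b, f in zip(baseline, follow_up)]

mean_log = statistics.mean(log_diffs)
se_log = statistics.stdev(log_diffs) / math.sqrt(len(log_diffs))
t_stat = mean_log / se_log  # paired t statistic on the log scale

# Back-transform the mean and its confidence limits by exponentiation.
ratio = math.exp(mean_log)
ci_low = math.exp(mean_log - T_CRIT_DF9 * se_log)
ci_high = math.exp(mean_log + T_CRIT_DF9 * se_log)

print(f"t = {t_stat:.2f} on {len(log_diffs) - 1} degrees of freedom")
print(f"back-transformed mean = {ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

On the log scale participant 10 is no longer extreme (59.9 to 25.7 is a ratio of about 2.3, similar to participants 4, 7, 8 and 9), so the standard error is not inflated and the t ratio comfortably exceeds the critical value.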

In this statistical note, we have demonstrated the effect of an outlier on study conclusions. It is, at first, perplexing to see how the addition of an outlier participant, who shows a very large change in the hypothesised direction, leads paradoxically to a type II error (a false-negative statistical conclusion) and less precision (wider confidence intervals). But when one appreciates the fundamental ‘signal-to-noise ratio’ nature of statistical testing, it is clear how this can occur. We recommend that all authors check the underlying assumptions for their selected statistical analyses, and consider transforming non-Gaussian data, in order to avoid such errors of conclusion.

References

  • 1 Altman DG, Bland JM. Parametric vs. non-parametric methods for data analysis. Br Med J 2009; 339: 170
  • 2 Atkinson G, Barr D, Chester N, Drust B, Gregson W, Reilly T, Waterhouse J. Bright light and thermoregulatory responses to exercise. Int J Sports Med 2008; 29: 188-193
  • 3 Bland JM, Altman DG. Analysis of continuous data from small samples. Br Med J 2009; 339: 961
  • 4 Field A. Discovering Statistics Using SPSS. London: Sage; 2009
  • 5 Stapleton C, Scott M, Atkinson G. The ‘so what’ factor: statistical versus clinical significance. Int J Sports Med 2009; 30: 773-774
  • 6 Zar JH. Biostatistical Analysis. London: Prentice Hall; 1999
  • 7 Bland JM, Altman DG. Transformations, means and confidence intervals. Br Med J 1996; 312: 1079

Correspondence

Prof. Greg Atkinson
Research Institute for Sport and Exercise Sciences
Liverpool John Moores University
Rm 102, Tom Reilly Building
Byrom Street
Liverpool L3 3AF
United Kingdom
Email: G.Atkinson@ljmu.ac.uk