Int J Sports Med 2013; 34(03): 281
DOI: 10.1055/s-0032-1331775
Letter to the Editor
© Georg Thieme Verlag KG Stuttgart · New York

Violation of Statistical Assumptions in a Recent Publication?

Authors

  • M. Wilkinson

  • R. Akenhead

Publication History

Publication Date:
18 February 2013 (online)

Dear Editor,

We wish to express our concerns about the application of statistical procedures in the recent study of Varley and Aughey [4]. The study examined positional differences in acceleration and high-velocity running between 5 playing positions using data collected via GPS from 2 teams of players over the course of one playing season. The methods section of the paper describes how players were recorded between 1 and 11 times over the course of the season and how, for the 5 playing positions, data comprised the following: central defenders, 31 recordings from 5 players; wide defenders, 17 recordings from 3 players; central midfielders, 33 recordings from 7 players; wide players, 25 recordings from 6 players; and forwards, 20 recordings from 8 players. Taking multiple recordings from a participant results in dependent data. Dependence arises when one observation is likely to be affected by, or dependent on, a prior observation, and it is common where repeated measurements are made on single participants. The most common examples of this in sport and exercise science are where participants are measured under each of a number of different conditions or time points and interest lies in mean differences between the time points or conditions. The so-called 'repeated measures' general linear models used to analyse these types of data take account of the within-subject correlations between measurements made on an individual and include this source of variance in the statistical model. A crucial assumption of these types of models is that in each condition or at each time point, only a single measurement is taken from each participant. In other words, while observations across conditions within a participant are dependent, observations within a condition are independent (i.e., from separate participants) [1]. The assumption of independence is one of the most important, if not the most important, of the assumptions underlying probability-based statistics.

In the study described above, it is clear that the measurements that comprise the groups of positional data include multiple measurements of the same participants. Data are therefore dependent within a particular condition/group, violating the independence assumption. This type of dependence is not in itself a problem as long as this additional source of (co)variance is accounted for in the statistical model. This could be achieved by ensuring that each participant provided the same number of repeated measurements within each group, i.e., that the number of repeated measurements was balanced across all groups. The repeats could then be modelled in addition to the usual sources of between- and within-group variance; however, the element of balance would have to be present [2]. If the number of repeat observations differed between participants, either within a group or between different groups, the model would recognise any participant with fewer than the highest number of repeated observations as having an incomplete data set. By default, the analysis would exclude these participants [3]. Even if a balanced design were achieved, neither 1-way ANOVA nor its non-parametric equivalents include by default the variance generated by dependent observations of the kind apparent in Varley and Aughey's study. This variance would need to be added to the model separately. The methods section of the study in question makes no mention of additional modelling of the dependence described, nor does it appear that the number of repeated observations was balanced between the positional groups. This means that the assumption of independence of observations would be violated and the outcome of any subsequent analysis invalidated.
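The imbalance described above can be checked with simple arithmetic on the recording and player counts quoted from the methods section. The following is a minimal sketch in Python, not part of the original study or its analysis. The function names are hypothetical, and the design-effect formula, 1 + (m - 1) x ICC, is a common approximation that strictly assumes equal numbers of recordings per player; the intraclass correlation (ICC) of 0.5 is an arbitrary illustrative value, not an estimate from the data.

```python
# Recordings and players per position, as quoted from the methods section.
counts = {
    "central defenders": (31, 5),
    "wide defenders": (17, 3),
    "central midfielders": (33, 7),
    "wide players": (25, 6),
    "forwards": (20, 8),
}


def mean_recordings_per_player(recordings, players):
    """Average number of repeated recordings contributed by each player."""
    return recordings / players


def design_effect(mean_cluster_size, icc):
    """Approximate variance inflation caused by clustering within players.

    Assumes equal cluster sizes; with unequal sizes it is only a rough guide.
    """
    return 1 + (mean_cluster_size - 1) * icc


# ASSUMPTION: hypothetical within-player correlation, chosen for illustration.
ASSUMED_ICC = 0.5

total_recordings = sum(r for r, _ in counts.values())
total_players = sum(p for _, p in counts.values())
print(f"{total_recordings} recordings from {total_players} players in total")

for position, (recordings, players) in counts.items():
    m = mean_recordings_per_player(recordings, players)
    n_eff = recordings / design_effect(m, ASSUMED_ICC)
    print(
        f"{position}: {m:.2f} recordings per player, "
        f"approximate effective n = {n_eff:.1f} of {recordings}"
    )
```

The group means alone differ between positions, and they conceal the further imbalance between individual players, who were recorded between 1 and 11 times. Under any positive within-player correlation, the effective number of independent observations is smaller than the number of recordings entered into the analysis.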

We have no wish to personally denigrate the authors or in any way to belittle their efforts to address the very worthwhile aim of their research. However, to safeguard confidence in knowledge of this area of research, we feel obliged to inform readers of the study that, because of an error in the statistical modelling of the data collected, the results and the conclusions drawn from them are questionable.

We suggest that the authors, and others attempting to address similar research problems with a similar multiple-measurement approach, avail themselves of linear mixed modelling, in which the kind of variance that is problematic in this instance can be included in the analysis directly, thus satisfying the underlying statistical assumptions and resulting in valid inferences to the population of interest.
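As an illustration of the suggestion above, one simple form such a model could take is a random-intercept model with playing position as a fixed effect and player as a random effect. The notation below is our own assumption for the purpose of the sketch and is not taken from the study in question: y_ij is recording j of player i, p(i) is the position of player i, u_i is that player's deviation from the position mean, and ε_ij is the residual.

```latex
y_{ij} = \beta_0 + \beta_{p(i)} + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this specification, two recordings from the same player share the term u_i and are therefore correlated, with intraclass correlation σ_u² / (σ_u² + σ_ε²), whereas recordings from different players are independent. The model does not require each player to contribute the same number of recordings, so players with fewer recordings need not be excluded.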