Effect sizes can be misleading: is it time to change the way we measure change?

Hobart, JC; Cano, SJ; Thompson, AJ (2010) Effect sizes can be misleading: is it time to change the way we measure change? J Neurol Neurosurg Psychiatry, 81(9), 1044-1048. doi:10.1136/jnnp.2009.201392.

Objectives: Previous comparisons of the ability of the Barthel Index (BI) and the Functional Independence Measure motor scale (FIMm) to detect change have implied that the two scales are equally responsive when examined with traditional effect size statistics. This is clinically counterintuitive, because the FIMm has greater potential to detect change than the BI, and it raises concerns about the validity of effect size statistics as indicators of rating scale responsiveness. To examine these concerns, this study applied a sophisticated psychometric method, Rasch measurement, to BI and FIMm data.

Methods: BI and FIMm data from 976 people at a single neurorehabilitation unit were examined. Rasch analysis was used to compare the responsiveness of the BI and FIMm at the group comparison level (effect sizes, relative efficiency, relative precision) and for each individual person in the sample, by computing the significance of their change.

Results: Group level analyses of both interval measurements and ordinal scores implied that the BI and FIMm had equivalent responsiveness (effect size ranges -0.82 to -1.12 for the BI and -0.77 to -1.05 for the FIMm). However, individual person level analyses indicated that the FIMm detected significant improvement in almost twice as many people as the BI (50%, n=496 vs 31%, n=298) and recorded fewer people as unchanged at discharge (FIMm 4%, n=38; BI 12%, n=115). This difference was statistically significant (χ²=273.81; p<0.001).

Conclusions: These findings demonstrate that effect size calculations are limited and potentially misleading indicators of rating scale responsiveness at the group comparison level. Rasch analysis at the individual person level showed the FIMm to be more responsive, consistent with clinical expectation, and demonstrated the added value of Rasch analysis as a method for examining and comparing rating scale responsiveness.
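The abstract contrasts two levels of analysis: a single group-level effect size versus a significance test of change for each person. The Python sketch below illustrates both, under stated assumptions. The group-level functions are two common traditional effect sizes (mean change divided by the baseline SD, and the standardized response mean); the abstract does not say exactly which formulas the study used. The person-level test assumes each Rasch person measure (in logits) comes with a standard error and treats a change as significant when it exceeds 1.96 times the joint standard error. This is one common convention, not necessarily the study's exact procedure. All function names and the toy data are hypothetical and are not taken from the study.

```python
import math
import statistics


def effect_size(admission, discharge):
    """Traditional effect size: mean change / SD of admission (baseline) scores."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(changes) / statistics.stdev(admission)


def standardized_response_mean(admission, discharge):
    """Standardized response mean: mean change / SD of the change scores."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return statistics.mean(changes) / statistics.stdev(changes)


def classify_change(m1, se1, m2, se2, z_crit=1.96):
    """Classify one person's change between two Rasch measures (logits).

    Assumption: change is tested against its joint standard error,
    sqrt(se1^2 + se2^2); |z| > z_crit is treated as significant at
    roughly the 5% level (two-sided).
    """
    z = (m2 - m1) / math.sqrt(se1 ** 2 + se2 ** 2)
    if z > z_crit:
        return "improved"
    if z < -z_crit:
        return "deteriorated"
    return "not significant"


# Hypothetical toy data (NOT from the study):
# (measure at admission, SE, measure at discharge, SE), in logits.
people = [
    (-1.0, 0.40, 0.5, 0.35),
    (-0.2, 0.60, 0.3, 0.60),
    (0.8, 0.30, 0.7, 0.30),
]
admission = [p[0] for p in people]
discharge = [p[2] for p in people]

if __name__ == "__main__":
    print(f"ES  = {effect_size(admission, discharge):.2f}")
    print(f"SRM = {standardized_response_mean(admission, discharge):.2f}")
    for m1, se1, m2, se2 in people:
        print(classify_change(m1, se1, m2, se2))
```

On this toy data the group effect size is moderate (about 0.70), yet only one of the three people shows a significant change. A single group-level number does not reveal how many individuals actually changed. The sign of an effect size depends on whether change is computed as discharge minus admission (as here) or the reverse.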

