Table 2 Quality of the statistical outcomes to determine psychometric properties [57, 62]

From: A systematic review protocol investigating tests for physical or physiological qualities and game-specific skills commonly used in rugby and related sports and their psychometric properties

Measurement property Definition (Rating) quality criteriaᵃ,ᵇ
Reliability
 Internal consistency The extent to which items in a (sub)scale are intercorrelated, thus measuring the same construct (+) Factor analyses performed on adequate sample size (7 * # items and >100) AND Cronbach’s alpha(s) calculated per dimension AND Cronbach’s alpha(s) between 0.70 and 0.95
(?) No factor analysis OR doubtful design or method
(−) Cronbach’s alpha(s) <0.70 or >0.95, despite adequate design and method
(0) No information found on internal consistency
Reproducibility
 Agreement The extent to which the scores on repeated measures are close to each other (absolute measurement error) (+) SDC < MIC OR MIC outside the LOA OR convincing arguments that agreement is acceptable
(?) Doubtful design or method OR (MIC not defined AND no convincing arguments that agreement is acceptable)
(−) MIC ≤ SDC OR MIC equals or inside LOA, despite adequate design and method
(0) No information found on agreement
 Reliability The extent to which patients can be distinguished from each other, despite measurement errors (relative measurement error) (+) ICC > 0.70 OR weighted Kappa > 0.70
(?) Doubtful design or method (e.g. time interval not mentioned)
(−) ICC or weighted Kappa ≤0.70, despite adequate design and method
(0) No information on reliability found
Validity
 Content validity The extent to which the domain of interest is comprehensively sampled by the items in the questionnaire (+) A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection AND target population and (investigators OR experts) were involved in the item selection
(?) A clear description of the above-mentioned aspects is lacking OR only target population involved OR doubtful design or method
(−) No target population involvement
(0) No information found on target population involvement
 Construct validity The extent to which scores on a particular questionnaire relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured (+) Specific hypotheses were formulated AND at least 75 % of the results are in accordance with these hypotheses
(?) Doubtful design or method (e.g. no hypotheses)
(−) Less than 75 % of hypotheses were confirmed, despite adequate design and methods
(0) No information found on construct validity
 Criterion validity (predictive or concurrent) The extent to which scores on a particular questionnaire relate to a gold standard (+) Correlation with standard ≥0.70 OR no statistically significant differences between the two tests found OR sensitivity and specificity ≥0.70 OR convincing arguments that gold standard is ‘gold’ AND correlation with gold standard >0.70ᶜ
(?) No convincing arguments that gold standard is ‘gold’ OR doubtful design or method
(−) Correlation with standard <0.70 or AUC < 0.70 OR statistically significant differences between outcome measures and gold standard OR sensitivity or specificity <0.70
Responsiveness The ability of a questionnaire to detect clinically important changes over time (+) SDC or SDC < MIC OR MIC outside the LOA OR RR > 1.96 OR AUC > 0.70
(?) Doubtful design or method
(−) SDC or SDC > MIC OR MIC equals or inside LOA OR RR < 1.96 OR AUC < 0.70, despite adequate design and methods
(0) No information found on responsiveness
Floor and ceiling effects The number of respondents who achieved the lowest or highest possible score (+) ≤15 % of the respondents achieved the highest or lowest possible score
(?) Doubtful design or method
(−) >15 % of the respondents achieved the highest or lowest possible score, despite adequate design and methods
(0) No information found on floor and ceiling effects
Interpretability The degree to which one can assign qualitative meaning to quantitative scores (+) Mean and SD scores presented of at least 4 relevant subgroups of patients and MIC defined
(?) Doubtful design or method OR less than 4 subgroups OR no MIC defined
(0) No information found on interpretation
MIC minimal important change, SDC smallest detectable change, LOA limits of agreement, ICC intraclass correlation, SD standard deviation
ᵃ (+) positive rating; (?) indeterminate rating; (−) negative rating; (0) no information available
ᵇ Doubtful design or method = lack of a clear description of the design or methods of the study, sample size smaller than 50 subjects (should be at least 50 in every (subgroup) analysis), or any important methodological weakness in the design or execution of the study
ᶜ Adopted from van Bloemendaal et al. [26]
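
The rating rules in Table 2 can be read as a short decision procedure: check whether any information exists, then whether the design is doubtful, then compare the statistic with its threshold. The Python sketch below illustrates this for three rows of the table (reliability, floor and ceiling effects, and internal consistency). It is only an illustration under stated assumptions, not part of the protocol: the function names, argument names and the `MIN_SAMPLE` constant are hypothetical, "doubtful design or method" is reduced to the checks that can be expressed numerically, the sample-size condition for factor analysis is read as "at least 7 times the number of items and more than 100", and the negative rating is written with an ASCII hyphen.

```python
from typing import Optional

# Footnote b: fewer than 50 subjects counts as a doubtful design.
MIN_SAMPLE = 50


def rate_reliability(icc: Optional[float], n_subjects: int,
                     interval_reported: bool = True) -> str:
    """Rate reliability from an ICC or weighted Kappa value."""
    if icc is None:
        return "0"  # no information on reliability found
    if n_subjects < MIN_SAMPLE or not interval_reported:
        return "?"  # doubtful design or method
    return "+" if icc > 0.70 else "-"


def rate_floor_ceiling(pct_at_extreme: Optional[float], n_subjects: int) -> str:
    """Rate floor/ceiling effects from the % of respondents at an extreme score."""
    if pct_at_extreme is None:
        return "0"
    if n_subjects < MIN_SAMPLE:
        return "?"
    return "+" if pct_at_extreme <= 15.0 else "-"


def rate_internal_consistency(alpha: Optional[float], n_subjects: int,
                              n_items: int, factor_analysis_done: bool) -> str:
    """Rate internal consistency from Cronbach's alpha for one dimension."""
    if alpha is None:
        return "0"
    # Assumed reading of "7 * # items and >100".
    adequate_sample = n_subjects >= 7 * n_items and n_subjects > 100
    if not factor_analysis_done or not adequate_sample:
        return "?"
    return "+" if 0.70 <= alpha <= 0.95 else "-"


print(rate_reliability(0.85, 60))                      # +
print(rate_reliability(0.85, 30))                      # ?
print(rate_floor_ceiling(20.0, 120))                   # -
print(rate_internal_consistency(0.97, 150, 10, True))  # -
```

Checking for missing information before checking the design mirrors the order of the (0) and (?) ratings in the table: a study that reports nothing on a property is rated (0) regardless of its design.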