
Table 4 Details of studies where the high-quality tools (n = 6) were validated for use in evaluating EBM teaching in medical education

From: A systematic review and taxonomy of tools for evaluating evidence-based medicine teaching in medical education

Each entry lists: source instrument name and date; instrument development (number of participants, level of expertise); EBM learning domains; instrument description; EBM steps; and psychometric properties with results of validity and reliability assessment.
Berlin questionnaire - Fritsche [7]
Instrument development: 266 participants: 43 experts in evidence-based medicine, 20 controls (medical students) and 203 participants in an evidence-based medicine course (USA)
EBM learning domains: Knowledge and skills
Instrument description: The Berlin questionnaire was developed to measure basic knowledge about interpreting evidence from healthcare research, the skill to relate a clinical problem to a clinical question and identify the best design to answer it, and the ability to use quantitative information from published research to solve specific patient problems. The questions are built around clinical scenarios, in two separate sets of 15 multiple-choice questions focusing mainly on epidemiological knowledge and skills (scores range from 0 to 15).
EBM steps: Appraise
Psychometric properties: Content validity, internal validity, responsive validity, discriminative validity. The two sets of questionnaires were psychometrically equivalent: intraclass correlation coefficient for students and experts 0.96 (95% confidence interval 0.92 to 0.98, p < 0.001). Cronbach's alpha was 0.75 for set 1 and 0.82 for set 2. Ability to discriminate between groups with different levels of knowledge was shown by comparing the three groups of varying expertise: the mean scores of controls (4.2 (2.2)), course participants (6.3 (2.9)) and experts (11.9 (1.6)) were significantly different (analysis of variance, p < 0.001).
Fresno test - Ramos et al. [6]
Instrument development: Family practice residents and faculty members (n = 43); volunteers self-identified as experts in EBM (n = 53); family practice teachers (n = 19) (USA)
EBM learning domains: Knowledge and skills
Instrument description: The Fresno test was developed and validated to assess medical professionals' knowledge and skills. It consists of two clinical scenarios with 12 open-ended questions, scored with standardised grading rubrics. Calculation skills are assessed with fill-in-the-blank questions.
EBM steps: Ask, acquire and appraise
Psychometric properties: Content validity (expert opinion), interrater reliability, internal validity, discriminative validity. Interrater correlations ranged from 0.76 to 0.98 for individual items. Cronbach's alpha was 0.88; ITC ranged from 0.47 to 0.75. Item difficulties ranged from moderate (73%) to difficult (24%), and item discrimination from 0.41 to 0.86. Construct validity: on the 212-point test, the novice mean was 95.6 and the expert mean 147.5 (p < 0.001).
MacRae [17]
Instrument development: Residents in the University of Toronto General Surgery Program (n = 44) (Canada)
EBM learning domains: Knowledge and skills
Instrument description: The examination consists of three articles, each followed by a series of short-answer questions and 7-point rating scales to assess study quality.
EBM steps: Appraise
Psychometric properties: Content validity, interrater reliability, internal validity, discriminative validity, construct validity. Cronbach's alpha was 0.77. Interrater reliability (Pearson product-moment correlation coefficient): 0.91 between clinical epidemiologist and non-epidemiologist, 0.78 between clinical epidemiologist and nurse. Construct validity was assessed by comparing scores of those who attended the journal club versus those who did not, and by postgraduate year of training (p = 0.02).
Taylor [14]; Bradley et al. [24]
Instrument development: Four groups of healthcare professionals (n = 152) with varying degrees of EBP expertise (UK): group 1, no or little prior EBP education; group 2, undertook a CASP workshop within the last 4 weeks; group 3, undertook a CASP workshop within the last 12 months; group 4, academics currently teaching EBP who attended the 1997 Oxford CEBM workshop. Later, Bradley et al. tested the instrument with 175 medical students in an RCT of self-directed versus workshop-based EBP curricula (Norway).
EBM learning domains: Knowledge and attitudes
Instrument description: Questionnaire of 11 multiple-choice questions (true / false / do not know). Correct responses score 1, incorrect responses score -1, and "do not know" scores 0.
EBM steps: Acquire and appraise
Psychometric properties: Content validity, internal validity, responsive validity, discriminative validity. Cronbach's alpha was 0.72 for knowledge questions and 0.64 for attitude questions. Spearman's correlations (internal consistency) of total knowledge and attitude scores ranged from 0.12 to 0.66. Discriminative validity was shown between novices and experts, and responsiveness (the instrument's ability to detect change) was demonstrated.
ACE tool - Dragan Ilic [15]
Instrument development: 342 medical students: 98 EBM-novice, 108 EBM-intermediate and 136 EBM-advanced participants (Australia)
EBM learning domains: Knowledge and skills
Instrument description: The Assessing Competency in EBM (ACE) tool was developed and validated to evaluate medical trainees' competency in EBM across knowledge, skills and attitudes. It has 15 items with dichotomous outcome measures: items 1 and 2 cover asking the answerable question; items 3 and 4, searching the literature; items 5-11, critical appraisal; items 12-15 relate to step 4, applying evidence to the patient scenario.
EBM steps: Ask, acquire, appraise and apply
Psychometric properties: Content validity, interrater reliability, internal validity, responsive validity, discriminative validity. Construct validity: statistically significant linear trend of sequentially improved mean scores corresponding to level of training (p < 0.0001). Item difficulty ranged from 36% to 84%, internal reliability from 0.14 to 0.20, and item discrimination from 0.37 to 0.84; Cronbach's alpha coefficient for internal consistency was 0.69.
Kortekaas - Utrecht questionnaire [16] (original questionnaire in Dutch; an English version is now available)
Instrument development: Postgraduate GP trainees (n = 219), hospital trainees (n = 20), GP supervisors (n = 20), academic GPs or clinical epidemiologists (n = 8) (Netherlands)
EBM learning domains: Knowledge
Instrument description: Utrecht questionnaire on knowledge on clinical epidemiology (U-CEP): two sets of 25 questions and a combined set of 50.
EBM steps: Ask, appraise and apply
Psychometric properties: Content validity, internal validity, responsive validity, discriminative validity. Content validity: expert opinion and survey. Construct validity: significant difference in mean scores between experts, trainees and supervisors. Internal consistency: Cronbach's alpha 0.79 for set A, 0.80 for set B and 0.89 for the combined set. Responsive validity: significantly higher mean scores after EBM training than before. Internal reliability: ITC (Pearson product-moment) median 0.22 for set A, 0.26 for set B and 0.24 for the combined set. Item discrimination ability: median 0.35 for set A, 0.43 for set B and 0.37 for the combined set.
Abbreviations: ITC, item-total correlation; RCT, randomised controlled trial; CASP, critical appraisal skills program; U-CEP, Utrecht questionnaire on knowledge on clinical epidemiology for evidence-based practice.
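Cronbach's alpha, reported as the internal-consistency statistic for every instrument in the table, is computed from a respondents-by-items score matrix. As a minimal sketch of that calculation (the function name and the response data below are illustrative assumptions, not taken from any of the reviewed studies):

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one per respondent; each row is a list of
    item scores. Sample variances are used throughout.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical data: four respondents answering three dichotomously
# scored items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # → 0.75
```

Values near 1 indicate that items vary together (high internal consistency); the 0.64-0.89 alphas in the table sit in the range conventionally read as acceptable to good.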