#KeyLIMEpodcast 149: It’s Hard to Predict Things…Especially the Future [KeyLIME Live! @ICRE, Part 2]

Coming to you from ICRE 2017, this is the second recording from KeyLIME Live, where we were joined by co-hosts Glenn Regehr and Karen Hauer. Part 2 finds the co-hosts discussing an observational study whose authors assert that the #meded literature lacks studies that provide validity evidence for workplace-based assessments (WBAs).

As any frequent listener knows, the hosts have often discussed the importance of the contemporary shift away from high-stakes, end-of-training exams to programmatic assessment, an approach involving a system that emphasizes WBAs sampled in authentic environments, often with direct observation, and collated in meaningful ways for decision-making. Will this paper change their point of view?

Check out the podcast here (or on iTunes!) to find out!


KeyLIME Session 149 – Article under review:

Listen to the podcast

View/download the abstract here.


Naidoo S, Lopes S, Patterson F, Mead HM, MacLeod S. Can colleagues’, patients’ and supervisors’ assessments predict successful completion of postgraduate medical training? Med Educ. 2017. 51(4):423-431.

Reviewer:  Jason Frank (@drjfrank)


Over many episodes of the #KeyLIMEpodcast, we’ve discussed the importance of the contemporary shift away from high-stakes, end-of-training exams, to a more sophisticated approach called programmatic assessment (Schuwirth & van der Vleuten, 2011). This is the approach adopted by leaders in CBME, and involves a system that emphasizes workplace-based assessments (WBAs) sampled in authentic environments, often with direct observation, and collated in meaningful ways for decision-making.

However, the authors of today’s paper assert that the meded literature lacks studies that provide validity evidence for WBAs.


In the setting of a regional UK GP training network, the authors set out to characterize the association between 3 types of WBAs and 3 “outcomes of training”, namely performance on 2 national licensing exams and the need for additional training time.

Type of Paper

Research: Observational study

Key Points on Methods

This is a paper whose methods get more complex the further you dive into them. The authors use a different validity paradigm than those we have discussed previously on the Podcast (namely Kane as applied by Cook to meded). The authors are from meded, health psychology, and organizational psych.

Briefly, the authors looked for correlations between their 3 defined outcomes of residency training (they called these “adverse events of training”) and 3 types of WBAs used in the GP program. The 3 outcomes were:

  1. Low score on a 200-item MCQ test (data from 1st attempt)
  2. Low score on a 13-station OSCE (data from 1st attempt)
  3. Requirement to do additional training.

The WBAs were described as:

  1. A patient satisfaction questionnaire (PSQ), 80 patient-scored instruments of 11 items on 7-point Likert scales
  2. A multi-source feedback (MSF) tool, collecting ratings of clinical acumen and professionalism from 5 clinicians and 5 non-clinicians
  3. An educational supervisor rating (ESR) completed on a regular basis, looking back over a learner’s portfolio and drawing on the supervisor’s own impressions. The rating is captured using 12 “competency areas” on a dichotomous scale of below expectations or not.

Data on trainees in the East Midlands GP program was collected and then analyzed retrospectively. I didn’t see any mention of ethics review or protection of subjects.

The authors calculated statistical correlations and multiple regressions in order to characterize “predictive validity”.
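For readers less familiar with this kind of analysis, here is a minimal sketch of what a single correlation between a dichotomous WBA rating and an exam score could look like. It is not the authors' code or their data: the function name `pearson_r`, the variable names, and every number below are made up purely for illustration, and the paper's actual analysis (including the multiple regressions) is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data, invented for this sketch only (not from the paper).
# 1 = rated "below expectations" at least once, 0 = never.
below_expectations = [1, 1, 0, 0, 0, 1, 0, 0]
# Made-up first-attempt exam scores for the same eight trainees.
exam_scores = [52, 58, 71, 66, 75, 61, 80, 69]

r = pearson_r(below_expectations, exam_scores)
print(f"correlation between flag and exam score: {r:.2f}")
```

With a 0/1 variable on one side, this is the point-biserial special case of Pearson's correlation. In this invented example the flagged trainees have lower scores, so the coefficient comes out negative; whether a sign is reported as "positive" or "negative" depends on how the outcome is coded (for example, as a low score versus the raw score).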

Comment: there are numerous threats to validity here, not to mention the potential for workup bias. The outcomes chosen are major events in the life of a trainee, but not necessarily in the life of a patient.

Key Outcomes

Data was available for 402 trainees, but each correlation involved about 5% missing data. Trainees were 52% female and 36% IMGs.

The authors report that each of the WBAs correlated positively with the “adverse outcomes of training”. They provided no fewer than 18 comparisons. Further, they assert that being scored “below expectations” by anyone, at any time, means that a trainee is 10.4x more likely to need additional training.

Key Conclusions

The authors conclude that this study provides “predictive validity evidence” for this WBA scheme and important training outcomes.

Spare Keys – other take home points for Clinician Educators

  1. Kudos to the authors for trying to bring real data sets to #meded validity questions.
  2. This is a fascinating study that appears to treat WBAs as individual, independent predictors of residency outcomes. This is a very different paradigm from emerging CBME assessment schemes that use real-time entrustment, collation, a systems view, group decision-making, and a developmental mindset.
  3. This study is a great one for a group discussion of many contemporary debates about assessment schemes, validity frameworks, threats to validity, and what the important outcomes of #meded studies should be.

Shout out

Thanks to Glenn Regehr from UBC & Karen Hauer from UCSF for their comments during the ICRE 2017 KeyLIME Live! sessions.

Access KeyLIME podcast archives here

Check us out on iTunes