#KeyLIMEPodcast 211: Can Assessment Get with the Program?

It’s one of the big 5 features of the CBME era, and it promises to be a better way of organizing assessment… the hosts dig deeper into this “programmatic assessment” stuff. Check out the podcast here.


KeyLIME Session 211

Listen to the podcast.



Bok et al. Validity evidence for programmatic assessment in competency-based education. Perspect Med Educ. 2018 Dec;7(6):362-372.


Jason Frank (@drjfrank)


In van Melle’s framework of the “5 Components of CBME” (pub pending Acad Med 2019), assessment, specifically something called “programmatic assessment” (PA), is one of the big 5 features of the competency-based era. (The other 4 components of CBME are: outcome competencies, sequenced acquisition of expertise, tailored teaching/coaching, and learning activities.)

So, what is this “programmatic assessment” stuff? The promise of PA is that it is supposed to be a better way of organizing assessment. PA is meant to address the many documented problems with how HPE assessment was done in the 20th century, including:

  • little direct observation
  • few assessment data points
  • few assessors
  • a narrow focus on 1 or a few clinical aspects of competence
  • poor inter-rater reliability
  • rising appeals of assessment decisions
  • confusion between the idea of “summative” vs “formative” assessment
  • different mental models / expectations by each assessor
  • decision-making by 1 or a few faculty
  • (see 100s of papers in Medline!)

The premise of PA is a purposeful and systematic approach to gathering data on a given learner’s progress towards a pre-defined, shared articulation of competence. Based on the foundational work of van der Vleuten, Schuwirth, and others, PA at its simplest involves a few important elements:

  1. An emphasis on direct observation of actual tasks performed in the workplace (greater workplace-based assessment)
  2. Allows for assessment of many domains of competence
  3. Many raters, over time, each providing recorded assessment data, using multiple assessment methods
  4. Shared mental model of progression of expertise
  5. Assessment for learning, providing information to coach learners to higher ability
  6. Each data point has less impact because there are so many of them; each is like a pixel in a larger picture of a learner’s abilities.
  7. Summative decisions are made by a group, not an individual, using a curated and collated set of assessments of all kinds to provide a more holistic view of learner progress (e.g. a competence committee).

Sounds good, right? Any validity evidence? Cees van der Vleuten’s work showed that many observations from the workplace were as good as or better than many hours of expensive, high-stakes testing. We could use more studies…


Enter the veterinarian clinician-educators. Bok et al set out to study whether a system of PA could be used “…with a CBE framework to track progression of student learning within and across competencies over time…” They made a few a priori hypotheses that would reflect the effectiveness of PA:

  1. Assessment scores are reliable
  2. Aggregate performance scores should increase over time in proportion to learning
  3. Dominant source of variance in ratings should be attributable to the learners
  4. Scores from multiple assessment methods and multiple competence domains should differentiate performance over time.

Key Points on the Methods

The authors performed a retrospective quantitative analysis of 327,974 assessment data points from 16,575 assessment encounters on 962 Dutch veterinary students over ~6 months. The assessments were designed using a competency-based framework to track progression over time. Assessment methods included MSF, CEX, and self-assessments. Assessments were collected in the workplace using an e-portfolio. A 2-person competence committee reviewed the curated and graphically rendered portfolio data at the end of training to decide promotion or graduation.

Data were analyzed using time-variable modeling and a multilevel random coefficient model. The study had REB approval.

Key Outcomes

In this study, the 3 workplace assessments displayed high reliability (G coefficient Eρ² = 0.86, 0.88, and 0.90 for CEX, MSF, and self-assessment, respectively). The dominant source of variance in scores was the students (~40%). Progression appeared to be sigmoidal across all domains.
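For readers less familiar with G coefficients, here is a minimal sketch of why reliability climbs as observations accumulate. It uses the textbook one-facet generalizability formula, Eρ² = σ²(student) / (σ²(student) + σ²(error) / n), and not the multilevel random coefficient model the authors actually fitted. The function name and the 0.40 / 0.60 variance split are assumptions for illustration only; in particular, treating everything that is not student variance as error is a simplification, not a result from the paper.

```python
def g_coefficient(var_student: float, var_error: float, n_obs: int) -> float:
    """Generalizability coefficient for the mean of n_obs observations.

    Textbook one-facet (students x observations) formula:
        E_rho2 = var_student / (var_student + var_error / n_obs)
    """
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    if var_student < 0 or var_error < 0:
        raise ValueError("variance components must be non-negative")
    if var_student == 0 and var_error == 0:
        raise ValueError("at least one variance component must be positive")
    return var_student / (var_student + var_error / n_obs)


# Illustrative values only (assumption): 40% of score variance is due to
# students and the remaining 60% is lumped together as error.
VAR_STUDENT = 0.40
VAR_ERROR = 0.60

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(g_coefficient(VAR_STUDENT, VAR_ERROR, n), 2))
```

Under these assumed numbers a single observation is not very dependable, but the average of about 10 observations already reaches roughly 0.87. That is the "pixel" logic of PA in miniature: no single data point carries much weight, yet the aggregate can support a decision.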

Key Conclusions

The authors conclude that:

  1. Scores displayed reliability over time
  2. Scores increased over time, consistent with systematic student learning
  3. The dominant source of variance in scores was the learners
  4. Variance between assessment methods was greater than variance between competencies
  5. All students seemed to benefit from the PA approach

The authors interpreted these findings as providing validity evidence for a PA approach in CBME, given greater variance was attributable to learners (as opposed to raters), there was evidence of progression of abilities over time, and the scores had good reliability.

Spare Keys – other take-home points for clinician educators

  1. We need to look for validity evidence. This is an early example of the evidence base for CBME.
  2. The authors noted that they did not seek to address all the Kane validity framework elements, and future studies are needed to provide validity evidence for the system of assessment, involving all the elements.
  3. This study did not address the contribution of narratives, nor did it use EPAs per se.
  4. Cool to see great meded research from veted!


Access KeyLIME podcast archives here