#KEYLIMEPODCAST 335: Can Narrative Comments Provide Educational GPS?

Our entire modern age is desperate for prediction, and medical education is no exception. With this in mind, programmatic assessment was created with a strong emphasis on using feedback to optimize individual learning by identifying high and low performers early in various stages of training. Enter Kelleher et al., whom Jon calls “the next generation of narrative assessment”, with their study, which looks for identifiable patterns that may predict who will receive lower quantitative entrustment ratings over the course of training.

Listen here to learn more about what Jon calls “A great example of a local collaboration across hospitals and departments.”


KeyLIME Session 335

Listen to the podcast


Kelleher et al., Warnings in early narrative assessment that might predict performance in residency: signal from an internal medicine residency program. Perspect Med Educ. September 2021.


Jon Sherbino (@sherbino)


This modern age is desperate for prediction. Waze (my GPS app) tells me how long I will be stuck in traffic. (Answer: too long.) The weather network predicts the chance of precipitation hourly. (Answer: too much.) Even my watch predicts my health based on heart rate variability, quality of sleep, and screen time. (Answer: …well, let’s not talk about that.)

The modern medical education age is also desperate for prediction.  The promise of (so-called) big data and programmatic assessment is the ability to identify high and low performers early in various stages of training.  What is your take on the editorials calling for early warning assessment systems?

In Episode 219 we looked at Shelley Ross’ work on field notes as a way to decrease remediation, but this work was not able to predict at an individual level. In Episode 248 we examined a key-word algorithm to identify residents in difficulty, but this work was better at ruling out residents than at identifying residents in difficulty. Enter the Cinnci crew (I just trademarked this nickname, so I hope Ben, Eric and Dan can roll with it) and the next generation of narrative assessment.

The goal of this study is to help competency committee members avoid two errors: overreacting to specific comments by implementing remediation programs that may be unnecessary, or underreacting by waiting for more data instead of intervening, while valuable time to help a struggling resident passes.


From the authors:
“…we aimed to explore the… narrative comments from WPBAs… to determine identifiable patterns that subsequently might predict who will receive lower quantitative entrustment ratings over the course of training.”

Key Points on the Methods

The study was conducted at a single-centre internal medicine residency program with 89 residents across 3 years. Workplace-based assessments (WBAs) are completed by faculty, peers and other health professionals using a 5-point entrustment scale plus narrative comments. A resident receives ~3700 WBAs over 3 years of training.

WBA scores are transformed into Z-scores (a standardized score that measures deviation from the mean) using the delta between the expected WBA score and the observed score via a previously derived model.

Twenty-six residents with a cumulative z-score 1 SD below the mean were compared with 13 randomly selected residents with a z-score at or above the mean.
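To make the z-score step concrete, here is a minimal sketch of how such a transformation and group selection might look. It is not the authors' model: the expected scores would come from their previously derived model, which is not described here, so the `expected` values, the resident IDs, the use of a per-resident mean score, and the function names are all hypothetical assumptions for illustration.

```python
from statistics import mean, pstdev


def delta_z_scores(observed, expected):
    """Standardize each resident's (observed - expected) delta.

    observed, expected: dicts mapping a resident ID to a mean WBA score.
    Assumption: one mean score per resident; the study's actual model
    for the expected score is not reproduced here.
    """
    deltas = {r: observed[r] - expected[r] for r in observed}
    mu = mean(deltas.values())
    sigma = pstdev(deltas.values())  # population SD across residents
    if sigma == 0:
        # No spread at all: every resident sits exactly at the mean.
        return {r: 0.0 for r in deltas}
    return {r: (d - mu) / sigma for r, d in deltas.items()}


def lower_entrustment_group(z, threshold=-1.0):
    """Residents whose z-score is at or below `threshold` SDs from the mean."""
    return sorted(r for r, score in z.items() if score <= threshold)


# Hypothetical data on a 5-point entrustment scale.
observed = {"r1": 3.0, "r2": 3.5, "r3": 2.0, "r4": 3.75, "r5": 2.75}
expected = {"r1": 3.0, "r2": 3.0, "r3": 3.0, "r4": 3.5, "r5": 2.5}

z = delta_z_scores(observed, expected)
flagged = lower_entrustment_group(z)
comparison_pool = sorted(r for r, score in z.items() if score >= 0)

print(flagged)
print(comparison_pool)
```

In this toy data set only one resident falls a full standard deviation below the mean of the deltas; the comparison pool is everyone at or above the mean, from which a random sample would then be drawn.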

Narrative data for the first 6 months of training was aggregated for all 39 residents.

An inductive thematic analysis was performed examining explicit (i.e. surface) and implicit (i.e. deeper meaning or assumptions) meaning. In phase 1, two authors, blinded to the groups, developed a code book that included categories and related codes. They engaged with 3 other authors to achieve consensus on the code book in an iterative fashion. In phase 2, two authors unblinded the data and regrouped it into the lower entrustment group and the comparison group. Then 4 different investigators analyzed the two groups of data (now re-blinded) to identify similarities and contrasts between codes and categories.

Key Outcomes

6 themes were identified in the lower entrustment group.

Similarities between groups included:

  • Need to further knowledge acquisition
  • Need to broaden DDx
  • Increase confidence
  • Improve efficiency in documentation and work flow
  • Gain more clinical experience

Key Conclusions

The authors conclude…

“Many faculty members describe their reading of narrative data as scanning for red flags, usually in the form of words or phrases. While data suggesting extreme outlier performance can usually help identify residents with performance concerns, [Competence Committees] often rely on accumulated data and trends, both of which take time and potentially delay early identification. Our findings could aid [Competence Committees]  in their incorporation of narrative comments to support specific residents that may benefit most from earlier intervention.”

Spare Keys – other take home points for clinician educators

Hat tip to a program of research that continues to develop contemporary understanding of learning analytics, building on previous transformational work.  A great example of a local collaboration across hospitals and departments (pediatrics and medicine).

Access KeyLIME podcast archives here

The views and opinions expressed in this post and podcast episode are those of the host(s) and do not necessarily reflect the official policy or position of The Royal College of Physicians and Surgeons of Canada. For more details on our site disclaimers, please see our ‘About’ page.