I have a bee in my bonnet: validity of entrustment decisions

By: Dr. Claire Touchie (@Drctouchie)

I have a bee in my bonnet about the validity of entrustment decisions. That is, there are things that bother me about how the validity of entrustment decisions is approached in methods and discussed in manuscripts.

Let me start with a disclaimer. I am a clinician-educator with over 25 years of experience in high-stakes assessments, but I am not a psychometrician or measurement expert, so I welcome rebuttals to the arguments I put forth.

For the sake of this blog, there are two points I wish to touch on:

  1. It is not all about numbers. A reputed psychometrician once told me that “it was all about the numbers”. Here I disagree. When we talk about workplace-based assessments (WBAs), we should not be placing emphasis on the numbers. The numbers ascribed on entrustment scales (usually 1-5) are meaningless. They help us with the order of the scale but are not themselves a score. When we have a series of individual assessments, we are looking to see if a learner is progressing over time; we are not looking to average performances or to examine the internal consistency of scores. I will even argue here that using these numbers to look at interrater reliability is also meaningless. The raters do not see the same performance in the same clinical situation, even for the same entrustable professional activity (EPA). Finally, although we need evidence that a learner has done a sufficient number of EPAs before they are ready to be entrusted without supervision, that number is not an absolute one (Harvey et al, 2023). It will often depend on other factors, such as the risk of the EPA to the patient or the ability to predict how a learner will perform the next time they encounter the situation.
  2. Is validity really nothing without reliability? In looking at WBAs and entrustment, I think this depends on the type of reliability we are talking about. Reliability of the scores or reliability of the decisions? Since I argue that the numbers on the entrustment scales are meaningless, it is hard to make meaning of the reliability of “scores”. I believe one has to be cautious when using generalizability theory to calculate the reliability of EPA assessments. As the assessments are gathered over time and we expect learners to be improving, the assumption that the measured attributes are stable over time does not hold true. We should think of using alternative methods such as growth rate reliability and growth curve reliability to estimate longitudinal consistency (Park et al, 2021). However, there is a role for estimating the reliability of entrustment decisions themselves. This could be done using decision consistency approaches, that is, making sure that the decision on learner entrustment reached from the data is reproducible regardless of who sits on the committee.
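To make the idea of decision consistency a little more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a method taken from the cited papers: it supposes that two independent committees (hypothetical `committee_a` and `committee_b`) review the same ten learner portfolios and each reach an "entrust" or "not yet" decision, and it summarizes their agreement with raw agreement and Cohen's kappa, one common chance-corrected agreement statistic. The function name `decision_consistency` and all the data are made up for the example.

```python
from collections import Counter


def decision_consistency(panel_a, panel_b):
    """Return (raw agreement, Cohen's kappa) for two lists of categorical decisions.

    Each list holds one decision per learner, in the same learner order.
    """
    if len(panel_a) != len(panel_b) or not panel_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(panel_a)

    # Proportion of learners for whom both panels reached the same decision.
    observed = sum(a == b for a, b in zip(panel_a, panel_b)) / n

    # Agreement expected by chance, from each panel's own decision rates.
    counts_a = Counter(panel_a)
    counts_b = Counter(panel_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2

    if expected == 1.0:
        # Both panels gave every learner the same single decision.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical decisions for the same ten learners (made-up data).
committee_a = ["entrust", "entrust", "not yet", "entrust", "not yet",
               "entrust", "entrust", "not yet", "entrust", "entrust"]
committee_b = ["entrust", "entrust", "not yet", "not yet", "not yet",
               "entrust", "entrust", "entrust", "entrust", "entrust"]

agreement, kappa = decision_consistency(committee_a, committee_b)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
# prints: raw agreement = 0.80, kappa = 0.52
```

Note that the calculation works on the decisions themselves, treated as categories, and never averages the numbers on the entrustment scale. A real study would also need to consider how committees are composed and how many portfolios are reviewed.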

There is much more to be said but this is a taste of what irks me. Maybe it is time that I sit and write a perspective on the topic…

About the Author

Dr. Claire Touchie (MD, MHPE, FRCPC) is the Associate Director of Educator Development at the Centre for Innovation in Medical Education, University of Ottawa, Ottawa, Canada.

References

  1. Harvey A, Paget M, McLaughlin K, Busche K, Touchie C, Naugler C, Desy J. How much is enough? Proposing achievement thresholds for core EPAs of graduating medical students in Canada. Medical Teacher 2023;45(9):1054-1060.
  2. Park YS, Hamstra SJ, Yamazaki K, Holmboe E. Longitudinal reliability of milestones-based learning trajectories in family medicine residents. JAMA Network Open 2021;4(12):e2137179.

The views and opinions expressed in this post are those of the author(s) and do not necessarily reflect the official policy or position of The University of Ottawa. For more details on our site disclaimers, please see our ‘About’ page.