How to best analyse the entrustment scale data?

By: Ylva Holzhausen

The assessment of “Entrustable professional activities” (EPAs) by means of entrustment-supervision scales represents a major refinement of competency-based curricula (ten Cate, 2005). EPAs define the activities that trainees are expected to perform under a certain level of supervision at the end of a training phase. Worldwide, the potential of EPAs has been recognized: they are being implemented as outcomes in both undergraduate and postgraduate medical curricula, and an impressive amount of literature is published every year (ten Cate, 2019). For me, there is one major problem that has received little attention so far. For about 10 years, I have been involved in projects around the implementation of EPAs in our undergraduate medical curriculum at the Charité-Universitätsmedizin, and one question has come up repeatedly since the very beginning: how to best analyse the entrustment scale data?

A dataset in an EPA-related research project usually includes, per participant, an entrustment rating on a number of EPAs. If one collects, for example, multiple assessments of students on a set of 20 EPAs over a year, one can gather quite an extensive dataset. It would be nice and easy to just calculate an average grade in order to track learning progress, but this is not possible. First, each EPA represents a separate activity with different requirements, and a trainee may well be able to perform EPA X unsupervised but EPA Y only under direct supervision. Thus, performance needs to be displayed separately for each EPA in order to identify the trainee's strengths and weaknesses.
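As a minimal sketch of what "displayed separately for each EPA" could look like, the snippet below tabulates how often each supervision level was awarded per EPA instead of averaging across EPAs. The level labels, the EPA names, the trainee IDs and the ratings are all hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical supervision levels, ordered from least to most autonomy.
LEVELS = ["not allowed", "direct supervision", "indirect supervision", "unsupervised"]

# Hypothetical assessments: (trainee, EPA, awarded supervision level).
ratings = [
    ("s1", "EPA X", "unsupervised"),
    ("s1", "EPA Y", "direct supervision"),
    ("s2", "EPA X", "indirect supervision"),
    ("s2", "EPA Y", "direct supervision"),
    ("s3", "EPA X", "unsupervised"),
    ("s3", "EPA Y", "not allowed"),
]


def level_counts_per_epa(rows):
    """Return {epa: {level: count}}, listing every level in scale order."""
    counts = {}
    for _trainee, epa, level in rows:
        counts.setdefault(epa, Counter())[level] += 1
    # A Counter returns 0 for levels that were never awarded.
    return {epa: {lvl: c[lvl] for lvl in LEVELS} for epa, c in counts.items()}


table = level_counts_per_epa(ratings)
for epa, distribution in table.items():
    print(epa, distribution)
```

A table of this kind keeps the profile per EPA visible, which a single pooled average would hide.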

Second, the entrustment-supervision scales are also troublesome, because they are ordinal scales (ten Cate et al., 2020). There are different approaches to classifying scales, but one commonly known classification was introduced by Stevens (1946). He differentiated scales as being nominal, ordinal, interval or ratio. Depending on the characteristics of the scale, it is possible to perform certain mathematical operations with the data. Entrustment-supervision scales are classified as ordinal, because they have a fixed ranking of supervision levels representing an increasing level of autonomy of the trainee combined with a decreasing level of supervision. However, the distances between the supervision levels are not necessarily equal. For example, the difference between the level “not allowed to perform EPA” and “allowed to perform EPA under direct supervision” might not be perceived as equally significant as the difference between “allowed to perform under direct supervision” and “allowed to perform under indirect supervision”. The perceived differences between the supervision levels are subjective and cannot be quantified.
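To illustrate what the ordinal property permits, here is a small sketch that uses only the order of the levels. The four labels and both groups of ratings are hypothetical. The two groups have the same mean rank, but that mean does not correspond to any supervision level and hides how different the groups are. The lower median, in contrast, always returns a level that was actually awarded.

```python
from statistics import mean, median_low

# Hypothetical supervision levels, ordered from least to most autonomy.
LEVELS = ["not allowed", "direct supervision", "indirect supervision", "unsupervised"]


def to_rank(label):
    """Map a level label to its 1-based position; only the order is meaningful."""
    return LEVELS.index(label) + 1


def median_level(labels):
    """Return the lower median as a label, so the result is a real level."""
    ranks = [to_rank(label) for label in labels]
    return LEVELS[median_low(ranks) - 1]


# Two hypothetical groups of ratings on the same EPA.
group_a = ["direct supervision", "direct supervision",
           "indirect supervision", "indirect supervision"]
group_b = ["not allowed", "not allowed", "unsupervised", "unsupervised"]

mean_a = mean(to_rank(label) for label in group_a)
mean_b = mean(to_rank(label) for label in group_b)

print("mean ranks:", mean_a, mean_b)
print("median levels:", median_level(group_a), "|", median_level(group_b))
```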

So, what to do and not to do with EPA performance data that were gathered on an ordinal scale? A glance into the literature does not actually help to clarify the issue. One opinion is that it is absolutely not possible to use parametric tests to analyse ordinal data (Kuzon et al., 1996; Jamieson, 2004). Calculating a mean or performing a linear regression would thus not be appropriate. However, others argue that it might do no harm under specific circumstances (Norman, 2010). After several discussions with statisticians on this topic, I decided to perform both parametric and nonparametric inferential analyses and to compare the results. If the analyses yield the same results, I tend to report, for example, the results of the linear regression, as a linear regression is so much easier to interpret and to report (at least for me) than an ordinal regression analysis (Holzhausen et al., 2019). However, when it comes down to describing the EPA data, I prefer to treat them as ordinal data. I agree with ten Cate and colleagues’ (2020) statement that an average entrustment-supervision level score of 3.5 is just illogical.
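The idea of running a parametric and a nonparametric analysis side by side can be sketched in a much simpler setting than the regressions mentioned above. The snippet below is not the analysis from Holzhausen et al. (2019). As a stand-in, it compares Pearson's correlation, which treats the level codes as interval data, with Spearman's rank correlation, which uses only their order. The semester and rating values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; treats the values as interval-scaled numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: semester of study and awarded supervision level (1 to 4).
semester = [1, 1, 2, 2, 3, 3, 4, 4]
level = [1, 2, 2, 2, 3, 3, 3, 4]

r = pearson(semester, level)
rho = spearman(semester, level)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

If both coefficients point in the same direction and are of similar size, the choice between the parametric and the rank-based measure matters little for the conclusion. If they diverge, the ordinal result is the safer one to report.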

The aim of this blog post is not to answer the question of how to analyse entrustment scale data. Rather, it is to raise awareness of this issue and to elicit discussion, as the quality of research on EPAs could be further enhanced if we think more clearly about the type of data we have and the appropriate types of analyses to use.

About the author:
Ylva Holzhausen, M.Sc., is team lead of the educational research division at the Dieter Scheffner Center for Medical Education and Educational Research, located at the Charité-University Berlin in Germany.


1. Holzhausen Y, Maaz A, Marz M, Sehy V, Peters H. Exploring the introduction of entrustment rating scales in an existing objective structured clinical examination. BMC Med Educ. 2019;19(1):319.

2. Jamieson S. Likert scales: how to (ab)use them. Med Educ. 2004;38(12):1217-8.

3. Kuzon W, Urbanchek M, McCabe S. The seven deadly sins of statistical analysis. Ann Plast Surg. 1996;37(3).

4. Norman G. Likert scales, levels of measurement and the “laws” of statistics. Adv Health Sci Educ Theory Pract. 2010;15(5):625-32.

5. Stevens SS. On the Theory of Scales of Measurement. Science. 1946;103(2684):677-80.

6. ten Cate O. Entrustability of professional activities and competency-based training. Med Educ. 2005;39(12):1176-7.

7. ten Cate O. An Updated Primer on Entrustable Professional Activities (EPAs). Rev Bras Educ Med. 2019;43(1 suppl 1):712-20.

8. ten Cate O, Schwartz A, Chen HC. Assessing Trainees and Making Entrustment Decisions: On the Nature and Use of Entrustment-Supervision Scales. Acad Med. 2020;95(11):1662-9.

The views and opinions expressed in this post are those of the author(s) and do not necessarily reflect the official policy or position of The Royal College of Physicians and Surgeons of Canada. For more details on our site disclaimers, please see our ‘About’ page