#KeyLIMEPodcast 79: Why do assessors fail to agree when observing learners?

The Key Literature in Medical Education paper this week captures one of the biggest debates (maybe the wrong term: discussion? point of uncertainty? area of research?) in the assessment literature – what is the source of variance in direct observation assessment?

Research that I have been involved in suggests that the variance comes from the assessors. (See here for more details). This finding doesn’t seem to be much of a surprise. The more important question – particularly if we’re going to heavily promote direct observation assessment – is how we decrease this variance to ensure a valid assessment program.

There are three prominent viewpoints.

  1. The frame of reference of the assessor is internal and idiosyncratic. For more info see here.
  2. The assessor is influenced by prior experiences with other learners. For more info see here.
  3. There is no single “true” perspective and the variance from raters represents different perspectives that when collectively assimilated more accurately reflect “truth.” For more info see here.

The KeyLIME podcast tackles a unique review paper on this topic. It is unique in that the author list includes the thought leaders who represent each of these different perspectives on the sources of assessor variance in direct observation.

If you want to get up to speed on where the literature stands on this important topic, this is a great primer. Even better, check out the podcast where we unpack the details even more.

– Jonathan (@sherbino)

(PS. We’re blaming the frigid temperatures at the ICE Blog HQ this past month for a mixup in the KeyLIME schedule. The abstract below was published 2 weeks ago, in place of the abstract on digital scholarship that we posted. The sequence was wrong, but all of the content is accounted for. Sorry. Our computers froze… literally, not virtually.)


KeyLIME Session 79 – Article under review:


Listen to the podcast

View/download the abstract here.

Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the ‘black box’ differently: assessor cognition from three research perspectives, Medical Education (Nov 2014), 48 (11):

Reviewer: Linda Snell

Assessor cognition research is a new medical education domain that investigates assessors’ cognitive processes and their impact on assessment quality. The field developed in part because, when psychometrics are used to analyze performance assessments, the assessors (i.e. rater variance) often account for a greater amount of the variance in ratings than the trainees (i.e. true score variance).

The authors focus on one type of performance assessment, workplace-based assessment (WBA), well described as incorporating the ‘assessment of complex clinical tasks within day-to-day practice through direct observation of trainees as they authentically interact with patients in real clinical settings. Direct observation provides information and data to inform judgments about trainee progress. Workplace-based assessment has become an essential component of medical education because, ultimately, clinical supervisors must be able to determine if a trainee can be entrusted with the tasks or activities critical to the profession.’

With a better understanding of the limitations and strengths of the cognitive processes used by assessors*, modifications in assessment practices might improve the defensibility of assessment decisions and the learning value of formative feedback.

*They use ‘assessors’ rather than ‘raters’ as a more general term that includes both qualitative and quantitative measures.

Type of paper
A conceptual review paper with expert consensus. This paper is not a systematic review.

Key Points on the Methods

Three distinct, although not mutually exclusive, perspectives on assessor cognition:

(i) the assessor as trainable: assessors vary because they do not apply assessment criteria correctly, use varied frames of reference and make unjustified inferences. This view describes potentially controllable cognitive processes invoked during assessment and draws on behavioral learning theory to frame an approach to reduce unwanted variability in assessors’ assessments through faculty training.

(ii) the assessor as fallible: variations arise as a result of fundamental limitations in human cognition that mean assessors are readily and haphazardly influenced by their immediate context. This view draws on social psychology research and focuses on identifying the automatic and unavoidable biases of human cognition so that assessment systems can compensate for them.

(iii) the assessor as meaningfully idiosyncratic: experts can make sense of highly complex and nuanced scenarios through inference and contextual sensitivity, which suggests assessor differences may represent legitimate experience-based interpretations. This view draws from socio-cultural theory and expertise literature and proposes that variability in judgments could provide useful assessment information within a radically different assessment design.

Areas of ‘concordance’ and implications:
-all perspectives require assessors to observe trainees interacting with patients.
-all recognize that the current quantity and frequency of observation-based assessment are less than ideal. This is a serious deficiency in assessment programs; to improve WBA, institutions must provide support and ensure that teachers actually do it.
-teachers must maintain their own clinical competence while concomitantly developing expertise as assessors. Faculty development for assessors should include training for clinical skills development in addition to training in how to assess those skills.
-to maximize the strengths and minimize the weaknesses of assessor cognition, we need:
  -a robust sampling of trainee tasks and a robust sample of assessors to improve reliability, validity, quality and defensibility of decisions; and
  -group discussions among assessors and decision makers to provide opportunities to synthesize assessment data, to create a clearer picture of a trainee’s overall performance, and to allow both consistent and variable judgments to be explored and better understood.

Rather than look at areas of discordance, the authors identified circumstances in which the strengths of a particular perspective may be especially advantageous. For example, for clinical encounters that can be highly influenced by contextual factors, an assessment system that can accommodate variability and expertise in assessors’ judgments may be appropriate, even valuable.

All perspectives will need to address changing clinical care delivery models such that:
-working in teams will affect how we think about the assessment of individuals, and
-judgments of competence will be made through a group process.

Key Conclusions
The authors conclude that, when considered separately, each perspective proposes a reasonable and logical view of assessor cognition. When considered simultaneously, the three perspectives may initially seem irreconcilable. However, the goal is to critically reflect on and strategically incorporate both the concordant and discordant views presented by each perspective to enhance the quality of assessments.

Spare Keys – other take home points for clinician educators
Look what happens when you put great brains in a room!

Listen to the podcast here