By: Martin V. Pusic (https://www.linkedin.com/in/martin-pusic-082b1a6/)
The way we usually think about multiple-choice questions.
You do a lot of multiple-choice questions (MCQs) on the way to becoming a physician. In North America, you take the SAT to get into a good college. Then the MCAT to get into a good medical school. Then the USMLE, which (arguably) gets you into a good residency program. And finally, you often take an MCQ-based board examination to become a fully qualified, board-certified physician. Whew!
These are usually knowledge tests, where the MCQ probes your mastery of some foundational principle. The more able the test taker, the more of these questions, across varying difficulty, they get “right”. This setup has many advantages. It enables test-enhanced learning, which is superior to passive methods of learning. The construct of “deep conceptual understanding”, developed by taking these tests, is a key element of adaptive expertise, allowing the expert to innovate new solutions when confronted with novel situations (Mylopoulos 2017). What distinguishes a physician is the ability to fall back on these conceptual understandings when pattern recognition fails.
This deep conceptual understanding is subsumed in “synthetic” assessment frameworks such as the RIME or EPA frameworks, bundling conceptual understanding with the other elements of carrying out a provider’s work. Similarly, portfolios of assessment bring together conceptual MCQ measures of “knowledge” with a whole raft of other evidence to guide the learner’s development. This is all to say we need these MCQs for expertise development, recognizing the dedication of learners who spend years mastering this deep conceptual knowledge guided by the tests placed before them.
However, at the expert end of the developmental scale, where residents are transitioning to independent practice or longtime providers are maintaining their certification, the MCQ can seem like yesterday’s task. How does answering “difficult” questions relate to practical competence? After all, in Miller’s pyramid, “knowing” sits at the very bottom, about as far as it can be from the apex of “doing” and the workplace-based assessments (WBAs) that are the sine qua non of competency-based education (CBE) (Williams 2016).
With the CBE movement, there has been a welcome shift to outcomes-oriented education, where relevant skills and competencies are clearly identified and then guide the learning process. What can the practitioner actually do? This outcomes orientation makes manifest the tradeoff between studying for conceptual knowledge examinations and spending that time in the workplace, developing more holistic skills.
How do MCQs relate to competency, if at all?
The point of this blog post, however, is not to remind you of the “broccoli” form of MCQ: good for you in an indirect way that requires faith on the part of the person doing the eating. Instead, I want to highlight MCQs that sit at the higher levels of Miller’s conceptualization and relate directly to clinical skill. That is, instead of sampling questions that range from easy to hard, the MCQ samples from a clinically relevant construct that reflects directly on the skill or attitudes of the test taker. In other words, can MCQs be useful at the higher ends of Miller’s pyramid?
Figure 1. The MCQ Up and Down Miller’s Pyramid

Multiple-choice questions are frequently listed as the basis of assessment at the lowest, “Knows” level of Miller’s assessment pyramid. Here we list on the right the typical types of assessments used at the different levels; on the left, by contrast, we list how targeted MCQ instruments can provide information at each level.
It is probably easiest to give some examples:
- An obstetrics examination has 25 cases where the test taker has to declare whether a Caesarean section is indicated. The cases vary in the degree to which a C-section is indicated: from cases where almost every obstetrician would think it indicated, through middle cases where there would be practice variation, to ones where most would not. The construct being evaluated is “propensity to do C-sections”.
- A pathologist rates 25 prostate biopsy cases spanning the full Gleason scale, from completely benign to very malignant. The construct being evaluated is “decision-making in threshold cases between grades of cancer” (Pusic 2023).
- An emergency physician rates 100 chest radiographs, where the task is to identify whether a pneumonia is present. The construct being evaluated is the ability to detect pneumonia (Bregman 2022).
The key in each of these cases is that we have gone from an indirect, foundational measure (how much conceptual knowledge do you have?) to a direct measure of your clinical decision-making. Are you a high-C-section-propensity obstetrician? Where do you, as a pathologist, set your threshold on the Gleason scale that separates “watchful waiting” from “start chemo”? What is your personal false negative rate for pneumonia detection as an emergency physician? And how does each of these measures differ from your peers’?
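To make the “personal false negative rate” concrete, here is a minimal sketch of how it could be computed from a calibrated case bank. The case labels and reader answers below are invented for illustration; a real instrument would use expert-panel reference standards as in the studies cited.

```python
def false_negative_rate(truth, answers):
    """Fraction of truly positive cases that the reader called negative.

    truth:   reference-standard labels (True = finding present)
    answers: the reader's calls on the same cases
    """
    positives = [a for t, a in zip(truth, answers) if t]
    misses = positives.count(False)
    return misses / len(positives)

# 10 illustrative radiograph cases (True = pneumonia present per expert panel)
truth = [True, True, True, True, False, False, False, True, False, False]
# One hypothetical reader's calls on the same cases
answers = [True, False, True, True, False, True, False, True, False, False]

print(false_negative_rate(truth, answers))  # 1 miss out of 5 positives -> 0.2
```

Because every reader sees exactly the same cases, each reader’s rate can be compared directly against the peer distribution, which is the point of the digital case bank.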
MCQs as simulations, as measures.
Notice what happens when we think of MCQs as potentially reflecting a range of constructs in clinical practice. We are no longer shackled to the lowest rung of a “hierarchy” of goodness as theorized in Miller’s pyramid. Instead, your imagination is the only limit on what these instruments can simulate, can biopsy. There is a science to instrument development that goes well beyond knowledge-test development, one that is fully within the wheelhouse of the health sciences.
Each level of Miller’s Pyramid can be a different view on competence, as intended in Programmatic Assessment. The notion of hierarchy was important when we were under-sampling, under-valuing workplace learning in favour of high-stakes conceptual knowledge tests. One of the notable successes of the competency-based movement has been to redress this imbalance. But knowing, showing and demonstrating all have their place. And MCQs are a scalable, efficient probing mechanism.
Wait, are you saying that MCQs can be BETTER than workplace-based assessments?
Not better, different. The strength and the weakness of WBAs is their inextricability from context. Ask yourself: how often in the workplace have 100 of your colleagues done exactly the same case? In a digital world, we can reproduce a histology slide very, very well. Using a dataset where some of the best pathologists in the world provided calibration data, we could make very fine-grained comments on the individual propensities of residents (Pusic 2023). Think of well-designed MCQ tests as a kind of wisdom of crowds: wisdom in the selection of cases, wisdom from the accumulated ratings of your peers, wisdom that gets distilled into a personalized profile for the learner or practitioner. Used judiciously, this type of MCQ deserves a good reputation.
Figure 2: Multiple Choice Questions that directly reflect a clinical skill

Whereas MCQs are typically thought of as measuring medical knowledge, instruments can be designed to reflect constructs that are more directly related to clinical skill. Here the profiles of two pathologists are shown after each classified 50 prostate cancer histology cases along the International Society of Urological Pathology (ISUP) scale (related to the Gleason scale). Each curve represents the propensity of that pathologist to choose a given ISUP rating at that part of the malignancy spectrum. Cases (coloured dots at bottom) fall all along the spectrum. The two pathologists would likely have very different opinions of the case indicated by the dashed line.
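The propensity curves in Figure 2 can be sketched with an ordered-response (cumulative-logit) model: each pathologist is characterized by a set of thresholds along the malignancy spectrum, and shifting those thresholds shifts the whole rating profile. The thresholds and case position below are invented for illustration; the actual modeling in Pusic 2023 may differ in its details.

```python
import math

def category_probs(x, thresholds):
    """P(rating = k | case at position x on the malignancy spectrum),
    under a cumulative-logit model with reader-specific thresholds."""
    def cdf(t):
        # Probability the reader's rating falls at or below boundary t
        return 1.0 / (1.0 + math.exp(-(t - x)))
    cum = [cdf(t) for t in thresholds] + [1.0]
    # Differences of cumulative probabilities give per-category propensities
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Two hypothetical pathologists, each rating on a 5-category ISUP-style scale,
# differing only in where they place their thresholds
pathologist_a = [-1.0, 0.0, 1.0, 2.0]
pathologist_b = [-1.5, -0.5, 0.5, 1.5]  # shifted toward more malignant calls

x = 0.2  # a borderline case (the "dashed line" in Figure 2)
print(category_probs(x, pathologist_a))
print(category_probs(x, pathologist_b))
```

For the same borderline case, pathologist B assigns more probability to the higher-grade categories than pathologist A, which is exactly the kind of individual-propensity difference the instrument is designed to surface.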
Summary
Multiple-choice questions play an important role in assessing and developing clinical competency. Their use in foundational examinations can foster test-enhanced learning and deep conceptual understanding, which are vital for adaptive expertise in medicine. However, MCQs can be more than knowledge tests; they can serve as direct measures of clinical decision-making, reflecting real-world constructs such as diagnostic skill or decision propensity. By simulating clinical scenarios, MCQs provide a scalable, efficient way to assess and compare clinical competencies. These assessments complement workplace-based assessments in a portfolio by offering a unique, context-independent perspective on selected aspects of a practitioner’s competence.
About the author: Martin V. Pusic, MD PhD is an Associate Professor of Pediatrics and Emergency Medicine at Harvard Medical School as well as Director of the Research Education Foundation at the American Board of Medical Specialties. He practices Pediatric Emergency Medicine at Boston Children’s Hospital. His research has focused on the collection and analysis of learning analytic data to nurture and certify expertise development in fields ranging from pediatrics to pathology to dermatology.
References
- Bregman S, Thau E, Pusic M, Perez M, Boutis K. A Performance-Based Competency Assessment of Pediatric Chest Radiograph Interpretation Among Practicing Physicians. Journal of Continuing Education in the Health Professions. 2022 May 17:10-97.
- Mylopoulos M, Borschel DT, O’Brien T, Martimianakis S, & Woods NN. Exploring Integration in Action: Competencies as Building Blocks of Expertise. Acad Med. 2017; 92(12):1794–1799
- Pusic MV, Rapkiewicz A, Raykov T, Melamed J. Estimating the Irreducible Uncertainty in Visual Diagnosis: Statistical Modeling of Skill Using Response Models. Medical Decision Making. 2023 Aug;43(6):680-91.
- Williams BW, Byrne PD, Welindt D, Williams MV. Miller’s pyramid and core competency assessment: a study in relationship construct validity. Journal of Continuing Education in the Health Professions. 2016 Oct 1;36(4):295-9.
The views and opinions expressed in this post are those of the author(s) and do not necessarily reflect the official policy or position of The University of Ottawa. For more details on our site disclaimers, please see our ‘About’ page.
