Understanding the interaction of the ‘truths’ of validity in #MedEd studies [Part 1]

By Damian Roland (@Damian_Roland)

Validity is a commonly used term in research and education. However, it is often misunderstood [1] and there is no single universal definition. Messick (1989) [2] described validity as the “degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores”. Here Messick is expressing validity as a descriptor of how ‘truthful’ a statement about an intervention is, based on the sampling or testing method used. While this approach is supported by some [3], the literature on clinical assessment in medical education has used numerous different interpretations [4]. In practical terms, a study with good validity uses a method that ensures there is not an alternative cause, or causes, that explain the results of experiments [5]. The methodology for determining this has developed considerably in the last 50 years, moving from the dichotomous (yes/no) decisions of validity used in the psychometric era of the 1960s [6] to frameworks in which multiple factors must be satisfied (e.g. Kane’s web of inferences [7]) for a judgment to be deemed an accurate representation of truth. One such classification was that established by the Campbell collaboration [8] [9], who highlighted four key constructs (Figure 1):

  1. Statistical Validity is the extent to which differences in outcome are a result of differences in performance rather than chance, e.g. it would be invalid to use the single highest score in an examination, rather than a mean, to compare two outcomes of a learning intervention.
  2. Internal Validity is the extent to which the intervention can reliably be ascribed to have effected the change, e.g. it would be invalid to ascribe success in a post-attachment assessment on managing seizures to the impact of a 5 minute tutorial on head injuries.
  3. Construct Validity relates to the association between the concept being investigated and the measures used to test it (does the data collected accurately reflect the outcome measure chosen?), e.g. it would be invalid to collect data on the number of doctors’ workplace-based assessments when you are testing patient satisfaction in an Emergency Department.
  4. External Validity is the extent to which the intervention is applicable in different populations, e.g. it would be invalid to compare directly the outcome results of a resuscitation training programme run in a high fidelity simulation suite with those of one using a low fidelity mannequin in a hospital seminar room.
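
The statistical validity example in point 1 can be illustrated with a small simulation. This is only a sketch under invented assumptions: the cohort size of 30, the normally distributed scores with a mean of 60 and standard deviation of 10, and the function name simulate_gap are all hypothetical and do not come from any study cited here. Two cohorts are drawn from the same score distribution, so any gap between them is due to chance alone, and the sketch compares how large that chance gap tends to be when each cohort is summarised by its highest score rather than by its mean.

```python
import random
import statistics


def simulate_gap(summary, n_learners=30, n_trials=2000, seed=0):
    """Average absolute gap between two cohorts with NO true difference.

    `summary` is the function used to reduce a cohort's scores to one
    number (for example `max` or `statistics.mean`). All parameters are
    illustrative assumptions, not values taken from real data.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Both cohorts come from the same distribution, so the
        # intervention being "compared" has no real effect.
        cohort_a = [rng.gauss(60, 10) for _ in range(n_learners)]
        cohort_b = [rng.gauss(60, 10) for _ in range(n_learners)]
        gaps.append(abs(summary(cohort_a) - summary(cohort_b)))
    return statistics.mean(gaps)


mean_gap = simulate_gap(statistics.mean)
max_gap = simulate_gap(max)

print(f"Typical chance gap using the mean score:    {mean_gap:.2f}")
print(f"Typical chance gap using the highest score: {max_gap:.2f}")
```

Under these assumptions the gap between the highest scores should be noticeably larger than the gap between the means, even though neither cohort received a better intervention. That is the sense in which a single highest score is a poor basis for a statistically valid comparison.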

Figure 1

Validity is an important concept as it allows objective interpretation and review of presented evidence. In education research, therefore, a study lacking in internal or statistical conclusion validity should not be used to inform change in teaching methods or trainee selection processes. In the clinical sciences, well designed clinical trials should ensure that the outcomes of the research will be valid. However, in medical education and implementation science validity is potentially a more challenging issue [10]. In determining whether drug A has specific outcome B, the underlying pathophysiological process by which drug A acts and delivers a physiological response is generally well understood. The outcome gain, whether it be altering mortality or a specific physiological response, is clear, and there is an obvious ability to reproduce any studies of drug A on different populations. This is not the case in medical education, where the intervention may be delivered by a number of different mechanisms, in a fashion which is not clearly delineated and which may be subject to bias or confounding variables. Reproducibility may be difficult due to changing learning environments and outcomes that are not always linear or discrete [11].

We’ll explore a practical example of the challenge of validity in medical education in Part 2 [to be posted on Friday].


  1. Carey G, Gottesman II. Reliability and validity in binary ratings: Areas of common misunderstanding in diagnosis and symptom ratings. Archives of General Psychiatry 1978 December 1;35(12):1454-1459.
  2. Messick S. Validity. In: Linn RL, editor. Educational measurement. 3rd ed. New York: American Council on Education and Macmillan; 1989. p. 13.
  3. American Psychological Association, Standards for Educational and Psychological Testing. Standards for Educational & Psychological Tests 1999.
  4. Beckman TJ, Cook DA, Mandrekar JN. What is the validity evidence for assessments of clinical teaching? J Gen Intern Med 2005 Dec;20(12):1159-1164.
  5. Shuttleworth M. Internal Validity. 2009; Available at: https://explorable.com/internal-validity. Accessed 03,12, 2015.
  6. Wesman AG. Essentials of Psychological Testing. Lee J. Cronbach. Harper, New York, ed. 2, 1960. xxi + 650 pp. Illus. $7. Science 1960 April 22;131(3408):1209-1210.
  7. Kane MT. Current Concerns in Validity Theory. Journal of Educational Measurement 2001;38(4):319-342.
  8. Cook T, Campbell D. Quasi-experimentation: Design and analysis issues for field settings. First ed. Boston, MA: Houghton Mifflin Company; 1979.
  9. Shadish W, Cook T, Campbell D. Experimental and quasi-experimental designs for generalized causal inference. First ed. Boston, MA: Houghton Mifflin Company; 2002.
  10. Colliver JA, Conlee MJ, Verhulst SJ. From test validity to construct validity … and back? Med Educ 2012 Apr;46(4):366-371.
  11. Cook DA, West CP. Perspective: Reconsidering the focus on “outcomes research” in medical education: a cautionary note. Acad Med 2013 Feb;88(2):162-167.