Education Theory Made Practical – Volume 3, Part 4: The Kirkpatrick Model: Four Levels of Learning Evaluation

As part of the ALiEM Faculty Incubator program, teams of 2-4 incubator participants authored a primer on a key education theory, linking the abstract to practical scenarios. For the third year, these posts are being serialized on our blog as a joint collaboration with ALiEM. You can view the first eBook here; the second is nearing completion and will soon be released. You can view all the blog posts from series 1 and 2 here.

The ALiEM team loves hearing your feedback prior to publication. No comment is too big or too small, and all feedback will be used to refine each primer prior to the eBook publication. (Note: the blog posts themselves will remain unchanged.)

This is the fourth post of Volume 3. You can find the previous posts here: Bolman and Deal’s Four-Frame Model; Validity; and Mayer’s Cognitive Theory of Multimedia Learning.


The Kirkpatrick Model: Four Levels of Learning Evaluation

Authors: Chris Fowler; Lisa Hoffman; Shreya Trivedi (@ShreyaTrivediMD); Amanda Young

Editor: Dimitrios Papanagnou

Main Authors or Originators: Donald Kirkpatrick
Other important authors or works: James and Wendy Kirkpatrick

Part 1: The Hook

Jane is an assistant program director (APD) at her residency program. In an effort to improve resident in-service scores, she was recently charged with increasing resident engagement during weekly didactic conference. She has already started to implement immersive activities, such as small-group sessions, TED-talk-style lectures, and a longitudinal simulation curriculum. The department chair and the residency program director are concerned that these curricular modifications require significantly more faculty time and effort than previous conference offerings. They are not sure whether the new curriculum is worth the resources required to sustain it in the long term.

Jane must evaluate the effectiveness of her educational programming in order to demonstrate its value and validate the resources invested. How can Jane evaluate the impact of the new curriculum on her residents’ learning?

Part 2: The Meat


Kirkpatrick’s model is based on four levels of evaluation, where each level builds on the previous one. Level 1, the most basic level, assesses how learners react to a specific training. It typically involves simple questions and examines “customer satisfaction.” Level 2 assesses how much participants learned from the training. Assessment tools generally include tests, interviews, or other instruments that gauge learners’ knowledge following a training intervention. Level 3 aims to determine how trainees use newly acquired knowledge, and focuses primarily on actual behavior change. Level 4 seeks to determine what impact a specific training has had at the organizational level, or on the group as a whole.

Figure: The Kirkpatrick model. Source: Kurt S. Kirkpatrick Model: Four Levels of Learning Evaluation – Educational Technology. International Journal of Educational Technology. Published October 24, 2016. Accessed July 17, 2018.

Consider the application of the Kirkpatrick framework to a clinical example below.

The Four Levels: A Case Study

A training hospital has noticed that the mortality rate related to sepsis has increased over the past six months. The Emergency Medicine (EM) residency program was randomly selected to train resident providers on the early recognition of sepsis. A two-week training course was developed and delivered to all EM residents. Below are examples of how Kirkpatrick’s framework could be used to evaluate the effectiveness of this new training program.

Level 1: Reaction
At the conclusion of the training course, residents are asked how much they enjoyed the course, specifically focusing on what they liked and/or disliked about the training. This information is collected using paper survey evaluations.

Level 2: Learning
One month after the initial training program, a refresher session is delivered along with a 25-item questionnaire to determine how well residents have retained the information.

Level 3: Behavior
EM residents are observed by faculty members during several 12-hour shifts. Their performance is assessed with regard to recognizing sepsis and meeting important metrics in treating sepsis. Direct feedback is collected after the observation sessions.

Level 4: Results
Mortality rates across the hospital related to sepsis are compiled and compared to rates before the educational intervention. Efforts are made to link organizational outcomes to EM resident involvement in the cases.


The Kirkpatrick Model is based on the work of Donald Kirkpatrick (1954), which he initially developed to determine whether training was making a significant impact on supervisors. His work has since become one of the most frequently used models for evaluating training programs. In 1959, Kirkpatrick wrote four articles that established the basic principles of his approach to training evaluation. Over the next decade, the model spread until it became one of the standards for evaluation in industry. In the mid-1970s, after his ideas had circulated widely, Kirkpatrick was asked to write a book expanding on the ideas that constitute his model.1

Modern takes or advances in this theory

The original purpose of Kirkpatrick’s framework was to provide business leaders with easily identifiable and measurable outcomes in learners and the organizations for which they worked. The success of this framework in business attracted interest from numerous other fields.2 Kirkpatrick’s framework has been used to evaluate learning outcomes in sales and marketing; computer skills; technical skills; human performance technology; workshops and conferences; business simulations; and “soft-skills” training, such as team building.3

Kirkpatrick’s four levels have also been applied to medical education. For example, the Best Evidence Medical Education (BEME) Collaboration adopted a modified version of Kirkpatrick’s levels as a grading standard for bibliographic reviews. The authors developed a prototype coding sheet using Kirkpatrick’s framework to appraise evidence and grade the impact of educational interventions, with the implication that measuring outcomes at a higher Kirkpatrick level represented higher-quality evidence.2,4 Additionally, numerous researchers have used this approach to evaluate the medical education literature and determine best practices. It has been applied in arenas such as interprofessional education initiatives,5,6 competency-based education and mastery learning,7 teacher training workshops,8 faculty development interventions,9,10 patient safety and quality improvement curricula,11,12 high-fidelity simulation as a learning tool,13 and internet-based medical education.14

Over the years, various adaptations of the Four Levels of Evaluation have been proposed. Hamtini proposed an adaptation of Kirkpatrick’s framework to evaluate the e-learning environment.15 The framework was simplified into three levels rather than four. The interaction phase would gauge learner satisfaction with the e-learning interface and its ease of use. The learning phase would measure actual learning using a pre-test and post-test approach. The final results phase would measure the ability of the employee to function effectively and efficiently, as well as the overall intrinsic and extrinsic benefits to both the employee and employer following e-learning. Shappell et al.16 also proposed an adaptation of Kirkpatrick’s framework for evaluating the online learning environment. For Level 1, they proposed measuring engagement (e.g., involvement in sharing and discussion, time-on-page) as well as satisfaction. For Level 2, they proposed including online quizzes and/or assignments within the curriculum. For Level 3, they recommended measuring transfer of learning through simulated environments, in addition to the workplace.

In their review of the literature regarding interprofessional education initiatives, Barr et al. proposed an expanded model of Kirkpatrick’s levels.5 They proposed that, at Level 2, both modification of attitudes and perceptions (Level 2a) and acquisition of knowledge and skills (Level 2b) be measured. Additionally, they proposed that results be measured both at the level of change in organizational practices (Level 4a) and of benefits to patients (Level 4b).5

In addition to these adaptations, James and Wendy Kirkpatrick (2016), Donald Kirkpatrick’s son and daughter-in-law, respectively, proposed the New World Kirkpatrick Model (NWKM) in an attempt to address the many critiques of Kirkpatrick’s original levels of evaluation. Among other things, they suggested expanding each level to account for confounders; argued that the levels do not have to be evaluated sequentially and that not all programs require the “full evaluation package”; and recommended involving stakeholders in curriculum and evaluation tool development to help determine which outcomes are most important to measure.17,18

Other examples of where this theory might apply in both the classroom and clinical setting

Schumann et al.19 proposed that the Kirkpatrick framework could be used to evaluate simulations as educational tools. While the authors aimed to evaluate business simulations and their impact in the business world, the model could readily be applied to simulation in medical education. They suggested strategies such as having a non-simulation control group, using pre- and post-tests, and reaching out to employers or other observers beyond the learners to evaluate higher-level outcomes. Bewley and O’Neil proposed an adaptation of Kirkpatrick’s framework that could be used to evaluate the effectiveness of medical simulations.20 They suggested adding knowledge or skills retention as a measure of sustained behavioral change over time, which may make evaluation at Level 3 easier to obtain.

Annotated Bibliography of Key Papers

Yardley S and Dornan T. Kirkpatrick’s levels and education ‘evidence’. Medical Education 2012;46:97-106.2

Kirkpatrick’s purpose in developing the Four Level Evaluation framework was to provide business leaders with promptly identifiable and easy-to-measure outcomes from learners and the organizations for which they worked. None of Kirkpatrick’s original references to successful application of the levels came from fields as complex as medical education. Medical education is unique in that it has to meet the needs not only of learners, but also of patients, communities, and health care organizations. The BEME collaboration adopted a modified version of Kirkpatrick’s levels as a grading standard for bibliographic review. Not all evaluation tools, however, are suitable for evaluating all educational programs; the evaluation method should fit the question being asked and the type of evidence being reviewed. Kirkpatrick’s levels are suitable for simple training interventions, where outcomes emerge rapidly and can be easily observed. They are unsuitable for more complex educational interventions, where the most important outcomes are longer term.

Moreau KA. Has the new Kirkpatrick generation built a better hammer for our evaluation toolbox? Medical Teacher 2017;39(9):999-1001.18

Kirkpatrick’s framework has many limitations and has been critiqued by many as inadequate for assessing medical education. Critiques include the difficulty of evaluation at Levels 3 and 4; neglect of confounding variables; the unfounded causal chain of having to progress from Level 1 through 4 sequentially; and the inability of the model to show why educational programs work. This article discusses the New World Kirkpatrick Model (NWKM) as proposed by Kirkpatrick’s son and daughter-in-law, and how it aims to address these critiques. The NWKM suggests involving stakeholders in curriculum development and development of evaluation tools to determine which outcomes are most important to measure. They also suggest expanding levels in order to account for confounders. Additionally, they argue that the levels do not have to be evaluated sequentially, and not all programs require the “full evaluation package”.

Praslova L. Adaptation of Kirkpatrick’s four level model of training criteria to assessment of learning outcomes and program evaluation in higher education. Educ Asse Eval Acc 2010;22:215-25.21

This article makes various suggestions regarding the application of Kirkpatrick’s framework to higher education. The author suggests splitting Level 1 into affective reactions (i.e., enjoyment) and utility judgments (i.e., how much learners believe they have learned). At Level 2, the author suggests that writing samples and speeches could also be used to gauge learning, in addition to traditional pre- and post-tests. For Level 3, the use of knowledge and skills in future classes, internships, or research projects could be used to evaluate the impact of education on behavior. Finally, the author suggests that at Level 4, the beneficiary of the education (the student, society, or the organization) first needs to be clarified before any impact at this level can be measured.


While Kirkpatrick’s framework has been noted over the years to have limitations when it comes to evaluating medical education initiatives, the authors believe that it still offers value for measuring the effectiveness of a new curriculum. Further development and refinement of this theory is ongoing and aims to build on this foundation.

Part 3: The Denouement

For Level 1 evaluation, Jane distributes surveys to the residents regarding their satisfaction with the new components of the conference curriculum. As an additional measurement at Level 1, she surveys the residents about what they feel they have learned from the new curriculum, and how confident they are in applying what they have learned to clinical practice.

For Level 2, Jane develops a tool for objectively measuring residents’ knowledge and skills. After reviewing the learning goals and objectives, she develops a pre-test and post-test relating to these objectives. This objective measurement gives her insight into whether her implementation is having a positive influence on resident education with regard to EM core content.

For Level 3, the evaluation shifts to application. Residents must be observed in the clinical environment, with particular attention to how they are applying knowledge learned through the new curriculum to their clinical practice. This is achieved through direct observation of residents on shift, and by surveying other faculty members about what they observe when working with residents in the emergency department. Alternatively, simulation scenarios can be developed to provide an opportunity to observe residents’ clinical skills as they relate to specific learning objectives. Jane anticipates that the new curriculum will improve retention, and she plans to examine in-service examination scores as another marker of impact.

For Level 4, inarguably the most difficult level to measure, Jane would have to examine impact at the organizational level, or perhaps at the level of patient outcomes. The method of measurement would depend on the learning goals and objectives. For example, if a simulation session were developed to teach procedural skills as part of the new curriculum, Jane and the department would gather data on rates of procedural complications following the educational intervention. This information would be most useful if compared with complication rates prior to the intervention.

Jane successfully completes evaluation at all four levels of Kirkpatrick’s model and can articulate to leadership the impact of her curriculum on resident education and on the hospital.

Kirkpatrick’s framework does not readily apply to all educational interventions, and at times may require some modification in order to assess an intervention’s effectiveness at achieving its stated objectives. Determination of clear goals and objectives, as well as forethought into how the successful achievement of these goals and objectives is to be measured at each level, are key to effectively using this framework to evaluate educational programs.

Don’t miss the 5th post in the series, coming out Tuesday, May 21, 2019!


Reference List:

1. Our Philosophy. Kirkpatrick Partners, The One and Only Kirkpatrick Company®. Accessed July 17, 2018.

2. Yardley S and Dornan T. Kirkpatrick’s levels and education ‘evidence’. Medical Education 2012;46:97-106.

3. McLean S and Moss G. They’re happy, but did they make a difference? Applying Kirkpatrick’s framework to the evaluation of a national leadership program. The Canadian Journal of Program Evaluation 2003;18(10):1-23.

4. Hammick M, Dornan T and Steinert Y. Conducting a best evidence review. Part 1: from idea to data coding. BEME guide no. 13. Medical Teacher 2010;32(1):3-15.

5. Barr H, Freeth D, Hammick M, Koppel I, and Reeves S. Positive and null effects of interprofessional education on attitudes toward interprofessional learning and collaboration. Advances in Health Sciences Education 2012;17(5):651-69.

6. Hammick M, Freeth D, Koppel I, Reeves S, and Barr H. A best evidence systematic review of interprofessional education: BEME guide no. 9. Medical Teacher 2007;29(8):735-51.

7. Bisgaard CH, Rubak SLM, Rodt SA, Petersen JAK, and Musaeus P. The effects of graduate competency-based education and mastery learning on patient care and return on investment: a narrative review of basic anesthetic procedures. BMC Medical Education 2018;18:154. doi: 10.1186/s12909-018-1262-7.

8. Piryani RM, Dhungan GP, Piryani S, and Neupane MS. Evaluation of teachers training workshop at Kirkpatrick level 1 using retro-pre questionnaire. Advances in Medical Education and Practice 2018;9:453-7.

9. Steinert Y, Mann K, Anderson B, Barnett BM, Centeno A, Naismith L, Prideaux D, Spencer J, Tullo E, Viggiano T, Ward H, and Dolmans D. A systematic review of faculty development initiatives designed to enhance teaching effectiveness: A 10-year update: BEME guide no. 40. Medical Teacher 2016;38(8):769-86.

10. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, and Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME guide no. 8. Medical Teacher 2006;28(6):497-526.

11. Walpola RL, McLachlan AJ, and Chen TF. A scoping review of peer-led education in patient safety training. American Journal of Pharmaceutical Education 2018;82(2):115-23.

12. Wong BM, Etchells EE, Kuper A, Levinson W, and Shojania KG. Teaching quality improvement and patient safety to trainees: a systematic review. Academic Medicine 2010;85(9):1425-39.

13. Issenberg SB, McGaghie WC, Petrusa ER, Gordon DL, and Scalese RJ. Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review. Medical Teacher 2005;27(1):10-28.

14. Wong G, Greenhalgh T, and Pawson R. Internet-based medical education: a realist review of what works, for whom, and in what circumstances. BMC Medical Education 2010;10:12.

15. Hamtini TM. Evaluating e-learning programs: an adaptation of Kirkpatrick’s model to accommodate e-learning environments. Journal of Computer Science 2008;4(8):693-8.

16. Shappell E, Chan T, Thoma B, Trueger NS, Stuntz B, Cooney R, and Ahn J. Crowdsourced curriculum development for online medical education. Cureus 2017;9(12):e1925.

17. Kirkpatrick JD and Kirkpatrick WK. Kirkpatrick’s four levels of training evaluation. Alexandria (VA): ATD Press; 2016.

18. Moreau KA. Has the new Kirkpatrick generation built a better hammer for our evaluation toolbox? Medical Teacher 2017;39(9):999-1001.

19. Schumann PL, Anderson PH, Scott TW, and Lawton L. A framework for evaluating simulations as educational tools. Developments in Business Simulation and Experiential Learning 2001;28:215-20.

20. Bewley WL and O’Neil HF. Evaluation of medical simulations. Military Medicine 2013;178(10):64-75.

21. Praslova L. Adaptation of Kirkpatrick’s four level model of training criteria to assessment of learning outcomes and program evaluation in higher education. Educ Asse Eval Acc 2010;22:215-25.