#AppliedMedEdMethods101 – Retrospective cohort studies

(This is the fourth post of our #AppliedMedEdMethods101 series. View the others here: Beyond the RCT, Pre-Post Simulation, and Discourse Analysis.)

By Debra Pugh (@Debrapugh2013)

You have been administering an OSCE progress test to residents from PGY1-4 annually for the past 8 years.  This OSCE is viewed as an important educational activity, but you wonder if performance on the OSCE can predict who is at risk of subsequently failing their Royal College specialty examination.  How could you approach this research question?

When a researcher has access to pre-existing data, performing a retrospective cohort study can be an expeditious way of answering a research question of interest.  A cohort refers to a group of individuals who share certain characteristics (e.g., residents in a training program).  In a cohort study, this group is followed over time to identify associations with an outcome of interest (Mann 2003).

A retrospective cohort study uses data that have already been collected.  This is in contrast to prospective cohort studies, in which a group of individuals is identified and data about them are then gathered moving forward in time.  It is important to note that cohort studies are observational (i.e., there is no intervention on the part of the researchers).  This can be useful when ethical issues would preclude a randomized controlled trial (e.g., allowing some residents to participate in an important educational activity while excluding others).

The OSCE progress test described above was developed as a means of helping to prepare Internal Medicine residents for their Royal College examination.  Anecdotally, it seemed that residents who performed ‘as expected’ on the OSCE subsequently went on to pass their Royal College examination, while those who were ‘below expectations’ required some sort of remediation.  We had no evidence to support these observations; however, we had several years’ worth of data.  In this case the cohort was Internal Medicine residents at our institution and, because the data had already been collected, it was a retrospective study.

The results of cohort studies are typically reported as risk ratios, which are a measure of the relative risk of the outcome of interest.  For example, one might want to determine the likelihood of a resident matching to a program depending on whether or not they did an elective in that program.  The outcome of interest in our study was the risk of failing the Royal College examination.  We used logistic regression analyses to relate OSCE scores to an ‘elevated risk of failure’ on the Royal College examination.  From this, predictive probability curves were derived, demonstrating that OSCE scores (treated as a continuous variable) were indeed predictive of an ‘elevated risk of failure’ (treated as a categorical variable).  Simply put, the lower a resident’s score on the OSCE, the higher their risk of failing the Royal College examination (Pugh et al. 2016).
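To make these two quantities concrete, here is a minimal sketch in Python. The 2×2 counts for the elective example and the logistic coefficients are invented for illustration only; they are not taken from the study described above.

```python
import math

def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Relative risk: risk of the outcome in the exposed group
    divided by the risk in the unexposed group."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# Hypothetical elective example: 30 of 50 residents who did an elective
# matched to the program, versus 15 of 50 who did not.
print(risk_ratio(30, 50, 15, 50))  # 2.0; twice as likely to match

def failure_probability(osce_score, intercept=8.0, slope=-0.15):
    """Logistic (predictive probability) curve with invented coefficients:
    the negative slope means a lower OSCE score yields a higher predicted
    probability of an 'elevated risk of failure'."""
    return 1 / (1 + math.exp(-(intercept + slope * osce_score)))

# Lower score, higher predicted risk:
print(round(failure_probability(40), 2))  # 0.88
print(round(failure_probability(80), 2))  # 0.02
```

In a real analysis the intercept and slope would be estimated from the cohort data by fitting a logistic regression, rather than chosen by hand as here.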

Now, it is important to note that many cohort studies treat variables categorically (either dichotomously or using quantiles); however, this may not always make sense, as it can lead to a loss of information (Turner et al. 2010).  In our example, we could have chosen to categorize OSCE scores using a median split or quartiles.  However, there was no theoretical rationale for this, so we chose to treat the data as continuous.  In contrast, we created a binary variable that categorized residents as being either ‘at elevated risk’ or ‘not at elevated risk’ of failing the RCPSC exam.  We could also have created a binary pass/fail variable; however, failure rates on this exam are historically quite low, and we were worried that this categorization would lead to under-identification of at-risk individuals, hence we used the concept of an ‘elevated risk’ of failure.
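As a quick illustration of what a median split discards, here is a sketch with invented OSCE scores; the numbers, cut point, and groups are for illustration only, not from our data.

```python
import statistics

# Hypothetical OSCE scores (percent), invented for illustration.
scores = [48, 52, 55, 58, 61, 63, 66, 70, 74, 79]

# Median split: every score becomes just 'low' or 'high', so a 61 and a 48
# are treated identically; this is the information loss the text warns about.
cut = statistics.median(scores)
groups = ['low' if s < cut else 'high' for s in scores]
print(cut)     # 62.0
print(groups)  # five 'low' followed by five 'high'
```

Treating the scores as continuous, as in our study, keeps the distance of each score from the cut point available to the regression model.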

One advantage of retrospective cohort studies is that they are relatively quick and inexpensive to conduct compared with prospective cohort studies, because the data have already been collected.  This also helps to minimize bias, because the outcome of interest was not known at the time the data were being collected.

However, one of the disadvantages researchers face is that they only have access to whatever data were originally collected.  Thus, there may be important information that cannot be included in the analyses simply because it was never captured as a data field.  Similarly, researchers must contend with missing data, which may limit the generalizability of their results.

Perhaps the most important limitation of retrospective cohort studies relates to confounding variables.  Confounding variables influence both the independent variable (e.g., OSCE scores) and the dependent variable (e.g., risk of failing the RCPSC examination).  An example might be participation in a preparatory course, which could conceivably affect both variables and thereby confound the results.

Retrospective cohort studies are attractive in medical education research because they use existing data to estimate relative risk.  This design can be useful when it is not feasible or ethical to perform a randomized controlled trial or prospective study.  Depending on the quality of the available data, interesting associations can be identified.  However, incomplete data and confounding variables may limit one’s ability to draw meaningful conclusions.

Key References

Mann CJ.  Observational research methods. Research design II: cohort, cross sectional and case-control studies.  Emerg Med J 2003; 20: 54-60.

Pugh D, Bhanji F, Cole G, Dupre J, Hatala R, Humphrey-Murto S, Touchie C, Wood TJ.  Do OSCE progress test scores predict performance in a national high-stakes examination? Med Educ 2016; 50: 351-358.

Turner EL, Dobson JE, Pocock SJ.  Categorisation of continuous risk factors in epidemiological publications: a survey of current practice.  Epidemiol Perspect Innov 2010; 7: 9.

Featured image via markmags on Pixabay