This chapter is a part of Section A(a) from the old 2011 Primary Syllabus: "Describe the features of evidence-based medicine, including levels of evidence (eg. NH&MRC), meta-analysis and systematic review". The levels of evidence are also rehashed briefly in the "Levels of evidence and grading the quality of recommendations" chapter from the Required Reading section for the Fellowship Exam. For whatever reason, this topic keeps moving between the primary and fellowship curricula, as do many of these statistics topics.
In the Primary exam, this comes up as Question 19 from the second paper of 2010 and the virtually identical Question 8 from the second paper of 2013. Additionally, Question 17 from the first Fellowship paper of 2012 asked for "a classification for the levels of evidence used for therapeutic studies in EBM". It also comes up in Viva 2 from the second paper of 2012, but because I wrote them it is hardly a fact worth mentioning.
In the college answer to Question 19, a text reference was offered (Myles & Gin Statistical methods for Anaesthesia and Intensive Care, pg114-118), and some attempt was made to use this canonical resource as the main source for this summary. Unfortunately, levels of evidence are essentially untouched by that book up until the very end of page 117, where they are listed - and then discussed over the span of a paragraph. Probably the only insight afforded by Myles and Gin was that "the dimensions of evidence are all important: level, quality, relevance, strength and magnitude of effect." Fortunately, good analysis can be found in "Levels of evidence" by Wright et al, 2006.
Apart from the NHMRC system, there are several other systems for rating evidence to choose from, and some of these are added to the list below, even though according to the primary examiners they would not have attracted any marks as part of a "good answer".
For a "good answer", the examiners wanted you to regurgitate the following:
That would probably be enough. One can safely stop reading here.
However, if one is cursed with an insatiable lust for hierarchical grading systems, one might already recognise that in actual fact the NHMRC levels of evidence have more strata than the college answer might suggest. The entire classification system is discussed in the NHMRC document "NHMRC additional levels of evidence and grades for recommendations for developers of guidelines". Instead of wading through the entire 23-page morass, the time-poor candidate is invited to explore Table 3 on page 15. In brief:
To make things more confusing, there are other grading systems, which appear to have no weaker validity than the Australian NHMRC, and which have some degree of international recognition. For example:
Grade | Definition |
---|---|
High | High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. |
Moderate | Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. |
Low | Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. |
Insufficient | Evidence either is unavailable or does not permit a conclusion. |
All of these classification systems are easily and quickly available from Wikipedia. With such an array of classification systems and definitions, what is one to do? When reading an article, one might see a reference to a Level II study. Is that an RCT, a systematic review, a "well-designed controlled trial without randomization", or something else entirely, defined arbitrarily by yet another bureaucratic health agency?
One could go quite mad reading too deeply about this. In order to gain insight while retaining vestiges of sanity, the exam candidate may wish to limit themselves to the three editorials on the levels of evidence which were published (of all places!) in the British volume of the Journal of Bone and Joint Surgery. The specific articles are by Tovey and Bognolo, by Carr, and by F.T. Horan. The editorials offer arguments and counterarguments regarding the use of this classification system.
Sackett, David L., et al. "Evidence based medicine: what it is and what it isn't." (1996): 71-72.
Brown, Gary C., Melissa M. Brown, and Sanjay Sharma. "Value-based medicine: evidence-based medicine and beyond." Ocular immunology and inflammation 11.3 (2003): 157-170.
Wood, Beverly P. "What's the Evidence? 1." Radiology 213.3 (1999): 635-637.
Cook, D. J., and M. K. Giacomini. "The integration of evidence based medicine and health services research in the ICU." Evaluating Critical Care. Springer Berlin Heidelberg, 2002. 185-197.
Kotur, P. F. "Evidence-Based Medicine in Critical Care." Intensive and Critical Care Medicine. Springer Milan, 2009. 47-57.
NHMRC additional levels of evidence and grades for recommendations for developers of guidelines
Robert Lawrence; U. S. Preventive Services Task Force Edition (1989). Guide to Clinical Preventive Services.
Canadian Task Force on the Periodic Health Examination. (3 November 1979). "Task Force Report: The periodic health examination." Can Med Assoc J. 121 (9): 1193–1254.
Wright, J. G., M. Swiontkowski, and J. D. Heckman. "Levels of evidence." Bone & Joint Journal 88.9 (2006): 1264-1264.
Horan, F. T. "Judging the evidence." J Bone Joint Surg [Br] 2005;87-B:1589–90
Tovey D, Bognolo G. Levels of evidence and the orthopaedic surgeon. J Bone Joint Surg [Br] 2005;87-B:1591–2
Carr AJ. Evidence-based orthopaedic surgery: what type of research will best improve clinical practice? J Bone Joint Surg [Br] 2005;87-B:1593–4
Harvie, P., et al. "The use of outcome scores in surgery of the shoulder." Bone & Joint Journal 87.2 (2005): 151-154.
Bhandari, Mohit, et al. "Interobserver agreement in the application of levels of evidence to scientific papers in the American volume of the Journal of Bone and Joint Surgery." J Bone Joint Surg Am 86.8 (2004): 1717-1720.