You have taken over the directorship of a district hospital ICU. Part of your mandate is to establish a Quality Assurance program.
(b) What is the relevance of Evidence Based Medicine to your patients and how will you apply this?
Evidence Based Medicine has been defined as the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. It is not new, let alone revolutionary. Its relevance to the candidate's practice is its ability to add to clinical experience, basic science and physiological principles.
Unfortunately an individual would be unable to review and critically assess all the literature available in all languages. Practitioners are dependent on reviews, meta-analyses and expert opinions. Many questions have yet to be answered effectively or in many cases are yet to be addressed at all. Other questions are beyond scientific assessment eg the use of no antibiotic in pneumonia. A complete appreciation of EBM requires review of the literature, audit of local practice ie techniques/management in one’s own ICU, implementation of EBM based practice and follow-up audit of results. Although not itself assessed by trials, EBM, by scientific appraisal and review, formalises an aspect of quality improvement which should be relevant to ICU practice.
Evidence based medicine is the system of critical evaluation of published data for applicability to the management of individual patients, or something.
The college regales us with the Sackett definition of EBM:
"Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."
Again, one could digress extensively here, scoring virtually no marks.
Were such an essay-style question ever to return to CICM fellowship papers, one would rant creatively, using the following points as a skeleton:
Relevance of EBM to ICU practice
"How will you apply this?"
Cook, D. J., and M. K. Giacomini. "The integration of evidence based medicine and health services research in the ICU." Evaluating Critical Care. Springer Berlin Heidelberg, 2002. 185-197.
Kotur, P. F. "Evidence-Based Medicine in Critical Care." Intensive and Critical Care Medicine. Springer Milan, 2009. 47-57.
An article appears reporting the positive effects of a new agent in a trial of 50 patients with septic shock.
(c) What criteria will you use to assess the validity of this article to your ICU?
The criteria for assessment of such an article include:
• Is the trial's design valid, and is it powered to achieve a result? It seems doubtful in this case, but a large effect in a specific group may be detected.
• Was the hypothesis based on valid evidence?
• Were all the entered patients accounted for?
• Were the groups equivalent after randomisation?
• Was there proper blinding of study personnel?
• Apart from the experimental intervention were the groups treated equivalently?
• Was the statistical analysis appropriate?
• How large was the treatment effect?
• Can the results be applied to my patients?
Though not word-for-word identical, this question closely resembles Question 8 from the second paper of 2012, as well as Question 8 from the first paper of 2004. It discusses the assessment of the validity of a randomised controlled trial, which is discussed in greater detail elsewhere.
The answer is reproduced below, to simplify revision and damage SEO:
Outline the way you would calculate and how you might use the following features of a diagnostic test: sensitivity, specificity, positive predictive value and negative predictive value.
| Test     | Disease present | Disease absent | Total   |
|----------|-----------------|----------------|---------|
| Positive | A               | B              | A+B     |
| Negative | C               | D              | C+D     |
| Total    | A+C             | B+D            | A+B+C+D |
Sensitivity = proportion of patients with disease detected by positive test = A/(A+C)
Specificity = proportion of patients without disease detected by negative test = D/(B+D)
Positive predictive value = proportion of patients with positive test who have disease = A/(A+B)
Negative predictive value = proportion of patients with negative test who do not have disease = D/(C+D)
Very high sensitivity means few false negatives. Very high specificity means few false positives.
This question closely resembles a whole mass of other questions:
The questions may not be identical, but they test the exact same concepts. Here's a helpful list of equations the college expects us to memorise.
Sensitivity = true positives / (true positives + false negatives)
This is the proportion of patients with disease who are correctly identified by the test.
Specificity = true negatives / (true negatives + false positives)
This is the proportion of patients in whom the disease was correctly excluded.
Positive predictive value = (true positives / total positives)
This is the proportion of patients with positive test results who are correctly diagnosed.
Negative predictive value = (true negatives / total negatives)
This is the proportion of patients with negative test results who are correctly diagnosed.
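These four formulae are mechanical enough to express as code. Below is a minimal sketch (my own, not from any college answer); `diagnostic_metrics` is a hypothetical helper taking the four cell counts of the 2x2 contingency table.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table cell counts.

    tp, fp, tn, fn = true/false positives and negatives
    (i.e. cells A, B, D and C of the contingency table).
    """
    return {
        "sensitivity": tp / (tp + fn),  # disease present, correctly detected
        "specificity": tn / (tn + fp),  # disease absent, correctly excluded
        "ppv": tp / (tp + fp),          # positive results which are true
        "npv": tn / (tn + fn),          # negative results which are true
    }

# An arbitrary worked example: A=70, B=40, C=30, D=60
m = diagnostic_metrics(tp=70, fp=40, tn=60, fn=30)
print(m)  # sensitivity 0.7, specificity 0.6, PPV ~0.636, NPV ~0.667
```

Note that sensitivity and specificity are properties of the test itself, whereas PPV and NPV also depend on how common the disease is in the tested population.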
Compare and contrast the use of the Chi-squared test, Fisher’s Exact Test and logistic regression when analysing data.
All these tests are widely used in the statistical reporting of data and give a representation of the likelihood that a given spread of data occurs by chance.
The Chi-square(d) statistic is used when comparing categorical data (e.g. counts). Often, these data are simply displayed in a "contingency table" with R rows and C columns. Its use is less appropriate where total numbers are small (e.g. N < 20) or the smallest expected value is less than 5.
Fisher's Exact test is used when comparing categorical data (e.g. counts), but is only generally applicable in a 2 x 2 contingency table (2 columns and 2 rows). It is specifically indicated when total numbers are small (e.g. N < 20) or the smallest expected value is less than 5.
Logistic regression is used when comparing a binary outcome (e.g. yes/no, lived/died) with other potential variables. Logistic regression is most commonly used to perform multivariable analysis ("controlling for" various factors), and these variables can be either categorical (e.g. gender), or continuous (e.g. weight), or any combination of these. The standard ICU mortality predictions are based on logistic regression analysis.
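To make the mechanics of the first two tests concrete, here is a hedged sketch in pure Python (function names are my own; a real analysis would of course use a statistics package): the Pearson chi-square statistic, and a one-sided Fisher exact p-value computed as a hypergeometric tail sum.

```python
from math import comb

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    # expected count for each cell = (row total * column total) / n
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: the probability, with the table
    margins fixed, of a table at least as extreme as [[a, b], [c, d]]
    in the direction of larger a."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    def prob(k):  # P(first cell = k) under the hypergeometric distribution
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)
    return sum(prob(k) for k in range(a, min(row1, col1) + 1))

# Example: a hypothetical biomarker table, 70/40 vs 30/60
print(round(chi_square_2x2(70, 40, 30, 60), 2))  # 18.18
```

With small expected counts the chi-square approximation degrades, which is exactly when the exact tail sum above (Fisher) is preferred.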
When one is invited to "compare and contrast" things, one is well served by a table structure.
First, the prose form: much of what follows is heavily borrowed from LITFL.
Additional reading can be done, if one wishes to actually understand these concepts.
I recommend the following free online resources:
Additionally, I invite everybody to visit this page, where the author Steve Simon (presumably, somebody qualified in statistics) responds to an email he received which asked him to comment on the differences between a Chi-square test, Fisher's Exact test, and logistic regression.
The Chi-square test: a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. The chi-square test can be used to test for the "goodness of fit" between observed and expected data.
Fisher's Exact test: another test, like the Chi-square test, used to compare observed data with expected data.
Now that the prose is finished, let us tabulate the differences and similarities between these tests.
|               | Chi-square | Fisher's Exact Test | Logistic regression |
|---------------|------------|---------------------|---------------------|
| Application   | All three "give a representation of the likelihood that a given spread of data occurs by chance" | | |
| Specific uses | Nominal data: large samples | Nominal data: small samples | Binary variables |
| Advantages    | Simple; applicable to R x C tables | Exact; valid for small samples | Permits multivariable analysis, with categorical and continuous predictors |
| Limitations   | Less appropriate where N < 20 or the smallest expected value is less than 5 | Generally only applicable to 2 x 2 tables | Requires a binary outcome |
The ideal reference for this is the BMJ, with their combination of rich statistics info and Old-World credibility. I link to the relevant sections of their Statistics at Square One, by T D V Swinscow.
Outline the techniques you would use to assess the methodological quality of a placebo controlled prospective randomised clinical trial.
Various checklists are available for assessing methodological quality. One such list is that proposed by David Sackett. It includes 3 main questions:
• Was assignment randomised, and was the randomisation list concealed (minimising the potential for bias)?
• Was follow-up of patients sufficiently long and complete (ensuring endpoints are accurately assessed)?
• Were patients analysed in the groups to which they were randomised (maintaining the benefits of randomisation)?
It also includes 3 finer points to address:
• Were patients and clinicians (and outcome assessors) kept blind to treatment (minimising bias)?
• Were groups treated equally apart from the experimental treatment (ensuring the intervention effect is the only thing being assessed)?
• Were the groups similar at the start of the trial (were there any potentially confounding effects that randomisation did not eliminate)?
In addition to these, the study should have enrolled enough patients to be sufficiently powered to detect the perceived clinically important benefit in the primary outcome variable. Standardised criteria have also been published (CONSORT) that were recommended to facilitate consistency and clarity in studies submitted for publication, allowing the reader to more readily assess the internal and external validity of a study.
(Sackett DL et al (eds.). Evidence-based medicine. Churchill Livingstone, London, 2000.
Begg C et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996 Aug 28;276(8):637-9.)
Though not word-for-word identical, this question closely resembles Question 8 from the second paper of 2012. It discusses the assessment of the validity of a randomised controlled trial, which is discussed in greater detail elsewhere.
In brief:
Compare and contrast the roles of parametric and non-parametric tests in analysing data, including examples of types of data and appropriate tests.
Parametric tests are used to compare different groups of continuous variables when the data is normally (or near-normally) distributed. Non-parametric tests do not make any assumptions about the distribution of data. They focus on order rather than absolute values, and are used to analyse data that is abnormally distributed (eg. significantly skewed) or data which represent ordered categories but may not be linear (eg. pain scores, ASA score, NYHA score). Commonly used parametric tests include the unpaired t-test (comparing 2 different groups with continuous variables [eg. age in males/females]) and variations of the ANalysis Of VAriance (ANOVA: comparing multiple groups with continuous variables [eg. PaO2:FIO2 ratio in Medical/Surgical/Trauma patients]). Commonly used non-parametric tests include the Mann-Whitney U test (comparing 2 different groups with continuous variables [eg. ICU stay in males/females]) and the Kruskal-Wallis test (comparing continuous variables in more than 2 groups [eg. pain score with PCA/epidural/s-c morphine]).
You use these to figure out the p-value, i.e. the chance of getting the same results if the null hypothesis were true. There are parametric and non-parametric tests.
Description of parametric tests
Parametric tests are more accurate, but require assumptions to be made about the data, eg. that the data is normally distributed (in a bell curve). If the data deviate strongly from the assumptions, the parametric test could lead to incorrect conclusions.
If the sample size is too small, parametric tests may lead to incorrect conclusions due to the loss of "normality" of sample distribution.
Examples of parametric tests:
Description of non-parametric tests
Non-parametric tests make no assumptions about the distribution of the data. If the assumptions for a parametric test are not met (eg. the distribution has a lot of skew in it), one may be able to use an analogous non-parametric tests.
Non-parametric tests are particularly good for small sample sizes (<30). However, non-parametric tests have less power.
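As an illustration of how a rank-based test ignores the actual magnitudes of the data, here is a naive sketch of the Mann-Whitney U statistic in pure Python (my own illustration; a real analysis would use a statistics library, which also converts U into a p-value):

```python
def mann_whitney_u(x, y):
    """Naive Mann-Whitney U statistic for sample x against sample y:
    the number of (xi, yj) pairs in which xi exceeds yj, with ties
    counting as half. Only the ordering of values matters."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical ICU length-of-stay data (days) in two small groups:
print(mann_whitney_u([2, 3, 5, 9], [1, 2, 4, 4]))  # 11.5
# Any monotonic transformation of the data (eg. doubling every value)
# leaves U unchanged, because only the ranks matter.
```

This is why such tests tolerate skewed data: an extreme outlier contributes no more to U than a value which is merely "largest".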
Examples of non-parametric tests:
Hoskin, Tanya. "Parametric and Nonparametric: Demystifying the Terms." Mayo Clinic CTSA BERD Resource, 2012. Retrieved from http://www.mayo.edu/mayo-edudocs/center-for-translational-science-activities-documents/berd-5-6.pdf
For each of the following terms, provide a definition, outline their derivation and outline their role:
| Test     | Disease present | Disease absent | Total   |
|----------|-----------------|----------------|---------|
| Positive | A               | B              | A+B     |
| Negative | C               | D              | C+D     |
| Total    | A+C             | B+D            | A+B+C+D |
Using the presence or absence of a disease, and the result of a specific test, as an example: Sensitivity = proportion of patients with disease detected by positive test = A/(A+C). Very high values are essential if one wishes to detect everyone with the disease, allowing a negative result to virtually rule out the diagnosis.
Specificity = proportion of patients without disease detected by negative test = D/(B+D). Very high values of specificity are essential if one wishes to correctly identify all those without the disease, allowing a positive result to rule in the diagnosis.
Positive predictive value = proportion of patients with positive test who have disease = A/(A+B). PPV allows estimate of certainty around positive result.
Negative predictive value = proportion of patients with negative test who do not have disease = D/(C+D). NPV allows estimate of certainty about a negative result.
Later papers focus merely on the candidate's ability to apply the formulae.
One can make a strong argument for a return to questions which test one's understanding of the actual concept, rather than demanding the regurgitation of rote-learned equations.
To rote-learn the abovementioned equations, here is a helpful list.
Sensitivity = true positives / (true positives + false negatives)
This is the proportion of patients with disease who are correctly identified by the test.
Specificity = true negatives / (true negatives + false positives)
This is the proportion of patients in whom the disease was correctly excluded.
Positive predictive value = (true positives / total positives)
This is the proportion of patients with positive test results who are correctly diagnosed.
Negative predictive value = (true negatives / total negatives)
This is the proportion of patients with negative test results who are correctly diagnosed.
Altman, Douglas G., and J. Martin Bland. "Statistics Notes: Diagnostic tests 2: predictive values." BMJ 309.6947 (1994): 102.
“The absence of evidence of effect does not imply evidence of absence of effect”. Please explain how this statement applies to evaluation of the medical literature.
Candidates were expected to think more broadly than just the “power” of a study. Consider:
• No evidence: the question was never asked
• Low level evidence: physiological data only, or animal data only
• Ethical barriers to conducting the definitive study
• Unanswerable for logistic reasons
• Retrospective studies or case series only
• Poorly designed existing studies (problems related to blinding, allocation concealment, loss to follow-up, intention to treat, uniform management apart from the intervention, appropriate statistical methods, etc.)
• Meta-analysis pitfalls: significant disagreements with subsequent RCTs
• Type 2 error: false acceptance of the null hypothesis due to inadequate power (eg. small single-centre studies)
This question recalls a more uncivilised time, when bewildered CICM fellowship candidates were assailed by vaguely worded essay questions in an attempt to wring some sort of creative lateral thinking from their algorithmic reptile brains. The resulting confusion can be observed even in the college answer, which (rather than defending any particular argument) instead exhorts us to think "broadly", and then presents us with a word salad of key phrases to consider. The modern papers are thankfully free from this sort of thing.
If one were to take this question seriously, one would structure one's response in the following manner:
Definition
“The absence of evidence of effect does not imply evidence of absence of effect” is a rebuttal to the Argument from Ignorance, which (put simply) states that if something has not been proven true, then it must be false. The rebuttal addresses the third possibility, that the currently available evidence has failed to detect a phenomenon. In the interpretation of medical literature, this means that a study that has failed to demonstrate the evidence of a risk has not succeeded in demonstrating the absence of risk. Similarly, a study which has failed to demonstrate a significant difference between two treatments has not demonstrated the absence of difference, only the absence of evidence of a difference.
Rationale
The idea that the absence of evidence for a phenomenon should imply that there is no such phenomenon is known in the form of the Kehoe principle, named after Robert Kehoe who argued that the use of leaded petrol was safe because at that stage there was no evidence to the contrary. The opposite view is known as the Precautionary Principle. It holds that in the absence of evidence, one must take a conservative stance and manage uncertain risks in a manner which most effectively serves human safety.
Advantages
In the absence of evidence, the precautionary principle recommends that the clinician takes reasonable measures to avoid threats that are serious and plausible. In this, it may be a more humanistic principle than the alternatives (such as the Expected Utility Theory).
In brief:
Disadvantages
In its strongest formulation, the Precautionary Principle calls for absolute proof of safety before new treatments or techniques are adopted. Such stringent standards may result in an excessive regulation of potentially useful treatment strategies. One may envision a reductio ad absurdum where table salt is outlawed because there is insufficient evidence for its safety. Some authors have suggested that the precautionary principle "replaces the balancing of risks and benefits with what might best be described as pure pessimism". Furthermore, not all experimental questions can be answered with high-level evidence (eg. in the case of rare diseases with insufficient sample size for RCTs, or in the cases where it is unethical to randomise intervention).
Published data may not offer sufficient evidence. The power of a study influences its ability to discern an effect of a given size, and it is possible that small studies are inadequately powered to detect a small treatment effect. Type 2 errors can be committed in this way.
In brief:
In summary:
There is a danger of misinterpreting "negative studies", because studies which have not found statistically significant differences in effect may have been inadequate to detect such an effect. In careful interpretation of medical literature one must be alert to the idea that not all negative studies are truly "negative". Decision-making in uncertainty should be guided by humanistic principles and careful risk-vs-benefit analysis.
Foster, Kenneth R., Paolo Vecchia, and Michael H. Repacholi. "Science and the precautionary principle." Science 288.5468 (2000): 979-981.
Alban, S. "The 'precautionary principle' as a guide for future drug development." European Journal of Clinical Investigation 35.s1 (2005): 33-44.
Peterson, Martin. "The precautionary principle should not be used as a basis for decision-making." EMBO Reports 8.4 (2007): 305-308.
Altman, Douglas G., and J. Martin Bland. "Statistics notes: Absence of evidence is not evidence of absence." BMJ 311.7003 (1995): 485.
Resnik, David B. "The precautionary principle and medical decision making." Journal of Medicine and Philosophy 29.3 (2004): 281-299.
Rabin, Matthew. "Risk aversion and expected-utility theory: A calibration theorem." Econometrica 68.5 (2000): 1281-1292.
Alderson, Phil. "Absence of evidence is not evidence of absence." BMJ 328.7438 (2004): 476-477.
In the context of clinical trials, define the following terms:
(a) Relative risk
(b) Absolute risk
(c) Number needed to treat
(d) Power of the study
A number of potential definitions exist. One example for each is listed below:
Relative risk: the difference in event rates between 2 groups expressed as proportion of the event rate in the untreated group.
Absolute risk: this is the actual event rate in the treatment or the placebo group. The absolute risk reduction is the arithmetical difference between the event rates between the two groups
Number Needed to Treat: The NNT is the number of patients to whom a clinician would need to administer a particular treatment for 1 patient to receive benefit from it. It is calculated as 100 divided by the absolute risk reduction when expressed as a percentage or 1 divided by the absolute risk reduction when expressed as a proportion.
Power of the study: The probability that a study will produce a significant difference at a given significance level is called the power of the study. It will depend on the difference between the populations compared, the sample size and the significance level chosen.
This question is a verbatim copy of Question 9 from the second paper of 2010.
In the context of a clinical trial, define and explain the significance of the following terms:
a) Intention to treat analysis.
b) Randomization.
ITT is the process by which the patients are analysed in the group to which they are randomised.
There are four major lines of justification for intention-to-treat analysis.
1. Intention-to-treat simplifies the task of dealing with suspicious outcomes, that is, it guards against conscious or unconscious attempts to influence the results of the study by excluding odd outcomes.
2. Intention-to-treat guards against bias introduced when dropping out is related to the outcome.
3. Intention-to-treat preserves the baseline comparability between treatment groups achieved by randomization.
4. Intention-to-treat reflects the way treatments will perform in the population by ignoring adherence when the data are analyzed.
RANDOMISATION is the process of assigning clinical trial participants to treatment groups. Randomisation gives each participant a known (usually equal) chance of being assigned to any of the groups. Successful randomisation requires that group assignment cannot be predicted in advance.
Randomisation aims to obviate the possibility that there is a systematic difference (or bias) between the groups due to factors other than the intervention. Allocation of participants to specific treatment groups in a random fashion ensures that each group is, on average, as alike as possible to the other group(s). The process of randomisation aims to ensure similar levels of all risk factors in each group; not only known, but also unknown, characteristics are rendered comparable, resulting in similar numbers or levels of outcomes in each group, except for either the play of chance or a real effect of the intervention(s). Concealment of randomisation is vital.
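As a toy illustration of these properties, here is a sketch (entirely my own, with hypothetical group labels) of permuted-block randomisation, a common scheme which keeps the group sizes balanced while keeping individual assignments unpredictable:

```python
import random

def block_randomise(n_patients, block_size=4, seed=None):
    """Allocate n_patients 1:1 to two hypothetical arms using permuted
    blocks: within every block of `block_size`, exactly half go to each
    arm, but the order inside each block is random."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = (["treatment"] * (block_size // 2)
                 + ["control"] * (block_size // 2))
        rng.shuffle(block)  # unpredictable order within each block
        allocations.extend(block)
    return allocations[:n_patients]

print(block_randomise(8, seed=42))
```

Concealment is a separate, procedural requirement: the generated list must be hidden from recruiting clinicians (eg. via opaque envelopes or a central service), since even a perfectly random list can be subverted if it is visible in advance.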
A brief answer to these questions is possible. However, by asking that the candidate "explain the significance" of these concepts, the college has authorised a torrent of gibberish. One could really get carried away with this.
a)
Definition of intention to treat analysis: This is the practice of grouping patient data according to the randomised allocation of the patient, rather than according to the treatment which they received.
According to Fischer et al,
"ITT analysis includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence with the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol."
Significance of intention to treat analysis:
However:
b)
Definition of randomisation: This is the practice of deliberately haphazard allocation of patients to study groups, in order to simulate the effect of chance. Randomisation gives each participant an equal chance of being assigned to any of the groups. Successful randomisation involves a process of allocation which cannot be predicted or "gamed" prior to allocation.
Significance of randomisation:
Montori, Victor M., and Gordon H. Guyatt. "Intention-to-treat principle."Canadian Medical Association Journal 165.10 (2001): 1339-1341.
Gupta, Sandeep K. "Intention-to-treat concept: A review." Perspectives in clinical research 2.3 (2011): 109.
Fisher LD, Dixon DO, Herson J, Frankowski RK, Hearron MS, Peace KE. Intention to treat in clinical trials. In: Peace KE, editor. Statistical issues in drug research and development. New York: Marcel Dekker; 1990. pp. 331-50. (Not even a sample exists online! I was forced to quote from Gupta et al.)
Beller, Elaine M., Val Gebski, and Anthony C. Keech. "Randomisation in clinical trials." Medical Journal of Australia 177.10 (2002): 565-567.
Moher, David, Kenneth F. Schulz, and Douglas G. Altman. "The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials." BMC Medical Research Methodology 1.1 (2001): 2.
Herbert, Robert D. "Randomisation in clinical trials." Australian Journal of Physiotherapy 51.1 (2005): 58-60.
Kunz, Regina, and Andrew D. Oxman. "The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials." Bmj317.7167 (1998): 1185-1190.
Altman, D. G., and C. J. Dore. "Randomisation and baseline comparisons in clinical trials." The Lancet 335.8682 (1990): 149-153.
Zelen, Marvin. "The randomization and stratification of patients to clinical trials."Journal of chronic diseases 27.7 (1974): 365-375.
To evaluate a new biomarker as an early index of bacteraemia, you perform the measurement in a consecutive series of 200 critically ill septic patients. You find that 100 of these patients had subsequently proven bacteraemia. Of these, 70 had a positive biomarker result. Of the remaining 100 patients without bacteraemia, 40 had a positive biomarker result.
Using the above data, show how you would calculate:
a) sensitivity
b) specificity
c) Positive predictive value
d) Negative predictive value
e) Positive Likelihood ratio
|                    | Bacteraemia present | Bacteraemia absent | Total |
|--------------------|---------------------|--------------------|-------|
| Biomarker positive | 70                  | 40                 | 110   |
| Biomarker negative | 30                  | 60                 | 90    |
| Total              | 100                 | 100                | 200   |
a) Sensitivity = TP/(TP + FN) = 70/100
b) Specificity = TN/(TN + FP) = 60/100
c) PPV = TP/(TP + FP) = 70/110
d) NPV = TN/(TN + FN) = 60/90
e) Positive likelihood ratio = sensitivity/(1 - specificity) = 0.7/0.4 = 1.75
This question is very similar to Question 19.1 from the first paper of 2010, and almost entirely identical to Question 29.2 from the first paper of 2008.
However, it also presents one with a 2×2 table breakdown of results, and there is the added question (e), which asks the candidate to calculate a positive likelihood ratio.
That formula, and relevant others, is presented in the helpful list of equations one must memorise for the fellowship.
Thus, going through the motions...
true positives = 70
false positives = 40
true negatives = 60
false negatives = 30
a) Sensitivity = True positives / ( true positives + false negatives)
= 70 / (70 + 30) = 70%
b) Specificity = True negatives / (true negatives + false positives)
= 60 / (60 + 40) = 60%
c) Positive predictive value = True positives / (true positives + false positives)
= 70 / (70 + 40) = 63.6%
d) Negative predictive value = True negatives / (true negatives + false negatives)
= 60 / (60+30) = 66.6%
e) Positive Likelihood ratio = sensitivity / (1-specificity)
= 0.7 / (1 - 0.6) = 1.75
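The same arithmetic, sketched in Python as a sanity check (counts taken from the question; variable names are my own):

```python
# Cell counts from the question: 70 TP, 40 FP, 60 TN, 30 FN
tp, fp, tn, fn = 70, 40, 60, 30

sens = tp / (tp + fn)    # sensitivity = 0.7
spec = tn / (tn + fp)    # specificity = 0.6
plr = sens / (1 - spec)  # positive likelihood ratio

print(round(plr, 2))     # 1.75
```

A likelihood ratio of 1.75 means a positive biomarker result is only 1.75 times more likely in a bacteraemic patient than in a non-bacteraemic one: a weakly informative test.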
a) What is a meta-analysis?
b) What is the role of meta-analysis in evidence based medicine?
c) What are the features you look for in a meta-analysis to determine if it has been well conducted?
a) A form of systematic review that uses statistical methods to combine the results from different studies
b) roles:
1. ↑ statistical power by ↑ sample size
2. Resolve uncertainty when studies disagree
3. Improve estimates of effect size
4. Establish questions for future PRCTs
c)
1. Are the research questions defined clearly?
2. Are the search strategy and inclusion criteria described?
3. How did they assess the quality of studies?
4. Have they plotted the results?
5. Have they inspected the data for heterogeneity?
6. How have they calculated a pooled estimate?
7. Have they looked for publication bias?
This question - though not entirely identical - is very similar to Question 5 from the second paper of 2013. The key difference is the inclusion of the nebulous question about the role of meta-analysis in EBM. In the later paper, this was focused specifically on the advantages of meta-analysis over the analysis of a single study. If one compares the above answer to (b) with the answer (b) in Question 5, one will discover similarities, which suggests that the college was looking for a list of advantages here as well.
Thus, much of the below is a direct copy of Question 5.
a) What is a meta-analysis?
Meta-analysis is a tool of quantitative systematic review.
It is used to weigh the available evidence from RCTs and other studies based on the numbers of patients included, the effect size, and on statistical tests of agreement with other trials.
b) What is the role of meta-analysis in evidence based medicine?
c) What are the features you look for in a meta-analysis to determine if it has been well conducted?
Sauerland, Stefan, and Christoph M. Seiler. "Role of systematic reviews and meta-analysis in evidence-based medicine." World journal of surgery 29.5 (2005): 582-587.
DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials."Controlled clinical trials 7.3 (1986): 177-188.
Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.
Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine75.6 (2008): 431-439.
Methodological Expectations of Cochrane Intervention Reviews
A Phase III study of a drug was undertaken to determine if it improved mortality in severe sepsis. The study design was a randomized, double-blind, placebo-controlled, multicenter trial (n=1200). The mortality rates in the placebo arm and the trial drug arm were 32% and 26% respectively. There were no adverse effects noted in relation to the trial drug.
a) What do you understand by the term Phase III.?
b) What was the absolute risk reduction?
c) What was the relative risk reduction?
d) Calculate the “number needed to treat”?
a) What do you understand by the term Phase III.?
Phase III trials compare new treatments with the best currently available treatment (the standard treatment). They have much larger sample sizes than Phase II trials and are usually randomised. They are aimed at being the definitive assessment of how effective the drug is, in comparison with the current 'gold standard' treatment.
b) What was the absolute risk reduction?
6%
c) What was the relative risk reduction?
18.75%
d) Calculate the “number needed to treat”?
16.66
Again, the candidate is called upon to recall equations and to perform basic mathematics. A helpful list of such equations is available.
a) A Phase III trial is a study of the treatment effect of the drug, performed in a large group of patients, all of whom have the disease being studied. The purpose of a Phase III trial is to test the efficacy of an experimental treatment in comparison with standard-of-care or "gold standard" therapy.
One can find more information about the phases of clinical research in brief in this 2011 BMJ statistics question by Phillip Sedgwick, in greater detail in this article by M.A. Rogers, and in great detail in this 2013 publication from the IJPCBS.
b) Absolute risk reduction (ARR) = (AR in treatment group - AR in control group)
In this trial, the ARR = (32% - 26%) = 6%
c) The relative risk reduction (RRR) = (ARR / control group AR)
In this trial, RRR = (0.06 / 0.32) = 18.75%
d) The Numbers Needed to Treat (NNT) = (1/ARR),
In this trial, NNT = (1 / 0.06) = 16.7 (i.e. 17 patients must be treated to prevent one additional death)
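For those who prefer code to mental arithmetic, the three calculations can be sketched in a few lines of Python, using the event rates from the question (the function name is mine, not the college's):

```python
from math import ceil

def risk_reduction(control_risk, treatment_risk):
    """Return (ARR, RRR, NNT) for a two-arm trial.

    control_risk and treatment_risk are event rates expressed as proportions.
    """
    arr = control_risk - treatment_risk  # absolute risk reduction
    rrr = arr / control_risk             # relative risk reduction
    nnt = 1 / arr                        # number needed to treat
    return arr, rrr, nnt

# The trial in the question: placebo mortality 32%, trial drug mortality 26%
arr, rrr, nnt = risk_reduction(0.32, 0.26)
print(f"ARR = {arr:.0%}, RRR = {rrr:.2%}, NNT = {nnt:.1f} (round up to {ceil(nnt)})")
```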
Sedgwick, Philip. "Phases of clinical trials." BMJ 343 (2011).
Rogers, M. A. "What are the phases of intervention research." Access Academics and Research (2009).
Rohilla, Ankur, D. Sharma, and R. Keshari. "Phases of clinical trials: a review."IJPCBS 3 (2013): 700-3.
You have been approached by a company which has developed a new biomarker of sepsis. They would like it tested in a cohort of critically ill septic patients. You test this biomarker in a cohort of 100 patients with proven bacteremia. You also test this biomarker in a cohort of 100 patients with drug overdose whom you use as a control. In the bacteremic group 70 patients had abnormal biomarker results. In the control group 60 patients had an abnormal biomarker results.
Calculate
a) Sensitivity
b) specificity
c) Positive predictive value
d) Negative predictive value
Values below expressed as a percentage.
a) 70/100
b) 40/100
c) 70/130
d) 40/70
This question is identical to Question 19.1 from the first paper of 2010. However, the college changed the numbers a little, and made the question about pancreatic necrosis.
Going through the motions,
true positives = 70
false positives = 60
true negatives = 40
false negatives = 30
a) Sensitivity = True positives / ( true positives + false negatives)
= 70 / (70 + 30) = 70%
b) Specificity = True negatives / (true negatives + false positives)
= 40 / (40 + 60) = 40%
c) Positive predictive value = True positives / (true positives + false positives)
= 70 / (70 + 60) = 53.8%
d) Negative predictive value = True negatives / (true negatives + false negatives)
= 40 / (40 + 30) = 57.1%
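These four calculations can be wrapped into a single function; a minimal sketch, using the cell counts from the question:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard 2x2 diagnostic test metrics, returned as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of diseased correctly detected
        "specificity": tn / (tn + fp),  # proportion of healthy correctly excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# The biomarker question: 70 true positives, 60 false positives,
# 40 true negatives, 30 false negatives
metrics = diagnostic_metrics(tp=70, fp=60, tn=40, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```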
In the context of a randomised control trial comparing a trial drug with placebo:
a) briefly explain the following terms:
b) List the factors that influence sample size.
Type 1 error
The null hypothesis is incorrectly rejected. Type 1 errors may result in the implementation of therapy that is in fact ineffective or a false positive test result.
Type 2 error
The null hypothesis is incorrectly accepted. Type 2 errors may result in rejection of effective treatment strategies or a false negative test result.
Study power
Power is equal to 1-β. Thus if β = 0.2, the power is 0.8 and the study has 80% probability of detecting a difference if one exists
Effect size
Effect size (∆) is the clinically significant difference the investigator wants to detect between the study groups. This is arbitrary but needs to be reasonable and accepted by peers. It is harder to detect a small difference than a large difference. The effect size helps us to know whether the difference observed is a difference that matters.
Factors influencing sample size
• Selected values for significance level, α, power β and effect size ∆ (smaller values mean larger sample size)
• Variance /SD in the underlying population (larger variance means larger sample size)
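To illustrate how those ingredients combine, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions; the 30% vs 25% mortality rates are illustrative, and this is an approximation rather than a substitute for a proper power calculation:

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided comparison of
    two proportions, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # population variance term
    effect = p1 - p2                               # effect size to detect
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# e.g. to detect a mortality difference of 30% vs 25% with 80% power:
print(sample_size_two_proportions(0.30, 0.25), "patients per group")
```

Note how demanding more power, or a smaller effect size, inflates the required sample, exactly as the college's bullet points state.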
The college presents a concise and effective answer to this question, which should serve as a model. Below is a non-model answer overgrown with the unnecessary fat of references and digressions.
a)
Type 1 error: The incorrect rejection of a null hypothesis.
Type 2 error: the failure to reject a null hypothesis which is in fact false (i.e. a real treatment effect is missed).
Study power: The probability that the study correctly rejects the null hypothesis, when the null hypothesis is false.
Effect size: a quantitative reflection of the magnitude of a phenomenon; in this case, the magnitude of the positive effects of a drug on the study population.
Factors which influence sample size:
- the chosen level of statistical significance (α)
- the desired power (1 − β)
- the effect size (∆) the study is designed to detect
- the variance (or SD) of the outcome in the underlying population
There is a good article on this in Radiology (2003)
There is an online Handbook of Biological Statistics which has an excellent overview of power analysis.
Kelley, Ken, and Kristopher J. Preacher. "On effect size." Psychological methods 17.2 (2012): 137.
Moher, David, Corinne S. Dulberg, and George A. Wells. "Statistical power, sample size, and their reporting in randomized controlled trials." Jama 272.2 (1994): 122-124.
Cohen, Jacob. "A power primer." Psychological bulletin 112.1 (1992): 155.
Dupont, William D., and Walton D. Plummer Jr. "Power and sample size calculations: a review and computer program." Controlled clinical trials 11.2 (1990): 116-128.
Eng, John. "Sample Size Estimation: How Many Individuals Should Be Studied? 1." Radiology 227.2 (2003): 309-313.
Inspect the data representation shown below.
10.1. What form of data representation is depicted here?
10.2. With respect to the study plots what is represented by:
10.3. From the data depicted what could be inferred with regard to the effectiveness of the treatment under investigation?
10.4. What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?
10.1. What form of data representation is depicted here?
Forest Plot or Meta Analysis Graph
10.2. With respect to the study plots what is represented by: The horizontal lines?
The position of the square? The size of the square?
The position of the square and the horizontal line indicate the point estimate and the 95%
confidence intervals of the odds ratio respectively. The size of the square indicates the weight of the study.
10.3. From the data depicted what could be inferred with regard to the effectiveness of the treatment under investigation?
The depicted data suggest the treatment is not more effective than control as the 95% confidence limits of the combined odds ratio cross the vertical line.
10.4. What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?
Definition of inclusion criteria for studies
Adequate search protocol
Assessment of methodological quality
Measurement of heterogeneity
Assessment of publication bias
This topic is explored in LITFL, where they call it a "forrest plot", perhaps out of respect for Pat Forrest. This is substantially better than Wikipedia, where this form of data representation is referred to as a blobbogram. The example LITFL use for their explanation is derived from the college question.
Anyway. The college answer is correct but very brief, and probably represents something like the "passing grade" for this 10-mark question. With that in mind, and free from the need to be concise, one can launch into an exhaustingly verbose dissection of this question.
10.1 - This is a forest plot. It represents the results of a meta-analysis of studies.
10.2 - The standards for labelling and graphical representation are well summarised by this Cochrane document (however, it appears that careful adherence to standards is no defence against the absence of useful content).
10.3 - From the forest plot, one can infer that although there is a trend towards a positive treatment effect, it does not achieve statistical significance, because the range of the 95% confidence interval for the pooled odds ratio crosses the vertical line (the vertical line being an OR of 1.0, which means "no association"). Thus, on the basis of this meta-analysis one would be forced to conclude that the treatment has no demonstrable effect.
10.4 - "What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?" This is a thinly veiled question about the assessment of the validity of a meta-analysis. The college answer demonstrates this in the points they used. In that context, one would theoretically be interested in every aspect of the analysis.
Generic points in the assessment of validity of a meta-analysis include the following:
If one were to only consider the presented graph, one would be more likely to respond with relevant questions for the meta-analysis authors.
Schriger, David L., et al. "Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice." International journal of epidemiology39.2 (2010): 421-429.
Lewis, Steff, and Mike Clarke. "Forest plots: trying to see the wood and the trees." Bmj 322.7300 (2001): 1479-1480.
Anzures‐Cabrera, Judith, and Julian Higgins. "Graphical displays for meta‐analysis: An overview with suggestions for practice." Research Synthesis Methods 1.1 (2010): 66-80.
Cochrane: "Considerations and recommendations for figures in Cochrane reviews: graphs of statistical data" 4 December 2003 (updated 27 February 2008)
Reade, Michael C., et al. "Bench-to-bedside review: Avoiding pitfalls in critical care meta-analysis–funnel plots, risk estimates, types of heterogeneity, baseline risk and the ecologic fallacy." Critical Care 12.4 (2008): 220.
DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials."Controlled clinical trials 7.3 (1986): 177-188.
Biggerstaff, B. J., and R. L. Tweedie. "Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis." Statistics in medicine 16.7 (1997): 753-768.
The Cochrane Handbook: 9.5.4 "Incorporating heterogeneity into random-effects model"
What is a receiver operating characteristic plot (ROC curve) as applied to a diagnostic test? What are its advantages?
An ROC plot is a graphical representation of sensitivity vs. 1- specificity for all the observed data values for a given diagnostic test.
Advantages:
• Simple and graphical
• Represents accuracy over the entire range of the test
• It is independent of prevalence
• Tests may be compared on the same scale
• Allows comparison of accuracy between several tests.
How it may be used:
• Can give a visual assessment of test accuracy
• May be used to generate decision thresholds or “cut off” values
• Can be used to generate confidence intervals for sensitivity and specificity and likelihood ratios.
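The area under the ROC curve (AUC) also has a pleasingly concrete interpretation: it is the probability that a randomly chosen diseased patient scores higher on the test than a randomly chosen healthy one. A minimal sketch, with invented scores:

```python
def auc(diseased_scores, healthy_scores):
    """AUC as the probability that a randomly chosen diseased patient
    scores higher than a randomly chosen healthy one (ties count 1/2).
    This is numerically equivalent to the area under the ROC curve."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))

# A perfectly separating test has AUC 1.0; a useless one, 0.5.
print(auc([3, 4, 5], [1, 2, 3]))  # overlapping scores: somewhere in between
```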
In this LITFL article, ROC curves are discussed in detail, but without apocryphal gibberish.
If one were to restrict oneself to what is manageable within a 10-minute timeframe while mentioning all the important points, one would produce an answer which resembles the following:
Advantages:
That, of course, is the bare bones of the answer. If one were to succumb to basic human urges, one would produce an answer which resembles the following:
Advantages of the ROC curves:
Bewick, Viv, Liz Cheek, and Jonathan Ball. "Statistics review 13: receiver operating characteristic curves." Critical care 8.6 (2004): 508.
Sedgwick, Philip. "Receiver operating characteristic curves." BMJ 343 (2011). Rather than an article, this is more of a "self-directed learning" question with an elaborate explanatory answer.
Fan, Jerome, Suneel Upadhye, and Andrew Worster. "Understanding receiver operating characteristic (ROC) curves." Cjem 8.1 (2006): 19-20.
Akobeng, Anthony K. "Understanding diagnostic tests 3: receiver operating characteristic curves." Acta Paediatrica 96.5 (2007): 644-647.
Ling, Charles X., Jin Huang, and Harry Zhang. "AUC: a statistically consistent and more discriminating measure than accuracy." IJCAI. Vol. 3. 2003.
Greiner, M., D. Pfeiffer, and R. D. Smith. "Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests." Preventive veterinary medicine 45.1 (2000): 23-41.
To evaluate a new biomarker as an early index of infected pancreatic necrosis, you perform the measurement in a consecutive series of 200 critically ill patients with pancreatitis. You find that 100 of these patients had subsequently proven necrosis. Of these, 60 had a positive biomarker result. Of the remaining 100 patients without necrosis, 35 had a positive biomarker result.
Using the above data, show how you would calculate
a) Sensitivity
b) Specificity
c) Positive predictive value
d) Negative predictive value
a) Sensitivity = (TP/ {TP + FN}) = 60/100
b) Specificity = (TN/{TN + FP}) = 65/100
c) Positive predictive value = (TP/{TP+FP}) = 60/95
d) Negative predictive value = (TN/{TN + FN}) = 65/105
It's not easy to overdo this discussion, given that the premise of this question rests on basic arithmetic. Because the question is essentially maths, it is difficult to produce a "model answer" which is somehow an improvement on the already correct college answer (the only possible correct answer).
However, many people (myself included) are biologically unsuited to memorising equations. For this reason, a short list of equations to memorise has been compiled. Perhaps that's an improvement.
Thus, for this biomarker, we have the following spread of data:
a) Sensitivity: True positives / (true positives + false negatives)
= 60 / (60 + 40) = 60%
b) Specificity: True negatives / (true negatives + false positives)
= 65 / (65 + 35) = 65%
c) Positive predictive value: True positives / total positives
= 60 / (60 + 35) = 63%
d) Negative predictive value: True negatives / total negatives
= 65 / (65 + 40) = 62%
A randomized controlled clinical trial was performed to evaluate the effect of a new hormone called Rejuvenon on mortality in septic shock. 3400 patients with septic shock were studied (1700 placebo and 1700 in the Rejuvenon arms). The mortality rates in the placebo and the treatment arms were 30% and 25% respectively.
Calculate:
(a) The absolute risk reduction
(b) The relative risk reduction
(c) The number needed to treat
Using the above data, show how you would calculate:
a) The absolute risk reduction
b) The relative risk reduction
c) The number needed to treat
ARR = 5%
RRR = 5/30*100 =16.6%
NNT =1/0.05 =20
This question also relies on the candidate's ability to memorise equations.
Here is a helpful list of equations the candidate is expected to memorise.
a) ARR = (risk in control group - risk in treatment group)
= 30% - 25%
= 5%
b) RRR = (ARR / control group AR)
= 0.05 / 0.3
= 0.166, or 16.6%
c) Numbers needed to treat (NNT) = ( 1/ ARR)
= 1 / 0.05
= 20.
In the context of clinical trials, define the following terms:
a) Relative risk
b) Absolute risk
c) Number needed to treat
d) Power of the study
A number of potential definitions exist. One example for each is listed below:
Relative risk: the difference in event rates between 2 groups expressed as proportion of the event rate in the untreated group.
Absolute risk: this is the actual event rate in the treatment or the placebo group.
Number Needed to Treat: The NNT is the number of patients to whom a clinician would need to administer a particular treatment for 1 patient to receive benefit from it. It is calculated as 100 divided by the absolute risk reduction when expressed as a percentage or
1 divided by the absolute risk reduction when expressed as a proportion.
Power of the study: The probability that a study will produce a significant difference at a given significance level is called the power of the study. It will depend on the difference between the populations compared, the sample size and the significance level chosen.
Some of this ground is covered in Question 23 from the second paper of 2011. It also asks about risk ratio and NNT.
Here is a link to my summary of basic terms in EBM.
Risk ratio: risk in treatment group / risk in control or placebo group
Absolute risk: Risk of event in a group (any group). Essentially, it is the incidence rate.
NNT: Numbers needed to treat; 1/ absolute risk reduction.
Power of a study: The power of a statistical test is the probability that it correctly rejects the null hypothesis, when the null hypothesis is false. This is the chance that a study is able to discern a treatment effect, if there is an actual treatment effect. It is influenced by the level of statistical significance one expects, the sample size, the variance within the studied population, and the magnitude of the effect size.
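The interplay of these influences can be demonstrated empirically. A minimal Monte Carlo sketch, reusing the Rejuvenon-style trial numbers from above with a simple pooled two-proportion z-test (the simulation parameters are arbitrary):

```python
import random
from statistics import NormalDist

def simulated_power(p_control, p_treatment, n_per_arm,
                    alpha=0.05, n_sims=1000, seed=1):
    """Estimate power by Monte Carlo: simulate many trials and count how
    often a pooled two-proportion z-test reaches significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        # simulate event counts in each arm
        a = sum(rng.random() < p_control for _ in range(n_per_arm))
        b = sum(rng.random() < p_treatment for _ in range(n_per_arm))
        p_pooled = (a + b) / (2 * n_per_arm)
        se = (2 * p_pooled * (1 - p_pooled) / n_per_arm) ** 0.5
        if se > 0 and abs(a - b) / (n_per_arm * se) > z_crit:
            hits += 1
    return hits / n_sims

# 30% vs 25% mortality, 1700 patients per arm: power is roughly 90%
print(simulated_power(0.30, 0.25, 1700))
```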
Cohen, Jacob. "Statistical power analysis." Current directions in psychological science 1.3 (1992): 98-101.
Cook, Richard J., and David L. Sackett. "The number needed to treat: a clinically useful measure of treatment effect." Bmj 310.6977 (1995): 452-454.
Viera, Anthony J. "Odds ratios and risk ratios: what's the difference and why does it matter?." Southern medical journal 101.7 (2008): 730-734.
Malenka, David J., et al. "The framing effect of relative and absolute risk."Journal of General Internal Medicine 8.10 (1993): 543-548.
Gail, Mitchell H., and Ruth M. Pfeiffer. "On criteria for evaluating models of absolute risk." Biostatistics 6.2 (2005): 227-239.
With reference to a randomized controlled trial, briefly describe the terms “blinding” and “allocation concealment”.
• Blinding and allocation concealment are methods used to reduce bias in clinical trials.
• Blinding: a process by which trial participants and their relatives, care-givers, data collectors and those adjudicating outcomes are unaware of which treatment is being given to the individual participants.
- Prevents clinicians from consciously or subconsciously treating patients differently based on treatment allocation
- Prevents data collectors from introducing bias when there is a subjective assessment to be made, e.g. a "pain score"
- Prevents outcome assessors from introducing bias when there is a subjective outcome assessment to be made, e.g. the Glasgow Outcome Score.
• Traditionally, blinded RCTs have been classified as "single-blind," "double-blind," or "triple-blind". The 2010 CONSORT Statement specifies that authors and editors should not use these terms; instead, reports of blinded RCTs should state, "If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how".
Allocation concealment is an important component of the randomization process and refers to the concealment of the randomization sequence from both the investigators and the patient. Poor allocation concealment may potentially exaggerate treatment effects.
Methods used for allocation concealment include sealed envelope technique, telephone or web based randomization.
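The sequence which those methods conceal has to be generated somehow; one common technique is permuted-block randomisation, which keeps the arms balanced throughout recruitment. A minimal sketch (block size, labels and seed are arbitrary illustrations):

```python
import random

def permuted_block_sequence(n_patients, block_size=4, seed=42):
    """Generate a 1:1 allocation sequence in permuted blocks, so that
    the treatment groups stay balanced throughout recruitment.
    In a real trial this sequence would be generated centrally and
    concealed from recruiters (e.g. via a telephone or web service)."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_patients:
        # each block contains equal numbers of A and B, shuffled
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]

seq = permuted_block_sequence(12)
print(seq, "A:", seq.count("A"), "B:", seq.count("B"))
```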
Allocation concealment effectively ensures that the treatment to be allocated is not known before the patient is entered into the study. Blinding ensures that the patient / physician is blinded to the treatment allocation after enrollment into the study.
The question is a 10-mark question, but for some reason it asks for one to "briefly describe" these concepts. Judging from the college answer, a truly brief description was not the expected response.
LITFL has a thorough summary, which is not brief.
If one were to briefly describe these concepts, one would produce something like this:
Allocation concealment:
- Ensures that the patients and investigators cannot predict which treatment will be allocated to which patient before they are enrolled in the study.
- Prevents selection bias
Blinding:
- Ensures that the patients and investigators remain unaware of which treatment is being administered to which individual patient.
- Prevents detection bias and observer bias
And if one were to go to town on this topic, one would produce something like this:
Allocation concealment
Blinding:
Schulz, Kenneth F., and David A. Grimes. "Allocation concealment in randomised trials: defending against deciphering." The Lancet 359.9306 (2002): 614-618.
Forder, Peta M., Val J. Gebski, and Anthony C. Keech. "Allocation concealment and blinding: when ignorance is bliss." Med J Aust 182.2 (2005): 87-9.
Schulz, Kenneth F. "Assessing allocation concealment and blinding in randomised controlled trials: why bother?." Evidence Based Mental Health 3.1 (2000): 4-5.
In the context of statistical analysis of randomised controlled trials, explain the following terms:
a) Risk ratio
b) Number needed to treat
c) P-value
d) Confidence intervals
a) Risk ratio
A risk ratio is simply a ratio of risk, for example, [risk of mortality in the intervention group] / [risk of mortality in the control group].
It indicates the relative likelihood of experiencing the outcome if the patient received the intervention, compared with the outcome if they received the control therapy.
b) Odds ratio
Odds ratio is the odds of an event occurring in one group to the odds of it occurring in another
c) Number needed to treat (NNT)
Number of patients that need to be treated for one patient to benefit compared with a control not receiving the treatment
1/(Absolute Risk Reduction)
Used to measure the effectiveness of a health-care intervention, the higher the NNT the less effective the treatment
d) P-value
A p-value indicates the probability that the observed result or something more extreme occurred by chance. It might be referred to as the probability that the null hypothesis has been rejected when it is true.
e) Confidence intervals
The confidence intervals indicate the level of certainty that the true value for the parameter of interest lies between the reported limits.
For example:
The 95% confidence intervals for a value indicate a range where, with repeated sampling and analysis, these intervals would include the true value 95% of the time
This is a straightforward question about the definitions of basic everyday statistics terms.
Judging by the relatively high pass rate, over two thirds of us already have a fair grasp of this.
Additionally, please note the model answer to the odds ratio question. Clearly we are not expected to demonstrate a genius-level understanding of these concepts. In fact, there is no odds ratio mentioned in the college question, and its very existence is inferred from the fact that there is an odds ratio answer.
Anyway, it never hurts to revise the basics.
Here is a link to my summary of basic terms in EBM.
In brief:
Risk ratio: risk in treatment group / risk in control or placebo group
Odds ratio: The odds of an outcome in one group / odds of that outcome in another group.
NNT: Numbers needed to treat; 1/ absolute risk reduction.
p-value: in a research study, this is the probability of obtaining the same (or a more extreme) result under the assumption that the null hypothesis is true. It is not the probability that the null hypothesis is true, nor the probability that it was incorrectly rejected; conflating these is the "p-value fallacy" discussed by Goodman (1999). As a single-value assessment of error rate, the p-value has its opponents.
Confidence interval: CI gives a range of results and the percentage chance that the same experimental design would produce results within this range if the experiment were repeated. Thus, a CI of 95% means that in 95% of repeated experiments the results would fall within the specified range.
The CI is a pain in the arse to calculate for the mathematically averse Homo vulgaris. One can form a good impression of the difficulty involved by reading either of these two BMJ articles.
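For the specific case of an odds ratio, the log-scale method described in the Morris and Gardner article below is mercifully mechanical; a minimal Python sketch (the 2×2 cell counts are invented for illustration):

```python
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table [[a, b], [c, d]], with a 95% CI
    computed on the log scale (Woolf's method, as in Morris & Gardner)."""
    or_ = (a * d) / (b * c)
    se_log = sqrt(1/a + 1/b + 1/c + 1/d)  # SE of ln(OR)
    lower = exp(log(or_) - z * se_log)
    upper = exp(log(or_) + z * se_log)
    return or_, lower, upper

# e.g. 20/80 exposed cases vs 10/90 exposed controls
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Note that the CI here narrowly includes 1.0, so this invented association would not be statistically significant.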
Viera, Anthony J. "Odds ratios and risk ratios: what's the difference and why does it matter?." Southern medical journal 101.7 (2008): 730-734.
Szumilas, Magdalena. "Explaining odds ratios." Journal of the Canadian Academy of Child and Adolescent Psychiatry 19.3 (2010): 227.
Cook, Richard J., and David L. Sackett. "The number needed to treat: a clinically useful measure of treatment effect." Bmj 310.6977 (1995): 452-454.
Goodman, Steven N. "Toward evidence-based medical statistics. 1: The P value fallacy." Annals of internal medicine 130.12 (1999): 995-1004.
Morris, Julie A., and Martin J. Gardner. "Statistics in Medicine: Calculating confidence intervals for relative risks (odds ratios) and standardised ratios and rates." British medical journal (Clinical research ed.) 296.6632 (1988): 1313.
Campbell, Michael J., and Martin J. Gardner. "Statistics in Medicine: Calculating confidence intervals for some non-parametric analyses." British medical journal (Clinical research ed.) 296.6634 (1988): 1454.
a) EBM
Evidence-based medicine is the process of systematically reviewing, appraising and using clinical research findings to aid the delivery of optimum clinical care to patients
It involves considering research and other forms of evidence on a routine basis when making healthcare decisions. Such decisions include the clinical decisions about choice of treatment, test, or risk management for individual patients, as well as policy decisions for groups and populations.
b) Levels of evidence
(Any recognised system acceptable)
Level: Therapy/Prevention, Aetiology/Harm
1a: Systematic review (with homogeneity) of RCTs
1b: Individual RCT (with narrow confidence interval)
1c: All or none (i.e. all patients died before the Rx became available, but some now survive on it; or some patients died before the Rx became available, but none now die on it)
2a: Systematic review (with homogeneity) of cohort studies
2b: Individual cohort study (including low quality RCT; e.g. <80% follow-up)
2c: "Outcomes" research or ecologic studies (studies of group characteristics)
3a: Systematic review (with homogeneity) of case-control studies
3b: Individual case-control study
4: Case series (and poor quality cohort and case-control studies)
5: Expert opinion, or opinion based on physiology, bench research or "first principles"
Level
I: Evidence from a systematic review of all relevant randomised controlled trials
II: Evidence from at least one properly designed randomised controlled trial
III.1: Evidence from well-designed pseudo-randomised controlled trials
III.2: Evidence obtained from comparative studies with concurrent controls and allocation not randomised (cohort studies) or case control studies
III.3: Evidence obtained from comparative studies with historical controls
IV: Evidence from case series, opinions of respected authorities, descriptive studies, reports of expert (i.e. consensus) committees, case studies.
c) Intention to treat analysis
Analysis based on the initial treatment intent not the treatment eventually administered. Everyone who begins treatment is considered to be part of the trial whether he/she completes the trial or not. ITT analysis avoids the effects of crossover and drop-out
Evidence based medicine is the system of critical evaluation of published data for applicability to the management of individual patients. David Sackett, a great pioneer of EBM, came up with a definition which seems to be frequently quoted, and therefore probably meets with the approval of the CICM examiners:
"Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."
As for levels of evidence, we have several systems to choose from. Here are a couple:
Oxford centre for evidence based medicine:
NHMRC levels:
Intention to treat analysis:
This is the practice of preserving the bias-controlling benefits of randomisation by performing analysis of all patients according to which group they were randomised to, rather than according to which treatment they actually received.
Advantages
Disadvantages
Sackett, David L. "Evidence-based medicine." Seminars in perinatology. Vol. 21. No. 1. WB Saunders, 1997.
Sackett, David L., et al. "Evidence based medicine: what it is and what it isn't."Bmj 312.7023 (1996): 71-72.
A colleague directs your attention to a recently published randomised trial on a therapeutic intervention.
Outline the features of the trial that would lead you to change your practice.
Points to consider in the answer would be:
This question really asks, "how do you assess an RCT for validity?"
This is addressed in greater detail elsewhere.
In brief:
Oh's Intensive Care manual: Chapter 10 (p83), Clinical trials in critical care by Simon Finfer and Anthony Delaney.
The JAMA collection via the Johns Hopkins Medical School
CASP (Critical Appraisal Skills Program) has checklists for the appraisal of many different sorts of studies; these actually come with tickboxes. One imagines reviewers wandering around a trial headquarters, ticking these boxes on their little clipboards.
CEBM (Centre for Evidence Based Medicine) also has checklists, which (in my opinion) are more informative.
Here is a link to their checklist for the critical appraisal of an RCT.
With reference to the reporting of clinical trials in the literature:
a) What is a meta-analysis?
b) What are the advantages of a meta-analysis over the interpretation of an individual study?
c) List the features of a well-conducted meta-analysis.
d) What is “publication bias” and how can this impact on the validity of a meta-analysis?
a)
b)
c)
d)
LITFL have an excellent resource for this.
a) What is a meta-analysis?
Meta-analysis is a tool of quantitative systematic review.
It is used to weigh the available evidence from RCTs and other studies, based on the numbers of patients included, the effect size, and statistical tests of agreement with other trials.
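The weighting is typically inverse-variance weighting, in which precise (usually large) studies dominate the pooled estimate. A minimal sketch of the fixed-effect version, with invented study data:

```python
def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate: each study is
    weighted by 1/SE^2, so large precise studies dominate the result."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

# three hypothetical studies reporting log odds ratios
pooled, se = fixed_effect_pool([0.2, 0.3, -0.1], [0.1, 0.2, 0.3])
print(f"pooled log-OR = {pooled:.3f} +/- {1.96 * se:.3f}")
```

A random-effects model (e.g. DerSimonian and Laird, referenced below) additionally incorporates between-study heterogeneity into the weights.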
b) What are the advantages of a meta-analysis over the interpretation of an individual study?
c) List the features of a well-conducted meta-analysis.
d) What is “publication bias” and how can this impact on the validity of a meta-analysis?
DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials."Controlled clinical trials 7.3 (1986): 177-188.
Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.
Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine75.6 (2008): 431-439.
Methodological Expectations of Cochrane Intervention Reviews
With reference to clinical studies:
a) Define the term "external validity".
b) Define the term "bias".
c) Briefly explain selection bias and measures to reduce it.
a) External validity is the extent to which the results of a study can be generalised to other
situations, e.g. different case-mix
b) Bias in statistics is defined as systematic distortion of the observed result away from the
"truth", caused by inadequacies in the design, conduct, or analysis of a trial.
c) Selection bias is caused by a systematic error in creating intervention groups, such that
they differ with respect to prognosis. The study groups differ in measured or unmeasured
baseline characteristics because of the way participants were selected or assigned.
Selection bias also means that the study population does not reflect a representative
sample of the target population. Selection bias undermines the external validity of the
study and the conclusions drawn by the study should not be extended to other patients.
Measures to reduce selection bias include:
Randomisation: Randomisation assigns patients to treatment arms by chance,
avoiding any systematic imbalance in characteristics between patients receiving
experimental versus the control intervention.
Allocation concealment: The allocation sequence is the order in which participants are
to be allocated to treatment. Allocation concealment involves not disclosing to patients
and those involved in recruiting trial participants, the allocation sequence before
random allocation occurs.
External validity: the extent to which the study results can be generalised to the greater population, which is influenced by a vast array of factors:
Bias: a systematic error which distorts study findings
Selection bias: The selection of specific patients which results in a sample group which is not random, and which is not representative of a population. This can be avoided by randomisation, blinding and by allocation concealment.
The college answer actually comes from the CONSORT Statement glossary:
"Systematic error in creating intervention groups, such that they differ with respect to prognosis. That is, the groups differ in measured or unmeasured baseline characteristics because of the way participants were selected or assigned. Also used to mean that the participants are not representative of the population of all possible participants."
Higgins, Julian PT, and Sally Green, eds. Cochrane handbook for systematic reviews of interventions. Vol. 5. Chichester: Wiley-Blackwell, 2008.
a) With respect to meta-analysis of randomised controlled trials, what is a funnel plot?
b) In the funnel plot above:
i. What do the outer dashed lines indicate?
ii. To what does the solid vertical line correspond?
c) List three factors that result in asymmetry in funnel plots.
a) A funnel plot is a scatter plot of the effect estimates from individual studies against some measure of each study’s size or precision. The standard error of the effect estimate is often chosen as the measure of study size and plotted on the vertical axis with a reversed scale that places the larger, most powerful studies towards the top. The effect estimates from smaller studies should scatter more widely at the bottom, with the spread narrowing among larger studies.
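To make the geometry concrete: the outer dashed limits at any given standard error are simply the pooled effect ± 1.96 × SE. A minimal sketch follows; the pooled effect value here is invented for illustration, not taken from any exam figure.

```python
# Hypothetical pooled log odds ratio: the solid vertical line of the funnel plot
pooled_effect = -0.3

def funnel_limits(se):
    """95% pseudo-confidence limits (the outer dashed lines) at a given SE."""
    half_width = 1.96 * se
    return pooled_effect - half_width, pooled_effect + half_width

# The limits widen as SE grows, i.e. towards the smaller studies
# at the bottom of the (reversed-scale) vertical axis:
narrow = funnel_limits(0.1)   # near the top: large, precise studies
wide = funnel_limits(0.5)     # near the bottom: small, imprecise studies
```

In the absence of bias, roughly 95% of studies should scatter within these widening limits, which is what gives the plot its symmetrical inverted-funnel shape.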
b)
Outer dashed lines: the triangular region within which 95% of studies are expected to lie
Solid vertical line: no intervention effect
c)
i) Heterogeneity
ii) Reporting bias
iii) Chance
It was expected that candidates regularly attending journal club would have the knowledge to answer this question, but overall it was not well answered and the explanation of terms was poor.
The above-depicted plot is not the gospel plot from the CICM paper, but one which I have confabulated myself. Hopefully, it bears some resemblance to the original.
a) is answered by the college in a manner which precisely reflects the wording of the Cochrane Handbook. That is indeed " a simple scatter plot of the intervention effect estimates from individual studies against some measure of each study’s size or precision ".
b)
The lines? What do they mean? Said best by the laconic college:
c)
Causes of asymmetry are well summarised by Sterne et al (2011), whose Box 1 I have shamelessly stolen:
Reporting biases
Poor methodological quality
True heterogeneity
Artefactual
Chance
DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials."Controlled clinical trials 7.3 (1986): 177-188.
Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.
Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine75.6 (2008): 431-439.
Methodological Expectations of Cochrane Intervention Reviews
Sterne, Jonathan AC, et al. "Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials." Bmj 343 (2011): d4002.
A systematic review of the literature was undertaken comparing proton pump inhibitors with H2-receptor blockers for the prevention of gastro-intestinal bleeding in ICU patients.
a) Name the type of graph illustrated in the above figure. (10% marks)
b) What does it show? (25% marks)
c) What are the benefits of this type of analysis? (25% marks)
d) What are the disadvantages of this analysis? (40% marks)
a)
Forest plot
b)
Combining the trials together, PPI use is associated with an odds ratio of 0.35 for bleeding compared to H2RA. Alternatively, PPI use results in a 65% reduction (1 - 0.35) in the odds of bleeding.
c)
Combines small studies with limited power, increasing the total number of patients and thus the ability to detect a true effect. Small studies with low power (due to a small effect size or small numbers) run the risk of a Type II error.
d)
Individual studies might have different patient populations (with different risk of bleeding) or different definitions of outcome.
Individual studies might have been conducted with different degrees of rigour (blinding, etc.)
There is publication bias towards positive studies, such that negative studies go unreported. Full disclosure is needed of how the studies were selected, their scientific grading, subgroup analyses, and the assessment of heterogeneity.
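For what it's worth, the pooling step behind a forest plot's summary diamond can be sketched in a few lines: a fixed-effect inverse-variance meta-analysis averages each study's log odds ratio, weighted by the inverse of its variance. The numbers below are invented for illustration, not the Alhazzani data.

```python
import math

# (log odds ratio, standard error) for three hypothetical studies
studies = [(-1.2, 0.5), (-0.9, 0.4), (-1.1, 0.6)]

weights = [1 / se ** 2 for _, se in studies]   # inverse-variance weights
pooled_log_or = sum(w * lor for (lor, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

pooled_or = math.exp(pooled_log_or)
ci_95 = (math.exp(pooled_log_or - 1.96 * pooled_se),
         math.exp(pooled_log_or + 1.96 * pooled_se))
```

The weighting is why larger, more precise studies dominate the pooled estimate; a random-effects model (e.g. DerSimonian-Laird) would additionally widen the weights to accommodate between-study heterogeneity.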
I have no idea whether the college actually used this exact image, but certainly the paper was correctly identified by LITFL. My hat is off to Chris Nickson, who managed to track down the exact PPI vs H2RA study which had this exact forest plot and OR / RRR. It was indeed the Alhazzani study from 2013.
So:
a) and b) are reviewed in greater detail in the chapter on forest and box-and-whisker plots. In short:
c) and d)
Advantages of meta-analysis
Disadvantages of meta-analysis
Alhazzani, Waleed, et al. "Proton pump inhibitors versus histamine 2 receptor antagonists for stress ulcer prophylaxis in critically ill patients: a systematic review and meta-analysis*." Critical care medicine 41.3 (2013): 693-705.
Methodological Expectations of Cochrane Intervention Reviews
Schriger, David L., et al. "Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice." International journal of epidemiology39.2 (2010): 421-429.
Lewis, Steff, and Mike Clarke. "Forest plots: trying to see the wood and the trees." Bmj 322.7300 (2001): 1479-1480.
Anzures‐Cabrera, Judith, and Julian Higgins. "Graphical displays for meta‐analysis: An overview with suggestions for practice." Research Synthesis Methods 1.1 (2010): 66-80.
Cochrane: "Considerations and recommendations for figures in Cochrane reviews: graphs of statistical data" 4 December 2003 (updated 27 February 2008)
Reade, Michael C., et al. "Bench-to-bedside review: Avoiding pitfalls in critical care meta-analysis–funnel plots, risk estimates, types of heterogeneity, baseline risk and the ecologic fallacy." Critical Care 12.4 (2008): 220.
DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials."Controlled clinical trials 7.3 (1986): 177-188.
Biggerstaff, B. J., and R. L. Tweedie. "Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis." Statistics in medicine 16.7 (1997): 753-768.
The Cochrane Handbook: 9.5.4 "Incorporating heterogeneity into random-effects model"
Explain the following terms as applied to a randomised controlled clinical trial:
a) Allocation concealment. (25% marks)
b) Block randomisation, using block sizes of 4, in a trial of drug A versus drug B. (25% marks)
c) Stratification. (25% marks)
d) Minimisation algorithm. (25% marks)
a)
Procedure for protecting the randomization process and ensuring that the clinical investigators and those involved in the conduct of the trial are not aware of the group to which the subject has been allocated
b)
Simple randomisation may result in unequal treatment group sizes; block randomisation is a method that may protect against this problem and is particularly useful in small trials.
In the context of a trial evaluating drug A or drug B and with block sizes of 4, there are 6 possible blocks of randomisation: AABB, ABAB, ABBA, BAAB, BABA, BBAA.
One of the 6 possible blocks is selected randomly and the next 4 study participants are assigned according to the order of the block. The process is then repeated as needed to achieve the necessary sample size.
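The block scheme described above is easy to sketch in code. This hypothetical helper shuffles a fresh AABB block for each group of four (the function name and seed are my own invention):

```python
import random

def block_randomise(n, seed=42):
    """Allocate n participants to drug A or drug B in random blocks of 4."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n:
        block = ["A", "A", "B", "B"]
        rng.shuffle(block)          # selects one of the 6 possible orderings
        allocation.extend(block)
    return allocation[:n]

# At no point can the group sizes differ by more than half the block length (2)
sequence = block_randomise(20)
```

Because every completed block contributes exactly two As and two Bs, the running imbalance between arms is capped at two, which is the whole point of the technique in small trials.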
c)
Stratification is a process that protects against imbalance in prognostic factors that are present at the time of randomisation.
A separate randomisation list is generated for each prognostic subgroup. Usually limited to 2-3 variables because of increasing complexity with more variables.
d)
This is an alternative to stratification for maintaining balance in several prognostic variables. The minimisation algorithm maintains a running total of the prognostic variables in patients that have already been randomised and then subsequent patients are assigned using a weighting system that minimizes imbalance in those prognostic variables.
In this paper, only one candidate (2.5% of the cohort) managed to just pass this question (i.e. they got 5 marks out of 10).
a) Allocation concelament:
b) Block randomisation:
"...sometimes we want to keep the numbers in each group very close at all times. Block randomisation (also called restricted randomisation) is used for this purpose. For example, if we consider subjects in blocks of four at a time there are only six ways in which two get A and two get B: 1:AABB 2:ABAB 3:ABBA 4:BBAA 5:BABA 6:BAAB. We choose blocks at random to create the allocation sequence. Using the single digits of the previous random sequence and omitting numbers outside the range 1 to 6 we get 5623665611. From these we can construct the block allocation sequence BABA/BAAB/ABAB/ABBA/BAAB, and so on. The numbers in the two groups at any time can never differ by more than half the block length. Block size is normally a multiple of the number of treatments."
c) Stratification:
d) Minimisation algorithm:
Altman, Douglas G., and J. Martin Bland. "How to randomise." Bmj 319.7211 (1999): 703-704.
Evaluation of a novel serum biomarker for the rapid diagnosis of sepsis is performed in a sample of 100 patients with fever. The biomarker is compared with positive culture results as the gold standard and yields the following information:
| | Sepsis present (culture positive) | Sepsis absent (culture negative) |
| --- | --- | --- |
| Biomarker positive | 30 | 10 |
| Biomarker negative | 30 | 30 |
| n | 60 | 40 |
With reference to these results, define the following and give the values for the performance of the test:
a) Sensitivity. (20% marks)
b) Specificity. (20% marks)
c) Positive predictive value. (20% marks)
d) Negative predictive value. (20% marks)
e) Accuracy . (20% marks)
| | Sepsis present (culture positive) | Sepsis absent (culture negative) | |
| --- | --- | --- | --- |
| Biomarker positive | 30 (a) | 10 (b) | (a + b) |
| Biomarker negative | 30 (c) | 30 (d) | (c + d) |
| n | 60 (a + c) | 40 (b + d) | (a + b + c + d) |
a) The ability of the test to identify true positives, or the probability that the test will be positive in individuals who do have the disease.
Sensitivity = a/(a+c) = 30/60 = 50%
b) The ability of the test to identify true negatives, or the probability that the test will be negative in individuals who do not have the disease.
Specificity = d/(b+d) = 30/40 = 75%
c) The likelihood that a positive test means the patient has sepsis.
PPV = a/(a+b) = 30/40 = 75%
d) The likelihood that a negative test means the patient does not have sepsis.
NPV = d/(c+d) = 30/60 = 50%
e) The ability to differentiate patient and healthy cases correctly.
Accuracy = (a+d)/(a+b+c+d) = 60/100 = 60%
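The arithmetic above can be checked mechanically; a small sketch using the table's cell labels:

```python
# 2x2 table cells from the question:
# a = true positives, b = false positives, c = false negatives, d = true negatives
a, b, c, d = 30, 10, 30, 30

sensitivity = a / (a + c)              # 30/60 = 0.50
specificity = d / (b + d)              # 30/40 = 0.75
ppv = a / (a + b)                      # 30/40 = 0.75
npv = d / (c + d)                      # 30/60 = 0.50
accuracy = (a + d) / (a + b + c + d)   # 60/100 = 0.60
```

Note that sensitivity and specificity are intrinsic to the test, whereas PPV and NPV shift with the prevalence of disease in the tested population (here 60%).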
Additional Examiners' Comments:
The question clearly stated that a definition was required. Many candidates either could not define the terms or just missed this part of the question and therefore missed out on marks. This question has come up a number of times in past exams and these are basic statistical concepts that some candidates clearly do not understand.
This question closely resembles all other previous questions about the measures of diagnostic test accuracy:
After being absent from the papers for over five years, one might have been forgiven for thinking that such calculator-intense statistics questions were demoted to the level of primary exam material (as most recent statistics questions in the Fellowship Exam have been more about interpretation of meta-analysis data and other such ultra-clever "fellow level" uses of EBM). The main difference in 2016 was the addition of accuracy as one of the examined parameters. This has never been examined previously, and is not a frequently mentioned measure (even though colloquially we might use the term near-constantly). An excellent 2008 article was used to define it for the purposes of this model answer.
Clearly, at least one candidate remembered all the definitions, and got 10 marks.
a)
b)
c)
d)
e)
Šimundić, Ana-Maria. "Measures of diagnostic accuracy: basic definitions." Med Biol Sci 22.4 (2008): 61-5.
Question 11
The following table gives information on the proportions of a population that have been exposed to a risk factor for a disease and then subsequently developed the disease.
Exposure + Indicates the proportion exposed to the risk factor (A+B)
Exposure - Indicates the proportion not exposed to the risk factor (C+D)
Disease + Indicates the proportion that subsequently developed the disease (A+C)
Disease - Indicates the proportion that did not subsequently develop the disease (B+D)
| | Disease + | Disease - |
| --- | --- | --- |
| Exposure + | A | B |
| Exposure - | C | D |
a) Define prevalence AND, with reference to A, B, C, D in the table above, give the prevalence of the disease in this population. (20% marks)
b) Define relative risk (RR) AND, with reference to A, B, C, D in the table above, derive the relative risk of developing the disease after exposure to the risk factor. (40% marks)
c) Define attributable risk (AR) AND, with reference to A, B, C, D in the table above, give the attributable risk of exposure to the risk factor on developing the disease in this population. (40% marks)
a) Prevalence: the number of cases of an event (e.g. a disease) in a specific population at a particular time point.
Prevalence of the disease in this population:
(A+C) / (A+B+C+D)
b) Relative risk is the ratio of the probability of an event occurring (e.g. developing a disease) in an exposed group to the probability of the event occurring in a non-exposed comparison group.
[A / (A+B)] / [C / (C+D)]
c) Attributable risk is the difference in the rate of a condition between an exposed and an unexposed population.
A/(A+B) - C/(C+D)
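With some invented proportions plugged in, the three formulae look like this (the numbers are arbitrary, for illustration only):

```python
# Invented proportions for the 2x2 table:
# A = exposed + diseased, B = exposed + healthy,
# C = unexposed + diseased, D = unexposed + healthy
A, B, C, D = 0.10, 0.30, 0.05, 0.55

prevalence = (A + C) / (A + B + C + D)       # proportion with the disease
risk_exposed = A / (A + B)                   # event rate in the exposed
risk_unexposed = C / (C + D)                 # event rate in the unexposed
relative_risk = risk_exposed / risk_unexposed
attributable_risk = risk_exposed - risk_unexposed
```

With these figures the exposed have three times the risk of the unexposed (RR = 3.0), and exposure accounts for an absolute excess risk of about 17 percentage points.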
This is another SAQ which makes it very easy to earn high marks, as it asks for unambiguous memorised definitions and has a clear-cut right answer.
Somebody got 9.2.
Prevalence:
Relative risk:
Relative risk is the ratio of the event rate in the exposed group to the event rate in the non-exposed group (the difference in event rates expressed as a proportion of the untreated event rate is, strictly speaking, the relative risk reduction). The slightly broken English of the college answer probably comes from an article similar to the 2017 article by Tenny et al, and was probably meant to say "relative risk is a ratio of the probability of an event occurring in the exposed group versus the probability of the event occurring in the non-exposed group."
Attributable risk:
A colleague directs your attention to a recently published randomised trial on a therapeutic intervention.
Outline the features of the trial that would lead you to change your practice.
Points to consider in the answer would be:
This is slightly different to asking "what makes a valid trial" or "how do you judge high-quality evidence", even though these clearly play a role (and in fact the college answer consists of a boring list of such criteria). There are situations where practice is changed by methodologically inferior but otherwise compelling studies; or where expertly designed trials make minimal impact in the daily practice of individuals. A good read on this specific subject is a wonderfully titled 2016 article by John Ioannidis, "Why most clinical research is not useful."
In short, a trial should possess the following features in order to affect practice:
Answers to a real problem. The clinical trial needs to be addressing something which is a problem, and which needs to be fixed in some way. If there is no problem, then the trial was pointless because existing practice is already good enough (i.e. no matter how good the methodological quality, the trial can be safely ignored because your practice does not need to change). Similarly, if the problem is not sufficiently serious, the cost and consequences of changing practice outweighs the benefit.
Information Gain. The clinical trial should have offered an answer which we don't already know.
Pragmatism. The trial should be related to a real-life population and realistic settings, rather than some idealised scenario.
Patient-centered outcome. Some might argue that research should be aligned with the priorities of patients rather than those of investigators or sponsors.
Transparency. The trial authors should be transparent in order for the results to inspire enough confidence to change practice on the basis of its results.
Validity. The trial should be constructed with sufficient methodological quality for its results to be taken seriously.
Oh's Intensive Care manual: Chapter 10 (p83), Clinical trials in critical care by Simon Finfer and Anthony Delaney.
JAMA: User's guides to the medical literature; see if you can get institution access to these articles.
The CONSORT statement has its own website and is available for all to peruse.
CASP (Critical Appraisal Skils Program) has checklists for the appraisal of many different sorts of studies; these actually come with tickboxes. One imagines reviewers wandering around a trial headquarters, ticking these boxes on their little clipboards.
Ioannidis, John PA. "Why most clinical research is not useful." PLoS medicine 13.6 (2016): e1002049.
Regarding randomised clinical trials:
a) What is a noninferiority trial? (10% marks)
b) What is the null hypothesis in a noninferiority trial? (10% marks)
c) Why would a noninferiority trial be undertaken instead of a superiority trial? (40% marks)
d) What are the limitations of noninferiority trials? (40% marks)
a)
An active control trial which tests whether an experimental treatment is not worse than the control treatment by more than a specified margin. Originally conceived as “a safe alternative” treatment.
b)
The null hypothesis states that the primary end point for the new treatment is worse than that of the active control by a prespecified margin, and rejection of the null hypothesis at a prespecified level of statistical significance permits a conclusion of noninferiority.
c)
Typically, a placebo controlled trial would be considered unethical as an established treatment already exists.
The investigators may consider the experimental treatment unlikely to be superior to established treatment or the current treatment is highly effective.
The experimental treatment may offer advantages such as safety (reduced adverse effects), better compliance, lower cost or more convenience.
d)
Proving that two treatments are equivalent could mean that they are both ineffective or even harmful. Repeated noninferiority trials could lead to the acceptance of progressively worse treatments if noninferiority is blindly accepted ('biocreep').
Conditions and practice may have changed since the original placebo trial of the current standard treatment.
Equipoise is more complex.
Analysis is more complex
A poorly conducted study tends towards "noninferiority", as missing data and protocol violations favour noninferiority.
The margin by which non-inferiority is determined is arbitrarily decided by the researchers and may not be clinically appropriate
Sample sizes are larger than for placebo-controlled trials.
Examiners Comments:
Very poorly answered. Evidence based medicine is an important part of the curriculum and the examiners were concerned at the low level of knowledge displayed. Some candidates appeared to list unrelated phrases from the EBM literature without any appearance of understanding.
The level of detail given in the template was not required to obtain a passing mark in this question.
a) What is a noninferiority trial?
b) What is the null hypothesis in a noninferiority trial?
In superiority trials, the hypothesis is that the experimental treatment is different (better) to the standard treatment, and two-sided statistical tests are used to test the null hypothesis (because the experimental treatment could be better or worse). The null hypothesis is therefore that there really is no difference. In equivalence trials the null hypothesis is that the treatments are significantly different, by a specified margin (the "equivalence margin"). In non-inferiority trials the null hypothesis is that the experimental treatment is worse than the standard treatment - and the pre-specified equivalence margin determines how much worse.
The diagram below is borrowed and modified from Ian A Scott (2009), and demonstrates the results and confidence interval ranges expected of the three different types of trials, when they have demonstrated that the null hypothesis is false.
Superiority trials have to have their results well over to the "favours experimental treatment" side, usually by a pre-specified margin. Equivalence trials need to have their results and confidence intervals within that margin to confirm that the two treatments are in fact equivalent. Non-inferiority trials also need to have their results within that margin, but there is no need to prove that the treatment is superior (i.e the confidence intervals and results simply need to remain within the not much worse margin, the "+1%" line in the diagram).
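The decision rule behind that diagram reduces to a one-line comparison of the confidence interval's lower bound against the prespecified margin. A hedged sketch follows; the function name and the numbers are my own, not from any guideline:

```python
def noninferior(diff, se, margin, z=1.96):
    """True if the lower 95% CI bound for (experimental - control) effect
    stays above -margin, i.e. the new treatment is 'not much worse'."""
    return diff - z * se > -margin

# With a non-inferiority margin of 1 percentage point (0.01):
result_a = noninferior(diff=0.010, se=0.005, margin=0.01)  # lower bound ~ 0.0002
result_b = noninferior(diff=0.005, se=0.010, margin=0.01)  # lower bound ~ -0.0146
```

The second case fails not because the point estimate is worse, but because the trial is too imprecise: its confidence interval crosses the margin, which is why underpowered or sloppy noninferiority trials are so hazardous.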
c) Why would a noninferiority trial be undertaken instead of a superiority trial?
A non-inferiority trial is appropriate when:
d) What are the limitations of noninferiority trials?
Snapinn, Steven M. "Noninferiority trials." Trials 1.1 (2000): 19.
Lesaffre, Emmanuel. "Superiority, equivalence, and non-inferiority trials." Bulletin of the NYU hospital for joint diseases66.2 (2008): 150-154.
Murray, Gordon D. "Switching between superiority and non‐inferiority." British journal of clinical pharmacology 52.3 (2001): 219-219.
Scott, Ian A. "Non-inferiority trials: determining whether alternative treatments are good enough." Med J Aust 190.6 (2009): 326-330.
Piaggio, Gilda, et al. "Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement." Jama 308.24 (2012): 2594-2604.
Garattini, Silvio. "Non-inferiority trials are unethical because they disregard patients' interests." The Lancet 370.9602 (2007): 1875-1877.
Fleming, Thomas R. "Current issues in non‐inferiority trials." Statistics in medicine 27.3 (2008): 317-332.
Give the rationale for using the following techniques in a randomised controlled clinical trial:
a) Allocation concealment. (30% marks)
b) Block randomization. (30% marks)
c) Stratification. (30% marks)
d) Minimisation algorithm. (10% marks)
a) Allocation concealment
Procedure for protecting the randomization process and ensuring that the clinical investigators and those involved in the conduct of the trial are not aware of the group to which the subject has been allocated
b) Block randomisation
Simple randomisation may result in unequal treatment group sizes; block randomisation is a method that may protect against this problem and is particularly useful in small trials.
In the context of a trial evaluating drug A or drug B and with block sizes of 4, there are 6 possible blocks of randomisation: AABB, ABAB, ABBA, BAAB, BABA, BBAA.
One of the 6 possible blocks is selected randomly, and the next 4 study participants are assigned according to the order of the block. The process is then repeated as needed to achieve the necessary sample size.
c) Stratification
Stratification is a process that protects against imbalance in prognostic factors/confounders that are present at the time of randomisation.
A separate randomisation list is generated for each prognostic subgroup. Usually limited to 2-3 variables because of increasing complexity with more variables.
d) Minimisation algorithm
This is an alternative to stratification for maintaining balance in several prognostic variables.
The minimisation algorithm maintains a running total of the prognostic variables in patients that have already been randomised and then subsequent patients are assigned using a weighting system that minimizes imbalance in those prognostic variables.
This question is virtually identical to Question 19 from the first paper of 2019, where the trainees were expected to explain these terms rather than offer a rationale for them. The college answer to both questions is identical, suggesting that the examiners do not see any distinction in their wording (or, that they are indifferent to the candidates' interpretation of the question). Either way, for whatever reason the first time around this SAQ did very poorly (only one candidate passed, and barely at that), whereas this time it seems 49.3% scored over 5.0, and some EBM genius scored 8.5.
Without further ado:
Allocation concealment:
Block randomisation:
"...sometimes we want to keep the numbers in each group very close at all times. Block randomisation (also called restricted randomisation) is used for this purpose. For example, if we consider subjects in blocks of four at a time there are only six ways in which two get A and two get B: 1:AABB 2:ABAB 3:ABBA 4:BBAA 5:BABA 6:BAAB. We choose blocks at random to create the allocation sequence. Using the single digits of the previous random sequence and omitting numbers outside the range 1 to 6 we get 5623665611. From these we can construct the block allocation sequence BABA/BAAB/ABAB/ABBA/BAAB, and so on. The numbers in the two groups at any time can never differ by more than half the block length. Block size is normally a multiple of the number of treatments."
Stratification:
Minimisation algorithm:
Altman, Douglas G., and J. Martin Bland. "How to randomise." Bmj 319.7211 (1999): 703-704.
a) What is a Standardised Mortality Ratio (SMR) and how is it calculated? (20% marks)
b) The SMR in your ICU has increased from 0.95 to 1.05 in the past 12 months. Outline the possible causes. (80% marks)
a) Overview of SMR (20% marks)
SMR is one of the quality indicators that reflect the performance of an ICU.
Definition of SMR = ratio of observed deaths in the study group to expected deaths, where expected deaths are predicted using APACHE or another severity-of-illness scoring system
SMR values of 1 indicate expected performance, whereas values below 1 and above 1 indicate respectively better and worse performances than expected
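As a sketch of the calculation (all numbers invented): expected deaths are the sum of each patient's model-predicted mortality risk, and the SMR is simply observed over expected.

```python
def smr(observed_deaths, predicted_risks):
    """SMR = observed deaths / expected deaths, where expected deaths are the
    sum of each patient's predicted mortality risk (e.g. from APACHE)."""
    expected_deaths = sum(predicted_risks)
    return observed_deaths / expected_deaths

# Hypothetical unit: 100 patients with a mean predicted risk of 0.20,
# i.e. 20 expected deaths; 21 observed deaths gives an SMR of 1.05
result = smr(21, [0.20] * 100)
```

This makes the examiner's point about the denominator explicit: the SMR rises either because the numerator (observed deaths) goes up, or because the denominator (predicted risk) goes down, and "sicker patients" alone changes both.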
b) Causes for increase (80% marks)
Decreases in predicted/expected mortality
Errors in predicted/expected mortality due to gaps in data, changes in case-mix etc
Change in data collection systems or personnel – e.g., change in the way the expected mortality is estimated
Lead-time bias (pre-ICU care) – patients transferred from other facilities may have become more stable after receiving appropriate management at the original hospital.
Increases in observed mortality
Based on hospital mortality, not ICU mortality – therefore, influenced by pre-ICU and post ICU care in the hospital
Change in case-mix, e.g. a shift towards patient groups whose observed mortality exceeds their predicted mortality, or increased admissions from other hospitals
One-off events such as mass disasters, epidemics etc
Variations in practice, changes in clinical protocols either in the hospital or in the ICU
Changes in personnel – e.g., new intensivist, new surgeon etc.
Changes in staffing levels and training
New services introduced such as ECMO etc.
Examiner’s Comments:
The candidates rarely considered the denominator. Often wrote "admitted sicker patients" without considering these likely to also have higher predicted mortality. Rarely any structure.
In brief:
Causes for an elevation of the SMR were separated into two categories by the college; either the predicted mortality has dropped, or the actual mortality has increased. Another way of looking at this is whether the SMR elevation is "true", or whether it is spurious, i.e. where the change in SMR is not representative of a change in the quality of care being provided by the ICU.
Young, Paul, et al. "End points for phase II trials in intensive care: Recommendations from the Australian and New Zealand clinical trials group consensus panel meeting." Critical Care and Resuscitation 15.3 (2013): 211. - this one is not available for free, but the 2012 version still is:
Young, Paul, et al. "End points for phase II trials in intensive care: recommendations from the Australian and New Zealand Clinical Trials Group consensus panel meeting." Critical Care and Resuscitation 14.3 (2012): 211.
Suter, P., et al. "Predicting outcome in ICU patients." Intensive Care Medicine20.5 (1994): 390-397.
Martinez, Elizabeth A., et al. "Identifying Meaningful Outcome Measures for the Intensive Care Unit." American Journal of Medical Quality (2013): 1062860613491823.
Tipping, Claire J., et al. "A systematic review of measurements of physical function in critically ill adults." Critical Care and Resuscitation 14.4 (2012): 302.
Gunning, Kevin, and Kathy Rowan. "Outcome data and scoring systems." Bmj319.7204 (1999): 241-244.
Woodman, Richard, et al. Measuring and reporting mortality in hospital patients. Australian Institute of Health and Welfare, 2009.
Vincent, J-L. "Is Mortality the Only Outcome Measure in ICU Patients?."Anaesthesia, Pain, Intensive Care and Emergency Medicine—APICE. Springer Milan, 1999. 113-117.
Rosenberg, Andrew L., et al. "Accepting critically ill transfer patients: adverse effect on a referral center's outcome and benchmark measures." Annals of internal medicine 138.11 (2003): 882-890.
Burack, Joshua H., et al. "Public reporting of surgical mortality: a survey of New York State cardiothoracic surgeons." The Annals of thoracic surgery 68.4 (1999): 1195-1200.
Hayes, J. A., et al. "Outcome measures for adult critical care: a systematic review." Health technology assessment (Winchester, England) 4.24 (1999): 1-111.
RUBENFELD, GORDON D., et al. "Outcomes research in critical care: results of the American Thoracic Society critical care assembly workshop on outcomes research." American journal of respiratory and critical care medicine 160.1 (1999): 358-367.
Turnbull, Alison E., et al. "Outcome Measurement in ICU Survivorship Research From 1970 to 2013: A Scoping Review of 425 Publications." Critical care medicine (2016).
Solomon, Patricia J., Jessica Kasza, and John L. Moran. "Identifying unusual performance in Australian and New Zealand intensive care units from 2000 to 2010." BMC medical research methodology 14.1 (2014): 1.
Liddell, F. D. "Simple exact analysis of the standardised mortality ratio." Journal of Epidemiology and Community Health 38.1 (1984): 85-88.
Ben-Tovim, David, et al. "Measuring and reporting mortality in hospital patients." Canberra: Australian Institute of Health and Welfare (2009).
McMichael, Anthony J. "Standardized Mortality Ratios and the'Healthy Worker Effect': Scratching Beneath the Surface." Journal of Occupational and Environmental Medicine 18.3 (1976): 165-168.
Wolfe, Robert A. "The standardized mortality ratio revisited: improvements, innovations, and limitations." American Journal of Kidney Diseases 24.2 (1994): 290-297.
Kramer, Andrew A., Thomas L. Higgins, and Jack E. Zimmerman. "Comparing observed and predicted mortality among ICUs using different prognostic systems: why do performance assessments differ?." Critical care medicine 43.2 (2015): 261-269.
Spiegelhalter, David J. "Funnel plots for comparing institutional performance." Statistics in medicine 24.8 (2005): 1185-1202.
Teres, Daniel. "The value and limits of severity adjusted mortality for ICU patients." Journal of critical care 19.4 (2004): 257-263.
Outline the features and list the advantages and disadvantages of each of the following clinical trial designs:
a) Cluster randomised trial. (50% marks)
b) Non-inferiority trial. (50% marks)
Cluster randomised trial (10%)
Unit of randomisation is the cluster (e.g. one hospital or ICU) rather than individual patients. Individual clusters may be matched / paired with similar clusters to increase power
Power increased more by increasing number of clusters rather than increased numbers of patients within clusters
Advantages (20%)
Ability to test interventions directed at systems rather than individuals (e.g. MET, SDD, education campaigns)
Where individual patient consent is not required, this may lead to recruitment of 'all' patients meeting the entry criteria – increased recruitment and external validity
Disadvantages (20%)
Larger numbers of patients required when compared to conventional individual patient RCT i.e. reduced statistical efficiency
Complex statistics: power calculations require knowledge or an estimate of the intracluster correlation coefficient
The chance of baseline imbalance is greater, depending on the characteristics of the clusters
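The reduced statistical efficiency can be quantified via the design effect, which depends on the intracluster (intraclass) correlation coefficient and the cluster size. A small sketch with invented figures:

```python
def design_effect(cluster_size, icc):
    """Factor by which the sample size of a cluster-randomised trial must be
    inflated relative to randomising the same patients individually."""
    return 1 + (cluster_size - 1) * icc

# e.g. clusters of 50 patients each and a modest ICC of 0.05:
# the trial needs roughly 3.45 times as many patients
inflation = design_effect(50, 0.05)
```

This also illustrates why power is bought more cheaply by adding clusters than by enlarging them: the cluster size multiplies the ICC term, so bigger clusters inflate the penalty.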
b)
Non-inferiority trial (10%)
The null hypothesis in a noninferiority study states that the primary end point for the experimental treatment is worse than that for the positive control treatment by a specified margin. Rejection of the null hypothesis supports a claim of noninferiority to the control treatment.
Advantages: (20%)
Allows a new therapy to be compared against an existing accepted therapy
Does not require a placebo group, where this may be unethical
Allows cheaper or less toxic therapies to be introduced in place of older therapies
Disadvantages: (20%)
Does not prove efficacy of the tested therapy
Relies upon the known / accepted benefit of the control
Needs to be performed under conditions similar to those in which the active control demonstrated benefit
No clear consensus on what margin of noninferiority should be accepted
Repeated noninferiority trials may lead to acceptance of progressively inferior therapies, i.e. ‘biocreep’
Examiners Comments:
Significant knowledge gap. Disappointing, since several important trials have followed these designs.
The disappointment felt at the 4.5% pass rate for this question underscores the need to promote formal training in statistics and literature analysis. Other colleges have already moved to such a strategy: their trainees may dispense with the increasingly pointless formal project (a mandatory requirement to generate meaningless papers) by satisfying their research requirements through a university unit of study in the interpretation of evidence-based medicine.
In summary:
Features of a cluster-randomised trial:
Advantages of a cluster-randomised trial:
Disadvantages of a cluster-randomised trial:
Features of a non-inferiority trial
Advantages of a non-inferiority trial:
A non-inferiority trial is appropriate when:
Disadvantages of non-inferiority trials
Snapinn, Steven M. "Noninferiority trials." Trials 1.1 (2000): 19.
Lesaffre, Emmanuel. "Superiority, equivalence, and non-inferiority trials." Bulletin of the NYU hospital for joint diseases 66.2 (2008): 150-154.
Murray, Gordon D. "Switching between superiority and non‐inferiority." British journal of clinical pharmacology 52.3 (2001): 219-219.
Scott, Ian A. "Non-inferiority trials: determining whether alternative treatments are good enough." Med J Aust 190.6 (2009): 326-330.
Piaggio, Gilda, et al. "Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement." Jama 308.24 (2012): 2594-2604.
Garattini, Silvio. "Non-inferiority trials are unethical because they disregard patients' interests." The Lancet 370.9602 (2007): 1875-1877.
Fleming, Thomas R. "Current issues in non‐inferiority trials." Statistics in medicine 27.3 (2008): 317-332.
Campbell, Marion K., and Jeremy M. Grimshaw. "Cluster randomised trials: time for improvement: the implications of adopting a cluster design are still largely being ignored." BMJ 317.7167 (1998): 1171-1172.
In the context of clinical trials what is meant by the following terms:
a) Stratification. (20% marks)
b) Intention to treat analysis. (20% marks)
c) Sensitivity analysis. (20% marks)
d) Kaplan-Meir curve. (20% marks)
e) Analysis of competing risk. (20% marks)
a) Stratification of clinical trials is the partitioning of subjects and results by a factor other than the treatment given
b) Intention to treat analysis is the analysis of all participants allocated to a treatment group irrespective of whether they completed the treatment, withdrew, or deviated from protocol.
c) A sensitivity analysis is the analysis of data from the trial with a change or alteration to one or more underlying assumptions used in the original analysis.
d) A Kaplan-Meir curve is a plot of probability of survival against time.
e) Analysis of competing risk is used when there are multiple endpoints of which the occurrence of one prevents the occurrence of another (e.g. death prevents the occurrence of shock reversal).
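Definitions (b) and (c) can be made concrete with a toy comparison of intention-to-treat against a per-protocol sensitivity analysis; the patient data below are invented purely for illustration:

```python
# Invented toy dataset: (allocated_arm, completed_protocol, died)
patients = [
    ("treatment", True, False), ("treatment", False, True), ("treatment", True, False),
    ("control", True, True), ("control", True, True), ("control", False, False),
]

def mortality(rows):
    return sum(died for _, _, died in rows) / len(rows)

# Intention to treat: everyone is analysed in their allocated arm,
# regardless of withdrawal or protocol deviation.
itt_treatment = mortality([p for p in patients if p[0] == "treatment"])

# Sensitivity (per-protocol) analysis: the assumption "protocol deviations
# don't matter" is altered by restricting the analysis to completers.
per_protocol_treatment = mortality([p for p in patients if p[0] == "treatment" and p[1]])
```

The non-completing treatment patient who died is kept by ITT but dropped per-protocol, so the two analyses can disagree; that disagreement is exactly what a sensitivity analysis is designed to expose.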
Stratification
"Stratification is a process that protects against imbalance in prognostic factors that are present at the time of randomisation.
A separate randomisation list is generated for each prognostic subgroup. Usually limited to 2-3 variables because of increasing complexity with more variables"
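A minimal sketch of what "a separate randomisation list ... for each prognostic subgroup" looks like in practice; the stratum names and the permuted-block scheme are my own illustration, not part of the college answer:

```python
import random

def stratified_lists(strata, block_size=4, n_blocks=10, seed=1):
    """Generate one permuted-block allocation list per prognostic stratum."""
    rng = random.Random(seed)
    lists = {}
    for stratum in strata:
        allocations = []
        for _ in range(n_blocks):
            block = ["treatment", "control"] * (block_size // 2)
            rng.shuffle(block)      # permute allocations within each block
            allocations.extend(block)
        lists[stratum] = allocations
    return lists

# e.g. stratify by severity of illness (hypothetical strata):
lists = stratified_lists(["APACHE II < 25", "APACHE II >= 25"])
```

Within each stratum the arms are balanced after every completed block, so the prognostic factor used for stratification cannot end up concentrated in one arm.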
Intention to treat analysis
Sensitivity analysis
"Kaplan-Meir" curve (it's usually spelled "Meier", after Paul Meier):
Analysis of competing risk:
Morris, Tim P., Brennan C. Kahan, and Ian R. White. "Choosing sensitivity analyses for randomised trials: principles." BMC medical research methodology 14.1 (2014): 11.
Rich, Jason T., et al. "A practical guide to understanding Kaplan-Meier curves." Otolaryngology—Head and Neck Surgery 143.3 (2010): 331-336.
Noordzij, Marlies, et al. "When do we need competing risks methods for survival analysis in nephrology?." Nephrology Dialysis Transplantation 28.11 (2013): 2670-2677.
A prospective observational study examining the association between fluid therapy and outcome reports the following results:
"Crude 90-day mortality of patients who received colloids was higher than in patients treated exclusively with crystalloids; (25.5% vs. 15.4%, odds ratio (OR) 1.84, 95% confidence interval (Cl) 1.56 to 2.18). After multiple logistic regression analysis, the adjusted OR was 0.923, 95% Cl (0.87 to 1.19), p = 0.09."
a) Interpret these results. (30% marks)
a)
There was a significantly higher mortality in patients who received colloids compared to those who received crystalloids. However, when other factors likely to influence mortality were taken into account by multiple logistic regression analysis, the difference was no longer statistically significant. The interpretation is that fluid choice is not significantly associated with 90-day mortality. (3 marks)
The data here comes from Ertmer et al, 2011.
The crude odds ratio here appears statistically significant, as the CI is well away from 1.0, and the effect size is also substantial. There is no p-value reported, which is unhelpful. The adjusted OR is very different, and in fact points in the opposite direction to the crude OR, which raises major concerns. The "multiple logistic regression analysis" would have to be more carefully scrutinised to determine which variables they threw into the soup. Usually, the investigators just choose whichever variables had a p-value below 0.05 in the first univariate analysis. The more intelligent method would be to test the independent variables in pairs and in groups to understand the meaning behind their interaction, and then pick only the meaningful variables for the multivariate analysis. In short, according to the presented fragments of data, there was no adjusted difference in mortality.
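For reference, the crude OR and its CI come straight out of a 2×2 table; a sketch with invented counts chosen to roughly reproduce the quoted crude OR (these are not Ertmer's actual numbers):

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI for a 2x2 table:
    a = exposed deaths, b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors."""
    or_ = (a / b) / (c / d)
    se_log_or = sqrt(1/a + 1/b + 1/c + 1/d)   # Woolf's SE of log(OR)
    lo = exp(log(or_) - z * se_log_or)
    hi = exp(log(or_) + z * se_log_or)
    return or_, lo, hi

# invented counts: 255/1000 colloid deaths vs 154/1000 crystalloid deaths
or_, lo, hi = odds_ratio_ci(255, 745, 154, 846)
```

The whole interval sits above 1.0, which is what "statistically significant" means here; the adjusted OR then demonstrates how confounding can abolish (or even reverse) such a crude association.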
Ertmer, Christian, et al. "Fluid therapy and outcome: a prospective observational study in 65 German intensive care units between 2010 and 2011." Annals of intensive care 8.1 (2018): 27.
A randomised controlled trial examining a treatment for septic shock reports the following results:
"At 90 days after randomization, 27.9% patients who had been assigned to receive the treatment had died, as had 28.8% who had been assigned to receive placebo (odds ratio 0.95; 95% confidence interval [Cl], 0.82 to 1.10; P value= 0.50)."
a) Explain the meaning of the underlined terms. Interpret the result of the trial.
(40% marks)
Odds ratio: The odds of a patient in the treatment group dying within 90 days divided by the odds of patients in the placebo group dying within 90 days.
95% confidence interval: The range of values which is 95% certain to contain the population parameter of interest (in this case, Odds Ratio)
P Value: The probability of obtaining the observed or more extreme results, assuming the null hypothesis is true. (3 marks)
In case it matters to anybody, in this SAQ the examiners are using the findings of the ADRENAL trial (Venkatesh et al, 2018)
Odds ratio:
Confidence interval:
p-value:
Interpretation of results:
A randomised controlled trial examining a treatment for lung injury reports the following results:
"The primary outcome was change in SOFA score over 96 hours. The mean SOFA score from baseline to 96 hours decreased from 9.8 to 6.8 in the treatment group (3 points) and from 10.3 to 6.8 in the placebo group (3.5 points) (difference, -0.10; 95% CI, - 1. 23 to 1.03; P = 0.86).
There were 30 prespecified secondary outcomes. Twenty-nine were not significantly different between the treatment and the placebo group. In exploratory analyses that did not adjust for multiple comparisons, day 28 mortality was 46.3% in the placebo group vs 29.8% in the treatment group (P = 0.03; between-group difference, 16.58% [95% CI, 2% to 31.1%])."
a) Interpret these results. (30% marks)
a)
The primary outcome does not demonstrate a significant difference between the two groups, and so the overall result of the trial is negative. A secondary outcome, day-28 mortality, does show a significant difference in favour of the treatment; however, as this is one of 30 secondary outcomes with no adjustment for multiplicity of testing, this is likely a false positive result and should be interpreted cautiously. (3 marks)
The findings borrowed for this SAQ come from the CITRIS-ALI trial (Truwit et al, 2019), in case anybody cares.
The primary outcome is not statistically significant because of the high p-value (0.86 is pretty terrible) and because the confidence interval for the difference crosses zero.
As to the secondary outcome: if you have thirty secondary outcomes (and ultimately CITRIS-ALI had forty-six), some of them are bound to produce some sort of publishable information. The day 28 mortality difference was statistically significant (p = 0.03), but because this is an unadjusted secondary outcome, it should be viewed as hypothesis-generating. On an unrelated note, a mortality of 46% in sepsis or ARDS is so 1990s.
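The arithmetic behind this caution is worth a two-liner. Assuming 30 independent tests at α = 0.05 (an idealisation, since real secondary outcomes are correlated), the family-wise false positive risk is enormous:

```python
alpha, n_tests = 0.05, 30

# Probability of at least one false positive across all tests, if every
# null hypothesis is true and the tests are independent:
p_any_false_positive = 1 - (1 - alpha) ** n_tests   # ~0.79

# The crude Bonferroni fix: test each outcome at alpha / n_tests
bonferroni_alpha = alpha / n_tests                   # ~0.0017
```

Under a Bonferroni correction, the day-28 mortality p-value of 0.03 would be nowhere near the ~0.0017 threshold.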
Truwit, Jonathon D., et al. "Effect of vitamin C infusion on organ failure and biomarkers of inflammation and vascular injury in patients with sepsis and severe acute respiratory failure: the CITRIS-ALI randomized clinical trial." Jama 322.13 (2019): 1261-1270.
Regarding randomised clinical trials, explain the following terms:
a) External validity. (20% marks)
b) Allocation concealment. (20% marks)
c) Stratification. (20% marks)
d) Sensitivity analysis. (20% marks)
e) Fragility Index. (20% marks)
Not available.
These SAQs are the output of a Random Statistics Definition Generator. All of these definitions have appeared in past paper questions, just not all of them in the same question. Links on the bolded answer headings below lead to past papers where the examiners left us with their formal official answers. The fragility index was the only new one.
a) External validity: the extent to which the study results can be generalised to the greater population, which is influenced by a vast array of factors:
b) Allocation concealment: Procedure for protecting the randomization process and ensuring that the clinical investigators and those involved in the conduct of the trial are not aware of the group to which the subject has been allocated
c) Stratification is a process that protects against imbalance in prognostic factors that are present at the time of randomisation.
d) Sensitivity analysis is the analysis of data from the trial with a change or alteration to one or more underlying assumptions used in the original analysis.
e) The fragility index is a number indicating how many patients would be required to convert a trial from being statistically significant to not significant (p ≥ 0.05)
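A minimal sketch of how a fragility index might be computed (my own illustration, not an official method: by convention events are added to the arm with fewer events, re-running Fisher's exact test until p ≥ 0.05):

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    p_obs = comb(row1, a) * comb(row2, c) / total
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = comb(row1, x) * comb(row2, col1 - x) / total
        if px <= p_obs + 1e-12:        # sum all tables at least as extreme
            p += px
    return p

def fragility_index(events_a, n_a, events_b, n_b):
    """Number of non-events flipped to events in the arm with fewer events
    before a significant result (p < 0.05) becomes non-significant."""
    if events_a > events_b:            # always add events to the smaller arm
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while fisher_p(events_a, n_a - events_a, events_b, n_b - events_b) < 0.05:
        events_a += 1
        flips += 1
    return flips
```

A trivially significant toy trial (1/10 vs 9/10 deaths) survives only a few flipped patients, which is the whole point of the index: a "significant" result resting on a handful of outcomes is fragile.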
Regarding randomised clinical trials:
a) What is a noninferiority trial? (10% marks)
b) What is the null hypothesis in a noninferiority trial? (10% marks)
c) Why would a noninferiority trial be undertaken instead of a superiority trial? (40% marks)
d) What are the limitations of noninferiority trials? (40% marks)
Not available.
This question is identical to Question 24 from the first paper of 2018.
a) What is a noninferiority trial?
b) What is the null hypothesis in a noninferiority trial?
In superiority trials, the hypothesis is that the experimental treatment is different from (better than) the standard treatment, and two-sided statistical tests are used to test the null hypothesis (because the experimental treatment could be better or worse). The null hypothesis is therefore that there really is no difference. In equivalence trials the null hypothesis is that the treatments differ by more than a specified margin (the "equivalence margin"). In non-inferiority trials the null hypothesis is that the experimental treatment is worse than the standard treatment, and the pre-specified margin determines how much worse.
The diagram below is borrowed and modified from Ian A Scott (2009), and demonstrates the results and confidence interval ranges expected of the three different types of trials, when they have demonstrated that the null hypothesis is false.
Superiority trials have to have their results well over to the "favours experimental treatment" side, usually by a pre-specified margin. Equivalence trials need to have their results and confidence intervals within that margin to confirm that the two treatments are in fact equivalent. Non-inferiority trials also need to have their results within that margin, but there is no need to prove that the treatment is superior (i.e. the confidence intervals and results simply need to remain within the "not much worse" margin, the "+1%" line in the diagram).
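The decision logic in that diagram reduces to comparing the confidence interval against the margin. A sketch (my own, using the diagram's +1% margin) for a difference in event rates where negative values favour the experimental arm:

```python
def interpret(ci_lower, ci_upper, margin=0.01):
    """Classify a trial result from the 95% CI of the (experimental - control)
    event-rate difference; lower values favour the experimental arm."""
    if ci_upper < 0:
        return "superior"       # whole CI favours the new treatment
    if ci_upper < margin:
        return "non-inferior"   # CI stays within the "not much worse" margin
    if ci_lower > margin:
        return "inferior"       # whole CI lies beyond the margin
    return "inconclusive"       # CI straddles the margin

interpret(-0.02, 0.005)   # "non-inferior": may be slightly worse, but within margin
```

A superior result is of course also non-inferior; the labels here report the strongest supportable claim for each CI.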
c) Why would a noninferiority trial be undertaken instead of a superiority trial?
A non-inferiority trial is appropriate when:
d) What are the limitations of noninferiority trials?
Snapinn, Steven M. "Noninferiority trials." Trials 1.1 (2000): 19.
Lesaffre, Emmanuel. "Superiority, equivalence, and non-inferiority trials." Bulletin of the NYU hospital for joint diseases 66.2 (2008): 150-154.
Murray, Gordon D. "Switching between superiority and non‐inferiority." British journal of clinical pharmacology 52.3 (2001): 219-219.
Scott, Ian A. "Non-inferiority trials: determining whether alternative treatments are good enough." Med J Aust 190.6 (2009): 326-330.
Piaggio, Gilda, et al. "Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement." Jama 308.24 (2012): 2594-2604.
Garattini, Silvio. "Non-inferiority trials are unethical because they disregard patients' interests." The Lancet 370.9602 (2007): 1875-1877.
Fleming, Thomas R. "Current issues in non‐inferiority trials." Statistics in medicine 27.3 (2008): 317-332.
Explain the following statistical terms:
a) Sensitivity. (20% marks)
b) Specificity. (20% marks)
c) Receiver Operating Characteristic (ROC) Curve. (60% marks)
Not available.
a)
b)
c)
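Since the official answer is unavailable, the three terms can at least be pinned down in code. A toy sketch: sensitivity and specificity at a single cutoff, and the AUC via its rank interpretation (the probability that a randomly chosen diseased patient scores higher than a randomly chosen healthy one); the scores below are invented:

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) at one cutoff;
    labels: 1 = diseased, 0 = healthy; score >= cutoff calls 'diseased'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """Area under the ROC curve, as P(score_diseased > score_healthy)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# invented scores for 3 healthy (0) and 3 diseased (1) patients:
scores, labels = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
```

Sweeping the cutoff from high to low traces out the ROC curve; a useless test gives an AUC of 0.5 and a perfect one gives 1.0 (here the groups separate perfectly).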
Hanley, James A., and Barbara J. McNeil. "The meaning and use of the area under a receiver operating characteristic (ROC) curve." Radiology 143.1 (1982): 29-36.
Zweig, Mark H., and Gregory Campbell. "Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine." Clinical chemistry 39.4 (1993): 561-577.
Cook, Nancy R. "Use and misuse of the receiver operating characteristic curve in risk prediction." Circulation 115.7 (2007): 928-935.
Jones, Catherine M., and Thanos Athanasiou. "Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests."The Annals of thoracic surgery 79.1 (2005): 16-20.
Metz, Charles E. "ROC methodology in radiologic imaging." Investigative radiology 21.9 (1986): 720-733.
Søreide, Kjetil. "Receiver-operating characteristic curve analysis in diagnostic, prognostic and predictive biomarker research." Journal of clinical pathology 62.1 (2009): 1-5.
Lusted, Lee B. "ROC recollected." Medical Decision Making (1984): 131-135.