
Question 2b - 2001, Paper 1

You have taken over the directorship of a district hospital ICU.  Part of your mandate is to establish a Quality Assurance program.

(b)  What is the relevance of Evidence Based Medicine to your patients and how will you apply this?

College Answer

Evidence Based Medicine has been defined as the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. It is not new let alone revolutionary. Its relevance to the candidate’s practice is its ability to add to clinical experience, basic science and physiological principle.

Unfortunately an individual would be unable to review and critically assess all the literature available in all languages. Practitioners are dependent on reviews, meta-analyses and expert opinions. Many questions have yet to be answered effectively or in many cases are yet to be addressed at all. Other questions are beyond scientific assessment eg the use of no antibiotic in pneumonia. A complete appreciation of EBM requires review of the literature, audit of local practice ie techniques/management in one's own ICU, implementation of EBM based practice and follow-up audit of results. Although not itself assessed by trials, EBM, by scientific appraisal and review, formalises an aspect of quality improvement which should be relevant to ICU practice.

Discussion

Evidence based medicine is the system of critical evaluation of published data for applicability to the management of individual patients, or something.
The college regales us with the Sackett definition of EBM:

"Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."

Again, one could digress extensively here, scoring virtually no marks.

Were such an essay-style question ever to return to CICM fellowship papers, one would rant creatively, using the following points as a skeleton:

Relevance of EBM to ICU practice

  • Adds to clinical experience and physiological science
  • Informs non-abstract bedside decisionmaking as well as broader department policy
  • Forms an aspect of quality improvement

"How will you apply this?"

  • Framing a question or series of questions, which are focused and answerable
  • Literature review
  • Critical appraisal of the literature
  • Audit of local practice
  • Integration into local practice
  • Audit of outcomes and refinement of implementation strategy

References

Cook, D. J., and M. K. Giacomini. "The integration of evidence based medicine and health services research in the ICU." Evaluating Critical Care. Springer Berlin Heidelberg, 2002. 185-197.

 

Kotur, P. F. "Evidence-Based Medicine in Critical Care." Intensive and Critical Care Medicine. Springer Milan, 2009. 47-57.

Question 2c - 2001, Paper 1

An article appears reporting the positive effects of a new agent in a trial of 50 patients with septic shock.

(c) What criteria will you use to assess the validity of this article to your ICU?

College Answer

The criteria for assessment of such an article include:

•    Is the trial's design valid and powered to achieve a result? It seems doubtful in this case but a large effect in a specific group may be detected.

•    Was the hypothesis based on valid evidence?

•    Were all the entered patients accounted for?

•    Were the groups equivalent after randomisation?

•    Was there proper blinding of study personnel?

•    Apart from the experimental intervention were the groups treated equivalently?

•    Was the statistical analysis appropriate?

•    How large was the treatment effect?

•    Can the results be applied to my patients?

Discussion

Though not word-for-word identical, this question closely resembles Question 8 from the second paper of 2012, as well as Question 8 from the first paper of 2004. It discusses the assessment of the validity of a randomised controlled trial, which is discussed in greater detail elsewhere.

The answer is reproduced below, to simplify revision and damage SEO:

Is the premise sound?

  • Is the primary hypothesis biologically plausible?
  • Is the research ethical?
  • If the results are valid, are there any disastrous logistical, ethical or financial implications to a change in practice?

Is the methodology of high quality?

  • Were the inclusion/exclusion criteria appropriate?
  • Was the assignment of patients to treatments randomised? If yes, then was it truly random?
  • Were the study groups homogenous?
  • Were the groups treated equally?
  • Are there any missing patients? Is every enrolled patient accounted for? 
  • Was follow-up complete? Is the drop-out rate explained? Do we know what happened to the dropouts?

Is the reporting of an appropriate quality?

  • The methods description should be complete: the trial should be reproducible
  • Do the results have confidence intervals?
  • Results should present relative and absolute effect sizes
  • Is a CONSORT-style flow diagram of patient selection available?
  • The discussion should address limitations, bias and imprecision
  • Funding sources and the full trial protocol should be disclosed

Are the results of the study valid?

  • Was there blinding? Was blinding even possible? Was it double-blind? If not, at least were the data interpreters and statisticians blinded?
  • Was there allocation concealment?
  • Was there intention-to-treat analysis?
  • If there were sub-groups, were they identified a priori?

What were the results?

  • How large was the treatment effect?
  • How precisely was the effect estimated? (i.e. what was the 95% confidence interval)

Is this study helpful for me?

  • Is this applicable to my patient? i.e. would my patient have been enrolled in this study?
  • Does the population studied correspond with the population to which my patient belongs?
  • Were all the clinically meaningful outcomes considered?
  • Does the benefit outweigh the cost and risk?

References

Question 14 - 2002, Paper 2

Outline the way you would calculate and how you might use the following features of a diagnostic test: sensitivity, specificity, positive predictive value and negative predictive value.

College Answer

                     Disease present     Disease absent
Test positive        A                   B                  A+B
Test negative        C                   D                  C+D
                     A+C                 B+D                A+B+C+D

Sensitivity = proportion of patients with disease detected by positive test = A/(A+C)
Specificity = proportion of patients without disease detected by negative test = D/(B+D)
Positive predictive value = proportion of patients with positive test who have disease = A/(A+B)
Negative predictive value = proportion of patients with negative test who do not have disease = D/(C+D)
Very high sensitivity means few false negatives. Very high specificity means few false positives.

Discussion

This question closely resembles a whole mass of other questions:

The questions may not be identical, but they test the exact same concepts. Here's a helpful list of equations the college expects us to memorise.

Sensitivity = true positives / (true positives + false negatives)

This is the proportion of patients with disease who are correctly identified by the test.

Specificity = true negatives / (true negatives + false positives)

This is the proportion of patients in whom the disease was correctly excluded.

Positive predictive value = (true positives / total positives)

This is the proportion of patients with positive test results who are correctly diagnosed.

Negative predictive value = (true negatives / total negatives)

This is the proportion of patients with negative test results who are correctly diagnosed.
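
These four formulae are easy to sanity-check mechanically. Below is a minimal Python sketch (the function name and layout are mine, not the college's); the worked figures are the ones from Question 15 of the first paper of 2007.

    def diagnostic_metrics(tp, fp, tn, fn):
        """Standard 2x2 diagnostic test metrics, returned as fractions."""
        return {
            "sensitivity": tp / (tp + fn),  # proportion of disease correctly detected
            "specificity": tn / (tn + fp),  # proportion of non-disease correctly excluded
            "PPV": tp / (tp + fp),          # proportion of positive results which are true
            "NPV": tn / (tn + fn),          # proportion of negative results which are true
        }

    # Figures from Question 15, 2007: 70 TP, 40 FP, 60 TN, 30 FN
    print(diagnostic_metrics(tp=70, fp=40, tn=60, fn=30))
    # {'sensitivity': 0.7, 'specificity': 0.6, 'PPV': 0.636..., 'NPV': 0.666...}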

References

Question 4 - 2003, Paper 2

Compare and contrast the use of the Chi-squared test, Fisher’s Exact Test and logistic regression when analysing data.

College Answer

All these tests are widely used in the statistical reporting of data and give a representation of the likelihood that a given spread of data occurs by chance.

The Chi-square(d) statistic is used when comparing categorical data (e.g. counts). Often, these data are simply displayed in a “contingency table” with R rows and C columns. Its use is less appropriate where total numbers are small (e.g. N <20) or the smallest expected value is less than 5.

Fisher’s Exact test is used when comparing categorical data (e.g. counts), but is only generally applicable in a 2 x 2 contingency table (2 columns and 2 rows). It is specifically indicated when total numbers are small (e.g. N <20) or the smallest expected value is less than 5.

Logistic regression is used when comparing a binary outcome (e.g. yes/no, lived/died) with other potential variables. Logistic regression is most commonly used to perform multivariable analysis (“controlling for” various factors), and these variables can be either categorical (e.g. gender), or continuous (e.g. weight), or any combination of these. The standard ICU mortality predictions are based on logistic regression analysis.

Discussion

When one is invited to "compare and contrast" things, one is well served by a table structure.

First, the prose form: much of what follows is heavily borrowed from LITFL.

Additional reading can be done, if one wishes to actually understand these concepts.

I recommend the following free online resources:

Additionally, I invite everybody to visit this page, where the author Steve Simon (presumably, somebody qualified in statistics) responds to an email he received which asked him to comment on the differences between a Chi-square test, Fisher's Exact test, and logistic regression.

Chi-square test

A statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. The chi-square test can be used to test for the "goodness of fit" between observed and expected data.

  • chi-square is the sum of the squared differences between observed (o) and expected (e) data: χ² = Σ (o - e)² / e
  • May be inappropriate if the sample numbers are small.
  • Cannot be calculated if the expected value in any category is less than 5.

Fisher's Exact Test

Another test like the Chi-square test, to compare observed data with expected data.

  • Used for small data sets (where Chi-square is useless)
  • Only applicable in a 2x2 contingency table

Logistic regression

  • Method of predicting a binary variable (eg. dead or alive) on the basis of numerous predictive factors, to compare observed and predicted data.
  • ICU mortality is predicted using logistic regression analysis
  • Regression coefficients allow the contribution of different predictor variables to be analysed.
  • Goodness of fit can be estimated using a variety of mathematical methods.
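
Before tabulating the differences, it may help to see all three tests run against the same sort of data. The following is a minimal sketch, assuming scipy and statsmodels are available; the contingency table and the regression inputs are entirely invented.

    import numpy as np
    from scipy.stats import chi2_contingency, fisher_exact
    import statsmodels.api as sm

    # Invented 2x2 contingency table: rows = treatment/control, columns = died/survived
    table = np.array([[20, 80],
                      [35, 65]])

    chi2, p_chi2, dof, expected = chi2_contingency(table)  # large-sample categorical test
    odds_ratio, p_fisher = fisher_exact(table)             # exact test, usable for small N

    # Logistic regression: a binary outcome modelled from mixed predictors
    rng = np.random.default_rng(0)
    treated = rng.integers(0, 2, 200)   # categorical predictor (invented)
    age = rng.normal(60, 15, 200)       # continuous predictor (invented)
    died = rng.integers(0, 2, 200)      # placeholder binary outcome
    X = sm.add_constant(np.column_stack([treated, age]))
    fit = sm.Logit(died, X).fit(disp=0)

    print(p_chi2, p_fisher, fit.params)  # p-values, and the regression coefficients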

Now that the prose is finished, let us tabulate the differences and similarities between these tests.

Chi Square, Fisher's Exact Test and Logistic Regression: A Comparison of Methods

Application (all three): they "give a representation of the likelihood that a given spread of data occurs by chance".

Chi-square test

  • Specific use: nominal data, large samples
  • Advantages: able to analyse contingency tables with multiple rows and columns of data
  • Limitations: inefficient in handling ordinal data; cannot be calculated if the expected value in any category is less than 5

Fisher's Exact Test

  • Specific use: nominal data, small samples
  • Advantages: better suited to small data sets (sample size less than 20); "exact", in that it does not rely on approximation
  • Limitations: only suited to small data sets; computationally intense, as the calculations needed increase rapidly with sample size; cannot adjust for possible confounding variables

Logistic regression

  • Specific use: binary outcome variables
  • Advantages: able to predict an outcome variable which is binary and categorical from predictor variables that are continuous; used because having a categorical outcome variable violates the assumption of linearity in normal regression
  • Limitations: assumes an independence of errors; assumes no outliers

References

The ideal reference for this is the BMJ, with their combination of rich statistics info and Old-World credibility. I link to the relevant sections of their Statistics at Square One, by T D V Swinscow.

Question 6 - 2004, Paper 1

Outline the techniques you would use to assess the methodological quality of a placebo controlled prospective randomised clinical trial.

College Answer

Various checklists are available for assessing methodological quality. One such list is that proposed by David Sackett. It includes 3 main questions: was assignment randomised and was the randomisation list concealed (minimise potential for bias)?; was follow up of patients sufficiently long and complete (ensure endpoints accurately assessed)?; were patients analysed in the groups to which they were randomised (maintain benefits of randomisation)? It also includes 3 finer points to address: were patients and clinicians (and outcome assessors) kept blind to treatment (minimise bias)?; were groups treated equally apart from the experimental treatment (ensure intervention effect is only thing being assessed)?; were the groups similar at the start of the trial (were there any potentially confounding effects that randomisation did not eliminate)? In addition to these, the study should have enrolled enough patients to be sufficiently powered to detect the perceived clinically important benefit in the primary outcome variable! Standardised criteria have also been published (CONSORT) that were recommended to facilitate consistency and clarity in studies submitted for publication, allowing the reader to more readily assess the internal and external validity of a study.

(Sackett DL et al (eds.). Evidence-based medicine. Churchill Livingstone, London. 2000

Begg C et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996 Aug 28;276(8):637-9).

Discussion

Though not word-for-word identical, this question closely resembles Question 8 from the second paper of 2012. It discusses the assessment of the validity of a randomised controlled trial, which is discussed in greater detail elsewhere.

In brief:

Is the premise sound?

  • Is the primary hypothesis biologically plausible?
  • Is the research ethical?
  • If the results are valid, are there any disastrous logistical, ethical or financial implications to a change in practice?

Is the methodology of high quality?

  • Were the inclusion/exclusion criteria appropriate?
  • Was the assignment of patients to treatments randomised? If yes, then was it truly random?
  • Were the study groups homogenous?
  • Were the groups treated equally?
  • Are there any missing patients? Is every enrolled patient accounted for? 
  • Was follow-up complete? Is the drop-out rate explained? Do we know what happened to the dropouts?

Is the reporting of an appropriate quality?

  • The methods description should be complete: the trial should be reproducible
  • Do the results have confidence intervals?
  • Results should present relative and absolute effect sizes
  • Is a CONSORT-style flow diagram of patient selection available?
  • The discussion should address limitations, bias and imprecision
  • Funding sources and the full trial protocol should be disclosed

Are the results of the study valid?

  • Was there blinding? Was blinding even possible? Was it double-blind? If not, at least were the data interpreters and statisticians blinded?
  • Was there allocation concealment?
  • Was there intention-to-treat analysis?
  • If there were sub-groups, were they identified a priori?

What were the results?

  • How large was the treatment effect?
  • How precisely was the effect estimated? (i.e. what was the 95% confidence interval)

Is this study helpful for me?

  • Is this applicable to my patient? i.e. would my patient have been enrolled in this study?
  • Does the population studied correspond with the population to which my patient belongs?
  • Were all the clinically meaningful outcomes considered?
  • Does the benefit outweigh the cost and risk?

References

Question 4 - 2004, Paper 2

Compare and contrast the roles of parametric and non-parametric tests in analysing data, including examples of types of data and appropriate tests.

College Answer

Parametric tests are used to compare different groups of continuous variables when the data is normally (or near-normally) distributed. Non-parametric tests do not make any assumptions about the distribution of data. They focus on order rather than absolute values, and are used to analyse data that is abnormally distributed (eg. significantly skewed) or data which represent ordered categories but may not be linear (eg. pain scores, ASA score, NYHA score). Commonly used parametric tests include the unpaired t-test (comparing 2 different groups with continuous variables, eg. age in males/females) and variations of the ANalysis Of VAriance (ANOVA: comparing multiple groups with continuous variables, eg. PaO2:FIO2 ratio in Medical/Surgical/Trauma patients). Commonly used non-parametric tests include the Mann-Whitney U test (comparing 2 different groups with continuous variables, eg. ICU stay in males/females) and the Kruskal-Wallis test (comparing continuous variables in more than 2 groups, eg. pain score with PCA/epidural/s-c morphine).

Discussion

You use these to figure out the p-value, i.e. the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. There are parametric and non-parametric tests.

Parametric tests

Description of parametric tests

Parametric tests are more accurate, but require assumptions to be made about the data, eg. that the data is normally distributed (in a bell curve). If the data deviate strongly from the assumptions, the parametric test could lead to incorrect conclusions.

If the sample size is too small, parametric tests may lead to incorrect conclusions due to the loss of "normality" of sample distribution.

Examples of parametric tests:

  • Student's t-test
  • Analysis of variance (ANOVA)
  • Pearson correlation coefficient
  • Regression or multiple regression

Non-parametric tests

Description of non-parametric tests

Non-parametric tests make no assumptions about the distribution of the data. If the assumptions for a parametric test are not met (eg. the distribution has a lot of skew in it), one may be able to use an analogous non-parametric test.

Non-parametric tests are particularly good for small sample sizes (<30). However, non-parametric tests have less power.

Examples of non-parametric tests:

  • Mann-Whitney U test
  • Wilcoxon rank-sum test
  • Wilcoxon signed-rank test
  • Kruskal-Wallis test
  • Friedman's test
  • Spearman's rank-order correlation
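
To see the parametric/non-parametric pairing in action, here is a small sketch using scipy; the length-of-stay figures are invented, and deliberately skewed.

    import numpy as np
    from scipy.stats import ttest_ind, mannwhitneyu

    rng = np.random.default_rng(42)
    # Invented ICU length-of-stay data: exponential, i.e. heavily right-skewed
    group_a = rng.exponential(scale=3.0, size=25)
    group_b = rng.exponential(scale=5.0, size=25)

    t_stat, p_t = ttest_ind(group_a, group_b)      # parametric: assumes near-normality
    u_stat, p_u = mannwhitneyu(group_a, group_b)   # non-parametric analogue: compares ranks

    print(f"t-test p = {p_t:.3f}; Mann-Whitney p = {p_u:.3f}")
    # With skewed data like these, the rank-based test is the safer choice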

References

Hoskin, Tanya. "Parametric and Nonparametric: Demystifying the Terms." Mayo Clinic CTSA BERD Resource. Retrieved from http://www.mayo.edu/mayo-edudocs/center-for-translational-science-activities-documents/berd-5-6.pdf (2012).

 

Question 13 - 2005, Paper 1

For each of the following terms, provide a definition, outline their derivation and outline their role:

  • Sensitivity,
  • Specificity,
  • Positive Predictive Value,
  • Negative Predictive Value.

College Answer

                     Disease present     Disease absent
Test positive        A                   B                  A+B
Test negative        C                   D                  C+D
                     A+C                 B+D                A+B+C+D

Using the presence or absence of a disease, and the result of a specific test as an example: Sensitivity = proportion of patients with disease detected by positive test = A/(A+C). Very high values essential if wish to catch all with disease, and allow a negative result to virtually rule out the diagnosis.

Specificity = proportion of patients without disease detected by negative test = D/(B+D). Very high values of specificity essential if wish to catch all without the disease, and allow a positive result to rule in the diagnosis.

Positive predictive value = proportion of patients with positive test who have disease = A/(A+B). PPV allows estimate of certainty around positive result.

Negative predictive value = proportion of patients with negative test who do not have disease = D/(C+D). NPV allows estimate of certainty about a negative result.

Discussion

Later papers focus merely on the candidate's ability to apply the formulae.

One can make a strong argument for a return to questions which test one's understanding of the actual concept, rather than demanding the regurgitation of rote-learned equations.

To rote-learn the above-mentioned equations, here is a helpful list.

Sensitivity = true positives / (true positives + false negatives)

This is the proportion of patients with disease who are correctly identified by the test.

Specificity = true negatives / (true negatives + false positives)

This is the proportion of patients in whom the disease was correctly excluded.

Positive predictive value = (true positives / total positives)

This is the proportion of patients with positive test results who are correctly diagnosed.

Negative predictive value = (true negatives / total negatives)

This is the proportion of patients with negative test results who are correctly diagnosed.

References

Altman, Douglas G., and J. Martin Bland. "Statistics Notes: Diagnostic tests 2: predictive values." BMJ 309.6947 (1994): 102.

Question 21 - 2005, Paper 2

“The absence of evidence of effect does not imply evidence of absence of effect”. Please explain how this statement applies to evaluation of the medical literature.

College Answer

Candidates were expected to think more broadly than just the “power” of a study. Consider:

No evidence - never asked the question. Low level evidence. Physiological data only. Animal data only. Ethical barriers to conducting the definitive study. Unanswerable for logistic reasons. Retrospective / case series only. Poorly designed existing studies (related to blinding, allocation concealment, loss of follow up, intention to treat, uniform management apart from intervention, appropriate stats methods etc.). Meta-analysis pitfalls - significant disagreements with subsequent RCT. Type 2 error - false acceptance of null hypothesis - inadequate power - small single centre studies.

Discussion

This question recalls a more uncivilised time, when bewildered CICM fellowship candidates were assailed by vaguely worded essay questions in an attempt to wring some sort of creative lateral thinking from their algorithmic reptile brains. The resulting confusion can be observed even in the college answer, which - rather than defending any particular argument - instead exhorts us to think "broadly", and then presents us with a word salad of key phrases to consider. The modern papers are thankfully free from this sort of thing.

If one were to take this question seriously, one would structure one's response in the following manner:

Definition

“The absence of evidence of effect does not imply evidence of absence of effect” is a rebuttal to the Argument from Ignorance, which (put simply) states that if something has not been proven true, then it must be false. The rebuttal addresses the third possibility, that the currently available evidence has failed to detect a phenomenon. In the interpretation of medical literature, this means that a study that has failed to demonstrate the evidence of a risk has not succeeded in demonstrating the absence of risk. Similarly, a study which has failed to demonstrate a significant difference between two treatments has not demonstrated the absence of difference, only the absence of evidence of a difference.

Rationale

The idea that the absence of evidence for a phenomenon should imply that there is no such phenomenon is known in the form of the Kehoe principle, named after Robert Kehoe who argued that the use of leaded petrol was safe because at that stage there was no evidence to the contrary. The opposite view is known as the Precautionary Principle. It holds that in the absence of evidence, one must take a conservative stance and manage uncertain risks in a manner which most effectively serves human safety.

Advantages

In the absence of evidence, the precautionary principle recommends that the clinician takes reasonable measures to avoid threats that are serious and plausible. In this, it may be a more humanistic principle than the alternatives (such as the Expected Utility Theory).

In brief:

  • Safest and most humanistic approach
  • Risk-averse
  • The burden of proof of safety is on the investigator
  • The burden of risk and benefit analysis is on the clinician

Disadvantages

In its strongest formulation, the Precautionary Principle calls for absolute proof of safety before new treatments or techniques are adopted. Such stringent standards may result in an excessive regulation of potentially useful treatment strategies. One may envision a reductio ad absurdum where table salt is outlawed because there is insufficient evidence for its safety. Some authors have suggested that the precautionary principle "replaces the balancing of risks and benefits with what might best be described as pure pessimism". Furthermore, not all experimental questions can be answered with high-level evidence (eg. in the case of rare diseases with insufficient sample size for RCTs, or in the cases where it is unethical to randomise intervention).

Published data may not offer sufficient evidence. The power of a study influences its ability to discern an effect of a given size, and it is possible that small studies are inadequately powered to detect a small treatment effect. Type 2 errors can be committed in this way.

In brief:

  • Potentially useful treatments may be discarded for lack of evidence
  • Not all treatments can be the subject of RCTs, particularly
    • where sample size is by necessity small
    • where randomisation is unethical
    • where blinding is impossible
  • Not all studies of effective treatments are appropriately powered to detect an effect of appropriate size
  • Not all meta-analysis reviews are able to find all the available evidence due to publication bias

In summary:

There is a danger of misinterpreting "negative studies", because studies which have not found statistically significant differences in effect may have been inadequate to detect such an effect. In careful interpretation of medical literature one must be alert to the idea that not all negative studies are truly "negative". Decision-making in uncertainty should be guided by humanistic principles and careful risk-vs-benefit analysis.

References

Foster, Kenneth R., Paolo Vecchia, and Michael H. Repacholi. "Science and the precautionary principle." Science 288.5468 (2000): 979-981.

 

Alban, S. "The 'precautionary principle' as a guide for future drug development." European Journal of Clinical Investigation 35.s1 (2005): 33-44.

 

Peterson, Martin. "The precautionary principle should not be used as a basis for decision‐making." EMBO reports 8.4 (2007): 305-308.

 

Altman, Douglas G., and J. Martin Bland. "Statistics notes: Absence of evidence is not evidence of absence." BMJ 311.7003 (1995): 485.

 

Resnik, David B. "The precautionary principle and medical decision making." Journal of Medicine and Philosophy 29.3 (2004): 281-299.

 

Rabin, Matthew. "Risk aversion and expected‐utility theory: A calibration theorem." Econometrica 68.5 (2000): 1281-1292.

 

Alderson, Phil. "Absence of evidence is not evidence of absence." BMJ 328.7438 (2004): 476-477.

 

Question 25 - 2006, Paper 1

In the context of clinical trials, define the following terms:

(a)      Relative risk
(b)      Absolute risk
(c)       Number needed to treat
(d)       Power of the study

College Answer

A number of potential definitions exist. One example for each is listed below:
Relative risk: the difference in event rates between 2 groups expressed as proportion of the event rate in the untreated group.

Absolute risk: this is the actual event rate in the treatment or the placebo group. The absolute risk reduction is the arithmetical difference between the event rates between the two groups

Number Needed to Treat: The NNT is the number of patients to whom a clinician would need to administer a particular treatment for 1 patient to receive benefit from it. It is calculated as 100 divided by the absolute risk reduction when expressed as a percentage or 1 divided by the absolute risk reduction when expressed as a proportion.
Power of the study: The probability that a study will produce a significant difference at a given significance level is called the power of the study. It will depend on the difference between the populations compared, the sample size and the significance level chosen.

Discussion

This question is a verbatim copy of Question 9 from the second paper of 2010.

References

Question 10 - 2006, Paper 2

In the context of a clinical trial, define and explain the significance of the following terms:

a)  Intention  to treat analysis.

b)  Randomization.

College Answer

ITT is the process by which the patients are analysed in the group to which they are randomised.

There are four major lines of justification for intention-to-treat analysis.

1.    Intention-to-treat simplifies the task of dealing with suspicious outcomes, that is, it guards against conscious or unconscious attempts to influence the results of the study by excluding odd outcomes.
2.  Intention-to-treat guards against bias introduced when dropping out is related to the outcome.
3.  Intention-to-treat preserves the baseline comparability between treatment groups achieved by randomization.
4.  Intention-to-treat reflects the way treatments will perform in the population by ignoring adherence when the data are analyzed.

RANDOMISATION is the process of assigning clinical trial participants to treatment groups. Randomisation gives each participant a known (usually equal) chance of being assigned to any of the groups. Successful randomisation requires that group assignment cannot be predicted in advance.

Randomisation aims to obviate the possibility that there is a systematic difference (or bias) between the groups due to factors other than the intervention. Allocation of participants to specific treatment groups in a random fashion ensures that each group is, on average, as alike as possible to the other group(s). The process of randomisation aims to ensure similar levels of all risk factors in each group; not only known, but also unknown, characteristics are rendered comparable, resulting in similar numbers or levels of outcomes in each group, except for either the play of chance or a real effect of the intervention(s). Concealment of randomisation is vital.

Discussion

A brief answer to these questions is possible. However, by asking that the candidate "explain the significance" of these concepts, the college has authorised a torrent of gibberish. One could really get carried away with this.

a)

Definition of intention to treat analysis: This is the practice of grouping patient data according to the randomised allocation of the patient, rather than according to the treatment which they received.

According to Fisher et al,

"ITT analysis includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence with the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol."

Significance of intention to treat analysis:

  • All enrolled patients have to be a part of the final analysis
  • Maintains prognostic balance generated from the original random treatment allocation (preserving the bias-reducing effects of randomisation)
  • Avoids overoptimistic estimates of the treatment's efficacy
  • Accurately models the effect of noncompliance and protocol deviations in clinical practice
  • Prevents bias introduced due to outcome-associated dropouts
  • Prevents bias by resisting the post-hoc manipulation of data to eliminate inconvenient outcomes
  • Preserves the sample size, thus preserving the statistical power
  • Minimises Type 1 error
  • Allows for the greatest external validity
  • Supported by the CONSORT statement
  • Essential for a superiority trial

However:

  • Heterogeneity may be introduced if dropouts and compliant subjects are mixed together in the final analysis
  • Patients who never received the treatment are analysed together with those who did, which dilutes the treatment effect
  • A large number of dropouts and non-compliant subjects may cause a massive variation in outcome data and could make an effective treatment appear ineffective.

b)

Definition of randomisation: This is the practice of deliberately haphazard allocation of patients to study groups, in order to simulate the effect of chance. Randomisation gives each participant an equal chance of being assigned to any of the groups. Successful randomisation involves a process of allocation which cannot be predicted or "gamed" prior to allocation.

Significance of randomisation:

  • Minimises selection bias
  • Minimises group heterogeneity
  • Controls unknown confounders, which should be randomly and evenly distributed among the groups
  • Allows probability theory to be used to express the likelihood that chance is responsible for the differences in outcome among groups.
  • Failure to use random allocation and concealment of allocation were associated with relative increases in estimates of effects of 150% or more.
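
To illustrate how the "cannot be predicted or gamed" requirement is met in practice, here is a toy sketch of one common scheme, permuted-block randomisation. The function is hypothetical; real trials add stratification, allocation concealment and an independent randomisation service.

    import random

    def block_randomise(n_patients, block_size=4, arms=("treatment", "control")):
        """Permuted-block randomisation: every block contains equal numbers of
        each arm in shuffled order, keeping group sizes balanced over time."""
        assert block_size % len(arms) == 0, "block size must be a multiple of the arm count"
        allocation = []
        while len(allocation) < n_patients:
            block = list(arms) * (block_size // len(arms))
            random.shuffle(block)  # the order within each block is unpredictable
            allocation.extend(block)
        return allocation[:n_patients]

    print(block_randomise(10))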

References

Montori, Victor M., and Gordon H. Guyatt. "Intention-to-treat principle." Canadian Medical Association Journal 165.10 (2001): 1339-1341.

 

Gupta, Sandeep K. "Intention-to-treat concept: A review." Perspectives in clinical research 2.3 (2011): 109.

 

Fisher LD, Dixon DO, Herson J, Frankowski RK, Hearron MS, Peace KE. Intention to treat in clinical trials. In: Peace KE, editor. Statistical issues in drug research and development. New York: Marcel Dekker; 1990. pp. 331–50. (not even a sample exists online! I was forced to quote from Gupta et al.)

 

Beller, Elaine M., Val Gebski, and Anthony C. Keech. "Randomisation in clinical trials." Medical Journal of Australia 177.10 (2002): 565-567.

 

Moher, David, Kenneth F. Schulz, and Douglas G. Altman. "The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomized trials." BMC Medical Research Methodology 1.1 (2001): 2.

 

Herbert, Robert D. "Randomisation in clinical trials." Australian Journal of Physiotherapy 51.1 (2005): 58-60.

 

Kunz, Regina, and Andrew D. Oxman. "The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials." BMJ 317.7167 (1998): 1185-1190.

 

Altman, D. G., and C. J. Dore. "Randomisation and baseline comparisons in clinical trials." The Lancet 335.8682 (1990): 149-153.

 

Zelen, Marvin. "The randomization and stratification of patients to clinical trials."Journal of chronic diseases 27.7 (1974): 365-375.

Question 15 - 2007, Paper 1

To evaluate a new biomarker as an early index of bacteraemia, you perform the measurement in a consecutive series of 200 critically ill septic patients. You find that 100 of these patients had subsequently proven bacteraemia. Of these, 70 had a positive biomarker result. Of the remaining 100 patients without bacteraemia, 40 had a positive biomarker result.


Using the above data, show how you would calculate:


a) sensitivity 
b) specificity 
c) Positive predictive value 
d) Negative predictive value 
e) Positive Likelihood ratio

                   Bacteremia present     Bacteremia absent
Biomarker +        70                     40
Biomarker -        30                     60
                   100                    100

College Answer

a) Sensitivity = (TP / {TP + FN}) = 70/100

b) Specificity = (TN / {TN + FP}) = 60/100

c) PPV = (TP / {TP + FP}) = 70/110

d) NPV = (TN / {TN + FN}) = 60/90

e) Positive likelihood ratio = sensitivity / (1 - specificity) = 70/40

Discussion

This question is very similar to Question 19.1 from the first paper of 2010, and almost entirely identical to Question 29.2 from the first paper of 2008.

However, it also presents one with a 2×2 table breakdown of results, and there is the added question (e), which asks the candidate to calculate a positive likelihood ratio.

That formula, and relevant others, is presented in the helpful list of equations one must memorise for the fellowship.

Thus, going through the motions...

true positives = 70

false positives = 40

true negatives = 60

false negatives = 30

a) Sensitivity = True positives / ( true positives + false negatives)

= 70 / (70 + 30) = 70%

b) Specificity = True negatives / (true negatives + false positives)

= 60 / (60 + 40) = 60%

c) Positive predictive value = True positives / (true positives + false positives)

= 70 / (70 + 40) = 63.6%

d) Negative predictive value = True negatives / (true negatives + false negatives)

= 60 / (60+30) = 66.6%

e) Positive Likelihood ratio = sensitivity / (1-specificity)

= 0.7 / (1 - 0.6) = 1.75
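
The likelihood ratios fall straight out of sensitivity and specificity, so the arithmetic above can be verified in a couple of lines of Python:

    sens, spec = 0.7, 0.6
    lr_pos = sens / (1 - spec)   # 0.7 / 0.4 = 1.75 (positive likelihood ratio)
    lr_neg = (1 - sens) / spec   # 0.3 / 0.6 = 0.5 (negative likelihood ratio, for interest)
    print(lr_pos, lr_neg)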

References

Question 30 - 2007, Paper 2

a) What is a meta-analysis?

b) What is the role of meta-analysis in evidence based medicine?

c) What are the features you look for in a meta-analysis to determine if it has been well conducted?

College Answer

a)  A form of systematic review that uses statistical methods to combine the results from different studies


b)  roles:

1. ↑ statistical power by ↑ sample size
2. Resolve uncertainty when studies disagree
3. Improve estimates of effect size
4. Establish questions for future PRCTs

c)

1. Are the research questions defined clearly?
2. Are the search strategy and inclusion criteria described?
3. How did they assess the quality of studies?
4. Have they plotted the results?
5. Have they inspected the data for heterogeneity?
6. How have they calculated a pooled estimate?
7. Have they looked for publication bias?

Discussion

This question - though not entirely identical - is very similar to Question 5 from the second paper of 2013. The key difference is the inclusion of the nebulous question about the role of meta-analysis in EBM. In the later paper, this was focused specifically on the advantages of meta-analysis over the analysis of a single study. If one compares the above answer to (b) with the answer (b) in Question 5, one will discover similarities, which suggests that the college was looking for a list of advantages here as well.

Thus, much of the below is a direct copy of Question 5.

a) What is a meta-analysis?

Meta-analysis is a tool of quantitative systematic review.

It is used to weigh the available evidence from RCTs and other studies based on the numbers of patients included, the effect size, and on statistical tests of agreement with other trials.

b) What is the role of meta-analysis  in evidence based medicine?

  • It offers an objective quantitative appraisal of evidence
  • It reduces the probability of false negative results
  • The combination of samples leads to an improvement of statistical power
  • Increased sample size may increase the accuracy of the estimate
  • It may explain heterogeneity between the results of different studies
  • Inconsistencies among trials may be quantified and analysed

c) What are the features you look for in a meta-analysis to determine if it has been well conducted?

  • Research questions clearly defined
  • Transparent search strategy
  • Thorough search protocol
  • Authors contacted and unpublished data collected
  • Definition of inclusion and exclusion criteria for studies
  • Sensible exclusion and inclusion criteria
  • Assessment of methodological quality of the included studies
  • Transparent methodology of assessment
  • Calculation of a pooled estimate
  • Plot of the results (Forest Plot)
  • Measurement of heterogeneity
  • Assessment of publication bias (Funnel Plot)
  • Reproducible meta-analysis strategy (eg. multiple reviewers perform the same meta-analysis, according to the same methods)
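
The "calculation of a pooled estimate" item on that checklist is, at its core, mechanically simple. Below is a minimal sketch of fixed-effect inverse-variance pooling; the per-study log odds ratios and standard errors are invented.

    import numpy as np

    # Invented per-study log odds ratios and their standard errors
    log_or = np.array([-0.30, -0.10, -0.45, 0.05])
    se = np.array([0.20, 0.15, 0.35, 0.25])

    weights = 1 / se**2                              # inverse-variance weights
    pooled = np.sum(weights * log_or) / np.sum(weights)
    pooled_se = np.sqrt(1 / np.sum(weights))
    lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se

    # Report on the odds ratio scale
    print(f"pooled OR = {np.exp(pooled):.2f} (95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")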

References

Sauerland, Stefan, and Christoph M. Seiler. "Role of systematic reviews and meta-analysis in evidence-based medicine." World journal of surgery 29.5 (2005): 582-587.

 

DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials." Controlled Clinical Trials 7.3 (1986): 177-188.

 

Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.

 

Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine 75.6 (2008): 431-439.

 

Methodological Expectations of Cochrane Intervention Reviews

Question 29.1 - 2008, Paper 1

A Phase III study of a drug was undertaken to determine if it improved mortality in severe sepsis. The study design was a randomized, double-blind, placebo-controlled, multicenter trial (n=1200). The mortality rates in the placebo arm and the trial drug arm were 32% and 26% respectively. There were no adverse effects noted in relation to the trial drug.

a)  What do you understand by the term Phase III.?

b)  What was the absolute risk reduction?

c) What was the relative risk reduction?

d)  Calculate the “number needed to treat”?

College Answer

a)  What do you understand by the term Phase III.?
Phase III trials compare new treatments with the best currently available treatment (the standard treatment). They have much larger sample sizes than Phase II trials and are usually randomised. They are aimed at being the definitive assessment of how effective the drug is, in comparison with current 'gold standard' treatment.

b)  What was the absolute risk reduction?
6%

c) What was the relative risk reduction?
18.75%

d)  Calculate the “number needed to treat”?
16.66

Discussion

Again, the candidate is called upon to recall equations and to perform basic mathematics. A helpful list of such equations is available.

a) A Phase III trial is a study of the treatment effect of the drug, which is performed in a large group of patients, all of whom have the disease being studied. The purpose of a Phase III trial is to test the efficacy of an experimental treatment in comparison to standard of care or "gold standard" therapy.

One can find more information about the phases of clinical research in brief in this 2011 BMJ statistics question by Philip Sedgwick, in greater detail in this article by M.A. Rogers, and in great detail in this 2013 publication from the IJPCBS.

b) Absolute risk reduction (ARR) = (AR in treatment group - AR in control group)

In this trial, the ARR = (32% - 26%) = 6%

c) The relative risk reduction (RRR) = (ARR / control group AR)

In this trial, RRR = (0.06 / 0.32) = 18.75%

d) The Numbers Needed to Treat (NNT) = (1/ARR),

In this trial, NNT = (1 / 0.06) = 16.67, conventionally rounded up to 17.
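
For the calculator-averse, the same arithmetic in Python (with the NNT rounded up, by the usual convention):

    import math

    control_mortality, treatment_mortality = 0.32, 0.26

    arr = control_mortality - treatment_mortality  # absolute risk reduction = 0.06
    rrr = arr / control_mortality                  # relative risk reduction = 0.1875
    nnt = math.ceil(1 / arr)                       # 1/0.06 = 16.67, rounded up to 17

    print(f"ARR = {arr:.0%}, RRR = {rrr:.2%}, NNT = {nnt}")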

References

 

Sedgwick, Philip. "Phases of clinical trials." BMJ 343 (2011).

 

Rogers, M. A. "What are the phases of intervention research." Access Academics and Research (2009).

 

Rohilla, Ankur, D. Sharma, and R. Keshari. "Phases of clinical trials: a review."IJPCBS 3 (2013): 700-3.

Question 29.2 - 2008, Paper 1

You have been approached by a company which has developed a new biomarker of sepsis. They would like it tested in a cohort of critically ill septic patients. You test this biomarker in a cohort of 100 patients with proven bacteremia. You also test this biomarker in a cohort of 100 patients with drug overdose whom you use as a control. In the bacteremic group 70 patients had abnormal biomarker results. In the control group 60 patients had an abnormal biomarker result.

Calculate

a) Sensitivity

b) specificity

c) Positive predictive value

d) Negative predictive value

College Answer

Values below expressed as a percentage.


a) 70/100

b) 40/100

c) 70/130

d) 40/70

Discussion

This question is identical to Question 19.1 from the first paper of 2010. However, the college changed the numbers a little, and made the question about pancreatic necrosis.

Going through the motions,

true positives = 70

false positives = 60

true negatives = 40

false negatives = 30

a) Sensitivity = True positives / ( true positives + false negatives)

= 70 / (70 + 30) = 70%

b) Specificity = True negatives / (true negatives + false positives)

= 40 / (40 + 60) = 40%

c) Positive predictive value = True positives / (true positives + false positives)

= 70 / (70 + 60) = 53.8%

d) Negative predictive value = True negatives / (true negatives + false negatives)

= 40 / (40 + 30) = 57.1%

References

Question 23 - 2008, Paper 2

In the context of a randomised control trial comparing a trial drug with placebo:

a)  briefly explain the following terms:

  • Type 1 error
  • Type 2 error
  • Study power
  • Effect size

b)  List the factors that influence sample size.

College Answer

Type 1 error
The null hypothesis is incorrectly rejected. Type 1 errors may result in the implementation of therapy that is in fact ineffective or a false positive test result.

Type 2 error
The null hypothesis is incorrectly accepted. Type 2 errors may result in rejection of effective treatment strategies or a false negative test result.

Study power
Power is equal to 1-β. Thus if β = 0.2, the power is 0.8 and the study has 80% probability of detecting a difference if one exists

Effect size
Effect size (∆) is the clinically significant difference the investigator wants to detect between the study groups. This is arbitrary but needs to be reasonable and accepted by peers. It is harder to detect a small difference than a large difference. The effect size helps us to know whether the difference observed is a difference that matters.

Factors influencing  sample size
•    Selected values for significance level, α, power β and effect size ∆ (smaller values mean larger sample size)
•    Variance /SD in the underlying population (larger variance means larger sample size)

Discussion

The college presents a concise and effective answer to this question, which should serve as a model. Below is a non-model answer overgrown with the unnecessary fat of references and digressions.

a)

Type 1 error: The incorrect rejection of a null hypothesis.

  • A false positive study.
  • Finding a treatment effect where there actually is none.
  • Results in the implementation of an ineffective treatment.

Type 2 error: the incorrect acceptance of the null hypothesis, i.e. the failure to reject it when it is in fact false.

  • A false negative study.
  • Finding no treatment effect, when there actually is one.
  • Results in an effective treatment being wrongly discarded.

Study power: The probability that the study correctly rejects the null hypothesis, when the null hypothesis is false.

  • Expressed as (1-β), where β is the probability of Type 2 error (i.e. the probability of incorrectly accepting the null hypothesis).
  • Generally, the power of a study is agreed to be 80% (i.e. β = 0.2), because anything less would incur too great a risk of Type 2 error, and anything more would be prohibitively expensive in terms of sample size.

Effect size: a quantitative reflection of the magnitude of a phenomenon; in this case, the magnitude of the positive effects of a drug on the study population.

  • In this case, it is the difference in the incidence of an arbitrarily defined outcome between the treatment group and the placebo group.
  • Effect size suggests the clinical relevance of an outcome
  • The effect size is agreed upon a priori so that a sample size can be calculated (as the study needs to be powered appropriately to detect a given effect size)

Factors which influence sample size:

There is a good article on this in Radiology (2003)

  • Alpha value: the level of significance (normally 0.05)
  • Beta-value: the probability of incorrectly accepting the null hypothesis (normally 0.2)
  • The statistical test one plans to use
  • The variance of the population (the greater the variance, the larger the sample size)
  • Estimated measurement variability (similar to population variance)
  • The effect size (the smaller the effect size, the larger the required sample)
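
The interplay of these factors is easy to explore with statsmodels' power calculators. A sketch, assuming a two-arm comparison of means with a standardised effect size (Cohen's d):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Sample size per group to detect d = 0.5 at alpha 0.05 with 80% power
    n_medium = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)

    # Halving the effect size roughly quadruples the required sample
    n_small = analysis.solve_power(effect_size=0.25, alpha=0.05, power=0.8)

    print(round(n_medium), round(n_small))  # roughly 64 and 250-odd per group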

References

There is an online Handbook of Biological Statistics which has an excellent overview of power analysis.

Kelley, Ken, and Kristopher J. Preacher. "On effect size." Psychological methods 17.2 (2012): 137.

Moher, David, Corinne S. Dulberg, and George A. Wells. "Statistical power, sample size, and their reporting in randomized controlled trials." JAMA 272.2 (1994): 122-124.

Cohen, Jacob. "A power primer." Psychological bulletin 112.1 (1992): 155.

Dupont, William D., and Walton D. Plummer Jr. "Power and sample size calculations: a review and computer program." Controlled clinical trials 11.2 (1990): 116-128.

Eng, John. "Sample Size Estimation: How Many Individuals Should Be Studied?" Radiology 227.2 (2003): 309-313.

Question 10 - 2009, paper 1

Inspect the data representation shown below.

10.1. What form of data representation is depicted here?


10.2. With respect to the study plots what is represented by:

  • The horizontal lines?
  • The position of the square?
  • The size of the square?

10.3. From the data depicted what could be inferred with regard to the effectiveness of the treatment under investigation?

10.4. What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?

College Answer

10.1. What form of data representation is depicted here?

Forest Plot or Meta Analysis Graph

10.2. With respect to the study plots what is represented by: The horizontal lines?
The position of the square? The size of the square?

The position of the square and the horizontal line indicate the point estimate and the 95% confidence intervals of the odds ratio respectively. The size of the square indicates the weight of the study.

10.3. From the data depicted what could be inferred with regard to the effectiveness of the treatment under investigation?

The depicted data suggest the treatment is not more effective than control as the 95% confidence limits of the combined odds ratio cross the vertical line.

10.4. What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?

Definition of inclusion criteria for studies
Adequate search protocol
Assessment of methodological quality

Measurement of heterogeneity

Assessment of publication bias

Discussion

This topic is explored in LITFL, where they call it a "forrest plot", perhaps out of respect for Pat Forrest. This is substantially better than Wikipedia, where this form of data representation is referred to as a blobbogram. The example LITFL use for their explanation is derived from the college question.

Anyway. The college answer is correct but very brief, and probably represents something like the "passing grade" for this 10-mark question. With that in mind, and free from the need to be concise, one can launch into an exhaustingly verbose dissection of this question.

10.1 - This is a forest plot. It represents the results of a meta-analysis of studies.

10.2 - The standards for labelling and graphical representation are well summarised by this Cochrane document (however, it appears that careful adherence to standards is no defence against the absence of useful content).

  • The horizontal lines: the confidence interval of the individual study
  • The position of the square: a point estimate of the odds ratio (OR)
  • The size of the square: the weight of the study according to the weighting rules of the meta-analysis, likely representing the sample size and statistical power. This is a powerful tool of psychological manipulation. A paper by a couple of psychiatrists dissected this practice, and suggested that a failure to use square size to identify study weight "may result in unnecessary attention being attracted to those smaller studies with wider confidence intervals that put more ink on the page (or more pixels on the screen)".

10.3 - From the forest plot, one can infer that though statistically there is a trend towards a positive treatment effect, it still does not achieve statistical significance because the range of the 95% confidence interval for their odds ratio crosses the vertical line (the vertical line being an OR of 1.0, which means "no association"). Thus, on the basis of this meta-analysis one would be forced to conclude that the treatment has no effect.

10.4 - "What further information relating to the performance of this analysis would you require in order to gauge the accuracy of the conclusions?" This is a thinly veiled question about the assessment of the validity of a meta-analysis. The college answer demonstrates this in the points they used. In that context, one would theoretically be interested in every aspect of the analysis.

Generic points in the assessment of validity of a meta-analysis include the following:

  • Research questions are clearly defined.
  • Definition of inclusion criteria for studies is clear.
  • Search protocol is adequate.
  • Methodological quality of the included studies is rigorously assessed, and the assessment method is transparent.
  • A pooled estimate is calculated, and the calculation is transparent.
  • A graphical representation of the results is available (Forest plot).
  • A measurement of heterogeneity is carried out, with appropriate corrections for heterogeneity (eg. use of a fixed-effects or random-effects analysis).
  • An assessment of publication bias is attempted (Funnel Plot)

If one were to only consider the presented graph, one would be more likely to respond with relevant questions for the meta-analysis authors.

  • Inclusion and exclusion criteria. Study 2 is a massive outlier; it would be interesting to learn why it was included, and whether other excluded studies had similar characteristics. Potentially, the exclusion of this study would shift the overall OR off the vertical line.
  • Assessment of methodological quality. Again - if the methodology of Study 2 was called into question and it were excluded, this meta-analysis would reach substantially different conclusions. It would be important to learn how the authors of the meta-analysis evaluated its methodology, and whether they were correct to include this study.
  • Search strategy and attempts to detect publication bias. There are only 4 studies in the meta-analysis. The addition of another 1 or 2 studies may have a significant impact on the overall OR. If the search strategy was somehow inadequate, studies which might meet the inclusion criteria may have been missed.
  • Dealing with heterogeneity. This is important, because there is substantial heterogeneity (again I point to Study 2). Excluding studies simply because they do not agree with the majority defeats the purpose of the meta-analysis, but it is important to correct for heterogeneity-inducing differences between trials. This can be done with the use of a random-effects model, which uses a "heterogeneity parameter" as a coefficient to downgrade the precision and weighting of each individual study's effect estimate. This model assumes that in each study the intervention had a different effect, and views each study as a random sample from a hypothetical population of similar studies. The effect of this on the forest plot may not be magical; it merely redistributes the weighting (usually giving more weight to smaller studies and less to large ones; Cochrane's handbook suggests that this is because "small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect"). Having used such a heterogeneity correction technique, one can be more confident that the resulting summed OR is not damaged by the inclusion of a garbage study. However, the use of a random-effects model can exacerbate publication bias if the results of smaller studies are systematically different from results of larger ones (eg. small studies are independent and find no treatment effect, but large studies are funded by Big Pharma and find a treatment effect where there is none). Cochrane recommends meta-analysis authors compare the results of a fixed-effects model and random-effects model analysis to see whether the smaller studies have a significant effect on the effect size.
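
For the curious, the DerSimonian-Laird correction described above amounts to a few lines of arithmetic. A sketch with invented study-level data, loosely mimicking the "Study 2 is an outlier" situation:

    import numpy as np

    # Invented log odds ratios and standard errors; study 2 is the outlier
    y = np.array([-0.40, 1.20, -0.30, -0.50])
    se = np.array([0.25, 0.40, 0.20, 0.30])

    w = 1 / se**2                         # fixed-effect (inverse-variance) weights
    fixed = np.sum(w * y) / np.sum(w)

    # DerSimonian-Laird estimate of the between-study variance tau^2,
    # derived from Cochran's Q (the heterogeneity statistic)
    q = np.sum(w * (y - fixed)**2)
    tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

    w_re = 1 / (se**2 + tau2)             # random-effects weights: flatter, so
    re = np.sum(w_re * y) / np.sum(w_re)  # small studies gain relative weight

    print(f"fixed-effect {fixed:.2f}, random-effects {re:.2f}, tau^2 {tau2:.2f}")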

References

Schriger, David L., et al. "Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice." International Journal of Epidemiology 39.2 (2010): 421-429.

Lewis, Steff, and Mike Clarke. "Forest plots: trying to see the wood and the trees." BMJ 322.7300 (2001): 1479-1480.

Anzures‐Cabrera, Judith, and Julian Higgins. "Graphical displays for meta‐analysis: An overview with suggestions for practice." Research Synthesis Methods 1.1 (2010): 66-80.

Cochrane: "Considerations and recommendations for 
figures in Cochrane reviews: graphs of statistical data"
 4 December 2003 (updated 27 February 2008)

Reade, Michael C., et al. "Bench-to-bedside review: Avoiding pitfalls in critical care meta-analysis–funnel plots, risk estimates, types of heterogeneity, baseline risk and the ecologic fallacy." Critical Care 12.4 (2008): 220.

DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials." Controlled clinical trials 7.3 (1986): 177-188.

Biggerstaff, B. J., and R. L. Tweedie. "Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis." Statistics in medicine 16.7 (1997): 753-768.

The Cochrane Handbook: 9.5.4 "Incorporating heterogeneity into random-effects model"

 

Question 24 - 2009, Paper 2

What is a receiver operating characteristic plot (ROC curve) as applied to a diagnostic test? What are its advantages?

College Answer

An ROC plot is a graphical representation of sensitivity vs. 1- specificity for all the observed data values for a given diagnostic test.

Advantages:
•    Simple and graphical
•    Represents accuracy over the entire range of the test
•    It is independent of prevalence
•    Tests may be compared on the same scale
•    Allows comparison of accuracy between several tests.

How it may be used:
•    Can give a visual assessment of test accuracy
•    May be used to generate decision thresholds or “cut off” values
•    Can be used to generate confidence intervals for sensitivity and specificity and likelihood ratios.

Discussion

In this LITFL article, ROC curves are discussed in detail, but without apocryphal gibberish.

If one were to restrict oneself to what is manageable within a 10-minute timeframe while mentioning all the important points, one would produce an answer which resembles the following:

  • The ROC curve is a plot of sensitivity versus false positive rate (1-specificity) for all observed values of a diagnostic test.
  • It is a graphical representation of a test's diagnostic accuracy
  • It allows the comparison of accuracy between tests
  • It allows the determination of cutoff values
  • It can be used to generate confidence intervals for sensitivity and specificity and likelihood ratios.

Advantages:

  • Simple and graphical
  • Independent of prevalence
  • Allows comparison between tests, on the same scale

That, of course, is the bare bones of the answer. If one were to succumb to basic human urges, one would produce an answer which resembles the following:

  • The ROC curve is a plot of sensitivity vs. false positive rate, for a range of diagnostic test results.
  • Sensitivity is on the y-axis, from 0% to 100%
  • The ROC curve graphically represents the compromise between sensitivity and specificity in tests which produce results on a numerical scale, rather than binary (positive vs. negative results)
  • ROC analysis can be used for diagnostic tests with outcomes measured on ordinal, interval or ratio scales.
  • The ROC curve can be used to determine the cut off point at which the sensitivity and specificity are optimal.
  • All possible combinations of sensitivity and specificity that can be achieved by changing the test's cutoff value can be summarised using a single parameter, the area under the ROC curve (AUC).
    • The higher the AUC, the more accurate the test
    • An AUC of 1.0 means the test is 100% accurate
    • An AUC of 0.5 (50%) means the ROC curve is a straight diagonal line, which represents the "ideal bad test", one which is only ever accurate by pure chance.
  • When comparing two tests, the more accurate test is the one with an ROC curve further to the top left corner of the graph, with a higher AUC.
  • The best cutoff point for a test (which separates positive from negative values) is the point on the ROC curve which is closest to the top left corner of the graph.
  • The cutoff values can be selected according to whether one wants more sensitivity or more specificity; a computational sketch of these mechanics follows below.
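As promised, here is that sketch. The biomarker values and disease labels are invented; the point is only to show how every cutoff generates one (1-specificity, sensitivity) point, and how the AUC is the trapezoidal area under those points.

```python
# A minimal sketch of building an ROC curve and its AUC from raw test results.
# The test values and disease labels are invented for illustration.

# (test value, truly diseased?) pairs
data = [(1.2, False), (2.3, False), (2.9, True), (3.1, False),
        (3.8, True), (4.0, False), (4.5, True), (5.1, True)]

def roc_points(data):
    """Sensitivity and false positive rate at every possible cutoff."""
    cutoffs = sorted({value for value, _ in data}, reverse=True)
    n_pos = sum(1 for _, diseased in data if diseased)
    n_neg = len(data) - n_pos
    points = []
    for c in [float("inf")] + cutoffs:
        tp = sum(1 for v, d in data if d and v >= c)
        fp = sum(1 for v, d in data if not d and v >= c)
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    return points

points = roc_points(data)
# AUC by the trapezoidal rule over consecutive ROC points
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(points)
print(f"AUC = {auc:.2f}")
```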

Advantages of the ROC curves:

  • A simple graphical representation of the diagnostic accuracy of a test: the closer the apex of the curve toward the upper left corner, the greater the discriminatory ability of the test. 
  • Allows a simple graphical comparison between diagnostic tests
  • Allows a simple method of determining the optimal cutoff values, based on what the practitioner thinks is a clinically appropriate (and diagnostically valuable) trade-off between sensitivity and false positive rate.
  • Also, allows a more complex (and more exact) measure of the accuracy of a test, which is the AUC
    • The AUC in turn can be used as a simple numeric rating of diagnostic test accuracy, which simplifies comparison between diagnostic tests.
 

References

Bewick, Viv, Liz Cheek, and Jonathan Ball. "Statistics review 13: receiver operating characteristic curves." Critical care 8.6 (2004): 508.

Sedgwick, Philip. "Receiver operating characteristic curves." BMJ 343 (2011). Rather than an article, this is more of a "self-directed learning" question with an elaborate explanatory answer.

Fan, Jerome, Suneel Upadhye, and Andrew Worster. "Understanding receiver operating characteristic (ROC) curves." Cjem 8.1 (2006): 19-20.

Akobeng, Anthony K. "Understanding diagnostic tests 3: receiver operating characteristic curves." Acta Paediatrica 96.5 (2007): 644-647.

Ling, Charles X., Jin Huang, and Harry Zhang. "AUC: a statistically consistent and more discriminating measure than accuracy." IJCAI. Vol. 3. 2003.

Greiner, M., D. Pfeiffer, and R. D. Smith. "Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests." Preventive veterinary medicine 45.1 (2000): 23-41.

 

Question 19.1 - 2010, Paper 1

To evaluate a new biomarker as an early index of infected pancreatic necrosis, you perform the measurement in a consecutive series of 200 critically ill patients with pancreatitis. You find that 100 of these patients had subsequently proven necrosis. Of these, 60 had a positive biomarker result. Of the remaining 100 patients without necrosis, 35 had a positive biomarker result.

Using the above data, show how you would calculate

a)  Sensitivity

b)  Specificity

c)  Positive predictive value

d)  Negative predictive value

College Answer

a)  Sensitivity = (TP/ {TP + FN}) = 60/100

b)  Specificity = (TN/{TN + FP}) = 65/100

c)  Positive predictive value = (TP/{TP+FP})  = 60/95

d)  Negative predictive value = (TN/{TN+FN}) = 65/105

Discussion

It's not easy to overdo this discussion, given that the premise of this question rests in basic arithmetic. Because the question is essentially maths, it is difficult to produce a "model answer" which improves on the already correct college answer (the only possible correct answer).

However, many people (myself included) are biologically unsuited to memorising equations. For this reason, a short list of equations to memorise has been compiled. Perhaps that's an improvement.

Thus, for this biomarker, we have the following spread of data:

  • 60 true positives
  • 35 false positives
  • 65 true negatives
  • 40 false negatives

a) Sensitivity: True positives / (true positives + false negatives)

= 60 / (60 + 40) = 60%

b) Specificity: True negatives / (true negatives + false positives)

= 65 / (65 + 35) = 65%

c) Positive predictive value: True positives / total positives

= 60 / (60 + 35) = 63%

d) Negative predictive value: True negatives / total negatives

= 65 / (65 + 40) = 62%
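For those who prefer to delegate the arithmetic, here is a minimal sketch which reproduces the numbers above (the function name and structure are my own, not anything from the college).

```python
# A minimal check of the worked answer above, using the same 2x2 counts.
def test_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV for a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

for name, value in test_metrics(tp=60, fp=35, tn=65, fn=40).items():
    print(f"{name}: {value:.0%}")
# sensitivity: 60%, specificity: 65%, ppv: 63%, npv: 62%
```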

References

Question 19.2 - 2010, Paper 1

A randomized controlled clinical trial was performed to evaluate the effect of a new hormone called Rejuvenon on mortality in septic shock.  3400 patients with septic shock were studied (1700 placebo and 1700 in the Rejuvenon arms). The mortality rates in the placebo and the treatment arms were 30% and 25% respectively.

Calculate:

(a)        The absolute risk reduction

(b)        The relative risk reduction

(c)        The number needed to treat

College Answer

Using the above data, show how you would calculate:

a) The absolute risk reduction

b) The relative risk reduction

c) The number needed to treat

ARR = 5%

RRR = 5/30*100 =16.6%

NNT =1/0.05 =20

Discussion

This question also relies on the candidate's ability to memorise equations.

Here is a helpful list of equations the candidate is expected to memorise.

a) ARR = (risk in control group - risk in treatment group)

= 30% - 25%

= 5%

b) RRR = (ARR / control group AR)

= 0.05 / 0.3

= 0.166, or 16.6%

c) Numbers needed to treat (NNT) = ( 1/ ARR)

= 1 / 0.05

= 20.
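The same arithmetic as a minimal sketch (note that 5/30 is 16.66...%, so the college's 16.6% truncates rather than rounds, and that in practice the NNT is rounded up to a whole patient):

```python
# A minimal sketch of the three calculations above, using the trial's figures.
control_risk = 0.30    # mortality in the placebo arm
treatment_risk = 0.25  # mortality in the Rejuvenon arm

arr = control_risk - treatment_risk  # absolute risk reduction
rrr = arr / control_risk             # relative risk reduction
nnt = 1 / arr                        # number needed to treat (round up in practice)

print(f"ARR = {arr:.0%}, RRR = {rrr:.1%}, NNT = {nnt:.0f}")
# ARR = 5%, RRR = 16.7%, NNT = 20
```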

References

Question 9 - 2010, Paper 2

In   the   context   of   clinical   trials,  define the following terms:

a)  Relative risk

b)  Absolute risk

c)  Number needed to treat

d)  Power of the study

College Answer

A number of potential definitions exist. One example for each is listed below:

Relative risk: the difference in event rates between 2 groups expressed as proportion of the event rate in the untreated group.

Absolute risk: this is the actual event rate in the treatment or the placebo group.

Number Needed to Treat: The NNT is the number of patients to whom a clinician would need to administer a particular treatment for 1 patient to receive benefit from it. It is calculated as 100 divided by the absolute risk reduction when expressed as a percentage, or 1 divided by the absolute risk reduction when expressed as a proportion.

Power of the study: The probability that a study will produce a significant difference at a given significance level is called the power of the study. It will depend on the difference between the populations compared, the sample size and the significance level chosen.

Discussion

Some of this ground is covered in Question 23 from the second paper of 2011. It also asks about risk ratio and NNT.

Here is a link to my summary of basic terms in EBM.

Risk ratio: risk in treatment group / risk in control or placebo group

Absolute risk: Risk of event in a group (any group). Essentially, it is the incidence rate.

NNT: Numbers needed to treat; 1/ absolute risk reduction.

Power of a study: The power of a statistical test is the probability that it correctly rejects the null hypothesis, when the null hypothesis is false. This is the chance that a study is able to discern a treatment effect, if there is an actual treatment effect. It is influenced by the level of statistical significance one expects, the sample size, the variance within the studied population, and the magnitude of the effect size.
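To make the dependence on those factors concrete, here is a minimal sketch of an approximate power calculation for a two-arm trial comparing two proportions, using the normal approximation. The event rates and group size are borrowed from the Rejuvenon example above; the helper names are my own.

```python
# A minimal sketch of an approximate power calculation for a two-arm trial
# comparing two proportions (normal approximation, two-sided alpha = 0.05).
from math import sqrt, erf

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions(p1, p2, n_per_arm):
    """Approximate power to detect a difference between event rates p1 and p2."""
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z_crit = 1.96  # critical value for a two-sided 5% significance level
    return phi(abs(p1 - p2) / se - z_crit)

print(f"{power_two_proportions(0.30, 0.25, 1700):.0%}")  # ~90%
print(f"{power_two_proportions(0.30, 0.25, 500):.0%}")   # ~43%: smaller sample, less power
```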

References

Cohen, Jacob. "Statistical power analysis." Current directions in psychological science 1.3 (1992): 98-101.

 

Cook, Richard J., and David L. Sackett. "The number needed to treat: a clinically useful measure of treatment effect." Bmj 310.6977 (1995): 452-454.

 

Viera, Anthony J. "Odds ratios and risk ratios: what's the difference and why does it matter?." Southern medical journal 101.7 (2008): 730-734.

 

Malenka, David J., et al. "The framing effect of relative and absolute risk." Journal of General Internal Medicine 8.10 (1993): 543-548.

 

Gail, Mitchell H., and Ruth M. Pfeiffer. "On criteria for evaluating models of absolute risk." Biostatistics 6.2 (2005): 227-239.

 

Question 29 - 2011, Paper 1

With reference to a randomized controlled trial, briefly describe the terms “blinding” and “allocation concealment”.

College Answer

•    Blinding and allocation concealment are methods used to reduce bias in clinical trials.
•    Blinding: a process by which trial participants and their relatives, care-givers, data collectors and those adjudicating outcomes are unaware of which treatment is being given to the individual participants.

-      Prevents clinicians from consciously or subconsciously treating patients differently based on treatment allocation
-      Prevents data collectors from introducing bias when there is a subjective assessment to be made, eg. a "pain score"
-      Prevents outcome assessors from introducing bias when there is a subjective outcome assessment to be made, eg. the Glasgow outcome score.

•    Traditionally, blinded RCTs have been classified as "single-blind," "double-blind," or "triple-blind". The 2010 CONSORT Statement specifies that authors and editors should not use these terms; instead, reports of blinded RCTs should discuss "If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how."

Allocation concealment is an important component of the randomization process and refers to the concealment of the allocation of the randomization sequence from both the investigators and the patient. Poor allocation concealment may potentially exaggerate treatment effects.
Methods used for allocation concealment include the sealed envelope technique, and telephone- or web-based randomisation.

Allocation concealment effectively ensures that the treatment to be allocated is not known before the patient is entered into the study. Blinding ensures that the patient / physician is blinded to the treatment allocation after enrollment into the study.

Discussion

The question is a 10-mark question, but for some reason it asks one to "briefly describe" these concepts. Judging from the college answer, a truly brief description was not the expected response.

LITFL has a thorough summary, which is not brief.

If one were to briefly describe these concepts, one would produce something like this:

Allocation concealment:

  • Ensures that the patients and investigators cannot predict which treatment will be allocated to which patient before they are enrolled in the study.
  • Prevents selection bias

Blinding:

  • Ensures that the patients and investigators remain unaware of which treatment is being administered to which individual patient.
  • Prevents detection bias and observer bias

And if one were to go to town on this topic, one would produce something like this:

Allocation concealment

  • Allocation concealment is the technique of reducing bias by preventing the prediction of treatment allocation before the allocation is completed.
  • Thus, both the investigators and the patients cannot predict which patient will be selected for the treatment group and which patient will be selected for the control or placebo group
  • This ensures that no selection bias influences the treatment allocation; i.e. patients are allocated randomly, which avoids the problem of investigators choosing most suitable candidates for the treatment group.
  • Allocation concealment can be performed with sealed envelopes, web-based randomisation, or random number generation.
  • Alternative techniques also exist; for instance, one can randomise patients according to the day of the week of their presentation to hospital, or according to the odd or even date of the calendar month. The allocation is not concealed, but the investigators still have little control over the allocation.

Blinding:

  • Blinding is the technique of reducing bias by concealing the allocation of the treatment and control groups from either the patients, the investigators, the statistical analysts, or everyone involved.
  • Reduces bias by preventing anybody from knowing which patient is receiving which treatment, and thus decreasing the likelihood that a particular group will receive preferential treatment, that a particular group will be assessed differently, or that a particular group will develop expectations of their treatment.
  • Reduces detection bias by blinding the investigators
  • Reduces observer bias by blinding the observers
  • Reduces recall bias by blinding the patients
  • The exact method of the blinding should be transparently reported (as per the CONSORT statement). Thus, the reader of the article should be able to immediately discern who was blinded and how.

References

Schulz, Kenneth F., and David A. Grimes. "Allocation concealment in randomised trials: defending against deciphering." The Lancet 359.9306 (2002): 614-618.

Forder, Peta M., Val J. Gebski, and Anthony C. Keech. "Allocation concealment and blinding: when ignorance is bliss." Med J Aust 182.2 (2005): 87-9.

Schulz, Kenneth F. "Assessing allocation concealment and blinding in randomised controlled trials: why bother?." Evidence Based Mental Health 3.1 (2000): 4-5.

 

Question 23 - 2011, Paper 2

In the context of statistical analysis of randomised controlled trials, explain the following terms:
a) Risk ratio
b) Number needed to treat
c) P-value
d) Confidence intervals

College Answer

a) Risk ratio
A risk ratio is simply a ratio of risk, for example, [risk of mortality in the intervention group] / [risk of mortality in the control group].
It indicates the relative likelihood of experiencing the outcome if the patient received the intervention compared with the outcome if they received the control therapy.

b) Odds ratio
Odds ratio is the odds of an event occurring in one group to the odds of it occurring in another


c) Number needed to treat (NNT)
Number of patients that need to be treated for one patient to benefit compared with a control not receiving the treatment

1/(Absolute Risk Reduction)

Used to measure the effectiveness of a health-care intervention, the higher the NNT the less effective the treatment

d) P-value
A p-value indicates the probability that the observed result or something more extreme occurred by chance. It might be referred to as the probability that the null hypothesis has been rejected when it is true.


e) Confidence intervals
The confidence intervals indicate the level of certainty that the true value for the parameter of interest lies between the reported limits.
For example:
The 95% confidence intervals for a value indicate a range where, with repeated sampling and analysis, these intervals would include the true value 95% of the time

Discussion

This is a straightforward question about the definitions of basic everyday statistics terms.

Judging by the relatively high pass rate, over two thirds of us already have a fair grasp of this.

Additionally, please note the model answer to the odds ratio question. Clearly we are not expected to demonstrate a genius-level understanding of these concepts. In fact, there is no odds ratio mentioned in the college question, and its very existence is inferred from the fact that there is an odds ratio answer.

Anyway, it never hurts to revise the basics.

Here is a link to my summary of basic terms in EBM.

In brief:

Risk ratio: risk in treatment group / risk in control or placebo group

Odds ratio: The odds of an outcome in one group / odds of that outcome in another group.

NNT: Numbers needed to treat; 1/ absolute risk reduction.

p-value in a research study is the probability of obtaining the same (or more extreme) study result, assuming that the null hypothesis is true. It is often loosely described as the probability of incorrectly rejecting a true null hypothesis. As a single-value assessment of error rate, the p-value has its opponents.

Confidence interval: a range calculated from the sample which, under repeated sampling and analysis, would contain the true parameter value a specified percentage of the time. Thus, a 95% CI means that if the experiment were repeated many times, 95% of the intervals so constructed would contain the true value.

The CI is a pain in the arse to calculate for the mathematics-averse Homo vulgaris. A good impression of the difficulty involved can be formed by reading one of the two BMJ articles referenced below; a minimal sketch of the log-scale method follows.
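Here is that sketch: the log-scale method for the CI of a relative risk, in the style of those BMJ papers. The 2x2 counts are invented.

```python
# A minimal sketch of a 95% CI for a relative risk, via the log scale.
# The counts are invented for illustration.
from math import log, exp, sqrt

a, b = 30, 70   # events / non-events, treatment group (invented)
c, d = 45, 55   # events / non-events, control group (invented)

rr = (a / (a + b)) / (c / (c + d))
# Standard error of log(RR)
se = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
lo, hi = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# RR = 0.67 (95% CI 0.46 to 0.96)
```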

 

References

Viera, Anthony J. "Odds ratios and risk ratios: what's the difference and why does it matter?." Southern medical journal 101.7 (2008): 730-734.

Szumilas, Magdalena. "Explaining odds ratios." Journal of the Canadian Academy of Child and Adolescent Psychiatry 19.3 (2010): 227.

Cook, Richard J., and David L. Sackett. "The number needed to treat: a clinically useful measure of treatment effect." Bmj 310.6977 (1995): 452-454.

Goodman, Steven N. "Toward evidence-based medical statistics. 1: The P value fallacy." Annals of internal medicine 130.12 (1999): 995-1004.

Morris, Julie A., and Martin J. Gardner. "Statistics in Medicine: Calculating confidence intervals for relative risks (odds ratios) and standardised ratios and rates." British medical journal (Clinical research ed.) 296.6632 (1988): 1313.

Campbell, Michael J., and Martin J. Gardner. "Statistics in Medicine: Calculating confidence intervals for some non-parametric analyses." British medical journal (Clinical research ed.) 296.6634 (1988): 1454.

Question 17 - 2012, Paper 1

  • Briefly explain what is meant by “Evidence Based Medicine”?
  • Give a classification for the levels of evidence used for therapeutic studies in EBM.
  • Explain what is meant by the term “intention to treat analysis”

College Answer

a) EBM

Evidence-based medicine is the process of systematically reviewing, appraising and using clinical research findings to aid the delivery of optimum clinical care to patients

It involves considering research and other forms of evidence on a routine basis when making healthcare decisions. Such decisions include the clinical decisions about choice of treatment, test, or risk management for individual patients, as well as policy decisions for groups and populations.

b) Levels of evidence

(Any recognised system acceptable)

  • Level I - High-quality, multicentre or single-centre randomized controlled trial with adequate power; or systematic review of these studies
  • Level II - Lesser quality, randomized controlled trial; prospective cohort study; or systematic review of these studies
  • Level III - Retrospective comparative study; case-control study; or systematic review of these studies
  • Level IV - Case series
  • Level V - Expert opinion; case report or clinical example; or evidence based on physiology, bench research.

Level: Therapy/Prevention, Aetiology/Harm

  • 1a: Systematic review (with homogeneity) of RCTs
  • 1b: Individual RCT (with narrow Confidence Interval)
  • 1c: All or none (ie all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it)
  • 2a: Systematic review (with homogeneity) of cohort studies
  • 2b: Individual cohort study (including low quality RCT; e.g., <80% follow-up)
  • 2c: "Outcomes" research or ecologic studies (studies of group characteristics)
  • 3a: Systematic review (with homogeneity) of case-control studies
  • 3b: Individual case-control study
  • 4: Case series (and poor quality cohort and case-control studies)
  • 5: Expert opinion, or based on physiology, bench research or "first principles"

Level

  • I: Evidence from a systematic review of all relevant randomised controlled trials
  • II: Evidence from at least one properly designed randomised controlled trial
  • III.1: Evidence from well-designed pseudo-randomised controlled trials
  • III.2: Evidence obtained from comparative studies with concurrent controls and allocation not randomised (cohort studies) or case-control studies
  • III.3: Evidence obtained from comparative studies with historical controls
  • IV: Evidence from case series, opinions of respected authorities, descriptive studies, reports of expert (i.e. consensus) committees, case studies.

c) Intention to treat analysis

Analysis based on the initial treatment intent not the treatment eventually administered. Everyone who begins treatment is considered to be part of the trial whether he/she completes the trial or not. ITT analysis avoids the effects of crossover and drop-out

Discussion

Evidence based medicine is the system of critical evaluation of published data for applicability to the management of individual patients. David Sackett, a great pioneer of EBM, came up with a definition which seems to be frequently quoted, and therefore probably meets with the approval of the CICM examiners:

"Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."

As for levels of evidence, we have several systems to choose from. Here are a couple:

Oxford centre for evidence based medicine:

  • Levels:
    • I - systematic review of all relevant RCTs
    • II - Randomized trial or observational study with dramatic effect
    • III - Non-randomized controlled cohort/follow-up study
    • IV - Case-series, case-control studies, or historically controlled studies
    • V – mechanism-based reasoning (expert opinion, based on physiology, animal or laboratory studies)
  • Grades:
    • A – consistent level 1 studies
    • B – consistent level 2 or 3 studies or extrapolations from level 1 studies
    • C – level 4 studies or extrapolations from level 2 or 3 studies
    • D – level 5 evidence or troubling inconsistent or inconclusive studies of any level

NHMRC levels:

  • Level I: systematic review of RCTs
  • Level II: RCT
  • Level III-1: pseudorandomised trial of high quality
  • Level III-2: cohort studies or case control studies - but with a control group
  • Level III-3: cohort studies with historical controls, or no control group
  • Level IV: case series

Intention to treat analysis:

This is the practice of preserving the bias-controlling benefits of randomisation by performing analysis of all patients according to which group they were randomised to, rather than according to which treatment they actually received.

  • "Once randomised, always analysed"
  • All enrolled patients have to be a part of the final analysis
  • This preserves the bias-protective effect of randomisation

Advantages

  • A more reliable estimate of treatment effectiveness
  • Prevents bias
  • Minimises Type 1 errors (false positives)
  • Supported by the CONSORT statement
  • When intention-to-treat analysis agrees with per-protocol analysis, it increases the validity of the study

Disadvantages

  • Treatment effect is diluted (ends up underestimated)
  • ITT is inaccurate unless there are negligible protocol violations
  • ITT alone is inappropriate for non-inferiority trials

References

Sackett, David L. "Evidence-based medicine." Seminars in perinatology. Vol. 21. No. 1. WB Saunders, 1997.

Sackett, David L., et al. "Evidence based medicine: what it is and what it isn't."Bmj 312.7023 (1996): 71-72.

Question 8 - 2012, Paper 2

A colleague directs your attention to a recently published randomised trial on a therapeutic intervention.

Outline the features of the trial that would lead you to change your practice.

 

College Answer

Points to consider in the answer would be:

  • Does the population studied correspond with the population the candidate expects to treat?
  • Were the inclusion/exclusion criteria appropriate?
  • Was the trial methodology appropriate – was there adequate blinding and randomisation?
  • Was the primary outcome a clinically relevant or a surrogate endpoint?
  • Was the length of follow up adequate?
  • Was the trial sufficiently powered to detect a clinically relevant effect?
  • Were the groups studied equivalent at baseline?
  • Is the statistical analysis appropriate – was there an intention to treat analysis, have differences between groups at baseline been adjusted for? Are there multiple sub group analyses, and if so were they specified a priori?
  • Is this a single centre study or multi centre?
  • Were the results clinically significant rather than just statistically significant?
  • Is the primary hypothesis biologically plausible with pre existing supporting evidence?
  • Are the findings supported by other evidence – have these results been replicated?
  • Would there be logistical and/or financial implications in practice change?
  • Are there important adverse effects of the treatment?

Discussion

This question really asks, "how do you assess an RCT for validity?"

This is addressed in greater detail elsewhere.

In brief:

Is the premise sound?

  • Is the primary hypothesis biologically plausible?
  • Is the research ethical?
  • If the results are valid, are there any disastrous logistical, ethical or financial implications to a change in practice?

Is the methodology of high quality?

  • Were the inclusion/exclusion criteria appropriate?
  • Was the assignment of patients to treatments randomised? If yes, then was it truly random?
  • Were the study groups homogenous?
  • Were the groups treated equally?
  • Are there any missing patients? Is every enrolled patient accounted for? 
  • Was follow-up complete? Is the drop-out rate explained? Do we know what happened to the dropouts?

Is the reporting of an appropriate quality?

  • Methods description should be complete: the trial should be reproducible
  • Do the results have confidence intervals?
  • Results should present relative and absolute effect sizes
  • Is a CONSORT-style flow diagram of patient selection available?
  • Discussion should address limitations, bias and imprecision
  • Funding sources and the full trial protocol should be disclosed

Are the results of the study valid?

  • Was there blinding? Was blinding even possible? Was it double-blind? If not, at least were the data interpreters and statisticians blinded?
  • Was there allocation concealment?
  • Was there intention-to-treat analysis?
  • If there were sub-groups, were they identified a priori?

What were the results?

  • How large was the treatment effect?
  • How precisely was the effect estimated? (i.e. what was the 95% confidence interval)

Is this study helpful for me?

  • Is this applicable to my patient? i.e. would my patient have been enrolled in this study?
  • Does the population studied correspond with the population to which my patient belongs?
  • Were all the clinically meaningful outcomes considered?
  • Does the benefit outweigh the cost and risk?

References

Oh's Intensive Care manual: Chapter 10 (p83), Clinical trials in critical care by Simon Finfer and Anthony Delaney.

 

The JAMA collection via the Johns Hopkins Medical School

CASP (Critical Appraisal Skills Program) has checklists for the appraisal of many different sorts of studies; these actually come with tickboxes. One imagines reviewers wandering around a trial headquarters, ticking these boxes on their little clipboards.

CEBM (Centre for Evidence Based Medicine) also has checklists, which (in my opinion) are more informative.

Here is a link to their checklist for the critical appraisal of an RCT.

Question 5 - 2013, paper 2

With reference to the reporting of clinical trials in the literature:
a) What is a meta-analysis?
b) What are the advantages of a meta-analysis over the interpretation of an individual study?
c) List the features of a well-conducted meta-analysis.
d) What is “publication bias” and how can this impact on the validity of a meta-analysis?
 

College Answer

a)

  • A form of systematic review that uses statistical methods to combine the results from different studies

b)

  • ↑ Statistical power by ↑ sample size.
  • Resolve uncertainty when studies disagree.
  • Improve estimates of effect size.
  • Inconsistency of results across studies can be quantified and analysed e.g. heterogeneity of studies, sampling error.
  • Presence of publication bias can be investigated.
  • Establish questions for future RCTs.
  • May provide information regarding generalisability of results.

c)

  • Clearly defined research question.
  • Thorough search strategy that makes it unlikely that significant studies have been missed.
  • Reproducible and clear criteria for inclusion in the meta-analysis.
  • Adequate and reproducible assessment of the methodological quality of the included studies.
  • Use of appropriate statistical methods to assess for heterogeneity between studies and pooling of the results of studies when appropriate.
  • Utilisation of methods to ensure that the results of the meta-analysis are reproducible; e.g. two reviewers perform aspects of the study (the search, the application of the inclusion/exclusion criteria, the assessment of validity, the data extraction).
  • Assessment for the presence of publication bias/small study bias with report of the results of these analyses.

d)

  • Publication bias is the publication or non-publication of studies depending on the direction and statistical significance of the results. A meta-analysis evaluating studies where there has been publication bias will be flawed, no matter how well conducted in other aspects.
  • Publication bias may also extend to bias of selection of studies for inclusion in a meta-analysis based on language, journal of publication, ease of access, field of research etc., (dissemination bias).

Salient points

  • Meta analysis = tool of quantitative systematic review
  • Advantages:
    • increased statistical power
    • resolves heterogeneity
    • avoids Simpson's paradox
  • A good meta-analysis has:
    • a well-structured question
    • broad search strategy
    • transparent methodology
    • attempt to exclude publication bias
    • Forest plot
    • measures of heterogeneity

Discussion

LITFL have an excellent resource for this.

a) What is a meta-analysis?

Meta-analysis is a tool of quantitative systematic review.

It is used to weigh the available evidence from RCTs and other studies based on the numbers of patients included, the effect size, and on statistical tests of agreement with other trials

b) What are the advantages of a meta-analysis over the interpretation of an individual study?

  • A more objective quantitative appraisal of evidence
  • Reduces the probability of false negative results
  • The combination of samples leads to an improvement of statistical power
  • Increased sample size may "normalise" the sample distribution and render the results more generalisable, i.e. increase the external validity of the findings
  • Increased sample size may increase the accuracy of the estimate
  • May explain heterogeneity between the results of different studies
  • Inconsistencies among trials may be quantified and analysed
  • RCT heterogeneity may be resolved
  • Publication bias may be revealed
  • Future research directions may be identified
  • Avoids Simpson’s paradox, in which a consistent effect in constituent trials is reversed when results are simply pooled (see the numeric sketch below).
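Here is that numeric sketch, using the oft-quoted kidney stone counts (two "studies", two treatments): the treatment which wins within every stratum loses in the naive pooled comparison.

```python
# A minimal numeric sketch of Simpson's paradox, using the oft-quoted
# kidney stone data: (successes, total) per study for two treatments.
trials = {
    "study 1": {"A": (81, 87),   "B": (234, 270)},
    "study 2": {"A": (192, 263), "B": (55, 80)},
}

for name, arms in trials.items():
    rates = {t: s / n for t, (s, n) in arms.items()}
    print(name, {t: f"{r:.0%}" for t, r in rates.items()})  # A wins in both

# Naive pooling of the raw counts reverses the direction of effect
for t in ("A", "B"):
    s = sum(trials[k][t][0] for k in trials)
    n = sum(trials[k][t][1] for k in trials)
    print(f"pooled {t}: {s/n:.0%}")  # A 78% vs B 83%: B now "wins"
```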

c) List the features of a well-conducted meta-analysis.

  • Research questions clearly defined
  • Transparent search strategy
  • Thorough search protocol
  • Authors contacted and unpublished data collected
  • Definition of inclusion and exclusion criteria for studies
  • Sensible exclusion and inclusion criteria
  • Assessment of methodological quality of the included studies
  • Transparent methodology of assessment
  • Calculation of a pooled estimate
  • Plot of the results (Forest Plot)
  • Measurement of heterogeneity
  • Assessment of publication bias (Funnel Plot)
  • Reproducible meta-analysis strategy (eg. multiple reviewers perform the same meta-analysis, according to the same methods)

d) What is “publication bias” and how can this impact on the validity of a meta-analysis?

  • Publication bias is the influence of study results on the likelihood of their publication
  • A funnel plot can be used to identify publication bias.
  • A meta-analysis can be invalidated if publication bias has influenced the included studies.
  • Publication bias leads to the selection of mostly positive (or mostly negative) studies, which in turn leads to positive meta-analysis results. Studies with the opposite effect may not have been selected for publication, and may not be available to the meta-analysis authors.
  • Meta-analysis authors may develop an inherent publication bias by only using English-language studies, only free-access articles, or only focusing their search within a narrow field of research.
  • Publication bias can be overcome by contacting relevant authors and requesting unpublished trial data, by searching for publications in all languages, and by searching broadly in multiple cross-specialty databases.

References

DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials." Controlled clinical trials 7.3 (1986): 177-188.

Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.

Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine 75.6 (2008): 431-439.

Methodological Expectations of Cochrane Intervention Reviews

Question 26 - 2014, Paper 1

With reference to clinical studies: 

a) Define the term "external validity". 

b) Define the term "bias". 

c) Briefly explain selection bias and measures to reduce it.

College Answer

a) External validity is the extent to which the results of a study can be generalised to other situations, e.g. different case-mix.

b) Bias in statistics is defined as systematic distortion of the observed result away from the "truth", caused by inadequacies in the design, conduct, or analysis of a trial.

c) Selection bias is caused by a systematic error in creating intervention groups, such that they differ with respect to prognosis. The study groups differ in measured or unmeasured baseline characteristics because of the way participants were selected or assigned. Selection bias also means that the study population does not reflect a representative sample of the target population. Selection bias undermines the external validity of the study, and the conclusions drawn by the study should not be extended to other patients.

Measures to reduce selection bias include:

  • Randomisation: Randomisation assigns patients to treatment arms by chance, avoiding any systematic imbalance in characteristics between patients receiving the experimental versus the control intervention.
  • Allocation concealment: The allocation sequence is the order in which participants are to be allocated to treatment. Allocation concealment involves not disclosing the allocation sequence to patients, and to those involved in recruiting trial participants, before random allocation occurs.

Discussion

External validity: the extent to which the study results can be generalised to the greater population, which is influenced by a vast array of factors:

  • The setting and the population from which the sample was selected
  • The inclusion and exclusion criteria
  • The "randomness" of the sample, and the baseline characteristics of the patients
  • The difference between the trial control group and the routine practice
  • The changes in practice since the publication of the trial
  • The use of patient centered outcomes
  • The degree to which the surrogate outcome measures are related to patient-centered outcomes

Bias: a systematic error which distorts study findings

  • It is caused by flaws in study design, data collection or analysis
  • It is not altered by sample size (increasing sample size only decreases random variations and the influence of chance)
  • It can creep in at any stage in research, from the literature search to publishing of the results.

Selection bias: The selection of specific patients which results in a sample group which is not random, and which is not representative of a population. This can be avoided by randomisation, blinding and by allocation concealment.

The college answer actually comes from the CONSORT Statement glossary:

"Systematic error in creating intervention groups, such that they differ with respect to prognosis. That is, the groups differ in measured or unmeasured baseline characteristics because of the way participants were selected or assigned. Also used to mean that the participants are not representative of the population of all possible participants."

References

Higgins, Julian PT, and Sally Green, eds. Cochrane handbook for systematic reviews of interventions. Vol. 5. Chichester: Wiley-Blackwell, 2008.

Question 13 - 2014, paper 2

a) With respect to meta-analysis of randomised controlled trials, what is a funnel plot?

b) In the funnel plot above:

i. What do the outer dashed lines indicate?

ii. To what does the solid vertical line correspond?

c) List three factors that result in asymmetry in funnel plots.

College Answer

a) A funnel plot is a scatter plot of the effect estimates from individual studies against some measure of each study’s size or precision. The standard error of the effect estimate is often chosen as the measure of study size and plotted on the vertical axis with a reversed scale that places the larger, most powerful studies towards the top. The effect estimates from smaller studies should scatter more widely at the bottom, with the spread narrowing among larger studies.

b)

Outer dashed lines: triangular region where 95% of studies are expected to lie

Solid vertical line: no intervention effect

c)

i) Heterogeneity

  • Size of effect differs according to study size
  • Clinical differences
  • Methodological differences

ii) Reporting bias

  • Publication bias- delayed publication, language, citation, multiple publication bias
  • Selective outcome reporting
  • Selective analysis/inadequate analysis reporting
  • Poor design
  • Fraud

iii) Chance

It was expected that candidates regularly attending journal club would have the knowledge to answer this question but overall it was not well answered and explanation of terms was poor

Discussion

The above-depicted plot is not the gospel plot from the CICM paper, but one which I have confabulated myself. Hopefully, it bears some resemblance to the original.

a) is answered by the college in a manner which precisely reflects the wording of the Cochrane Handbook. That is indeed " a simple scatter plot of the intervention effect estimates from individual studies against some measure of each study’s size or precision ".

b)
The lines? what do they mean? Said best by the laconic college:

  • Outer dashed lines: the triangular region where 95% of studies are expected to lie. This triangle is centred on a fixed-effect summary estimate, and extends 1.96 standard errors in each direction. If no bias is present, this triangle will include about 95% of studies, provided the true treatment effect is the same in each study (i.e. none were using some sort of dodgy home-made levosimendan, for instance).
  • Solid vertical line: no intervention effect. This corresponds to an OR of 1.00. A minimal simulation of this geometry follows.
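Here is that simulation, assuming unbiased studies and a true OR of 1.00. All numbers are invented; no plotting library is required to see the point.

```python
# A minimal sketch of funnel plot geometry with simulated, unbiased studies:
# effect estimates should scatter within +/- 1.96 SE of the true effect.
import random

random.seed(1)
true_log_or = 0.0  # the solid vertical line (OR = 1.00)

# Simulate 100 studies of varying precision (standard error of the log-OR)
studies = []
for _ in range(100):
    se = random.uniform(0.05, 0.6)
    studies.append((random.gauss(true_log_or, se), se))

# The dashed lines at a given SE sit at the summary effect +/- 1.96 * SE;
# roughly 95% of unbiased studies should fall between them.
inside = sum(1 for est, se in studies
             if abs(est - true_log_or) <= 1.96 * se)
print(f"{inside}/100 studies fall inside the funnel")  # expect ~95
```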

c)

Causes of asymmetry are well summarised by Sterne et al (2011), whose Box 1 I have shamelessly stolen:

Sources of Asymmetry in Funnel Plots

Reporting biases

  • Delayed publication (also known as time lag or pipeline) bias
  • Location biases (eg, language bias, citation bias, multiple publication bias)
  • Selective outcome reporting
  • Selective analysis reporting

Poor methodological quality
i.e. small studies inflated the effect size

  • Poor methodological design
  • Inadequate analysis
  • Fraud
 
 

True heterogeneity

  • Size of effect differs according to study size
    (eg. in smaller studies the intervention was less intense, as in the PROSEVA trial)

Artefactual

  • In some circumstances, sampling variation can lead to an association between the intervention effect and its standard error

Chance

  • Asymmetry may occur by chance, which motivates the use of asymmetry tests

References

DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials." Controlled clinical trials 7.3 (1986): 177-188.

Rockette, H. E., and C. K. Redmond. "Limitations and advantages of meta-analysis in clinical trials." Cancer Clinical Trials. Springer Berlin Heidelberg, 1988. 99-104.

Walker, Esteban, Adrian V. Hernandez, and Michael W. Kattan. "Meta-analysis: Its strengths and limitations." Cleveland Clinic Journal of Medicine 75.6 (2008): 431-439.

Methodological Expectations of Cochrane Intervention Reviews

Sterne, Jonathan AC, et al. "Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials." Bmj 343 (2011): d4002.

Question 8 - 2015, Paper 1

A systematic review of the literature was undertaken comparing proton pump inhibitors with H2-receptor blockers for the prevention of gastro-intestinal bleeding in ICU patients.

[Figure: forest plot of PPI vs. H2RA trials for the prevention of GI bleeding]

a)  Name the type of graph illustrated in the above figure. (10% marks)

b)  What does it show? (25% marks)

c)  What are the benefits of this type of analysis?  (25% marks)

d)  What are the disadvantages of this analysis? (40% marks)

College Answer

a)

Forest plot

b)

Combining the trials together, PPI use results in an odds ratio of 0.35, i.e. a reduction in the risk of bleeding compared to H2RA. Alternatively, PPI use results in a 65% reduction (1 - 0.35) in bleeding.

c)

Combines small studies with limited power, increasing the number and thus the ability to pick up a positive effect. Small studies with low power (due to small effect, small numbers) run the risk of a Type II error.

d)

Individual studies might have different patient populations (with different risk of bleeding) or different definitions of outcome.

Individual studies might have been conducted with different degrees of rigour (blinding, etc.)

There is publication bias to positive studies so that negative studies are not reported. Need full disclosure how the studies were selected, their scientific grading, subgroup analyses and assessment of heterogeneity.

Discussion

I have no idea whether the college actually used this exact image, but certainly the paper was correctly identified by LITFL. My hat is off to Chris Nickson, who managed to track down the exact PPI vs H2A study which had this exact forest plot and OR / RRR. It was indeed the Alhazzani study from 2013.

So:

a) and b) are actually a part of the Primary exam Syllabus, and are reviewed in greater detail in the chapter on forest and box-and-whisker plots. In short:

  • This is a forest plot.
  • The horizontal lines – confidence intervals of the OR
  • The position of the square – point estimate of the OR
  • The size of the square – the weight of the study
  • The vertical line: OR of 1 (no association)
  • If the CI of the summed results crosses the vertical line, the treatment is no more effective than control.
  • This study shows that PPIs are better than H2As in reducing the risk of bleeding. (A sketch of the plot anatomy follows below.)
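As promised, a minimal matplotlib sketch of forest plot anatomy. The ORs, CIs and weights are invented (they are not the Alhazzani data), and a real plot would draw the summary estimate as a diamond rather than a large square.

```python
# A minimal sketch of forest plot anatomy; all figures below are invented.
import matplotlib.pyplot as plt

# (label, OR, lower 95% CI, upper 95% CI, % weight)
rows = [("Study 1", 0.40, 0.15, 1.10, 20),
        ("Study 2", 0.30, 0.12, 0.75, 35),
        ("Study 3", 0.45, 0.20, 1.00, 30),
        ("Summary", 0.35, 0.20, 0.60, 100)]

fig, ax = plt.subplots()
for y, (label, or_, lo, hi, weight) in enumerate(reversed(rows)):
    ax.plot([lo, hi], [y, y], "k-")                  # horizontal line: the CI
    ax.plot(or_, y, "ks", markersize=5 + weight/10)  # square: point estimate, sized by weight
ax.axvline(1.0, color="k", linestyle=":")            # vertical line: OR = 1, no effect
ax.set_xscale("log")                                 # ORs are conventionally plotted on a log scale
ax.set_yticks(range(len(rows)))
ax.set_yticklabels([r[0] for r in reversed(rows)])
ax.set_xlabel("Odds ratio (log scale)")
plt.show()
```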

c) and d)

Advantages of meta-analysis

  • A more objective appraisal of evidence
  • Reduces the probability of false negative results
  • May explain heterogeneity between the results of different studies
  • Avoids Simpson’s paradox, in which a consistent effect in constituent trials is reversed when results are simply pooled.

Disadvantages of meta-analysis

  • Frustrated by heterogeneity of population samples and methodologies
  • Selection of studies may be biased
  • Negative studies are rarely published, and thus may not be included
  • The meta-analysis uses summary data rather than individual data

References

Alhazzani, Waleed, et al. "Proton pump inhibitors versus histamine 2 receptor antagonists for stress ulcer prophylaxis in critically ill patients: a systematic review and meta-analysis*." Critical care medicine 41.3 (2013): 693-705.

Methodological Expectations of Cochrane Intervention Reviews

Schriger, David L., et al. "Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice." International journal of epidemiology 39.2 (2010): 421-429.

Lewis, Steff, and Mike Clarke. "Forest plots: trying to see the wood and the trees." Bmj 322.7300 (2001): 1479-1480.

Anzures‐Cabrera, Judith, and Julian Higgins. "Graphical displays for meta‐analysis: An overview with suggestions for practice." Research Synthesis Methods 1.1 (2010): 66-80.

Cochrane: "Considerations and recommendations for figures in Cochrane reviews: graphs of statistical data" 4 December 2003 (updated 27 February 2008)

Reade, Michael C., et al. "Bench-to-bedside review: Avoiding pitfalls in critical care meta-analysis–funnel plots, risk estimates, types of heterogeneity, baseline risk and the ecologic fallacy." Critical Care 12.4 (2008): 220.

DerSimonian, Rebecca, and Nan Laird. "Meta-analysis in clinical trials." Controlled clinical trials 7.3 (1986): 177-188.

Biggerstaff, B. J., and R. L. Tweedie. "Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis." Statistics in medicine 16.7 (1997): 753-768.

The Cochrane Handbook: 9.5.4 "Incorporating heterogeneity into random-effects model"

Question 19 - 2016, Paper 1

Explain the following terms as applied to a randomised controlled clinical trial:

a) Allocation concealment. (25% marks)

b) Block randomisation, using block sizes of 4, in a trial of drug A versus drug B. (25% marks)

c) Stratification. (25% marks)

d) Minimisation algorithm. (25% marks)

College Answer

a)

Procedure for protecting the randomization process and ensuring that the clinical investigators and those involved in the conduct of the trial are not aware of the group to which the subject has been allocated

b)

Simple randomisation may result in unequal treatment group sizes; block randomisation is a method that may protect against this problem and is particularly useful in small trials.

In the context of a trial evaluating drug A or drug B and with block sizes of 4, there are 6 possible blocks of randomisation: AABB, ABAB, ABBA, BAAB, BABA, BBAA. 

One of the 6 possible blocks is selected randomly and the next 4 study participants are assigned according to the order of the block.  The process is then repeated as needed to achieve the necessary sample size. 

c)

Stratification is a process that protects against imbalance in prognostic factors that are present at the time of randomisation.  

A separate randomisation list is generated for each prognostic subgroup. Usually limited to 2-3 variables because of increasing complexity with more variables. 

d)

This is an alternative to stratification for maintaining balance in several prognostic variables.  The minimisation algorithm maintains a running total of the prognostic variables in patients that have already been randomised and then subsequent patients are assigned using a weighting system that minimizes imbalance in those prognostic variables. 

Discussion

In this paper, only one candidate (2.5% of the cohort) managed to just pass this question (i.e. they got 5 marks out of 10).

a) Allocation concealment:

  • This is a technique of preventing selection bias.
  • The selection of patients is randomised, and nobody knows what treatment the next enrolled patient will receive.
  • A truly random sequence of allocations prevents the investigators from being able to predict the allocated treatment on the basis of previous allocated treatments.
  • The difference between blinding and allocation concealment is that allocation concealment prevents the investigators from predicting who is getting what treatment before the patient is enrolled, whereas blinding prevents the investigators from knowing who is getting what treatment after the patient is enrolled.

b) Block randomisation:

  • Arrangement of experimental subjects in blocks, designed to keep the group numbers the same.
  • Usually, the block size is a multiple of the number of treatments (i.e. if it is a binary Drug A vs Drug B trial, the blocks would be in multiples of two).
  • Small blocks are better than large blocks.
  • The example where block sizes of 4 are used in a trial of drug A versus drug B is the same example used by Bland and Altman in their classical 1999 article, "How to randomise".
  • That example now, verbatim:

"...sometimes we want to keep the numbers in each group very close at all times. Block randomisation (also called restricted randomisation) is used for this purpose. For example, if we consider subjects in blocks of four at a time there are only six ways in which two get A and two get B: 1:AABB 2:ABAB 3:ABBA 4:BBAA 5:BABA 6:BAAB.  We choose blocks at random to create the allocation sequence. Using the single digits of the previous random sequence and omitting numbers outside the range 1 to 6 we get 5623665611. From these we can construct the block allocation sequence BABA/BAAB/ABAB/ABBA/BAAB, and so on. The numbers in the two groups at any time can never differ by more than half the block length. Block size is normally a multiple of the number of treatments."

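The same scheme is easy to sketch in code. The seed is arbitrary, and the six possible blocks are the same ones Bland and Altman enumerate.

```python
# A minimal sketch of block randomisation with block size 4, drug A vs. drug B.
import itertools
import random

random.seed(42)  # arbitrary seed, for reproducibility

# The 6 possible blocks containing two As and two Bs
blocks = sorted({p for p in itertools.permutations("AABB")})

def allocation_sequence(n_patients):
    """Concatenate randomly chosen blocks until n_patients are allocated."""
    seq = []
    while len(seq) < n_patients:
        seq.extend(random.choice(blocks))
    return seq[:n_patients]

print("".join(allocation_sequence(20)))
# Because each block contains two As and two Bs, the group sizes
# can never differ by more than half the block length (i.e. 2).
```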
c) Stratification:

  • Stratification is the partitioning of subjects and results by a factor other than the treatment given.
  • Stratification ensures that pre-identified confounding factors are equally distributed, to achieve balance. The objective is to remove "nuisance variables", eg. the presence of neutropenia in a trial performed on septic patients. One would want to ensure that the treatment group and the placebo group had equal numbers of these haematology disasters.

d) Minimisation algorithm:

  • Minimisation is a method of adaptive stratified sampling.
  • The objective is to minimise the imbalance between groups of patients in a clinical trial by ensuring that the treatment group and placebo group each get an equal number of patients with some sort of predetermined characteristics which might act as confounding factors.
  • The minimisation algorithm carefully places patients in groups according to these pre-identified confounding factors. Only the first patient is randomly allocated.
  • Minimisation is methodologically equivalent to true randomisation, but does not correct for unknown confounders (only the known, pre-determined ones). A code sketch follows.
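Here is that sketch, with invented prognostic factors. This deterministic version (essentially Taves' method) breaks ties in favour of the first arm; real implementations allocate the first patient, and usually ties, at random.

```python
# A minimal sketch of a deterministic minimisation algorithm.
# Factor names and patients are invented for illustration.
def assign(patient, groups, factors):
    """Place the patient in whichever arm minimises total imbalance."""
    best_arm, best_score = None, None
    for arm in groups:
        score = 0
        for f in factors:
            # Count, per arm, the patients sharing this patient's level of factor f,
            # as if the new patient had joined the candidate arm
            counts = {a: sum(1 for p in groups[a] if p[f] == patient[f])
                      for a in groups}
            counts[arm] += 1
            score += max(counts.values()) - min(counts.values())  # imbalance
        if best_score is None or score < best_score:
            best_arm, best_score = arm, score
    groups[best_arm].append(patient)
    return best_arm

groups = {"treatment": [], "placebo": []}
patients = [{"neutropenic": True, "age>65": False},
            {"neutropenic": True, "age>65": True},
            {"neutropenic": False, "age>65": True},
            {"neutropenic": True, "age>65": False}]
for p in patients:
    print(assign(p, groups, factors=["neutropenic", "age>65"]))
```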

References

Altman, Douglas G., and J. Martin Bland. "How to randomise." Bmj 319.7211 (1999): 703-704.

Question 20 - 2016, Paper 2

Evaluation of a novel serum biomarker for the rapid diagnosis of sepsis is performed in a sample of 100 patients with fever. The biomarker is compared with positive culture results as the gold standard and yields the following information:

                     Sepsis present      Sepsis absent
                     (culture positive)  (culture negative)

Biomarker positive   30                  10
Biomarker negative   30                  30
n                    60                  40

With reference to these results, define the following and give the values for the performance of the test:

a) Sensitivity. (20% marks)

b) Specificity. (20% marks)

c) Positive predictive value. (20% marks)

d) Negative predictive value. (20% marks)

e) Accuracy . (20% marks)

College answer

 

                     Sepsis present      Sepsis absent
                     (culture positive)  (culture negative)

Biomarker positive   30 (a)              10 (b)              (a + b)
Biomarker negative   30 (c)              30 (d)              (c + d)
n                    60 (a + c)          40 (b + d)          (a + b + c + d)

a)    Ability of test to identify true positives, or the probability the test will be positive in individuals who do have the disease.
      Sensitivity = a/(a+c) = 30/60 = 50%

b)    Ability of test to identify true negatives, or the probability the test will be negative in individuals who do not have the disease.
      Specificity = d/(b+d) = 30/40 = 75%

c)    Likelihood of a positive test meaning the patient has sepsis.
      PPV = a/(a+b) = 30/40 = 75%

d)    Likelihood of a negative test meaning the patient does not have sepsis.
      NPV = d/(c+d) = 30/60 = 50%

e)    The ability to differentiate patient and healthy cases correctly.
      Accuracy = (a+d)/(a+b+c+d) = 60/100 = 60%

Additional Examiners' Comments: 
The question clearly stated that a definition was required. Many candidates either could not define the terms or just missed this part of the question and therefore missed out on marks. This question has come up a number of times in past exams and these are basic statistical concepts that some candidates clearly do not understand. 

Discussion

This question closely resembles all other previous questions about the measures of diagnostic test accuracy:

  • Question 19.2 from the first paper of 2010 (Calculate sensitivity, specificity, PPV and NPV)
  • Question 29.2 from the first paper of 2008 (Calculate sensitivity, specificity, PPV and NPV)
  • Question 15 from the first paper of 2007 (Calculate sensitivity, specificity, PPV,  NPV and PLR)
  • Question 13 from the first paper of 2005 (Define sensitivity, specificity, PPV and NPV)
  • Question 14 from the second paper of 2002 (Define sensitivity, specificity, PPV and NPV)

After an absence of over five years, one might have been forgiven for thinking that such calculator-intensive statistics questions had been demoted to the level of primary exam material (most of the recent statistics questions in the Fellowship Exam having been more about the interpretation of meta-analysis data and other such ultra-clever "fellow level" uses of EBM). The main difference in 2016 was the addition of accuracy as one of the examined parameters. It had never been examined previously, and is not a frequently mentioned measure (even though colloquially we might use the term near-constantly). An excellent 2008 article was used to define it for the purposes of this model answer.

Clearly, at least one candidate remembered all the definitions, and got 10 marks. 

a)

  • Sensitivity = true positives / (true positives + false negatives)
  • This is the proportion of disease which was correctly identified.
  • In this case, Sn = 30 / (30 + 30) = 50%

b)

  • Specificity = true negatives / (true negatives + false positives)
  • This is the proportion of healthy patients in whom disease was correctly excluded
  • In this case, Sp = 30 / (30 + 10) = 75%

c)

  • Positive Predictive Value = true positives / total positives (true and false)
  • This is the proportion of positive test results which are actually positive
  • In this case, PPV = 30 / (30 + 10) = 75%

d)

  • Negative Predictive Value = true negatives / total negatives (true and false)
  • This is the proportion of negative test results which are actually negative
  • In this case, NPV = 30 / (30 + 30) = 50%

e)

  • Accuracy = (true positives + true negatives) / (total)
  • This is the proportion of correctly classified subjects among all subjects
  • In this case, accuracy = (30+30) / 100 = 60% (all five measures are recomputed in the sketch below)
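For the calculator-averse, all five measures fall out of a few lines of Python. This is a minimal sketch using the question's own 2x2 counts; the variable names follow the college's a/b/c/d convention and are otherwise my own invention.

```python
# 2x2 table from the question:
# a = true positives, b = false positives,
# c = false negatives, d = true negatives
a, b, c, d = 30, 10, 30, 30

sensitivity = a / (a + c)                # 30/60 = 0.50
specificity = d / (b + d)                # 30/40 = 0.75
ppv         = a / (a + b)                # 30/40 = 0.75
npv         = d / (c + d)                # 30/60 = 0.50
accuracy    = (a + d) / (a + b + c + d)  # 60/100 = 0.60

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("Accuracy", accuracy)]:
    print(f"{name}: {value:.0%}")
```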

References

Šimundić, Ana-Maria. "Measures of diagnostic accuracy: basic definitions." Med Biol Sci 22.4 (2008): 61-5.

Question 11 - 2017, Paper 1

The following table gives information on the proportions of a population that have been exposed to a risk factor for a disease and then subsequently developed the disease.

Exposure +      Indicates the proportion exposed to the risk factor (A+B)

Exposure -       Indicates the proportion not exposed to the risk factor (C+D)

Disease +         Indicates the proportion that subsequently developed the disease (A+C)

Disease -         Indicates the proportion that did not subsequently develop the disease (B+D)

                   Disease +        Disease -
Exposure +         A                B
Exposure -         C                D

Define prevalence AND, with reference to A, B, C, D in the table above, give the prevalence of the disease in this population.           (20% marks)

Define relative risk (RR) AND, with reference to A, B, C, D in the table above, derive the relative risk of developing the disease after exposure to the risk factor. (40% marks)

Define attributable risk (AR) AND, with reference to A, B, C, D in the table above, give the attributable risk of exposure to the risk factor on developing the disease in this population. (40% marks)

College answer

a) Prevalence: number of event (e.g. disease) in a specific population at a particular time point.

Prevalence of the Disease in this population:

A+C / (A+B+C+D)

b) Relative risk is the ratio of the probability of an event occurring (e.g. developing a disease) in an exposed group to the probability of the event occurring in a comparison, in non-exposed group

[A / (A+B) ] / [ C / (C+D)]

c) Attributable risk is the difference in the rate of a condition between an exposed and unexposed population.

A/ (A+B)-C/(C+D)

Discussion

This is another SAQ which makes it very easy to earn high marks, as it asks for unambiguous memorised definitions and has a clear-cut right answer.

Somebody got 9.2.

Prevalence:

  • The proportion of individuals in a population having a disease or characteristic in a particular population at a given time.
  • Prevalence = number of affected individuals / total number in population
    = (A+C) / (A+B+C+D)

Relative risk:

This is the ratio of the event rate in the exposed (or treated) group to the event rate in the unexposed (control) group. The slightly broken English of the college answer probably comes from an article similar to the 2017 article by Tenny et al, and was probably meant to say "relative risk is a ratio of the probability of an event occurring in the exposed group versus the probability of the event occurring in the non-exposed group."

  • RR =  absolute risk in treatment group / absolute risk in control group
    (absolute risk = number of cases in group / total number of group)
  • Thus, RR = [A/(A+B)]/[C/(C+D)]

Attributable risk:

  • This is a measure of the absolute effect of the risk of those exposed compared to unexposed. It indicates the number of cases of a disease among exposed individuals that can be attributed to that exposure
  • AR = Incidence(exposed) – Incidence(unexposed) 
    (incidence = number of cases / population at risk)
  • Thus, AR = A/(A+B) - C/(C+D) (prevalence, RR and AR are all computed in the sketch below)
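The same arithmetic in code, for those who prefer it that way: a minimal sketch in which the counts standing in for A, B, C and D are invented purely for illustration.

```python
# Invented counts standing in for the table's A, B, C, D:
# A = exposed and diseased, B = exposed and healthy,
# C = unexposed and diseased, D = unexposed and healthy
A, B, C, D = 40, 60, 10, 90

prevalence     = (A + C) / (A + B + C + D)   # 50/200 = 0.25
risk_exposed   = A / (A + B)                 # incidence among the exposed = 0.40
risk_unexposed = C / (C + D)                 # incidence among the unexposed = 0.10
relative_risk  = risk_exposed / risk_unexposed       # RR = 4.0
attributable_risk = risk_exposed - risk_unexposed    # AR = 0.30

print(f"Prevalence = {prevalence:.2f}, RR = {relative_risk:.1f}, AR = {attributable_risk:.2f}")
```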

Question 28 - 2017, Paper 2

A colleague directs your attention to a recently published randomised trial on a therapeutic intervention.

Outline the features of the trial that would lead you to change your practice.

College answer

Points to consider in the answer would be:

  1. Does the population studied correspond with the population the candidate expects to treat?
  2. Were the inclusion/exclusion criteria appropriate?
  3. Was the trial methodology appropriate – was there adequate blinding and randomisation?
  4. Was the primary outcome a clinically relevant or a surrogate endpoint?
  5. Was the length of follow up adequate?
  6. Was the trial sufficiently powered to detect a clinically relevant effect?
  7. Were the groups studied equivalent at baseline?
  8. Is the statistical analysis appropriate – was there an intention to treat analysis, have differences between groups at baseline been adjusted for? Are there multiple sub group analyses, and if so were they specified a priori?
  9. Is this a single centre study or multi centre?
  10. Were the results clinically significant rather than just statistically significant?
  11. Is the primary hypothesis biologically plausible with pre-existing supporting evidence?
  12. Are the findings supported by other evidence – have these results been replicated? 
  13. Would there be logistical and/or financial implications in practice change?
  14. Are there important adverse effects of the treatment?

Discussion

This is slightly different to asking "what makes a valid trial" or "how do you judge high-quality evidence", even though these clearly play a role (and in fact the college answer consists of a boring list of such criteria).  There are situations where practice is changed by methodologically inferior but otherwise compelling studies; or where expertly designed trials make minimal impact in the daily practice of individuals. A good read on this specific subject is a wonderfully titled 2016 article by John Ioannidis, "Why most clinical research is not useful." 

In short, a trial should possess the following features in order to affect practice:

Answers to a real problem. The clinical trial needs to be addressing something which is a problem, and which needs to be fixed in some way. If there is no problem, then the trial was pointless because existing practice is already good enough (i.e. no matter how good the methodological quality, the trial can be safely ignored because your practice does not need to change). Similarly, if the problem is not sufficiently serious, the cost and consequences of changing practice outweigh the benefit.

Information Gain. The clinical trial should have offered an answer which we don't already know. 

Pragmatism. The trial should be related to a real-life population and realistic settings, rather than some idealised scenario.

Patient-centered outcome. Some might argue that research should be aligned with the priorities of patients rather than those of investigators or sponsors. 

Transparency. The trial authors should be transparent in order for the results to inspire enough confidence to change practice on the basis of its results.

Validity. The trial should be constructed with sufficient methodological quality for its results to be taken seriously. 

References

Oh's Intensive Care manual: Chapter 10 (p83), Clinical trials in critical care by Simon Finfer and Anthony Delaney.

JAMA: User's guides to the medical literature; see if you can get institution access to these articles.

The CONSORT statement has its own website and is available for all to peruse.

CASP (Critical Appraisal Skills Programme) has checklists for the appraisal of many different sorts of studies; these actually come with tickboxes. One imagines reviewers wandering around a trial headquarters, ticking these boxes on their little clipboards.

Ioannidis, John PA. "Why most clinical research is not useful." PLoS medicine 13.6 (2016): e1002049.

Question 24 - 2018, Paper 1

Regarding randomised clinical trials:

a) What is a noninferiority trial?    (10% marks)
b) What is the null hypothesis in a noninferiority trial?    (10% marks)
c) Why would a noninferiority trial be undertaken instead of a superiority trial?    (40% marks)
d) What are the limitations of noninferiority trials?    (40% marks)

 

College answer

a)

An active control trial which tests whether an experimental treatment is not worse than the control treatment by more than a specified margin. Originally conceived as “a safe alternative” treatment.

b)

The null hypothesis states that the primary end point for the new treatment is worse than that of the active control by a prespecified margin, and rejection of the null hypothesis at a prespecified level of statistical significance permits a conclusion of noninferiority

c)

Typically, a placebo controlled trial would be considered unethical as an established treatment already exists.

The investigators may consider the experimental treatment unlikely to be superior to established treatment or the current treatment is highly effective.

The experimental treatment may offer advantages such as safety (reduced adverse effects), better compliance, lower cost or more convenience. 

d)

Proving that two treatments are equivalent could mean that they are both ineffective or even harmful.  Could lead to the acceptance of progressively worse treatments if noninferiority is blindly accepted with repeated noninferiority trials ('biocreep'). 

Conditions and practice may have changed since the original placebo trial of the current standard treatment.

Equipoise is more complex.

Analysis is more complex

A poorly conducted study tends to “noninferiority” as missing data and protocol violations favour noninferiority.

The margin by which non-inferiority is determined is arbitrarily decided by the researchers and may not be clinically appropriate

Sample sizes larger than placebo controlled trials

Examiners Comments:

 

Very poorly answered. Evidence based medicine is an important part of the curriculum and the examiners were concerned at the low level of knowledge displayed. Some candidates appeared to list unrelated phrases from the EBM literature without any appearance of understanding.

 The level of detail given in the template was not required to obtain a passing mark in this question.

Discussion

a) What is a noninferiority trial?    

  • Superiority trials aim to demonstrate that there is a difference between treatments, i.e. that one treatment is better than another
  • Equivalence trials aim to demonstrate that the effects differ by no more than a specific amount (the "equivalence margin"). 
  • Non-inferiority trials aim to demonstrate that an experimental treatment is not worse than an active control by more than the equivalence margin

b) What is the null hypothesis in a noninferiority trial?    

In superiority trials, the hypothesis is that the experimental treatment is different to (better than) the standard treatment, and two-sided statistical tests are used to test the null hypothesis (because the experimental treatment could turn out to be better or worse). The null hypothesis is therefore that there really is no difference. In equivalence trials the null hypothesis is that the treatments differ by more than a specified margin (the "equivalence margin"). In non-inferiority trials the null hypothesis is that the experimental treatment is worse than the standard treatment, and the pre-specified equivalence margin determines how much worse.

The diagram below is borrowed and modified from Ian A Scott (2009); it shows the results and confidence interval ranges expected of the three different types of trials when the null hypothesis has been rejected.

[Diagram: superiority, equivalence and non-inferiority trial results, after Scott (2009)]

Superiority trials have to have their results well over to the "favours experimental treatment" side, usually by a pre-specified margin. Equivalence trials need to have their results and confidence intervals within that margin to confirm that the two treatments are in fact equivalent. Non-inferiority trials also need to have their results within that margin, but there is no need to prove that the treatment is superior (i.e. the confidence intervals and results simply need to remain within the "not much worse" margin, the "+1%" line in the diagram).
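To make the geometry of that diagram concrete, here is a hedged sketch which classifies a trial result from its confidence interval. The conventions are mine: the effect is expressed as experimental minus control, positive values favour the experimental arm, and delta is the pre-specified margin.

```python
def classify_trial(ci_lower, ci_upper, delta):
    """Classify a result from the 95% CI of (experimental - control),
    where positive differences favour the experimental arm and
    delta > 0 is the pre-specified margin."""
    if ci_lower > 0:
        return "superior (and, a fortiori, non-inferior)"
    if -delta < ci_lower and ci_upper < delta:
        return "equivalent (CI within the margin on both sides)"
    if ci_lower > -delta:
        return "non-inferior (not worse by more than the margin)"
    if ci_upper < -delta:
        return "inferior"
    return "inconclusive (CI spans the margin)"

# A 95% CI of -0.5 to 2.0, with a margin of 1.0:
print(classify_trial(-0.5, 2.0, delta=1.0))   # non-inferior, not proven superior
```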


c) Why would a noninferiority trial be undertaken instead of a superiority trial?  

A non-inferiority trial is appropriate when:

  • A placebo treatment is unethical
  • The standard treatment is exceptionally effective
  • The experimental treatment is thought to be equivalent or at least not worse but not superior to the current treatment (i.e. everybody is convinced that a superiority trial would show no difference)
  • The experimental treatment is expected to be similar to the standard treatment in terms of the primary outcome, but has other unrelated advantages (eg. is cheaper, less invasive or more convenient) in which case it would be helpful to demonstrate that its efficacy is not worse.

d) What are the limitations of noninferiority trials?    

  • The standard of care you test against may be more harmful than placebo.
  • Because you are not testing against placebo, a situation may arise where both treatments are similarly harmful, and you have merely demonstrated that your experimental treatment is not any more harmful than the current harmful standard of care.
  • Because you are not testing against placebo the effect size difference is smaller, and in order to achieve satisfactory power the sample size needs to be larger (and your trial becomes more expensive).
  • If the effect of the standard treatment is very close to the effect of a placebo, then the effect of the supposedly non-inferior experimental treatment may end up being very close to the placebo.
  • If you test one treatment and prove that it is not much worse, and then test another treatment proving that it is not much  worse than the last, you may eventually come to a point where after multiple noninferiority trials you have demonstrated that your terrible useless treatment is not  much worse than the other terrible useless treatment, something described as "biocreep", or the acceptance of progressively worse treatments.
  • Equipoise is ethically necessary to run these trials, but there may be no equipoise with regards to non-inferiority (i.e. some may genuinely believe that the standard treatment is substantially superior to the experimental treatment). Considering that the null hypothesis is that the experimental treatment is much worse, some ethicists may argue that true equipoise is impossible. You basically end up consenting your enrolled patients to agree that they may be randomised to a treatment which is believed to be inferior, or which at best might turn out to be no better.
  • A poorly conducted superiority trial (i.e. with many protocol violations and drop-outs) will have a result which trends towards non-inferiority because through intention-to-treat analysis the effect size of the experimental treatment will be diluted.
  • The investigators are in control of the equivalence margin, which means they could have decided on an inappropriately wide margin. If the margin is established after the results become available, the experimental treatment could appear not much worse by manipulating how much worse you would accept as a threshold. Even pre-specified margins might be completely arbitrary and inappropriate. There is some pressure to select an inappropriately wide limit - the wider the limit, the smaller the sample size you will require, and the cheaper your trial. This may lead to truly ridiculous conclusions. For instance, Silvio Garattini (2007) describes the COMPASS trial where "the thrombolytic saruplase was judged equivalent to streptokinase for post-myocardial infarction, even though the saruplase group had 50% more deaths than the control group".
  • For a drug company, to prove non-inferiority of a new drug is less risky than to try to demonstrate their superiority. Failure to demonstrate superiority may stop the product from making its way into the market, and doesn't look as good on the promotional literature.

References

Snapinn, Steven M. "Noninferiority trials." Trials 1.1 (2000): 19.

Lesaffre, Emmanuel. "Superiority, equivalence, and non-inferiority trials." Bulletin of the NYU hospital for joint diseases 66.2 (2008): 150-154.

Murray, Gordon D. "Switching between superiority and non‐inferiority." British journal of clinical pharmacology 52.3 (2001): 219-219.

Scott, Ian A. "Non-inferiority trials: determining whether alternative treatments are good enough." Med J Aust 190.6 (2009): 326-330.

Piaggio, Gilda, et al. "Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement." JAMA 308.24 (2012): 2594-2604.

Garattini, Silvio. "Non-inferiority trials are unethical because they disregard patients' interests." The Lancet 370.9602 (2007): 1875-1877.

Fleming, Thomas R. "Current issues in non‐inferiority trials." Statistics in medicine 27.3 (2008): 317-332.

Question 6 - 2018, Paper 2

Give the rationale for using the following techniques in a randomised controlled clinical trial: 
 
a)    Allocation concealment.          (30% marks) 
 
b)    Block randomization.               (30% marks) 
 
c)    Stratification.                            (30% marks) 
 
d)    Minimisation algorithm.          (10% marks) 

College answer

a) Allocation concealment                                         
Procedure for protecting the randomization process and ensuring that the clinical investigators and those involved in the conduct of the trial are not aware of the group to which the subject has been allocated 
 
b) Block randomisation  
Simple randomisation may result in unequal treatment group sizes; block randomisation is a method that may protect against this problem and is particularly useful in small trials. 
 
In the context of a trial evaluating drug A or drug B and with block sizes of 4, there are 6 possible blocks of randomisation: AABB, ABAB, ABBA, BAAB, BABA, BBAA.  
 
One of the 6 possible blocks is selected randomly, and the next 4 study participants is assigned according to the order of the block.  The process is then repeated as needed to achieve the necessary sample size.  
  
c) Stratification                                              
Stratification is a process that protects against imbalance in prognostic factors/confounders that are present at the time of randomisation.   
A separate randomisation list is generated for each prognostic subgroup.  Usually limited to 2-3 variables because of increasing complexity with more variables.  
  
d) Minimisation algorithm                                         
This is an alternative to stratification for maintaining balance in several prognostic variables.  
The minimisation algorithm maintains a running total of the prognostic variables in patients that have already been randomised and then subsequent patients are assigned using a weighting system that minimizes imbalance in those prognostic variables.  

Discussion

This question is virtually identical to Question 19 from the first paper of 2016, where the trainees were expected to explain these terms rather than offer a rationale for them. The college answer to both questions is identical, suggesting that the examiners do not see any distinction in their wording (or, that they are indifferent to the candidates' interpretation of the question). Either way, for whatever reason the first time around this SAQ did very poorly (only one candidate passed, and barely at that), whereas this time it seems 49.3% scored over 5.0, and some EBM genius scored 8.5.

 Without further ado:

Allocation concealment:

  • This is a technique of preventing selection bias.
  • The selection of patients is randomised, and nobody knows what treatment the next enrolled patient will receive.
  • A truly random sequence of allocations prevents the investigators from being able to predict the allocated treatment on the basis of previously allocated treatments.
  • Allocation concealment prevents the investigators from predicting who is getting what treatment before the patient is enrolled, whereas blinding prevents the investigators from knowing who is getting what treatment after the patient is enrolled.

Block randomisation:

  • The arrangement of experimental subjects in blocks, designed to keep the group numbers the same.
  • Usually, the block size is a multiple of the number of treatments (i.e. if it is a binary Drug A vs Drug B trial, the blocks would be in multiples of two).
  • Small blocks are better than large blocks (a code sketch of the whole procedure follows this list).
  • The example offered by the college answer is the same example used by Bland and Altman in their classical 1999 article, "How to randomise".  That example now, verbatim:
  • "...sometimes we want to keep the numbers in each group very close at all times. Block randomisation (also called restricted randomisation) is used for this purpose. For example, if we consider subjects in blocks of four at a time there are only six ways in which two get A and two get B: 1:AABB 2:ABAB 3:ABBA 4:BBAA 5:BABA 6:BAAB.  We choose blocks at random to create the allocation sequence. Using the single digits of the previous random sequence and omitting numbers outside the range 1 to 6 we get 5623665611. From these we can construct the block allocation sequence BABA/BAAB/ABAB/ABBA/BAAB, and so on. The numbers in the two groups at any time can never differ by more than half the block length. Block size is normally a multiple of the number of treatments."

Stratification:

  • Stratification is the partitioning of subjects and results by a factor other than the treatment given.
  • Stratification ensures that pre-identified confounding factors are equally distributed, to achieve balance. The objective is to remove "nuisance variables", eg. the presence of neutropenic bone marrow transplant recipients in a trial performed on septic patients. One would want to ensure that the treatment group and the placebo group had equal numbers of these haematology disasters.

Minimisation algorithm:

  • Minimisation is a method of adaptive stratified sampling.
  • The objective is to minimise the imbalance between groups of patients in a clinical trial by ensuring that the treatment group and placebo group each get an equal number of patients with some sort of predetermined characteristics which might act as confounding factors.
  • The minimisation algorithm carefully places patients in groups according to the pre-identified confounding factors. Only the first patient is randomly allocated.
  • Minimisation is methodologically equivalent to true randomisation but does not correct for unknown confounders (only the known, pre-determined ones); a loose sketch of the algorithm follows this list
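For the curious, here is a loose sketch of such an algorithm, in the style of Pocock and Simon; the two-arm setup, factor names and random tie-breaking rule are my own simplifications.

```python
import random

def minimise(allocations, factors, new_patient):
    """Assign the next patient to whichever arm minimises the total
    imbalance across the pre-identified prognostic factors.
    `allocations` is a list of (arm, patient_dict) pairs already
    allocated; ties are broken randomly."""
    def imbalance_if_assigned(arm):
        total = 0
        for factor in factors:
            level = new_patient[factor]
            counts = {"A": 0, "B": 0}
            for assigned_arm, patient in allocations:
                if patient[factor] == level:
                    counts[assigned_arm] += 1
            counts[arm] += 1                   # hypothetically add the new patient here
            total += abs(counts["A"] - counts["B"])
        return total

    scores = {arm: imbalance_if_assigned(arm) for arm in ("A", "B")}
    best = min(scores.values())
    return random.choice([arm for arm, s in scores.items() if s == best])

# The first patient is randomly allocated; the rest are minimised.
trial = [("A", {"neutropenic": True, "age>65": False})]
print(minimise(trial, ["neutropenic", "age>65"],
               {"neutropenic": True, "age>65": True}))
# prints "B": arm A already holds the only neutropenic patient
```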

References

Altman, Douglas G., and J. Martin Bland. "How to randomise." BMJ 319.7211 (1999): 703-704.

Question 11 - 2019, Paper 1

a)    What is a Standardised Mortality Ratio (SMR) and how is it calculated?    (20% marks)

b)    The SMR in your ICU has increased from 0.95 to 1.05 in the past 12 months. Outline the possible causes.    (80% marks)
 

College answer

a)    Overview of SMR (20% marks)

SMR is one of the quality indicators that reflect the performance of an ICU.
Definition of SMR = ratio of observed deaths in the study group to expected deaths in the general population based on APACHE or other severity of illness
SMR values of 1 indicate expected performance, whereas values below 1 and above 1 indicate respectively better and worse performances than expected
 

b)    Causes for increase (80% marks)

Lower than expected predicted mortality
Errors in predicted/expected mortality due to gaps in data, changes in case-mix etc

Change in data collection systems or personnel – e.g., change in the way the expected mortality is estimated

Lead-time bias (pre-ICU care) – patients transferred from other facilities may have become more stable after receiving appropriate management at the original hospital.

Increases in observed mortality

Based on hospital mortality, not ICU mortality – therefore, influenced by pre-ICU and post ICU care in the hospital

Change in case-mix, so changes in case mix may account for increase in SMR and increased other hospital admissions

One-off events such as mass disasters, epidemics etc

Variations in practice, changes in clinical protocols either in the hospital or in the ICU

Changes in personnel – e.g., new intensivist, new surgeon etc

Changes in staffing levels and training

New services introduced such as ECMO etc.

Examiner’s Comments:

The candidates rarely considered the denominator. Often wrote "admitted sicker patients" without considering these likely to also have higher predicted mortality. Rarely any structure.

Discussion

In brief:

  • SMR is the ratio of the observed mortality vs. predicted mortality for a specified time period.
  • The formula is SMR = observed number of deaths / expected number of deaths,  where the expected number of deaths is predicted by an illness severity scoring system
  • One can use this to compare hospitals and ICUs
  • One needs to first calculate the predicted hospital mortality using an illness severity scoring system.
  • An SMR of 1 means the mortality is as expected.
  • An SMR of < 1 is better than expected, and > 1 is worse than expected (a toy calculation follows this list).
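As a toy illustration (the probabilities below are invented), the expected number of deaths is usually obtained by summing the predicted mortality risks of every admitted patient:

```python
# Per-patient mortality probabilities predicted by a severity score
# (e.g. APACHE); these numbers are made up for illustration.
predicted_mortality = [0.10, 0.40, 0.05, 0.80, 0.25]
observed_deaths = 2

expected_deaths = sum(predicted_mortality)   # 1.6
smr = observed_deaths / expected_deaths
print(f"SMR = {smr:.2f}")                    # 1.25, i.e. worse than predicted
```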

Causes for an elevation of the SMR were separated into two categories by the college; either the predicted mortality has dropped, or the actual mortality has increased. Another way of looking at this is whether the SMR elevation is "true", or whether it is spurious, i.e. where the change in SMR is not representative of a change in the quality of care being provided by the ICU. 

  • Spurious elevation of SMR
    • Poor data entry (i.e. true illness severity is not captured by lazy registrars failing to dutifully record every last drop of urine in the APACHE form)
    • "Lead time bias" - treatment received prior to ICU admission may result in artifically normalised acute physiology scores
    • "Healthy worker effect" - a change towards selective ICU admission practices may be favouring patients who score low on illness severity scales, eg. young elective surgical patients
  • True elevation due to internal ICU issues
    • A new staffing model is in place (inexperienced staff)
    • Understaffing has impaired patient care
    • Junior people are not following unfamiliar protocols, or the new protocols are of a poor quality
    • New equipment or technique is less useful than advertised
  • True elevation due to external problems
    • Increase in the pre-hospital morbidity of admitted patients (eg. increased acuity, where you suddenly become a trauma centre or an organ transplant service)
    • Play of chance, eg. mass casualty event 
    • Deterioration of the quality of pre-ICU care
    • Parameters which govern ICU admission have changed (eg. administrative pressure is being placed on the ICU to rapidly admit ED patients who have had little management or workup)
    • Discharge arrangements have changed (eg. a local palliative care ward had shut down, and you keep dying patients in the ICU because it would be insensitive to transfer them to the next nearby palliative care unit)

References

Young, Paul, et al. "End points for phase II trials in intensive care: Recommendations from the Australian and New Zealand clinical trials group consensus panel meeting." Critical Care and Resuscitation 15.3 (2013): 211. - this one is not available for free, but the 2012 version still is:

Young, Paul, et al. "End points for phase II trials in intensive care: recommendations from the Australian and New Zealand Clinical Trials Group consensus panel meeting." Critical Care and Resuscitation 14.3 (2012): 211.

Suter, P., et al. "Predicting outcome in ICU patients." Intensive Care Medicine 20.5 (1994): 390-397.

Martinez, Elizabeth A., et al. "Identifying Meaningful Outcome Measures for the Intensive Care Unit." American Journal of Medical Quality (2013): 1062860613491823.

Tipping, Claire J., et al. "A systematic review of measurements of physical function in critically ill adults." Critical Care and Resuscitation 14.4 (2012): 302.

Gunning, Kevin, and Kathy Rowan. "Outcome data and scoring systems." BMJ 319.7204 (1999): 241-244.

Woodman, Richard, et al. Measuring and reporting mortality in hospital patients. Australian Institute of Health and Welfare, 2009.

Vincent, J-L. "Is Mortality the Only Outcome Measure in ICU Patients?."Anaesthesia, Pain, Intensive Care and Emergency Medicine—APICE. Springer Milan, 1999. 113-117.

Rosenberg, Andrew L., et al. "Accepting critically ill transfer patients: adverse effect on a referral center's outcome and benchmark measures." Annals of internal medicine 138.11 (2003): 882-890.

Burack, Joshua H., et al. "Public reporting of surgical mortality: a survey of New York State cardiothoracic surgeons." The Annals of thoracic surgery 68.4 (1999): 1195-1200.

Hayes, J. A., et al. "Outcome measures for adult critical care: a systematic review." Health technology assessment (Winchester, England) 4.24 (1999): 1-111.

Rubenfeld, Gordon D., et al. "Outcomes research in critical care: results of the American Thoracic Society critical care assembly workshop on outcomes research." American journal of respiratory and critical care medicine 160.1 (1999): 358-367.

Turnbull, Alison E., et al. "Outcome Measurement in ICU Survivorship Research From 1970 to 2013: A Scoping Review of 425 Publications." Critical care medicine (2016).

Solomon, Patricia J., Jessica Kasza, and John L. Moran. "Identifying unusual performance in Australian and New Zealand intensive care units from 2000 to 2010." BMC medical research methodology 14.1 (2014): 1.

Liddell, F. D. "Simple exact analysis of the standardised mortality ratio." Journal of Epidemiology and Community Health 38.1 (1984): 85-88.

Ben-Tovim, David, et al. "Measuring and reporting mortality in hospital patients." Canberra: Australian Institute of Health and Welfare (2009).

McMichael, Anthony J. "Standardized Mortality Ratios and the 'Healthy Worker Effect': Scratching Beneath the Surface." Journal of Occupational and Environmental Medicine 18.3 (1976): 165-168.

Wolfe, Robert A. "The standardized mortality ratio revisited: improvements, innovations, and limitations." American Journal of Kidney Diseases 24.2 (1994): 290-297.

Kramer, Andrew A., Thomas L. Higgins, and Jack E. Zimmerman. "Comparing observed and predicted mortality among ICUs using different prognostic systems: why do performance assessments differ?." Critical care medicine 43.2 (2015): 261-269.

Spiegelhalter, David J. "Funnel plots for comparing institutional performance." Statistics in medicine 24.8 (2005): 1185-1202.

Teres, Daniel. "The value and limits of severity adjusted mortality for ICU patients." Journal of critical care 19.4 (2004): 257-263.

Question 22 - 2019, Paper 1

Outline the features and list the advantages and disadvantages of each of the following clinical trial designs:

a)  Cluster randomised trial.    (50% marks)

b)  Non-inferiority trial.    (50% marks)
 

College answer

Cluster randomised trial (10%)

Unit of randomisation is the cluster (e.g. one hospital or ICU) rather than individual patients. Individual clusters may be matched / paired with similar clusters to increase power

Power increased more by increasing number of clusters rather than increased numbers of patients within clusters

Advantages (20%)

Ability to test interventions directed at systems rather than individuals (e.g. MET, SDD, education campaigns)

Where individual patients not consented may lead to recruitment of ‘all’ patients with the entry criteria

–increased recruitment and external validity

Disadvantages (20%)

Larger numbers of patients required when compared to conventional individual patient RCT i.e. reduced statistical efficiency

Complex statistics: power calculation require knowledge or estimate of intercluster correlation coefficient

Chance of getting imbalance is greater depending on the characteristics of the cluster

 

b)

Non-inferiority trial (10%)

The null hypothesis in a noninferiority study states that the primary end point for the experimental treatment is worse than that for the positive control treatment by a specified margin. Rejection of the null hypothesis supports a claim of noninferiority to the control treatment

Advantages: (20%)

Allows investigation of a new therapy to be compared to an existing accepted therapy

Does not require a placebo group, where this may be unethical

Allows cheaper or less toxic therapies to be introduced in place of older therapies

Disadvantages: (20%)

Does not prove efficacy of tested therapy

Relies upon known / accepted benefit of control

Needs to be performed under similar conditions in which the active control has demonstrated benefit

No clear consensus on what margin of noninferiority should be accepted

Repeated noninferiority trial may lead to acceptance of inferior therapies ‘biocreep’

Examiners Comments:

Significant knowledge gap. Disappointing, since several important trials have followed these designs.
 

Discussion

The disappointment felt at the 4.5% pass rate for this question underscores the need to promote formal training in statistics and literature analysis. Other colleges have already moved to such a strategy, where their trainees may dispense with the increasingly pointless formal project (a mandatory requirement to generate meaningless papers) by satisfying their research requirements through a university unit of study in the interpretation of evidence-based medicine.

In summary:

Features of a cluster-randomised trial:

  • Groups of patients rather than individuals are randomised
  • A group may be as large as a hospital or an ICU
  • This is done because sometimes, it would be totally impractical to randomise an intervention to each individual patient; for example where the intervention is a large scale organisational change
  • The number of patients in each cluster does not matter as much as the total number of clusters, and power design involves deciding how many clusters one requires (patients within a cluster are more likely to have similar outcomes).
  • The outcome for each patient can no longer be assumed to be independent of that for any other patient.

Advantages of a cluster-randomised trial:

  • Able to test interventions applied to whole services or communities
  • Increased logistical convenience (less difficulty than individual randomisation)
  • Greater acceptability by participants (when something viewed as a worthwhile intervention is delivered to a large group rather than to individuals)
  • Both the direct and indirect effects of an intervention can be captured in a population, i.e. the study is more pragmatic (a good example is a study of infectious disease: not only do the randomised participants benefit from a decontaminating treatment, but also the population who are exposed to them)
  • This increases the external validity

Disadvantages of a cluster-randomised trial:

  • The statistical power of a cluster randomised trial is greatly reduced in comparison with a similar sized individually randomised trial (Campbell & Grimshaw, 1998)
  • The number of patients required  may be twice or thrice that of a comparable individually randomised trial
  • To calculate the power of such a trial requires a specialised approach. The intracluster correlation coefficient needs to be taken into account, as standard power calculations will lead to an underpowered trial if it is analysed taking clustering into account.
  • Analysis needs to take into account the cluster design: "If the clustering effect is ignored p values will be artificially extreme, and confidence intervals will be over-narrow, increasing the chances of spuriously significant findings and misleading conclusions". Apparently, this adjustment does not routinely happen. (The resulting sample size inflation is illustrated in the sketch below.)
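To put numbers on that statistical inefficiency, the usual back-of-envelope correction multiplies the individually-randomised sample size by the design effect, `1 + (m - 1) * ICC`, where m is the average cluster size. All figures below are invented.

```python
import math

n_individual = 800   # sample size for an equivalent individually randomised trial
m = 50               # assumed average number of patients per cluster
icc = 0.05           # assumed intracluster correlation coefficient

deff = 1 + (m - 1) * icc                          # design effect = 3.45
n_cluster_trial = math.ceil(n_individual * deff)
print(f"Design effect = {deff:.2f}; patients required = {n_cluster_trial}")   # 2760
```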

Features of a non-inferiority trial

  • Non-inferiority trials aim to demonstrate that an experimental treatment is not worse than an active control by more than the equivalence margin. 
  • In superiority trials, the hypothesis is that the experimental treatment is different to (better than) the standard treatment, and two-sided statistical tests are used to test the null hypothesis (because the experimental treatment could turn out to be better or worse). The null hypothesis is therefore that there really is no difference. In equivalence trials the null hypothesis is that the treatments differ by more than a specified margin (the "equivalence margin"). In non-inferiority trials the null hypothesis is that the experimental treatment is worse than the standard treatment, and the equivalence margin determines how much worse.

Advantages of a non-inferiority trial:

A non-inferiority trial is appropriate when:

  • A placebo treatment is unethical
  • The standard treatment is exceptionally effective
  • The experimental treatment is thought to be equivalent or at least not worse but not superior to the current treatment (i.e. everybody is convinced that a superiority trial would show no difference)
  • The experimental treatment is expected to be similar to the standard treatment in terms of the primary outcome, but has other unrelated advantages (eg. is cheaper, less invasive or more convenient) in which case it would be helpful to demonstrate that its efficacy is not worse.

Disadvantages of non-inferiority trials

  • The standard of care you test against may be more harmful than placebo.
  • Because you are not testing against placebo, a situation may arise where both treatments are similarly harmful, and you have merely demonstrated that your experimental treatment is not any more harmful than the current harmful standard of care.
  • Because you are not testing against placebo the effect size difference is smaller, and in order to achieve satisfactory power the sample size needs to be larger (and your trial becomes more expensive).
  • If the effect of the standard treatment is very close to the effect of a placebo, then the effect of the supposedly non-inferior experimental treatment may end up being very close to the placebo.
  • If you test one treatment and prove that it is not much worse, and then test another treatment proving that it is not much  worse than the last, you may eventually come to a point where after multiple noninferiority trials you have demonstrated that your terrible useless treatment is not  much worse than the other terrible useless treatment, something described as "biocreep", or the acceptance of progressively worse treatments.
  • Equipoise is ethically necessary to run these trials, but there may be no equipoise with regards to non-inferiority (i.e. some may genuinely believe that the standard treatment is substantially superior to the experimental treatment). Considering that the null hypothesis is that the experimental treatment is much worse, some ethicists may argue that true equipoise is impossible. You basically end up consenting your enrolled patients to agree that they may be randomised to a treatment which is believed to be inferior, or which at best might turn out to be no better.
  • A poorly conducted superiority trial (i.e. with many protocol violations and drop-outs) will have a result which trends towards non-inferiority because through intention-to-treat analysis the effect size of the experimental treatment will be diluted.
  • The investigators are in control of the equivalence margin, which means they could have decided on an inappropriately wide margin. If the margin is established after the results become available, the experimental treatment could appear not much worse by manipulating how much worse you would accept as a threshold. Even pre-specified margins might be completely arbitrary and inappropriate. There is some pressure to select an inappropriately wide limit - the wider the limit, the smaller the sample size you will require, and the cheaper your trial. This may lead to truly ridiculous conclusions. For instance, Silvio Garattini (2007) describes the COMPASS trial where "the thrombolytic saruplase was judged equivalent to streptokinase for post-myocardial infarction, even though the saruplase group had 50% more deaths than the control group".
  • For a drug company, to prove non-inferiority of a new drug is less risky than to try to demonstrate their superiority. Failure to demonstrate superiority may stop the product from making its way into the market, and doesn't look as good on the promotional literature.

References

Snapinn, Steven M. "Noninferiority trials." Trials 1.1 (2000): 19.

Lesaffre, Emmanuel. "Superiority, equivalence, and non-inferiority trials." Bulletin of the NYU hospital for joint diseases 66.2 (2008): 150-154.

Murray, Gordon D. "Switching between superiority and non‐inferiority." British journal of clinical pharmacology 52.3 (2001): 219-219.

Scott, Ian A. "Non-inferiority trials: determining whether alternative treatments are good enough." Med J Aust 190.6 (2009): 326-330.

Piaggio, Gilda, et al. "Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement." JAMA 308.24 (2012): 2594-2604.

Garattini, Silvio. "Non-inferiority trials are unethical because they disregard patients' interests." The Lancet 370.9602 (2007): 1875-1877.

Fleming, Thomas R. "Current issues in non‐inferiority trials." Statistics in medicine 27.3 (2008): 317-332.

Campbell, Marion K., and Jeremy M. Grimshaw. "Cluster randomised trials: time for improvement: the implications of adopting a cluster design are still largely being ignored." (1998): 1171-1172.

Question 24 - 2019, Paper 2

In the context of clinical trials what is meant by the following terms:

a)    Stratification.    (20% marks)

b)    Intention to treat analysis.    (20% marks)

c)    Sensitivity analysis.    (20% marks)

d)    Kaplan-Meir curve.    (20% marks)

e)    Analysis of competing risk.    (20% marks)
 

College answer

a)    Stratification of clinical trials is the partitioning of subjects and results by a factor other than the treatment given

b)    Intention to treat analysis is the analysis of all participants allocated to a treatment group irrespective of whether they completed the treatment, withdrew, or deviated from protocol.

c)    A sensitivity analysis is the analysis of data from the trial with a change or alteration to one or more underlying assumptions used in the original analysis.

d)    A Kaplan-Meir curve is a plot of probability of survival against time.

e)    Analysis of competing risk is used when there are multiple endpoints of which the occurrence of one prevents the occurrence of another (e.g. death prevents the occurrence of shock reversal).

Discussion

Stratification

  • Stratification is the partitioning of subjects and results by a factor other than the treatment given.
  • Stratification ensures that pre-identified confounding factors are equally distributed, to achieve balance. The objective is to remove "nuisance variables", eg. the presence of neutropenia in a trial performed on septic patients. One would want to ensure that the treatment group and the placebo group had equal numbers of these haematology disasters.
  • According to Question 19 from the first paper of 2016, the official Delaney definition of stratification is as follows:

"Stratification is a process that protects against imbalance in prognostic factors that are present at the time of randomisation.  

A separate randomisation list is generated for each prognostic subgroup. Usually limited to 2-3 variables because of increasing complexity with more variables"

Intention to treat analysis

  • "Once randomised, always analysed"
  • All enrolled patients have to be a part of the final analysis
  • This preserves the bias-protective effect of randomisation
  • Minimises Type 1 errors (false positives)
  • When intention-to-treat analysis agrees with per-protocol analysis, it increases the validity of the study

Sensitivity analysis

  • Analysis of the data from a clinical trial where some of the assumptions are intentionally changed
  • One example of this is to assume that all the patients lost to follow-up or who dropped out of the study have failed treatment.  

"Kaplan-Meir" curve (it's usually spelled "Meier", after Paul Meier):

  • A Kaplan-Meier curve is defined as the probability of surviving in a given length of time while considering time in many small intervals
  • The curve itself is a plot of the fraction of patients surviving in each group over time (a minimal estimator is sketched below)
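A minimal estimator with invented follow-up data; this toy version handles censoring, but sorts tied censorings before tied deaths, which differs from the usual convention:

```python
def kaplan_meier(times, events):
    """Toy Kaplan-Meier estimator. `times` are follow-up times;
    `events` flags death (True) versus censoring (False).
    Returns (time, survival) pairs at each death time."""
    at_risk = len(times)
    survival, curve = 1.0, [(0, 1.0)]
    for t, died in sorted(zip(times, events)):
        if died:
            survival *= 1 - 1 / at_risk   # the curve steps down at each death
            curve.append((t, survival))
        at_risk -= 1                      # deaths and censorings both leave the risk set
    return curve

# Five patients: deaths at days 2, 5 and 9; censored at days 3 and 7
print(kaplan_meier([2, 3, 5, 7, 9], [True, False, True, False, True]))
# [(0, 1.0), (2, 0.8), (5, 0.533...), (9, 0.0)]
```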

Analysis of competing risk:

  • A competing risk is an event that either hinders the observation of the event of interest or modifies the chance that this event occurs
  • An example is death while on dialysis and getting a kidney transplant (the two events interfere with one another)
  • Conventional methods (eg. Kaplan–Meier and standard Cox regression) ignore the competing events, may therefore be inappropriate, and competing risk analysis methods must be employed

References

Morris, Tim P., Brennan C. Kahan, and Ian R. White. "Choosing sensitivity analyses for randomised trials: principles." BMC medical research methodology 14.1 (2014): 11.

Rich, Jason T., et al. "A practical guide to understanding Kaplan-Meier curves." Otolaryngology—Head and Neck Surgery 143.3 (2010): 331-336.

Noordzij, Marlies, et al. "When do we need competing risks methods for survival analysis in nephrology?." Nephrology Dialysis Transplantation 28.11 (2013): 2670-2677.

Question 26.1 - 2020, Paper 1

A randomised controlled trial examining a treatment for septic shock reports the following results:

"At 90 days after randomization, 27.9% patients who had been assigned to receive the treatment had died, as had 28.8% who had been assigned to receive placebo (odds ratio 0.95; 95% confidence interval [Cl], 0.82 to 1.10; P value= 0.50)."

 
a) Explain the meaning of the underlined terms. Interpret the result of the trial.

(40% marks)
 

College answer

Odds ratio: The odds of a patient in the treatment group dying within 90 days divided by the odds of patients in the placebo group dying within 90 days.

95% confidence interval: The range of values which is 95% certain to contain the population parameter of interest (in this case, Odds Ratio)

P Value: The probability of obtaining the observed, or more extreme results, assuming the null hypothesis is true.    (3 marks)
 

Discussion

In case it matters to anybody, in this SAQ the examiners are using the findings of the ADRENAL trial (Venkatesh et al, 2018).

Odds ratio:

  • The Odds Ratio represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
  • An OR =1 suggests there is no association.
  • If the CI for an OR includes 1, then the OR is not significant (i.e. there might not be an association)

Confidence interval:

  • The range of values within which the "actual" result is found.
  • A CI of 95% means that if the trial were repeated an infinite number of times, 95% of the intervals so constructed would contain the true value.
  • The CI gives an indication of the precision of the sample mean as an estimate of the "true" population mean
  • A wide CI can be caused by small samples or by a large variance within a sample.

  p-value:

  • Loosely, the probability of the observed result arising by chance alone
  • The p-value is the chance of getting the reported study result (or one even more extreme) when the null hypothesis is actually true.

Interpretation of results:

  • The 95% CI includes 1.0 and the p-value is 0.50, so no treatment effect has been detected here. Even if the result had been statistically significant, an OR this close to 1 would suggest a clinically trivial effect. (The arithmetic behind such an OR and its CI is sketched below.)
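For the record, the arithmetic behind such a result is not mysterious. Here is a minimal sketch with invented counts (chosen to mimic the quoted percentages at a hypothetical 1000 patients per arm), using the standard Woolf log-OR confidence interval:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log) 95% CI from a 2x2 table:
    a/b = deaths/survivors on treatment, c/d = deaths/survivors on placebo."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

print(odds_ratio_ci(279, 721, 288, 712))   # OR ~0.96, 95% CI roughly 0.79 to 1.16
```

The interval quoted in the SAQ (0.82 to 1.10) is narrower because the real trial enrolled far more patients than this invented example.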

Question 26.2 - 2020, Paper 1

A randomised controlled trial examining a treatment for lung injury reports the following results:

"The primary outcome was change in SOFA score over 96 hours. The mean SOFA score from baseline to 96 hours decreased from 9.8 to 6.8 in the treatment group (3 points) and from 10.3 to 6.8 in the placebo group (3.5 points) (difference, -0.10; 95% CI, - 1. 23 to 1.03; P = 0.86).

There were 30 prespecified secondary outcomes. Twenty-nine were not significantly different between the treatment and the placebo group. In exploratory analyses that did not adjust for multiple comparisons, day 28 mortality was 46.3% in the placebo group vs 29.8% in the treatment group (P = 0.03; between-group difference, 16.58% [95% CI, 2% to 31.1%))."

a)  Interpret these results.    (30% marks)
 

College answer

a)
The primary outcome does not demonstrate a significant difference between the two groups and so the overall result of the trial is negative. A secondary outcome of day 28-day mortality does show a significant difference in favour of the treatment – however as this is one of 30 secondary outcomes, with no adjustment for multiplicity of testing this is likely a false positive result and should be interpreted cautiously. (3 marks)
 

Discussion

The findings borrowed for this SAQ come from the CITRIS-ALI trial (Truwit et al, 2019), in case anybody cares.

The primary outcome is not statistically significant because of the high p-value (0.86 is pretty terrible) and because the confidence interval for the difference crosses zero.

As to the secondary outcome: if you have thirty secondary outcomes (and ultimately CITRIS-ALI had forty-six), some of them are bound to produce some sort of publishable information. The day 28 mortality difference was statistically significant (p = 0.03), but because this is a secondary outcome it should be viewed as hypothesis-generating. On an unrelated note, a mortality of 46% in sepsis or ARDS is so 1990s.
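To put numbers on the multiplicity problem (Bonferroni is only the bluntest of several correction methods, and the independence of the tests assumed below is generous):

```python
alpha, n_tests = 0.05, 30
bonferroni_threshold = alpha / n_tests      # ~0.0017 per comparison

p_mortality = 0.03
print(p_mortality < bonferroni_threshold)   # False: not significant after correction

# Probability of at least one spuriously "significant" result among
# 30 independent tests, each performed at alpha = 0.05:
print(f"{1 - (1 - alpha) ** n_tests:.0%}")  # ~79%
```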

Question 26.3 - 2020, Paper 1

A prospective observational study examining the association between fluid therapy and outcome reports the following results:

"Crude 90-day mortality of patients who received colloids was higher than in  patients  treated  exclusively with crystalloids; (25.5% vs. 15.4%, odds ratio (OR) 1.84, 95% confidence interval (Cl) 1.56 to 2.18). After multiple logistic regression analysis, the adjusted OR was 0.923, 95% Cl (0.87 to 1.19), p = 0.09."

a)  Interpret these results.    (30% marks)
 

College answer

a)
There was a significantly higher mortality in patients who received colloids compared to those who received crystalloids. However, when other factors likely to influence mortality were taken into account by multiple logistic regression analysis, the difference was no longer statistically significant. The interpretation is that fluid choice is not significantly associated with 90-day mortality. (3 marks)
 

Discussion

The data here comes from Ertmer et al, 2011.

The crude odds ratio here appears statistically significant, as the CI is well away from 1.0, and the effect size is substantial. No p-value is reported for the crude comparison, which is unhelpful. The adjusted OR is very different, and in fact points in the opposite direction to the crude OR, which raises major concerns. The "multiple logistic regression analysis" would have to be more carefully scrutinised to determine which variables they threw into the soup. Usually, the investigators just choose whichever variables had a p-value below 0.05 in the first univariate analysis. The more intelligent method would be to test the independent variables in pairs and in groups to understand the meaning behind their interaction, and then pick only the meaningful variables for the multivariate analysis. In short, according to the presented fragments of data, there was no difference in mortality.
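As a toy demonstration of what such an adjustment does, here is a hedged sketch (the data are simulated, the variable names are invented, and statsmodels is assumed to be available): severity confounds the crude association, and the adjusted OR collapses back towards 1, which is precisely the pattern in the quoted study.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
severity = rng.normal(size=n)                              # confounder: illness severity
colloid = rng.binomial(1, 1 / (1 + np.exp(-severity)))     # sicker patients get colloids
death = rng.binomial(1, 1 / (1 + np.exp(1.5 - severity)))  # severity alone drives death

# Crude OR: logistic regression of death on colloid use only
crude = sm.Logit(death, sm.add_constant(colloid)).fit(disp=0)
print(np.exp(crude.params[1]))     # > 1: confounded by severity

# Adjusted OR: add the confounder to the model
X = sm.add_constant(np.column_stack([colloid, severity]))
adjusted = sm.Logit(death, X).fit(disp=0)
print(np.exp(adjusted.params[1]))  # ~1: no independent effect of fluid choice
```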
