The meaninglessness of published evidence

On the poor integrity of published evidence

LITFL has an excellent resource, which they titled Dogma and Pseudoaxioms. It goes hand in hand with their explanation of how publication practices distort science.

I will not insult this article by reducing it to its key points. It must be digested by each individual, alone and ideally in a quiet darkened room. In brief, its message can be summarised by saying that everything we know is wrong, and all published data is meaningless.

On the constantly changing opinion of the medical collective

Again, LITFL has an excellent summary of medical reversal. In brief, it is the phenomenon whereby one trial's findings are overturned by the findings of the next, better designed trial. Apparently, it takes about 10 years between the initial study and its superior successor.

Why does this happen? What was wrong with the original trial?

  • It had positive findings, and thus was subject to publication bias
  • Its methodology was poor (eg. sample too small or inadequately randomised)
  • It used surrogate outcomes, rather than patient-centred ones
  • There was a conflict of interest
  • The investigators were dishonest

But, with thanks to Benjamin Gladwin for his patience in crafting an excellent explanation, there is also a much more important reason for why medical reversal necessarily occurs on repeat publication:

  • Assume the null and alternative hypotheses are equally likely at the beginning of the experiment, ie. true equipoise (a pre-test probability of 0.5).
  • The post-test probability of the null hypothesis being false if we get a p-value of 0.05 is not 95% but in fact only 87%. 
  • This means that even if we simply repeat the exact same experiment, there is a 13% (almost 1 in 8) chance that we will not get a significant result.
  • Alternatively, one can say that if there really was equipoise before the initial experiment, then by accepting p < 0.05 we accept a roughly 1 in 8 (~13%) chance of a Type 1 error, and are still happy to consider the outcome a "positive" study. Not many are comfortable accepting that level of risk in other areas of practice.
  • If you do want a post-test probability which gives you a "high" likelihood that repeating the experiment results in the same significant outcome (and therefore does not cause medical reversal), you need one of two conditions:
    • Either you need to have a very high pre-test probability that the null hypothesis is false (>76%).  In such a situation, there was no equipoise by any reasonable measure before the experiment and one wonders about the ethics of the whole thing.
    • Or you need to be shooting for a very low p-value (<0.01). In this setting, repeat experiments may still yield p-values above 0.01, but these will be much more likely to remain below 0.05.
  • Important points come from this for critical care research:
    • Few (if any) published studies in medicine have a p-value under 0.01; we typically accept p < 0.05. It should therefore not be much of a surprise that, once the study is repeated in a more rigorous or larger trial, the result causes medical reversal.
    • If we are happy to change our practice on the basis of a study with a p-value of 0.05, then we should have a pre-test probability of rejecting the null of 76% or greater. You almost need to be sure the null is wrong before you read the outcome of the paper.
    • Many of our studies are designed to test results which were positive (ie p < 0.05) in a secondary outcome or subgroup analysis. This is how research questions should be generated, but because any secondary outcome is only hypothesis-generating, it is even less likely to hold true when studied rigorously. Altering our practice on the basis of the initial study (eg. withholding albumin in TBI on the basis of the SAFE subgroup analysis) should be expected to result in reversal if that secondary outcome is later studied as a primary outcome.
  • In short, when we change our practice on the basis of a positive trial, forming an opinion of its quality on the basis of a p ~ 0.05, we do so on poor quality evidence, and it should not be surprising that we change the practice back once better evidence is obtained.
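The 87% figure above can be reproduced using Goodman's minimum Bayes factor, exp(−z²/2), applied at true equipoise. A minimal sketch in Python (the function name is illustrative, and this is one of several possible calibrations of the p-value, not the only one):

```python
import math
from statistics import NormalDist

def post_test_prob_null_false(p_value, pre_test_prob=0.5):
    """Posterior probability that the null hypothesis is false, given a
    two-sided p-value, using Goodman's minimum Bayes factor exp(-z^2/2).
    pre_test_prob is the prior probability that the null is false
    (0.5 = true equipoise)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)   # z-score for the two-sided p
    min_bf = math.exp(-z * z / 2)               # Bayes factor favouring the null
    prior_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = prior_odds / min_bf             # posterior odds the null is false
    return post_odds / (1 + post_odds)

# At equipoise, p = 0.05 yields a post-test probability of ~87%,
# leaving a ~13% (almost 1 in 8) chance that the "positive" finding is false.
print(round(post_test_prob_null_false(0.05), 2))   # → 0.87
print(round(post_test_prob_null_false(0.01), 2))
```

In other words, the conventional reading of p = 0.05 as "95% certainty" overstates the case; under equipoise the calibrated figure is about 87%, which is where the near 1-in-8 replication risk comes from.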

The Kehoe Principle and the Precautionary Principle

In Question 21 from the second paper of 2005, the candidates are invited to explore the statement, “The absence of evidence of effect does not imply evidence of absence of effect”. This is a rebuttal to the Argument from Ignorance, which (put simply) states that if something has not been proven true, then it must be false. The rebuttal addresses the third possibility, that the currently available evidence has failed to detect a phenomenon. In the interpretation of medical literature, this means that a study that has failed to demonstrate the evidence of a risk has not succeeded in demonstrating the absence of risk. Similarly, a study which has failed to demonstrate a significant difference between two treatments has not demonstrated the absence of difference, only the absence of evidence of a difference.


The idea that the absence of evidence for a phenomenon should imply that there is no such phenomenon is known as the Kehoe principle, named after Robert Kehoe, who argued that the use of leaded petrol was safe because at that stage there was no evidence to the contrary. The opposite view is known as the Precautionary Principle. It holds that in the absence of evidence, one must take a conservative stance and manage uncertain risks in a manner which most effectively serves human safety.


In the absence of evidence, the precautionary principle recommends that the clinician takes reasonable measures to avoid threats that are serious and plausible. In this, it may be a more humanistic principle than the alternatives (such as the Expected Utility Theory).

In brief:

  • Safest and most humanistic approach
  • Risk-averse
  • The burden of proof of safety is on the investigator
  • The burden of risk and benefit analysis is on the clinician


In its strongest formulation, the Precautionary Principle calls for absolute proof of safety before new treatments or techniques are adopted. Such stringent standards may result in an excessive regulation of potentially useful treatment strategies. One may envision a reductio ad absurdum where table salt is outlawed because there is insufficient evidence for its safety. Some authors have suggested that the precautionary principle "replaces the balancing of risks and benefits with what might best be described as pure pessimism". Furthermore, not all experimental questions can be answered with high-level evidence (eg. in the case of rare diseases with insufficient sample size for RCTs, or in the cases where it is unethical to randomise intervention).

Published data may not offer sufficient evidence. The power of a study determines its ability to discern an effect of a given size, and small studies may be inadequately powered to detect a small treatment effect. This is how Type 2 errors are committed.
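The relationship between sample size and power can be sketched with a normal approximation for a two-arm trial comparing means (the function name is illustrative, and the approximation ignores the small contribution of the opposite tail):

```python
from statistics import NormalDist

def approx_power(effect_size_d, n_per_arm, alpha=0.05):
    """Approximate power of a two-arm trial to detect a standardised
    mean difference d, using the normal approximation and a two-sided
    significance threshold alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # critical z for alpha
    z_effect = effect_size_d * (n_per_arm / 2) ** 0.5  # expected z of the true effect
    return 1 - NormalDist().cdf(z_crit - z_effect)

# A small true effect (d = 0.2) with 50 patients per arm: power ~17%,
# ie. an >80% chance of a Type 2 error despite a real treatment effect.
print(round(approx_power(0.2, 50), 2))    # → 0.17
# Roughly 394 per arm are needed for the conventional 80% power.
print(round(approx_power(0.2, 394), 2))
```

A "negative" 100-patient trial of a treatment with a genuinely small effect is therefore the expected outcome, not evidence of no effect.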

In brief:

  • Potentially useful treatments may be discarded for lack of evidence
  • Not all treatments can be the subject of RCTs, particularly
    • where sample size is by necessity small
    • where randomisation is unethical
    • where blinding is impossible
  • Not all studies of effective treatments are appropriately powered to detect an effect of appropriate size
  • Not all meta-analysis reviews are able to find all the available evidence due to publication bias

In summary:

There is a danger of misinterpreting "negative studies", because studies which have not found statistically significant differences in effect may have been inadequate to detect such an effect. In careful interpretation of medical literature one must be alert to the idea that not all negative studies are truly "negative". Decision-making in uncertainty should be guided by humanistic principles and careful risk-vs-benefit analysis.