Study power, population and sample size

Population size

How many patients does my trial need?

That depends on several factors.

The magnitude of the treatment effect: The larger the effect, the smaller the required sample size. For a truly tiny treatment effect, one would require truly massive numbers.

The control group outcome rate: The proportion of the control group expected to develop the outcome of interest (the rarer the outcome, the larger the required sample).

The agreed-upon significance level (alpha): The probability you are willing to accept that an apparently "real" result is actually due to chance, i.e. the risk of a Type 1 error ("false positive"). The more stringent your demands for significance, the larger the number of patients which needs to be enrolled.

The power (1 - beta): The percentage chance of detecting a treatment effect if there actually is one. This is something you decide upon before commencing the trial; the higher the power value, the more patients you will need. Typically, (1 - beta) is 0.8, so there is a 20% chance (beta) of committing a Type 2 error, or a "false negative".

Obviously, if your trial has too few patients, you are more likely to commit a Type 2 error. The negative results of this trial will force you to discard a treatment which does actually have a beneficial effect, an effect which you and your tiny useless trial have failed to reveal.
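To get a feel for how strongly the effect size drives these numbers, here is a minimal Python sketch using the statsmodels power module. The two-sample t-test design and the specific effect sizes are illustrative assumptions, not taken from any particular trial:

```python
# A sketch of how effect size drives required sample size, assuming
# a two-sample t-test design; the effect sizes are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # Cohen's d: "small", "medium", "large"
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: ~{n:.0f} patients per group")
```

With alpha = 0.05 and power = 0.8, a "small" effect (d = 0.2) demands roughly 400 patients per group, whereas a "large" effect (d = 0.8) needs only about 26.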

The concept of statistical efficiency demands that the randomised controlled trial achieve its goal (detecting the treatment effect) with the smallest possible number of patients. However, there is probably a minimum below which one should not go.

In randomised controlled trials, an additional benefit of randomisation develops above a certain sample size (around N = 200): it ensures an approximately equal distribution of unknown confounding factors (such as weird genetic variations and other such unpredictable things) between the groups. In trials smaller than N = 200, this effect of randomisation can no longer be relied upon; one simply cannot guarantee that one group is sufficiently similar to the other group in its incidence of unpredictable features.
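This can be illustrated with a small simulation (a sketch only; the 10% confounder prevalence and the trial sizes are arbitrary assumptions):

```python
# Sketch: simulating how well randomisation balances a hidden binary
# confounder between two arms at various trial sizes. The 10%
# prevalence and the chosen trial sizes are illustrative assumptions.
import random

random.seed(1)
PREVALENCE = 0.10  # assumed prevalence of the unknown confounder

for n in (20, 50, 200, 1000):
    gaps = []
    for _ in range(5000):  # repeat the hypothetical trial many times
        carriers = [random.random() < PREVALENCE for _ in range(n)]
        random.shuffle(carriers)
        arm_a, arm_b = carriers[: n // 2], carriers[n // 2:]
        # absolute difference in confounder prevalence between arms
        gaps.append(abs(sum(arm_a) / len(arm_a) - sum(arm_b) / len(arm_b)))
    print(f"N = {n:4d}: mean between-arm imbalance = {sum(gaps) / len(gaps):.3f}")
```

The mean between-arm imbalance shrinks steadily as N grows; below a couple of hundred patients, sizeable chance imbalances in the hidden confounder remain common.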

Study power and sample size

Question 23 from the second paper of 2008 and Question 25 from the first paper of 2006 asked the candidates to define "study power".

The commonest reason for a negative result is a small sample size. You need a sufficiently large sample to detect a given size of effect. The larger the sample, the more likely you are to detect a true treatment effect.

The point of calculating power is that you can use it to calculate a sufficient sample size, so that you run the risk neither of performing a pointless negative study (thus exposing patients to risk), nor of performing a pointlessly expensive study (collecting data from an unnecessarily large group of patients).
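For example, a minimal a priori sample size calculation for a trial with a binary outcome might look like the following sketch (the 20% control event rate and the reduction to 15% are illustrative assumptions, and statsmodels is used for convenience):

```python
# Sketch: an a priori sample size calculation for a binary outcome,
# assuming (illustratively) a 20% control event rate and a clinically
# meaningful reduction to 15% in the treatment group.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.20, 0.15)  # Cohen's h for the two rates
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(f"~{n:.0f} patients per arm needed")  # roughly 450 per arm
```

This suggests roughly 450 patients per arm; detecting a smaller absolute risk reduction would inflate this number considerably.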

Power

  • The power of a statistical test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
  • The chance of a study detecting a treatment effect which is genuinely present
  • Power = (1 - false negative rate)
  • Power = (1 - beta error)
  • Normally, power is 80% (i.e. a 20% chance of a false negative result); the simulation sketch below demonstrates this empirically.
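To make these definitions concrete, power can be estimated by brute-force simulation: run many hypothetical trials in which a real effect exists, and count how often the test correctly rejects the null hypothesis. A minimal Python sketch, in which the effect size (d = 0.5), group size (n = 64) and alpha (0.05) are illustrative assumptions:

```python
# Sketch: estimating power empirically. We simulate many trials in
# which a true effect exists (assumed d = 0.5, n = 64 per group) and
# count how often a t-test correctly rejects the null at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, effect, alpha, n_trials = 64, 0.5, 0.05, 2000
rejections = 0
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)  # the effect genuinely exists
    rejections += stats.ttest_ind(control, treated).pvalue < alpha
print(f"Empirical power = {rejections / n_trials:.2f}")  # about 0.80
```

With these numbers the empirical power comes out at approximately 0.80, i.e. about 20% of these simulated trials are false negatives.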

Sample size

Calculation of sample size involves the following factors:

  • Alpha value: the level of significance (normally 0.05)
  • Beta value: the Type 2 error rate (normally 0.2, i.e. a power of 0.8)
  • The statistical test you plan to use
  • The variance of the population (the greater the variance, the larger the required sample)
  • The effect size (the smaller the effect size, the larger the required sample; see the worked sketch below)
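As a sketch of how these factors combine, here is the textbook sample size formula for comparing the means of two groups, n = 2 × sigma² × (z(1 - alpha/2) + z(1 - beta))² / delta², implemented in Python. The standard deviation and effect size used are illustrative assumptions:

```python
# Sketch: the standard formula for comparing two means, showing where
# each of the factors above enters. Sigma and delta are illustrative.
from scipy.stats import norm

alpha = 0.05   # significance level (two-sided)
beta = 0.20    # Type 2 error rate, i.e. power = 0.80
sigma = 10.0   # assumed population standard deviation
delta = 5.0    # smallest treatment effect worth detecting

z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96 for a two-sided test
z_beta = norm.ppf(1 - beta)        # ~0.84
n_per_group = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
print(f"~{n_per_group:.0f} patients per group")  # ~63 here
```

Because the required sample scales with sigma²/delta², halving the effect size quadruples the required sample, and doubling the standard deviation does the same.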
