With respect to a clinical trial, define Type 1 and Type 2 Error, the Point Estimate and Confidence Interval (40% marks). Discuss the relevance of Power to study design and interpretation of the results (60% marks)

Most candidates correctly distinguished between type 1 and type 2 errors. Type I error is known as α, and occurs when you incorrectly reject the null hypothesis, that is, say a difference exists when it actually doesn’t. Type II error is termed β, and occurs when you incorrectly accept the null hypothesis, that is, say no difference exists when it actually does. The point estimate is a single-value estimate of a population parameter. It represents a descriptive statistic for a summary measure, or a measure of central tendency from a given population sample. Confidence intervals define a range of values that are likely to include a population parameter. They are derived from the standard error for a given population. The percentage given, e.g. 95%, reflects the probability that the true value will be contained within that interval.

Few candidates gave a definition for power or a value. Power is the likelihood of detecting a specified difference if it exists. It is important as it is a key determinant of the sample size required for a study, and this is a vital aspect of experimental design and evaluation. A sample size can be too small, so that the study can’t adequately rule in or rule out an effect (i.e. the study will lack the precision to provide reliable answers). If too large, a study will enrol unnecessary subjects, which wastes time and uses excess resources. It is unethical to conduct studies in both these cases.

## Discussion

This college answer possesses many of the characteristics of an "ideal" college comment, i.e. it is informative with regard to what was expected, and resembles a "model answer" in its content.

To answer the question in some detail:

Type 1 error

• This is a "false positive".
• The null hypothesis is incorrectly rejected (i.e. there really is no treatment effect, but the study finds one)
• The alpha value determines the risk of this happening. Alpha is the significance threshold against which the p-value is compared; with an alpha of 0.05, there is a 5% chance of making a Type 1 error when the null hypothesis is true.

Type 2 error

• This is a "false negative"
• The null hypothesis is incorrectly accepted (i.e. there really is a treatment effect, but you fail to find it)
• Beta is the probability of this happening. The power of the study is (1 - beta); at the conventional power of 0.8, beta is 0.2, so there is a 20% chance of making a Type 2 error when a treatment effect of the specified size really exists.
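
The two error rates can be illustrated by simulation. The following is a minimal sketch rather than a prescribed method: it assumes a hypothetical two-arm trial with a normally distributed outcome, a known standard deviation of 1 in both arms, 50 patients per arm and a two-sided z-test. All names and numbers are illustrative assumptions.

```python
import random
from statistics import NormalDist

def rejects_null(n, true_diff, alpha, rng):
    """Simulate one two-arm trial; return True if a two-sided z-test rejects H0."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    diff = sum(treated) / n - sum(control) / n
    # Standard error of the difference in means, assuming a known SD of 1 per arm
    z = diff / (2.0 / n) ** 0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(n, true_diff, alpha=0.05, trials=2000, seed=1):
    """Proportion of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    return sum(rejects_null(n, true_diff, alpha, rng) for _ in range(trials)) / trials

# Null hypothesis true (no treatment effect): every rejection is a false positive
type1_rate = rejection_rate(n=50, true_diff=0.0)

# Null hypothesis false (true difference of 0.5 SD): every non-rejection is a false negative
type2_rate = 1.0 - rejection_rate(n=50, true_diff=0.5)

print(f"Type 1 error rate: {type1_rate:.3f}")
print(f"Type 2 error rate: {type2_rate:.3f}")
```

Under these assumptions the Type 1 error rate should sit close to the chosen alpha, whereas the Type 2 error rate depends on the sample size and on the size of the true difference.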

Point estimate

• Point estimate is a single value given as an estimate of a parameter of a population.  For example, in a population there exists a "true" average height; the point estimate of this average height is the average height of a sample group taken from that population.
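
The height example can be sketched in code. This is an illustration under stated assumptions, not a definitive method: the "true" mean height of 170 cm and standard deviation of 10 cm are hypothetical, and the interval uses a normal approximation. The sketch also computes the 95% confidence interval discussed in the next section, and counts how often such intervals contain the true mean when sampling is repeated.

```python
import random
from statistics import NormalDist, mean, stdev

TRUE_MEAN_HEIGHT = 170.0  # hypothetical "true" population mean (cm); unknown in practice
POPULATION_SD = 10.0      # hypothetical population standard deviation (cm)

def estimate_mean(sample, level=0.95):
    """Return (point estimate, lower limit, upper limit) for the population mean."""
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return (point_estimate,
            point_estimate - z * standard_error,
            point_estimate + z * standard_error)

rng = random.Random(7)

# One sample of 100 people gives one point estimate and one confidence interval
one_sample = [rng.gauss(TRUE_MEAN_HEIGHT, POPULATION_SD) for _ in range(100)]
point, low, high = estimate_mean(one_sample)
print(f"Point estimate {point:.1f} cm, 95% CI {low:.1f} to {high:.1f} cm")

# Repeated sampling: count how often the interval includes the true mean
repeats = 1000
covered = 0
for _ in range(repeats):
    sample = [rng.gauss(TRUE_MEAN_HEIGHT, POPULATION_SD) for _ in range(100)]
    _, lower, upper = estimate_mean(sample)
    covered += lower <= TRUE_MEAN_HEIGHT <= upper
coverage = covered / repeats
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The point estimate differs slightly from the true mean in any single sample; the confidence interval expresses how much uncertainty surrounds it.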

Confidence interval

• In their answer to Fellowship Exam Question 23 from the second paper of 2011, the examiners defined CI as follows:
"The confidence intervals indicate the level of certainty that the true value for the parameter of interest lies between the reported limits. For example, the 95% confidence intervals for a value indicate a range where, with repeated sampling and analysis, these intervals would include the true value 95% of the time."

Power

• The power of a statistical test is the probability that it correctly rejects the null hypothesis, when the null hypothesis is false.
• In other words, it is the chance of a study detecting a treatment effect which truly exists.
• Power = (1 - false negative rate)
• Power = (1 - beta error)
• By convention, power is usually set at 80% (i.e. a 20% chance of a false negative result)
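
The relationship between power, beta and alpha can be shown with a short calculation. The sketch below assumes a two-sided two-sample z-test comparing means, with the treatment effect expressed as a standardised difference (difference divided by standard deviation). The function name and the example numbers are illustrative assumptions.

```python
from statistics import NormalDist

def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardised difference between the groups
    (difference in means divided by the common standard deviation).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Expected value of the z statistic when the true effect is effect_size
    shift = effect_size * (n_per_group / 2.0) ** 0.5
    # Probability that the z statistic falls beyond either critical value
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power = power_two_sample_z(effect_size=0.5, n_per_group=63)
beta = 1.0 - power
print(f"Power: {power:.2f}, beta: {beta:.2f}")
```

With no true effect (an effect size of zero), the same function returns alpha, because the only way to reject the null hypothesis is then a false positive.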

The influence of power on study design

• Power is a major determinant of sample size.
• If your trial has too few patients, you are more likely to commit a Type 2 error.
• If your trial has too many patients, it is inefficient and wasteful. The concept of statistical efficiency demands that the randomised controlled trial achieve its goal (discriminating the treatment effect) with the smallest possible number of patients.
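
How power drives sample size can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2(z for alpha + z for power)² / (effect size)². This is a simplified illustration which assumes equal group sizes and a standardised effect size; real trials usually rely on dedicated software and on assumptions specific to the outcome being measured.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate patients needed per group for a two-sided comparison of two means.

    effect_size is the standardised difference the trial is designed to detect.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1.0 - alpha / 2.0)
    z_power = norm.inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to a whole patient

# Hypothetical design choices, for illustration only
n_standard = sample_size_per_group(effect_size=0.5, power=0.80)
n_higher_power = sample_size_per_group(effect_size=0.5, power=0.90)
n_smaller_effect = sample_size_per_group(effect_size=0.2, power=0.80)

print(n_standard, n_higher_power, n_smaller_effect)
```

Demanding more power, or aiming to detect a smaller difference, increases the number of patients required.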

The influence of power on study interpretation

• If a study is said to be "underpowered", its sample was too small to detect the difference which it set out to detect. A study may become underpowered if the expected difference in outcomes is larger than the actual difference; for example, a treatment study which expected a high mortality from sepsis would become underpowered if the mortality from sepsis turned out to be unexpectedly low. In such a situation, a negative result loses its validity: the study had a high chance of committing a Type 2 error, so failing to find a difference does not demonstrate that no difference exists.
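
The sepsis example can be made concrete with a rough calculation. All figures below are hypothetical: a trial is planned expecting 40% control mortality reduced to 30% by treatment, with 354 patients per group, but control mortality turns out to be 20%, with the same relative reduction (to 15%). The sketch uses a normal approximation for the difference between two proportions.

```python
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Standard error of the difference in proportions (unpooled approximation)
    se = (p_control * (1.0 - p_control) / n_per_group
          + p_treated * (1.0 - p_treated) / n_per_group) ** 0.5
    shift = abs(p_control - p_treated) / se
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

N_PER_GROUP = 354  # hypothetical sample size chosen for the planned mortality figures

# Planned scenario: 40% versus 30% mortality
planned_power = power_two_proportions(0.40, 0.30, N_PER_GROUP)

# Actual scenario: mortality unexpectedly low, 20% versus 15%
actual_power = power_two_proportions(0.20, 0.15, N_PER_GROUP)

print(f"Planned power: {planned_power:.2f}")
print(f"Actual power:  {actual_power:.2f}")
```

With the same number of patients, the lower event rate means a smaller absolute difference to detect, and the power falls well below the planned level.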
