Qualitative data: Chi-square test and Fisher's Exact Test

The sadistic Question 4 from the second paper of 2003 invited the candidates to "compare and contrast the use of the Chi-squared test, Fisher’s Exact Test and logistic regression when analysing data". This was a terrible idea, and the pass rate was 17%. Such questions have not been repeated since.

Additional reading can be done, if one wishes to actually understand these concepts.

I recommend the following free online resources:

Additionally, I invite everybody to visit this page, where the author Steve Simon (presumably, somebody qualified in statistics) responds to an email he received which asked him to comment on the differences between a Chi-square test, Fisher's Exact test, and logistic regression.

Qualitative data types

  • Categorical measurements based on descriptions, rather than numerical values.
  • Qualitative data comes in two flavours:
    • Ordinal data: numerical data assigned to subjective observations, which are ordered (eg. GCS scores)
    • Nominal data: Variables described in terms of quality, eg. colour of hair.
  • These are tested using the Chi-square and Fisher's Exact Test

 

Chi-square test

A statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. The chi-square test can be used to test for the "goodness of fit" between observed and expected data.

  • The chi-square statistic is the sum, over all categories, of the squared difference between the observed (o) and the expected (e) values, divided by the expected value: $\chi^2 = \sum \frac{(o - e)^2}{e}$
  • May be inappropriate if the sample numbers are small.
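  • Becomes unreliable if the expected value in any category is less than 5: the statistic can still be calculated, but the chi-square approximation it relies on no longer holds.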
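
To make the arithmetic above concrete, here is a minimal sketch in Python (standard library only) for a 2x2 table. The function name `chi_square_2x2` and the counts are hypothetical, invented purely for illustration. The expected values are derived from the row and column totals, and no continuity correction is applied. The p-value shortcut using `math.erfc` is only valid for one degree of freedom, which is what a 2x2 table has.

```python
import math

def chi_square_2x2(table):
    """Return (statistic, expected counts, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell if rows and columns were unrelated:
    # (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (o - e)^2 / e over every cell
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, expected, p_value

# Hypothetical counts: rows = treatment / control, columns = died / survived
observed = [[10, 20], [20, 10]]
stat, expected, p_value = chi_square_2x2(observed)
print(round(stat, 3), expected, round(p_value, 4))
```

Every expected value here is 15, comfortably above the threshold of 5, so this is the sort of table where the chi-square test is a reasonable choice.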

Fisher's Exact Test

Another test like the Chi-square test, to compare observed data with expected data.

  • Used for small data sets (where the Chi-square approximation is unreliable)
  • Classically applied to a 2x2 contingency table (extensions to larger tables exist, but are computationally demanding)
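
As a rough sketch of what "exact" means here, the following Python (standard library only) enumerates every 2x2 table that shares the observed row and column totals and adds up the probabilities of those at least as unlikely as the one observed. The function name `fisher_exact_2x2` and the counts are hypothetical. The two-sided rule used (sum all tables no more probable than the observed one) is one common convention, not the only one.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)
    # Add up every table that is no more probable than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small sample: rows = treatment / control, columns = died / survived
small_table = [[3, 1], [1, 3]]
p_exact = fisher_exact_2x2(small_table)
print(round(p_exact, 4))
```

With only eight patients, every expected value is 2, so the chi-square approximation would not be trusted; the enumeration above needs no such approximation, but the number of tables to enumerate grows with the sample size.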

Logistic regression

  • Method of predicting a binary variable (eg. dead or alive) on the basis of numerous predictive factors, to compare observed and predicted data.
  • ICU mortality is predicted using logistic regression analysis
  • Regression coefficients allow the contribution of different predictor variables to be analysed.
  • Goodness of fit can be estimated using a variety of mathematical methods.
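
The points above can be illustrated with a deliberately simple sketch: a single continuous predictor, a binary outcome, and coefficients fitted by plain gradient ascent on the log-likelihood. Everything here is an assumption made for illustration (the function names, the learning rate, the number of steps and the data are all hypothetical); real statistical software uses more robust fitting methods and reports standard errors as well.

```python
import math

def sigmoid(z):
    """Convert a linear predictor into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)   # observed minus predicted
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Hypothetical data: a severity score (predictor) and death = 1 / survival = 0
scores = [1, 2, 3, 4, 5, 6, 7, 8]
died = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(scores, died)
odds_ratio = math.exp(b1)               # change in odds of death per one-point rise in score
risk_at_2 = sigmoid(b0 + b1 * 2)        # predicted probability of death at a score of 2
risk_at_7 = sigmoid(b0 + b1 * 7)        # predicted probability of death at a score of 7
print(round(odds_ratio, 2), round(risk_at_2, 2), round(risk_at_7, 2))
```

The exponentiated regression coefficient is the odds ratio for that predictor, which is how the contribution of each predictor variable is usually reported.
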

Chi Square, Fisher's Exact Test and Logistic Regression: A Comparison of Methods

Application

  • "give a representation of the likelihood that a given spread of data occurs by chance"

Specific uses

  • Chi Square: nominal data, large samples
  • Fisher's Exact Test: nominal data, small samples
  • Logistic regression: binary variables

Advantages

  • Chi Square:
    • Able to analyse multiple tables and rows of data
  • Fisher's Exact Test:
    • Better suited to small data sets (sample size less than 20)
    • "Exact": does not rely on approximation
  • Logistic regression:
    • Useful to predict an outcome variable which is binary and categorical from predictor variables that are continuous
    • Used because having a categorical outcome variable violates the assumption of linearity in normal regression.

Limitations

  • Chi Square:
    • Inefficient in handling ordinal data.
    • Unreliable if the expected value in any category is less than 5.
  • Fisher's Exact Test:
    • Only suited to small data sets (sample size less than 20)
    • Computationally intense: calculations needed for this test increase rapidly as the sample size increases
    • Cannot adjust for possible confounding variables
  • Logistic regression:
    • Assumes an independence of errors
    • Assumes no outliers

References

The ideal reference for this is the BMJ, with their combination of rich statistics info and Old-World credibility. I link to the relevant sections of their Statistics at Square One, by T D V Swinscow.