Compare and contrast the use of the Chi-squared test, Fisher’s Exact Test and logistic regression when analysing data.
College Answer
All these tests are widely used in the statistical reporting of data and give a representation of the likelihood that a given spread of data occurs by chance.
The Chi-square(d) statistic is used when comparing categorical data (e.g. counts). Often, these data are simply displayed in a “contingency table” with R rows and C columns. Its use is less appropriate where total numbers are small (e.g. N <20) or smallest expected value is less than 5.
Fisher’s Exact test is used when comparing categorical data (e.g. counts), but is only generally applicable in a 2 x 2 contingency table (2 columns and 2 rows). It is specifically indicated when total numbers are small (e.g. N <20) or smallest expected value is less than 5.
Logistic regression is used when comparing a binary outcome (e.g. yes/no, lived/died) with other potential variables. Logistic regression is most commonly used to perform multivariable analysis (“controlling for” various factors), and these variables can be either categorical (e.g. gender), or continuous (e.g. weight), or any combination of these. The standard ICU mortality predictions are based on logistic regression analysis.
Discussion
When one is invited to "compare and contrast" things, one is well served by a table structure.
First, the prose form: much of what follows is heavily borrowed from LITFL.
Additional reading can be done, if one wishes to actually understand these concepts.
I recommend the following free online resources:
- The Chi Square Statistic from The Mathbeans Project
- Fisher's Exact Test from Wolfram Mathworld
- What is (Multivariate) Logistic Regression from LogisticRegressionAnalysis.com (which is an excellent name).
Additionally, I invite everybody to visit this page, where the author Steve Simon (presumably, somebody qualified in statistics) responds to an email he received which asked him to comment on the differences between a Chi-square test, Fisher's Exact test, and logistic regression.
Chi-square test
A statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. The chi-square test can be used to test for the "goodness of fit" between observed and expected data.
- Chi-square is the sum, over all categories, of the squared difference between the observed (o) and the expected (e) data, divided by the expected data: $\chi^2 = \sum (o - e)^2 / e$
- May be inappropriate if the sample numbers are small.
- Can still be calculated, but becomes unreliable, if the expected value in any category is less than 5.
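To make the arithmetic concrete, here is a minimal sketch of the chi-square calculation for a 2x2 table, using only the Python standard library. The function name `chi_square_2x2` and the counts are hypothetical, invented purely for illustration. The sketch assumes no continuity correction, and the p-value shortcut it uses holds only for one degree of freedom (i.e. a 2x2 table).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell if rows and columns were independent
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over all four cells
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected

# Hypothetical counts: rows = treatment / control, columns = died / survived
table = [[10, 20], [20, 10]]
statistic, p_value, expected = chi_square_2x2(table)

# The usual rule of thumb: be wary if any expected count is below 5
smallest_expected = min(min(row) for row in expected)
approximation_ok = smallest_expected >= 5
```

In this made-up table every expected count is 15, so the rule of thumb about expected values below 5 is satisfied and the chi-square approximation would usually be considered acceptable.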
Fisher's Exact Test
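Like the Chi-square test, this assesses the association between categorical variables, but it calculates the exact probability of the observed table rather than relying on a large-sample approximation.
- Used for small data sets (where the Chi-square approximation is unreliable)
- Generally applied to a 2x2 contingency table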
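As an illustration of what "exact" means here, the following sketch computes a two-sided Fisher's exact p-value for a 2x2 table directly from the hypergeometric distribution, using only the Python standard library. The function name `fisher_exact_2x2` and the counts are hypothetical. The two-sided rule used (summing every table that is no more probable than the observed one) is one common convention, assumed here rather than taken from the source.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every possible table that is as likely as, or less likely than,
    # the observed one (small tolerance guards against rounding error)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical small sample: rows = treatment / control, columns = died / survived
table = [[3, 1], [1, 3]]
p_value = fisher_exact_2x2(table)
```

With only 8 subjects every expected count is 2, which is exactly the situation where the exact calculation is preferred over the chi-square approximation.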
Logistic regression
- Method of predicting a binary variable (e.g. dead or alive) on the basis of numerous predictive factors, to compare observed and predicted data.
- ICU mortality is predicted using logistic regression analysis
- Regression coefficients allow the contribution of different predictor variables to be analysed.
- Goodness of fit can be estimated using a variety of mathematical methods.
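The points above can be sketched with a deliberately tiny model: one binary predictor and a binary outcome, fitted by Newton-Raphson maximum likelihood in plain Python. The function name `fit_logistic` and the data are hypothetical. Real analyses (such as ICU mortality models) use many predictors and a statistics package; a single predictor is assumed here only so that the result can be checked by hand.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit log-odds(y = 1) = b0 + b1 * x by Newton-Raphson maximum likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the (negated) Hessian
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system for the coefficient update
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 for treated, 0 for control; y = 1 for died, 0 for survived
# Treated: 10 died, 20 survived. Control: 20 died, 10 survived.
xs = [1] * 30 + [0] * 30
ys = [1] * 10 + [0] * 20 + [1] * 20 + [0] * 10

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # odds of death in treated relative to control
```

With a single binary predictor the exponentiated coefficient is simply the odds ratio of the underlying 2x2 table (here (10/20) / (20/10) = 0.25). Adding further predictors is what allows each coefficient to be interpreted as a contribution "controlling for" the others.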
Now that the prose is finished, let us tabulate the differences and similarities between these tests.
| | Chi Square | Fisher's Exact Test | Logistic regression |
| Application | "give a representation of the likelihood that a given spread of data occurs by chance" | | |
| Specific uses | Nominal data: large samples | Nominal data: small samples | Binary variables |
| Advantages | | | |
| Limitations | | | |
References
The ideal reference for this is the BMJ, with their combination of rich statistics info and Old-World credibility. I link to the relevant sections of their Statistics at Square One, by T D V Swinscow.