Fisher's exact test



Fisher's exact test is a non-parametric hypothesis test, named after its creator Ronald Fisher, used in situations with two dichotomous nominal variables and small samples.

This non-parametric test is used to verify whether the dichotomous data from two samples, summarized in a 2×2 contingency table, are compatible with the null hypothesis (\(H_0\)) that the populations the two samples come from have the same dichotomous subdivision, and that the differences observed in the sample data are simply due to chance.

If the samples are sufficiently large (and no cell has an expected count below 5), the chi-square test with 1 degree of freedom can be used. However, the chi-square test is exact only asymptotically, i.e. for very large sample sizes, while the test proposed by Fisher is, as the name says, always exact, so it is suitable for small sample sizes.
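To see the difference concretely, here is a small sketch (the counts are made up for illustration) comparing the two tests on a 2×2 table with cells well below 5, where the chi-square approximation is questionable:

```r
# Illustrative only: a made-up 2x2 table with small counts
small_tab <- matrix(c(8, 2,
                      3, 7), ncol = 2, byrow = TRUE)

# Chi-square test: asymptotic; R warns that the approximation may be poor
chisq.test(small_tab)$p.value

# Fisher's exact test: valid at any sample size
fisher.test(small_tab)$p.value
```

With counts this small, the two p-values can differ noticeably; only the exact test's is trustworthy.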

Fisher's exact test requires having two nominal variables each divided into only two categories.

(Source)

Based on what was said above, let's look at an example and start by asking ourselves a question: "Do jelly beans (American candies) cause acne?"

jellybeans <- read.csv("https://giacomosaccaggi.github.io/jellybeans.csv")


n0 = sum(jellybeans$treatment=="jellybean")  # subjects in the jelly-bean group
n1 = sum(jellybeans$treatment=="placebo")    # subjects in the placebo group
y  = sum(jellybeans$acne[jellybeans$treatment=="jellybean"])  # acne cases among jelly-bean eaters
s  = sum(jellybeans$acne)                    # acne cases overall

tab = matrix(c(
  y, n0-y,
  s-y, n1-s+y
), ncol=2, byrow=TRUE)
colnames(tab) = c("acne","noacne")
rownames(tab) = c("jellybean","placebo")
addmargins(tab)
##           acne noacne   Sum
## jellybean  988   4012  5000
## placebo    961   4039  5000
## Sum       1949   8051 10000

… now let's evaluate whether the two variables have the same dichotomous subdivision, that is, whether having acne is independent of eating jelly bean candies.

|             | Acne      | No acne            | Sum               |
|-------------|-----------|--------------------|-------------------|
| Jelly beans | \(Y\)     | \(N_{1} - Y\)      | \(N_{1}\)         |
| Placebo     | \(S - Y\) | \(N_{0} - S + Y\)  | \(N_{0}\)         |
| Sum         | \(S\)     | \(N_{1} + N_{0} - S\) | \(N_{1} + N_{0}\) |

\[ P(Y = y) = \frac{N_{1}!\, N_{0}!\, S!\, (N_{1} + N_{0} - S)!}{(N_{1} + N_{0})!\; y!\,(S-y)!\,(N_{1}-y)!\,(N_{0}-S+y)!} \]
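As a check, this formula can be evaluated directly on the log scale (to avoid overflow with such large factorials) and compared with R's built-in hypergeometric density. A sketch, plugging in the marginals from the table above (both groups have 5000 subjects, 1949 acne cases in total, 988 of them among the jelly-bean eaters):

```r
# Marginals from the table above
n0 <- 5000; n1 <- 5000; s <- 1949; y <- 988

# log of  n0! n1! s! (n0+n1-s)!  /  [ (n0+n1)! y! (s-y)! (n0-y)! (n1-s+y)! ]
logp <- lfactorial(n0) + lfactorial(n1) + lfactorial(s) + lfactorial(n0 + n1 - s) -
  (lfactorial(n0 + n1) + lfactorial(y) + lfactorial(s - y) +
     lfactorial(n0 - y) + lfactorial(n1 - s + y))

exp(logp)
dhyper(y, m = n0, n = n1, k = s)  # same value
```

Rearranging the factorials shows why: the formula is exactly \(\binom{n_0}{y}\binom{n_1}{s-y}/\binom{n_0+n_1}{s}\), the hypergeometric probability that `dhyper` computes.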

plot(0:s, dhyper(0:s, m=n1, n=n0, k=s), type="h",
     xlab="y", ylab="Probability")   # distribution of y under H0
points(y, 0, col=2, pch=19)          # observed value

fisher.test(tab, alternative="greater")$p.value
## [1] 0.2557994
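The same one-sided p-value can be obtained by hand: under \(H_0\) it is the probability of observing 988 or more acne cases among the jelly-bean eaters, i.e. the upper tail of the hypergeometric distribution with the table's marginals. A sketch:

```r
# One-sided ("greater") p-value by hand: sum the upper tail of the
# hypergeometric distribution from the observed y up to s
n0 <- 5000; n1 <- 5000; s <- 1949; y <- 988
sum(dhyper(y:s, m = n0, n = n1, k = s))
## [1] 0.2557994
```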

This result is not the probability that \(H_0\) is false: it is the probability, assuming the two populations have the same dichotomous subdivision, of observing a difference at least as extreme as the one in the sample. Since it is large (≈ 0.26), there is no evidence against \(H_0\). As a sanity check, if I run the same test on acne against itself (it doesn't make sense, but it helps to understand) I get:

fisher.test(table(jellybeans$acne,jellybeans$acne), alternative="greater")$p.value
## [1] 0

In-depth study: Tetrachoric correlation

In the case of dichotomous variables (as here), the tetrachoric correlation coefficient is often used. This coefficient estimates the correlation between the two variables under the assumption that their dichotomous nature derives from the discretization of a process that is actually continuous: the phenomenon being measured would by its nature be a continuous variable, but for measurement purposes it has been reduced to only two values, i.e. dichotomized. (Source1; Source2)
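To see the idea concretely, here is a small simulation sketch: two continuous variables with a known correlation are dichotomized at zero, and the original correlation is recovered from the 2×2 counts alone. To stay self-contained it uses the classical cosine approximation \(\cos(\pi/(1+\sqrt{ad/bc}))\) rather than `psych::tetrachoric`, so the recovered value is approximate:

```r
# Simulate two standard normal variables with true correlation 0.5
set.seed(1)
rho <- 0.5
x <- rnorm(10000)
y <- rho * x + sqrt(1 - rho^2) * rnorm(10000)

# Dichotomize both at zero and tabulate
tab <- table(x > 0, y > 0)
n11 <- tab["TRUE", "TRUE"]; n00 <- tab["FALSE", "FALSE"]
n10 <- tab["TRUE", "FALSE"]; n01 <- tab["FALSE", "TRUE"]

# Cosine approximation to the tetrachoric correlation
cos(pi / (1 + sqrt(n11 * n00 / (n10 * n01))))  # close to 0.5
```

Even though the dichotomized 0/1 variables carry much less information than the originals, the estimate lands near the true correlation of 0.5, which is exactly what the tetrachoric coefficient is designed to do.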

library(psych)
tr<-ifelse(jellybeans$treatment=="jellybean",1,0)
tetrachoric(data.frame(jellybeans$acne,tr))
## Call: tetrachoric(x = data.frame(jellybeans$acne, tr))
## tetrachoric correlation 
##                 jlly. tr  
## jellybeans.acne 1.00      
## tr              0.01  1.00
## 
##  with tau of 
## jellybeans.acne              tr 
##            0.86            0.00