Classical (or frequentist) inference was born in the 1920s with Fisher: \(\epsilon\) is a random (or chance) experiment whose result is unknown; performing \(n\) independent trials I obtain \(\underline{x} = (x_1, x_2, \dots, x_n)\), an \(n\)-tuple that carries the sample information. To this is added the principle of repeated sampling, which allows me to pass from an estimate (a number) to an estimator (a random variable): \(x \rightarrow X\).
Bayesian inference instead is more recent in its modern form (developed mainly in the 1950s and 1960s, building on Bayes' theorem): the sample information \(x\) is still used, but pre-experimental (prior) information is added to it; in other words, something more is available before doing the experiment (I use the likelihood principle).
The Bayesian approach to inference is acquiring an increasingly important role in the statistical literature: the number of studies and data analyses in the medical-health, economic-financial, socio-political and experimental sciences is continuously increasing (think, for example, of the Bayesian models designed for gravitational-wave analysis).
The success of this approach began more or less in the 1990s, for several reasons: Bayesian logic is coherent and quite intuitive; over the years, statistical applications in which extra-experimental information must be taken into account have steadily increased; but the main reason for the recent surge in the use of these methods was undoubtedly the enormous development of new computational methodologies, which make it possible to analyze very complex and computationally expensive statistical models.
To introduce the main concepts (prior and posterior), let us immediately consider an example taken from Edwards, Lindman, and Savage (1963), "Bayesian Statistical Inference for Psychological Research".
The sample of independent (Bernoulli) trials is \(\underline{x} = \{1, 1, \space \dots \space , 1\}\) with \(n = 10\), so the maximum likelihood estimate is the sample mean: \[\hat{\theta} = \frac{\sum{x_i}}{n} = \frac{10}{10} = 1 \] and it is very tempting to conclude that the unknown parameter is 1. In that paper, Savage and his coauthors state that, before the experiment, everyone has their own idea about what the value of \(\theta\) may be.
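The maximum likelihood computation above can be sketched in a few lines (an illustrative sketch; the sample of ten successes is the one from the Savage example):

```python
# Sketch: maximum-likelihood estimate for a Bernoulli parameter theta
# from n independent trials, as in the Savage example.
x = [1] * 10          # sample: ten successes out of ten trials

# For Bernoulli trials, the MLE of theta is simply the sample mean.
theta_hat = sum(x) / len(x)
print(theta_hat)      # 1.0
```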
To add pre-experimental information to the model, I treat \(\theta\) as a random variable, so the likelihood function \( f(\underline{x} \mid \theta) \) becomes the conditional law of \(\underline{x}\) given \(\theta\).
The law of the variable \(\theta\) is called the prior law (or prior), \( \pi(\theta)\), from which the joint distribution law can be derived:
\[\Psi (\underline x , \theta ) = f(\underline{x} \mid \theta)\, \pi(\theta) \]
Then the induced statistical model is: \[ \{S_x \space ; \space \Psi (\underline x , \theta ) \space ; \space S_\theta = \Theta \} \]
To calculate the posterior, that is, the posterior distribution law of \(\theta\) given the data, I use the joint law and Bayes' theorem:
\[\pi(\theta \mid \underline{x}) = \frac{\Psi (\underline x , \theta )}{\int_{\Theta}{f(\underline{x} \mid \theta)\, \pi(\theta)\, d\theta}} = \frac{ f(\underline{x} \mid \theta)\, \pi(\theta)}{\int_{\Theta}{f(\underline{x} \mid \theta)\, \pi(\theta)\, d\theta}} = c \space f(\underline{x} \mid \theta)\, \pi(\theta) \]
where \(c\) is the normalization constant (it does not depend on \(\theta\)), while the rest constitutes the kernel of the posterior.
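As a numerical sketch of the formula above (an illustrative assumption, not from the text: a uniform prior \(\pi(\theta) = 1\) on \((0,1)\) is used), the posterior for the ten-successes Bernoulli example can be computed on a grid, making the normalization constant \(c\) explicit:

```python
# Sketch: posterior on a grid for the Bernoulli example, with an assumed
# uniform prior pi(theta) = 1 on (0, 1).

n, s = 10, 10                      # n trials, s successes (the all-ones sample)

def likelihood(theta):
    # f(x | theta) for s successes in n Bernoulli trials
    return theta**s * (1 - theta)**(n - s)

# Discretize Theta = (0, 1) and approximate the integral in the denominator
# with a midpoint rule.
m = 10_000
grid = [(i + 0.5) / m for i in range(m)]
kernel = [likelihood(t) * 1.0 for t in grid]   # kernel: f(x|theta) * pi(theta)
integral = sum(kernel) / m                     # ≈ ∫ f(x|theta) pi(theta) dtheta
c = 1.0 / integral                             # normalization constant

posterior = [c * k for k in kernel]            # pi(theta | x) on the grid
# The posterior concentrates near theta = 1, in agreement with the MLE.
print(round(c, 2))   # with the uniform prior the integral is 1/(n+1), so c ≈ 11.0
```

Here the denominator is a plain number once the data are fixed, which is why it can be absorbed into \(c\) and only the kernel matters for the shape of the posterior.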