Given a model \(\{S_x \space ; \space \Psi (\underline x | \theta ) \space ; \space S_\theta = \Theta \}\), the parametric class \(D\) of distributions for \(\theta\) is said to be conjugate to the model if, whenever the prior is chosen in \(D\), the posterior also belongs to \(D\) for every value of \(\underline{x}\).
In formal terms: given \(\underline{x} \sim f(\underline{x}; \theta)\) with \(n\) i.i.d. trials, we have the induced model:
\[ \{S_x \space ; \space \Psi (\underline x | \theta ) \space ; \space S_\theta = \Theta \} \]
If \(f(\underline{x}; \theta)\) belongs to the exponential family:
\[ f(\underline{x} | \theta) = D(\underline{x}) \space \exp{ \{b\space(\theta)\space \space g\space(\underline{x}) - c \space( \theta) \} } \]
Then the prior will have a "standard" density function (belonging to the exponential family: e.g. Normal, Gamma, Poisson, ...):
\[ \pi(\theta) \propto \exp{ \{\eta_{\space 1} \space \space b \space ( \theta) - \eta_{\space 2} \space \space c \space ( \theta) \}} \space | \partial \space b \space ( \theta) / \partial \space \theta | \sim_{\text{kernel}} \text{standard density function} \] The posterior will then also have a "standard" density function of the same family as the prior, but with updated parameters:
\[\pi(\theta | \underline{x}) = \frac{ f(\underline{x} | \theta) \space \pi(\theta)}{\int_{\Theta}{f(\underline{x} | \theta) \space \pi(\theta) \space d\theta}} \propto \space f(\underline{x} | \theta) \space \pi(\theta) \sim_{\text{kernel}} \text{standard density function}\]
We have thus found the conjugate distribution to the model.
| Distribution name | Base model \(\space f(\underline{x} | \theta)\) | Conjugate class \(\pi(\theta)\) | Posterior update \(\pi(\theta | \underline{x})\) |
|---|---|---|---|
| Uniform | \(Uniform(0 , \theta)\) | \(Pareto(\alpha, \beta)\) | \(Pareto(\alpha+ n, max\{\beta, x_{(n)} \})\) |
| Bernoulli | \(\space Be(\theta)\) | \(Beta(\alpha, \beta)\) | \(Beta(\alpha + \sum x_i, \beta + n - \sum x_i)\) |
| Poisson | \(Pois(\theta)\) | \(Gamma(\alpha , \beta)\) | \(Gamma(\alpha + \sum x_i , \beta + n)\) |
| Exponential | \(Exp(\theta)\) | \(Gamma(\alpha , \beta)\) | \(Gamma(\alpha + n , \beta + \sum x_i )\) |
| Exponential | \(Exp(\frac{1}{\theta})\) | \(GammaInv(\alpha , \beta)\) | \(GammaInv(\alpha + n , \beta + \sum x_i )\) |
| Normal | \(Normal( \mu , \sigma^2 = KNOWN)\) | \(Normal( \mu_0, \sigma^2_0)\) | \(Normal( \frac{\mu_0 \space \sigma^2 + \sigma^2_0 \sum x_i}{ \sigma^2_0 \space n + \sigma^2}, \frac{\sigma^2 \space \sigma^2_0}{ \sigma^2_0 \space n + \sigma^2})\) |
| Normal | \(Normal( \mu = KNOWN , \sigma^2)\) | \(GammaInv(\alpha , \beta)\) | \(GammaInv(\alpha + \frac{n}{2} , \beta + \frac{\sum (x_i - \mu)^2}{2})\) |
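As a sanity check on the table, each update rule can be written as a one-line function. A minimal sketch in Python for the Bernoulli and Poisson rows (the sample data and prior hyperparameters are illustrative assumptions):

```python
import numpy as np

def update_beta_bernoulli(alpha, beta, x):
    # Bernoulli row: Beta(a, b) prior -> Beta(a + sum(x), b + n - sum(x))
    x = np.asarray(x)
    s = int(x.sum())
    return alpha + s, beta + len(x) - s

def update_gamma_poisson(alpha, beta, x):
    # Poisson row: Gamma(a, b) prior -> Gamma(a + sum(x), b + n)
    x = np.asarray(x)
    return alpha + int(x.sum()), beta + len(x)

print(update_beta_bernoulli(2, 2, [1, 0, 1, 1, 0]))  # -> (5, 4)
print(update_gamma_poisson(1, 1, [3, 2, 4]))         # -> (10, 4)
```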
\(X \sim Be(\theta)\) with n independent trials where: \[ f(x; \theta) = \theta^x \space ( 1 - \theta)^{1-x} \] So the induced model becomes:
\[\big\{ x=\{0,1\}^{(n)} \space \space ; \space \space \theta^{\sum_{i=1}^{n}x_i} \space (1-\theta)^{n-\sum_{i=1}^{n}x_i} \space \space ; \space \space \Theta = (0,1) \big\}\]
The maximum likelihood estimate is:
\[MLE =\hat\theta = \frac{\sum_{i=1}^{n}x_i}{n}\]
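With a concrete sample, this is just the proportion of successes; a small illustration (the data are made up):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # hypothetical Bernoulli sample, n = 8
theta_mle = x.mean()                     # MLE = sum(x_i) / n
print(theta_mle)                         # -> 0.625 (5 successes out of 8)
```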
And the prior connected to this model is a \(Beta(\alpha,\beta)\)
First we must rewrite the Bernoulli density function so that we can recognize the different components of the exponential family formula:
\[f(x; \theta) = \theta^x \space ( 1 - \theta)^{1-x} \\ f(x; \theta) = \exp{\{\log(\theta^x \space ( 1 - \theta)^{1-x})\}} \\ f(x; \theta) = \exp{\{x \space \log(\frac{\theta}{ 1 - \theta}) + \space \log( 1 - \theta)\}}\]
Review of the exponential family:
\[X \sim Exponential \space family \\ \\ f(x; \theta) = D\space(x) \space \space \exp{\{b\space(\theta)\space \space g\space(x) - \space c\space(\theta)\}}\]
Then the conjugate prior to the distribution results in this case:
\[\pi(\theta) \propto \exp{ \{\eta_{\space 1} \space \space b \space ( \theta) - \eta_{\space 2} \space \space c \space ( \theta) \}} \space | \partial \space b \space ( \theta) / \partial \space \theta | \sim_{\text{kernel}} \text{Beta density function}\]
Indeed:
\[b(\theta) = \log \bigg( \frac{\theta}{1-\theta} \bigg) \space\space\space\space\space \text{(i.e. the logit of } \theta) \\ c(\theta) = - \log(1 - \theta) \\ \bigg{|} \frac{\partial \space b \space ( \theta) }{ \partial \space \theta } \bigg{|}= \frac{1}{\theta \space (1-\theta)}\]
We create the conjugate prior:
\[\pi(\theta) \propto \exp{ \{\eta_{\space 1} \space \space b \space ( \theta) - \eta_{\space 2} \space \space c \space ( \theta) \}} \space | \partial \space b \space ( \theta) / \partial \space \theta | = \\ = \exp{ \{\eta_{\space 1} \space \space \log \bigg( \frac{\theta}{1-\theta} \bigg) + \eta_{\space 2} \space \space \log(1 - \theta) \}} \space \frac{1}{\theta \space (1-\theta)} = \\ = \exp{ \{ \space \space \log \bigg( \frac{\theta}{1-\theta} \bigg)^{\eta_{\space 1}} \}} \space \space \exp{ \{ \space \space \log(1 - \theta)^{\eta_{\space 2}} \}} \space \frac{1}{\theta \space (1-\theta)} = \\= \bigg( \frac{\theta}{1-\theta} \bigg)^{\eta_{\space 1}} \space \space (1 - \theta)^{\eta_{\space 2}} \space \frac{1}{\theta \space (1-\theta)} = \\ = \theta^{ \space \eta_{\space 1}} \space \space (1-\theta)^{-\eta_{\space 1}} \space \space (1 - \theta)^{\space \eta_{\space 2}} \space \space \theta^ {\space -1} \space \space (1-\theta)^ {\space -1} = \\ = \theta^{ \space \eta_{\space 1} -1} \space \space (1-\theta)^{\eta_{\space 2}-\eta_{\space 1}-1}\]
This is the kernel of a \(Beta(\eta_{\space 1}, \eta_{\space 2}-\eta_{\space 1})\)
\[\pi(\theta | \underline{x}) = \frac{ f(\underline{x} | \theta) \space \pi(\theta)}{\int_{\Theta}{f(\underline{x} | \theta) \space \pi(\theta) \space d\theta}} \propto \space f(\underline{x} | \theta) \space \pi(\theta) = \\ = \space \theta^{\sum_{i=1}^{n}x_i} \space (1-\theta)^{n-\sum_{i=1}^{n}x_i} \space \space \theta^{\space \alpha - 1} \space (1-\theta)^{\space \beta - 1} = \\ = \space \space \theta^{\sum_{i=1}^{n}x_i \space + \alpha -1} \space (1-\theta)^{n-\sum_{i=1}^{n}x_i \space + \beta - 1}\]
This is the kernel of a \(Beta( \alpha+\sum_{i=1}^{n}x_i \space , \space \space n+ \beta-\sum_{i=1}^{n}x_i \space )\)
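The closed-form Beta posterior can be checked numerically: renormalizing likelihood \(\times\) prior on a grid of \(\theta\) values should reproduce the Beta pdf with the updated parameters. A sketch (the sample and hyperparameters are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

x = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 0])  # hypothetical sample, n = 10
alpha, beta = 2.0, 3.0                         # assumed prior: Beta(2, 3)

theta = np.linspace(0.001, 0.999, 2000)
likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())
unnorm = likelihood * stats.beta.pdf(theta, alpha, beta)
numeric = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # grid-normalized posterior

# closed form from the conjugate update: Beta(alpha + sum x_i, beta + n - sum x_i)
closed = stats.beta.pdf(theta, alpha + x.sum(), beta + len(x) - x.sum())
print(np.abs(numeric - closed).max())  # small: numerical integration error only
```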
\(X \sim N(\mu= m,\theta= \sigma^2)\) (m = known) with n independent trials where: \[ f(x; \theta) = \frac{1}{\sqrt{2 \space \pi \space \sigma^2 }}\space \exp{\bigg\{-\frac{(x-m)^2}{2 \space \sigma^2}\bigg\}} \] So the induced model becomes:
\[\biggl\{ x=R^{(n)} \space \space ; \space \space \bigg( \frac{1}{\sqrt{2 \space \pi \space \sigma^2 }}\bigg)^n\space \exp{\bigg\{-\frac{\sum_{i=1}^{n}(x_i-m)^2}{2 \space \sigma^2}\bigg\}} \space \space ; \space \space \Theta = R_+ \biggl\}\]
The maximum likelihood estimate is:
\[MLE =\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i- m)^2}{n}\]
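Since \(m\) is known, the MLE centers the data at \(m\) rather than at the sample mean. A quick simulation check (the true values and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma2 = 1.0, 4.0                       # known mean, assumed true variance
x = rng.normal(m, np.sqrt(sigma2), size=10_000)

sigma2_mle = np.mean((x - m)**2)           # uses the known mean m, not x-bar
print(sigma2_mle)                          # close to 4 for large n
```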
And the prior connected to this model is an \(InverseGamma(\alpha,\beta)\)
First we must rewrite the Normal density function so that we can recognize the different components of the exponential family formula:
\[f(x; \theta) \propto \frac{1}{\sqrt{\sigma^2 }}\space \exp{\bigg\{-\frac{(x-m)^2}{2 \space \sigma^2}\bigg\}} = \\=\exp{\bigg\{-\frac{(x-m)^2}{2 \space \sigma^2}-\frac{1}{2} \space \log(\sigma^2)\bigg\}} \]
Review of the exponential family:
\[X \sim Exponential \space family \\ \\ f(x; \theta) = D\space(x) \space \space \exp{\{b\space(\theta)\space \space g\space(x) - \space c\space(\theta)\}}\]
Then the conjugate prior to the distribution results in this case:
\[\pi(\theta) \propto \exp{ \{\eta_{\space 1} \space \space b \space ( \theta) - \eta_{\space 2} \space \space c \space ( \theta) \}} \space | \partial \space b \space ( \theta) / \partial \space \theta | \sim_{\text{kernel}} \text{Inverse Gamma density function}\]
Indeed:
\[b(\sigma^2) = -\frac{1}{2 \space \sigma^2}\\ c(\sigma^2) = \frac{1}{2} \log(\sigma^2) \\ \bigg{|} \frac{\partial \space b \space ( \sigma^2) }{ \partial \space \sigma^2 } \bigg{|}= \frac{1}{2 \space (\sigma^2)^2} \propto \frac{1}{(\sigma^2)^2}\]
We create the conjugate prior:
\[\pi(\sigma^2) \propto \exp{ \{\eta_{\space 1} \space \space b \space ( \sigma^2) - \eta_{\space 2} \space \space c \space ( \sigma^2) \}} \space | \partial \space b \space ( \sigma^2) / \partial \space \sigma^2 | = \\ = \exp{ \{-\eta_{\space 1} \space \space \frac{1}{2 \space \sigma^2} - \eta_{\space 2} \space \space \frac{1}{2} \log(\sigma^2) \}} \space \frac{1}{(\sigma^2)^2} = \\ = (\sigma^2)^{- \frac{1}{2} \eta_{\space 2}} \space (\sigma^2)^{-2}\exp{ \{- \space \space \frac{\eta_{\space 1}}{2 \space \sigma^2}\}} = \\ = (\sigma^2)^{-2- \frac{1}{2} \eta_{\space 2}}\exp{ \{- \space \space \frac{\eta_{\space 1}}{2 \space \sigma^2}\}}\]
This is the kernel of an \(InverseGamma(1+ \frac{1}{2} \eta_{\space 2}, \space \space \space \frac{\eta_{\space 1}}{2})\)
\[\pi(\sigma^2 | \underline{x}) = \frac{ f(\underline{x} | \sigma^2) \space \pi(\sigma^2)}{\int_{\Theta}{f(\underline{x} | \sigma^2) \space \pi(\sigma^2) \space d\sigma^2}} \propto \space f(\underline{x} | \sigma^2) \space \pi(\sigma^2) = \\= \bigg( \frac{1}{\sqrt{2 \space \pi \space \sigma^2 }}\bigg)^n\space \exp{\bigg\{-\frac{\sum_{i=1}^{n}(x_i-m)^2}{2 \space \sigma^2}\bigg\}} \space \space (\sigma^2)^{-(\alpha +1)} \space \exp{\bigg\{-\frac{\beta}{\sigma^2}\bigg\}} \propto \\ \propto (\sigma^2)^{-\frac{n}{2}}\space \exp{\bigg\{-\frac{\sum_{i=1}^{n}(x_i-m)^2}{2 \space \sigma^2}\bigg\}} \space \space (\sigma^2)^{-(\alpha +1)} \space \exp{\bigg\{-\frac{\beta}{\sigma^2}\bigg\}} = \\ = (\sigma^2)^{-(\alpha + \frac{n}{2} +1)} \space \space \exp{\bigg\{-\frac{\beta +\frac{1}{2}\sum_{i=1}^{n}(x_i-m)^2}{\sigma^2}\bigg\}} \]
This is the kernel of an \(InverseGamma \bigg(\alpha + \frac{n}{2}, \space \space \space \beta +\frac{\sum_{i=1}^{n}(x_i-m)^2}{2} \bigg)\)
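This update can also be checked with scipy, whose `invgamma` distribution with shape `a` and `scale` parameters matches \(InverseGamma(\alpha, \beta)\). A sketch with made-up data and hyperparameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m = 0.0                                   # known mean
x = rng.normal(m, 2.0, size=50)           # hypothetical sample
alpha, beta = 3.0, 5.0                    # assumed InverseGamma prior

# conjugate update derived above
alpha_post = alpha + len(x) / 2
beta_post = beta + np.sum((x - m)**2) / 2

# scipy's invgamma(a, scale) has pdf proportional to s^(-a-1) exp(-scale/s)
s2 = np.linspace(0.5, 20, 2000)
posterior = stats.invgamma.pdf(s2, a=alpha_post, scale=beta_post)
print(s2[np.argmax(posterior)])           # near the mode beta_post / (alpha_post + 1)
```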