Usually some of the parameters of our state space models are unknown and must be estimated from the data. If the assumption that the system disturbances are Gaussian is plausible, then the Kalman filter allows us to compute the likelihood for any value of the parameters to be estimated, so the unknown parameters can be estimated by numerically maximizing the (log-)likelihood with respect to them. If the Gaussianity assumption is not supported by the data, then exact maximum likelihood estimates are impossible or very complex to obtain. However, maximizing the Gaussian (log-)likelihood even when the data are not normal yields estimates whose asymptotic behavior is still known (under regularity conditions). Such estimates are called quasi-maximum likelihood (QML) estimates or pseudo-maximum likelihood estimates.
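For a univariate model, the Kalman filter delivers the Gaussian log-likelihood through the prediction error decomposition (standard notation, with \(\psi\) denoting the parameter vector):
\[
\log L(\psi) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{n}\left(\log F_t + \frac{v_t^2}{F_t}\right),
\]
where \(v_t = y_t - \mathrm{E}(y_t \mid y_1,\ldots,y_{t-1})\) is the one-step-ahead prediction error and \(F_t\) its variance, both produced by the filter recursions for the given value of \(\psi\).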
In general, when maximum likelihood estimates must be produced numerically, one proceeds as follows:
1. write an objective function, say objective(pars, data), which for each value of the parameter vector pars to be estimated, and given the data data, returns the value of the log-likelihood;
2. assign initial values to the unknown parameters and pass them, together with the objective function, to an optimizer (in R this is usually the optim() function, but there are others), which uses a numerical algorithm (for our problems Newton or quasi-Newton methods; a quasi-Newton algorithm called BFGS is often used) to update the parameters until an (approximate) maximum of the objective function is reached.
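As a minimal sketch of this recipe, consider a hypothetical toy problem (not a state space model): maximum likelihood estimation of the mean and standard deviation of i.i.d. Gaussian data, with the standard deviation passed on the log scale so that it stays positive. The data are simulated here purely for illustration.

```r
# Toy example of the objective + optim() recipe (simulated data, not the Nile)
set.seed(123)
y <- rnorm(500, mean = 5, sd = 2)

# Step 1: objective function returning the NEGATIVE log-likelihood,
#         since optim() minimizes by default
objective <- function(pars, data) {
  -sum(dnorm(data, mean = pars[1], sd = exp(pars[2]), log = TRUE))
}

# Step 2: initial values + optimizer (BFGS)
opt <- optim(c(0, 0), objective, data = y, method = "BFGS")

opt$par[1]      # approximately the sample mean of y
exp(opt$par[2]) # approximately the (ML) standard deviation of y
```

The exp/log reparametrization is the same trick used below for the variances of the state space model.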
It is important to remember that, under regularity conditions, the asymptotic distribution of maximum likelihood estimates is known: the estimates are asymptotically normal with covariance matrix given by the inverse of the Fisher information, and standard errors for transformed parameters are obtained via the delta method.
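In our case the variances are parametrized as \(\sigma^2 = \exp(\theta)\). If \(\hat\theta\) is approximately normal with covariance matrix \(V\) (estimated by the inverse of the hessian of the negative log-likelihood at the optimum), the delta method gives
\[
\widehat{\mathrm{Var}}(\hat\sigma^2) \approx G V G^\top, \qquad G = \mathrm{diag}\big(\exp(\hat\theta_1), \ldots, \exp(\hat\theta_k)\big),
\]
since \(G\) is the Jacobian of the transformation \(\theta \mapsto \exp(\theta)\); the standard errors of the estimated variances are the square roots of the diagonal of \(G V G^\top\).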
For estimation there are two alternatives. The first, more hand-crafted to implement, is computationally more efficient. The second is more convenient, but slightly slower to run.
(Honestly, I do not know why the two give slightly different results; the difference is minimal and they should be exactly identical. I prefer the second for convenience and speed of implementation. That said, in some situations, for example when a stochastic cycle is present, the more hand-crafted approach is still necessary.)
For our purposes the only formal argument of the logLik(object) function is object, an object of class SSModel, i.e., the KFAS representation of a state space model. The function returns the log-likelihood of the state space model. Of course, if some parameters of the model are to be estimated, we must write an objective function that takes the parameters to be estimated and returns the value of the log-likelihood; this function is then maximized numerically.
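For instance, on a model with no NA entries, logLik() can be called directly (the variance values here are arbitrary, chosen only for illustration; KFAS is assumed to be installed):

```r
library(KFAS)
data(Nile)
# Fully specified local level model: no parameter left at NA
m <- SSModel(Nile ~ 0 + SSMtrend(1, Q = 1500), H = 15000)
logLik(m) # Gaussian log-likelihood computed by the Kalman filter
```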
Let's take the Nile time series as an example and estimate a random walk plus noise model (N.B. read the code and the comments in it carefully).
Let's start by loading the library and data and creating the RW + WN model for the Nile:
library(KFAS)
data(Nile)
# State space model: random walk (local level) + noise, both variances unknown (NA)
modello1 <- SSModel(Nile ~ 0 + SSMtrend(1, Q = NA), H = NA)
I create a function that estimates the two variances of the RW + WN model:
fit_RW_plus_WN <- function(model, pars.init = rep(log(var(diff(model$y))), 2)) {
  objective <- function(pars) {
    # Objective function with the parameters ordered as follows:
    # c(log(var(eta)), log(var(epsilon)))
    model$Q[1, 1, 1] <- exp(pars[1]) # taking exp() keeps the variances positive
    model$H[1, 1, 1] <- exp(pars[2])
    -logLik(model) # change sign because optim() minimizes
  }
  # Minimize the objective function (using the BFGS algorithm)
  # and ask for the hessian at the optimum
  optout <- optim(pars.init, objective, method = "BFGS", hessian = TRUE)
  param_estim <- exp(optout$par) # maximum likelihood estimates of the variances
  mI <- solve(optout$hessian) # inverse of the observed information (covariance of the log-variances)
  mV <- diag(param_estim) %*% mI %*% diag(param_estim) # delta method in action
  sterr <- sqrt(diag(mV)) # vector of standard errors
  # Give names to the parameters
  names(param_estim) <- names(sterr) <- colnames(mV) <- rownames(mV) <- c("Q", "H")
  list(hatparameters = param_estim, stderr = sterr, cov = mV,
       logLik = -optout$value, optim.out = optout)
}
Let's now see our function in action.
fit1 <- fit_RW_plus_WN(modello1)
# first evaluate convergence: if 0 it's good
fit1$optim.out$convergence
## [1] 0
# look at parameter estimates:
print(fit1$hatparameters)
## Q H
## 1469.163 15098.651
# stderr of estimates:
print(fit1$stderr)
## Q H
## 1280.358 3145.560
KFAS offers a parameter estimation method through the fitSSM() function.
fitSSM(model, inits, updatefn) requires three formal parameters:
- model, an object of class SSModel with the parameters to be estimated set to NA;
- inits, the vector of initial values for the parameters to be estimated;
- updatefn, a function of the form function(pars, model) that writes the (possibly transformed) parameters pars into the model matrices and returns the updated model.
The fitSSM() function returns a list with optim.out, the output of the optim() call used internally to maximize the log-likelihood, and model, the state space model with the estimated parameters plugged in (as we did in the previous function).
updt <- function(pars, model) {
model$Q[1, 1, 1] <- exp(pars[1])
model$H[1, 1, 1] <- exp(pars[2])
model
}
# an equivalent way of writing the same update function:
# updt <- function(pars, model) {
# model$Q[[1]] <- exp(pars[1])
# model$H[[1]] <- exp(pars[2])
# model
# }
fit2 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2), updt)
# evaluate optimization convergence: if 0 it's good
fit2$optim.out$convergence
## [1] 0
# Let's see the estimates in the matrices
message("Q");fit2$model$Q[[1]];message("H");fit2$model$H[[1]]
## Q
## [1] 1466.32
## H
## [1] 15098.18
Note that when the formal parameter updatefn is not specified, the fitSSM() function uses as default a function that assumes that the parameters to be estimated are the variances left at NA on the diagonals of matrices Q and H. To ensure positivity of variances, this function assumes that the parameters in pars must be transformed with \(\exp(\theta)\).
fit3 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2))
# evaluate optimization convergence: if 0 it's good
fit3$optim.out$convergence
## [1] 0
# Let's see the estimates in the matrices
message("Q");fit3$model$Q;message("H");fit3$model$H
## Q
## , , 1
##
## [,1]
## [1,] 1466.32
## H
## , , 1
##
## [,1]
## [1,] 15098.18
# stderr of estimates:
fit3 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2), hessian = TRUE)
V <- solve(fit3$optim.out$hessian)
GVG <- diag(exp(fit3$optim.out$par)) %*% V %*% diag(exp(fit3$optim.out$par))
stderr <- sqrt(diag(GVG))
message("Q stderr");stderr[1];message("H stderr");stderr[2]
## Q stderr
## [1] 1279.011
## H stderr
## [1] 3145.357