Estimation of unknown parameters in UCM models in KFAS



Usually some of the parameters of our state space models are unknown and must be estimated from the data. If the assumption of Gaussian system disturbances is plausible, the Kalman filter lets us compute the likelihood for any candidate value of the parameters, and the unknown parameters can therefore be estimated by numerically maximizing the (log-)likelihood with respect to them. If the Gaussianity assumption is not supported by the data, exact maximum likelihood estimates are impossible or very complex to obtain. However, maximizing the Gaussian (log-)likelihood even when the data are not normal yields estimates whose asymptotic behavior is still known (under regularity conditions). Such estimates are called quasi-maximum likelihood (QML) estimates or pseudo-maximum likelihood estimates.
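Concretely, for any trial value of the parameter vector the Kalman filter delivers the Gaussian log-likelihood through the prediction error decomposition. For a univariate series (and up to the adjustments required by diffuse initial conditions) it takes the form

\[
\log L = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{n}\left(\log F_t + \frac{v_t^2}{F_t}\right),
\]

where \(v_t\) are the one-step-ahead prediction errors (innovations) and \(F_t\) their variances, both computed recursively by the filter. This is, in essence, the quantity that logLik() evaluates in KFAS.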

In general, to produce numerical maximum likelihood estimates one proceeds as follows:
1. write an objective function, for example objective(pars, data), which, for each value of the parameter vector to be estimated pars and given the data data, returns the value of the log-likelihood;
2. assign initial values to the unknown parameters and pass them, together with the objective function, to an optimizer (in R usually the optim() function, but others exist), which uses a numerical algorithm (for our problems Newton or quasi-Newton methods, often one called BFGS) to update the parameters until an (approximate) maximum of the objective function is reached.
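As a minimal illustration of these two steps outside of KFAS (a hypothetical toy example; the names objective and out are just for this sketch), here is numerical maximum likelihood for the mean and standard deviation of a normal sample, where the closed-form solution is known and lets us check the optimizer:

```r
# Toy example of steps 1 and 2: ML estimation of the mean and sd of a
# normal sample by numerical optimization.
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)

# Step 1: objective function; optim() minimizes, so we return the
# NEGATIVE log-likelihood. The sd is passed as log(sd) so that the
# optimizer works on an unconstrained scale.
objective <- function(pars, data) {
  -sum(dnorm(data, mean = pars[1], sd = exp(pars[2]), log = TRUE))
}

# Step 2: initial values + optimizer (BFGS, a quasi-Newton algorithm)
out <- optim(c(mean(y), log(sd(y))), objective, data = y, method = "BFGS")
out$convergence   # 0 means successful convergence
out$par[1]        # estimate of the mean (close to mean(y))
exp(out$par[2])   # estimate of the sd
```

The same recipe carries over unchanged to state space models: only the body of the objective function changes.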



It is important to remember that, under regularity conditions, the asymptotic distribution of maximum likelihood estimates is known; the tools used below to turn this into standard errors are the Fisher information (estimated via the Hessian of the negative log-likelihood at the optimum) and the delta method.
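To make this concrete: under regularity conditions \(\hat\theta\) is approximately \(N(\theta, \mathcal{I}(\theta)^{-1})\), where the Fisher information \(\mathcal{I}(\theta)\) can be estimated by the Hessian of the negative log-likelihood at the optimum. When the parameters of interest are a transformation \(\psi = g(\theta)\) of those actually optimized (here \(g(\theta) = e^{\theta}\) applied componentwise, so that variances are positive), the delta method gives

\[
\hat\psi \approx N\!\left(g(\theta),\; G\,\mathcal{I}(\theta)^{-1}G^{\top}\right), \qquad G = \frac{\partial g(\theta)}{\partial \theta^{\top}},
\]

and for the componentwise exponential \(G = \mathrm{diag}(e^{\hat\theta})\). This sandwich product is exactly how the standard errors are computed in both methods below.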

For estimation there are two alternatives. The first is more artisanal to implement but computationally slightly more efficient. The second is more convenient but slightly slower in execution.

(Honestly I don't know why the two differ [the difference is minimal]; they should be exactly identical. I prefer the second for convenience and speed of implementation. That said, in some situations, such as when a stochastic cycle is present, the more artisanal one is still necessary.)

First method

For our purposes the only relevant formal argument of logLik(object) is object, an object of class SSModel, i.e., the KFAS representation of a state space model. This function returns the log-likelihood of the state space model. Obviously, if some parameters of the model are to be estimated, we must create an objective function that takes the parameters to be estimated and returns the value of the log-likelihood. This function is then maximized numerically.

Let's take the Nile time series as an example and estimate a random walk plus noise model (N.B.: read the code and the comments in it carefully).

Let's start by loading the library and data and creating the RW + WN model for the Nile:

library(KFAS) 
data(Nile)

# State space model 
modello1 <- SSModel(Nile~0+SSMtrend(1, NA), H = NA)

I create a function that estimates the two variances of the RW + WN model:

fit_RW_plus_WN <- function(model, pars.init = rep(log(var(diff(model$y))), 2)) {
  # Objective function with parameters ordered as
  # c(log(var(eta)), log(var(epsilon)))
  objective <- function(pars) {
    model$Q[1, 1, 1] <- exp(pars[1]) # taking the exp, the variances stay positive
    model$H[1, 1, 1] <- exp(pars[2])
    -logLik(model) # change sign because optim() minimizes
  }

  # We minimize the objective function (using the BFGS algorithm)
  # and ask to calculate the hessian at the optimum
  optout <- optim(pars.init, objective, method = "BFGS", hessian = TRUE)
  param_estim <- exp(optout$par) # maximum likelihood estimates of the variances
  mI <- solve(optout$hessian) # inverse of the estimated Fisher information
  mV <- diag(param_estim) %*% mI %*% diag(param_estim) # delta method in action
  sterr <- sqrt(diag(mV)) # vector of standard errors
  # Give names to the parameters
  names(param_estim) <- names(sterr) <- colnames(mV) <- rownames(mV) <- c("Q", "H")

  list(hatparameters = param_estim, stderr = sterr, cov = mV,
       logLik = -optout$value, optim.out = optout)
}

Let's now see our function in action.

fit1 <- fit_RW_plus_WN(modello1)
# first evaluate convergence: if 0 it's good 
fit1$optim.out$convergence
## [1] 0
# look at parameter estimates:
print(fit1$hatparameters)
##         Q         H 
##  1469.163 15098.651
# stderr of estimates:
print(fit1$stderr)
##        Q        H 
## 1280.358 3145.560
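As a usage note (a sketch, plugging in the point estimates and standard errors printed above; in practice one would take them directly from fit1$hatparameters and fit1$stderr): approximate 95% Wald confidence intervals for the variances are best built on the log scale and then exponentiated, so that the bounds stay positive.

```r
# Approximate 95% Wald intervals for the two variances, using the
# estimates and standard errors printed above.
est <- c(Q = 1469.163, H = 15098.651)  # fit1$hatparameters
se  <- c(Q = 1280.358, H = 3145.560)   # fit1$stderr
se_log <- se / est  # delta method in reverse: stderr on the log scale
ci <- exp(cbind(lower = log(est) - qnorm(0.975) * se_log,
                upper = log(est) + qnorm(0.975) * se_log))
ci  # note how wide the interval for Q is: that variance is estimated imprecisely
```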

Second method

KFAS offers a parameter estimation method through the fitSSM function.

fitSSM(model, inits, updatefn) requires three formal arguments:

  • model, an object of class SSModel containing the state space model;
  • inits, a vector with the initial values of the parameters to be estimated;
  • updatefn, a function with two arguments, pars and model, that inserts the parameters in pars into the state space matrices of model and returns the updated model. Variance parameters are usually passed on the log scale and exponentiated inside this function, so that the optimizer works on an unconstrained scale while the variances stay positive.

The fitSSM() function returns a list with optim.out, the output of the optim() call used internally to maximize the log-likelihood, and model, the state space model with the estimated parameters plugged in (as we did manually in the previous function).

updt <- function(pars, model) { 
  model$Q[1, 1, 1] <- exp(pars[1])
  model$H[1, 1, 1] <- exp(pars[2])
  model 
}
# this alternative indexing is equivalent
#  updt <- function(pars, model) { 
#    model$Q[[1]] <- exp(pars[1])
#    model$H[[1]] <- exp(pars[2])
#    model 
#  }

fit2 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2), updt)

# evaluate optimization convergence: if 0 it's good 
fit2$optim.out$convergence
## [1] 0
# Let's see the estimates in the matrices 
message("Q");fit2$model$Q[[1]];message("H");fit2$model$H[[1]]
## Q
## [1] 1466.32
## H
## [1] 15098.18

Note that when the formal argument updatefn is not specified, fitSSM() uses a default function that assumes the parameters to be estimated are the variances set to NA on the diagonals of the matrices Q and H. To ensure positive variances, this default assumes the values in pars are on the log scale and transforms them with \(\exp(\theta)\).

fit3 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2)) 

# evaluate optimization convergence: if 0 it's good 
fit3$optim.out$convergence
## [1] 0
# Let's see the estimates in the matrices 
message("Q");fit3$model$Q;message("H");fit3$model$H
## Q
## , , 1
## 
##         [,1]
## [1,] 1466.32
## H
## , , 1
## 
##          [,1]
## [1,] 15098.18
# stderr of estimates:
fit3 <- fitSSM(modello1, rep(log(var(diff(Nile))), 2),  hessian = TRUE)
V <- solve(fit3$optim.out$hessian) 
GVG <- diag(exp(fit3$optim.out$par)) %*% V %*% diag(exp(fit3$optim.out$par)) 
stderr <- sqrt(diag(GVG)) 

message("Q stderr");stderr[1];message("H stderr");stderr[2]
## Q stderr
## [1] 1279.011
## H stderr
## [1] 3145.357