In Tutorial I and II, we mainly focus on the use of MOGP model for isotropic data, or when all \(L\) populations have the same set of inputs. In this tutorials, we will illustrate how to generalize this framework when we have missing data or when populations have different set of inputs.

1 Packages and user-defined functions

Loading all packages into the current working directory:

library(DiceKriging)
library(rgenoud)
library(kergp) 
library(nloptr)
library(data.table)

DiceKriging: kriging methods for single-output GP (SOGP).
rgenoud: genetic optimization solver used for Maximum Likelihood Estimation of GP hyperparameters.
kergp: kriging methods for user-defined kernel, needed for both Full-rank and ICM Multi-output GP.
nloptr: solve nonlinear optimization using R interface to NLopt.
data.table: extension of data.frame to enhance data manipulation.

Loading all user-defined functions:

source("createMortData.R")
source("covICM_Notched.R")
source("gp_predictionNotched.R")

\(\color{blue}{\texttt{createMortData.R}}\): to import datasets and select populations prior to model fitting. We can specify the age groups, calendar years, and whether both Males and Females are modeled together. All the datasets are downloaded from the Human Mortality Database. Currently, we have 16 European countries (seperated by gender) in Death Count and Exposures folders. These countries are:
- Austria
- Belarus
- Czech
- Denmark
- Estonia
- France
- Germany
- Hungary
- Latvia
- Lithuania
- Netherlands
- Poland
- Spain
- Sweden
- Switzerland
- UK

\(\color{blue}{\texttt{covICM_Notched.Rda}}\): to define ICM with partially heterotropic data where only a portion of \(L\) inputs are available. This is useful as the reported data from different countries in the HMD arrives non-synchronously. For instance, as of March 2020 some countries already have 2018 data added, most have data up to 2017, and a few are still lagging and only have data up to 2016.
Let \(M'\) be the number of distinct inputs across \(L\) populations and \(M=N_1+...+N_L\) be the number of observations in training data. We consider the setting that \(M'<ML\) so that for some inputs, not all \(L\) outputs are observed. Define the vector-valued “complete data” function \(\mathbf{f(x)}\), with \(\mathbf{f(x)} \in \mathbb{R}^{LM' \times 1}\). We further introduce \(\mathbf{f}^o\mathbf{(x)}\) as the vector-valued function corresponding to the observed outputs, \(\mathbf{f}^o\mathbf{(x)} \in \mathbb{R}^{M \times 1}\). The relation between \(\mathbf{f(x)}\) and \(\mathbf{f}^o\mathbf{(x)}\) is formulated through the “communication” matrix \(S\), \(\mathbf{f}^o\mathbf{(x)} = S^T\mathbf{f(x)}\), where \(S \in \mathbb{R}^{LM' \times M}\). The column vectors in \(S\) are orthonormal with values of 0 and 1 to eliminate the unobserved outputs. Applying linear transformation to a MVN vector, we can then identify the distribution of \(\mathbf{\mathbf{f}^o\mathbf{(x)}}\) as a GP with covariance: \[\text{Cov}(\mathbf{f}^o\mathbf{(x)},\mathbf{f}^o\mathbf{(x')})=S^T \text{Cov}(\mathbf{f(x)},\mathbf{f(x')})S = S^T(B \otimes K)S,\] recovering the Kronecker structure.
R function to create the transpose of the communication matrix, \(S^T\):

## create ST (to transform the vector of latent functions to observed functions)
createST = function(df){
  unq.loc = unique(df[,1:2]) # loc = (age,year)
  npop = length(unique(df[,3])) # integer coding for different populations
  vec.pop = rep(1:npop,each=nrow(unq.loc))
  
  # re-create a full data:
  full = cbind(do.call("rbind", replicate(npop, unq.loc, simplify = FALSE)), vec.pop)
  
  lab.notch = paste(df[,1],df[,2],df[,3],sep="")
  lab.full = paste(full[,1],full[,2],full[,3],sep="")
  
  out = outer(lab.notch,lab.full,"==")
  out = ifelse(out==T,1,0)
  
  return(out)
}

For example, we have the “observed” data for two populations (coded as 1 and 2). Population 2 has a missing data at the coordinate (Age,Year)=(71,1991).

Observed data
	age	year	popN
r1	70	1990	1
r2	71	1991	1
r3	70	1990	2

Complete data
	age	year	popN
r1	70	1990	1
r2	71	1991	1
r3	70	1990	2
r4	71	1991	2

Thus, the transpose of the communication matrix \(S^T\) is:

   r1 r2 r3 r4
r1  1  0  0  0
r2  0  1  0  0
r3  0  0  1  0

\(\color{blue}{\texttt{gp_predictionICM.R}}\): to perform prediction. Outputs contain the predicted mean and the posterior variance at new inputs \(\mathbf{x_*}\) for the latent function \(\mathbf{f_*}\) and observed outputs \(\mathbf{y_*}\). The cross-covariance matrix between two sets of locations are also provided.

Below, we write a function to create integer coding for populations in MOGP. This is equivalent to create dummy variables for categorical input.

intCtry = function(ctry){
  # ctry: list of countries (or populations) in the model
  subdt = mortData[country %in% ctry] 
  ctry = ctry[order(ctry)]
  for (i in 1:length(ctry)){
    subdt[country==ctry[i], popN:=i]
  }
  return(subdt)
}

2 Two-population ICM Model

For notched setup, let’s pretend that Hungary has not yet had 2016 data while Estonia already does. The goal is to build a joint model for Hungary and Estonia and apply the model to predict 2016 log-mortality rates in Hungary. We then compare the prediction performance of this model, in terms of SMAPE and the standard deviation of \(\mathbf{f_*}\) with ones in the single model for Hungary, using data in 1990–2015.

2.1 Fitting 2-population ICM model

The initial step is to prepare an aggregated data for Hungarian and Estonian mortality for Males, on Ages 70–84.

mortData = createMortData(year_start=1990, year_end=2016, 
                          age_start=70, age_end=84, 
                          sex="m", sex_cat="no")

ctry = c("Estonia","Hungary")
mortData = mortData[country %in% ctry]
mortData = intCtry(ctry)
mortData = mortData[-which(country=="Hungary" & year==2016)] # remove 2016 Hungarian data

The output vector \((y)\) and input vector \((X)\) must be provided:

y = mortData[,y]
X = mortData[,.(age,year,popN)]
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3")

Prior to fitting the model, we need to specify the number of populations, choice of rank, and more importaly provide the transpose of the communication matrix, \(S^T\). Similar to Part I and Par II, the function \(\color{blue}{\texttt{gp()}}\) can be called to fit the model. Specifically, we define the ICM kernel through the \(\color{blue}{\texttt{covICM_Notched()}}\) function with \(\color{blue}{\texttt{rank = 2}}\) and \(\color{blue}{\texttt{num_pop = 2}}\).

rank = 2
num_pop = 2
ST = createST(X)
gpFit <- gp(formula = y ~ X$x1 + as.factor(X$x3), 
            data = data.frame(y, X), 
            inputs = names(X), 
            cov = covICM_Notched(rank,num_pop),
            compGrad = FALSE,
            estim = TRUE, 
            noise = TRUE, 
            varNoiseIni = 1.0e-3, 
            varNoiseLower = 1.0e-5, 
            varNoiseUpper = 1.0e-2,
            parCovIni = c(10,10,rep(0.15,rank*num_pop)),
            # multistart = 4,
            optimMethod = "NLOPT_LN_COBYLA",
            opts=list("xtol_rel" = 1.0e-5,
                      "check_derivatives_tol" = 1.0e-5,
                      "population" = 120, 
                      # "print_level" = 2, 
                      "maxeval" = 150))

The summary output of the fitted model:

summary(gpFit)


Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y, 
    X), inputs = names(X), cov = covICM_Notched(rank, num_pop), 
    estim = TRUE, compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001, 
    varNoiseLower = 1e-05, varNoiseUpper = 0.01, parCovIni = c(10, 
        10, rep(0.15, rank * num_pop)), optimMethod = "NLOPT_LN_COBYLA", 
    opts = list(xtol_rel = 1e-05, check_derivatives_tol = 1e-05, 
        population = 120, maxeval = 150))


Number of observations: 795 

Trend coef.:
                    Value
(Intercept)      -9.25709
X$x1              0.08858
as.factor(X$x3)2  0.02404

Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss 
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "a11", "a21", "a12", "a22"
o Number of parameters: 6
o Accept matrix inputs.
o Param. values: 
            Value Lower Upper
theta.ag 14.48103 5e+00  30.0
theta.yr  8.00399 5e+00  30.0
a11       0.18953 1e-06   0.5
a21       0.12657 1e-06   0.5
a12       0.05620 1e-06   0.5
a22       0.06429 1e-06   0.5

Noise variance: 0.004

With the ICM covariance kernel, we can extract the factor loadings and compute the cross-covariance matrix \(B\):

a.loadings <- gpFit$covariance@par[3:length(gpFit$covariance@par)]
group.loadings <- split(a.loadings,ceiling(seq_along(a.loadings)/num_pop))
B <- Reduce("+",lapply(group.loadings,function(x) x%*%t(x)))
rownames(B) <- ctry; colnames(B) <- ctry
B

        Estonia Hungary
Estonia 0.03908 0.02760
Hungary 0.02760 0.02015

The inferred cross-correlation matrix \(R\) from \(B\):

R <- cov2cor(B)
R

        Estonia Hungary
Estonia  1.0000  0.9835
Hungary  0.9835  1.0000

Hungary and Estonia are highly correlated with the correlation coefficient of \(r_{EST,~HUN} \approx 0.98\).

2016 out-of-sample prediction for Hungary via 2-population ICM model

We apply the fitted model to predict 2016 log-mortality rates for Ages 70–84 in Hungary. First, we create a test set:

# new data frame for prediction 
yearsForecast = 2016
agesForecast  = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
npop = 2
    
xPred = data.frame(x1 = rep(agesForecast, npop),
                   x2 = rep(yearsForecast, each = npop),
                   x3 = rep(1:npop, each = nAg))

We further extract \(\sigma^2_l\) from single-population models: \(\sigma^2_{EST}\) from single-population for Estonia (Ages 70–84, Years 1990–2016) and \(\sigma^2_{HUN}\) from single-population for Hungary (Ages 70–84, Years 1990–2015).

load("nugget_mortData.Rda")
nugget = nugget[country %in% ctry]
nugget = nugget[(country=="Hungary" & nug.year==2015)|(country=="Estonia" & nug.year==2016)]
(list.nug = nugget$nug)

[1] 0.001980 0.005019

Finally, we apply the user-defined function \(\color{blue}{\texttt{gp.predictionNotch()}}\) from the source \(\color{blue}{\texttt{gp_predictionNotched}}\) for prediction. Note that we have to provide the covariance matrix for new inputs in the test set (\(\color{blue}{\texttt{covXnew}}\)). Prior to estimate \(\color{blue}{\texttt{covXnew}}\), we must re-compute the transpose of the communication matrix, \(S^T\), for the test set.

ST = createST(xPred); 
covXnew = covMat(gpFit$covariance, xPred)
    
ST = createST(gpFit$X)
out = gp.predictionNotch(newdata = xPred, 
                         gpmodel = gpFit, 
                         list.noise = list.nug, 
                         meanTr = "linearAg",
                         typePred = "f",
                         covXnew = covXnew)
predicted = out$res

Prediction results for Age 70 and 84 in Hungary:

	age	year	popN	mean	vr	std
16	70	2016	2	-3.236	3e-04	0.0176
30	84	2016	2	-2.050	3e-04	0.0176

We further compute SMAPE to assess model performance:

# data with observed values:
mortData = createMortData(year_start=2016, year_end=2016, age_start=70, age_end=84, sex="m", sex_cat="no")

yearsForecast = c(2013,2015,2016)
agesForecast = 70:84
observed= mortData[age %in% agesForecast & year %in% yearsForecast & country %in% ctry,]
observed = intCtry(ctry)

# merge observed data with predicted one:
compareGP = as.data.table(merge(predicted, observed, all.x=T, by=c("age","year","popN")))
smape = compareGP[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),
                  by=.(country,year)]

country	year	smape
Estonia	2016	1.938
Hungary	2016	0.947

Lastly, we extract the standard deviation of \(f_*\) via the ICM model at new inputs:

hungaryNot.std = predicted$std[predicted$popN==2]

2.2 Fitting single-population model for Hungary

Assume that Hungary doesn’t have 2016 mortality data available, one approach is to fit a single-population model, Ages 70–84 and Years 1990–2015 and perform one-year out forecast for 2016. The \(\color{blue}{\texttt{km()}}\) function in package DiceKriging is employed to fit the model:

mortData = createMortData(year_start=1990,year_end=2015,age_start=70,age_end=84, sex="m",sex_cat="no")
mortData = mortData[country=="Hungary"]

# Single models:

xMort = data.frame(age = mortData$age, 
                   year = mortData$year)
  yMort = mortData$y
  mortSingle_nug = km(formula = ~x.age, 
                      design = data.frame(x = xMort), 
                      response = yMort,
                      nugget.estim=TRUE,
                      covtype="gauss",
                      optim.method="gen",
                      upper = c(45,45),
                      control=list(max.generations=100,pop.size=100,
                                   wait.generations=10,
                                   solution.tolerance=1e-5,
                                   print.level = 0))


optimisation start
------------------
* estimation method   : MLE 
* optimisation method : gen 
* analytical gradient : used
* trend model : ~x.age
* covariance model : 
  - type :  gauss 
  - nugget : unknown homogenous nugget effect 
  - parameters lower bounds :  1e-10 1e-10 
  - parameters upper bounds :  45 45 
  - upper bound for alpha   :  1 
  - best initial criterion value(s) :  626.9

  nug = mortSingle_nug@covariance@nugget 
  mortSingle = km(formula = ~x.age, 
                  design = mortSingle_nug@X, 
                  response = mortSingle_nug@y,
                  noise.var = rep(nug,mortSingle_nug@n), 
                  coef.trend = mortSingle_nug@trend.coef,  
                  coef.cov = mortSingle_nug@covariance@range.val,
                  coef.var = mortSingle_nug@covariance@sd2, 
                  covtype = mortSingle_nug@covariance@name)

The output of the single model for Hungary:

show(mortSingle_nug)


Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort, 
    covtype = "gauss", nugget.estim = TRUE, optim.method = "gen", 
    upper = c(45, 45), control = list(max.generations = 100, 
        pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05, 
        print.level = 0))

Trend  coeff.:
               Estimate
 (Intercept)    -8.6385
       x.age     0.0806

Covar. type  : gauss 
Covar. coeff.:
                Estimate
 theta(x.age)     8.6522
theta(x.year)     6.5035

Variance estimate: 0.01263

Nugget effect estimate: 0.00198

Let’s predict 2016 log-mortality rates for Ages 70–84 and compute SMAPE value:

yearsForecast = 2016
agesForecast  = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)

mortData = createMortData(year_start=2016,year_end=2016,age_start=70,age_end=84, sex="m",sex_cat="no")
observed = mortData[age %in% agesForecast & year %in% yearsForecast 
                    & country=="Hungary"]
observed = observed[order(year)]
  
xPred = data.table(age = rep(agesForecast, nYr),
                   year = rep(yearsForecast, each = nAg))
mortPred = predict(mortSingle, newdata=data.frame(x=xPred),
                   cov.compute=TRUE, se.compute=TRUE,type="UK")
xPred$mean = mortPred$mean
xPred$y = observed$y

# smape:
smape = xPred[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),by=.(year)]

year	smape
2016	1.575

\(\text{SMAPE}_{SOGP}=1.575 > \text{SMAPE}_{MOGP}=0.947\). Recall that lower SMAPE is better, so the joint Estonia-Hungary model is significantly more accurate than the model fitted to Hungary data only. This illustrates how we boost the model’s predictive power by incorporating most recent data from an actuarially-similar population.

The standard deviation of \(f_*\) via single model at new inputs:

hungarySin.std = mortPred$sd

We compare the standard deviation of \(f_*\) at new inputs between two models:

The strong correlation between Estonia and Hungary (\(r_{EST,~HUN} \approx 0.98\)) helps reduce the uncertainty in Hungarian prediction. This is equivalent to increase the forecast credibility in Hungary as the standard deviation of the latent function \(f\) at the new inputs via MOGP model are significantly smaller than ones via SOGP. We recommend adding more data that are highly correlated with Hungary to further observe improvement in the prediction accuracy and credibility.

MOGP for multi-population longevity modeling
Part III - ICM with Notched Data

Mike Ludkovski, UCSB

Nhan Huynh, UCSB

June 08, 2020

1 Packages and user-defined functions

2 Two-population ICM Model

2.1 Fitting 2-population ICM model

2.2 Fitting single-population model for Hungary

MOGP for multi-population longevity modeling Part III - ICM with Notched Data

Mike Ludkovski, UCSB

Nhan Huynh, UCSB

June 08, 2020

1 Packages and user-defined functions

2 Two-population ICM Model

2.1 Fitting 2-population ICM model

2.2 Fitting single-population model for Hungary

MOGP for multi-population longevity modeling
Part III - ICM with Notched Data