This tutorial addresses modeling of mortality across multiple populations using the classical full-rank kernel for factor covariates. We present a case study written in RMarkdown.

1 Packages and user-defined functions

Loading all packages into the current working directory:

library(DiceKriging)
library(rgenoud)
library(kergp) 
library(nloptr)
library(data.table)

DiceKriging: kriging methods for single-output GP (SOGP).
rgenoud: genetic optimization solver used for Maximum Likelihood Estimation of GP hyperparameters.
kergp: kriging methods for user-defined kernel, needed for both Full-rank and ICM Multi-output GP.
nloptr: solve nonlinear optimization using R interface to NLopt.
data.table: extension of data.frame to enhance data manipulation.

Loading all user-defined functions:

source("createMortData.R")
source("covFull.R")
source("gp_prediction.R")

\(\color{blue}{\texttt{createMortData.R}}\): to import datasets and select populations prior to model fitting. We can specify the age groups, calendar years, and whether both Males and Females are modeled together. All the datasets are downloaded from the Human Mortality Database. Currently, we have 16 European countries (seperated by gender) in Death Count and Exposures folders. These countries are:
- Austria
- Belarus
- Czech
- Denmark
- Estonia
- France
- Germany
- Hungary
- Latvia
- Lithuania
- Netherlands
- Poland
- Spain
- Sweden
- Switzerland
- UK
Illustration on the historical evolution of the log- mortality rates for Age 80 from 1984–2016 in 16 European countries, separated by Males (left figure) and Females (right figure). We observe that: Female populations have lower mortality rates than Male groups and within each gender, Western European countries have lower mortality trends than Eastern European countries.

\(\color{blue}{\texttt{covFull.R}}\): to define the full-rank kernel with pre-specified number of populations. Users can directly modify this script according to their own analyses (E.g: define different covariance kernel over Age-Year inputs, adjust the lower and upper bounds of the hyperparameters, rename the hyperparameters, etc.). Let: \[\Gamma_{(l_1,i),(l_2,j)}=\exp{\big[-\theta_{l_1,l_2}\delta_{l_1,l_2}^{ij}\big]}~~\text{where }~l_1,l_2 \in \{1,...,L\},\] with \[\delta_{l_1,l_2}^{ij} = \begin{cases} 1 & i\text{th and } j\text{th observation come from population } l_1 \text{ and } l_2; \\ 0 & \text{otherwise} \end{cases}\] Note that \(\delta_{l_1,l_2}^{ij} = 1_{\{x^i_{l_1} \neq x^j_{l_1}\}}.1_{\{x^i_{l_2} \neq x^j_{l_2}\}}\) is symmetric in \(i\) and \(j\). Then, the covariance between input rows \(x^i\) and \(x^j\) is set as follows: \[\begin{align*} C(x^i,x^j) &:= \eta^2\exp{\Bigg[-\dfrac{(x^i_{ag}-x^j_{ag})^2}{2\theta_{ag}^2}-\dfrac{(x^i_{yr}-x^j_{yr})^2}{2\theta^2_{yr}}\Bigg]}\prod_{\{l_1,l_2\}}\exp{\Big[-\theta_{l_1,l_2}\delta_{l_1,l_2}^{ij}\Big]} \\ & = \begin{cases} \tilde{C}_{i,j} & \text{if observations are from the same population}; \\ \tilde{C}_{i,j}\Gamma_{(l_1,i),(l_2,j)} & \text{if observations are from population } l_1,l_2. \end{cases} \end{align*}\]
- The hyperparameters \(\theta_{ag}\) and \(\theta_{yr}\) are the characteristic lengthscales in Age and Year, respectively. They determine how much influence one observation has on others in Age and Year dimensions.
- The hyperparameter \(\theta_{l_1,l_2}\): large value of \(\theta_{l_1,l_2}\) implies low correlation \(r_{l_1,l_2}:=\exp{(-\theta_{l_1,l_2})}\) between the two populations.

Two important assumptions are made in the above equation:

There is separability between the cross-population covariance and the covariance over the Age-Year inputs.
Observations across \(L\) populations share the same spatial covariance kernel. This assumption is useful to examine the commonality in the mortality across populations via the lengthscales in Age and Year dimensions.

\(\color{blue}{\texttt{gp_prediction.R}}\): to perform prediction. Outputs contain the predicted mean and the posterior variance at new inputs \(\mathbf{x_*}\) for the latent function \(\mathbf{f_*}\) and observed outputs \(\mathbf{y_*}\). The cross-covariance matrix between two sets of locations are also provided.

Lastly, we create an R function to assign integer labels for all populations in a MOGP model. It is equivalent to create dummy variables for a categorical variable.

intCtry = function(ctry){
  # ctry: list of countries (or populations) in the model
  subdt = mortData[country %in% ctry] 
  ctry = ctry[order(ctry)]
  for (i in 1:length(ctry)){
    subdt[country==ctry[i], popN:=i]
  }
  return(subdt)
}

2 Four-population Full-rank MOGP

2.1 Model fitting

We first prepare an aggregated data that combines Danish, French, Swiss, and UK mortality for Male observations, on Ages 70–84 and Years 1990–2016. Then, we assign integer labels for these populations such that: \(\text{Denmark}\equiv 1\), \(\text{France}\equiv 2\), \(\text{Sweden}\equiv 3\), and \(\text{UK}\equiv 4\).

mortData = createMortData(year_start=1990, year_end=2016, 
                          age_start=70, age_end=84, 
                          sex="m", sex_cat="no")
# contain mortality data for all 16 Male populations
ctry = c("Sweden","Denmark","France","UK")
mortData = mortData[country %in% ctry] # select data for 4 populations above
mortData = intCtry(ctry)

The output vector (\(y\)) and input vector (\(X\)) must be provided. We change names of the columns in \(X\) to match the input names defined in \(\color{blue}{\texttt{covFull.R}}\) script.

y = mortData[,y] # output values
X = mortData[,.(age,year,popN)] # input values 
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3") # age: x1,

Prior to fitting the model, we also need to specify the number of populations. In this example, we have 4 populations. The function \(\color{blue}{\texttt{gp()}}\) within kergp package can be called to fit the model.

num_pop = 4 
gpFit <- gp(formula = y ~ X$x1 + as.factor(X$x3), 
            data = data.frame(y, X), 
            inputs = names(X), 
            cov = covFull(num_pop),
            compGrad = FALSE,
            estim = TRUE, 
            noise = TRUE, 
            varNoiseIni = 1.0e-3, 
            varNoiseLower = 1.0e-5, 
            varNoiseUpper = 1.0e-2,
            parCovIni = c(15,15,
                          rep(0.10,ncol(combn(1:num_pop,2))),
                          0.02),
            optimMethod = "NLOPT_LN_COBYLA",
            opts=list("xtol_rel" = 1.0e-5,
                      "check_derivatives_tol" = 1.0e-5,
                      "population" = 150, 
                      # "print_level" = 2, 
                      "maxeval" = 150))

There are several important arguments in \(\color{blue}{\texttt{gp()}}\) function:

\(\color{blue}{\texttt{formula}}\): the left-hand side of ~ is the response name and right-hand side is the trend covariates. Within a multi-population model, we use a linear mean function to take into account the different trends across populations: \[\begin{equation} m(x^n)=\beta_0+\beta_1^{ag}x^n_{ag}+\sum_{l=2}^{L} \beta_{pop,l} x^n_{pop,l}. \end{equation}\] Analogous to the coefficients of categorical covariates in regression, \(\beta_{pop,l}\) can be interpreted as the mean difference between log mortality in population \(l\) and the baseline. It also implies the shared Age structure—mortality rates rising exponentially in \(x_{ag}\) with slope \(\beta_1^{ag}\) in all populations. Note that the trend can be either known or unknown.
\(\color{blue}{\texttt{data}}\): contains the response \(y\) and the inputs \(X\).
\(\color{blue}{\texttt{cov}}\): user-defined covariance kernel. In this case, we define the full-rank kernel through the \(\color{blue}{\texttt{covFull()}}\) function with \(\color{blue}{\texttt{num_pop = 4}}\).
\(\color{blue}{\texttt{estim}}\): logical argument, if \(\color{blue}{\texttt{TRUE}}\): the model parameters are estimated by Maximum Likelihood.
\(\color{blue}{\texttt{noise}}\): logical argument, if \(\color{blue}{\texttt{TRUE}}\): estimate noise in the error term.
\(\color{blue}{\texttt{multistart}}\): different starting points for optimization process. Users are encouraged to load parallel backend (doParallel).
\(\color{blue}{\texttt{opts}}\): controls the optimization properties. Look up \(\color{blue}{\texttt{nloptr::nloptr.get.default.options()}}\) for more information.

The summary output of the fitted model shows the parameters in the mean and covariance function:

summary(gpFit)


Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y, 
    X), inputs = names(X), cov = covFull(num_pop), estim = TRUE, 
    compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001, varNoiseLower = 1e-05, 
    varNoiseUpper = 0.01, parCovIni = c(15, 15, rep(0.1, ncol(combn(1:num_pop, 
        2))), 0.02), optimMethod = "NLOPT_LN_COBYLA", opts = list(xtol_rel = 1e-05, 
        check_derivatives_tol = 1e-05, population = 150, maxeval = 150))


Number of observations: 1620 

Trend coef.:
                     Value
(Intercept)      -10.48524
X$x1               0.10002
as.factor(X$x3)2  -0.11554
as.factor(X$x3)3  -0.15023
as.factor(X$x3)4  -0.02777

Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss 
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "pop12", "pop13", "pop14", "pop23", "pop24", "pop34", "eta2"
o Number of parameters: 9
o Accept matrix inputs.
o Param. values: 
            Value Lower Upper
theta.ag 11.19665 5e+00 30.00
theta.yr  8.13307 5e+00 30.00
pop12     0.18673 1e-06  0.25
pop13     0.15395 1e-06  0.25
pop14     0.14343 1e-06  0.25
pop23     0.20068 1e-06  0.25
pop24     0.19093 1e-06  0.25
pop34     0.16848 1e-06  0.25
eta2      0.03441 1e-06  0.25

Noise variance: 0.001

The mean function can be extracted from the model output:

beta = gpFit$betaHat

Intercept	-10.4852
Age	0.1000
FRA (vs. DEN)	-0.1155
SWE (vs. DEN)	-0.1502
GBR (vs. DEN)	-0.0278

Denmark is chosen to be the baseline in the mean function. Across all four populations, the common linear trend in Age is approximately 0.1, meaning that mortality rate increases by 10% for each additional year of Age.

The covariance \(\Gamma_{l_1,l_2}\) is driven by the hyperparameters \(\theta_{l_1,l_2}\): large value of \(\theta_{l_1,l_2}\) implies low correlation \(r_{l_1,l_2}:=\exp{(-\theta_{l_1,l_2})}\) between the two populations. We first extract the hyperparameters \(\theta_{l_1,l_2}\)s’ and then compute the cross-correlation:

cross.pop = gpFit$covariance@par[3:(length(gpFit$covariance@par)-1)]
r = exp(-cross.pop)

DEN & FRA	0.8297
DEN & SWE	0.8573
DEN & GBR	0.8664
FRA & SWE	0.8182
FRA & GBR	0.8262
SWE & GBR	0.8449

2.2 Mortality improvement factors

In this section, we will illustrate how to derive the distribution of the backward-looking annual improvements for Ages 70–84 in 2016: \[MI_{back}^{obs}(x_{ag};2016)=1-\dfrac{\exp{\big(y(x_{ag};2016)\big)}}{\exp{\big(y(x_{ag};2015)\big)}}=1-\exp{\big(y(x_{ag};2016)-y(x_{ag};2015)\big)}\] with \(y(x_{ag},2016)\) and \(y(x_{ag},2015)\) the raw log-mortality rate for \((x_{ag},2016)\) and \((x_{ag},2015)\). Furthermore, we will compare the smoothed improvement factors and their 95% credible bands derived from the fitted full-rank MOGP model in the previous section in contrast to the single-population model.

Improvement rates via 4-population full-rank MOGP

Let’s first create a test set that contains forecasted Ages in two calendar years 2015 and 2016 for 4 countries: Denmark, France, Sweden, and UK. Then, we apply the function \(\color{blue}{\texttt{gp.predict()}}\) in \(\color{blue}{\texttt{gp_predictionFull.R}}\) to obtain \(\mathbf{f_*}\) (the predicted means) along with the cross-covariance at new input \(\mathbf{x_*}\).

agesForecast = 70:84
year = 2016

t1 = year; t0 = year-1
nAg = length(agesForecast)

xPred = data.frame(x1 = rep(agesForecast,2*num_pop),
                   x2 = rep(c(rep(t0,nAg),rep(t1,nAg)),num_pop),
                   x3 = rep(1:num_pop,each=nAg*2))

We then extract \(\sigma^2_l\) from single-population models. In this case, we want \(\sigma^2_{DEN}\), \(\sigma^2_{FRA}\), \(\sigma^2_{FRA}\), and \(\sigma^2_{GBR}\) resulted from the single-population models, fitted on Male populations in these 4 populations, Ages 70–84 and Years 1990–2016. The nugget_mortData.Rda is the R dataset that contains the estimated \(\sigma^2_l\) in SOGP models for 16 Male populations in this study.

# nugget extracted from individual models:
load("nugget_mortData.Rda")
nugget = nugget[country %in% ctry & nug.year==2016]
nugget = nugget[order(country)]
(list.nug = nugget$nug)

[1] 0.0016012 0.0003384 0.0008631 0.0006554

Finally, we can apply the function \(\color{blue}{\texttt{gp.predict()}}\) in \(\color{blue}{\texttt{gp_prediction.R}}\) for prediction:

# prediction using 4-population rull-rank MOGP
out = gp.predict(newdata = xPred, 
                 gpmodel = gpFit, 
                 list.noise = list.nug, 
                 meanTr = "linearAg", # linear in Age only
                 typePred = "f") # fhat

In this example, we will focus on the improvement rates in Sweden. The results for other countries can be easily replicated. Below, we extract \(\mathbf{f_*}\) and the covariance for new input \(\mathbf{x_*}\) in Sweden.

# extract the predicted mean and variance:
res = out$res
res_Swe = res[res$popN==3,] # results for Sweden

Examples of Swedish prediction, including the predicted means and variances:

	age	year	popN	mean	vr	std
61	70	2015	3	-4.049	1e-04	0.0094
76	70	2016	3	-4.057	1e-04	0.0117
62	71	2015	3	-3.950	1e-04	0.0074
77	71	2016	3	-3.960	1e-04	0.0097

# extract the covariance matrix:
cov = out$covMa
cov_Swe = cov[61:90,61:90] # covariance for Sweden

Examples of the covariance matrix at a few input \(\mathbf{x_*}\) for Sweden:

	(70, 2016)	(71, 2016)	(72, 2016)	(73, 2016)	(74, 2016)
(70, 2015)	1.02e-04	7.78e-05	5.64e-05	3.81e-05	2.32e-05
(71, 2015)	7.78e-05	6.53e-05	5.30e-05	4.12e-05	3.04e-05
(72, 2015)	5.64e-05	5.30e-05	4.81e-05	4.21e-05	3.53e-05
(73, 2015)	3.81e-05	4.12e-05	4.21e-05	4.09e-05	3.78e-05
(74, 2015)	2.32e-05	3.04e-05	3.53e-05	3.78e-05	3.81e-05

We take advantage of GP properties such that a difference of two GPs is a GP with new mean and covariance function (E.g: \(\mathbb{E}(\mu_1-\mu_2)=\mathbb{E}(\mu_1)-\mathbb{E}(\mu_2)\) and \(\mathbb{V}(\mu_1-\mu_2)=\mathbb{V}(\mu_1)+\mathbb{V}(\mu_2)-2Cov(\mu_1,\mu_2)\)). Then, compute the 95% quantiles of a normal distribution with a given mean and variance:

# mean and variance of the difference:
predictedlogMortDiff = data.frame(age = agesForecast, year = rep(year,nAg))
predictedlogMortDiff$m = res_Swe$mean[(nAg+1):(2*nAg)] - res_Swe$mean[1:nAg]
predictedlogMortDiff$sd2 = res_Swe$vr[1:nAg] + res_Swe$vr[(nAg+1):(2*nAg)] - 
                                2*diag(cov_Swe[1:nAg,(nAg+1):(2*nAg)])
# 95% quantiles (lower and upper) given the mean and variance of the difference:
predictedlogMortDiff$lower <- qnorm(0.025, mean=predictedlogMortDiff$m, 
                                    sd=sqrt(predictedlogMortDiff$sd2))
predictedlogMortDiff$upper <- qnorm(0.975, mean=predictedlogMortDiff$m, 
                                    sd=sqrt(predictedlogMortDiff$sd2))

Given the distribution of the difference (follows a normal distribution), we can derive the distribution of a transformed variable, which is a one-on-one function of the difference.

If \(U\) has a normal distribution (\(U \sim \mathcal{N}(m,\sigma^2)\)), then the exponential function of \(U\), \(W=\exp{(U)}\), has a log-normal distribution with \(\mathbb{E}(W)=\exp{\Big(m+\dfrac{\sigma^2}{2}\Big)}\) and \(\mathbb{V}(W)=\exp{(\sigma^2-1)\exp{(2m+\sigma^2)}}\).

# mean and variance of a differnece in log mortality (U)
mu <- predictedlogMortDiff$m; sd2 <- predictedlogMortDiff$sd2

predictedMortDiffFull <- data.frame(age = agesForecast, year = rep(year,nAg))
# mean, variance, and 95% CB of transformed variable: W = 1-exp(W)
predictedMortDiffFull$m <- 1-exp(mu+sd2/2)
predictedMortDiffFull$sd2 <- exp(2*mu+sd2)*(exp(sd2)-1)
predictedMortDiffFull$upper <- 1-exp(predictedlogMortDiff$lower)
predictedMortDiffFull$lower <- 1-exp(predictedlogMortDiff$upper)

3 Single-population Gaussian process (SOGP)

Multi-output GP is the generalized framework based on Single-population Gaussian Process study, written by Michael Ludkovski, Jimmy Risk, and Howard Zail. The paper can be found here with an R notebook publicly available at: https://github.com/jimmyrisk/GPmortalityNotebook.

We employ \(\color{blue}{\texttt{km()}}\) function in the package DiceKriging to fit single-population model for Male observations, Ages 70–84 and Years 1990–2016 in Sweden.

mortData = createMortData(year_start=1990, year_end=2016,
                          age_start=70, age_end=84,
                          sex="m", sex_cat="no")
mortData = mortData[country=="Sweden"]

xMort = data.frame(age = mortData$age, 
                   year = mortData$year)
yMort = mortData$y
# model fitting:
mortSingle_nug = km(formula = ~x.age, 
                              design = data.frame(x = xMort), response = yMort,
                              nugget.estim=TRUE,
                              covtype="gauss",
                              optim.method="gen",
                              upper = c(45,45),
                              control=list(max.generations=100,pop.size=100,
                                           wait.generations=10,
                                           solution.tolerance=1e-5,
                                           print.level = 0))


optimisation start
------------------
* estimation method   : MLE 
* optimisation method : gen 
* analytical gradient : used
* trend model : ~x.age
* covariance model : 
  - type :  gauss 
  - nugget : unknown homogenous nugget effect 
  - parameters lower bounds :  1e-10 1e-10 
  - parameters upper bounds :  45 45 
  - upper bound for alpha   :  1 
  - best initial criterion value(s) :  830.0952

nug = mortSingle_nug@covariance@nugget 
mortSingle = km(formula = ~x.age, 
                     design = mortSingle_nug@X, 
                     response = mortSingle_nug@y,
                     noise.var = rep(nug,mortSingle_nug@n), 
                     coef.trend = mortSingle_nug@trend.coef,  
                     coef.cov = mortSingle_nug@covariance@range.val,
                     coef.var = mortSingle_nug@covariance@sd2, 
                     covtype = mortSingle_nug@covariance@name)

Summary of the fitted SOGP model:

show(mortSingle_nug)


Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort, 
    covtype = "gauss", nugget.estim = TRUE, optim.method = "gen", 
    upper = c(45, 45), control = list(max.generations = 100, 
        pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05, 
        print.level = 0))

Trend  coeff.:
               Estimate
 (Intercept)   -10.9148
       x.age     0.1040

Covar. type  : gauss 
Covar. coeff.:
                Estimate
 theta(x.age)    19.2481
theta(x.year)    26.7457

Variance estimate: 0.09330939

Nugget effect estimate: 0.0008630776

Our goal is to estimate the smoothed improvement rates in Sweden via SOGP model and compare the results with the full-rank MOGP model in the previous section. Similar to what we have done earlier, we first derive the distribution of the difference between two log-mortality variables:

agesForecast = 70:84
year = 2016

t1 = year; t0 = year-1
nAg = length(agesForecast)

xPred <- data.frame(age=rep(agesForecast, 2),year=c(rep(t0,nAg),rep(t1,nAg)))
pred <- predict(mortSingle, newdata = data.frame(x=xPred),cov.compute=TRUE, type="UK")
predictedlogMortDiff <- data.frame(age = agesForecast, year = rep(year,nAg))
predictedlogMortDiff$m <- pred$m[(nAg+1):(2*nAg)] - pred$m[1:nAg]
predictedlogMortDiff$sd2 <- pred$sd[1:nAg]^2 + pred$sd[(nAg+1):(2*nAg)]^2 - 
                            2*diag(pred$cov[1:nAg,(nAg+1):(2*nAg)])

predictedlogMortDiff$lower <- qnorm(0.025, mean=predictedlogMortDiff$m, 
                                    sd=sqrt(predictedlogMortDiff$sd2))
predictedlogMortDiff$upper <- qnorm(0.975, mean=predictedlogMortDiff$m, 
                                    sd=sqrt(predictedlogMortDiff$sd2))

Using variable transfromation, we derive the mean and 95% credible bands for the improvement rate factors:

mu <- predictedlogMortDiff$m; sd2 <- predictedlogMortDiff$sd2
predictedMortDiff <- data.frame(age = agesForecast, year = rep(year,nAg))
predictedMortDiff$m <- 1-exp(mu+sd2/2)
predictedMortDiff$sd2 <- exp(2*mu+sd2)*(exp(sd2)-1)
predictedMortDiff$upper <- 1-exp(predictedlogMortDiff$lower)
predictedMortDiff$lower <- 1-exp(predictedlogMortDiff$upper)

We are now ready to visualize the year-over-year smoothed improvement rate factors for Ages 70–84 in 2016 via two different models: the full-rank MOGP and the SOGP model for Swedish Males, see interactive figure below.

The large lengthscale in Age (\(\theta_{ag}\)) in SOGP model lead to essentially linear improvement rate factors (blue curves). In the four-population model (orange curves), the Age lengthscale decreases, so the improvement rate factors become more Age-dependent.

MOGP for multi-population longevity modeling
Part I - A Full Rank Kernel

Mike Ludkovski, UCSB

Nhan Huynh, UCSB

June 06, 2020

1 Packages and user-defined functions

2 Four-population Full-rank MOGP

2.1 Model fitting

2.2 Mortality improvement factors

3 Single-population Gaussian process (SOGP)

MOGP for multi-population longevity modeling Part I - A Full Rank Kernel

Mike Ludkovski, UCSB

Nhan Huynh, UCSB

June 06, 2020

1 Packages and user-defined functions

2 Four-population Full-rank MOGP

2.1 Model fitting

2.2 Mortality improvement factors

3 Single-population Gaussian process (SOGP)

MOGP for multi-population longevity modeling
Part I - A Full Rank Kernel