This tutorial addresses modeling of mortality across multiple populations using the classical full-rank kernel for factor covariates. We present a case study written in RMarkdown.

1 Packages and user-defined functions

Loading all packages into the current working directory:

  • DiceKriging: kriging methods for single-output GP (SOGP).
  • rgenoud: genetic optimization solver used for Maximum Likelihood Estimation of GP hyperparameters.
  • kergp: kriging methods for user-defined kernel, needed for both Full-rank and ICM Multi-output GP.
  • nloptr: solve nonlinear optimization using R interface to NLopt.
  • data.table: extension of data.frame to enhance data manipulation.

Loading all user-defined functions:

  • \(\color{blue}{\texttt{createMortData.R}}\): to import datasets and select populations prior to model fitting. We can specify the age groups, calendar years, and whether both Males and Females are modeled together. All the datasets are downloaded from the Human Mortality Database. Currently, we have 16 European countries (seperated by gender) in Death Count and Exposures folders. These countries are:
    • Austria
    • Belarus
    • Czech
    • Denmark
    • Estonia
    • France
    • Germany
    • Hungary
    • Latvia
    • Lithuania
    • Netherlands
    • Poland
    • Spain
    • Sweden
    • Switzerland
    • UK
    Illustration on the historical evolution of the log- mortality rates for Age 80 from 1984–2016 in 16 European countries, separated by Males (left figure) and Females (right figure). We observe that: Female populations have lower mortality rates than Male groups and within each gender, Western European countries have lower mortality trends than Eastern European countries.

  • \(\color{blue}{\texttt{covFull.R}}\): to define the full-rank kernel with pre-specified number of populations. Users can directly modify this script according to their own analyses (E.g: define different covariance kernel over Age-Year inputs, adjust the lower and upper bounds of the hyperparameters, rename the hyperparameters, etc.). Let: \[\Gamma_{(l_1,i),(l_2,j)}=\exp{\big[-\theta_{l_1,l_2}\delta_{l_1,l_2}^{ij}\big]}~~\text{where }~l_1,l_2 \in \{1,...,L\},\] with \[\delta_{l_1,l_2}^{ij} = \begin{cases} 1 & i\text{th and } j\text{th observation come from population } l_1 \text{ and } l_2; \\ 0 & \text{otherwise} \end{cases}\] Note that \(\delta_{l_1,l_2}^{ij} = 1_{\{x^i_{l_1} \neq x^j_{l_1}\}}.1_{\{x^i_{l_2} \neq x^j_{l_2}\}}\) is symmetric in \(i\) and \(j\). Then, the covariance between input rows \(x^i\) and \(x^j\) is set as follows: \[\begin{align*} C(x^i,x^j) &:= \eta^2\exp{\Bigg[-\dfrac{(x^i_{ag}-x^j_{ag})^2}{2\theta_{ag}^2}-\dfrac{(x^i_{yr}-x^j_{yr})^2}{2\theta^2_{yr}}\Bigg]}\prod_{\{l_1,l_2\}}\exp{\Big[-\theta_{l_1,l_2}\delta_{l_1,l_2}^{ij}\Big]} \\ & = \begin{cases} \tilde{C}_{i,j} & \text{if observations are from the same population}; \\ \tilde{C}_{i,j}\Gamma_{(l_1,i),(l_2,j)} & \text{if observations are from population } l_1,l_2. \end{cases} \end{align*}\]
    • The hyperparameters \(\theta_{ag}\) and \(\theta_{yr}\) are the characteristic lengthscales in Age and Year, respectively. They determine how much influence one observation has on others in Age and Year dimensions.
    • The hyperparameter \(\theta_{l_1,l_2}\): large value of \(\theta_{l_1,l_2}\) implies low correlation \(r_{l_1,l_2}:=\exp{(-\theta_{l_1,l_2})}\) between the two populations.

  Two important assumptions are made in the above equation:

  • There is separability between the cross-population covariance and the covariance over the Age-Year inputs.
  • Observations across \(L\) populations share the same spatial covariance kernel. This assumption is useful to examine the commonality in the mortality across populations via the lengthscales in Age and Year dimensions.
  • \(\color{blue}{\texttt{gp_prediction.R}}\): to perform prediction. Outputs contain the predicted mean and the posterior variance at new inputs \(\mathbf{x_*}\) for the latent function \(\mathbf{f_*}\) and observed outputs \(\mathbf{y_*}\). The cross-covariance matrix between two sets of locations are also provided.

Lastly, we create an R function to assign integer labels for all populations in a MOGP model. It is equivalent to create dummy variables for a categorical variable.

2 Four-population Full-rank MOGP

2.1 Model fitting

We first prepare an aggregated data that combines Danish, French, Swiss, and UK mortality for Male observations, on Ages 70–84 and Years 1990–2016. Then, we assign integer labels for these populations such that: \(\text{Denmark}\equiv 1\), \(\text{France}\equiv 2\), \(\text{Sweden}\equiv 3\), and \(\text{UK}\equiv 4\).

The output vector (\(y\)) and input vector (\(X\)) must be provided. We change names of the columns in \(X\) to match the input names defined in \(\color{blue}{\texttt{covFull.R}}\) script.

Prior to fitting the model, we also need to specify the number of populations. In this example, we have 4 populations. The function \(\color{blue}{\texttt{gp()}}\) within kergp package can be called to fit the model.

There are several important arguments in \(\color{blue}{\texttt{gp()}}\) function:

  • \(\color{blue}{\texttt{formula}}\): the left-hand side of ~ is the response name and right-hand side is the trend covariates. Within a multi-population model, we use a linear mean function to take into account the different trends across populations: \[\begin{equation} m(x^n)=\beta_0+\beta_1^{ag}x^n_{ag}+\sum_{l=2}^{L} \beta_{pop,l} x^n_{pop,l}. \end{equation}\] Analogous to the coefficients of categorical covariates in regression, \(\beta_{pop,l}\) can be interpreted as the mean difference between log mortality in population \(l\) and the baseline. It also implies the shared Age structure—mortality rates rising exponentially in \(x_{ag}\) with slope \(\beta_1^{ag}\) in all populations. Note that the trend can be either known or unknown.
  • \(\color{blue}{\texttt{data}}\): contains the response \(y\) and the inputs \(X\).
  • \(\color{blue}{\texttt{cov}}\): user-defined covariance kernel. In this case, we define the full-rank kernel through the \(\color{blue}{\texttt{covFull()}}\) function with \(\color{blue}{\texttt{num_pop = 4}}\).
  • \(\color{blue}{\texttt{estim}}\): logical argument, if \(\color{blue}{\texttt{TRUE}}\): the model parameters are estimated by Maximum Likelihood.
  • \(\color{blue}{\texttt{noise}}\): logical argument, if \(\color{blue}{\texttt{TRUE}}\): estimate noise in the error term.
  • \(\color{blue}{\texttt{multistart}}\): different starting points for optimization process. Users are encouraged to load parallel backend (doParallel).
  • \(\color{blue}{\texttt{opts}}\): controls the optimization properties. Look up \(\color{blue}{\texttt{nloptr::nloptr.get.default.options()}}\) for more information.

The summary output of the fitted model shows the parameters in the mean and covariance function:


Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y, 
    X), inputs = names(X), cov = covFull(num_pop), estim = TRUE, 
    compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001, varNoiseLower = 1e-05, 
    varNoiseUpper = 0.01, parCovIni = c(15, 15, rep(0.1, ncol(combn(1:num_pop, 
        2))), 0.02), optimMethod = "NLOPT_LN_COBYLA", opts = list(xtol_rel = 1e-05, 
        check_derivatives_tol = 1e-05, population = 150, maxeval = 150))


Number of observations: 1620 

Trend coef.:
                     Value
(Intercept)      -10.48524
X$x1               0.10002
as.factor(X$x3)2  -0.11554
as.factor(X$x3)3  -0.15023
as.factor(X$x3)4  -0.02777

Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss 
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "pop12", "pop13", "pop14", "pop23", "pop24", "pop34", "eta2"
o Number of parameters: 9
o Accept matrix inputs.
o Param. values: 
            Value Lower Upper
theta.ag 11.19665 5e+00 30.00
theta.yr  8.13307 5e+00 30.00
pop12     0.18673 1e-06  0.25
pop13     0.15395 1e-06  0.25
pop14     0.14343 1e-06  0.25
pop23     0.20068 1e-06  0.25
pop24     0.19093 1e-06  0.25
pop34     0.16848 1e-06  0.25
eta2      0.03441 1e-06  0.25

Noise variance: 0.001

The mean function can be extracted from the model output:

Intercept -10.4852
Age 0.1000
FRA (vs. DEN) -0.1155
SWE (vs. DEN) -0.1502
GBR (vs. DEN) -0.0278

Denmark is chosen to be the baseline in the mean function. Across all four populations, the common linear trend in Age is approximately 0.1, meaning that mortality rate increases by 10% for each additional year of Age.

The covariance \(\Gamma_{l_1,l_2}\) is driven by the hyperparameters \(\theta_{l_1,l_2}\): large value of \(\theta_{l_1,l_2}\) implies low correlation \(r_{l_1,l_2}:=\exp{(-\theta_{l_1,l_2})}\) between the two populations. We first extract the hyperparameters \(\theta_{l_1,l_2}\)s’ and then compute the cross-correlation:

DEN & FRA 0.8297
DEN & SWE 0.8573
DEN & GBR 0.8664
FRA & SWE 0.8182
FRA & GBR 0.8262
SWE & GBR 0.8449

2.2 Mortality improvement factors

In this section, we will illustrate how to derive the distribution of the backward-looking annual improvements for Ages 70–84 in 2016: \[MI_{back}^{obs}(x_{ag};2016)=1-\dfrac{\exp{\big(y(x_{ag};2016)\big)}}{\exp{\big(y(x_{ag};2015)\big)}}=1-\exp{\big(y(x_{ag};2016)-y(x_{ag};2015)\big)}\] with \(y(x_{ag},2016)\) and \(y(x_{ag},2015)\) the raw log-mortality rate for \((x_{ag},2016)\) and \((x_{ag},2015)\). Furthermore, we will compare the smoothed improvement factors and their 95% credible bands derived from the fitted full-rank MOGP model in the previous section in contrast to the single-population model.

Improvement rates via 4-population full-rank MOGP

Let’s first create a test set that contains forecasted Ages in two calendar years 2015 and 2016 for 4 countries: Denmark, France, Sweden, and UK. Then, we apply the function \(\color{blue}{\texttt{gp.predict()}}\) in \(\color{blue}{\texttt{gp_predictionFull.R}}\) to obtain \(\mathbf{f_*}\) (the predicted means) along with the cross-covariance at new input \(\mathbf{x_*}\).

We then extract \(\sigma^2_l\) from single-population models. In this case, we want \(\sigma^2_{DEN}\), \(\sigma^2_{FRA}\), \(\sigma^2_{FRA}\), and \(\sigma^2_{GBR}\) resulted from the single-population models, fitted on Male populations in these 4 populations, Ages 70–84 and Years 1990–2016. The nugget_mortData.Rda is the R dataset that contains the estimated \(\sigma^2_l\) in SOGP models for 16 Male populations in this study.

[1] 0.0016012 0.0003384 0.0008631 0.0006554

Finally, we can apply the function \(\color{blue}{\texttt{gp.predict()}}\) in \(\color{blue}{\texttt{gp_prediction.R}}\) for prediction:

In this example, we will focus on the improvement rates in Sweden. The results for other countries can be easily replicated. Below, we extract \(\mathbf{f_*}\) and the covariance for new input \(\mathbf{x_*}\) in Sweden.

Examples of Swedish prediction, including the predicted means and variances:

age year popN mean vr std
61 70 2015 3 -4.049 1e-04 0.0094
76 70 2016 3 -4.057 1e-04 0.0117
62 71 2015 3 -3.950 1e-04 0.0074
77 71 2016 3 -3.960 1e-04 0.0097

Examples of the covariance matrix at a few input \(\mathbf{x_*}\) for Sweden:

(70, 2016) (71, 2016) (72, 2016) (73, 2016) (74, 2016)
(70, 2015) 1.02e-04 7.78e-05 5.64e-05 3.81e-05 2.32e-05
(71, 2015) 7.78e-05 6.53e-05 5.30e-05 4.12e-05 3.04e-05
(72, 2015) 5.64e-05 5.30e-05 4.81e-05 4.21e-05 3.53e-05
(73, 2015) 3.81e-05 4.12e-05 4.21e-05 4.09e-05 3.78e-05
(74, 2015) 2.32e-05 3.04e-05 3.53e-05 3.78e-05 3.81e-05

We take advantage of GP properties such that a difference of two GPs is a GP with new mean and covariance function (E.g: \(\mathbb{E}(\mu_1-\mu_2)=\mathbb{E}(\mu_1)-\mathbb{E}(\mu_2)\) and \(\mathbb{V}(\mu_1-\mu_2)=\mathbb{V}(\mu_1)+\mathbb{V}(\mu_2)-2Cov(\mu_1,\mu_2)\)). Then, compute the 95% quantiles of a normal distribution with a given mean and variance:

Given the distribution of the difference (follows a normal distribution), we can derive the distribution of a transformed variable, which is a one-on-one function of the difference.

  If \(U\) has a normal distribution (\(U \sim \mathcal{N}(m,\sigma^2)\)), then the exponential function of \(U\), \(W=\exp{(U)}\), has a log-normal distribution with \(\mathbb{E}(W)=\exp{\Big(m+\dfrac{\sigma^2}{2}\Big)}\) and \(\mathbb{V}(W)=\exp{(\sigma^2-1)\exp{(2m+\sigma^2)}}\).

3 Single-population Gaussian process (SOGP)

Multi-output GP is the generalized framework based on Single-population Gaussian Process study, written by Michael Ludkovski, Jimmy Risk, and Howard Zail. The paper can be found here with an R notebook publicly available at: https://github.com/jimmyrisk/GPmortalityNotebook.

We employ \(\color{blue}{\texttt{km()}}\) function in the package DiceKriging to fit single-population model for Male observations, Ages 70–84 and Years 1990–2016 in Sweden.


optimisation start
------------------
* estimation method   : MLE 
* optimisation method : gen 
* analytical gradient : used
* trend model : ~x.age
* covariance model : 
  - type :  gauss 
  - nugget : unknown homogenous nugget effect 
  - parameters lower bounds :  1e-10 1e-10 
  - parameters upper bounds :  45 45 
  - upper bound for alpha   :  1 
  - best initial criterion value(s) :  830.0952 

Summary of the fitted SOGP model:


Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort, 
    covtype = "gauss", nugget.estim = TRUE, optim.method = "gen", 
    upper = c(45, 45), control = list(max.generations = 100, 
        pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05, 
        print.level = 0))

Trend  coeff.:
               Estimate
 (Intercept)   -10.9148
       x.age     0.1040

Covar. type  : gauss 
Covar. coeff.:
                Estimate
 theta(x.age)    19.2481
theta(x.year)    26.7457

Variance estimate: 0.09330939

Nugget effect estimate: 0.0008630776

Our goal is to estimate the smoothed improvement rates in Sweden via SOGP model and compare the results with the full-rank MOGP model in the previous section. Similar to what we have done earlier, we first derive the distribution of the difference between two log-mortality variables:

Using variable transfromation, we derive the mean and 95% credible bands for the improvement rate factors:

We are now ready to visualize the year-over-year smoothed improvement rate factors for Ages 70–84 in 2016 via two different models: the full-rank MOGP and the SOGP model for Swedish Males, see interactive figure below.

 

 

The large lengthscale in Age (\(\theta_{ag}\)) in SOGP model lead to essentially linear improvement rate factors (blue curves). In the four-population model (orange curves), the Age lengthscale decreases, so the improvement rate factors become more Age-dependent.