In Tutorial I, we discussed a full-rank kernel for the MOGP model. However, estimating the cross-covariance kernel requires \(L(L-1)/2\) parameters \(\theta_{l_1,l_2}, 1 \leq l_1 < l_2 \leq L\), which poses challenges in both statistical and computational aspects when modelling a MOGP with many outputs (e.g. \(L \geq 4\)). In this section, we discuss an attractive dimension-reduction approach, the intrinsic coregionalization model (ICM), to keep the number of correlation parameters low.
Loading all packages into the current working directory:

- DiceKriging: kriging methods for single-output GP (SOGP).
- rgenoud: genetic optimization solver used for maximum likelihood estimation of GP hyperparameters.
- kergp: kriging methods with user-defined kernels, needed for both the full-rank and the ICM multi-output GP.
- nloptr: solves nonlinear optimization problems via the R interface to NLopt.
- data.table: extension of data.frame to enhance data manipulation.

Loading all user-defined functions:
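A minimal setup sketch (assuming the packages are installed; the helper-script name below is taken from a later reference in this tutorial and may need adjusting):

```r
# Attach the packages used throughout this tutorial
library(DiceKriging)  # km(): single-output GP kriging
library(rgenoud)      # genetic optimizer for MLE of GP hyperparameters
library(kergp)        # gp(): kriging with user-defined kernels
library(nloptr)       # nonlinear optimization via NLopt
library(data.table)   # enhanced data.frame

# Source the user-defined helpers, e.g.:
# source("covICM.R")  # defines the covICM() kernel constructor
```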
Under ICM, the number of hyperparameters in the cross-population covariance is reduced from \(L(L-1)/2\) to \(Q \times L\). Taking \(Q<L/2\) therefore shrinks the hyperparameter space and alleviates the computational budget compared to the full-rank setup. Notably, the ICM assumption that all \(L\) populations share a common spatial covariance suits our interest in inferring joint lengthscales in the Age and Year dimensions. The computational complexity of ICM is also greatly reduced thanks to the properties of the Kronecker product. Finally, ICM allows the process variance \(\eta^2_l\) to vary by population, i.e., the \(l\)th diagonal element of \(B\) is the process variance for population \(l\). This further bolsters the flexibility of MOGP for out-of-sample forecasting.
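To make the parameter count explicit: with \(A\) the \(L \times Q\) matrix of factor loadings (columns \(a_1,\dots,a_Q\)), the ICM kernel separates the cross-population covariance from the spatial one,
\[\mathrm{Cov}\big(f_l(x),\,f_{l'}(x')\big) = B_{l,l'}\,k(x,x'), \qquad B = AA^{\top} = \sum_{q=1}^{Q} a_q a_q^{\top},\]
so only the \(Q \times L\) entries of \(A\), plus the lengthscales of the shared kernel \(k\), need to be estimated.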
Lastly, we write an R function to create an integer coding for the populations in the MOGP model. This is equivalent to creating dummy variables for a categorical variable.
intCtry = function(ctry){
# ctry: list of countries (or populations) in the model
subdt = mortData[country %in% ctry]
ctry = ctry[order(ctry)] # sort alphabetically so the integer codes are reproducible
for (i in 1:length(ctry)){
subdt[country==ctry[i], popN:=i]
}
return(subdt)
}
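As a base-R illustration of the coding (a toy vector with hypothetical values; no data.table needed), match() against the sorted country list reproduces the same integer codes:

```r
# Toy example: integer coding of a country column
ctry <- sort(c("Sweden", "Denmark"))       # alphabetical: "Denmark", "Sweden"
obs  <- c("Denmark", "Sweden", "Denmark")  # toy country column
popN <- match(obs, ctry)                   # -> 1 2 1
popN
```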
For the remainder of this ICM tutorial, we illustrate several examples of how to fit ICM-MOGP models with different populations. We show that borrowing information from other populations boosts the predictive power for out-of-sample forecasts. Finally, we explain how MOGP models achieve coherence in long-term forecasts and compare the results with SOGP models.
We first prepare an aggregated dataset that combines Danish and Swedish mortality for Males, on Ages 70–84 and Years 1990–2012. Denmark is coded as 1 and Sweden as 2.
mortData = createMortData(year_start=1990, year_end=2012,
age_start=70, age_end=84,
sex="m", sex_cat="no")
# contain mortality data [ages 70--84 & years 1990--2012] for all 16 Male populations
ctry = c("Denmark","Sweden")
mortData = mortData[country %in% ctry]
mortData = intCtry(ctry)
The output vector (\(y\)) and input matrix (\(X\)) must be provided. We rename the columns of \(X\) to match the input names defined in the \(\color{blue}{\texttt{covICM.R}}\) script.
y = mortData[,y] # output values
X = mortData[,.(age,year,popN)] # input values
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3")
Prior to fitting the model, we also need to specify the number of populations and the choice of rank. In this example, we have 2 populations (Danish and Swedish Males) and the rank is 2. The \(\color{blue}{\texttt{gp()}}\) function in the kergp package can be called to fit the model.
rank = 2
num_pop = 2
gpFit <- gp(formula = y ~ X$x1 + as.factor(X$x3), # mean function (linear in Age)
data = data.frame(y, X),
inputs = names(X),
cov = covICM(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.15,rank*num_pop)),
# multistart = 4,
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 120,
# "print_level" = 2,
"maxeval" = 150))
There are several important arguments in the \(\color{blue}{\texttt{gp()}}\) function: the user-defined kernel (\(\color{blue}{\texttt{cov}}\)), the bounds on the noise variance (\(\color{blue}{\texttt{varNoiseLower}}\), \(\color{blue}{\texttt{varNoiseUpper}}\)), the initial covariance parameters (\(\color{blue}{\texttt{parCovIni}}\)), the optimizer (\(\color{blue}{\texttt{optimMethod}}\)), and the commented-out \(\color{blue}{\texttt{multistart}}\) option, which runs multiple optimization restarts and can be parallelized (e.g. via doParallel).

The summary output of the fitted model contains parameters in both the mean and the covariance kernel:
Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y,
X), inputs = names(X), cov = covICM(rank, num_pop), estim = TRUE,
compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001, varNoiseLower = 1e-05,
varNoiseUpper = 0.01, parCovIni = c(10, 10, rep(0.15, rank *
num_pop)), optimMethod = "NLOPT_LN_COBYLA", opts = list(xtol_rel = 1e-05,
check_derivatives_tol = 1e-05, population = 120, maxeval = 150))
Number of observations: 690
Trend coef.:
Value
(Intercept) -11.22364
X$x1 0.10730
as.factor(X$x3)2 0.07649
Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "a11", "a21", "a12", "a22"
o Number of parameters: 6
o Accept matrix inputs.
o Param. values:
Value Lower Upper
theta.ag 22.40037 1e+00 30
theta.yr 23.46850 1e+00 30
a11 0.33739 1e-06 1
a21 0.17090 1e-06 1
a12 0.05628 1e-06 1
a22 0.24297 1e-06 1
Noise variance: 0.001
From the fitted ICM covariance, we can extract the factor loadings and compute the cross-covariance matrix \(B\):
a.loadings <- gpFit$covariance@par[3:length(gpFit$covariance@par)] # drop the two lengthscales
group.loadings <- split(a.loadings,ceiling(seq_along(a.loadings)/num_pop)) # one loading vector a_q per rank component
B <- Reduce("+",lapply(group.loadings,function(x) x%*%t(x))) # B = sum_q a_q t(a_q)
rownames(B) <- ctry; colnames(B) <- ctry
B
Denmark Sweden
Denmark 0.11700 0.07133
Sweden 0.07133 0.08824
We can infer cross-correlation matrix \(R\) from \(B\):
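A sketch of this step with base R's cov2cor(), reconstructing \(B\) from the rounded values printed above:

```r
# Cross-covariance B (rounded values from the fitted model above)
B <- matrix(c(0.11700, 0.07133, 0.07133, 0.08824), nrow = 2,
            dimnames = list(c("Denmark", "Sweden"), c("Denmark", "Sweden")))
R <- cov2cor(B)  # scale covariance to unit diagonal
round(R, 4)
```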
Denmark Sweden
Denmark 1.0000 0.7021
Sweden 0.7021 1.0000
Thus, the correlation between Denmark and Sweden is: \(r_{DEN,~SWE} \approx 0.70\).
Out-of-sample prediction via Denmark-Sweden ICM
We first create a test set to predict log-mortality for Age group 70–84 in year 2013 (one-year out), 2015 (3-year out) and 2016 (4-year out):
yearsForecast = c(2013, 2015, 2016)
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
npop = num_pop
xPred = data.frame(x1 = rep(agesForecast, npop * nYr),
x2 = rep(rep(yearsForecast, each = nAg), npop),
x3 = rep(1:npop, each = nYr * nAg))
# 1 is Denmark and 2 is Sweden
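The same grid can be built more compactly with base R's expand.grid(), whose first factor varies fastest — matching the rep() layout above:

```r
yearsForecast <- c(2013, 2015, 2016)
agesForecast <- 70:84
npop <- 2
# Age varies fastest, then Year, then population
xPred2 <- expand.grid(x1 = agesForecast, x2 = yearsForecast, x3 = 1:npop)
nrow(xPred2)  # 90 rows = 15 ages x 3 years x 2 populations
```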
We then extract \(\sigma^2_l\) from single-population models. In this case, we want \(\sigma^2_{DEN}\) and \(\sigma^2_{SWE}\) resulting from two single-population models fitted on the Male populations in Denmark and Sweden, Ages 70–84 and Years 1990–2012. For convenience, nugget_mortData.Rda is an R dataset that contains the estimated \(\sigma^2_l\) from SOGP models fitted on Ages 70–84 and different periods for the 16 Male populations in this study.
load("nugget_mortData.Rda")
nugget = nugget[country %in% ctry & nug.year==2012]
nugget = nugget[order(country)]
(list.nug = nugget$nug)
[1] 0.0015163 0.0008022
Finally, we can apply the user-defined function \(\color{blue}{\texttt{gp.predict()}}\) in the \(\color{blue}{\texttt{gp_predictionICM.Rda}}\) script for prediction:
out = gp.predict(newdata = xPred,
gpmodel = gpFit,
list.noise = list.nug,
meanTr = "linearAg", # linear in Age only
typePred = "fnoise") # yhat and std(y)
# if typePred != "fnoise", variance and std of latent function f
predicted = out$res
The table below displays a few examples of the predictions on the test set for Denmark (\(popN=1\)) and Sweden (\(popN=2\)): the predicted mean, posterior variance, and standard deviation (cross-covariances are not shown). Users have the option to estimate the variance of either the underlying function \(f_*(\cdot)\) or the observed values \(y_*\); here we show the variance (and standard deviation) of \(y_*\):
| | age | year | popN | mean | vr | std |
|---|---|---|---|---|---|---|
| 1 | 70 | 2013 | 1 | -3.822 | 0.0017 | 0.0413 |
| 2 | 71 | 2013 | 1 | -3.713 | 0.0017 | 0.0408 |
| 46 | 70 | 2013 | 2 | -4.039 | 0.0009 | 0.0303 |
| 47 | 71 | 2013 | 2 | -3.928 | 0.0009 | 0.0299 |
To assess model performance, we can compute the symmetric mean absolute percentage error (SMAPE):
# data with observed values:
mortData = createMortData(year_start=1990, year_end=2016, age_start=70, age_end=84,
sex="m", sex_cat="no")
yearsForecast = c(2013,2015,2016)
agesForecast = 70:84
observed= mortData[age %in% agesForecast & year %in% yearsForecast & country %in% ctry,]
observed = intCtry(ctry)
# merge observed data with predicted one:
compareGP = as.data.table(merge(predicted, observed, all.x=T, by=c("age","year","popN")))
smape = compareGP[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),
by=.(country,year)]
| country | year | smape |
|---|---|---|
| Denmark | 2013 | 1.4350 |
| Sweden | 2013 | 0.8248 |
| Denmark | 2015 | 1.2842 |
| Sweden | 2015 | 1.0939 |
| Denmark | 2016 | 1.1924 |
| Sweden | 2016 | 0.8971 |
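As a sanity check on the SMAPE formula above, a toy computation on hypothetical log-mortality values:

```r
# SMAPE between hypothetical observed and predicted log-mortality values
obs  <- c(-3.80, -3.70, -4.00)
pred <- c(-3.85, -3.68, -3.95)
smape <- mean(200 * abs(obs - pred) / (abs(obs) + abs(pred)))
round(smape, 4)
```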
Next, we employ the \(\color{blue}{\texttt{km()}}\) function in the DiceKriging package to fit single-population models for Male observations in Denmark and Sweden, Ages 70–84 and Years 1990–2012.
mortData = createMortData(year_start=1990,year_end=2012,age_start=70,age_end=84,
sex="m",sex_cat="no")
mortData = mortData[country %in% ctry]
# Single models:
countryUnq = unique(mortData$country)
mortSingle_nug = list()
nug = list()
mortSingle = list()
for (i in 1:length(countryUnq))
{
xMort = data.frame(age = mortData$age[mortData$country==countryUnq[i]],
year = mortData$year[mortData$country==countryUnq[i]])
yMort = mortData$y[mortData$country==countryUnq[i]]
# Stage 1: estimate a homoscedastic nugget by MLE
mortSingle_nug[[i]] = km(formula = ~x.age,
design = data.frame(x = xMort), response = yMort,
nugget.estim=TRUE,
covtype="gauss",
optim.method="gen",
upper = c(45,45),
control=list(max.generations=100,pop.size=100,
wait.generations=10,
solution.tolerance=1e-5,
print.level = 0))
show(mortSingle_nug[[i]])
nug[i] = mortSingle_nug[[i]]@covariance@nugget
# Stage 2: refit with the estimated nugget fixed as known noise variance
mortSingle[[i]] = km(formula = ~x.age,
design = mortSingle_nug[[i]]@X,
response = mortSingle_nug[[i]]@y,
noise.var = rep(nug[i],mortSingle_nug[[i]]@n),
coef.trend = mortSingle_nug[[i]]@trend.coef,
coef.cov = mortSingle_nug[[i]]@covariance@range.val,
coef.var = mortSingle_nug[[i]]@covariance@sd2,
covtype = mortSingle_nug[[i]]@covariance@name)
}
optimisation start
------------------
* estimation method : MLE
* optimisation method : gen
* analytical gradient : used
* trend model : ~x.age
* covariance model :
- type : gauss
- nugget : unknown homogenous nugget effect
- parameters lower bounds : 1e-10 1e-10
- parameters upper bounds : 45 45
- upper bound for alpha : 1
- best initial criterion value(s) : 609.3
Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort,
covtype = "gauss", nugget.estim = TRUE, optim.method = "gen",
upper = c(45, 45), control = list(max.generations = 100,
pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05,
print.level = 0))
Trend coeff.:
Estimate
(Intercept) -10.5624
x.age 0.0984
Covar. type : gauss
Covar. coeff.:
Estimate
theta(x.age) 30.9560
theta(x.year) 19.5353
Variance estimate: 0.122
Nugget effect estimate: 0.001516
optimisation start
------------------
* estimation method : MLE
* optimisation method : gen
* analytical gradient : used
* trend model : ~x.age
* covariance model :
- type : gauss
- nugget : unknown homogenous nugget effect
- parameters lower bounds : 1e-10 1e-10
- parameters upper bounds : 45 45
- upper bound for alpha : 1
- best initial criterion value(s) : 713.4
Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort,
covtype = "gauss", nugget.estim = TRUE, optim.method = "gen",
upper = c(45, 45), control = list(max.generations = 100,
pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05,
print.level = 0))
Trend coeff.:
Estimate
(Intercept) -11.2168
x.age 0.1086
Covar. type : gauss
Covar. coeff.:
Estimate
theta(x.age) 19.4663
theta(x.year) 10.7145
Variance estimate: 0.04152
Nugget effect estimate: 0.0008022
We apply the two single-population GP models fitted above to make out-of-sample predictions for Ages 70–84 in Years 2013, 2015, and 2016. Then we compute SMAPE values and compare the results with the MOGP model.
yearsForecast = c(2013, 2015, 2016)
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
for (i in 1:length(countryUnq)){
mortData = createMortData(year_start=2013,year_end=2016,age_start=70,age_end=84,
sex="m",sex_cat="no")
observed = mortData[age %in% agesForecast & year %in% yearsForecast
& country==countryUnq[i]]
observed = observed[order(year)]
xPred = data.table(age = rep(agesForecast, nYr),
year = rep(yearsForecast, each = nAg))
mortPred = predict(mortSingle[[i]], newdata=data.frame(x=xPred),
cov.compute=TRUE, se.compute=TRUE,type="UK")
xPred$mean = mortPred$mean
xPred$y = observed$y
# smape:
print(countryUnq[i])
smape = xPred[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),by=.(year)]
print(smape)
}
[1] "Denmark"
year smape
1: 2013 1.580
2: 2015 1.344
3: 2016 1.258
[1] "Sweden"
year smape
1: 2013 1.046
2: 2015 1.977
3: 2016 2.533
The MOGP forecast is more accurate (smaller SMAPE) than the SOGP forecasts for both populations.
Visualization of the prediction performance
To compare the prediction performance between the single-population models and the 2-population ICM, one way is to visualize the predicted values together with their 95% credible bands. Here, we illustrate how to compare the results for Sweden.
Prediction and 95% credible bands via Denmark-Sweden ICM:
yearsForecast = 1990:2020
agesForecast = c(70,84)
nYr = length(yearsForecast)
nAg = length(agesForecast)
npop = num_pop
xPred = data.frame(x1 = rep(agesForecast, each = nYr),
x2 = rep(yearsForecast, nAg),
x3 = rep(1:npop, each = nYr*nAg))
out = gp.predict(newdata = xPred,
gpmodel = gpFit,
list.noise = list.nug,
meanTr = "linearAg",
typePred = "fnoise")
predictedJ = out$res
predictedJ$l95 = predictedJ$mean - 1.96*predictedJ$std
predictedJ$u95 = predictedJ$mean + 1.96*predictedJ$std
predictedJ$rate = exp(predictedJ$mean)
predictedJ$l95r = exp(predictedJ$l95)
predictedJ$u95r = exp(predictedJ$u95)
Prediction and 95% credible bands via single-population GP for Sweden:
yearsForecast = 1990:2020
agesForecast = c(70,84)
nYr = length(yearsForecast)
nAg = length(agesForecast)
swe = data.frame(age = rep(agesForecast, each=nYr),
year= rep(yearsForecast, nAg))
mortPred = predict(mortSingle[[2]],
newdata=data.frame(x=swe),
cov.compute=TRUE, se.compute=TRUE,type="UK")
swe$m = mortPred$mean
y_sd = sqrt((mortPred$sd ^ 2) + nug[[2]])
swe$lower95 = mortPred$mean - 1.96 * y_sd # predictive CI for y
swe$upper95 = mortPred$mean + 1.96 * y_sd
swe$rate = exp(swe$m)
swe$l95r = exp(swe$lower95)
swe$u95r = exp(swe$upper95)
The interactive plots visualizing the predicted means and their 95% credible bands for Ages 70 and 84 are below (R code for non-interactive plots is also provided). Note that the forecast period covers both in-sample (1990–2012) and out-of-sample (2013–2020) years.
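One possible non-interactive version with base graphics (a sketch assuming the objects predictedJ and swe computed above; the column names age, year, and popN in predictedJ are taken from the prediction tables shown earlier):

```r
# Sweden, Age 70: joint ICM (blue) vs. single-population GP (red)
subJ <- predictedJ[predictedJ$popN == 2 & predictedJ$age == 70, ]
plot(subJ$year, subJ$mean, type = "l", col = "blue",
     ylim = range(subJ$l95, subJ$u95),
     xlab = "Year", ylab = "log-mortality")
lines(subJ$year, subJ$l95, lty = 2, col = "blue")
lines(subJ$year, subJ$u95, lty = 2, col = "blue")
subS <- swe[swe$age == 70, ]
lines(subS$year, subS$m, col = "red")
lines(subS$year, subS$lower95, lty = 2, col = "red")
lines(subS$year, subS$upper95, lty = 2, col = "red")
abline(v = 2012.5, lty = 3)  # boundary between in-sample and out-of-sample
```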
The model predictions start to diverge for \(x_{yr} \geq 2013\). The predicted curves produced by the joint model are closer to the observed values over the test period 2013–2016. We also observe that the posterior variance in the MOGP is smaller than that of the SOGP model (the 95% credible bands are narrower for the MOGP).
Similar to Example 1, we create an aggregated mortality data to combine Danish Males and Females, on Ages 70–84 and Years 1990–2012. Females are coded as 0 and Males are coded as 1.
mortData = createMortData(year_start=1990, year_end=2012, age_start=70, age_end=84,
sex="m", sex_cat="yes")
mortData = mortData[country=="Denmark"]
mortData = mortData[order(sex)]
# sex=0: Females and 1: Males
y = mortData[,y] # output values
X = mortData[,.(age,year,sex)] # input values
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3")
Fitting ICM model for 2 populations and rank 2:
rank = 2
num_pop = 2
gpFit <- gp(formula = y ~ X$x1 + as.factor(X$x3), # mean function (linear in Age)
data = data.frame(y, X),
inputs = names(X),
cov = covICM(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.15,rank*num_pop)),
# multistart = 4,
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 120,
# "print_level" = 2,
"maxeval" = 150))
summary(gpFit)
Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y,
X), inputs = names(X), cov = covICM(rank, num_pop), estim = TRUE,
compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001, varNoiseLower = 1e-05,
varNoiseUpper = 0.01, parCovIni = c(10, 10, rep(0.15, rank *
num_pop)), optimMethod = "NLOPT_LN_COBYLA", opts = list(xtol_rel = 1e-05,
check_derivatives_tol = 1e-05, population = 120, maxeval = 150))
Number of observations: 690
Trend coef.:
Value
(Intercept) -11.0492
X$x1 0.1008
as.factor(X$x3)1 0.4304
Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "a11", "a21", "a12", "a22"
o Number of parameters: 6
o Accept matrix inputs.
o Param. values:
Value Lower Upper
theta.ag 15.01363 1e+00 30
theta.yr 11.69492 1e+00 30
a11 0.33616 1e-06 1
a21 0.09877 1e-06 1
a12 0.01610 1e-06 1
a22 0.15687 1e-06 1
Noise variance: 0.001
We can extract the factor loadings and compute the cross-covariance \(B\):
a.loadings <- gpFit$covariance@par[3:length(gpFit$covariance@par)]
group.loadings <- split(a.loadings,ceiling(seq_along(a.loadings)/num_pop))
B <- Reduce("+",lapply(group.loadings,function(x) x%*%t(x)))
rownames(B) <- c("Females","Males"); colnames(B) <- c("Females","Males")
B
Females Males
Females 0.11326 0.03573
Males 0.03573 0.03436
The cross-correlation \(R\) is:
Females Males
Females 1.0000 0.5727
Males 0.5727 1.0000
The correlation between Danish males and females is: \(r_{F,~M} \approx 0.57\).
Out-of-sample prediction via Danish Males-Females ICM
We create a test set to predict log-mortality for Ages 70–84 in Year 2013, 2015, and 2016:
yearsForecast = c(2013, 2015, 2016)
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
xPred = data.frame(x1 = rep(agesForecast, npop * nYr),
x2 = rep(rep(yearsForecast, each = nAg), npop),
x3 = rep(0:1, each = nYr * nAg))
For prediction, we need to provide the nuggets from single-population models fitted on the Female and Male populations in Denmark. We then apply the \(\color{blue}{\texttt{gp.predict()}}\) function:
list.nug = c(0.001487166, 0.001516271)
out = gp.predict(newdata = xPred,
gpmodel = gpFit,
list.noise = list.nug,
meanTr = "linearAg", # linear in Age only
typePred = "fnoise") # yhat and std(y)
# if typePred != "fnoise", variance and std of latent function f
predicted = out$res
Below, the table displays examples of the predictions on the test set for Danish Females (\(popN=0\)) and Danish Males (\(popN=1\)):
| | age | year | popN | mean | vr | std |
|---|---|---|---|---|---|---|
| 1 | 70 | 2013 | 0 | -4.288 | 0.0019 | 0.0437 |
| 2 | 71 | 2013 | 0 | -4.163 | 0.0018 | 0.0425 |
| 46 | 70 | 2013 | 1 | -3.808 | 0.0018 | 0.0425 |
| 47 | 71 | 2013 | 1 | -3.701 | 0.0017 | 0.0418 |
Compute SMAPE values:
# data with observed values:
mortData = createMortData(year_start=1990, year_end=2016, age_start=70, age_end=84,
sex="m", sex_cat="yes")
yearsForecast = c(2013,2015,2016)
agesForecast = 70:84
observed = mortData[age %in% agesForecast & year %in% yearsForecast & country=="Denmark",]
names(observed)[which(names(observed)=="sex")] = "popN"
# merge observed data with predicted one:
compareGP = as.data.table(merge(predicted, observed, all.x=T, by=c("age","year","popN")))
smape = compareGP[,.(smape = round(mean(200*abs(y-mean)/(abs(y)+abs(mean))),4)),
by=.(popN,year)]
| popN | year | smape |
|---|---|---|
| 0 | 2013 | 0.8488 |
| 1 | 2013 | 1.5697 |
| 0 | 2015 | 1.7786 |
| 1 | 2015 | 1.2976 |
| 0 | 2016 | 1.2391 |
| 1 | 2016 | 1.2046 |
We use the \(\color{blue}{\texttt{km()}}\) function in the DiceKriging package to fit single-population models for Males and Females in Denmark, Ages 70–84 and Years 1990–2012:
mortData = createMortData(year_start=1990,year_end=2012,age_start=70,age_end=84,
sex="m",sex_cat="yes")
mortData = mortData[country=="Denmark"]
# Single models:
mortSingle_nug = list()
nug = list()
mortSingle = list()
for (i in 1:2){
xMort = data.frame(age = mortData$age[mortData$sex==(i-1)],
year = mortData$year[mortData$sex==(i-1)])
yMort = mortData$y[mortData$sex==(i-1)]
mortSingle_nug[[i]] = km(formula = ~x.age,
design = data.frame(x = xMort), response = yMort,
nugget.estim=TRUE,
covtype="gauss",
optim.method="gen",
upper = c(45,45),
control=list(max.generations=100,pop.size=100,
wait.generations=10,
solution.tolerance=1e-5,
print.level = 0))
show(mortSingle_nug[[i]])
nug[i] = mortSingle_nug[[i]]@covariance@nugget
mortSingle[[i]] = km(formula = ~x.age,
design = mortSingle_nug[[i]]@X,
response = mortSingle_nug[[i]]@y,
noise.var = rep(nug[i],mortSingle_nug[[i]]@n),
coef.trend = mortSingle_nug[[i]]@trend.coef,
coef.cov = mortSingle_nug[[i]]@covariance@range.val,
coef.var = mortSingle_nug[[i]]@covariance@sd2,
covtype = mortSingle_nug[[i]]@covariance@name)
}
optimisation start
------------------
* estimation method : MLE
* optimisation method : gen
* analytical gradient : used
* trend model : ~x.age
* covariance model :
- type : gauss
- nugget : unknown homogenous nugget effect
- parameters lower bounds : 1e-10 1e-10
- parameters upper bounds : 45 45
- upper bound for alpha : 1
- best initial criterion value(s) : 601.6
Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort,
covtype = "gauss", nugget.estim = TRUE, optim.method = "gen",
upper = c(45, 45), control = list(max.generations = 100,
pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05,
print.level = 0))
Trend coeff.:
Estimate
(Intercept) -11.6755
x.age 0.1095
Covar. type : gauss
Covar. coeff.:
Estimate
theta(x.age) 11.5563
theta(x.year) 9.5257
Variance estimate: 0.04287
Nugget effect estimate: 0.001487
optimisation start
------------------
* estimation method : MLE
* optimisation method : gen
* analytical gradient : used
* trend model : ~x.age
* covariance model :
- type : gauss
- nugget : unknown homogenous nugget effect
- parameters lower bounds : 1e-10 1e-10
- parameters upper bounds : 45 45
- upper bound for alpha : 1
- best initial criterion value(s) : 608.6
Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort,
covtype = "gauss", nugget.estim = TRUE, optim.method = "gen",
upper = c(45, 45), control = list(max.generations = 100,
pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05,
print.level = 0))
Trend coeff.:
Estimate
(Intercept) -10.5622
x.age 0.0984
Covar. type : gauss
Covar. coeff.:
Estimate
theta(x.age) 30.9379
theta(x.year) 19.5254
Variance estimate: 0.1218
Nugget effect estimate: 0.001516
Compute SMAPE values for out-of-sample prediction:
yearsForecast = c(2013, 2015, 2016)
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
for (i in 1:2){
mortData = createMortData(year_start=2013,year_end=2016,age_start=70,age_end=84,
sex="m",sex_cat="yes")
observed = mortData[age %in% agesForecast & year %in% yearsForecast
& country=="Denmark" & sex==(i-1)]
observed = observed[order(year)]
xPred = data.table(age = rep(agesForecast, nYr),
year = rep(yearsForecast, each = nAg))
mortPred = predict(mortSingle[[i]], newdata=data.frame(x=xPred),
cov.compute=TRUE, se.compute=TRUE,type="UK")
xPred$mean = mortPred$mean
xPred$y = observed$y
# smape:
print(paste("sex = ",i-1))
smape = xPred[,.(smape = round(mean(200*abs(y-mean)/(abs(y)+abs(mean))),4)),by=.(year)]
print(smape)
}
[1] "sex = 0"
year smape
1: 2013 0.9422
2: 2015 1.8972
3: 2016 1.4004
[1] "sex = 1"
year smape
1: 2013 1.580
2: 2015 1.344
3: 2016 1.258
Again, the MOGP model produces more accurate predictions than the SOGP models for both Males and Females in Denmark.
Coherent mortality forecasts
Fitting GP models to individual populations tends to generate divergence in long-range forecasts that is inconsistent with historical patterns. To illustrate, we apply the SOGP and MOGP models to out-of-sample forecasts at Age 70, and then compute the log-mortality difference between Danish Males and Females.
Difference in log-mortality between Males and Females for Age 70, projected from 1990–2050, based on two SOGP models:
yearsForecast = 1990:2050
agesForecast = 70:84
nAg = length(agesForecast)
nYr = length(yearsForecast)
Fem = matrix(rep(NA, nYr * nAg), ncol = nAg, nrow = nYr)
Mal = matrix(rep(NA, nYr * nAg), ncol = nAg, nrow = nYr)
for (i in 1:nAg){
agesPred = agesForecast[i]
nYr = length(yearsForecast)
nAg = length(agesPred)
xPred = data.table(age=rep(agesPred,nYr),year=rep(yearsForecast,nAg))
mortPred = predict(mortSingle[[1]],newdata=data.frame(x=xPred),
cov.compute=TRUE, se.compute=TRUE, type="UK")
Fem[,i] = mortPred$mean
xPred = data.table(age=rep(agesPred,nYr),year=rep(yearsForecast,nAg))
mortPred = predict(mortSingle[[2]],newdata=data.frame(x=xPred),
cov.compute=TRUE, se.compute=TRUE, type="UK")
Mal[,i] = mortPred$mean
}
DiffLog70 = round(Mal-Fem,3)
Difference in log-mortality between Males and Females for Age 70, projected from 1990–2050, based on MOGP model:
yearsForecast = 1990:2050
agesForecast = 70:84
nAg = length(agesForecast)
nYr = length(yearsForecast)
FemJ = matrix(rep(NA, nYr * nAg), ncol = nAg, nrow = nYr)
MalJ = matrix(rep(NA, nYr * nAg), ncol = nAg, nrow = nYr)
for (i in 1:nAg){
agesPred = agesForecast[i]
nYr = length(yearsForecast)
nAg = length(agesPred)
npop = num_pop
xPred = data.frame(x1 = rep(agesPred, npop * nYr),
x2 = rep(rep(yearsForecast, each = nAg), npop),
x3 = rep(0:1, each = nYr * nAg))
list.nug = c(0.001487166, 0.001516271)
out = gp.predict(newdata = xPred,
gpmodel = gpFit,
list.noise = list.nug,
meanTr = "linearAg", # linear in Age only
typePred = "fnoise") # yhat and std(y)
FemJ[,i] = out$res$mean[out$res$popN==0]
MalJ[,i] = out$res$mean[out$res$popN==1]
}
DiffLog70J = round(MalJ-FemJ,3)
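A non-interactive sketch of such a heatmap using base R's image() (assumes DiffLog70 and DiffLog70J computed above):

```r
# Projected Male-minus-Female log-mortality gap from the MOGP model;
# positive entries mean Males have higher mortality than Females
image(x = yearsForecast, y = agesForecast, z = DiffLog70J,
      xlab = "Year", ylab = "Age")
abline(v = 2012.5, lty = 2)  # end of the training period
```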
The first heatmap displays the differences via the SOGP models: it implies that, as early as 2030, Males will have lower mortality than Females. In contrast, the second heatmap shows the MOGP forecasted differences, which are coherent: Females are projected to maintain lower mortality than Males.
In this section, we expand the ICM framework to fit many more populations. Specifically, we build a 7-population ICM with Rank 3, on Ages 70–84 and Years 1990–2016 for Male observations. Three scenarios with different assumptions on the mortality improvement factors are fitted.
Mortality Improvement Factors measure longevity changes year-over-year. In terms of the observations, the raw annual percentage mortality improvement is \(1-\dfrac{\exp{\big(y(x_{ag};x_{yr})\big)}}{\exp{\big(y(x_{ag};x_{yr}-1)\big)}}\). The smoothed improvement factor is obtained by replacing the \(y\)'s by the GP model posterior means \(m_*\): \[1-\dfrac{\exp{\big(m_*(x_{ag};x_{yr})\big)}}{\exp{\big(m_*(x_{ag};x_{yr}-1)\big)}}\]
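A toy base-R computation of the raw improvement factor (hypothetical log-mortality values):

```r
# Hypothetical log-mortality for the same age in consecutive years
y_prev <- -3.80  # y(x_ag; x_yr - 1)
y_curr <- -3.83  # y(x_ag; x_yr)
impr <- 1 - exp(y_curr) / exp(y_prev)  # equals 1 - exp(y_curr - y_prev)
round(impr, 4)  # roughly a 3% annual improvement
```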
Prior to fitting, we prepare the aggregated data from 7 countries: Austria, Denmark, Germany, Netherlands, Sweden, Switzerland, and UK.
mortData = createMortData(year_start=1990, year_end=2016, age_start=70, age_end=84,
sex="m", sex_cat="no")
ctry = c("Austria","Denmark","Germany","Netherlands","Sweden","Switzerland","UK")
mortData = mortData[country %in% ctry]
mortData = intCtry(ctry)
y = mortData[,y]
X = mortData[,.(age,year,popN)]
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3")
Case I: Zero long-term mortality improvement, captured by the linear mean function: \[m(x^n)=\beta_0+\beta^{ag}_1x_{ag}^n+\sum_{l=2}^L\beta_{pop,l}\,x_{pop,l}^n\]
rank = 3
num_pop = 7
gpFitI <- gp(formula = y ~ X$x1 + as.factor(X$x3), # mean function (linear in Age)
data = data.frame(y, X),
inputs = names(X),
cov = covICM(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.10,rank*num_pop)),
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 150,
# "print_level" = 2,
"maxeval" = 150))
Case II: Long-term mortality improvement based on a historical pattern, captured by: \[m(x^n)=\beta_0 + \beta^{ag}_1x_{ag}^n+\beta^{yr}_1x_{yr}^n + \sum_{l=2}^L \beta_{pop,l}x_{pop,l}^n\]
gpFitII <- gp(formula = y ~ X$x1 + X$x2 + as.factor(X$x3), # linear in Age and Yr
data = data.frame(y, X),
inputs = names(X),
cov = covICM(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.10,rank*num_pop)),
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 150,
# "print_level" = 2,
"maxeval" = 150))
Case III: Long-term mortality improvement based on expert judgement. We again use \(m(x^n)=\beta_0 + \beta^{ag}_1x_{ag}^n+\beta^{yr}_1x_{yr}^n + \sum_{l=2}^L \beta_{pop,l}x_{pop,l}^n\), but this time the \(\beta_1^{yr}\) coefficient is chosen by the modeler.
betaI = gpFitI$betaHat # Case I trend coefficients (no Year term)
betaI.noY = c(betaI[1:2],0,betaI[3:length(betaI)]) # pad with a zero Year coefficient
betaII = gpFitII$betaHat # Case II trend coefficients (fitted Year term)
betaIII = (betaI.noY+betaII)/2 # expert choice: halfway between Cases I and II
gpFitIII <- gp(formula = y ~ X$x1 + X$x2 + as.factor(X$x3),
beta = betaIII,
data = data.frame(y, X),
inputs = names(X),
cov = covICM(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.10,rank*num_pop)),
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 150,
# "print_level" = 2,
"maxeval" = 150))
After fitting these models, we create a test set for all cases to predict both log-mortality and improvement rates for Age 70.
forwardYears = 1990:2060
backwardYears = 1989:2059
agesForecast = 70
nYr = length(forwardYears)
nAg = length(agesForecast)
npop = num_pop
xPredFord = data.frame(x1 = rep(agesForecast, npop*nYr),
x2 = rep(forwardYears, npop),
x3 = rep(1:npop, each = nYr))
xPredBack = data.frame(x1 = rep(agesForecast, npop*nYr),
x2 = rep(backwardYears, npop),
x3 = rep(1:npop, each = nYr))
# nugget extracted from individual models:
load("nugget_mortData.Rda")
nugget = nugget[country %in% ctry & nug.year==2016]
nugget = nugget[order(country)]
list.nug = nugget$nug
Prediction via Case I:
# log-mortality prediction:
out = gp.predict(newdata = xPredFord,
gpmodel = gpFitI,
list.noise = list.nug,
meanTr = "linearAg",
typePred = "fnoise")
logmortI = out$res
# improvement rate factors:
out = gp.predict(newdata = xPredBack,
gpmodel = gpFitI,
list.noise = list.nug,
meanTr = "linearAg",
typePred = "fnoise")
logmortBackI = out$res
logmortI$impr = 1 - exp(logmortI$mean)/exp(logmortBackI$mean)
Prediction via Case II:
# log-mortality prediction:
out = gp.predict(newdata = xPredFord,
gpmodel = gpFitII,
list.noise = list.nug,
meanTr = "linearAgYr",
typePred = "fnoise")
logmortII = out$res
# improvement rate factors:
out = gp.predict(newdata = xPredBack,
gpmodel = gpFitII,
list.noise = list.nug,
meanTr = "linearAgYr",
typePred = "fnoise")
logmortBackII = out$res
logmortII$impr = 1 - exp(logmortII$mean)/exp(logmortBackII$mean)
Prediction via Case III:
# log-mortality prediction:
out = gp.predict(newdata = xPredFord,
gpmodel = gpFitIII,
list.noise = list.nug,
meanTr = "linearAgYr",
typePred = "fnoise")
logmortIII = out$res
# improvement rate factors:
out = gp.predict(newdata = xPredBack,
gpmodel = gpFitIII,
list.noise = list.nug,
meanTr = "linearAgYr",
typePred = "fnoise")
logmortBackIII = out$res
logmortIII$impr = 1 - exp(logmortIII$mean)/exp(logmortBackIII$mean)
The choice of the mean function \(m(\cdot)\) has minimal impact on in-sample forecasts, which are largely driven by the training data covering 1990–2016. In contrast, the long-term levels of mortality improvement are driven entirely by \(m(\cdot)\). When visualizing the forecasts for all cases, we observe strong coherence: mortality rates across populations move roughly in unison over time, matching our intuition about the persistent commonality of their future mortality experiences.