In Tutorial I and II, we mainly focus on the use of MOGP model for isotropic data, or when all \(L\) populations have the same set of inputs. In this tutorials, we will illustrate how to generalize this framework when we have missing data or when populations have different set of inputs.
Loading all packages into the current working directory:
DiceKriging
: kriging methods for single-output GP (SOGP).rgenoud
: genetic optimization solver used for Maximum Likelihood Estimation of GP hyperparameters.kergp
: kriging methods for user-defined kernel, needed for both Full-rank and ICM Multi-output GP.nloptr
: solve nonlinear optimization using R interface to NLopt.data.table
: extension of data.frame
to enhance data manipulation.Loading all user-defined functions:
## create ST (to transform the vector of latent functions to observed functions)
createST = function(df){
unq.loc = unique(df[,1:2]) # loc = (age,year)
npop = length(unique(df[,3])) # integer coding for different populations
vec.pop = rep(1:npop,each=nrow(unq.loc))
# re-create a full data:
full = cbind(do.call("rbind", replicate(npop, unq.loc, simplify = FALSE)), vec.pop)
lab.notch = paste(df[,1],df[,2],df[,3],sep="")
lab.full = paste(full[,1],full[,2],full[,3],sep="")
out = outer(lab.notch,lab.full,"==")
out = ifelse(out==T,1,0)
return(out)
}
age | year | popN | |
---|---|---|---|
r1 | 70 | 1990 | 1 |
r2 | 71 | 1991 | 1 |
r3 | 70 | 1990 | 2 |
age | year | popN | |
---|---|---|---|
r1 | 70 | 1990 | 1 |
r2 | 71 | 1991 | 1 |
r3 | 70 | 1990 | 2 |
r4 | 71 | 1991 | 2 |
Thus, the transpose of the communication matrix \(S^T\) is:
r1 r2 r3 r4
r1 1 0 0 0
r2 0 1 0 0
r3 0 0 1 0
Below, we write a function to create integer coding for populations in MOGP. This is equivalent to create dummy variables for categorical input.
For notched setup, let’s pretend that Hungary has not yet had 2016 data while Estonia already does. The goal is to build a joint model for Hungary and Estonia and apply the model to predict 2016 log-mortality rates in Hungary. We then compare the prediction performance of this model, in terms of SMAPE and the standard deviation of \(\mathbf{f_*}\) with ones in the single model for Hungary, using data in 1990–2015.
The initial step is to prepare an aggregated data for Hungarian and Estonian mortality for Males, on Ages 70–84.
mortData = createMortData(year_start=1990, year_end=2016,
age_start=70, age_end=84,
sex="m", sex_cat="no")
ctry = c("Estonia","Hungary")
mortData = mortData[country %in% ctry]
mortData = intCtry(ctry)
mortData = mortData[-which(country=="Hungary" & year==2016)] # remove 2016 Hungarian data
The output vector \((y)\) and input vector \((X)\) must be provided:
y = mortData[,y]
X = mortData[,.(age,year,popN)]
X = as.matrix(X,ncol=3)
X = as.data.frame(X)
names(X) = c("x1","x2","x3")
Prior to fitting the model, we need to specify the number of populations, choice of rank, and more importaly provide the transpose of the communication matrix, \(S^T\). Similar to Part I and Par II, the function \(\color{blue}{\texttt{gp()}}\) can be called to fit the model. Specifically, we define the ICM kernel through the \(\color{blue}{\texttt{covICM_Notched()}}\) function with \(\color{blue}{\texttt{rank = 2}}\) and \(\color{blue}{\texttt{num_pop = 2}}\).
rank = 2
num_pop = 2
ST = createST(X)
gpFit <- gp(formula = y ~ X$x1 + as.factor(X$x3),
data = data.frame(y, X),
inputs = names(X),
cov = covICM_Notched(rank,num_pop),
compGrad = FALSE,
estim = TRUE,
noise = TRUE,
varNoiseIni = 1.0e-3,
varNoiseLower = 1.0e-5,
varNoiseUpper = 1.0e-2,
parCovIni = c(10,10,rep(0.15,rank*num_pop)),
# multistart = 4,
optimMethod = "NLOPT_LN_COBYLA",
opts=list("xtol_rel" = 1.0e-5,
"check_derivatives_tol" = 1.0e-5,
"population" = 120,
# "print_level" = 2,
"maxeval" = 150))
The summary output of the fitted model:
Call:
gp(formula = y ~ X$x1 + as.factor(X$x3), data = data.frame(y,
X), inputs = names(X), cov = covICM_Notched(rank, num_pop),
estim = TRUE, compGrad = FALSE, noise = TRUE, varNoiseIni = 0.001,
varNoiseLower = 1e-05, varNoiseUpper = 0.01, parCovIni = c(10,
10, rep(0.15, rank * num_pop)), optimMethod = "NLOPT_LN_COBYLA",
opts = list(xtol_rel = 1e-05, check_derivatives_tol = 1e-05,
population = 120, maxeval = 150))
Number of observations: 795
Trend coef.:
Value
(Intercept) -9.25709
X$x1 0.08858
as.factor(X$x3)2 0.02404
Covariance whith class "covMan"
'User' covariance kernel
o Description: myGauss
o Dimension 'd' (nb of inputs): 3
o Parameters: "theta.ag", "theta.yr", "a11", "a21", "a12", "a22"
o Number of parameters: 6
o Accept matrix inputs.
o Param. values:
Value Lower Upper
theta.ag 14.48103 5e+00 30.0
theta.yr 8.00399 5e+00 30.0
a11 0.18953 1e-06 0.5
a21 0.12657 1e-06 0.5
a12 0.05620 1e-06 0.5
a22 0.06429 1e-06 0.5
Noise variance: 0.004
With the ICM covariance kernel, we can extract the factor loadings and compute the cross-covariance matrix \(B\):
a.loadings <- gpFit$covariance@par[3:length(gpFit$covariance@par)]
group.loadings <- split(a.loadings,ceiling(seq_along(a.loadings)/num_pop))
B <- Reduce("+",lapply(group.loadings,function(x) x%*%t(x)))
rownames(B) <- ctry; colnames(B) <- ctry
B
Estonia Hungary
Estonia 0.03908 0.02760
Hungary 0.02760 0.02015
The inferred cross-correlation matrix \(R\) from \(B\):
Estonia Hungary
Estonia 1.0000 0.9835
Hungary 0.9835 1.0000
Hungary and Estonia are highly correlated with the correlation coefficient of \(r_{EST,~HUN} \approx 0.98\).
2016 out-of-sample prediction for Hungary via 2-population ICM model
We apply the fitted model to predict 2016 log-mortality rates for Ages 70–84 in Hungary. First, we create a test set:
# new data frame for prediction
yearsForecast = 2016
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
npop = 2
xPred = data.frame(x1 = rep(agesForecast, npop),
x2 = rep(yearsForecast, each = npop),
x3 = rep(1:npop, each = nAg))
We further extract \(\sigma^2_l\) from single-population models: \(\sigma^2_{EST}\) from single-population for Estonia (Ages 70–84, Years 1990–2016) and \(\sigma^2_{HUN}\) from single-population for Hungary (Ages 70–84, Years 1990–2015).
load("nugget_mortData.Rda")
nugget = nugget[country %in% ctry]
nugget = nugget[(country=="Hungary" & nug.year==2015)|(country=="Estonia" & nug.year==2016)]
(list.nug = nugget$nug)
[1] 0.001980 0.005019
Finally, we apply the user-defined function \(\color{blue}{\texttt{gp.predictionNotch()}}\) from the source \(\color{blue}{\texttt{gp_predictionNotched}}\) for prediction. Note that we have to provide the covariance matrix for new inputs in the test set (\(\color{blue}{\texttt{covXnew}}\)). Prior to estimate \(\color{blue}{\texttt{covXnew}}\), we must re-compute the transpose of the communication matrix, \(S^T\), for the test set.
ST = createST(xPred);
covXnew = covMat(gpFit$covariance, xPred)
ST = createST(gpFit$X)
out = gp.predictionNotch(newdata = xPred,
gpmodel = gpFit,
list.noise = list.nug,
meanTr = "linearAg",
typePred = "f",
covXnew = covXnew)
predicted = out$res
Prediction results for Age 70 and 84 in Hungary:
age | year | popN | mean | vr | std | |
---|---|---|---|---|---|---|
16 | 70 | 2016 | 2 | -3.236 | 3e-04 | 0.0176 |
30 | 84 | 2016 | 2 | -2.050 | 3e-04 | 0.0176 |
We further compute SMAPE to assess model performance:
# data with observed values:
mortData = createMortData(year_start=2016, year_end=2016, age_start=70, age_end=84, sex="m", sex_cat="no")
yearsForecast = c(2013,2015,2016)
agesForecast = 70:84
observed= mortData[age %in% agesForecast & year %in% yearsForecast & country %in% ctry,]
observed = intCtry(ctry)
# merge observed data with predicted one:
compareGP = as.data.table(merge(predicted, observed, all.x=T, by=c("age","year","popN")))
smape = compareGP[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),
by=.(country,year)]
country | year | smape |
---|---|---|
Estonia | 2016 | 1.938 |
Hungary | 2016 | 0.947 |
Lastly, we extract the standard deviation of \(f_*\) via the ICM model at new inputs:
Assume that Hungary doesn’t have 2016 mortality data available, one approach is to fit a single-population model, Ages 70–84 and Years 1990–2015 and perform one-year out forecast for 2016. The \(\color{blue}{\texttt{km()}}\) function in package DiceKriging
is employed to fit the model:
mortData = createMortData(year_start=1990,year_end=2015,age_start=70,age_end=84, sex="m",sex_cat="no")
mortData = mortData[country=="Hungary"]
# Single models:
xMort = data.frame(age = mortData$age,
year = mortData$year)
yMort = mortData$y
mortSingle_nug = km(formula = ~x.age,
design = data.frame(x = xMort),
response = yMort,
nugget.estim=TRUE,
covtype="gauss",
optim.method="gen",
upper = c(45,45),
control=list(max.generations=100,pop.size=100,
wait.generations=10,
solution.tolerance=1e-5,
print.level = 0))
optimisation start
------------------
* estimation method : MLE
* optimisation method : gen
* analytical gradient : used
* trend model : ~x.age
* covariance model :
- type : gauss
- nugget : unknown homogenous nugget effect
- parameters lower bounds : 1e-10 1e-10
- parameters upper bounds : 45 45
- upper bound for alpha : 1
- best initial criterion value(s) : 626.9
nug = mortSingle_nug@covariance@nugget
mortSingle = km(formula = ~x.age,
design = mortSingle_nug@X,
response = mortSingle_nug@y,
noise.var = rep(nug,mortSingle_nug@n),
coef.trend = mortSingle_nug@trend.coef,
coef.cov = mortSingle_nug@covariance@range.val,
coef.var = mortSingle_nug@covariance@sd2,
covtype = mortSingle_nug@covariance@name)
The output of the single model for Hungary:
Call:
km(formula = ~x.age, design = data.frame(x = xMort), response = yMort,
covtype = "gauss", nugget.estim = TRUE, optim.method = "gen",
upper = c(45, 45), control = list(max.generations = 100,
pop.size = 100, wait.generations = 10, solution.tolerance = 1e-05,
print.level = 0))
Trend coeff.:
Estimate
(Intercept) -8.6385
x.age 0.0806
Covar. type : gauss
Covar. coeff.:
Estimate
theta(x.age) 8.6522
theta(x.year) 6.5035
Variance estimate: 0.01263
Nugget effect estimate: 0.00198
Let’s predict 2016 log-mortality rates for Ages 70–84 and compute SMAPE value:
yearsForecast = 2016
agesForecast = 70:84
nYr = length(yearsForecast)
nAg = length(agesForecast)
mortData = createMortData(year_start=2016,year_end=2016,age_start=70,age_end=84, sex="m",sex_cat="no")
observed = mortData[age %in% agesForecast & year %in% yearsForecast
& country=="Hungary"]
observed = observed[order(year)]
xPred = data.table(age = rep(agesForecast, nYr),
year = rep(yearsForecast, each = nAg))
mortPred = predict(mortSingle, newdata=data.frame(x=xPred),
cov.compute=TRUE, se.compute=TRUE,type="UK")
xPred$mean = mortPred$mean
xPred$y = observed$y
# smape:
smape = xPred[,.(smape = mean(200*abs(y-mean)/(abs(y)+abs(mean)))),by=.(year)]
year | smape |
---|---|
2016 | 1.575 |
\(\text{SMAPE}_{SOGP}=1.575 > \text{SMAPE}_{MOGP}=0.947\). Recall that lower SMAPE is better, so the joint Estonia-Hungary model is significantly more accurate than the model fitted to Hungary data only. This illustrates how we boost the model’s predictive power by incorporating most recent data from an actuarially-similar population.
The standard deviation of \(f_*\) via single model at new inputs:
We compare the standard deviation of \(f_*\) at new inputs between two models:
The strong correlation between Estonia and Hungary (\(r_{EST,~HUN} \approx 0.98\)) helps reduce the uncertainty in Hungarian prediction. This is equivalent to increase the forecast credibility in Hungary as the standard deviation of the latent function \(f\) at the new inputs via MOGP model are significantly smaller than ones via SOGP. We recommend adding more data that are highly correlated with Hungary to further observe improvement in the prediction accuracy and credibility.