The analysis below applies the SOGP framework to investigate Covid-19 excess deaths. To do so we utilize the recently published short-term mortality fluctuation (STMF) dataset published by the Human Mortality Database. The STMF was expressly created to accomodate the critical need of short-term mortality analyses in response to the Covid-19 pandamic. The STMF contains detailed country-specific mortality, segregated by week (52 weeks per year), sex, and 3 age groups in 32 countries:
Austria (AUT)
Belgium (BEL)
Bulgaria (BGR)
Switzerland (CHE)
Czech Republic (CZE)
Germany (DEUTNP)
Denmark (DNK)
Spain (ESP)
Estonia (EST)
Finland (FIN)
France (FRATNP)
England and Wales (GBRTENW)
Northern Ireland (GBR_NIR)
Scotland (GBR_SCO)
Greece (GRC)
Croatia (HRV)
Hungary (HUN)
Iceland (ISL)
Israel (ISR)
Italy (ITA)
Lithuania (LTU)
Luxembourg (LUX)
Latvia (LVA)
Netherlands (NLD)
Norway (NOR)
Poland (POL)
Portugal (PRT)
Russia (RUS)
Slovakia (SVK)
Slovenia (SVN)
Sweden (SWE)
United States (USA)
The dataset is frequently updated. The analysis below is based on datasets last updated on September 17, 2020:
For our analysis we work with mortality rates (provided in STMF as the RTotal field) transformed to the log scale. We furthermore apply a few adjustments:
We ignore Week 22 and 35 in England & Wales due to a persistent dip in reported deaths during those weeks due to national holidays.
We drop the very last reported week of 2020 for CHE, FIN, NOR, SWE, and USA which have steep declines due to incomplete reporting.
We drop Iceland and Luxembourg which have different seasonality and much more noisy data due to very small population (more than 4 times smaller than all other countries considered).
We drop Russia which has limited data available as the most recent data is only up to 2018.
For purposes of illustration, we work with all-age, all-gender mortality
deaths <- data.table(deaths)
deaths <- deaths[Sex=="b",.(Year,Week,RTotal,DTotal,CountryCode)]
deaths <- deaths[,y:=log(RTotal)]
\(\color{#c64329}{\textbf{Gaussian Process Regression in estimating 2020 excess deaths}}\)
Our primary goal is to estimate excess mortality during 2020. This is defined as the difference between the observed and the expected number of deaths. Epidemiologists use excess mortality as a standard tool to understand the full impact of the pandemic. It does not only reflect deaths dirctly due to Covid-19 but also other causes associated with the pandemic, such as delayed or forgone care due to hospitals prioritizing Covid-19 emergencies or patients themselves staying away.
To define excess mortality requires agreeing on the baseline, i.e. the expected number of deaths if Covid-19 did not occur. A very simple method is to look at historical weekly mortality and then average. For example, one common metric is the median mortality over the previous 5 years, 2015-2019. However, this approach ignores trends in the data (specifically the mortality improvement factors whereby mortality gets better YoY), is sensitive to data outliers, and most importantly does not provide good uncertainty quantification. UQ is critical to answer the question of whether observed death counts are statistically significant or can be attributed to “random” fluctuations. Mortality data is intrinsically volatile and it is not straightforward to distinguish between frequent fluctuations that could happen versus an unambiguous Covid-19 spike.
Our analysis uses statistical learning, namely GP models to answer the above. To do so, we build a GP model on the historical data, taking for our training set Years 2010-2019, as well as Weeks 1-5 of 2020. We then use this training set to make probabilistic forecasts for Weeks 6-present of 2020 and compare this forecast to the observed mortality. The vertical blue line in the figure below delineates our training set from the forecast period.
The Figure below displays the raw weekly log-mortality rates (after adjustments listed above) for each country indexed by weeks of the year. The 2020 observed log-mortality series is highlighted in red.
Most countries display strong annual seasonality, with mortality highest in Winter. In many countries, there is a smaller peak in Summer.
Not all countries have data available as early as 2010 in the STMF; for instance, starting year in Germany is 2016 and in the US is 2013.
Most countries have started to experience the outbreaks of Covid-19 in early to late March, so it is reasonable to assume that the first 5 weeks in 2020 don’t include any excess deaths.
To apply the Gaussian process modeling, we treat data as a 2-dimensional table, indexed by Weeks (1-52, coordinate 1) and Years (2010-2020, coordinate 2):
We use the Squared-Exponential anisotropic kernel with two lengthscales \(\theta_{Wk}, \theta_{Yr}\).
For the mean function we use a periodic sinusoidal trend in Week with an annual and semi-annual frequency to capture the aforementioned effects, as well as a linear trend in Year to capture a baseline constant mortality improvement factor: \[m(x^n)= \beta_0 + \beta_{yr}x^n_{yr} + A_1 \cos\bigg(\dfrac{2\pi}{52}x^n_{wk}\bigg) + A_2\sin\bigg(\dfrac{2\pi}{52}x^n_{wk}\bigg)+ A_3\cos\bigg(\dfrac{4\pi}{52}x^n_{wk}\bigg)+ A_4\sin\bigg(\dfrac{4\pi}{52}x^n_{wk}\bigg)\] where:
We fit a constant observation noise \(\sigma_\epsilon\) that is estimated as part of the MLE procedure
We apply a genetic optimizer to fit the GP via MLE as implemented in the DiceKriging R package.
Within the 2-dim representation of our data, our training set has a notched shape (since we include a few weeks of 2020 in addition to full-year observations prior to that). This would be difficult to use in a time-series model but is straightforward in the spatial GP paradigm.
Our goal is to use a Gaussian Process model to:
smooth log-mortality rates in Year and Week dimensions;
perform out-of-sample predictions in 2020 starting from Week 5 to Week 37 (middle of September 2020).
Note that GP is effectively a spatial model and does not have any time-series concepts embedded; it treats retrospective and future forecasts identically.
\(\color{#c64329}{\textbf{Sample Gaussian Process Regression for Sweden all-Age Mortality}}\) Below is an illustration on the application of Gaussian Process to analyze and forecast time-series data in Sweden.
\(\color{#c64329}{\textbf{Excess mortality}}\)
We can compare the GP-based mean forecast for 2020 against the observed weekly log-mortality rates to identify excess mortality. In the Figure below, the shaded region is the excess mortality starting at Week 11 (early March). Some countries have experienced much more severe impact than others. Observe that 2020 data ends at different weeks in different nations. We also note that in most countries there is a strong seasonal effect whereby mortality is expected to decrease as we enter the Summer months.
\(\color{#c64329}{\textbf{Credible Bands}}\)
To highlight the uncertainty quantification available with a GP model, we plot below the credible intervals around our GP-based 2020 forecasts. Below, we plot the mean forecasts for each country along with the 80% and 95% credible bands. We also show the 5-year medians and the observed 2020 rate for comparison purpose.
We note that in several countries, there is no statistically significant deviation from expected mortality. In other cases, such as Spain and USA, the Covid-19 spikes are clear.
A few remarks can be made relative to the historical 5-year median:
The historical median is much noisier and hence is problematic as a baseline compared to the smoothed GP forecast
In most countries there is a strong annual trend, namely forecasted mortality is above the historical median. This occurs due to overall aging of the populations driving up total mortality counts.
As a result of this gap, the estimated excess deaths in our framework are lower compared to relying on historical median.
\(\color{#c64329}{\textbf{Excess deaths}}\)
Country | Total Excess Deaths in 2020 | Last Week Reported |
---|---|---|
Austria | 363 | 35 |
Belgium | 8328 | 35 |
Bulgaria | -944 | 36 |
Switzerland | 674 | 33 |
Czech Republic | -411 | 30 |
Germany | 4011 | 33 |
Denmark | 363 | 36 |
Spain | 43614 | 35 |
Estonia | 148 | 36 |
Finland | 297 | 34 |
France | 18415 | 33 |
England and Wales | 46768 | 36 |
Nothern Ireland | 724 | 32 |
Scotland | 3622 | 37 |
Greece | -506 | 26 |
Croatia | -599 | 14 |
Hungary | -2242 | 33 |
Israel | 321 | 34 |
Italy | 48789 | 26 |
Lithuania | 188 | 31 |
Latvia | -386 | 37 |
Netherlands | 7430 | 35 |
Norway | 305 | 34 |
Poland | 184 | 26 |
Portugal | 3641 | 35 |
Slovakia | -746 | 30 |
Slovenia | -106 | 13 |
Sweden | 5741 | 35 |
USA | 216381 | 33 |
\(\color{#c64329}{\textbf{GP Diagnostics}}\)
For completeness, we report below the fitted GP mean functions for each country, as well as the GP hyperparameters. We note that our basic model had difficulty finding appropriate lengthscales for E&W and was fitted manually for now.
Intercept | Year trend | A_1 | A_2 | A_3 | A_4 | |
---|---|---|---|---|---|---|
AUT | -6.010 | 0.0007 | 0.0785 | 0.0531 | 0.0036 | 0.0306 |
BEL | -1.517 | -0.0016 | 0.0917 | 0.0720 | 0.0095 | 0.0261 |
BGR | -18.014 | 0.0069 | 0.0937 | 0.0651 | 0.0275 | 0.0154 |
CHE | -1.193 | -0.0018 | 0.0879 | 0.0456 | 0.0006 | 0.0293 |
CZE | -12.113 | 0.0037 | 0.0643 | 0.0478 | -0.0024 | 0.0274 |
DEUTNP | -16.387 | 0.0059 | 0.0787 | 0.0646 | -0.0067 | 0.0335 |
DNK | 3.331 | -0.0040 | 0.0677 | 0.0569 | 0.0047 | 0.0194 |
ESP | -26.612 | 0.0108 | 0.1048 | 0.0805 | 0.0365 | 0.0419 |
EST | -5.406 | 0.0005 | 0.0758 | 0.0575 | -0.0046 | 0.0245 |
FIN | -13.872 | 0.0046 | 0.0636 | 0.0436 | -0.0009 | 0.0116 |
FRATNP | -22.259 | 0.0087 | 0.0988 | 0.0526 | 0.0130 | 0.0233 |
GBRTENW | -13.563 | 0.0044 | 0.1077 | 0.0152 | 0.0161 | -0.0019 |
GBR_NIR | 2.126 | -0.0034 | 0.1048 | 0.0604 | 0.0116 | 0.0175 |
GBR_SCO | -20.849 | 0.0081 | 0.0913 | -0.0189 | 0.0243 | -0.0157 |
GRC | -36.836 | 0.0160 | 0.0769 | 0.0526 | 0.0595 | 0.0442 |
HRV | -22.022 | 0.0087 | 0.0705 | 0.0583 | 0.0141 | 0.0301 |
HUN | -12.419 | 0.0040 | 0.0845 | 0.0573 | -0.0009 | 0.0284 |
ISR | 2.278 | -0.0037 | 0.1123 | 0.0675 | 0.0131 | 0.0383 |
ITA | -2.048 | -0.0013 | 0.1022 | 0.0638 | 0.0397 | 0.0481 |
LTU | -9.550 | 0.0026 | 0.0669 | 0.0626 | 0.0037 | 0.0112 |
LVA | -12.836 | 0.0043 | 0.0798 | 0.0636 | -0.0029 | 0.0119 |
NLD | -23.324 | 0.0092 | 0.0780 | 0.0543 | 0.0073 | 0.0159 |
NOR | 20.725 | -0.0127 | 0.0858 | 0.0435 | 0.0117 | 0.0193 |
POL | -24.111 | 0.0097 | 0.0746 | 0.0481 | -0.0020 | 0.0146 |
PRT | -28.526 | 0.0119 | 0.1410 | 0.0910 | 0.0403 | 0.0504 |
SVK | -8.113 | 0.0017 | 0.0607 | 0.0457 | 0.0013 | 0.0286 |
SVN | -26.044 | 0.0106 | 0.0900 | 0.0629 | -0.0102 | 0.0360 |
SWE | 23.256 | -0.0139 | 0.0887 | 0.0569 | 0.0004 | 0.0229 |
USA | -26.001 | 0.0105 | 0.0722 | 0.0370 | 0.0147 | 0.0065 |
Hyper-parameters in the covariance function for each country:
Year Lengthscale | Week Lengthscale | Process variance | Observ. Noise | |
---|---|---|---|---|
AUT | 0.0820 | 3.8761 | 0.0019 | 0.0015 |
BEL | 0.1108 | 1.9774 | 0.0021 | 0.0012 |
BGR | 0.0134 | 1.6602 | 0.0027 | 0.0011 |
CHE | 0.0776 | 3.3045 | 0.0016 | 0.0011 |
CZE | 0.0618 | 3.1972 | 0.0018 | 0.0013 |
DEUTNP | 0.0936 | 1.8566 | 0.0023 | 0.0005 |
DNK | 0.0552 | 2.8559 | 0.0012 | 0.0011 |
ESP | 0.0244 | 1.9043 | 0.0023 | 0.0004 |
EST | 0.0128 | 2.2728 | 0.0014 | 0.0041 |
FIN | 0.3635 | 1.7041 | 0.0013 | 0.0011 |
FRATNP | 0.1073 | 1.9666 | 0.0016 | 0.0004 |
GBRTENW | 0.1755 | 3.2533 | 0.0031 | 0.0036 |
GBR_NIR | 0.1755 | 0.7333 | 0.0060 | 0.0091 |
GBR_SCO | 0.7294 | 19.4302 | 0.0073 | 0.0032 |
GRC | 0.0762 | 1.3665 | 0.0030 | 0.0012 |
HRV | 0.1203 | 3.1614 | 0.0024 | 0.0026 |
HUN | 0.1065 | 3.0152 | 0.0023 | 0.0019 |
ISR | 0.0509 | 2.5408 | 0.0010 | 0.0016 |
ITA | 0.3921 | 2.3206 | 0.0029 | 0.0008 |
LTU | 0.5755 | 1.8085 | 0.0023 | 0.0020 |
LVA | 0.5023 | 1.9217 | 0.0018 | 0.0023 |
NLD | 0.1267 | 3.5372 | 0.0015 | 0.0007 |
NOR | 0.1333 | 4.5492 | 0.0008 | 0.0017 |
POL | 0.0681 | 2.8480 | 0.0018 | 0.0009 |
PRT | 0.0795 | 1.8138 | 0.0030 | 0.0014 |
SVK | 0.0234 | 2.8201 | 0.0021 | 0.0019 |
SVN | 0.3811 | 2.6331 | 0.0022 | 0.0031 |
SWE | 0.0023 | 2.5665 | 0.0011 | 0.0009 |
USA | 0.3682 | 4.9320 | 0.0008 | 0.0001 |