Covid-19 Excess Deaths

The analysis below applies the SOGP framework to investigate Covid-19 excess deaths. To do so we utilize the recently published short-term mortality fluctuation (STMF) dataset published by the Human Mortality Database. The STMF was expressly created to accomodate the critical need of short-term mortality analyses in response to the Covid-19 pandamic. The STMF contains detailed country-specific mortality, segregated by week (52 weeks per year), sex, and 3 age groups in 32 countries:

Austria (AUT)
Belgium (BEL)
Bulgaria (BGR)
Switzerland (CHE)
Czech Republic (CZE)
Germany (DEUTNP)
Denmark (DNK)
Spain (ESP)
Estonia (EST)
Finland (FIN)
France (FRATNP)
England and Wales (GBRTENW)
Northern Ireland (GBR_NIR)
Scotland (GBR_SCO)
Greece (GRC)
Croatia (HRV)
Hungary (HUN)
Iceland (ISL)
Israel (ISR)
Italy (ITA)
Lithuania (LTU)
Luxembourg (LUX)
Latvia (LVA)
Netherlands (NLD)
Norway (NOR)
Poland (POL)
Portugal (PRT)
Russia (RUS)
Slovakia (SVK)
Slovenia (SVN)
Sweden (SWE)
United States (USA)

The dataset is frequently updated. The analysis below is based on datasets last updated on September 17, 2020:

deaths <- readr::read_csv("https://www.mortality.org/Public/STMF/Outputs/stmf.csv", skip=1)

For our analysis we work with mortality rates (provided in STMF as the RTotal field) transformed to the log scale. We furthermore apply a few adjustments:

We ignore Week 22 and 35 in England & Wales due to a persistent dip in reported deaths during those weeks due to national holidays.
We drop the very last reported week of 2020 for CHE, FIN, NOR, SWE, and USA which have steep declines due to incomplete reporting.
We drop Iceland and Luxembourg which have different seasonality and much more noisy data due to very small population (more than 4 times smaller than all other countries considered).
We drop Russia which has limited data available as the most recent data is only up to 2018.
For purposes of illustration, we work with all-age, all-gender mortality

deaths <- data.table(deaths)
deaths <- deaths[Sex=="b",.(Year,Week,RTotal,DTotal,CountryCode)] 
deaths <- deaths[,y:=log(RTotal)]

\(\color{#c64329}{\textbf{Gaussian Process Regression in estimating 2020 excess deaths}}\)

Our primary goal is to estimate excess mortality during 2020. This is defined as the difference between the observed and the expected number of deaths. Epidemiologists use excess mortality as a standard tool to understand the full impact of the pandemic. It does not only reflect deaths dirctly due to Covid-19 but also other causes associated with the pandemic, such as delayed or forgone care due to hospitals prioritizing Covid-19 emergencies or patients themselves staying away.

To define excess mortality requires agreeing on the baseline, i.e. the expected number of deaths if Covid-19 did not occur. A very simple method is to look at historical weekly mortality and then average. For example, one common metric is the median mortality over the previous 5 years, 2015-2019. However, this approach ignores trends in the data (specifically the mortality improvement factors whereby mortality gets better YoY), is sensitive to data outliers, and most importantly does not provide good uncertainty quantification. UQ is critical to answer the question of whether observed death counts are statistically significant or can be attributed to “random” fluctuations. Mortality data is intrinsically volatile and it is not straightforward to distinguish between frequent fluctuations that could happen versus an unambiguous Covid-19 spike.

Our analysis uses statistical learning, namely GP models to answer the above. To do so, we build a GP model on the historical data, taking for our training set Years 2010-2019, as well as Weeks 1-5 of 2020. We then use this training set to make probabilistic forecasts for Weeks 6-present of 2020 and compare this forecast to the observed mortality. The vertical blue line in the figure below delineates our training set from the forecast period.

The Figure below displays the raw weekly log-mortality rates (after adjustments listed above) for each country indexed by weeks of the year. The 2020 observed log-mortality series is highlighted in red.

Most countries display strong annual seasonality, with mortality highest in Winter. In many countries, there is a smaller peak in Summer.
Not all countries have data available as early as 2010 in the STMF; for instance, starting year in Germany is 2016 and in the US is 2013.
Most countries have started to experience the outbreaks of Covid-19 in early to late March, so it is reasonable to assume that the first 5 weeks in 2020 don’t include any excess deaths.

To apply the Gaussian process modeling, we treat data as a 2-dimensional table, indexed by Weeks (1-52, coordinate 1) and Years (2010-2020, coordinate 2):

We use the Squared-Exponential anisotropic kernel with two lengthscales \(\theta_{Wk}, \theta_{Yr}\).
For the mean function we use a periodic sinusoidal trend in Week with an annual and semi-annual frequency to capture the aforementioned effects, as well as a linear trend in Year to capture a baseline constant mortality improvement factor: \[m(x^n)= \beta_0 + \beta_{yr}x^n_{yr} + A_1 \cos\bigg(\dfrac{2\pi}{52}x^n_{wk}\bigg) + A_2\sin\bigg(\dfrac{2\pi}{52}x^n_{wk}\bigg)+ A_3\cos\bigg(\dfrac{4\pi}{52}x^n_{wk}\bigg)+ A_4\sin\bigg(\dfrac{4\pi}{52}x^n_{wk}\bigg)\] where:
- \(\beta_0\): intercept (baseline mortality in the country)
- \(\beta_{yr}\): linear coefficient in Year
- \(A_1\): amplitude for cosine wave with annual frequnecy
- \(A_2\): amplitude for sine wave with annual frequency
- \(A_3\): amplitude for cosine wave with semi-annual frequency
- \(A_4\): amplitude for sine wave with semi-annual frequency
- We shift \(x^n_{wk}\) by 4 weeks to better align with the winter mortality peaks around Feb 1 and summer mortality peaks around Aug 1
We fit a constant observation noise \(\sigma_\epsilon\) that is estimated as part of the MLE procedure
We apply a genetic optimizer to fit the GP via MLE as implemented in the DiceKriging R package.
Within the 2-dim representation of our data, our training set has a notched shape (since we include a few weeks of 2020 in addition to full-year observations prior to that). This would be difficult to use in a time-series model but is straightforward in the spatial GP paradigm.

Our goal is to use a Gaussian Process model to:

smooth log-mortality rates in Year and Week dimensions;
perform out-of-sample predictions in 2020 starting from Week 5 to Week 37 (middle of September 2020).

Note that GP is effectively a spatial model and does not have any time-series concepts embedded; it treats retrospective and future forecasts identically.

\(\color{#c64329}{\textbf{Sample Gaussian Process Regression for Sweden all-Age Mortality}}\) Below is an illustration on the application of Gaussian Process to analyze and forecast time-series data in Sweden.

\(\color{#c64329}{\textbf{Excess mortality}}\)

We can compare the GP-based mean forecast for 2020 against the observed weekly log-mortality rates to identify excess mortality. In the Figure below, the shaded region is the excess mortality starting at Week 11 (early March). Some countries have experienced much more severe impact than others. Observe that 2020 data ends at different weeks in different nations. We also note that in most countries there is a strong seasonal effect whereby mortality is expected to decrease as we enter the Summer months.

\(\color{#c64329}{\textbf{Credible Bands}}\)

To highlight the uncertainty quantification available with a GP model, we plot below the credible intervals around our GP-based 2020 forecasts. Below, we plot the mean forecasts for each country along with the 80% and 95% credible bands. We also show the 5-year medians and the observed 2020 rate for comparison purpose.

We note that in several countries, there is no statistically significant deviation from expected mortality. In other cases, such as Spain and USA, the Covid-19 spikes are clear.

A few remarks can be made relative to the historical 5-year median:

The historical median is much noisier and hence is problematic as a baseline compared to the smoothed GP forecast
In most countries there is a strong annual trend, namely forecasted mortality is above the historical median. This occurs due to overall aging of the populations driving up total mortality counts.
As a result of this gap, the estimated excess deaths in our framework are lower compared to relying on historical median.

\(\color{#c64329}{\textbf{Excess deaths}}\)

Country	Total Excess Deaths in 2020	Last Week Reported
Austria	363	35
Belgium	8328	35
Bulgaria	-944	36
Switzerland	674	33
Czech Republic	-411	30
Germany	4011	33
Denmark	363	36
Spain	43614	35
Estonia	148	36
Finland	297	34
France	18415	33
England and Wales	46768	36
Nothern Ireland	724	32
Scotland	3622	37
Greece	-506	26
Croatia	-599	14
Hungary	-2242	33
Israel	321	34
Italy	48789	26
Lithuania	188	31
Latvia	-386	37
Netherlands	7430	35
Norway	305	34
Poland	184	26
Portugal	3641	35
Slovakia	-746	30
Slovenia	-106	13
Sweden	5741	35
USA	216381	33

\(\color{#c64329}{\textbf{GP Diagnostics}}\)

For completeness, we report below the fitted GP mean functions for each country, as well as the GP hyperparameters. We note that our basic model had difficulty finding appropriate lengthscales for E&W and was fitted manually for now.

	Intercept	Year trend	A_1	A_2	A_3	A_4
AUT	-6.010	0.0007	0.0785	0.0531	0.0036	0.0306
BEL	-1.517	-0.0016	0.0917	0.0720	0.0095	0.0261
BGR	-18.014	0.0069	0.0937	0.0651	0.0275	0.0154
CHE	-1.193	-0.0018	0.0879	0.0456	0.0006	0.0293
CZE	-12.113	0.0037	0.0643	0.0478	-0.0024	0.0274
DEUTNP	-16.387	0.0059	0.0787	0.0646	-0.0067	0.0335
DNK	3.331	-0.0040	0.0677	0.0569	0.0047	0.0194
ESP	-26.612	0.0108	0.1048	0.0805	0.0365	0.0419
EST	-5.406	0.0005	0.0758	0.0575	-0.0046	0.0245
FIN	-13.872	0.0046	0.0636	0.0436	-0.0009	0.0116
FRATNP	-22.259	0.0087	0.0988	0.0526	0.0130	0.0233
GBRTENW	-13.563	0.0044	0.1077	0.0152	0.0161	-0.0019
GBR_NIR	2.126	-0.0034	0.1048	0.0604	0.0116	0.0175
GBR_SCO	-20.849	0.0081	0.0913	-0.0189	0.0243	-0.0157
GRC	-36.836	0.0160	0.0769	0.0526	0.0595	0.0442
HRV	-22.022	0.0087	0.0705	0.0583	0.0141	0.0301
HUN	-12.419	0.0040	0.0845	0.0573	-0.0009	0.0284
ISR	2.278	-0.0037	0.1123	0.0675	0.0131	0.0383
ITA	-2.048	-0.0013	0.1022	0.0638	0.0397	0.0481
LTU	-9.550	0.0026	0.0669	0.0626	0.0037	0.0112
LVA	-12.836	0.0043	0.0798	0.0636	-0.0029	0.0119
NLD	-23.324	0.0092	0.0780	0.0543	0.0073	0.0159
NOR	20.725	-0.0127	0.0858	0.0435	0.0117	0.0193
POL	-24.111	0.0097	0.0746	0.0481	-0.0020	0.0146
PRT	-28.526	0.0119	0.1410	0.0910	0.0403	0.0504
SVK	-8.113	0.0017	0.0607	0.0457	0.0013	0.0286
SVN	-26.044	0.0106	0.0900	0.0629	-0.0102	0.0360
SWE	23.256	-0.0139	0.0887	0.0569	0.0004	0.0229
USA	-26.001	0.0105	0.0722	0.0370	0.0147	0.0065

Hyper-parameters in the covariance function for each country:

	Year Lengthscale	Week Lengthscale	Process variance	Observ. Noise
AUT	0.0820	3.8761	0.0019	0.0015
BEL	0.1108	1.9774	0.0021	0.0012
BGR	0.0134	1.6602	0.0027	0.0011
CHE	0.0776	3.3045	0.0016	0.0011
CZE	0.0618	3.1972	0.0018	0.0013
DEUTNP	0.0936	1.8566	0.0023	0.0005
DNK	0.0552	2.8559	0.0012	0.0011
ESP	0.0244	1.9043	0.0023	0.0004
EST	0.0128	2.2728	0.0014	0.0041
FIN	0.3635	1.7041	0.0013	0.0011
FRATNP	0.1073	1.9666	0.0016	0.0004
GBRTENW	0.1755	3.2533	0.0031	0.0036
GBR_NIR	0.1755	0.7333	0.0060	0.0091
GBR_SCO	0.7294	19.4302	0.0073	0.0032
GRC	0.0762	1.3665	0.0030	0.0012
HRV	0.1203	3.1614	0.0024	0.0026
HUN	0.1065	3.0152	0.0023	0.0019
ISR	0.0509	2.5408	0.0010	0.0016
ITA	0.3921	2.3206	0.0029	0.0008
LTU	0.5755	1.8085	0.0023	0.0020
LVA	0.5023	1.9217	0.0018	0.0023
NLD	0.1267	3.5372	0.0015	0.0007
NOR	0.1333	4.5492	0.0008	0.0017
POL	0.0681	2.8480	0.0018	0.0009
PRT	0.0795	1.8138	0.0030	0.0014
SVK	0.0234	2.8201	0.0021	0.0019
SVN	0.3811	2.6331	0.0022	0.0031
SWE	0.0023	2.5665	0.0011	0.0009
USA	0.3682	4.9320	0.0008	0.0001

Covid-19 Excess Deaths

Mike Ludkovski, UCSB

Nhan Huynh, UCSB

September 21, 2020