The Human Mortality Database (link) is one of the best sources of demographic data that is publicly available. In addition to their regular reporting across about 40 countries, the curators have now added a special time series of weekly death data across 13 countries to enable the tracking of COVID19 deaths, and the effect on weekly mortality rates. In their usual fashion, the HMD have provided the data in an easy to use csv file which can be downloaded from the website.
Rob Hyndman (whose work on time series forecasting I have learned much from over the years and whose R packages and textbook I use/mention in this blog) posted today about this new data source on his excellent blog (https://robjhyndman.com/hyndsight/excess-deaths/). He shows plots of the excess deaths for all 13 countries.
I was wondering how one might derive a predictive interval for these weekly mortality rates, and to what extent the COVID19
mortality rates would lie outside a reasonable interval. What follows is a rough analysis using time series methods. A quick inspection of the rates for the UK shows that there are strong seasonal features, as shown in the image below.
Interestingly, the data dip and then recover around week 13 each year, probably due to reporting lags around the Easter holidays. Other similar effects can be seen in the data. To produce reasonable forecasts, one approach would be to attempt to model these seasonal issues explicitly.
Thanks to the excellent forecast package in R, this is really easy. First, we show a season and trend (STL) decomposition of the these data (excluding 2020). For details on this technique, see this link.
The plot shows quite a strong seasonal pattern which is often characterized by dips and recoveries in the data. Overall, the trend component seems to have increased since 2010, which is a little puzzling at first glance, as mortality rates in the UK have improved for most of the period 2010-2019, but is probably due to an aging population.
One could now use the STL forecasting function (forecast::stlf) to produce forecasts. Here, I have rather chosen to fit a seasonal ARIMA model (link). The model specification that the forecast::auto.arima function is an ARIMA(2,0,1)(1,1,0) model.
Finally, we are ready to forecast! In this application, I have used confidence levels of 95% and 99.5% respectively. Plotting the 2020 data against this interval produces the following figure.
It can be seen that the 2020 weekly mortality rates fall dramatically outside even a 99.5% interval. It is probably not too surprising that an uncertainty interval calibrated on only 10 years of data is too narrow, but the extent to which this has occurred is dramatic!
This analysis is very rudimentary and could be improved in several ways: obviously the distributional assumptions should be amended to allow for larger shocks, and more advanced forecasting would allow for sharing of information across countries.
The code can be found on my GitHub here: