Very excited to give a keynote talk at the European Actuarial Academy GmbH's conference on Data Science and Data Ethics later today. The theme of my talk is "AI in Actuarial Science, Two Years On". Slides are attached to this post.
Excited to post a new paper on safely incorporating deep learning models into the actuarial toolkit. The paper covers several important aspects of deep learning models that have not yet been studied in detail in the actuarial literature: the effect of hyperparameter choices on the accuracy and stability of network predictions, methods for producing uncertainty estimates, and the design of deep learning models for explainability.
This paper was written under the research grant program of the AFIR-ERM section of the International Actuarial Association, to whom I am very grateful.
Here is an exhibit from the paper showing confidence bands derived using quantile regression:
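The paper's exact setup is described there; purely as a generic sketch of how confidence bands can be derived via quantile regression, one can fit one model per quantile (here using gradient boosting's pinball loss in Python with simulated data, since the paper's data and models are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Simulated data standing in for the paper's example
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 2000)

# One model per quantile; the 5% and 95% fits bound a 90% band
models = {a: GradientBoostingRegressor(loss="quantile", alpha=a).fit(X, y)
          for a in (0.05, 0.5, 0.95)}

x_grid = np.linspace(0, 10, 100).reshape(-1, 1)
lower = models[0.05].predict(x_grid)
upper = models[0.95].predict(x_grid)
```

Because the quantile models are fitted independently, the estimated bands can occasionally cross; in practice one would check for and correct this.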
Here is another exhibit, showing the impact of different hyperparameter choices:
Please read the full paper here if of interest:
A short article discussing our paper on applying machine learning principles to IBNR reserving appears in the April 2021 edition of The Actuary.
The paper can be found here:
Today Michael Merz, Andreas Tsanakas, Mario Wüthrich and I released a new paper presenting a novel deep learning interpretability technique. It can be found on SSRN:
Compared to traditional model interpretability techniques, which usually operate either at a global or an instance level, the new technique, which we call Marginal Attribution by Conditioning on Quantiles (MACQ), looks at the contributions of variables at each quantile of the response. This provides significant insight into variable importance and into the relationships between the model's inputs and its predictions. The image above illustrates the output of the MACQ method on the Bike Sharing dataset.
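MACQ itself is defined precisely in the paper; purely to illustrate the underlying idea of averaging variable attributions within quantile bands of the response, here is a toy sketch using a linear model, where all data and attribution choices are hypothetical and much simpler than the paper's method:

```python
import numpy as np

# Toy data and a "fitted" linear model (stand-in for a neural network)
rng = np.random.default_rng(0)
n, p = 5000, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])
y_hat = X @ beta

# First-order attribution of each feature: beta_j * (x_j - mean(x_j))
attr = (X - X.mean(axis=0)) * beta

# Average attributions within deciles of the predicted response
q = np.quantile(y_hat, np.linspace(0, 1, 11))
bins = np.digitize(y_hat, q[1:-1])
macq_profile = np.array([attr[bins == k].mean(axis=0) for k in range(10)])
```

Each row of `macq_profile` shows how much each variable contributes, on average, at that decile of the response, which is the kind of quantile-conditioned view the figure above conveys.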
We are excited to release a new paper on aggregating neural network predictions, with a focus on actuarial work. The paper examines the stability of neural network predictions at a portfolio and individual policy level, and shows how the variability of these predictions can act as a data-driven metric for assessing which policies the network struggles to fit. Finally, we discuss calibrating a meta-network on the basis of this metric to predict the aggregated results of the networks.
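As a toy illustration of the variability metric (not the paper's meta-network itself), one can measure the standard deviation of predictions across independently trained networks for each policy; the numbers and the "hard policy" structure below are entirely simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nets, n_policies = 10, 1000

# Hypothetical predictions from 10 independently trained networks;
# the first 50 policies are made artificially hard to fit
preds = rng.normal(loc=1.0, scale=0.1, size=(n_nets, n_policies))
preds[:, :50] += rng.normal(0, 0.5, size=(n_nets, 50))

ensemble_mean = preds.mean(axis=0)   # aggregated (ensemble) prediction
instability = preds.std(axis=0)      # per-policy variability metric

# Flag the policies the networks disagree on most
hard_policies = np.argsort(instability)[-50:]
```

The `instability` vector is the kind of per-policy signal that could then be fed to a meta-model, as in the paper.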
Here is one image from the paper which looks at the effect of “informing” the meta-network about how variable each individual policy is.
If you would like to read the paper, it can be found here:
The Human Mortality Database (link) is one of the best sources of publicly available demographic data. In addition to their regular reporting across about 40 countries, the curators have now added a special time series of weekly death data across 13 countries to enable the tracking of COVID-19 deaths and their effect on weekly mortality rates. In their usual fashion, the HMD have provided the data in an easy-to-use CSV file which can be downloaded from the website.
Rob Hyndman (whose work on time series forecasting I have learned much from over the years and whose R packages and textbook I use/mention in this blog) posted today about this new data source on his excellent blog (https://robjhyndman.com/hyndsight/excess-deaths/). He shows plots of the excess deaths for all 13 countries.
I was wondering how one might derive a prediction interval for these weekly mortality rates, and to what extent the COVID-19 mortality rates would lie outside a reasonable interval. What follows is a rough analysis using time series methods. A quick inspection of the rates for the UK shows strong seasonal features, as shown in the image below.
Interestingly, the data dip and then recover around week 13 each year, probably due to reporting lags around the Easter holidays. Other similar effects can be seen in the data. To produce reasonable forecasts, one approach would be to attempt to model these seasonal issues explicitly.
Thanks to the excellent forecast package in R, this is really easy. First, we show a seasonal and trend decomposition using loess (STL) of these data (excluding 2020). For details on this technique, see this link.
The plot shows quite a strong seasonal pattern, often characterized by dips and recoveries in the data. Overall, the trend component has increased since 2010, which is a little puzzling at first glance, since mortality rates in the UK improved for most of the period 2010-2019; the increase is probably due to an aging population.
One could now use the STL forecasting function (forecast::stlf) to produce forecasts. Here, I have instead chosen to fit a seasonal ARIMA model (link). The specification selected by the forecast::auto.arima function is an ARIMA(2,0,1)(1,1,0) model.
Finally, we are ready to forecast! In this application, I have used prediction intervals at the 95% and 99.5% levels. Plotting the 2020 data against these intervals produces the following figure.
It can be seen that the 2020 weekly mortality rates fall dramatically outside even a 99.5% interval. It is probably not too surprising that an uncertainty interval calibrated on only 10 years of data is too narrow, but the extent to which this has occurred is striking!
This analysis is very rudimentary and could be improved in several ways: obviously the distributional assumptions should be amended to allow for larger shocks, and more advanced forecasting would allow for sharing of information across countries.
The code can be found on my GitHub here:
I am very grateful to the Casualty Actuarial Society's committee that awarded my 2018 paper "AI in Actuarial Science" the 2020 Hachemeister Prize. I hope that the methods discussed in the paper eventually make an impact on P&C actuarial practice! If you want to read the paper, please find it here. One of my favorite images from the paper is pasted below and shows the results of a convolutional autoencoder fit to telematics data.
We are excited to share a new paper on "Discrimination-Free Insurance Pricing", written by Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich, with a small contribution from myself. In this paper, we present a general method for removing direct and indirect discrimination from the types of models in common use in the insurance sector (GLMs) as well as from more advanced machine learning methods.
One of my favorite plots from the paper shows a comparison of prices produced using a neural net with our method applied to the results.
We would love to hear your feedback; the paper can be downloaded here:
The call for abstracts is open for the first Insurance Data Science Conference – Africa at the Sandton Convention Centre, 20 April 2020, which is jointly organized by the Actuarial Society of South Africa, QED Actuaries and Consultants and the University of the Witwatersrand. The invited keynote speakers are Mario Wüthrich (RiskLab, ETH Zürich) and Marjorie Ngwenya (ALU School of Insurance, past President of the Institute and Faculty of Actuaries).
We are inviting abstracts in the areas of insurance analytics, machine learning, artificial intelligence, and actuarial science. Please send your abstract by 29 February 2020 to: email@example.com
Your submission should include:
- Name and affiliation of the speaker
- Email address of the speaker
- Title of the presentation
- Abstract of 10 to 20 lines and not more than 5 references
- Format: pdf file and either a Word file or LaTeX file
- If submitting LaTeX, please test your submission file, e.g. via https://latexbase.com/
The submitted abstracts will be evaluated and speakers will be selected by the scientific committee. Conference fees will be waived for selected speakers.
For more information visit the conference web site: https://insurancedatascience.org.za. Registration for the conference will open in mid-February 2020.
Note: The international version of this conference has been held in European cities for the past 7 years and returns to London in 2020. For more information, please visit: https://www.insurancedatascience.org/