Nagging Predictors

We are excited to release a new paper on aggregating neural network predictions with a focus on actuarial work. The paper examines the stability of neural network predictors at a portfolio and individual policy level and shows how the variability of these predictions can act as a data-driven metric for assessing which policies the network struggles to fit. Finally, we also discuss calibrating a meta-network on the basis of this metric to predict the aggregated results of the networks.
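As a rough illustration of the idea, the sketch below averages the predictions of several independently trained networks and computes a per-policy measure of how much those predictions disagree. It is a minimal sketch only: the function name, the toy numbers, and the choice of the coefficient of variation as the variability measure are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, stdev


def aggregate_predictions(runs):
    """Average predictions across runs and measure per-policy variability.

    runs: list of runs; each run is a list of predictions, one per policy.
    Returns (averaged_predictions, coefficients_of_variation).
    """
    n_policies = len(runs[0])
    if any(len(run) != n_policies for run in runs):
        raise ValueError("every run must score the same policies")

    averaged, cov = [], []
    for i in range(n_policies):
        preds = [run[i] for run in runs]
        m = mean(preds)
        averaged.append(m)
        # Coefficient of variation: spread across runs relative to the mean
        # (an assumed choice of variability measure for this sketch).
        cov.append(stdev(preds) / m if m != 0 else float("nan"))
    return averaged, cov


# Hypothetical claim-frequency predictions from three networks for three policies.
runs = [
    [0.10, 0.20, 0.05],
    [0.12, 0.20, 0.09],
    [0.08, 0.20, 0.01],
]
averaged, cov = aggregate_predictions(runs)
```

In this toy example the networks agree exactly on the second policy and disagree most, in relative terms, on the third, so the third policy would be the one flagged as hard to fit.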

Here is one image from the paper which looks at the effect of “informing” the meta-network about how variable each individual policy is.

If you would like to read the paper, it can be found here:

HMD – Weekly Data

The Human Mortality Database (link) is one of the best publicly available sources of demographic data. In addition to their regular reporting across about 40 countries, the curators have now added a special time series of weekly death data across 13 countries to enable the tracking of COVID19 deaths, and the effect on weekly mortality rates. In their usual fashion, the HMD have provided the data in an easy-to-use CSV file which can be downloaded from the website.

Rob Hyndman (whose work on time series forecasting I have learned much from over the years, and whose R packages and textbook I use and mention in this blog) posted today about this new data source on his excellent blog. He shows plots of the excess deaths for all 13 countries.

I was wondering how one might derive a predictive interval for these weekly mortality rates, and to what extent the COVID19 mortality rates would lie outside a reasonable interval. What follows is a rough analysis using time series methods. A quick inspection of the rates for the UK shows that there are strong seasonal features, as shown in the image below.

Plot of UK total mortality rates (all ages and both sexes), 2010 to 2020

Interestingly, the data dip and then recover around week 13 each year, probably due to reporting lags around the Easter holidays. Other similar effects can be seen in the data. To produce reasonable forecasts, one approach would be to attempt to model these seasonal issues explicitly.

Thanks to the excellent forecast package in R, this is really easy. First, we show a seasonal and trend decomposition using Loess (STL) of these data (excluding 2020). For details on this technique, see this link.

STL decomposition of the UK total weekly death data

The plot shows quite a strong seasonal pattern which is often characterized by dips and recoveries in the data. Overall, the trend component seems to have increased since 2010, which is a little puzzling at first glance, as mortality rates in the UK have improved for most of the period 2010-2019, but is probably due to an aging population.

One could now use the STL forecasting function (forecast::stlf) to produce forecasts. Here, I have instead chosen to fit a seasonal ARIMA model (link). The model specification that the forecast::auto.arima function selects is an ARIMA(2,0,1)(1,1,0)[52] model.
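For readers less familiar with the notation, an ARIMA(2,0,1)(1,1,0)[52] model can be written out with the backshift operator $B$ (where $B y_t = y_{t-1}$). The form below uses one common sign convention and omits any constant or drift term, since the post does not say whether one was included; the symbols $\phi_1, \phi_2, \Phi_1, \theta_1$ are introduced here for illustration only.

```latex
\[
(1 - \phi_1 B - \phi_2 B^2)\,(1 - \Phi_1 B^{52})\,(1 - B^{52})\, y_t
  = (1 + \theta_1 B)\, \varepsilon_t
\]
```

Here $(1 - B^{52})$ is the single seasonal difference (each week is compared with the same week one year earlier), $\phi_1, \phi_2$ are the two non-seasonal autoregressive coefficients, $\Phi_1$ is the seasonal autoregressive coefficient, $\theta_1$ is the moving average coefficient, and $\varepsilon_t$ is white noise.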

Finally, we are ready to forecast! In this application, I have used confidence levels of 95% and 99.5% respectively. Plotting the 2020 data against this interval produces the following figure.

It can be seen that the 2020 weekly mortality rates fall dramatically outside even a 99.5% interval. It is probably not too surprising that an uncertainty interval calibrated on only 10 years of data is too narrow, but the extent to which this has occurred is dramatic!
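To make the idea of checking observations against a predictive interval concrete, here is a small Python sketch. It does not reproduce the seasonal ARIMA model above: it swaps in a much simpler seasonal naive benchmark (each forecast repeats the value one season earlier) with a normal interval based on past seasonal differences. The function names and the toy data are hypothetical, and a period of 4 is used instead of 52 to keep the example short.

```python
from statistics import NormalDist, stdev


def seasonal_naive_interval(history, period, horizon, level):
    """Seasonal naive forecasts with normal prediction intervals.

    Returns a list of (lower, point, upper) tuples, one per forecast step.
    """
    if len(history) < period + 2:
        raise ValueError("need at least period + 2 observations")
    # Differences between each value and the value one season earlier.
    diffs = [history[t] - history[t - period] for t in range(period, len(history))]
    sigma = stdev(diffs)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile

    intervals = []
    for h in range(1, horizon + 1):
        point = history[len(history) - period + (h - 1) % period]
        # Forecast variance grows with the number of seasons stepped ahead.
        seasons_ahead = (h - 1) // period + 1
        half_width = z * sigma * seasons_ahead ** 0.5
        intervals.append((point - half_width, point, point + half_width))
    return intervals


def outside(intervals, observed):
    """Flag observations that fall outside their prediction interval."""
    return [not (lo <= y <= hi) for (lo, _, hi), y in zip(intervals, observed)]


# Hypothetical data: three "years" of four periods each, then four new observations.
history = [10, 12, 9, 8, 11, 12, 10, 8, 10, 13, 9, 9]
observed = [10.5, 20.0, 9.2, 8.5]

narrow = seasonal_naive_interval(history, period=4, horizon=4, level=0.95)
wide = seasonal_naive_interval(history, period=4, horizon=4, level=0.995)
flagged = outside(wide, observed)
```

With these toy numbers only the second new observation lies outside the 99.5% interval. A normal interval of this kind shares the weakness discussed above: it is calibrated on a short history and has thin tails.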

This analysis is very rudimentary and could be improved in several ways: obviously the distributional assumptions should be amended to allow for larger shocks, and more advanced forecasting would allow for sharing of information across countries.

The code can be found on my GitHub here:

2020 Hachemeister Prize

I am very grateful to the Casualty Actuarial Society’s committee that awarded my 2018 paper “AI in Actuarial Science” the 2020 Hachemeister prize. I hope that the methods discussed in the paper eventually make an impact on P&C actuarial practice! If you want to read the paper, please find it here. One of my favorite images from the paper is pasted below and shows the results of a convolutional autoencoder fit to telematics data.

Convolutional Autoencoder fit to v-a telematics heatmaps generated using the simulation machine kindly provided by Mario Wüthrich at this link.

Discrimination-Free Insurance Pricing

We are excited to share a new paper on “Discrimination-Free Insurance Pricing” written by Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich, with a small contribution from myself. In this paper, we present a general method for removing direct and indirect discrimination from the types of models in common use in the insurance sector (GLMs) as well as more advanced machine learning methods.
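A minimal sketch of how such a price could be computed is given below, assuming the approach is to average a best-estimate price (one that uses both the ordinary rating factors and the protected characteristic) over the portfolio-wide distribution of the protected characteristic. That assumption, the function names, and all figures are illustrative only; the paper itself is the reference for the actual method.

```python
def discrimination_free_price(best_estimate, x, protected_probs):
    """Average the best-estimate price over the marginal distribution of D.

    best_estimate: function (x, d) -> expected claim cost given the
        non-protected covariates x and the protected characteristic d.
    protected_probs: dict mapping each value d to its portfolio-wide
        probability P(D = d), not the conditional probability given x.
    """
    if abs(sum(protected_probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities of the protected characteristic must sum to 1")
    return sum(best_estimate(x, d) * p for d, p in protected_probs.items())


# Hypothetical best-estimate model: cost depends on an age band and on d.
def mu(x, d):
    base = {"young": 300.0, "old": 200.0}[x]
    return base * (1.5 if d == "A" else 1.0)


probs = {"A": 0.4, "B": 0.6}  # hypothetical portfolio mix of the protected characteristic
price = discrimination_free_price(mu, "young", probs)  # same price for every young policyholder
```

Using the marginal rather than the conditional distribution of the protected characteristic is what stops the other covariates from acting as a proxy for it in this sketch.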

One of my favorite plots from the paper shows a comparison of prices produced using a neural net with our method applied to the results.

We would like to hear your feedback and the paper can be downloaded here:

IDSC Africa – Call for Abstracts

The call for abstracts is open for the first Insurance Data Science Conference – Africa at the Sandton Convention Centre, 20 April 2020, which is jointly organized by the Actuarial Society of South Africa, QED Actuaries and Consultants and the University of the Witwatersrand. The invited keynote speakers are Mario Wüthrich (RiskLab, ETH Zürich) and Marjorie Ngwenya (ALU School of Insurance, past President of the Institute and Faculty of Actuaries).

We are inviting abstracts in the areas of insurance analytics, machine learning, artificial intelligence, and actuarial science. Please send your abstract by 29 February 2020 to:

Your submission should include:

  • Name and affiliation of the speaker
  • Email address of the speaker
  • Title of the presentation
  • Abstract of 10 to 20 lines and not more than 5 references
  • Format: pdf file and either a Word file or LaTeX file
  • If submitting LaTeX, please test your submission file, e.g. via

The submitted abstracts will be evaluated and speakers will be selected by the scientific committee. Conference fees will be waived for selected speakers.

For more information, visit the conference website. Registration for the conference will open in mid-February 2020.

Note: The international version of this conference has been held in European cities for the past 7 years and returns to London in 2020. For more information, please visit:

IDSC – Africa

One of the most inspiring events I have attended was the Insurance Data Science Conference held in Zurich this year.

I am very excited to announce that on 20 April 2020 we will hold an affiliated event in South Africa, which will be organized by the Actuarial Society of South Africa, QED Actuaries & Consultants, and the University of the Witwatersrand. The conference website is here:

The call for abstracts is open and we look forward to receiving your submission.

Finally, the 2020 event in Europe will be held in London at the Cass Business School. The call for abstracts has just gone out and the website is here:

Insurance Data Science Conference

Ideas from IDSC 2019

About a week ago, I attended the second Insurance Data Science Conference held at ETH Zürich. On a personal note, I am very grateful to the conference organizers for inviting me to give a keynote, and my deck from that presentation is here. Making the conference extra special for me was the opportunity to meet the faculty of ETH Zürich’s RiskLab, who have written some of the best textbooks and papers on the actuarial topics that I deal with in my professional capacity.

This was one of the best organized events I have attended, including the beautiful location of the conference dinner at the Zürich guild house (shown below), and the hard choices of deciding between simultaneous sessions at the conference. It was great to see the numerous insurance professionals, academics and students who were present – the growth in the number of conference attendees from previous years is witness to the huge current interest in data science in insurance, which I am sure will help create tangible benefits for the industry, and the policyholders it serves.

In this post I will discuss some of the interesting ideas presented at the IDSC 2019 that stand out in my memory from the conference. If any of these snippets spark interest, then the full presentations can be found at the conference website here.

Evolution of Insurance Modelling

It is interesting to observe the impact on modelling techniques caused by the availability of data at a more granular level than previously, or by a recognition of the potential benefits of better exploiting traditional data. I would categorize this impact as a move towards more empirical modelling, but still framed within the classical actuarial models, and I explain this by examining some of the standout talks for me that fell into this category. Within my talk, I showed the following slide, which discusses the split between those actuarial tasks driven primarily by models, versus those driven by empirical relationships found within datasets. Many of the talks I discuss cover proposals to make tasks that are today more model driven, more empirically driven.

One of the sessions focused on reserving techniques. Alessandro Carrato presented an interesting technique that adapts the chain ladder method within an unsupervised learning framework. This technique is used for reserving for IBNeR on reported claims and works by clustering claims trajectories in a 2D space comprising claims paid and outstanding loss reserves. Loss development factors for each cluster are then calculated from the more developed claims in that cluster. Thus, the traditional approach of finding “homogeneous” lines of business, which is usually done subjectively, is here replaced by unsupervised learning. Another reserving talk, by Jonas Crevecoeur, investigated the possibility of reserving at a more granular level using several GLMs, which were shown to reduce to more traditional techniques depending on the choice of GLM covariates.
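For context, the classical chain ladder calculation that these talks build on can be sketched in a few lines of Python. This is the textbook volume-weighted version applied to a single triangle, not the clustering approach presented at the conference, where (as described above) factors would instead be derived within each cluster of claims. The triangle and function names are hypothetical.

```python
def development_factors(triangle):
    """Volume-weighted chain ladder factors from a cumulative triangle.

    triangle: list of rows (one per accident period), each a list of
    cumulative claims by development period; later rows are shorter.
    """
    n_dev = max(len(row) for row in triangle)
    factors = []
    for j in range(n_dev - 1):
        # Use only rows observed at both development periods j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        factors.append(numerator / denominator)
    return factors


def project_ultimate(row, factors):
    """Roll the latest cumulative amount forward with the remaining factors."""
    ultimate = row[-1]
    for f in factors[len(row) - 1:]:
        ultimate *= f
    return ultimate


# Hypothetical cumulative paid claims for three accident periods.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 160.0],
    [120.0],
]
factors = development_factors(triangle)
ultimates = [project_ultimate(row, factors) for row in triangle]
```

The subjective step in practice is deciding which claims belong in the same triangle in the first place, which is the part the clustering approach aims to automate.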

Within the field of mortality modelling, Andrew Cairns presented on a new dataset covering mortality in the UK split by small geographic areas. This dataset also includes several static variables describing the circumstances of each of these areas, such as deprivation index, education, weekly income, nursing homes, allowing for the modelling of granular mortality rates depending on these covariates. This presentation took a very interesting approach – firstly, an overall national mortality rate was calculated, and then the mortality rate in each area was compared to the national rate in a typical “actual versus expected” analysis. Models were then estimated to explain this AvE analysis in terms of the covariates, as well as in terms of the geographic location of each area. An interesting finding was that income deprivation is an important indicator of excess mortality at the older ages, whereas unemployment is more important at the younger ages.
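The “actual versus expected” step can be illustrated with a small Python sketch. It is heavily simplified: a single national rate is applied to every area, whereas a real analysis would work with age-specific rates and exposures. The area names, death counts and exposures are made up for the example.

```python
def actual_vs_expected(areas, national_rate):
    """Return the actual/expected (A/E) death ratio for each area.

    areas: dict mapping area name -> (actual_deaths, exposure_in_person_years).
    national_rate: national deaths per person-year of exposure.
    """
    ratios = {}
    for name, (deaths, exposure) in areas.items():
        expected = national_rate * exposure  # deaths expected at the national rate
        ratios[name] = deaths / expected
    return ratios


# Hypothetical small areas: (deaths, exposure).
areas = {
    "area_1": (30, 2000.0),
    "area_2": (10, 1500.0),
    "area_3": (20, 1500.0),
}
total_deaths = sum(d for d, _ in areas.values())
total_exposure = sum(e for _, e in areas.values())
national_rate = total_deaths / total_exposure

ave = actual_vs_expected(areas, national_rate)
```

Ratios above 1 indicate excess mortality relative to the national level. These ratios are the quantity that would then be regressed on covariates such as deprivation or location.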

Another talk on mortality modelling was given by Andrés Villegas, who cast traditional mortality models into what I would call a feature engineering context. In other words, many traditional mortality models, such as the Cairns-Blake-Dowd model, can be expressed as a regression of the mortality rate on a number of features, or basis functions, which represent different combinations of age, period and cohort effects. The method basically proceeds by setting up a very large number of potential features, and then selecting among these using the grouped lasso technique (which gives zero weight to most features, i.e. performs feature selection). A very similar idea has appeared in the reserving literature from Gráinne McGuire, Greg Taylor and Hugh Miller (link). This talk epitomized for me the shift to more empirical techniques, within a field that has traditionally been defined by models and competing model specifications (Gompertz vs Kannisto, Lee-Carter vs Cairns-Blake-Dowd etc).
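To make the regression view concrete, the basic Cairns-Blake-Dowd model and a generic group lasso objective are written out below. The CBD equation is the standard two-factor form. The group lasso objective is the usual textbook version with a squared-error loss, shown only as an assumption about the general shape of the problem, since the exact features and loss function used in the talk are not given here.

```latex
\[
\operatorname{logit} q_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\,\kappa_t^{(2)}
\]
\[
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2}\Bigl\lVert y - \sum_{g=1}^{G} X_g \beta_g \Bigr\rVert_2^2
  + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
\]
```

In the first equation, $q_{x,t}$ is the mortality rate at age $x$ in year $t$ and $\bar{x}$ is the average age in the data, so for each year the features are simply a constant and the centred age. In the second, $X_g$ holds the $p_g$ features in group $g$, and the penalty sets entire groups $\beta_g$ to zero when $\lambda$ is large enough, which is how the feature selection happens.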

Keeping it safe

A topic touched on by some speakers was the need to manage new, emerging risks arising due to advanced algorithms and open source software. Jürg Schelldorfer presented an excellent view of how to apply machine learning models within a highly regulated industry such as insurance. Some of his ideas were to focus on prediction uncertainty, and to provide questions to be answered when peer reviewing ML models. I highly recommend this presentation if you are going on the ML journey within an established company!

Jeffrey Bohn also spoke about this theme, emphasizing “algorithmic risks”, which are risks arising due to poor data used to calibrate ML algorithms, or due to malpractice during algorithmic design and calibration.

Within this section, I would also mention the amazing morning keynote by Professor Buhmann, who presented on an alternative to the paradigm of empirical risk minimization, used often to train ML models. The extent of the knowledge of ML theory shown in this talk was breath-taking, and I am excited to delve into Professor Buhmann’s work in more detail (link). The lesson here for me was that it is a mistake to assume that ML methodology is “cut and dried”, and that by building more knowledge about alternative methods, one can hopefully understand some of the risks implied by these techniques.

R – the language for insurance data science

The IDSC began life as the R in Insurance conference, and in this respect, many interesting talks covered innovative R packages. Within the sessions I attended, Daphné Giorgi presented an R package used for simulating human populations based on individuals, which showed excellent performance due to the implementation of some of the algorithms in C++. Kornelius Rohmeyer presented a very promising package called DistrFit, which, as the name implies, is helpful for fitting distributions to insurance claims. This package is a very neat Shiny app, which automates some of the drudge work when fitting claims distributions in R. I hope this one gets a public release soon! Other notable contributions were Silvana Pesenti’s SWIM package, which implements methods for sensitivity analysis of stochastic models, and the interesting use of Hawkes processes by Alexandre Boumezoued for predicting cyber claims.

I would also mention the excellent presentation on TensorFlow Probability by Roland Schmid. TF Probability offers many possibilities of incorporating a probabilistic view into Keras deep learning models (amongst other things) and it is exciting that RStudio is in the process of porting this package from Python to R.


The above is a sample of the excellent talks presented (biased towards my own interests), and I have not done justice to the rest of the talks on the day.

I look forward to IDSC 2020 and wish the organizers every success as this conference grows from strength to strength!

Industry publications – Sigma/Lloyd’s

Some of my favorite reading on insurance related topics comes from Swiss Re’s Sigma series and Lloyd’s of London’s emerging risk teams.

The latest Swiss Re Sigma publication covers the CAT events of 2018, which were driven mainly by “secondary perils”:

The Lloyd’s of London reports cover the impacts of AI on insurance and the risks of robotics: