We are excited to share a new paper on “Discrimination-Free Insurance Pricing” written by Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich, with a small contribution from myself. In this paper, we present a general method for removing direct and indirect discrimination from the types of models in common use in the insurance sector (GLMs), as well as from more advanced machine learning methods.
One of my favorite plots from the paper shows a comparison of prices produced using a neural net with our method applied to the results.
We would like to hear your feedback; the paper can be downloaded here:
The call for abstracts is open for the first Insurance Data Science Conference
– Africa at the Sandton Convention Centre, 20 April 2020, which is jointly
organized by the Actuarial Society of South Africa, QED Actuaries and
Consultants and the University of the Witwatersrand. The invited keynote
speakers are Mario Wüthrich (RiskLab, ETH
Zürich) and Marjorie Ngwenya (ALU School of Insurance, past President
of the Institute and Faculty of Actuaries).
We are inviting abstracts in the areas of insurance
analytics, machine learning, artificial intelligence, and actuarial science.
Please send your abstract by 29 February 2020 to: firstname.lastname@example.org
Your submission should include:
– Affiliation of the speaker
– Email address of the speaker
– Title of the talk
– Abstract of 10 to 20 lines and not more than 5 references
Note: The international version
of this conference has been held in European cities for the past 7 years and
returns to London in 2020. For more information, please visit: https://www.insurancedatascience.org/
About a week ago, I attended the second Insurance Data Science Conference held at ETH Zürich. On a personal note, I am very grateful to the conference organizers for inviting me to give a keynote, and my deck from that presentation is here. Making the conference extra special for me was the opportunity to meet the faculty of ETH Zürich’s RiskLab, who have written some of the best textbooks and papers on the actuarial topics that I deal with in my professional capacity.
This was one of the best organized events I have attended, including the beautiful location of the conference dinner at the Zürich guild house (shown below), and the hard choice between simultaneous sessions at the conference. It was great to see the numerous insurance professionals, academics and students who were present – the growth in the number of conference attendees from previous years is witness to the huge current interest in data science in insurance, which I am sure will help create tangible benefits for the industry and the policyholders it serves.
In this post I will discuss some of the interesting ideas presented at the IDSC 2019 that stand out in my memory from the conference. If any of these snippets spark interest, then the full presentations can be found at the conference website here.
Evolution of Insurance Modelling
It is interesting to observe the impact on modelling techniques
caused by the availability of data at a more granular level than before, or
by a recognition of the potential benefits of better exploiting traditional
data. I would characterize this impact as a move towards more empirical modelling,
but still framed within the classical actuarial models, and I explain this by
examining some of the standout talks (for me) that fell into this category. Within
my talk, I showed the following slide, which discusses the split between those
actuarial tasks driven primarily by models, versus those driven by empirical relationships
found within datasets. Many of the talks I discuss cover proposals to make tasks
that are today more model driven, more empirically driven.
One of the sessions was structured with a focus on reserving
techniques. Alessandro Carrato presented an interesting technique that
adapts the chain ladder method within an unsupervised learning framework. This technique
is used for reserving for IBNeR on reported claims and works by clustering
claims trajectories in a 2D space comprised of claims paid and outstanding loss
reserves. Loss development factors are then calculated from the more
developed claims in each cluster. Thus, the traditional
approach of finding “homogeneous” lines of business, which is usually done subjectively,
is here replaced by unsupervised learning. Another reserving talk, by Jonas
Crevecoeur, investigated the possibility of reserving at a more granular level
using several GLMs, which were shown to reduce to more traditional techniques depending
on the choice of GLM covariates.
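The per-cluster development-factor step of this idea can be sketched as follows. This is a toy numpy illustration with made-up claims and centroids, and a simple nearest-centroid assignment standing in for the clustering algorithm actually used in the talk:

```python
import numpy as np

# Toy claims: each row is (paid_to_date, outstanding_reserve, next_period_paid)
claims = np.array([
    [100.0,  50.0, 130.0],
    [110.0,  40.0, 135.0],
    [500.0, 300.0, 700.0],
    [520.0, 280.0, 690.0],
])

# Hypothetical cluster centroids in the (paid, outstanding) plane
centroids = np.array([[105.0, 45.0], [510.0, 290.0]])

# Assign each claim to its nearest centroid (unsupervised grouping)
dists = np.linalg.norm(claims[:, None, :2] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Chain-ladder-style development factor per cluster:
# sum of next-period paid over sum of paid to date
for k in range(len(centroids)):
    mask = labels == k
    f = claims[mask, 2].sum() / claims[mask, 0].sum()
    print(f"cluster {k}: development factor = {f:.3f}")
```

The point of the sketch is that each cluster gets its own development factor, replacing the subjective grouping of claims into homogeneous lines of business.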
Within the field of mortality modelling, Andrew Cairns
presented on a new dataset covering mortality in the UK split by small geographic
areas. This dataset also includes several static variables describing the circumstances
of each of these areas, such as deprivation index, education, weekly income
and nursing homes, allowing for the modelling of granular mortality rates depending
on these covariates. This presentation took a very interesting approach –
firstly, an overall national mortality rate was calculated, and then the
mortality rate in each area was compared to the national rate in a typical “actual
versus expected” (AvE) analysis. Models were then estimated to explain the AvE results
in terms of the covariates, as well as in terms of the geographic location of
each area. An interesting finding was that income deprivation is an important indicator
of excess mortality at the older ages, whereas unemployment is more important at
the younger ages.
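The actual-versus-expected step described above can be illustrated in a few lines (all numbers here are made up, not from the presented dataset): expected deaths in each small area are exposure times the national rate, and the AvE ratio flags areas with excess mortality.

```python
import numpy as np

national_rate = 0.012                         # overall national mortality rate
exposure = np.array([1000.0, 2500.0, 800.0])  # person-years per area
deaths = np.array([15.0, 28.0, 7.0])          # observed deaths per area

# Expected deaths if each area experienced the national rate
expected = exposure * national_rate

# AvE ratio: > 1 means excess mortality relative to the national rate
ave_ratio = deaths / expected
print(ave_ratio)
```

Covariates such as deprivation or unemployment would then be used to model these ratios, as in the talk.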
Another talk on mortality modelling was given by Andrés Villegas, who cast traditional mortality models into what I would call a feature engineering context. In other words, many traditional mortality models, such as the Cairns-Blake-Dowd model, can be expressed as a regression of the mortality rate on a number of features, or basis functions, which represent different combinations of age, period and cohort effects. The method basically proceeds by setting up a very large number of potential features, and then selecting among these using the grouped lasso technique (which gives zero weight to most features, i.e. performs feature selection). A very similar idea has appeared in the reserving literature from Gráinne McGuire, Greg Taylor and Hugh Miller (link). This talk epitomized for me the shift to more empirical techniques, within a field that has traditionally been defined by models and competing model specifications (Gompertz vs Kannisto, Lee-Carter vs Cairns-Blake-Dowd etc).
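The dictionary-of-basis-functions step can be sketched as follows. This is a toy numpy illustration with made-up age and year ranges, building indicator basis functions for age, period and cohort effects; the grouped-lasso selection itself is not shown:

```python
import numpy as np

ages = np.arange(60, 65)       # toy age range
years = np.arange(2000, 2004)  # toy period range
A, Y = np.meshgrid(ages, years, indexing="ij")
cohort = Y - A                 # cohort = year of birth

# Dictionary of candidate basis functions on the (age, period) grid:
# indicator (dummy) features for each age, period and cohort level
features = []
for a in ages:
    features.append((A == a).ravel().astype(float))       # age effects
for y in years:
    features.append((Y == y).ravel().astype(float))       # period effects
for c in np.unique(cohort):
    features.append((cohort == c).ravel().astype(float))  # cohort effects

X = np.column_stack(features)
print(X.shape)  # one row per (age, year) cell, one column per candidate feature
```

A grouped lasso regression of log mortality on `X` would then zero out most columns, leaving a sparse, data-selected model specification.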
Keeping it safe
A topic touched on by some speakers was the need to manage
new, emerging risks arising due to advanced algorithms and open source software.
Jürg Schelldorfer presented an excellent view of how to apply machine learning models
within a highly regulated industry such as insurance. Some of his ideas were to
focus on prediction uncertainty, and to provide questions to be answered when
peer reviewing ML models. I highly recommend this presentation if you are going
on the ML journey within an established company!
Jeffrey Bohn also spoke about this theme, emphasizing “algorithmic
risks”: risks arising from poor data used to calibrate ML
algorithms, or from malpractice during algorithmic design and calibration.
Within this section, I would also mention the amazing morning keynote by Professor Buhmann, who presented on an alternative to the paradigm of empirical risk minimization, often used to train ML models. The extent of the knowledge of ML theory shown in this talk was breath-taking, and I am excited to delve into Professor Buhmann’s work in more detail (link). The lesson here for me was that it is a mistake to assume that ML methodology is “cut and dried”, and that by building more knowledge about alternative methods, one can hopefully understand some of the risks implied by these techniques.
R – the language for insurance data science
The IDSC began life as the R in Insurance conference, and in
this respect, many interesting talks covered innovative R packages. Within the
sessions I attended, Daphné Giorgi presented an R package used for simulating human
populations based on individuals, which showed excellent performance due to the
implementation of some of the algorithms in C++. Kornelius Rohmeyer presented a
very promising package called DistrFit, which, as the name implies, is helpful for
fitting distributions to insurance claims. This package is a very neat Shiny
app, which automates some of the drudge work when fitting claims distributions
in R. I hope this one gets a public release soon! Other notable packages are Silvana
Pesenti’s SWIM package, which implements methods for sensitivity analysis of
stochastic models, and the interesting use of Hawkes processes by Alexandre
Boumezoued for predicting cyber claims.
I would also mention the excellent presentation on
TensorFlow Probability by Roland Schmid. TF Probability offers many possibilities
of incorporating a probabilistic view into Keras deep learning models (amongst other
things) and it is exciting that RStudio is in the process of porting this
package from Python to R.
The above is a sample of the excellent talks presented
(biased towards my own interests), and I have not done justice to the rest of
the talks on the day.
I look forward to IDSC 2020 and wish the organizers every
success as this conference grows from strength to strength!
Very excited to be attending the Insurance Data Science conference at ETH Zurich on Friday. I will be giving a keynote presentation on the state of the art in applying deep learning to actuarial problems. If you are there, then it would be great to meet. My slides are available at the link below:
– An excellent tutorial article by Jürg Schelldorfer and Mario Wüthrich showing how to apply a hybrid GLM/neural net for pricing. The paper is here: https://lnkd.in/edv5s9k
– This paper uses a recurrent neural network (LSTM) to forecast the time parameters of a Lee-Carter model, and the results look very promising – much better than using an ARIMA model: https://lnkd.in/eRAddBd
– Lastly, this paper proposes an interesting combination of a decision tree model with Bühlmann-Straub credibility: https://lnkd.in/eJts5Mp
Great to see the state of the art being advanced on so many fronts!
A topic that I have been interested in for a long time is forecasting mortality rates, perhaps because this is one of the interesting intersections of statistics (and, these days, machine learning) with the field of actuarial science in life insurance. Several methods to model mortality rates over time have been proposed, ranging from the relatively simple method of extrapolating mortality rates directly using time series, to more complicated statistical approaches.
One of the most famous of these is the Lee-Carter method, which models mortality as an average mortality rate that changes over time. The change over time is governed by a time-based mortality index, which is common to all ages, and an age-specific rate of change factor:
ln(m_{x,t}) = a_x + κ_t · b_x
where m_{x,t} is the force of mortality at age x in year t, a_x is the average log mortality rate at age x during the period, κ_t is the time index in year t, and b_x is the rate of change of log mortality with respect to the time index at age x.
How are these quantities derived? There are two methods prominent in the literature – applying Principal Components Analysis, or Generalized Non-linear Models, which differ from GLMs in that the user can specify non-additive relationships between two or more terms. To forecast mortality, models are first fit to historical mortality data and the coefficients (in the case of the Lee-Carter model, the vector κ) are then forecast using a time series model in a second step.
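The PCA route to fitting Lee-Carter can be sketched in a few lines of numpy, using simulated log mortality rates (ages × years) rather than real data: a_x is the row mean of the log rates, and b_x and κ_t come from the rank-1 SVD of the centred matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ages, n_years = 10, 30

# Simulated log mortality: an age profile plus a downward time trend
a_true = np.linspace(-6.0, -2.0, n_ages)
k_true = np.linspace(1.0, -1.0, n_years)
b_true = np.full(n_ages, 0.1)
log_m = a_true[:, None] + np.outer(b_true, k_true) \
        + rng.normal(0.0, 0.01, (n_ages, n_years))

a_x = log_m.mean(axis=1)          # average log rate per age
Z = log_m - a_x[:, None]          # centre each age
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
b_x = U[:, 0]                     # age sensitivity (up to scaling)
k_t = s[0] * Vt[0]                # time index

# Usual identification constraints: sum(b_x) = 1, mean(k_t) = 0
k_t = (k_t - k_t.mean()) * b_x.sum()
b_x = b_x / b_x.sum()
```

The forecast step would then fit a time series model (e.g. a random walk with drift) to the estimated κ_t.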
In the current age of big data, relatively high-quality mortality data spanning an extended period are available for many countries from the excellent Human Mortality Database, which is a resource that anyone with an interest in the study of mortality can benefit from. Other interesting sources are databases containing sub-national rates for the USA, Japan and Canada. The challenge, though, is how to model all of these data simultaneously to improve mortality forecasts. While some extensions of basic models like the Lee-Carter model have been proposed, these rely on assumptions that might not necessarily be applicable in the case of large-scale mortality forecasting. For example, some of the common multi-population mortality models rely on the assumption of a common mortality trend for all countries, which is likely not the case.
In the paper, we tackle this problem in a novel way – feed all the variables to a deep neural network and let it figure out how exactly to model the mortality rates over time. This speaks to the idea of representation learning that is central to modern deep learning, which is that many datasets, such as large collections of images as in the ImageNet dataset, are too complicated to model by hand-engineering features, or it is too time consuming to perform the modelling. Rather, the strategy in deep learning is to define a neural network architecture that expresses useful priors about the data, and allow the network to learn how the raw data relates to the problem at hand. In the example of modelling mortality rates, we use two architectural elements that are common in applications of neural networks to tabular data:
We use a deep network; in other words, the network consists of multiple layers of variables, expressing the prior that complex features can be represented by a hierarchy of simpler representations learned by the model.
Instead of using one-hot encoding to signify to the network when we are modelling a particular country, or gender, we use embedding layers. When applied to many categories, one-hot encoding produces a high-dimensional feature vector that is sparse (i.e. many of the entries are zero), leading to potential difficulties in fitting a model as there might not be enough data to derive credible estimates for each category. Even if there is enough data, as in our case of mortality rates, each parameter is learned in isolation, and the estimated parameters do not share information, unless the modeller explicitly chooses to use something like credibility or mixed models. The insight of Bengio et al. (2003) to solve these problems is that categorical data can successfully be encoded into low dimensional, dense numerical vectors, so, for example, in our model, country is encoded into a five-dimensional vector.
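The contrast between one-hot encoding and an embedding layer can be shown with plain numpy. In the paper the embeddings are learned inside a neural network; here a random matrix stands in for the learned weights, and the dimensions (40 countries, 5-dimensional embedding) mirror the example in the text:

```python
import numpy as np

n_countries, embed_dim = 40, 5
rng = np.random.default_rng(1)

country_idx = 7  # some category index

# One-hot: a sparse 40-dimensional vector with a single 1
one_hot = np.zeros(n_countries)
one_hot[country_idx] = 1.0

# Embedding: a dense 40 x 5 matrix of (learned) weights; the embedding of a
# category is just a row lookup, equivalent to one_hot @ E
E = rng.normal(size=(n_countries, embed_dim))
embedding = E[country_idx]
assert np.allclose(one_hot @ E, embedding)
```

The row-lookup view makes clear why embeddings share information across categories: nearby rows of E correspond to categories the network has learned to treat similarly.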
In the paper, we also show how the original Lee-Carter model can be expressed as a neural network with embeddings!
Here is a picture of the network we have just described:
In the paper, we also employ one of the most interesting techniques to emerge from the computer vision literature in the past several years. The original insight is due to the authors of the ResNet paper, who analysed the well-known problem that it is often difficult to train deep neural networks. They reasoned that a deep neural network should be no more difficult to train than a shallow network, since the deeper layers could simply learn the identity function, and thus be equivalent to a shallow network. Without going too far off track into these details, their solution is simple – add skip connections that link the deeper layers to shallower layers in the network. This idea is expanded on in the DenseNet architectures. We simply added a connection between the feature layer and the fifth layer of the network, connecting the embedding layers almost to the deepest layer of the network. This boosted the performance of the networks considerably.
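The skip-connection idea itself is tiny: a block's output is its transformation of the input plus the input, so a layer can fall back to (approximately) the identity. A minimal numpy sketch, with random weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(scale=0.1, size=(d, d))

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    # Transformation of the input plus the input itself (the skip connection)
    return relu(x @ W) + x

x = rng.normal(size=d)
y = residual_block(x)

# With zero weights the block reduces exactly to the identity
assert np.allclose(relu(x @ np.zeros((d, d))) + x, x)
```

In a real network (e.g. in Keras) this is just an add operation merging the output of an earlier layer into a later one, as in the feature-layer-to-fifth-layer connection described above.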
We found that the deep neural networks dramatically outperformed the competing methods that we tested, forecasting with the lowest MSE in 51 out of the 76 instances we tested! Here is a table comparing the methods, and see the paper for more details:
Lastly, an interesting property of the embedding layers learned by neural networks is the fact that the parameters of these layers are often interpretable as so-called “relativities” (to use some actuarial jargon), in other words, as defining the relationship between the different values that a variable may take. Here is a picture of the age embedding, which shows that the main relationship learned by the network is the characteristic shape of a modern life table:
This is a rather striking result, since at no time did we specify this to the network! Also, once the architecture was specified, the network learned to forecast mortality rates more successfully than human-specified models, reminding me of one of the desiderata for AI systems listed by Bengio (2009):
“Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.”
We would value any feedback on the paper you might have.
Bengio, Y. 2009. “Learning Deep Architectures for AI”, Foundations and Trends® in Machine Learning 2(1):1–127.