A major challenge in actuarial modelling is how to deal with categorical variables with many levels (i.e. high cardinality). This is often encountered with a rating factor like car model, which can take on one of thousands of values, some of which carry significant exposure while others have exposure close to zero.
In a new paper with Mario Wüthrich, we show how to incorporate these variables into neural networks using different types of regularized embeddings, including embeddings fitted with variational inference. We consider both standalone variables and variables with a natural hierarchy, the latter lending themselves to being modelled with recurrent neural networks or Transformers. On a synthetic dataset, the proposed methods provide a significant performance gain over other techniques.
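To make the regularization idea concrete, here is a minimal numpy sketch (my own toy construction, not the paper's code or data): a one-dimensional "embedding" per vehicle model is fitted by penalized Poisson likelihood, and the ridge penalty shrinks sparsely observed levels towards the portfolio average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy portfolio: 50 vehicle models with very unequal exposure (all made up).
n_models = 50
exposure = rng.uniform(0.01, 5.0, n_models)     # some levels near zero exposure
true_effect = rng.normal(0.0, 0.3, n_models)    # true log-frequency offsets
base_rate = 0.1
claims = rng.poisson(exposure * base_rate * np.exp(true_effect))

# One-dimensional "embedding" b[c] per category, fitted by penalized Poisson
# maximum likelihood. The L2 (ridge) penalty plays the role that regularization
# plays for high-cardinality factors: sparsely observed levels shrink towards
# zero, i.e. towards the portfolio average frequency.
lam = 1.0                                       # regularization strength
lr = 0.05
b = np.zeros(n_models)
for _ in range(2000):
    mu = exposure * base_rate * np.exp(b)       # expected claim counts
    grad = (mu - claims) + lam * b              # Poisson score plus ridge term
    b -= lr * grad

# Unpenalized per-level estimates are wild for low-exposure models, while the
# regularized embeddings stay close to the portfolio average.
raw = np.log(np.maximum(claims, 0.5) / (exposure * base_rate))
print(np.abs(b).mean(), np.abs(raw).mean())
```

In the paper this shrinkage happens inside a neural network via regularized (or variational) embedding layers; the mechanism, pulling low-exposure levels towards a sensible prior, is the same.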
We show the problem we are trying to solve in the image below, which illustrates how the most detailed covariate in the synthetic dataset – Vehicle Detail – can produce observed values vastly different from the true value due to sampling error.
A special thank you to Michael Mayer, PhD for input into the paper and interesting discussions on the topic!
Over the past few days I had the privilege of presenting on the topic of “Explainable deep learning for actuarial modelling” to Munich Re’s actuarial and data science teams. In this talk I covered several explainable deep learning methods: the CAXNN, LocalGLMnet and ICEnet models.
My slides are attached below if this is of interest.
I am pleased to share a new paper on adding smoothness and monotonicity constraints to neural networks. This is joint work with Mario Wüthrich.
In this paper, we propose a novel method for enforcing smoothness and monotonicity constraints within deep learning models used for actuarial tasks, such as pricing. The method is called ICEnet, which stands for Individual Conditional Expectation network. It’s based on augmenting the original data with pseudo-data that reflect the structure of the variables that need to be constrained. We show how to design and train the ICEnet using a compound loss function that balances accuracy and constraints, and we provide example applications using real-world datasets. The structure of the ICEnet is shown in the following figure.
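The pseudo-data idea can be illustrated with a toy numpy sketch (my own names and numbers, not the paper's implementation): pseudo-records sweep one constrained feature over a grid while holding the rest of the record fixed, and a compound penalty on the predictions along that sweep punishes decreases (monotonicity) and large second differences (smoothness).

```python
import numpy as np

def pseudo_data(x, feature_idx, grid):
    """Repeat record x, replacing feature `feature_idx` with each grid value."""
    out = np.tile(x, (len(grid), 1))
    out[:, feature_idx] = grid
    return out

def constraint_penalty(preds, mono_weight=1.0, smooth_weight=1.0):
    """Compound penalty along one sweep: monotonicity (no decreases)
    plus smoothness (small second differences)."""
    d1 = np.diff(preds)                            # first differences
    d2 = np.diff(preds, n=2)                       # second differences
    mono = np.sum(np.clip(-d1, 0.0, None) ** 2)    # penalize decreases only
    smooth = np.sum(d2 ** 2)
    return mono_weight * mono + smooth_weight * smooth

# Toy example: sweep "vehicle power" (feature 1) for one policy record.
x = np.array([0.3, 1.2, 0.7])
grid = np.linspace(0.0, 2.0, 11)
X_pseudo = pseudo_data(x, 1, grid)

# Compare a wiggly, non-monotone prediction profile with a smooth
# increasing one: the penalty clearly prefers the latter.
wiggly = np.sin(3 * grid) + 0.5 * grid
monotone = 0.5 * grid
print(constraint_penalty(wiggly), constraint_penalty(monotone))
```

During training, this penalty would be added to the usual accuracy loss (e.g. Poisson deviance), giving the compound loss that balances fit against the constraints.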
Applying the model produces predictions that are smooth and vary with risk factors in line with intuition. Below is an example where applying constraints forces a neural network to produce predictions of claims frequency that increase with population density and vehicle power.
The Institute and Faculty of Actuaries (IFoA) has been key to my journey as an actuary, providing my initial professional education and, subsequently, many great opportunities to contribute and learn more along the way. This made receiving the 2022 Outstanding Achievement award from the IFoA’s GI Board yesterday very special:
The award was given in connection with my research into applying machine and deep learning within actuarial work. My hope is that more actuaries within the vibrant community attending the 2022 GIRO conference will be motivated to apply these techniques in their own work.
Thank you again to the #GIRO2022 organizing committee and the #ifoa for a fantastic event!
I was delighted to present the first masterclass in the series as part of the short-term insurance practicing initiative of the Association of South African Black Actuarial Professionals and Old Mutual Insure. The title was “Reserving with the Cape Cod Method” and the attached slides cover everything from the basics all the way up to advanced methods of setting the parameters using machine learning. More materials can be found at the GitHub link on the title slide.
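As a taster of the basics covered in the slides, here is a minimal Cape Cod calculation with made-up figures: the expected loss ratio (ELR) is total reported losses divided by "used-up" premium, and each year's IBNR applies that ELR to the undeveloped portion of its premium.

```python
import numpy as np

# Made-up figures for three accident years (not from the slides).
premium  = np.array([1000.0, 1000.0, 1000.0])   # earned premium
reported = np.array([ 600.0,  450.0,  250.0])   # reported losses to date
dev_pct  = np.array([ 0.90,   0.70,   0.40])    # proportion developed (1/CDF)

# Cape Cod ELR: total reported losses over total "used-up" premium.
elr = reported.sum() / (premium * dev_pct).sum()

# IBNR: expected losses on the undeveloped portion of each year.
ibnr = premium * elr * (1.0 - dev_pct)
ultimate = reported + ibnr
print(elr)        # 0.65
print(ibnr)       # [ 65. 195. 390.]
```

Unlike the chain ladder, the Cape Cod ultimate for an immature year leans on the portfolio-wide ELR rather than on that year's volatile reported losses alone.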
I was delighted to speak at the Actuarial Society of South Africa (ASSA)’s annual short-term insurance seminar on Discrimination-Free Insurance Pricing and our new work on multi-task networks. My slides are below.
Thanks so much to Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich for this collaboration!
I am very excited to announce our next paper on Discrimination-Free Insurance Pricing (DFIP). The first paper introduced a method for removing indirect discrimination from pricing models; however, the DFIP technique requires that the discriminatory features (e.g. gender) are known for all examples in the training data, as well as for the subsequent policies to be priced. In this new work, we require only that the discriminatory features are known for a subset of the examples, and use a specially designed neural network with multiple outputs to handle the examples missing this information. In the plot below, we show that this new approach produces excellent approximations to the true discrimination-free price in a synthetic health insurance example.
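For context, the discrimination-free price from the first paper averages the best-estimate price μ(x, d) over the marginal (unconditional) distribution of the discriminatory feature d. A small numpy illustration with invented numbers follows; the new paper's contribution, handling partially observed d via a multi-output network, is not shown here.

```python
import numpy as np

# Marginal (population) distribution of a binary discriminatory feature d.
p_d = np.array([0.45, 0.55])

# Best-estimate prices mu(x, d) for three example policies x (rows),
# evaluated at each value of d (columns); numbers are invented.
mu = np.array([[100.0, 140.0],
               [ 80.0,  80.0],
               [200.0, 150.0]])

# Discrimination-free price: average over the *marginal* distribution of d,
# not the conditional distribution given x, which removes the channel for
# indirect discrimination through x.
h_star = mu @ p_d
print(h_star)
```

Note that a policy whose best estimate does not depend on d (the second row) keeps its price unchanged, while the others are blended according to the population mix.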
We are excited to post a new paper covering feature selection in our explainable deep learning architecture, the LocalGLMnet. Deep learning models are often criticized for being neither explainable nor amenable to variable selection. Here, we show how group LASSO regularization can be implemented within the LocalGLMnet architecture to obtain feature sparsity for variable selection. Across several examples, we find that the proposed method successfully identifies less important variables, even on smaller datasets! The figure below shows output from the model fit to the famous bike-sharing dataset, where randomly permuted variables receive zero importance after regularization.
Deep learning models have gained great popularity in statistical modeling because they lead to very competitive regression models, often outperforming classical statistical models such as generalized linear models. The disadvantage of deep learning models is that their solutions are difficult to interpret and explain, and variable selection is not easily possible because deep learning models solve feature engineering and variable selection internally in a nontransparent way. Inspired by the appealing structure of generalized linear models, we propose a new network architecture that shares similar features with generalized linear models but provides superior predictive power, benefiting from representation learning. This new architecture allows for variable selection on tabular data and for interpretation of the calibrated deep learning model. In fact, our approach provides an additive decomposition in the spirit of Shapley values and integrated gradients.
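The additive structure can be sketched in numpy as follows (with a hand-coded stand-in for the learned attention weights and invented data): the prediction uses a GLM-like linear predictor β₀ + Σⱼ βⱼ(x)·xⱼ, where each βⱼ(x) is a network output, and the group-LASSO penalty sums the per-feature norms ‖βⱼ‖₂, so regularization can switch off an entire feature at once.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))   # 3 features; pretend feature 2 is pure noise

def attention(X):
    """Stand-in for the LocalGLMnet output beta(x); the real beta is learned."""
    return np.stack([1.0 + 0.2 * X[:, 1],        # feature 0: varying effect
                     0.5 * np.ones(len(X)),      # feature 1: GLM-like constant
                     np.zeros(len(X))], axis=1)  # feature 2: switched off

beta = attention(X)
beta0 = 0.3
# Additive decomposition: each summand beta_j(x) * x_j is directly readable.
linear_predictor = beta0 + np.sum(beta * X, axis=1)

# Group LASSO: one group per feature j, with penalty ||beta_j(x_1..n)||_2.
# Driving a whole column of beta to zero removes that feature entirely,
# which is exactly variable selection.
group_norms = np.linalg.norm(beta, axis=0)
penalty = group_norms.sum()
print(group_norms)
```

Here the "noise" feature contributes exactly zero to both the predictor and the penalty, which is the sparsity pattern the regularized LocalGLMnet is trained to find.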