High-Cardinality Categorical Covariates in Network Regressions

A major challenge in actuarial modelling is dealing with categorical variables with many levels (i.e. high cardinality). This is often encountered with a rating factor like car model, which can take on one of thousands of values, some with significant exposure and others with exposure close to zero.

In a new paper with Mario Wüthrich, we show how to incorporate these variables into neural networks using different types of regularized embeddings, including embeddings estimated using variational inference. We also consider both standalone variables and variables with a natural hierarchy, which lend themselves to being modelled with recurrent neural networks or Transformers. On a synthetic dataset, the proposed methods provide a significant gain in performance compared to other techniques.
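The intuition behind regularizing representations of sparse categorical levels can be illustrated with a simple credibility-style shrinkage estimator. This is only a rough sketch in that spirit, not the embedding method of the paper; the level labels, responses and the value of `tau` are all made up for illustration.

```python
import numpy as np

levels = np.array([0, 0, 0, 0, 1, 1, 2])            # category level per record
y = np.array([1.0, 0.9, 1.1, 1.0, 3.0, 2.8, 10.0])  # observed responses

portfolio_mean = y.mean()
tau = 2.0  # regularization strength (hypothetical value)

shrunk = np.empty(3)
for lvl in range(3):
    mask = levels == lvl
    n = mask.sum()
    # credibility-style shrinkage: the fewer observations a level has,
    # the more its estimate is pulled toward the portfolio mean
    shrunk[lvl] = (y[mask].sum() + tau * portfolio_mean) / (n + tau)
```

The level with a single observation (raw mean 10.0) is pulled strongly toward the portfolio mean, while the well-exposed level moves much less, which is exactly the behaviour one wants from a regularized embedding of a high-cardinality factor.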

The image below illustrates the problem we are trying to solve: the most detailed covariate in the synthetic dataset – Vehicle Detail – can produce observed values that differ vastly from the true values due to sampling error.

A special thank you to Michael Mayer, PhD, for his input into the paper and for interesting discussions on the topic!


Talk on ‘Explainable deep learning for actuarial modelling’

Over the past few days I had the privilege of presenting on the topic of “Explainable deep learning for actuarial modelling” to Munich Re‘s actuarial and data science teams. In the talk I covered several explainable deep learning methods: the CAXNN, LocalGLMnet and ICEnet models.

My slides are attached below if this is of interest.

Smoothness and monotonicity constraints for neural networks using ICEnet

I am pleased to share a new paper on adding smoothness and monotonicity constraints to neural networks. This is joint work with Mario Wüthrich.

In this paper, we propose a novel method for enforcing smoothness and monotonicity constraints within deep learning models used for actuarial tasks, such as pricing. The method is called ICEnet, which stands for Individual Conditional Expectation network. It’s based on augmenting the original data with pseudo-data that reflect the structure of the variables that need to be constrained. We show how to design and train the ICEnet using a compound loss function that balances accuracy and constraints, and we provide example applications using real-world datasets. The structure of the ICEnet is shown in the following figure.

Applying the model produces predictions that are smooth and vary with risk factors in line with intuition. Below is an example where applying constraints forces a neural network to produce predictions of claims frequency that increase with population density and vehicle power.
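A rough sketch of the compound-loss idea: the network is evaluated on a pseudo-data grid for one constrained variable, and penalties on those grid predictions are added to the data loss. The penalty forms, weights and prediction values below are illustrative, not the exact ICEnet loss.

```python
import numpy as np

def monotonicity_penalty(preds):
    # penalize any decrease along the grid (for a variable constrained
    # to have increasing predictions, e.g. vehicle power)
    return np.sum(np.maximum(-np.diff(preds), 0.0) ** 2)

def smoothness_penalty(preds):
    # penalize curvature via squared second differences
    return np.sum(np.diff(preds, n=2) ** 2)

# predictions along a pseudo-data grid for one policy (hypothetical values)
preds = np.array([0.10, 0.12, 0.11, 0.20, 0.35])

data_loss = 0.50                  # stand-in for e.g. a Poisson deviance term
lam_mono, lam_smooth = 10.0, 1.0  # illustrative penalty weights
compound_loss = (data_loss
                 + lam_mono * monotonicity_penalty(preds)
                 + lam_smooth * smoothness_penalty(preds))
```

During training the penalty weights trade off fit against the constraints: the small dip from 0.12 to 0.11 in the grid above incurs a monotonicity penalty, which pushes the fitted curve toward being increasing.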

You can read the full paper at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449030 and we welcome any feedback.

New book by Wüthrich and Merz published!

The fantastic new resource from Mario Wüthrich and Michael Merz on statistical learning for actuarial work has just been published by Springer. This is open access and freely available here:


Everyone involved in these areas will find a wealth of information in this book and I give it my highest recommendation!

GIRO 2022

The Institute and Faculty of Actuaries (IFoA) has been key to my journey as an actuary, providing my initial professional education and, subsequently, many great opportunities to contribute and learn more along the way. This made receiving the 2022 Outstanding Achievement award from the IFoA’s GI Board yesterday very special:


The award was given in connection with my research into applying machine and deep learning within actuarial work. My hope is that more actuaries within the vibrant community attending the 2022 GIRO conference will be motivated to apply these techniques in their own work.

Thank you again to the #GIRO2022 organizing committee and the #ifoa for a fantastic event!

Reserving with the Cape Cod Method – OMI/ASABA Masterclass

I was delighted to present the first masterclass in the series as part of the short-term insurance practicing initiative of the Association of South African Black Actuarial Professionals and Old Mutual Insure. The title was “Reserving with the Cape Cod Method” and the attached slides cover everything from the basics all the way up to advanced methods of setting the parameters using machine learning. More materials can be found at the GitHub link on the title slide.

DFIP Old and New – Talk at the 2022 STIC Seminar

I was delighted to speak at the Actuarial Society of South Africa (ASSA)‘s annual short-term insurance seminar, on Discrimination Free Insurance Pricing and our new work on multi-task networks. My slides are below.

Thanks so much to Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich for this collaboration!

Discrimination Free Insurance Pricing – new paper

I am very excited to announce our next paper on Discrimination Free Insurance Pricing (DFIP). The first paper introduced a method for removing indirect discrimination from pricing models. The DFIP technique requires that the discriminatory features (e.g. gender) are known for all examples in the data on which the model is trained, as well as for the subsequent policies that will be priced. In this new work, we only require that the discriminatory features are known for a subset of the examples, and we use a specially designed neural network (with multiple outputs) to handle the examples that are missing this information. In the plot below, we show that this new approach produces excellent approximations to the true discrimination-free price in a synthetic health insurance example.
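The core of the discrimination-free price from the first paper can be sketched as follows: best-estimate prices by group are averaged over the marginal (population) distribution of the protected attribute, rather than over its conditional distribution given the other covariates, so the other covariates cannot act as a proxy. All numbers below are hypothetical.

```python
import numpy as np

def discrimination_free_price(mu_by_group, p_group):
    # average best-estimate prices over the *marginal* distribution of the
    # protected attribute, not its conditional distribution given x
    return float(np.dot(mu_by_group, p_group))

# best-estimate prices for one policyholder, by group (hypothetical)
mu = np.array([100.0, 140.0])
# marginal population weights of the protected attribute
p = np.array([0.45, 0.55])

price = discrimination_free_price(mu, p)  # 122.0
```

The new paper's contribution is producing the group-wise best-estimate prices even when the protected attribute is observed for only a subset of policies, via a multi-output network.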

The new paper can be found here:


Thank you to Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich for this wonderful collaboration!

LASSO Regularization within the LocalGLMnet Architecture

We are excited to post a new paper covering feature selection in our explainable deep learning architecture, the LocalGLMnet. Deep learning models are often criticized for being neither explainable nor amenable to variable selection. Here, we show how group LASSO regularization can be implemented within the LocalGLMnet architecture to obtain feature sparsity for variable selection. In several examples, we find that the proposed methods successfully identify less important variables, even on smaller datasets! The figure below shows output from the model fit to the famous bike sharing dataset, where randomly permuted variables receive zero importance after regularization.
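The mechanism by which group LASSO produces exact zeros (and hence variable selection) can be sketched with its proximal operator, group soft-thresholding. This is a generic illustration of the penalty, not the LocalGLMnet training code; the weight values are hypothetical.

```python
import numpy as np

def group_lasso_penalty(beta_groups, lam):
    # sum of (unsquared) Euclidean norms over groups: unlike ridge,
    # this drives entire groups of weights exactly to zero
    return lam * sum(np.linalg.norm(b) for b in beta_groups)

def group_soft_threshold(beta, lam):
    # proximal step for one group: groups with small norm are zeroed out
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta

weak = group_soft_threshold(np.array([0.1, 0.1]), lam=0.5)    # zeroed entirely
strong = group_soft_threshold(np.array([2.0, 1.0]), lam=0.5)  # shrunk, kept
```

Grouping all weights associated with one input feature means the whole feature is either retained (shrunk) or dropped (zeroed), which is what gives the permuted variables zero importance in the figure.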

The paper can be found here:



Excited to post a new paper with Mario Wüthrich on a local GLM model parametrized using a neural network:


Deep learning models have gained great popularity in statistical modeling because they lead to very competitive regression models, often outperforming classical statistical models such as generalized linear models. The disadvantage of deep learning models is that their solutions are difficult to interpret and explain, and variable selection is not easily possible, because deep learning models solve feature engineering and variable selection internally in a non-transparent way. Inspired by the appealing structure of generalized linear models, we propose a new network architecture that shares similar features with generalized linear models but provides superior predictive power, benefiting from the art of representation learning. This new architecture allows for variable selection of tabular data and for interpretation of the calibrated deep learning model; in fact, our approach provides an additive decomposition in the spirit of Shapley values and integrated gradients.
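A minimal sketch of the structure described above: a network produces feature-dependent regression coefficients, which enter a GLM-style linear predictor, so each feature's contribution can be read off additively. The toy coefficient function and inputs below are hypothetical stand-ins for a trained network.

```python
import numpy as np

def local_glm_predict(x, beta_net, beta0=0.0, inv_link=np.exp):
    # LocalGLMnet-style prediction: feature-dependent coefficients enter
    # through a GLM-like linear predictor, giving an additive decomposition
    # of the linear predictor into per-feature contributions beta_j(x) * x_j
    beta = beta_net(x)
    contributions = beta * x
    return inv_link(beta0 + contributions.sum()), contributions

# toy stand-in for a trained coefficient network (hypothetical)
beta_net = lambda x: np.array([0.5, -0.2, 0.0])

x = np.array([1.0, 2.0, 3.0])
pred, contrib = local_glm_predict(x, beta_net)
# contrib = [0.5, -0.4, 0.0]: the third feature contributes nothing,
# which is how the architecture supports variable selection
```

With a constant coefficient function the model collapses to an ordinary GLM, which is why the per-feature terms can be interpreted in the familiar GLM spirit.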
