LASSO Regularization within the LocalGLMnet Architecture

We are excited to post a new paper covering feature selection in our explainable deep learning architecture, the LocalGLMnet. Deep learning models are often criticized for neither being explainable nor allowing for variable selection. Here, we show how group LASSO regularization can be implemented within the LocalGLMnet architecture to obtain feature sparsity for variable selection. On several examples, we find that the proposed methods successfully identify less important variables, even on smaller datasets! The figure below shows output from the model fit to the well-known bike sharing dataset, where randomly permuted variables receive zero importance after regularization.
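To give a feel for why a group LASSO penalty produces exact zeros for whole variables, here is a minimal sketch of the generic group soft-thresholding step. It is not the implementation from the paper: the function names, the unweighted penalty, the example weights and the value of `lam` are all assumptions made for illustration.

```python
import math

def group_lasso_penalty(groups, lam):
    """Unweighted group LASSO penalty: lam times the sum of the groups' L2 norms."""
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)

def group_soft_threshold(weights, lam):
    """Proximal step for the penalty above: shrink a whole group, or zero it out."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= lam:
        # The entire group is removed, which is what gives variable selection.
        return [0.0] * len(weights)
    scale = 1.0 - lam / norm
    return [scale * w for w in weights]

# Hypothetical weights attached to two input variables (illustrative values only).
groups = {
    "temperature": [0.9, -1.2, 0.4],
    "permuted_noise": [0.05, -0.03, 0.02],
}
shrunk = {name: group_soft_threshold(w, lam=0.1) for name, w in groups.items()}
```

In this toy example the small `permuted_noise` group is set exactly to zero, while the `temperature` group is only shrunk slightly.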

The paper can be found here:


Excited to post a new paper with Mario Wüthrich on a local GLM model parametrized using a neural network:

Deep learning models have gained great popularity in statistical modeling because they lead to very competitive regression models, often outperforming classical statistical models such as generalized linear models. The disadvantage of deep learning models is that their solutions are difficult to interpret and explain, and variable selection is not easily possible because deep learning models solve feature engineering and variable selection internally in a nontransparent way. Inspired by the appealing structure of generalized linear models, we propose a new network architecture that shares similar features with generalized linear models, but provides superior predictive power benefiting from the art of representation learning. This new architecture allows for variable selection of tabular data and for interpretation of the calibrated deep learning model. In fact, our approach provides an additive decomposition in the spirit of Shapley values and integrated gradients.
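As a rough illustration of the structure described above, the sketch below keeps the GLM form of the linear predictor but lets the regression coefficients depend on the input through a small network, so that each prediction splits into per-feature contributions. The weights, the log link and all names are hypothetical stand-ins chosen for this example, not values or code from the paper.

```python
import math

# Hypothetical fixed weights standing in for a trained network (assumption).
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]   # 3 hidden units, 2 input features
W2 = [[0.7, -0.2, 0.4], [0.3, 0.5, -0.1]]     # one output per input feature
BETA0 = 0.1                                   # intercept

def attention(x):
    """Map features x to local regression coefficients beta(x)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def local_glm_predict(x):
    """Return the prediction and its additive per-feature contributions."""
    betas = attention(x)
    contributions = [b * xi for b, xi in zip(betas, x)]
    linear_predictor = BETA0 + sum(contributions)
    # A log link is assumed here, as is common for claim frequency models.
    return math.exp(linear_predictor), contributions

prediction, contributions = local_glm_predict([1.0, -0.5])
```

The list `contributions` holds the term for each feature in the linear predictor, which is the additive decomposition that makes the fitted model easier to interpret.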

Mind the Gap – Safely Incorporating Deep Learning Models into the Actuarial Toolkit

Excited to post a new paper on safely incorporating deep learning models into the actuarial toolkit. The paper covers several important aspects of deep learning models that have not yet been studied in detail in the actuarial literature: the effect of hyperparameter choice on the accuracy and stability of network predictions, methods for producing uncertainty estimates, and the design of deep learning models for explainability.

This paper was written under the research grant program of the AFIR-ERM section of the International Actuarial Association, to whom I am very grateful.

Here is an exhibit from the paper showing confidence bands derived using quantile regression:

Here is another exhibit, showing the impact of different hyperparameter choices:

Please read the full paper here if of interest:

Objective “Judgement” – article discussing `The Actuary and IBNR Techniques`

A short article discussing our paper on applying machine learning principles to IBNR reserving appears in the April 2021 edition of The Actuary.

The paper can be found here:

Interpreting Deep Learning Models with Marginal Attribution by Conditioning on Quantiles

Today Michael Merz, Andreas Tsanakas, Mario Wüthrich and I released a new paper showing a novel deep learning interpretability technique. This can be found on SSRN:

Compared to traditional model interpretability techniques, which usually operate either at a global or instance level, the new technique, which we call Marginal Attribution by Conditioning on Quantiles (MACQ), looks at the contributions of variables at each quantile of the response. This provides significant insight into variable importance and the relationships between inputs to the model and predictions. The image above illustrates the output from the MACQ method on the Bike Sharing dataset.

x is not f(x): Insurance edition

I have recently been reading Nassim Taleb’s new book, Statistical Consequences of Fat Tails, which is freely available on arXiv:

Note that if the book interests you, you are welcome to join the Global Technical Incerto Reading Club, which I host together with James Sharpe. Next week (3 March), Nassim will speak to the reading club on the book, with some focus on Chapter 3, and if you are interested in the talk, you can sign up here:

Session 1 – Introduction by Nassim Taleb (online event, Tuesday, Mar 2, 2021, 5:00 PM)

We are delighted that Nassim Taleb will speak at the next meeting of the reading club. Details to follow.

Reading Club: Meetup 1

In preparation for the talk I read through Chapter 3, which summarizes some of the key themes within the book. In this post I discuss some of the thoughts I had on how the themes addressed in Chapter 3 pop up within the insurance world, with a focus on the idea that exposure to a risk needs to be treated differently from the underlying risk itself.

x is not f(x)

The idea, as I understood it, is that one can spend much time and effort trying to forecast the behavior of a random variable x, whereas the component of the risk that has a practical effect on you is not the random variable itself, but is defined by how you have ‘shaped’ your exposure to the risk, expressed as f(x).

In what follows, I call the random variable x the ‘risk’, and the manner in which the risk affects you the ‘exposure’ to the risk, f(x). The key point is that, whereas it can be difficult, if not impossible, to gain knowledge about a risk (often due to the difficult statistical properties of the risks that are impactful), changing the impact of the risk on you (or your P&L/company) is easier. The idea is expressed beautifully in the book by the story of a trader (page 37):

“In Fooled by Randomness (2001/2005), the character is asked which was more probable that a given market would go higher or lower by the end of the month. Higher, he said, much more probable. But then it was revealed that he was making trades that benefit if that particular market goes down.”

If we consider that the insurance industry has survived exposure to heavy-tailed risks for centuries, and that insurance companies usually cause system-wide problems only when they stray into taking large financial risks (as opposed to insurance risks), it would be reasonable to expect the industry to be a good example of implementing these principles in practice.

Shaping exposures within insurance

The idea of focusing on how one is exposed to risk, as opposed to trying to forecast the actual risk itself, appears almost everywhere within the insurance industry once you start looking for it. In almost every case I can think of, insurers do not accept the full exposure to risks as they stand but, using contractual terms and conditions, ensure that the full impact of a risk does not manifest on their P&L.

Some obvious examples are the applications of limits within general insurance. Instead of taking on the full consequences of a risk, insurance policies usually have a maximum payout for each occurrence of a risk, and perhaps also for the total impact of the risks over the course of the policy term. Limits act to ensure that the full tail risk of an insurance policy is limited (technically, this is called “right censoring”) and that the maximum loss on an insurance book is bounded. See the sections below for some more discussion of the implications of policy limits.

Limits act to shape the exposure to insurance claim ‘severity’, which is one of two famous components of insurance risk losses. The other is ‘frequency’, which refers to the extent that more claims than expected might result within an insurance portfolio. A common way to reduce the exposure to frequency losses is to include excesses within policies, which require policyholders to pay for losses below a certain amount. Since insurance risks generate many smaller losses, which nonetheless involve a constant cost to administer and pay claims, shaping the frequency exposure in this manner is also key.
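To make the effect of excesses and limits concrete, here is a small sketch of the insurer's share f(x) of a single loss x. The function name, the convention that the limit applies to the amount paid after the excess, and the figures are all assumptions for illustration; real policy wordings vary.

```python
def insurer_payout(loss, excess=0.0, limit=float("inf")):
    """Insurer's share of one loss: f(x) = min(max(x - excess, 0), limit)."""
    return min(max(loss - excess, 0.0), limit)

# Hypothetical losses, from a small attritional claim to an extreme one.
losses = [200.0, 1_500.0, 40_000.0, 2_000_000.0]
payouts = [insurer_payout(x, excess=500.0, limit=100_000.0) for x in losses]
# payouts == [0.0, 1000.0, 39500.0, 100000.0]
```

The smallest loss never reaches the insurer (a frequency effect of the excess), and the largest loss is capped at the limit (a severity effect), however heavy the tail of x.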

Now we consider less obvious examples. Whereas the examples above are implemented in a strictly contractual manner, a key process through which insurers shape their risk exposure is underwriting. Some risks are just too heavy-tailed for insurers to have much of an appetite to write them; for example, almost every property policy I have seen excludes losses resulting from war and nuclear power. Within liability insurance, anything with a United States liability exposure is usually considered too risky for insurers outside of the US to provide cover for, due to the extreme claim awards in the US compared to other jurisdictions. Many insurers will only approach aviation risks or highly volatile manufacturing operations (e.g. chemicals) with extreme care.

A different approach is to only write policies for a slice of a risk that is more appealing. For example, within the cyber insurance market, most insurers do not offer coverage for the full risk exposure of a cyber loss, but only provide limited support, e.g. helping recover lost data or covering the costs of crisis management practitioners. This leads some people to complain that the cyber risk market is “inefficient” or “dysfunctional”, since it is difficult to find cover for the actual cyber risks faced; on the other hand, given the potential extreme losses that cyber risk can cause, and the limited data on loss experience, this criticism is somewhat akin to “lecturing birds on how to fly”.

The final layer in shaping exposure for an insurer is reinsurance, that is, getting risk off one’s P&L by passing it on to another insurer. Among the many different forms of reinsurance are policies that produce an option-like exposure, where one can pass risk above a fixed level of losses to the counterparty for a fixed premium (excess of loss). Other options are to share risks in more or less equal proportions.
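The two forms of reinsurance mentioned above can be compared with a short sketch of what the insurer retains under each. The retention, cover and ceded share below are hypothetical figures, and premiums, reinstatements and other treaty features are ignored.

```python
def excess_of_loss_recovery(loss, retention, cover):
    """Option-like recovery: the reinsurer pays the part of the loss above the
    retention, up to a maximum of `cover`."""
    return min(max(loss - retention, 0.0), cover)

def quota_share_recovery(loss, ceded_share):
    """Proportional recovery: the reinsurer pays a fixed share of every loss."""
    return ceded_share * loss

gross_losses = [100_000.0, 750_000.0, 3_000_000.0]

# Net loss kept by the insurer under each (hypothetical) treaty.
xol_net = [x - excess_of_loss_recovery(x, retention=250_000.0, cover=1_000_000.0)
           for x in gross_losses]
qs_net = [x - quota_share_recovery(x, ceded_share=0.4) for x in gross_losses]
```

Under the excess of loss treaty the net loss stays at the retention until the cover is exhausted, after which the insurer is exposed again; under the quota share the net loss simply scales with the gross loss.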

By the end of applying all of these risk “shaping” approaches to define f(x), hopefully an insurer is suitably protected from risks it doesn’t want to take.

Other implications

One of the implications of adding limits to an insurance policy is that the subsequent analysis of losses generated by these policies needs to use special methods. A popular approach for analyzing the severity of losses within general insurance is to fit one distribution to the smaller and more frequent attritional losses, and another distribution to the extreme losses, with the latter distribution often motivated by extreme value theory (see the introductory session on EVT here). However, this approach ignores the fact that each loss has an upper bound determined by the limits on the policy generating the loss. Also, since these extreme losses follow a very heavy-tailed distribution, naïve estimators of the statistical properties of these losses are likely to be biased. To solve this problem in other domains, Taleb and his collaborators introduce an approach called “shadow moments”, which works by first transforming the data to a new domain that is unbounded, parameterizing EVT distributions in this domain, and then translating the implications of these models back to the original bounded domain. Two works demonstrating this, in the context of war and pandemic casualties, are Cirillo & Taleb (2016, 2020). The shadow moment approach seems to have substantial applicability in insurance modelling.
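The first and last steps of this approach, mapping bounded losses to an unbounded domain and back, can be sketched as follows. The log transformation used here is, to my understanding, the one in Cirillo & Taleb (2016), but treat the exact form as an assumption and check it against the paper; the function names and the numbers are purely illustrative, and the EVT fitting step in between is omitted.

```python
import math

def to_unbounded(y, lower, upper):
    """Map a loss y in [lower, upper) to [lower, infinity).
    Assumed form: z = lower - upper * log((upper - y) / (upper - lower))."""
    return lower - upper * math.log((upper - y) / (upper - lower))

def to_bounded(z, lower, upper):
    """Inverse map, taking values in the unbounded domain back to [lower, upper)."""
    return upper - (upper - lower) * math.exp((lower - z) / upper)

# Hypothetical setting: losses above a 1,000 threshold on a policy with a 1,000,000 limit.
lower, upper = 1_000.0, 1_000_000.0
observed = [5_000.0, 250_000.0, 990_000.0]
transformed = [to_unbounded(y, lower, upper) for y in observed]
recovered = [to_bounded(z, lower, upper) for z in transformed]
```

Losses far below the limit are almost unchanged by the transformation, while losses close to the limit are stretched out towards infinity, which is where a tail distribution would then be fitted.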

In the next post we will talk about how actuaries apply valuation approaches to measure f(x).


Cirillo, P., & Taleb, N. N. (2016). On the statistical properties and tail risk of violent conflicts. Physica A: Statistical Mechanics and Its Applications, 452, 29–45.

Cirillo, P., & Taleb, N. N. (2020). Tail risk of contagious diseases. Nature Physics, 16, 606–613.

The Actuary and IBNR Techniques

Some exhibits from the talk. The box plot shows that our method can select a performant variant of the chain ladder method for out-of-sample data.

Yesterday, we presented our new paper at the virtual GIRO event. The paper provides a method for the optimal selection of IBNR techniques using a machine learning philosophy, and can be found here:

The slides from the presentation are below.

Nagging Predictors

We are excited to release a new paper on aggregating neural network predictions with a focus on actuarial work. The paper examines the stability of neural network predictors at a portfolio and individual policy level and shows how the variability of these predictions can act as a data-driven metric for assessing which policies the network struggles to fit. Finally, we also discuss calibrating a meta-network on the basis of this metric to predict the aggregated results of the networks.

Here is one image from the paper which looks at the effect of “informing” the meta-network about how variable each individual policy is.

If you would like to read the paper, it can be found here:
