Accurate and Explainable Mortality Forecasting with the LocalGLMnet

I’m excited to share our latest research paper on using the LocalGLMnet, an explainable deep learning model, to forecast mortality rates for multiple populations. This paper is joint work with Francesca Perla, Salvatore Scognamiglio and Mario Wüthrich.

Mortality forecasting is crucial for actuarial applications in life insurance and pensions, as well as for demographic planning. However, most existing machine learning models for this task are not transparent. We wanted to bridge this gap by adapting the LocalGLMnet, which preserves the interpretable structure of generalized linear models while allowing for variable selection and interaction identification.
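To make the GLM-like structure concrete, here is a minimal sketch of a LocalGLMnet forward pass, assuming a log link. A small feed-forward network maps the inputs x to "regression attentions" β(x) of the same dimension, and the prediction is exp(β0 + Σ_j β_j(x) x_j), so each term β_j(x) x_j can be read as a feature contribution. The class name, layer sizes and random (untrained) weights are hypothetical; this is not the code from the paper.

```python
import math
import random


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_layer(n_in, n_out, rng):
    """Small random weights; a real model would learn these by gradient descent."""
    weights = [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class LocalGLMnetSketch:
    """Hypothetical, untrained LocalGLMnet with a log link."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden_layer = random_layer(n_features, hidden, rng)
        # The attention layer returns one coefficient per input feature.
        self.attention_layer = random_layer(hidden, n_features, rng)
        self.beta0 = -4.0  # intercept on the log scale (illustrative value)

    def attentions(self, x):
        """Regression attentions beta(x), one per feature."""
        h = dense(x, *self.hidden_layer, math.tanh)
        return dense(h, *self.attention_layer, lambda z: z)

    def predict(self, x):
        """Return the predicted rate and the contributions beta_j(x) * x_j."""
        contributions = [b * xi for b, xi in zip(self.attentions(x), x)]
        return math.exp(self.beta0 + sum(contributions)), contributions


model = LocalGLMnetSketch(n_features=3)
rate, contributions = model.predict([0.5, -1.0, 0.2])
print(rate, contributions)
```

If an attention β_j(x) is roughly constant across the data, that feature enters like an ordinary GLM term; attentions close to zero suggest the feature could be dropped, and attentions that vary with other features point to interactions.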

We applied our model to data from the Human Mortality Database (HMD) and the United States Mortality Database, and found that it produced highly accurate forecasts that can be explained by autoregressive time-series models of mortality rates. We also showed how regularizing and denoising our model can improve its performance even further.

The image below shows a comparison of forecasting results between different models for the HMD.

The full paper can be found here:

ASSA 2022 Convention Awards

Last week was the ASSA 2022 Convention held in Cape Town, South Africa. We were delighted to hear that our paper “LASSO Regularization within the LocalGLMnet Architecture” won the RGA Prize for the Best Convention Paper and the Swiss Re Prize for the Best Paper on Risk or Reinsurance.

The paper can be found here:
LASSO Regularization within the LocalGLMnet Architecture

I’m most appreciative of the Actuarial Society of South Africa (ASSA) for making this award and hope that actuaries will start to use the proposed method for interpretable machine learning. Thanks very much to Professor Mario Wüthrich for this project!


I was also pleased to hear that another paper, Mind the Gap – Safely Incorporating Deep Learning Models into the Actuarial Toolkit, was highly commended by ASSA’s Research Committee. This paper can be found here:

Mind the Gap

During the event, we also presented a paper on bootstrapping the Cape-Cod method. Below is a nice summary drawn at the Convention.

Discrimination Free Insurance Pricing – new paper

I am very excited to announce our next paper on Discrimination Free Insurance Pricing (DFIP). The first paper introduced a method for removing indirect discrimination from pricing models. That DFIP technique requires that the discriminatory features (e.g. gender) are known for all examples in the data on which the model is trained, as well as for the subsequent policies that will be priced. In this new work, we only require that the discriminatory features are known for a subset of the examples, and we use a specially designed neural network (with multiple outputs) to take care of the examples that are missing this information. In the plot below, we show that this new approach produces excellent approximations to the true discrimination-free price in a synthetic health insurance example.
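For readers new to DFIP, the discrimination-free price from the first paper averages the best-estimate price μ(x, d) over the discriminatory feature d using its portfolio-level (marginal) distribution, h*(x) = Σ_d μ(x, d) P(D = d), rather than the distribution of d given x. A minimal sketch with made-up prices and probabilities follows; it assumes μ(x, d) is known for every d and does not show the multi-output network the new paper uses when d is partly missing.

```python
# Hypothetical best-estimate prices mu(x, d) for one policy profile x, by gender d.
BEST_ESTIMATE = {
    ("age_40", "female"): 100.0,
    ("age_40", "male"): 140.0,
}


def price(x, weights):
    """Average mu(x, d) over d using the given probability weights for d."""
    return sum(BEST_ESTIMATE[(x, d)] * p for d, p in weights.items())


# Portfolio-level (marginal) distribution of gender: assumption for illustration.
marginal = {"female": 0.5, "male": 0.5}
# P(D = d | X = x) for policies that look like x; here x acts as a proxy
# for gender (made-up numbers).
conditional = {"female": 0.2, "male": 0.8}

discrimination_free = price("age_40", marginal)   # uses P(D = d)
unawareness = price("age_40", conditional)        # uses P(D = d | X = x)
print(discrimination_free, unawareness)
```

Here the discrimination-free price is 120, whereas simply dropping gender from the model (the "unawareness" price) gives about 132, because x still carries information about gender.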

The new paper can be found here:

Thank you to Mathias Lindholm, Andreas Tsanakas and Mario Wüthrich for this wonderful collaboration!

x is not f(x): Insurance edition

I have recently been reading Nassim Taleb’s new book, Statistical Consequences of Fat Tails, which is freely available on arXiv:

Note that if the book interests you, you are welcome to join the Global Technical Incerto Reading Club, which I host together with James Sharpe. Next week (3 March), Nassim will speak to the reading club about the book, with some focus on Chapter 3. If you are interested in the talk, you can sign up here:

Session 1 – Introduction by Nassim Taleb

Tuesday, Mar 2, 2021, 5:00 PM

Online event


We are delighted that Nassim Taleb will speak at the next meeting of the reading club. Details to follow.


Reading Club: Meetup 1

In preparation for the talk I read through Chapter 3, which summarizes some of the key themes within the book. In this post I discuss some of the thoughts I had on how the themes addressed in Chapter 3 pop up within the insurance world, with a focus on the idea that exposure to a risk needs to be treated differently from the underlying risk itself.

x is not f(x)

As I understand it, the idea is that one can spend much time and effort trying to forecast the behavior of a random variable x, whereas the component of the risk that has a practical effect on you is not the random variable itself, but how you have ‘shaped’ your exposure to the risk, expressed as f(x).

In what follows, I call the random variable x the ‘risk’ and the manner in which the risk affects you the exposure to the risk, f(x). The key point is that whereas it can be difficult, if not impossible, to gain knowledge about a risk (often due to the difficult statistical properties of the risks that are impactful), changing the impact of the risk on you (or your P&L/company) is easier. The idea is expressed beautifully in the book by the adage of a trader (page 37):

“In Fooled by Randomness (2001/2005), the character is asked which was more probable that a given market would go higher or lower by the end of the month. Higher, he said, much more probable. But then it was revealed that he was making trades that benefit if that particular market goes down.”
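The trader’s answer becomes less paradoxical with a toy calculation (the numbers are made up): if the market rises a little most of the time but occasionally falls sharply, "up" can be the more probable outcome while the expected move, and hence the expected payoff of a short position f(x) = -x, favours betting on a fall.

```python
# Hypothetical distribution of next month's market move x: value -> probability.
outcomes = {+1.0: 0.9, -20.0: 0.1}


def short_position(x):
    """f(x) for a trader who has sold the market: the payoff is minus the move."""
    return -x


prob_up = sum(p for x, p in outcomes.items() if x > 0)
expected_move = sum(x * p for x, p in outcomes.items())
expected_payoff = sum(short_position(x) * p for x, p in outcomes.items())

print(f"P(up) = {prob_up:.2f}")        # the forecast about x
print(f"E[x] = {expected_move:.2f}")   # negative despite P(up) > 0.5
print(f"E[f(x)] = {expected_payoff:.2f}")  # what actually reaches the P&L
```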

If we consider that the insurance industry has survived exposure to heavy-tailed risks for centuries, and that insurance companies usually cause system-wide problems only when they stray into taking large financial risks (as opposed to insurance risks), it would be reasonable to expect the industry to be a good example of these principles implemented in practice.

Shaping exposures within insurance

The idea of focusing on how one is exposed to risk, as opposed to trying to forecast the risk itself, appears almost everywhere within the insurance industry once you start looking for it. In almost every case I can think of, insurers do not accept the full exposure to risks as they stand, but instead use contractual terms and conditions to ensure that the full impact of a risk does not manifest on their P&L.

An obvious example is the application of limits within general insurance. Instead of taking on the full consequences of a risk, insurance policies usually have a maximum payout for each occurrence of a risk, and perhaps also for the total impact of the risks over the course of the policy term. Limits ensure that the full tail risk of an insurance policy is capped (technically, the losses seen by the insurer are “right censored”) and that the maximum loss on an insurance book is bounded. See the section below for more discussion of the implications of policy limits.
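A minimal sketch of how per-occurrence and aggregate limits shape f(x) for a single policy; the function name and limit values are hypothetical. However large the individual losses x are, the payout can never exceed the aggregate limit, which is what bounds the loss on the book.

```python
def policy_payout(losses, per_occurrence_limit, aggregate_limit):
    """Insurer's payout for one policy term.

    Each loss is first capped at the per-occurrence limit, and the total of
    the capped losses is then capped at the aggregate limit."""
    capped = [min(loss, per_occurrence_limit) for loss in losses]
    return min(sum(capped), aggregate_limit)


# Three losses in one policy term (illustrative amounts).
print(policy_payout([50, 2_000, 300], per_occurrence_limit=1_000, aggregate_limit=1_200))  # 1200
# An extreme year: the payout is still bounded by the aggregate limit.
print(policy_payout([10**9] * 100, per_occurrence_limit=1_000, aggregate_limit=1_200))  # 1200
```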

Limits shape the exposure to insurance claim ‘severity’, which is one of the two classic components of insurance losses. The other is ‘frequency’, which refers to the risk that more claims than expected occur within an insurance portfolio. A common way to reduce the exposure to frequency losses is to include excesses within policies, which require policyholders to pay for losses below a certain amount. Since insurance risks generate many smaller losses, each of which nonetheless carries a fixed cost to administer and pay, shaping the frequency exposure in this manner is also key.
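The frequency effect of an excess can be sketched in the same way: each loss is reduced by the excess, and losses below it never reach the insurer at all, so the number of claims the insurer must handle falls. The amounts and the fixed per-claim handling cost below are made-up assumptions.

```python
def apply_excess(losses, excess):
    """Insurer's payment for each loss after the policyholder bears the first `excess`."""
    return [max(loss - excess, 0.0) for loss in losses]


def handling_cost(payments, cost_per_claim):
    """Assumed fixed administration cost for every claim the insurer actually pays."""
    return cost_per_claim * sum(1 for p in payments if p > 0)


losses = [120.0, 80.0, 450.0, 60.0, 2_000.0, 95.0]
payments = apply_excess(losses, excess=100.0)
print(payments)                                      # [20.0, 0.0, 350.0, 0.0, 1900.0, 0.0]
print(handling_cost(payments, cost_per_claim=25.0))  # 3 claims instead of 6
```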

Now consider some less obvious examples. Whereas the examples above are implemented in a strictly contractual manner, a key process through which insurers shape their risk exposure is underwriting. Some risks are just too heavy-tailed for insurers to have much appetite to write them; for example, almost every property policy I have seen excludes losses resulting from war and nuclear power. Within liability insurance, anything with a United States liability exposure is usually considered too risky for insurers outside of the US to cover, due to the extreme claim awards in the US compared to other jurisdictions. Many insurers will only approach aviation risks or highly volatile manufacturing operations (e.g. chemicals) with extreme care.

A different approach is to write policies only for a slice of a risk that is more appealing. For example, within the cyber insurance market, most insurers do not offer coverage for the full risk exposure of a cyber loss, but only provide limited support, e.g. helping recover lost data or covering the costs of crisis management practitioners. This leads some people to complain that the cyber risk market is “inefficient” or “dysfunctional”, since it is difficult to find cover for the actual cyber risks faced; on the other hand, given the potential extreme losses that cyber risk can cause and the limited data on loss experience, this criticism is somewhat akin to “lecturing birds on how to fly”.

The final layer in shaping exposure for an insurer is reinsurance: moving risk off one’s P&L by passing it on to another insurer. Among the many different forms of reinsurance are policies that produce an option-like exposure, where one passes losses above a fixed level to the counterparty for a fixed premium (excess of loss). Another option is to share risks with the reinsurer in fixed proportions (proportional reinsurance).
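A minimal sketch of the two reinsurance shapes mentioned above, with hypothetical function names and terms. The excess-of-loss recovery min(max(x - R, 0), L) has the same payoff as a call spread on the loss (long a call struck at the retention R, short one struck at R + L), which is the option-like exposure referred to above. Applying the quota share before the excess-of-loss layer is an assumption made for the example; real programmes vary.

```python
def xl_recovery(loss, retention, layer_limit):
    """Recovery under an excess-of-loss layer of `layer_limit` xs `retention`."""
    return min(max(loss - retention, 0.0), layer_limit)


def net_retained_loss(loss, ceded_share, retention, layer_limit):
    """Insurer's net loss: quota share first, then an XL layer on the retained part."""
    retained = loss * (1.0 - ceded_share)
    return retained - xl_recovery(retained, retention, layer_limit)


# 50% quota share, then a 3,000 xs 1,000 layer (illustrative terms).
for gross in (500.0, 4_000.0, 10_000.0):
    net = net_retained_loss(gross, ceded_share=0.5, retention=1_000.0, layer_limit=3_000.0)
    print(gross, net)
```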

After all of these risk “shaping” approaches have been applied to define f(x), an insurer should hopefully be suitably protected from the risks it doesn’t want to take.

Other implications

One of the implications of adding limits to an insurance policy is that the subsequent analysis of losses generated by these policies needs special methods. A popular approach for analyzing the severity of losses within general insurance is to fit one distribution to the smaller and more frequent attritional losses, and another distribution to the extreme losses, with the latter often motivated by extreme value theory (see the introductory session on EVT here). However, this approach ignores the fact that each loss has an upper bound determined by the limits on the policy generating the loss. Also, since these extreme losses follow a very heavy-tailed distribution, naïve estimators of the statistical properties of these losses are likely to be biased. To solve this problem in other domains, Taleb and his collaborators introduced an approach called “shadow moments”, which works by first transforming the data to a new domain that is unbounded, fitting EVT distributions in this domain, and then translating the implications of these models back to the original bounded domain. Two works demonstrating this approach, in the context of war and pandemic casualties respectively, are Cirillo & Taleb (2016, 2020). The shadow moment approach seems to have substantial applicability in insurance modelling.
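A minimal sketch of the log transformation used in Cirillo & Taleb (2016), as we understand it, for losses known to lie between a lower bound L and an upper bound H: z = L - H log((H - y)/(H - L)) maps [L, H) onto [L, ∞), so a fat-tailed EVT model can be fitted to z, and results are mapped back with the inverse. The generalised Pareto parameters below are made up rather than fitted, treating H as a known maximum possible loss is an assumption for insurance use, and a Monte Carlo average is used for simplicity.

```python
import math
import random


def to_dual(y, low, high):
    """Map a loss y in [low, high) to the unbounded dual domain [low, inf)."""
    return low - high * math.log((high - y) / (high - low))


def from_dual(z, low, high):
    """Inverse map from the dual domain back to [low, high)."""
    return high - (high - low) * math.exp((low - z) / high)


def sample_gpd(threshold, sigma, xi, rng):
    """Inverse-CDF draw from a generalised Pareto distribution above `threshold` (xi > 0)."""
    u = 1.0 - rng.random()  # in (0, 1], avoids division by zero
    return threshold + sigma / xi * (u ** (-xi) - 1.0)


def shadow_tail_mean(threshold, sigma, xi, low, high, n=100_000, seed=0):
    """Monte Carlo mean of tail losses, simulated in the dual domain and mapped back."""
    rng = random.Random(seed)
    total = sum(from_dual(sample_gpd(threshold, sigma, xi, rng), low, high)
                for _ in range(n))
    return total / n


low, high = 1.0, 100.0              # hypothetical bounds, e.g. in millions
z_threshold = to_dual(5.0, low, high)
# xi > 1 means the dual-domain tail has no finite mean, yet the bounded mean exists.
print(shadow_tail_mean(z_threshold, sigma=2.0, xi=1.5, low=low, high=high))
```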

In the next post we will talk about how actuaries apply valuation approaches to measure f(x).


Cirillo, P., & Taleb, N. N. (2016). On the statistical properties and tail risk of violent conflicts. Physica A: Statistical Mechanics and Its Applications, 452, 29–45.

Cirillo, P., & Taleb, N. N. (2020). Tail risk of contagious diseases. Nature Physics, 16, 606–613.

The Actuary and IBNR Techniques

Some exhibits from the talk. The box plot shows that our method can select a performant variant of the chain ladder method for out-of-sample data.

Yesterday, we presented our new paper at the virtual GIRO event. The paper provides a method for the optimal selection of IBNR techniques using a machine learning philosophy, and can be found here:
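For readers less familiar with the building blocks, here is a minimal sketch of the basic chain ladder method on a cumulative claims triangle, with one common variant: averaging development factors over only the most recent accident years. The triangle and function names are hypothetical, and the selection procedure from the paper is not shown; this only illustrates the kind of variants such a procedure might choose between.

```python
def development_factors(triangle, n_recent=None):
    """Volume-weighted age-to-age factors from a cumulative claims triangle.

    triangle[i][j] holds cumulative claims for accident year i at development
    period j; later accident years have shorter rows. If n_recent is given,
    each factor uses only the latest n_recent accident years available."""
    factors = []
    for j in range(len(triangle[0]) - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        if n_recent is not None:
            rows = rows[-n_recent:]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_reserves(triangle, n_recent=None):
    """Project each accident year to ultimate and return ultimate minus latest claims."""
    factors = development_factors(triangle, n_recent)
    reserves = []
    for row in triangle:
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:
            ultimate *= f
        reserves.append(ultimate - row[-1])
    return reserves


triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 170.0],
    [120.0],
]
print(chain_ladder_reserves(triangle))              # all accident years
print(chain_ladder_reserves(triangle, n_recent=1))  # latest year only
```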

The slides from the presentation are below.

IDSC – Africa

One of the most inspiring events I have attended was the Insurance Data Science Conference held in Zurich this year.

I am very excited to announce that on 20 April 2020 we will hold an affiliated event in South Africa, which will be organized by the Actuarial Society of South Africa, QED Actuaries & Consultants, and the University of the Witwatersrand. The conference website is here:

The call for abstracts is open and we look forward to receiving your submission.

Finally, the 2020 event in Europe will be held in London at the Cass Business School. The call for abstracts has just gone out and the website is here:

Insurance Data Science Conference

Advances in time series forecasting – M4 and what it means for insurance

Not necessarily the best way to forecast!
Photo by Jenni Jones on Unsplash

In a previous post I discussed the M4 conference and what my key takeaways were. In this post I plan to focus the discussion on insurance, and then specifically on actuarial work, and think about what the advances in time series forecasting might mean for actuaries and other professionals in insurance.

This post starts off by discussing the traditional time series forecasting problem, where it appears in the context of insurance, and how insurers could benefit from recent advances, and then narrows in to focus on actuarial work.

Let’s quickly cover what is meant by time series forecasting. Quite often, the only data available for a problem consists of the past values that a series took, measured at regular points in time. In other words, associated variables that would help to explain the past values of the series are not available, and the exercise needs to be informed only by the past values of the series. For example, one might have data on the number of various insurance products sold monthly for the past five years (in this case, associated variables such as number of salespeople or advertising spend might not be available), and to understand revenue, one might need to forecast the number of products that will be sold over the next quarter or year.
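The simplest benchmarks for this setting, described in the Hyndman and Athanasopoulos book mentioned below, are the naive forecast (repeat the last observation) and the seasonal naive forecast (repeat the value from the same season one cycle earlier). A minimal sketch, using made-up monthly sales figures:

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length=12):
    """Repeat the value observed in the same season of the last full cycle."""
    last_cycle = history[-season_length:]
    return [last_cycle[h % season_length] for h in range(horizon)]


# Two years of hypothetical monthly policy sales.
sales = [
    80, 85, 90, 95, 100, 110, 105, 100, 95, 90, 120, 140,
    84, 88, 95, 99, 104, 115, 110, 103, 99, 94, 125, 146,
]
print(naive_forecast(sales, 3))           # [146, 146, 146]
print(seasonal_naive_forecast(sales, 3))  # [84, 88, 95]
```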

Some more examples of this are given in a fantastic online book on forecasting by Rob J Hyndman and George Athanasopoulos over here. I would recommend this book to anyone interested in time series forecasting!

Insurance and forecasting 

Compared to more traditional industries, insurance is interesting in that there is no physical product being sold, and insurers do not need to maintain or forecast inventories. Having said that, the familiar time series forecasting problem pops up in the context of insurance in other areas, for example:

  • Forecasting the number of sales or claims and the associated resourcing requirements
  • Forecasting revenue, losses, expenses and profits

Perhaps surprisingly, revenue forecasts play a major role in determining the capital requirements of insurers under Solvency II, which is the European insurance legislation, as well as under SAM, which is the South African variant. In fact, part of the capital requirement for insurance risk is often directly proportional to forecast premiums; see, for example, Article 116.3.a of the Solvency II Directive.
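To see why premium forecasts matter, here is a heavily simplified sketch of the premium-risk part of the standard formula, as we understand it: the charge is 3σV, where the volume measure V for a line of business is driven by the larger of next year’s forecast earned premium and last year’s earned premium. It covers one segment only and ignores reserve risk, diversification and the other terms of the full calculation; the σ value, premiums and function names are made up.

```python
def premium_volume(forecast_premium, last_year_premium,
                   future_premium_existing=0.0, future_premium_new=0.0):
    """Simplified volume measure: the larger of forecast and last year's earned
    premium, plus (optional) premiums on contracts beyond the next 12 months."""
    return (max(forecast_premium, last_year_premium)
            + future_premium_existing + future_premium_new)


def premium_risk_charge(sigma, volume):
    """Simplified standard-formula style charge: 3 * sigma * V."""
    return 3.0 * sigma * volume


sigma = 0.08  # hypothetical standard deviation factor for the segment
for forecast in (100.0, 120.0, 240.0):
    volume = premium_volume(forecast, last_year_premium=100.0)
    print(forecast, premium_risk_charge(sigma, volume))
```

Once the forecast exceeds last year’s premium, the charge scales one-for-one with it: doubling the forecast from 120 to 240 doubles the charge.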

So, besides insurers, regulators around the world also have an interest in ensuring that revenue forecasts are accurate, and advances in time series forecasting, such as those presented at the M4 conference, should see wider application in insurance. One advance to consider is Microsoft’s extensive use of machine learning to produce revenue forecasts, as described in this paper by Jocelyn Barker and others. At the M4 conference (and in the paper), Jocelyn noted that these forecasts are used widely, from providing Wall Street guidance to managing global sales performance.

Other ideas expressed at the M4 conference that could also be of benefit, and that are now well established in the time series literature, include understanding:

  • when to make changes to statistical forecasts (summary here)
  • the value of combining forecasts from different methods (an insightful presentation from Bob Winkler at M4 on the topic is here)
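The combination idea can be illustrated with a minimal sketch: produce forecasts from a few simple methods (here the naive, mean and drift methods, chosen for illustration) and average them with equal weights. Equal weighting is a deliberately simple assumption, and the history is made up.

```python
def naive(history, horizon):
    """Repeat the last observation."""
    return [history[-1]] * horizon


def mean_method(history, horizon):
    """Repeat the historical average."""
    return [sum(history) / len(history)] * horizon


def drift(history, horizon):
    """Extend the line joining the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def combine(forecasts):
    """Equal-weight average of several forecasts, period by period."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


history = [10.0, 12.0, 14.0, 16.0]
members = [method(history, 2) for method in (naive, mean_method, drift)]
print(members)  # [[16.0, 16.0], [13.0, 13.0], [18.0, 20.0]]
print(combine(members))
```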

A peculiarity of insurance forecasting is that often insurance professionals will not aim to forecast the actual value of losses and expenses, but rather will focus on ratios that express these quantities in terms of revenue (or a close proxy to revenue). For example, if one wants to forecast losses, then one would try to forecast loss ratios, which express how many cents are paid in losses for every dollar of revenue. In the next section, I will discuss how these ratios are often currently forecast in insurance companies. 

Forecasting in Actuarial Work

For the main topic of this post, I want to examine the work that actuaries do for insurers, which often consists of, or contains, forecasts of some kind.

In life insurance, these forecasts are often the key variables underlying pricing and reserving such as:

  • Mortality
  • Morbidity
  • Withdrawal or lapse rates
  • Expenses

In P&C insurance (general insurance in the UK, or short-term insurance in South Africa), these forecasts often cover:

  • Loss ratios
  • Frequency rates and average cost per claim
  • Premium rates
  • Claims development patterns

As an aside, not so long ago these lists would have included investment returns, but a large swathe of the actuarial profession has more or less adopted market-consistent valuation practices, which dictate that all cashflows should be valued like bond cashflows, with the implication that investment returns can simply be read off market yield curves. One currently controversial discussion here is the valuation of no-negative-equity guarantees on reverse mortgages in the UK; see here from Dean Buckner and Kevin Dowd.
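In a market-consistent valuation, the investment return assumption is replaced by discounting each cashflow at the market spot rate for its term. A minimal sketch with annual compounding and a made-up spot curve; a real valuation would derive the curve from market instruments.

```python
def present_value(cashflows, spot_rates):
    """Discount cashflows paid at the end of years 1, 2, ... using annual spot rates."""
    return sum(cf / (1.0 + r) ** t
               for t, (cf, r) in enumerate(zip(cashflows, spot_rates), start=1))


cashflows = [100.0, 100.0, 100.0]   # hypothetical expected claim payments
spot_curve = [0.050, 0.055, 0.060]  # hypothetical annual spot rates
print(round(present_value(cashflows, spot_curve), 2))
```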

A common assumption that is made for some of these variables is that whatever experience has occurred over the past few years will repeat itself in the future – in time series jargon, actuaries often use so-called “naive” forecasts (please read the conclusion though, where I note that this is not always the case). Here are some examples of naive forecasts in current actuarial work:

  • When determining (P&C) claims reserves, an allowance must be made for the costs of managing claims (to be precise, here I refer to claims department and associated costs, or ULAE), in addition to the cost of indemnifying policyholders. The South African SAM regulations allow actuaries to forecast these costs on the basis of the average claims management costs over the past two years.
  • Also on P&C reserving, a very common approach to determining claims development patterns (which are then used to forecast the extent of the outstanding claims that are still to be reported) is to rely on averages of recent experience.
  • Mortality analysis often consists of comparing an assumed mortality table to recent experience. The assumed mortality table is then adjusted to match the recent experience more closely, and only rarely will a trend over time be allowed for. 
  • When pricing P&C insurance with a GLM, a dataset of recent claims experience is used to derive factors which define how different policies are likely to perform. For example, how much more likely are claims if the policyholder is a new driver, compared to an experienced driver. These factors are most often based on the recent past, with no allowance for trend over the years.
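The first of these naive forecasts can be sketched as a ratio method: compute the ratio of claims management costs to claims paid in each of the last two years, average the ratios, and apply the average to the outstanding claims. This is a simplified illustration with made-up figures and a hypothetical function name; it does not reproduce the exact SAM specification.

```python
def average_ratio(numerators, denominators, n_years=2):
    """Naive forecast: mean of the ratio over the most recent n_years."""
    ratios = [a / b for a, b in zip(numerators[-n_years:], denominators[-n_years:])]
    return sum(ratios) / len(ratios)


claims_management_costs = [5.0, 6.0, 8.0]  # hypothetical, by calendar year
claims_paid = [100.0, 100.0, 100.0]
outstanding_claims = 1_000.0

ulae_ratio = average_ratio(claims_management_costs, claims_paid)
ulae_reserve = ulae_ratio * outstanding_claims
print(ulae_ratio, ulae_reserve)
```

Changing n_years, or weighting years by volume, gives other "average of recent experience" variants; none of them allows for a trend over time.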

In all these examples, the recent past is taken as representative of the future. The reasons for this are probably a general lack of sufficient data to do better, and the difficulties in specifying a suitable model that can capture these changes over time adequately. However, as data quality (and quantity) improves, and especially, as the options for modelling increase (for example, using neural nets instead of GLMs), I think there are ample opportunities to improve on some parts of current practice. 

Two potential paths to achieve this stand out for me from the M4 conference:

  • One way to improve forecasts is to come up with a smart way of ensembling multiple models (as opposed to coming up with new, more complicated models), as done by the runner-up in the M4 competition (link). Of course, this needs to be done in a scientific manner, and very little research has been performed on how this could be achieved with traditional actuarial models. The advantage of this approach is that the building blocks remain the same traditional models, and a meta-model works out which of these models is best and when.
  • Another way is more or less to forget about model specification, and let a neural net find an optimal model automatically, as was done in Slawek Smyl’s winning solution (link). To do this, one generally needs more data than in traditional modelling approaches, but the results can be impressive. I particularly favor this latter approach, and for examples of applications to population mortality forecasting and claims reserving, I would point to two recent papers I co-authored that are up on SSRN that demonstrate this approach:

Having noted some of the above areas that can be improved, it is important to end by stating that often, data simply isn’t available to do much better than the simplest forecasts, and, indeed, in cases where the data is available, actuaries will try to use more sophisticated modelling. One example is mortality improvement modelling, generally undertaken by providers of annuities and other products exposed to longevity risk, where actuaries apply mortality models from both the actuarial and demographic “schools”, most often to population-level data. Another example is claims reserving, where increasing attention is being placed on developing reserving models that allow for trends in claims development assumptions over time, though I have not yet seen one of these in practice.

In conclusion, I think it is an exciting time to be involved in actuarial work and insurance more broadly, and I look forward to seeing how advances made in other areas will influence the insurance industry. 




