Neural Network Embedding of the Over-Dispersed Poisson Reserving Model

Claims reserving for non-life (i.e. GI or P&C) companies is a core activity of the actuaries working in these companies, and a huge academic literature on the subject has been produced (Schmidt 2017). Recently, there has been more focus on how machine learning can be applied to claims reserving; examples of such studies are Kuo (2018), Wüthrich (2018a), Wüthrich (2018b) and Zarkadoulas (2017).

When I think about the literature that has sprung up on the claims reserving problem, one issue that has always bothered me is that actuaries in practice are often forced to depart from the theoretical methods, because the triangles they encounter do not conform to the assumptions of the theory. For example, one will often observe that the claims development pattern is not constant over time, in which case averaging over all accident years produces inaccurate reserves. Thus, in practice, actuaries apply all sorts of heuristics to derive a hopefully less biased set of assumptions, which are then used to derive reserves. This becomes very problematic when actuaries are required to derive uncertainty estimates, which are used in Solvency II/SAM for setting capital, because the methods for deriving these estimates generally cannot cater for the heuristics that were applied to derive the best estimate of the reserves. Some approaches that have emerged recently apply non-linear mixed models or fully Bayesian models to allow for changing claims development patterns, but I have not yet seen anyone derive the uncertainty of the reserves using these methods.

So, with this background in mind, this post is about a new approach to the claims reserving problem that solves these issues very neatly using the paradigm of representation learning (i.e. allowing a neural network to figure out the optimal way to use the input features within the model structure). The approach appears in a new paper applying neural networks to the claims reserving problem that I am delighted to have worked on together with Andrea Gabrielli and Mario Wüthrich, which is available here:

Paper on SSRN

In this paper, we show how a traditional IBNR model, the over-dispersed Poisson (ODP) model (Renshaw and Verrall 1998), which uses a GLM to model the claims run-off triangle, can be embedded into a neural network, which is then allowed to learn additional model structure, automatically enhancing the accuracy of the claims reserving model. The underlying claims data were simulated from the individual claims simulation machine developed by my co-authors (Gabrielli and Wüthrich 2018) and aggregated into six triangles representing different lines of business. One very nice feature of these data is that we also have the claims run-off, and we can thus compare the predicted claims (derived using our reserving method) to the actual claims development.
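As a rough illustration of the starting point, the sketch below fits the cross-classified ODP GLM, log(mu_ij) = c + a_i + b_j, to a small made-up incremental triangle using iteratively reweighted least squares. It then compares the resulting reserves with the chain-ladder reserves, which this GLM is known to reproduce. The triangle values and the function names (`design_row`, `fit_odp_glm`, `glm_reserves`, `chain_ladder_reserves`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# Made-up incremental claims triangle (rows: accident years, columns: development years).
# Cells below the latest diagonal are unobserved and are stored as 0 here.
TRIANGLE = np.array([
    [100.0, 60.0, 30.0, 10.0],
    [110.0, 70.0, 35.0, 0.0],
    [120.0, 65.0, 0.0, 0.0],
    [130.0, 0.0, 0.0, 0.0],
])


def design_row(i, j, n):
    """Dummy coding: intercept, accident-year effects, development-year effects."""
    row = np.zeros(2 * n - 1)
    row[0] = 1.0
    if i > 0:
        row[i] = 1.0              # accident year 0 is the baseline
    if j > 0:
        row[n - 1 + j] = 1.0      # development year 0 is the baseline
    return row


def fit_odp_glm(triangle, n_iter=50):
    """Fit the Poisson log-link GLM by IRLS; return fitted means for every cell.

    The ODP dispersion parameter does not affect the fitted means, so it is ignored.
    """
    n = triangle.shape[0]
    observed = [(i, j) for i in range(n) for j in range(n - i)]
    y = np.array([triangle[i, j] for i, j in observed])
    X = np.array([design_row(i, j, n) for i, j in observed])
    eta = np.log(y)  # start at the saturated fit; assumes observed cells are positive
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # Poisson working weights equal mu
        coef = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ coef
    return np.array(
        [[np.exp(design_row(i, j, n) @ coef) for j in range(n)] for i in range(n)]
    )


def glm_reserves(triangle):
    """Sum the fitted means over the unobserved cells of each accident year."""
    n = triangle.shape[0]
    fitted = fit_odp_glm(triangle)
    return np.array([fitted[i, n - i:].sum() for i in range(n)])


def chain_ladder_reserves(triangle):
    """Classical chain-ladder reserves from volume-weighted development factors."""
    n = triangle.shape[0]
    cum = np.cumsum(triangle, axis=1)
    reserves = np.zeros(n)
    for i in range(1, n):
        latest = cum[i, n - 1 - i]
        ultimate = latest
        for j in range(n - 1 - i, n - 1):
            rows = n - 1 - j  # accident years with columns j and j + 1 both observed
            ultimate *= cum[:rows, j + 1].sum() / cum[:rows, j].sum()
        reserves[i] = ultimate - latest
    return reserves


if __name__ == "__main__":
    print("GLM reserves:         ", glm_reserves(TRIANGLE))
    print("Chain-ladder reserves:", chain_ladder_reserves(TRIANGLE))
```

The two sets of reserves should agree up to numerical tolerance, which is why the ODP model is such a natural baseline for the network to build on.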

This paper features the following ideas, which are discussed next:

  • Residual learning
  • Learning over multiple lines of business
  • Uncertainty prediction

In this paper, we are building on an idea that was used in our recent paper on mortality forecasting using neural networks (Richman and Wüthrich 2018), in which we showed how the Lee-Carter mortality model can be expressed and extended to multiple populations within a neural network framework, leading to accurate mortality forecasts at a large scale.

However, whereas in the previous paper we did not maintain the structure of the Lee-Carter model, in the current paper we have maintained the ODP reserving model, which is a familiar reference point for actuaries, and allowed the network to enhance it. The network thus learns whatever residual structure remains after the application of the ODP model. Here is a view of the neural network used in the current paper:

This is a similar concept to the very successful class of computer vision models called ResNets (He, Zhang, Ren et al. 2016), which consist of very deep neural networks in which each set of layers learns a residual function. This concept was shown to be successful in allowing the training of exceptionally deep networks on the ImageNet dataset, and in the Lee-Carter paper, we showed how including a residual connection improved the performance of our deep network. Here, we use this idea a little differently: not to calibrate a very deep network, but to provide the ODP model to the network within a skip connection, which dramatically reduces the time taken to calibrate the final neural network.

Using the flexibility of neural networks, we also calibrate the model on six triangles simultaneously, and these results are shown in the paper to be more accurate than either the original ODP model (which produces biased predictions that are too low across all lines) or the neural network calibrated to a single triangle. In fact, comparing the predicted claims to the actual claims, we find that the neural network calibrated to the six triangles produces exceptionally accurate predictions!
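To make the skip-connection idea concrete, here is a minimal sketch in plain NumPy. The ODP linear predictor enters the output as a fixed offset, and a small network adds a residual term on the log scale. Several things here are assumptions for illustration only: the zero initialisation of the output weights (so that the model starts exactly at the ODP fit), the single tanh hidden layer, the scaled numeric inputs, the learning rate, and the placeholder "ODP fitted means". The architecture in the paper differs.

```python
import numpy as np

# Observed cells of a made-up 4x4 triangle, as (accident year, development year) in [0, 1].
CELLS = [(i, j) for i in range(4) for j in range(4 - i)]
X = np.array(CELLS, dtype=float) / 3.0
Y = np.array([100, 60, 30, 10, 110, 70, 35, 120, 65, 130], dtype=float)
# Hypothetical ODP fitted means for the same cells (placeholders, not a real fit).
ODP_LOG_MU = np.log([102, 59, 29, 10, 111, 68, 36, 118, 67, 130])


def init_params(n_inputs, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": np.zeros(n_hidden),  # zero output weights: the model starts at the ODP fit
        "b2": 0.0,
    }


def predict(params, x, odp_log_mu):
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    residual = hidden @ params["w2"] + params["b2"]
    return np.exp(odp_log_mu + residual)  # skip connection: ODP predictor plus residual


def poisson_loss(mu, y):
    """Poisson negative log-likelihood up to a constant that does not depend on mu."""
    return np.sum(mu - y * np.log(mu))


def gradient_step(params, x, odp_log_mu, y, lr):
    """One step of plain gradient descent on the Poisson loss."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    mu = np.exp(odp_log_mu + hidden @ params["w2"] + params["b2"])
    g = mu - y                                           # d loss / d log(mu)
    d_pre = np.outer(g, params["w2"]) * (1.0 - hidden**2)  # back through tanh
    return {
        "W1": params["W1"] - lr * (x.T @ d_pre),
        "b1": params["b1"] - lr * d_pre.sum(axis=0),
        "w2": params["w2"] - lr * (hidden.T @ g),
        "b2": params["b2"] - lr * g.sum(),
    }


params = init_params(n_inputs=2, n_hidden=8)
start = predict(params, X, ODP_LOG_MU)
for _ in range(500):
    params = gradient_step(params, X, ODP_LOG_MU, Y, lr=1e-5)
trained = predict(params, X, ODP_LOG_MU)

if __name__ == "__main__":
    print("loss at ODP start:", poisson_loss(start, Y))
    print("loss after steps: ", poisson_loss(trained, Y))
```

Because the residual term is zero at initialisation, the first prediction is the ODP fit itself, and gradient descent only has to learn the departure from it rather than the whole run-off pattern from scratch.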

Why is this model more accurate? We show in the paper that the network has learned additional structure that has picked up automatically on a shift in the claims development patterns over time. Here is a view of the claims development patterns for each of the accident years relating to one of the lines of business:

Thus, the network has automatically learned to vary the assumptions applied to each accident year, resulting in more accurate predictions. This is the paradigm of representation learning that was mentioned above: we have not specified to the model exactly how the claims development assumptions should vary by accident year, but have fed information regarding accident and development year into the neural network and allowed it to figure out how to combine this information optimally.

Perhaps most importantly, since each network is quick to calibrate, we then apply the bootstrap to derive the uncertainty of the predictions of the network, which, interestingly, is similar to the aggregate uncertainty of the ODP model. This is one of the first examples in the literature that I have seen in which a model that is complex enough to be applied to real-life triangles is also amenable to uncertainty analysis. This work is therefore likely to be an important step in advancing the state of the art of claims reserving models!
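The post does not spell out the bootstrap scheme used in the paper, so the following is only a generic sketch of the idea: resample Pearson residuals from a fitted model, rebuild pseudo-triangles, refit, and collect the resulting reserves. To keep it short and self-contained, a chain-ladder fit stands in for the network as the model being refitted. The triangle and all function names are illustrative assumptions, and the sketch captures estimation error only, not process variance.

```python
import numpy as np

NAN = np.nan
# Made-up incremental triangle; unobserved future cells are NaN.
TRIANGLE = np.array([
    [100.0, 60.0, 30.0, 10.0],
    [110.0, 70.0, 35.0, NAN],
    [120.0, 65.0, NAN, NAN],
    [130.0, NAN, NAN, NAN],
])


def future_mask(n):
    """True for cells below the latest diagonal (accident + development year > n - 1)."""
    return np.add.outer(np.arange(n), np.arange(n)) > n - 1


def chain_ladder_fit(tri):
    """Fitted incremental means for all cells: past back-fitted, future projected."""
    n = tri.shape[0]
    cum = np.cumsum(np.nan_to_num(tri), axis=1)
    f = np.array(
        [cum[: n - 1 - j, j + 1].sum() / cum[: n - 1 - j, j].sum() for j in range(n - 1)]
    )
    fit = np.zeros((n, n))
    for i in range(n):
        d = n - 1 - i                      # latest observed development year
        fit[i, d] = cum[i, d]
        for j in range(d, n - 1):          # project the future forwards
            fit[i, j + 1] = fit[i, j] * f[j]
        for j in range(d, 0, -1):          # back-fit the past from the diagonal
            fit[i, j - 1] = fit[i, j] / f[j - 1]
    return np.diff(fit, axis=1, prepend=0.0)


def reserve(tri):
    """Total reserve: fitted incremental claims summed over the future cells."""
    return chain_ladder_fit(tri)[future_mask(tri.shape[0])].sum()


def bootstrap_reserves(tri, n_boot=1000, seed=0):
    """Residual bootstrap of the total reserve (estimation error only)."""
    rng = np.random.default_rng(seed)
    n = tri.shape[0]
    past = ~future_mask(n)
    mu = chain_ladder_fit(tri)
    resid = (tri[past] - mu[past]) / np.sqrt(mu[past])   # Pearson residuals
    out = np.empty(n_boot)
    for b in range(n_boot):
        pseudo = np.full((n, n), np.nan)
        resampled = rng.choice(resid, size=resid.size, replace=True)
        pseudo[past] = mu[past] + resampled * np.sqrt(mu[past])
        out[b] = reserve(pseudo)           # refit the model on the pseudo-triangle
    return out


if __name__ == "__main__":
    boot = bootstrap_reserves(TRIANGLE, n_boot=500)
    print("point estimate:", reserve(TRIANGLE))
    print("bootstrap mean and std:", boot.mean(), boot.std())
```

The loop refits the model once per pseudo-triangle, which is why a fast calibration matters: with a slow-to-train network, hundreds or thousands of refits would be impractical.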

Please feel free to contact us if you have any feedback, which we would value!


Gabrielli, A. and M. Wüthrich. 2018. “An Individual Claims History Simulation Machine”, Risks 6(2):29.

He, K., X. Zhang, S. Ren and J. Sun. 2016. “Deep residual learning for image recognition”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition:770-778.

Kuo, K. 2018. “DeepTriangle: A Deep Learning Approach to Loss Reserving”, arXiv:1804.09253.

Renshaw, A.E. and R.J. Verrall. 1998. “A stochastic model underlying the chain-ladder technique”, British Actuarial Journal 4(04):903-923.

Richman, R. and M. Wüthrich. 2018. “A Neural Network Extension of the Lee-Carter Model to Multiple Populations”, SSRN

Schmidt, K. 2017. A Bibliography on Loss Reserving. Accessed: 8 July 2018.

Wüthrich, M. 2018a. “Machine learning in individual claims reserving”, Scandinavian Actuarial Journal:1-16.

Wüthrich, M. 2018b. “Neural networks applied to chain–ladder reserving”, European Actuarial Journal 8(2):407-436.

Zarkadoulas, A. 2017. “Neural network algorithms for the development of individual losses.” Unpublished thesis, Lausanne: University of Lausanne.
