Motor policy price comparisons – comparing apples with oranges

Introduction

I recently tried to obtain a quote for comprehensive motor insurance from a price comparison website. The quote was on an older car, worth approximately R70k. After asking for some of my details, the comparison website presented me with something quite similar to the following table of premiums and excesses.

Note that these are not the actual premiums and excesses quoted (due to copyright issues) but have been modified by adding normally distributed random noise and then rounding the excesses. I don’t think these changes distort the economic reality of what I was quoted, but, nonetheless, these are not the actual numbers.

Policy   Premium   Excess
1        458       9845
2        514       4840
3        534       7620
4        532       4580
5        544       4580
6        584       4580
7        571       4580
8        767       3920
9        894       4515

Most of the policies presented had similar terms and conditions – some sort of cashback benefit, hail cover and car rental. The distinguishing features seemed to be premium and excess. However, as a consumer, I found it difficult to compare these premiums, except among those with the same R4.58k excess. Which of these is a good deal, and which are overpriced? It makes some sense that policy number 9 is overpriced – I can get a lower excess for a lower premium elsewhere, so this policy is definitely sub-optimal. But what about policy 8? This has a low excess, but seems very expensive compared to the policies with only a slightly higher excess. Is this reasonable? Intuitively, and having some idea of how motor policies are priced, my answer is no – but can we show this from the numbers presented?

Moral Soap Box (feel free to skip)

Before getting into the details of how I tried to work with these numbers, I think it is important to stop and consider the public interest. Would the general consumer of insurance have any idea how to compare these different premiums, given the different excesses? Probably not, in my opinion, which leads to the title of this post. I guess that some rational consumers would be ‘herded’ into comparing policies 4-7, since they have the same excess, and maybe go for the cheapest one of those. But this is perhaps only a “local minimum” – maybe, in fact, one of the other policies offers better value. Also, one has to rely on the good faith of those running the comparison website to present only policies with the same terms and conditions, or else this supposedly rational strategy might backfire if policy number 4 has worse terms. Lastly, this all makes sense on day one – what will the insurer offering such a generous premium do over the lifetime of the policy? Will they keep being so generous, or will the consumer be horrified after a couple of steep price hikes?

Hence, this set of quotes seems to me a “comparison of apples with oranges”.

The code

As usual, the code for this post is on my Github, over here:

https://github.com/RonRichman/ABC_pricing/

Note that the code is under the open-source MIT License, so please read that if you want to use it!

The theory

Of course, if we had access to the pricing models underlying these premiums, it would be a simple matter to work out what is expensive and what is not, but the companies quoting were not so kind as to share these and only provided point estimates. I have some ideas about the frequency of motor claims and the average cost per claim, so ideally I would want to incorporate this information into whatever calculations I perform, pointing to the need for some sort of Bayesian approach to the problem. The issue here is that the price of a general (non-life/P&C) insurance policy is really the outcome of a complicated mathematical function – the collective risk process – often represented by a compound Poisson distribution, which, to my knowledge, does not have an explicit likelihood function (which is why, in practice, actuaries use Monte Carlo simulation, or numerical approaches such as the Panjer recursion or the Fast Fourier Transform, to work with the distribution). Since most Bayesian techniques require an explicit likelihood function (or the ability to decompose the likelihood into a set of simpler distributions), it would be difficult to build a Bayesian model with standard methods like Markov Chain Monte Carlo (MCMC).

So, in this blog post I share an approach to this problem using an amazing technique called Approximate Bayesian Computation (‘ABC’). To explain the basic idea, it is worth going back to the basics of Bayesian calculations, which try to make direct inferences about the parameters of a statistical model. These calculations generally proceed in three steps:

  • Prior information on the problem at hand is encoded in a statistical distribution for the parameters we are interested in. For example, the average cost per claim might be assigned a Gamma prior distribution.
  • The data likelihood is then calculated based on a realization of the parameters from the prior distribution.
  • The posterior probability of a set of parameters is then assessed as a) the prior probability of that parameter set, multiplied by b) the data likelihood, divided by c) the total probability of the data over all possible parameter sets.
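Writing $\theta$ for the parameter set and $D$ for the data (my notation, not anything from the quotes), the last step is just Bayes’ theorem:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta')\,P(\theta')\,\mathrm{d}\theta'}$$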

In this case, the data likelihood is not easily available. The basic idea of ABC is that, in models with an intractable likelihood function, one can use a different method of ascertaining whether or not a parameter set is “likely”. That is, by generating data based on the prior distribution and comparing how “close” this generated data is to the actual data, one can get a feel for which parts of the prior distribution make sense in the context of the data, and which do not.
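To make the idea concrete, here is a minimal toy example of ABC rejection sampling in R, estimating a claim frequency from a single observed claim count. The prior parameters and acceptance rule are my own choices for illustration and are unrelated to the quotes above:

[sourcecode language="r"]
set.seed(42)

observed_claims = 3      # suppose we observed 3 claims over the exposure period
exposure_years  = 12

n_sims = 100000
freq   = rbeta(n_sims, 10, 30)                          # prior on the annual claim frequency
sims   = rpois(n_sims, lambda = freq * exposure_years)  # data generated from each prior draw

# Accept the draws whose simulated data are "close" to the observed data;
# for count data an exact match is feasible, otherwise a tolerance is used
posterior_freq = freq[sims == observed_claims]

mean(posterior_freq)     # approximate posterior mean of the claim frequency
[/sourcecode]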

For some more information on ABC, have a look at this blog post and the sources it quotes:

http://www.sumsar.net/blog/2014/10/tiny-data-and-the-socks-of-karl-broman/

The generative model and priors

I assumed that the number of claims, N, follows a Poisson distribution, with the frequency parameter drawn from a Beta distribution:

$$N \sim \text{Poisson}(\lambda), \qquad \lambda \sim \text{Beta}(\alpha, \beta)$$

I selected the parameters of the Beta distribution to produce a mean frequency of 0.25 (i.e. a claim every four years, on average) with a standard deviation of 0.075.

Cost per claim was modelled as a log-normal distribution:

$$X \sim \text{Lognormal}(\mu, \sigma^2)$$

Instead of putting priors on $\mu$ and $\sigma^2$, which do not have an easy real-world interpretation, I chose priors for the average cost per claim (ACPC) and the standard deviation of the cost per claim (SDCPC), and, for each draw from these prior distributions, found the matching parameters of the log-normal. Both of these priors were modelled as Gamma distributions:

$$\text{ACPC} \sim \text{Gamma}(\alpha_1, \beta_1), \qquad \text{SDCPC} \sim \text{Gamma}(\alpha_2, \beta_2),$$

with the parameters of the gamma chosen so that the average cost per claim is R20k with a standard deviation of R2.5k and the standard deviation of the cost per claim is R10k with a standard deviation of R2.5k.
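For reference, the prior parameters can be found by matching moments. Below is a sketch of one way to do this for the Beta frequency prior and the two Gamma priors; the helper function names are my own, and the actual script may parameterise the priors differently:

[sourcecode language="r"]
# Match a Beta(a, b) distribution to a given mean and standard deviation
beta_par = function(mean, sd) {
  s = mean * (1 - mean) / sd^2 - 1
  list(shape1 = mean * s, shape2 = (1 - mean) * s)
}

# Match a Gamma(shape, rate) distribution to a given mean and standard deviation
gamma_par = function(mean, sd) {
  list(shape = (mean / sd)^2, rate = mean / sd^2)
}

beta_par(0.25, 0.075)    # claim frequency prior: mean 0.25, sd 0.075
gamma_par(20000, 2500)   # ACPC prior: mean R20k, sd R2.5k
gamma_par(10000, 2500)   # SDCPC prior: mean R10k, sd R2.5k
[/sourcecode]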

The code to find the corresponding log-normal parameters, once we have an ACPC and SDCPC, is:

[sourcecode language="r"]
# Find the log-normal parameters (mu, sigma^2) that match a given
# mean and standard deviation of the cost per claim
lnorm_par = function(mean, sd) {
  cv = sd / mean                  # coefficient of variation
  sigma2 = log(cv^2 + 1)          # variance of log(claim cost)
  mu = log(mean) - sigma2 / 2     # mean of log(claim cost)
  results = list(mu, sigma2)
  results
}
[/sourcecode]
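As a quick sanity check (my own addition), the transformation recovers the intended moments of the severity distribution:

[sourcecode language="r"]
pars   = lnorm_par(20000, 10000)
mu     = pars[[1]]
sigma2 = pars[[2]]

exp(mu + sigma2 / 2)                             # ~ 20 000, the intended mean
sqrt((exp(sigma2) - 1) * exp(2 * mu + sigma2))   # ~ 10 000, the intended standard deviation
[/sourcecode]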

Lastly, I assumed that the insurers are working to a target loss ratio of 70% (i.e. for every 70c of claims paid, the insurers will bring in R1 of income), with a standard deviation of 2.5%. This distribution also followed a beta, similar to the frequency rate.

The following algorithm was then run 100 000 times (a condensed sketch in R follows the list):

  • Draw a frequency parameter from the Beta prior
  • Simulate the number of claims from the Poisson distribution, using the frequency parameter
  • Draw an average cost per claim and its standard deviation, and find the corresponding log-normal distribution
  • For each claim, simulate a claim severity from the log-normal
  • For each excess with a corresponding premium quote, subtract the excess from each simulated claim and add up the results
  • The implied premium is the sum of the claims net of the excess divided by:
    • 12, since we are interested in comparing monthly premiums
    • the target loss ratio of the insurers, to gross up the premium for expenses and profit margins
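Below is a condensed sketch of this simulation loop, assuming the lnorm_par function from above and the moment-matched prior parameters from the earlier sketch. The excess vector and function name are my own illustration; the actual script on GitHub is structured differently:

[sourcecode language="r"]
set.seed(1)
n_sims   = 10000                                  # 100 000 in the post; fewer here for speed
excesses = c(9845, 4840, 7620, 4580, 3920, 4515)  # the distinct excesses quoted

sim_one = function() {
  freq    = rbeta(1, 8.08, 24.25)                 # claim frequency prior: mean 0.25, sd 0.075
  acpc    = rgamma(1, shape = 64, rate = 0.0032)  # ACPC prior: mean R20k, sd R2.5k
  acpc_sd = rgamma(1, shape = 16, rate = 0.0016)  # SDCPC prior: mean R10k, sd R2.5k
  LR      = rbeta(1, 234.5, 100.5)                # target loss ratio prior: mean 70%, sd 2.5%

  n_claims = rpois(1, freq)                       # number of claims in the year
  pars     = lnorm_par(acpc, acpc_sd)             # matching log-normal parameters
  sev      = if (n_claims > 0) rlnorm(n_claims, pars[[1]], sqrt(pars[[2]])) else numeric(0)

  # Claims net of each excess, divided by 12 and grossed up by the target loss ratio
  sapply(excesses, function(x) sum(pmax(sev - x, 0)) / 12 / LR)
}

premiums = t(replicate(n_sims, sim_one()))        # one row per simulation, one column per excess
colnames(premiums) = paste0("excess_", excesses)
[/sourcecode]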

Inference

So far we have generated lots of data from our priors. Now it is time to see which of the parameter combinations actually produce premiums reasonably in line with the quotes on the website. To simplify things, I put each of the simulated parameters into one of nine “buckets” depending on the percentile of the parameter within its prior distribution.

[sourcecode language="r"]
# ntile() is from dplyr; claims is a data.table with one row per simulation
require(data.table)
require(dplyr)

claims[, freq_bin   := ntile(freq, 9)]     # claim frequency bucket
claims[, sev_bin    := ntile(acpc, 9)]     # average cost per claim bucket
claims[, sev_sd_bin := ntile(acpc_sd, 9)]  # sd of cost per claim bucket
claims[, lr_bin     := ntile(LR, 9)]       # loss ratio bucket

# Combine the four bucket indices into a single parameter-bucket id
claims[, id := paste0(freq_bin, sev_bin, sev_sd_bin, lr_bin)]
[/sourcecode]

Then, an indicative premium for each bucket was derived by averaging the premiums generated in the previous section within each parameter “bucket”. The distance between the generated and the actual quoted premiums was taken as the absolute percentage error:

$$d = \left| \frac{P_{\text{generated}} - P_{\text{quoted}}}{P_{\text{quoted}}} \right|$$

And for the very last step, the median distance between the generated and quoted premiums was found for each parameter bucket. I only selected those “buckets” with a median distance of less than 8%. The median was used, instead of the mean, because I believe that some of the quotes are actually unreasonable, and I did not want to shift the posterior too much in their favour by using a distance metric that is sensitive to outliers.
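A sketch of how this filtering could look in code, assuming a data.table called sim_premiums with one row per simulation and quoted policy, holding the parameter-bucket id from above, a policy index, the simulated premium and the corresponding quoted premium quote (these column names are my own, for illustration):

[sourcecode language="r"]
require(data.table)

# Indicative premium per bucket and quoted policy: the average of the simulated premiums
indicative = sim_premiums[, .(premium = mean(premium), quote = quote[1]), by = .(id, policy)]

# Distance: absolute percentage error against the quoted premium
indicative[, ape := abs(premium - quote) / quote]

# Median distance over the quotes for each bucket; keep the buckets within 8%
bucket_dist  = indicative[, .(med_ape = median(ape)), by = id]
good_buckets = bucket_dist[med_ape < 0.08, id]
[/sourcecode]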

Now we have everything we need to show the posterior distributions of the parameters:

Some observations are that the prices I was quoted imply a frequency and severity of claims somewhat different from what I assumed – the claim frequency is a little higher, but the average cost per claim is lower. The standard deviation of the cost per claim is lower as well, with less weight given to the tails than I had assumed. Lastly, the loss ratio distribution matches the prior quite well.

Prices

Lastly, the implied prices are shown in red in the next image.

Bearing in mind that this is all based on the assumption of actuarially unfair premiums – in other words, allowing the insurer to add a substantial profit to the actual risk premium by targeting a loss ratio of 70% – only three of the quotes are reasonable (two of those with an excess of R4.58k and the one with an excess of R4.84k). The rest of the quotes are significantly higher than can be justified by my priors on the key elements of the claims process, and it would seem irrational for a consumer with similar priors to take out one of these policies.

Conclusion

This post showed how it is possible to back out the parameters that underlie an insurance quote using prior information and Approximate Bayesian Computation.  Based on the analysis, we can go back to the original question I asked at the beginning of the post – is the low excess policy number 8 priced reasonably? The answer, based on my priors, seems to be “no”, and the excesses quoted here do not seem to be all that useful when it comes to explaining the prices of each quote.

What could be modelled more accurately? Some of the policies include a cashback benefit, which we could have priced explicitly using the posterior parameter distributions, but I personally attach very little utility to cashback benefits and would not pay more for one. So, in my opinion, this is a relatively minor limitation.

I would love to hear your thoughts on this.

Bridging between the tribes – chain-ladder and lifetables

Introduction

During the last few years of my career I have had the opportunity to work in two of the major fields of practice for actuaries – life insurance and non-life insurance. Something that always bothered me is that actuaries who perform reserving work in either of these two areas use totally different techniques from each other.

Life actuaries will generally build cash-flow models to project out expected income and outgo and derive the expected profit for each policy they are called on to reserve for, which is then discounted back to produce the reserve amount. One of the key inputs into this type of reserving model is a life table, which tabulates the mortality rates that apply to the insured population being reserved for.

Non-life actuaries, on the other hand, almost never build cash-flow models, but will apply a range of techniques to past claims information (arranged into a “triangle”, see later in the post for a famous example) to derive expected claims amounts that are held as an incurred but not reported reserve (IBNR). Some of these techniques are the chain-ladder, the Bornhuetter-Ferguson (Bornhuetter and Ferguson 1973)  and Cape-Cod techniques (Bühlmann and Straub 1983). Lifetables are never considered.

It would make sense intuitively that there is some connection between these two “tribes” of actuaries who, after all, are both trying to do the same thing, albeit for different types of company – make sure that the company has held back enough funds to pay claims. This post tries to illustrate that, hidden away in the chain-ladder method, there is in fact an implicit life table calculation, and that IBNR calculations can be cast in a life table setup. The key idea was actually expressed in a paper I wrote for the 2016 ASSA convention with Professor Rob Dorrington, where it appeared as an appendix.

Something else the idea helps with is that it provides an explanation of why the chain-ladder is so popular and seems to work well. The chain-ladder method remains the most popular choice of method for actuaries reserving for short-term insurance liabilities, globally and in South Africa (Dal Moro, Cuypers and Miehe 2016). Although stochastic models have been proposed for the chain-ladder method by Mack (1993) and Renshaw and Verrall (1998), the underlying chain-ladder algorithm is still described in the literature as a heuristic; see, for example, Frees, Derrig and Meyers (2014).

The simple explanation for the success of the chain-ladder method is that underlying the reserve estimates it produces is a life table, and that the chain-ladder is actually a type of life-table estimator.

The rest of the post shows the simple maths and some R code to “pull out” a life table from the chain-ladder calculation. In a future post, I hope to discuss some other helpful intuitions that can be built once the basic idea is established.

The code for this post is available on my GitHub account here:

Code

Chain-ladder calculations

Define $C_{i,j}$ as the claims amount relating to accident year i in development period j, where there are I accident years and J development periods. An example claims triangle, which appears in Mack (1993), is shown below. This triangle can easily be pulled up in R by running the following code, which references the excellent ChainLadder package:

[sourcecode language="r"]
require(ggplot2)
require(ChainLadder)
require(data.table)
require(reshape2)
require(magrittr)

# The GenIns claims triangle supplied with the ChainLadder package
GenIns
[/sourcecode]

 

i    C(i,1)    C(i,2)      C(i,3)      C(i,4)      C(i,5)      C(i,6)      C(i,7)      C(i,8)      C(i,9)      C(i,10)
1    357,848   1,124,788   1,735,330   2,218,270   2,745,596   3,319,994   3,466,336   3,606,286   3,833,515   3,901,463
2    352,118   1,236,139   2,170,033   3,353,322   3,799,067   4,120,063   4,647,867   4,914,039   5,339,085
3    290,507   1,292,306   2,218,525   3,235,179   3,985,995   4,132,918   4,628,910   4,909,315
4    310,608   1,418,858   2,195,047   3,757,447   4,029,929   4,381,982   4,588,268
5    443,160   1,136,350   2,128,333   2,897,821   3,402,672   3,873,311
6    396,132   1,333,217   2,180,715   2,985,752   3,691,712
7    440,832   1,288,463   2,419,861   3,483,130
8    359,480   1,421,128   2,864,498
9    376,686   1,363,294
10   344,014

The chain-ladder algorithm predicts the next claims amount in the table, $C_{i,j+1}$, as:

$$\hat{C}_{i,j+1} = C_{i,j} \cdot \hat{f}_j,$$

where $f_j$ is the so-called loss development factor in development period j.

The volume-weighted estimator of the loss development factor is defined in Mack (1993) as:

$$\hat{f}_j = \frac{\sum_{i=1}^{I-j} C_{i,j+1}}{\sum_{i=1}^{I-j} C_{i,j}}$$

The estimate of the ultimate claims – the claims amount after all of the claims development is finished – for accident year i is given by:

$$\hat{C}_{i,J} = C_{i,I-i+1} \cdot \prod_{j=I-i+1}^{J-1} \hat{f}_j$$
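These volume-weighted factors are easy to compute directly from the cumulative triangle. The sketch below does this for the GenIns data and reproduces the f(j) row of the lifetable table further down (the helper code is my own, not from the original script):

[sourcecode language="r"]
require(ChainLadder)

tri = GenIns      # cumulative claims triangle, 10 accident years x 10 development periods
J   = ncol(tri)

# Volume-weighted loss development factors: column sums of the next development
# period divided by column sums of the current period, over the accident years
# for which both periods have been observed
f = sapply(1:(J - 1), function(j) {
  rows = 1:(nrow(tri) - j)
  sum(tri[rows, j + 1]) / sum(tri[rows, j])
})
round(f, 2)       # 3.49 1.75 1.46 1.17 1.10 1.09 1.05 1.08 1.02
[/sourcecode]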

In R, most of the chain-ladder calculations have been helpfully automated. To produce the loss development factors and an estimate of the IBNR, one runs the following code:

[sourcecode language="r"]
# Fit the Mack chain-ladder model to the GenIns triangle and plot the results
fit = ChainLadder::MackChainLadder(GenIns)
plot(fit)
[/sourcecode]

Estimating the life table

Now for the lifetable. The percentage of claims developed by development period j is defined as:

$${}_{j}q_0 = \frac{1}{F_{j,J}}, \qquad \text{where } F_{j,J} = \prod_{k=j}^{J-1} \hat{f}_k,$$

and the percentage of claims developed in period j is:

$${}_{j|}q_0 = {}_{j}q_0 - {}_{j-1}q_0 \qquad (\text{taking } {}_{0}q_0 = 0).$$

The claims development can be cast in demographic terms as follows. Assume that for each accident year i, a population of claims:

$$C_{i,J}$$

will eventually be reported. In each development period j:

$${}_{j|}q_0 \cdot C_{i,J}$$

of the claims will be reported, or will “die”. The term ${}_{j|}q_0$ is therefore comparable to the demographic quantity:

$${}_{j|}q_0 = {}_{j-1}p_0 \cdot q_j,$$

which is the probability of death in the period j, after surviving to time j. A full lifetable can then be derived from:

$$q_j = \frac{{}_{j|}q_0}{{}_{j-1}p_0}, \qquad p_j = 1 - q_j, \qquad {}_{j}p_0 = {}_{j-1}p_0 \cdot p_j.$$

This is shown in the next table and plot, followed by the R code to produce the numbers.

j          1           2           3           4           5           6           7           8           9          10
C(i,j+1)   11,614,543  17,912,342  21,930,921  21,654,971  19,828,268  17,331,381  13,429,640  9,172,600   3,901,463
C(i,j)     3,327,371   10,251,249  15,047,844  18,447,791  17,963,259  15,954,957  12,743,113  8,520,325   3,833,515
f(j)       3.49        1.75        1.46        1.17        1.10        1.09        1.05        1.08        1.02       1.00
F(j,J)     14.45       4.14        2.37        1.63        1.38        1.25        1.15        1.10        1.02       1.00
tq0        0.07        0.24        0.42        0.62        0.72        0.80        0.87        0.91        0.98       1.00
t|q0       0.07        0.17        0.18        0.19        0.11        0.07        0.07        0.05        0.07       0.02
qx         0.07        0.19        0.24        0.33        0.28        0.27        0.34        0.35        0.80       1.00
tpx        0.93        0.76        0.58        0.38        0.28        0.20        0.13        0.09        0.02
px         0.93        0.81        0.76        0.67        0.72        0.73        0.66        0.65        0.20
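The code below uses a vector PERC_DEV holding the cumulative percentage of claims developed by each period (the tq0 row in the table above). Its construction is not shown in the snippet, so here is a sketch of how it could be derived from the Mack fit obtained earlier – an assumption on my part about how the full script does it:

[sourcecode language="r"]
require(ChainLadder)

# Refit the Mack model (as earlier); fit$f holds the age-to-age factors,
# with the final entry being the tail factor (1.00 by default)
fit = ChainLadder::MackChainLadder(GenIns)

# Cumulative factor to ultimate from each period, F(j, J), and the
# percentage of ultimate claims developed by period j (the tq0 row)
F_to_ult = rev(cumprod(rev(fit$f)))
PERC_DEV = 1 / F_to_ult
round(PERC_DEV, 2)
[/sourcecode]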

[sourcecode language="r"]
# PERC_DEV is the cumulative percentage of claims developed by period j (tq0);
# t_prime_qx is then the percentage developed in each period j (t|q0)
t_prime_qx = c(PERC_DEV[1], diff(PERC_DEV))
max_age = length(PERC_DEV)

px  = numeric(max_age)   # probability of "surviving" (not being reported) in period j
tpx = numeric(max_age)   # probability of remaining unreported to the end of period j
qx  = numeric(max_age)   # probability of being reported in period j, given unreported so far

qx[1]  = t_prime_qx[1]
px[1]  = 1 - qx[1]
tpx[1] = px[1]

for (i in 2:max_age) {
  qx[i]  = t_prime_qx[i] / tpx[i - 1]
  px[i]  = 1 - qx[i]
  tpx[i] = tpx[i - 1] * px[i]
}

lifetable = data.table(t = seq(1, max_age), PERC_DEV = PERC_DEV,
                       px = px, tpx = tpx, qx = qx, t_prime_qx = t_prime_qx)

# Plot each lifetable quantity in its own facet
lifetable_melt = lifetable %>% melt(id.var = "t") %>% data.table()
lifetable_melt %>%
  ggplot(aes(x = t, y = value)) +
  geom_line(aes(group = variable, colour = variable)) +
  facet_wrap(~variable)
[/sourcecode]

Conclusion

When will the above calculations work well? They make sense when dealing with triangles that increase monotonically, i.e. that do not allow for over-reserving or for salvage and recoveries. A good example is a count triangle of paid claims.

Now that we have shown that the chain-ladder estimates a lifetable, the question is whether this is just an interesting idea that lets one connect two diverse areas of actuarial practice, or if any significant insights with practical implications can be derived. That will be the subject of the next post.

References

Bornhuetter, R. and R. Ferguson. 1973. “The Actuary and IBNR”, Proceedings of the Casualty Actuarial Society Volume LX, Numbers 113 & 114.

Bühlmann, H. and E. Straub. 1983. “Estimation of IBNR reserves by the methods chain-ladder, Cape Cod and complementary loss ratio”, Paper presented at the International Summer School 1983.

Dal Moro, E., F. Cuypers and P. Miehe. 2016. Non-life Reserving Practices. ASTIN.

Frees, E.W., R.A. Derrig and G. Meyers. 2014. “Predictive Modeling in Actuarial Science”, Predictive Modeling Applications in Actuarial Science 1:1.

Mack, T. 1993. “Distribution-free calculation of the standard error of chain-ladder reserve estimates”, Astin Bulletin 23(02):213-225.

Renshaw, A.E. and R.J. Verrall. 1998. “A stochastic model underlying the chain-ladder technique”, British Actuarial Journal 4(4):903-923.

 
