Introduction
This post continues the series from the last two weeks and tries to estimate the impact of motor accident deaths on life expectancy in South Africa. A significant part of this impact could be eliminated by self-driving cars, and the post two weeks ago looked at the possible benefits in the UK and the USA. Last week I discussed some of the issues encountered when dealing with demographic data in South Africa. This week I propose a simple approach that tries to avoid some of those issues in order to derive the impact of motor accidents on life expectancy.
Caveat: Digging into these numbers, it seems to me that dealing with this properly needs much more than a blog post and could be the subject of detailed research. Indeed, much of the work has been done already in the National Burden of Disease study (Pillay-van Wyk, Laubscher, Msemburi et al. 2014; Pillay-van Wyk, Msemburi, Laubscher et al. 2016) and the most this post can attempt to do is see what can be derived from the publicly available information. I recommend that anyone interested in mortality in South Africa go through this fantastically detailed study.
My guess is that the numbers in this post are a lower bound on what the true reduction in life expectancy due to motor accidents is.
If you want to consider the appropriateness of these numbers, please also read the section below, “Conclusion and Limitations”.
The code for this post has been uploaded to my GitHub here, in the file “traffic mort – RSA.r”:
https://github.com/RonRichman/traffic_mortality
Approach
For the purpose of this post, I am going to try to avoid the issue of incomplete reporting of deaths as much as possible. Since I am not aware of any demographic (i.e. mathematical) method that can correct incomplete reporting of deaths by cause, I am going to make the strong assumption that, in each year, the level of completeness of reporting is the same for every cause of death, i.e. there is no greater propensity to report a death due to one cause than a death due to another. If this is the case, then some simple arithmetic shows that the ratio of reported deaths due to one cause to reported deaths due to another cause is an unbiased estimate of the true ratio (the common completeness factor cancels), and therefore we don’t need to correct the death data.
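The arithmetic can be written out in a couple of lines. The symbols are my own shorthand for illustration and do not come from any of the sources: $D_a$ and $D_b$ are the true numbers of deaths due to causes $a$ and $b$ in a given year, $R_a$ and $R_b$ are the reported numbers, and $c$ is the common completeness of reporting assumed above.

```latex
R_a = c \, D_a, \qquad R_b = c \, D_b
\quad \Longrightarrow \quad
\frac{R_a}{R_b} = \frac{c \, D_a}{c \, D_b} = \frac{D_a}{D_b},
\qquad
\frac{R_a}{\sum_k R_k} = \frac{c \, D_a}{c \sum_k D_k} = \frac{D_a}{\sum_k D_k}
```

The second identity is the one that matters for the rest of the post: the share of reported deaths due to a cause equals the true share, but only for as long as $c$ really is the same for every cause.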
Important to note here is the study by Matzopoulos, Prinsloo, Wyk et al. (2015) who went through mortuary records in 2009 to work out the true number and cause of deaths due to injuries, including motor accident deaths. The Burden of Disease study (Pillay-van Wyk, Msemburi, Laubscher et al. 2016) used these numbers as an input into a calculation whereby they corrected for injury-specific completeness of reporting, which implies that the assumption made above is a little questionable.
To deal with the issue of mislabelled cause of death data, I am going to take the following approaches:
- Firstly, hunt through the data to find causes of death that are not in the CDC list but probably represent motor accident deaths.
- Using these deaths, establish a cause-specific age profile and hunt through the rest of the data to see if we find any matches.
- Try to cross-reference the WHO data with the Road Traffic Management reports on accident fatalities in South Africa.
I am then going to derive a set of factors for each age group and year which explain how many of the reported deaths are due to motor accidents.
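As a rough illustration of what these factors are, here is a minimal Python sketch (the actual code for the post is the R file linked above, and this is not a translation of it). The record layout, the function name and the death counts are all hypothetical, and the set of codes contains only the three unspecified codes that come up in the Coding section below rather than the full CDC list.

```python
from collections import defaultdict

# Hypothetical records: (year, sex, age_band, icd10_code, deaths).
# The counts are made up purely to illustrate the calculation.
records = [
    (2015, "M", "20-24", "V89", 120),
    (2015, "M", "20-24", "A16", 300),
    (2015, "M", "20-24", "Y34", 180),
    (2015, "F", "20-24", "V09", 30),
    (2015, "F", "20-24", "A16", 270),
]

# Illustrative subset of motor accident codes (assumption: the real
# analysis would use the full CDC list plus these three codes).
MOTOR_CODES = {"V89", "V09", "V19"}


def motor_proportions(rows, motor_codes):
    """Share of reported deaths coded to motor accidents, by (year, sex, age band)."""
    total = defaultdict(int)
    motor = defaultdict(int)
    for year, sex, age_band, code, deaths in rows:
        key = (year, sex, age_band)
        total[key] += deaths
        if code in motor_codes:
            motor[key] += deaths
    return {key: motor[key] / total[key] for key in total if total[key] > 0}


factors = motor_proportions(records, MOTOR_CODES)
for key in sorted(factors):
    print(key, round(factors[key], 3))
```

Each factor is a share of reported deaths, so under the constant-completeness assumption it can be applied directly to a modelled mortality rate for the same year, sex and age.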
Lastly, I am not going to try to rederive my own set of mortality rates given the uncertainties in both the death and population data, but I am rather going to rely on modelled estimates of mortality from the Thembisa model (https://www.thembisa.org/). The Thembisa model seems to me to be the best publicly available model for this purpose and the project maintainers have made a very commendable effort to make the model and documentation available at their website. Using these estimates and the reduction factors discussed in the step before this, we will have everything we need to work out an adjusted life expectancy.
Data
I used three main sources of data. Like last week, the cause of death data is from the WHO Mortality database (http://www.who.int/healthinfo/statistics/mortality_rawdata/en/), which compiles death counts in 5-year bands by the ICD10 classification for a large number of countries around the world.
Secondly, I used the reports from the Road Traffic Management Corporation to compare the number of reported motor accident deaths in the WHO data to an external source. The reports are available here and contain many other interesting pieces of data:
http://www.rtmc.co.za/index.php/reports/traffic-reports
Lastly, I used the Thembisa model outputs to provide mortality rates.
Coding
Similar to the last post, I used the Centers for Disease Control and Prevention (CDC) classification of the ICD-10 codes to identify the deaths due to motor accidents. This classification can be found here: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_05.pdf
However, this seemed to capture an unrealistically small number of deaths. When I looked at the data a bit more, I realized that many of the traffic deaths were recorded under less informative ICD10 codes than those in the CDC list:
Cause | Deaths | ICD Title |
V89 | 162598 | Motor- or nonmotor-vehicle accident, type of vehicle unspecified |
V09 | 11890 | Pedestrian injured in other and unspecified transport accidents |
V19 | 708 | Pedal cyclist injured in other and unspecified transport accidents |
I added these three to the CDC list on the basis that most of the recorded motor accident deaths are probably lurking in these codes. I then worked out the percentage of deaths at each age accounted for by these deaths, producing the following plot:
The shape of these curves is quite different from those for the UK and the USA, which peak at the ages when people begin to drive:
Looking more closely at the plot for South Africa, one can see that in recent years the pattern is shifting towards a peak at these ages too. This makes sense: as the impact of AIDS mortality falls (probably due to ARVs, as discussed in Pillay-van Wyk, Msemburi, Laubscher et al. (2016)[1]) and fewer AIDS deaths are recorded, the relative impact of other causes increases.
More worrying, from a data quality point of view, is that motor accidents appear to be much less of an issue in South Africa than in the USA and UK, accounting for at most about 7.5% of deaths at any age, compared to upwards of 40% in the USA at some ages.
Some more prior knowledge comes from Pillay-van Wyk, Msemburi, Laubscher et al. (2016) who show on page 646 of their study that road injuries were the ninth largest cause of death in South Africa in both 1997 and 2012.
A final hint that we are missing some deaths comes from the Road Traffic Management Corporation reports, which contain fatality numbers for each of the years since 2004. I pulled these numbers out of the reports, and produced the comparison shown below:
Year | WHO Deaths | RTMC Crash Fatalities | Proportion |
2004 | 5 026 | 12 778 | 39% |
2005 | 5 279 | 14 135 | 37% |
2006 | 5 546 | 15 419 | 36% |
2007 | 5 995 | 14 920 | 40% |
2008 | 5 470 | 13 875 | 39% |
2009 | 5 550 | 13 768 | 40% |
2010 | 5 511 | 13 967 | 39% |
2011 | 5 027 | 13 954 | 36% |
2012 | 5 250 | 13 528 | 39% |
2013 | 5 544 | 11 844 | 47% |
2014 | 5 786 | 12 702 | 46% |
2015 | 6 171 | 12 944 | 48% |
The table shows that only about 36-48% of the deaths registered by the RTMC show up in the WHO data, and roughly 40% in most years. The RTMC figures come from a different reporting process to the one that feeds the WHO data, relying on reports issued by the police after accidents. Could it be that the deaths missing relative to the RTMC are in fact in the WHO data, but hidden by poor coding?
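The comparison is simple enough to reproduce directly from the table. The short Python sketch below uses the figures exactly as tabulated above; the variable names are my own, and the "shortfall" is just the RTMC count minus the WHO count.

```python
# (year, deaths coded to motor accidents in the WHO data, RTMC crash fatalities),
# copied from the table above.
rows = [
    (2004, 5026, 12778), (2005, 5279, 14135), (2006, 5546, 15419),
    (2007, 5995, 14920), (2008, 5470, 13875), (2009, 5550, 13768),
    (2010, 5511, 13967), (2011, 5027, 13954), (2012, 5250, 13528),
    (2013, 5544, 11844), (2014, 5786, 12702), (2015, 6171, 12944),
]

# Proportion of RTMC fatalities that appear under motor accident codes.
proportions = {year: who / rtmc for year, who, rtmc in rows}

# Deaths that would have to be found elsewhere in the WHO data.
shortfall = {year: rtmc - who for year, who, rtmc in rows}

for year, who, rtmc in rows:
    print(year, f"{proportions[year]:.0%}", shortfall[year])
```

The shortfall per year is the number of deaths that would need to be sitting under some other code if the RTMC count is taken as the benchmark.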
Searching through the data
To look for some of the missing deaths, I calculated the age “signature” of the traffic deaths that we have already found, which I defined as the proportion of deaths in each age bucket for each sex in each year that we have coded as being due to motor accidents. This signature looked like the following plot.
I then searched through the WHO data and calculated the distance between the age signature for the motor deaths and each cause of death labelled by ICD10 code. The table below shows the results:
Country | Cause | Deaths | distance |
South Africa | Y34 | 273496 | 30% |
UK | X969 | 41 | 32% |
USA | X940 | 1965 | 38% |
USA | X930 | 2188 | 40% |
USA | X708 | 3377 | 47% |
USA | O960 | 490 | 51% |
USA | X701 | 2454 | 51% |
USA | X804 | 1052 | 53% |
USA | X730 | 9945 | 53% |
USA | X744 | 2488 | 55% |
USA | X740 | 35822 | 56% |
USA | X808 | 1033 | 57% |
USA | X816 | 805 | 57% |
USA | W776 | 73 | 57% |
USA | X748 | 5357 | 57% |
USA | O961 | 596 | 58% |
USA | X718 | 1207 | 58% |
USA | X702 | 667 | 59% |
USA | X728 | 2077 | 60% |
USA | W875 | 63 | 61% |
It turns out that the closest match amongst the SA, USA and UK data is code Y34, which stands for “Unspecified event, undetermined intent”. The correspondence is quite good for both sexes, but a little bit out for females at the younger ages. The match is shown in the following plot (the lines represent the age signature of Y34 and the dots represent the signature shown above):
So I think it is a fair conclusion that some of the motor related deaths in South Africa land up in the WHO data under a “garbage” code. This is also in line with Matzopoulos, Prinsloo, Wyk et al. (2015) who found that the aggregate number of deaths in their study was not significantly different from the aggregate number due to external deaths in the Stats SA data (which feeds into the WHO dataset) but that deaths had been mislabelled.
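To make the matching step concrete, here is a minimal Python sketch of one way to compare age signatures. Two things in it are assumptions of mine for illustration: I take the signature of a cause to be the share of that cause's deaths falling in each age band, and I measure distance as half the sum of absolute differences between two signatures (0 means identical, 1 means no overlap). The distance measure behind the table above may well be different, and the death counts below are made up.

```python
def signature(deaths_by_age):
    """Share of a cause's deaths falling in each age band."""
    total = sum(deaths_by_age.values())
    return {age: deaths / total for age, deaths in deaths_by_age.items()}


def distance(sig_a, sig_b):
    """Half the sum of absolute differences between two signatures (assumed metric)."""
    ages = set(sig_a) | set(sig_b)
    return 0.5 * sum(abs(sig_a.get(age, 0.0) - sig_b.get(age, 0.0)) for age in ages)


# Hypothetical death counts by age band for the known motor accident codes
# and for two candidate codes.
motor = {"0-14": 10, "15-29": 40, "30-44": 30, "45+": 20}
candidates = {
    "Y34": {"0-14": 24, "15-29": 76, "30-44": 60, "45+": 40},
    "I21": {"0-14": 0, "15-29": 5, "30-44": 20, "45+": 75},
}

motor_sig = signature(motor)
distances = {code: distance(motor_sig, signature(d)) for code, d in candidates.items()}
closest = min(distances, key=distances.get)
print(closest, distances)
```

In practice the comparison would be run separately by sex and year, and very small causes would need to be filtered out, since a code with a handful of deaths can match almost any signature by chance.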
Therefore, I transferred some of the deaths from cause Y34 into those related to motor accidents. I used the RTMC reports as the “true” number of deaths, which is another questionable assumption since Matzopoulos, Prinsloo, Wyk et al. (2015) actually found more motor accident related deaths than those reported by the RTMC. For this reason I view the number produced next as a lower bound, and discuss more in the conclusion.
The final proportions of deaths due to motor accidents I derived are as follows:
It can be seen that these are significantly higher than the proportions in the previous section.
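The transfer from Y34 can be done in several ways. The Python sketch below shows one simple possibility, which should be read as an assumption rather than a description of the R code: move just enough Y34 deaths to bring the motor accident total up to the RTMC figure, spreading the transfer across age bands in proportion to the Y34 deaths in each band. The function name and all the numbers are hypothetical.

```python
def reallocate(motor_by_age, y34_by_age, target_total):
    """Move Y34 deaths into the motor accident category until target_total is reached.

    Assumption: the same fraction of Y34 deaths is moved in every age band,
    and no more deaths are moved than Y34 actually contains.
    """
    shortfall = max(target_total - sum(motor_by_age.values()), 0)
    y34_total = sum(y34_by_age.values())
    share = min(shortfall / y34_total, 1.0) if y34_total > 0 else 0.0
    ages = set(motor_by_age) | set(y34_by_age)
    new_motor = {a: motor_by_age.get(a, 0) + share * y34_by_age.get(a, 0) for a in ages}
    new_y34 = {a: (1.0 - share) * y34_by_age.get(a, 0) for a in ages}
    return new_motor, new_y34


# Hypothetical counts for one sex and year.
motor = {"15-29": 200, "30-44": 150, "45+": 50}
y34 = {"15-29": 500, "30-44": 300, "45+": 200}
rtmc_total = 1000  # benchmark number of motor accident deaths (made up)

new_motor, new_y34 = reallocate(motor, y34, rtmc_total)
print(new_motor, new_y34)
```

Because deaths are only moved between causes, the total number of deaths in each age band is unchanged, so the proportions of deaths due to motor accidents can be recalculated from the adjusted counts in the same way as before.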
Impact on Life Expectancy
The next step is to calculate the impact on life expectancy. I extended out the Thembisa mortality rates to age 110 using a Gompertz curve and then reduced the mortality rates in the Thembisa model by the proportions of the deaths due to motor accident discussed above. These curves for 2015 are shown in the following plot (the blip at age 90 is where the Gompertz curve joins the data and should be smoothed out), together with the curves adjusted for the impact of motor accidents:
The impact on life expectancy at birth is as follows:
Sex | Year | e0 | e0 – no motor accidents | Increase |
Male | 2004 | 51.72 | 52.51 | 0.79 |
Male | 2005 | 51.71 | 52.58 | 0.86 |
Male | 2006 | 52.07 | 52.96 | 0.89 |
Male | 2007 | 52.93 | 53.82 | 0.89 |
Male | 2008 | 51.24 | 52.06 | 0.81 |
Male | 2009 | 55.41 | 56.25 | 0.83 |
Male | 2010 | 57.20 | 58.04 | 0.84 |
Male | 2011 | 58.50 | 59.34 | 0.85 |
Male | 2012 | 59.20 | 60.05 | 0.85 |
Male | 2013 | 59.74 | 60.54 | 0.80 |
Male | 2014 | 60.15 | 61.03 | 0.88 |
Male | 2015 | 60.47 | 61.34 | 0.88 |
Female | 2004 | 55.77 | 56.09 | 0.32 |
Female | 2005 | 55.66 | 56.01 | 0.35 |
Female | 2006 | 56.32 | 56.71 | 0.40 |
Female | 2007 | 57.83 | 58.18 | 0.35 |
Female | 2008 | 56.35 | 56.66 | 0.31 |
Female | 2009 | 61.15 | 61.47 | 0.32 |
Female | 2010 | 63.04 | 63.38 | 0.33 |
Female | 2011 | 64.71 | 65.08 | 0.37 |
Female | 2012 | 65.79 | 66.13 | 0.34 |
Female | 2013 | 66.81 | 67.13 | 0.33 |
Female | 2014 | 67.66 | 68.00 | 0.34 |
Female | 2015 | 68.00 | 68.35 | 0.35 |
The gain in life expectancy for males is much higher than for females, which is due to two factors:
- The higher mortality rates due to accidental death for males, compared to females
- The bigger impact of motor deaths for males compared to females, as shown above
Translating these numbers into years of life lost due to motor accidents, using the reported 2015 birth cohorts from Stats SA, we get 417 124 years of life lost for males and 163 157 for females.
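For readers who want to see the mechanics of the life expectancy calculation, here is a minimal Python sketch under stated assumptions. The inputs are not Thembisa outputs: the "observed" rates are a made-up Gompertz curve and the motor accident shares are invented. I also assume single-year central death rates, a Gompertz fit by least squares on the log rates from age 80, a constant force of mortality within each year of age, and a simple trapezoid approximation for the years lived.

```python
import math


def gompertz_extend(rates, max_age=110, fit_from=80):
    """Extend single-year death rates to max_age using m(x) = a * exp(b * x).

    a and b come from a least-squares line through log m(x) over the ages
    fit_from up to the last observed age.
    """
    last_age = len(rates) - 1
    ages = list(range(fit_from, last_age + 1))
    logs = [math.log(rates[x]) for x in ages]
    mean_x = sum(ages) / len(ages)
    mean_y = sum(logs) / len(logs)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
             / sum((x - mean_x) ** 2 for x in ages))
    intercept = mean_y - slope * mean_x
    tail = [math.exp(intercept + slope * x) for x in range(last_age + 1, max_age + 1)]
    return list(rates) + tail


def life_expectancy_at_birth(rates):
    """Approximate e0 from single-year death rates (survivors past the last age are ignored)."""
    alive, years_lived = 1.0, 0.0
    for m in rates:
        survivors = alive * math.exp(-m)
        years_lived += 0.5 * (alive + survivors)
        alive = survivors
    return years_lived


# Hypothetical inputs: rates for ages 0-90 and motor accident shares by age.
observed = [0.0005 * math.exp(0.085 * x) for x in range(91)]
extended = gompertz_extend(observed)
motor_share = [0.05 if 15 <= x < 45 else 0.01 for x in range(len(extended))]

# Remove the motor accident share of mortality at each age.
adjusted = [m * (1.0 - p) for m, p in zip(extended, motor_share)]

e0 = life_expectancy_at_birth(extended)
e0_no_motor = life_expectancy_at_birth(adjusted)
gain = e0_no_motor - e0
print(round(e0, 2), round(e0_no_motor, 2), round(gain, 2))
```

The years of life lost figures above then presumably follow from multiplying a gain of this kind by the size of the relevant birth cohort.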
Conclusion and Limitations
This post examined the impact of motor accident related deaths on mortality and life expectancy in South Africa. Like most exercises focussing on South African mortality that I have been involved in, it comes down to trying to work out how deaths have been reported and recorded.
The key assumptions that were made are:
- some of the motor accident related deaths have been misreported under Y34
- the RTMC reports are the true number of these deaths
- all causes of death are reported with the same level of completeness
Matzopoulos, Prinsloo, Wyk et al. (2015) found that in fact, more deaths had been recorded by mortuary reports in 2009 than appeared in the RTMC reports. The difficulty I have in using their number in this type of armchair analysis is that we know the WHO data is not completely reported, so some part of the deaths that they found relates to the normal under-reporting of deaths in South Africa, and not the cause specific reporting issues. The fact that they found more deaths also invalidates the assumption that all deaths are reported at the same level of completeness but it is unclear to me how to correct the WHO data using their finding.
This represents a limitation of the analysis performed above and it seems to me that the gain in life expectancy derived in this analysis is probably too low.
This post showed that eliminating deaths due to motor accidents would be a big win for public health. The problem is that I imagine self-driving cars will not come to South Africa nearly as quickly as to more developed countries, and I don’t imagine that the whole population would benefit immediately. Other challenges for self-driving cars in South Africa are likely to arise from the relatively poor road infrastructure. Therefore, the potential benefits to mortality will not be realized anytime soon.
References
Matzopoulos, R., M. Prinsloo, V. Pillay-van Wyk, N. Gwebushe et al. 2015. “Injury-related mortality in South Africa: a retrospective descriptive study of postmortem investigations”, Bulletin of the World Health Organization 93(5):303-313.
Pillay-van Wyk, V., R. Laubscher, W. Msemburi, R.E. Dorrington et al. 2014. “Second South African National Burden of Disease Study: Data Cleaning, Validation and SA NBD List”, Cape Town: Burden of Disease Research Unit, South African Medical Research Council.
Pillay-van Wyk, V., W. Msemburi, R. Laubscher, R.E. Dorrington et al. 2016. “Mortality trends and differentials in South Africa from 1997 to 2012: second National Burden of Disease Study”, The Lancet Global Health 4(9):e642-e653.
[1] Quoting from page 651 of their study: “We report a marked decline in HIV/AIDS and tuberculosis mortality since 2006, which can be attributed to the intensified antiretroviral treatment rollout for adults since 2005. According to the National Department of Health, more than 2 million people received antiretroviral therapy in 2012 versus an estimated 47 500 in 2004. The rollout of the prevention of mother-to-child transmission programme since 2002 has reduced infections and hence deaths in infants.”