On 1 May 2020 the UK’s national statistical agency, the Office for National Statistics (ONS), released data and a report on deaths that occurred between 1 March and 17 April in England and Wales. The devolved statistical agencies in Scotland and Northern Ireland may have released similar data, but I have not analysed it. In this piece I will not analyse Welsh data either. Indices of Multiple Deprivation (IMDs) are devolved, and English IMDs are not directly comparable to Welsh IMDs. Making them comparable would take more time than I have.
The ONS report contained excellent maps and charts explaining the geographic patterns of Covid-19 deaths. More people have died of Covid-19 in London. Fewer people have died of it in rural areas. More people have died with diagnosed Covid-19 in the most deprived areas than in the least deprived areas.
The third of these findings generated a lot of press coverage. In Figure 7 of their online report the ONS say that “as with all deaths, Covid-19’s effects are worse the more deprived an area is… however in the most deprived areas, Covid-19 has had a proportionally higher impact”.
I have spent the last week understanding this statement, reproducing it, and extending the methods to regions of England and cities within England. The statement is true. It reproduces. But I think a lot of people have not understood it.
I am sharing my work here. Much of it is powered by code and data that I have published. But before I start, some warnings and definitions. There are many more warnings and definitions than I have managed to include in this blog post. More are contained in the raw data and its analysis, and more will be added in the future. I am determined not to over-complicate this blog post or my analysis. Over-complication is often lazy, and laziness leads to errors. It also leads to misunderstandings in public debate. I fear that we’ve seen that in this case. But we can clear things up now, because the internet is amazing.
Analysis of death rates is complex and hard. There are many parts to it and many judgements that can be made differently. It is common for people to over-simplify their analysis. It is also common for people to over-complicate it, either by relying on standard procedures that are not relevant to their analysis, or by using procedures whose small gain in completeness is not worth the extra difficulty of understanding the result.
Both risks grow if an analyst approaches their work hoping to find something and fails to control that urge. Laziness of any type, whether through over-simplification, over-complication, or prejudice, leads to errors. And in analysis of death rates those errors can be large.
I may have made such errors in this analysis. I hope that sharing my work and trying to sort the lazy from the valid feedback will reduce them.
The death rate in poor parts of England is about the same as the death rate in richer parts of England: about nine in every thousand people die each year. But the people who die in the poor parts of England are younger. Most people consider the tragedy of a death at 60 to be greater than the tragedy of a death at 90, largely because the former is much more likely to be avoidable. Across whole populations, researchers use age-standardised mortality rates to capture this.
There’s a great example of this in the ONS’s 1 May release. Table 3 in the spreadsheet shows the number of people who died in England between 1 March and 17 April, split by the deprivation of the place where they lived. Each decile of deprivation has a population of about 5.5 million (England has a population of about 55 million, and deciles split this into ten equal-sized pieces), and in each decile of deprivation about 8500 people died in the 48 days from 1 March to 17 April.
But we know from other data that the more deprived the decile, the younger the people who died were. And once we correct for that, by valuing the greater tragedy of a younger death, we see that the age-standardised mortality rate in the most deprived deciles is far higher than in the less deprived deciles. That’s the rate column in Table 3.
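In practice, the usual way to make this correction is direct standardisation: apply each decile’s age-specific death rates to one shared standard population, so that differences in age structure no longer drive the comparison. Here is a minimal Python sketch. The age bands, standard population and death counts are all hypothetical, and this is not a claim to reproduce the ONS’s exact calculation behind Table 3.

```python
from typing import Sequence


def crude_rate(deaths: Sequence[float], population: Sequence[float], per: float = 100_000) -> float:
    """Deaths per `per` people, ignoring age structure entirely."""
    return sum(deaths) / sum(population) * per


def direct_asmr(
    deaths: Sequence[float],
    population: Sequence[float],
    standard_population: Sequence[float],
    per: float = 100_000,
) -> float:
    """Directly age-standardised rate: apply each age band's death rate
    to a shared standard population, then express it per `per` people."""
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("need one entry per age band in every sequence")
    expected = sum(d / p * s for d, p, s in zip(deaths, population, standard_population))
    return expected / sum(standard_population) * per


# Hypothetical age bands: under 65, 65 to 84, 85 and over.
# Made-up standard population, NOT the European Standard Population.
standard = [85_000, 12_000, 3_000]

# Hypothetical deciles with the same total population and the same total deaths.
deprived = {"population": [90_000, 9_000, 1_000], "deaths": [90, 180, 130]}
affluent = {"population": [80_000, 16_000, 4_000], "deaths": [40, 160, 200]}

for name, decile in [("deprived", deprived), ("affluent", affluent)]:
    crude = crude_rate(decile["deaths"], decile["population"])
    asmr = direct_asmr(decile["deaths"], decile["population"], standard)
    print(f"{name}: crude {crude:.1f}, age-standardised {asmr:.1f} per 100,000")
```

Both hypothetical deciles have the same crude rate, 400 deaths per 100,000, but the deprived decile’s deaths fall at younger ages, so its age-standardised rate (715) is more than double the affluent decile’s (312.5).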
That’s the basics of age standardisation of mortality rates. I don’t use age-standardised rates in my analysis at the moment. I think they confuse people who don’t realise that the numbers they’re looking at aren’t counts of people who have died, but deaths weighted by age. I am not convinced that we should be age-standardising death rates for a disease whose death profile we don’t know yet (although early results suggest it is likely to be quite similar to that of death from natural causes). I am open to a debate on both points, but that’s the decision I’ve taken.
If you catch Covid-19, become seriously ill, are admitted to hospital, are tested for Covid-19 and that test works correctly, and then die in hospital, you will definitely be counted as having died of Covid-19. If any of those steps doesn’t happen, your death will still be recorded, but the record might not mention Covid-19 even if you died with it.
We know that more people have died with Covid-19 than have been recorded as dying with it. The size of the gap varies across countries, regions, and demographic groups. We use the excess deaths measure to try to close this gap.
For a pandemic like Covid-19, which kills a lot of people in a short time, excess deaths are fairly easy to estimate and understand. This is different from a disease like AIDS, which keeps killing over decades. At this stage of the Covid-19 outbreak the method is simple. You look at how many people died in the same place and period in previous years and assume that the same number should have died this year. Any difference is excess deaths, and you assume that all of these deaths are caused by Covid-19.
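The excess deaths calculation described above can be sketched in a few lines of Python. This assumes the baseline is a simple average of the same period in previous years; five years is used here because it is a common choice, but that number, the MSOA and all the counts are hypothetical. Subtracting the deaths that do mention Covid-19 shows how big the unrecorded gap might be.

```python
from statistics import mean
from typing import Sequence


def excess_deaths(observed: int, same_period_previous_years: Sequence[int]) -> float:
    """Deaths this year minus the average for the same place and period in earlier years.

    Treating the whole difference as Covid-19 deaths is an interpretation
    made by the analyst; this function only does the subtraction.
    """
    if not same_period_previous_years:
        raise ValueError("need at least one previous year to build a baseline")
    return observed - mean(same_period_previous_years)


# Hypothetical MSOA, deaths between 1 March and 17 April.
observed_2020 = 120
previous_years = [90, 100, 95, 105, 110]  # assumed five-year baseline
covid_on_certificate = 12  # deaths recorded as involving Covid-19

excess = excess_deaths(observed_2020, previous_years)
not_attributed = excess - covid_on_certificate  # excess deaths with no Covid-19 mention
print(f"excess: {excess}, of which not recorded as Covid-19: {not_attributed}")
```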
I’ve talked about deprivation a lot up to now, but I haven’t talked about how we define it in England. The definition is important. You can get all the details on the English indices of deprivation 2019 page. Here I’ll just share what we need to know.
The IMD combines about 40 measures into a single deprivation score for every small area in England. The inputs include data on Income (including specific measures of poverty affecting children and older people), Employment, Education, Health, Crime, Housing, and Environment. Keen statisticians among the readers should now have alarm bells ringing about using age-standardised death rates while also using IMDs, which include health data, to classify areas. Don’t worry too much: health has a low weight in the overall IMD. The small areas are called Lower Layer Super Output Areas (LSOAs), and each has a population of around 1500. Groups of LSOAs form Middle Layer Super Output Areas (MSOAs), with a population of around 7500.
Since the ONS have released their Covid-19 death data for MSOAs, that is the geography I will use from now on.
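Because IMD scores are published for LSOAs but the death data is published for MSOAs, the LSOA scores have to be combined somehow. This post doesn’t spell out how; one common approach, assumed here, is a population-weighted mean of the LSOA scores inside each MSOA. The area codes, scores and populations below are hypothetical, and real MSOAs usually contain more than two LSOAs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (lsoa_code, msoa_code, imd_score, population); all values hypothetical.
LsoaRow = Tuple[str, str, float, int]


def msoa_imd_scores(rows: Iterable[LsoaRow]) -> Dict[str, float]:
    """Population-weighted mean of LSOA IMD scores within each MSOA (an assumed method)."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    population: Dict[str, int] = defaultdict(int)
    for _lsoa, msoa, score, pop in rows:
        weighted_sum[msoa] += score * pop
        population[msoa] += pop
    return {msoa: weighted_sum[msoa] / population[msoa] for msoa in weighted_sum}


rows = [
    ("LSOA_1", "MSOA_A", 40.0, 1500),
    ("LSOA_2", "MSOA_A", 10.0, 1500),
    ("LSOA_3", "MSOA_B", 40.0, 3000),
    ("LSOA_4", "MSOA_B", 10.0, 1000),
]
result = msoa_imd_scores(rows)
print(result)
```

Weighting by population means a large, deprived LSOA pulls its MSOA’s score further than a small one would, which is why MSOA_B ends up more deprived than MSOA_A despite containing the same two scores.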
I have calculated IMDs for every MSOA in England. I have then assigned a decile to each MSOA. If an MSOA is in the most deprived 10% of MSOAs it gets labelled 1. If it’s in the next most deprived 10% of MSOAs it gets labelled 2. If it’s in the least deprived 10% of MSOAs it gets labelled 10. There are 6791 MSOAs in England, and so there are 679 MSOAs in each decile (plus an extra one somewhere). The populations of each decile are almost exactly the same.
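The decile labelling above can be sketched like this, assuming a higher IMD score means more deprived (as it does for IMD scores) and that deciles are built from equal counts of MSOAs rather than equal populations. The MSOA codes and scores are made up; only the count of 6791 comes from the text.

```python
from collections import Counter
from typing import Dict, Mapping


def assign_deciles(scores: Mapping[str, float]) -> Dict[str, int]:
    """Label areas 1 (most deprived 10%) to 10 (least deprived 10%).

    Assumes a higher score means more deprived, as with IMD scores.
    """
    ranked = sorted(scores, key=lambda code: scores[code], reverse=True)
    n = len(ranked)
    return {code: rank * 10 // n + 1 for rank, code in enumerate(ranked)}


# Hypothetical scores for 6791 MSOAs, just to check the decile sizes.
scores = {f"MSOA_{i:04d}": float(i) for i in range(6791)}
deciles = assign_deciles(scores)
print(sorted(Counter(deciles.values()).items()))
```

With 6791 areas this rounding rule gives decile 1 the extra MSOA and 679 to each of the other nine; a different rule would put the spare one somewhere else.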
But since I’m going to be looking at data within regions and cities, I need to explain a big danger. Deprivation in England is not evenly spread. For example, most places in North West England are deprived, while most places in East England are not. Together they give a fairly even distribution of deprivation, a good representation of England.
If you forget this when working with IMD deciles you can have some problems. Later I’ll look at answering a question like “has Covid-19 caused more deaths in deprived parts of Birmingham?”.
The problem is that by the standards of England as a whole, almost all of Birmingham is deprived, so the answer is trivial. Yes. Almost always. Almost everything that happens in Birmingham, or Manchester, or Liverpool happens in a deprived part of England. By definition.
But this usually isn’t what the people asking the question mean. They mean to ask “has Covid-19 caused more deaths in the more deprived parts of Birmingham?”. To answer that we have to create deprivation deciles for Birmingham, which is what I’ve done.
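Creating deprivation deciles for a single city, as described above, amounts to re-ranking only that city’s MSOAs. The toy example below uses a hypothetical country of 100 MSOAs and a hypothetical city made of its 20 most deprived MSOAs. Nationally, the whole city sits in deciles 1 and 2; ranked locally, it spreads across all ten.

```python
from typing import Dict, Mapping


def assign_deciles(scores: Mapping[str, float]) -> Dict[str, int]:
    """1 = most deprived 10% of the areas passed in, 10 = least deprived 10%."""
    ranked = sorted(scores, key=lambda code: scores[code], reverse=True)
    n = len(ranked)
    return {code: rank * 10 // n + 1 for rank, code in enumerate(ranked)}


# Hypothetical country of 100 MSOAs; higher score = more deprived.
national_scores = {f"MSOA_{i:03d}": float(i) for i in range(1, 101)}
# Hypothetical city made up of the 20 most deprived MSOAs in the country.
city = [code for code, score in national_scores.items() if score > 80]

national_deciles = assign_deciles(national_scores)
local_deciles = assign_deciles({code: national_scores[code] for code in city})

print(sorted({national_deciles[code] for code in city}))  # national view of the city
print(sorted(set(local_deciles.values())))                # city-only view
```

The same MSOA can be in national decile 1 and local decile 10 at once; which of the two a question refers to changes the answer completely.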
The ecological fallacy is a nightmare for statisticians, especially those working in biology. There is no better example than the data for death rates by MSOA in England & Wales that I prepared a few days ago.
We know that Covid-19 kills older people at a far greater rate than it kills younger people. So we would reasonably expect parts of England & Wales with more old people to have a higher death rate from Covid-19. We don’t see that. It is a great example of the ecological fallacy: the properties we observe in a group (an MSOA is a group of people) are not necessarily the properties of the individuals within the group.
So we know that Covid-19 is killing older people, but in areas of England & Wales that are younger.
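A toy example shows how both patterns can hold at once. The numbers are hypothetical and chosen only to illustrate the fallacy, not to explain the real data: within each area older residents are far more likely to die, yet the younger area has the higher overall death rate because its death rates are higher in every age band.

```python
from typing import Dict

# Hypothetical numbers, chosen only to illustrate the fallacy.
# "share" is the fraction of residents in each age band;
# "rate" is the death rate for residents in that band.
areas: Dict[str, Dict[str, Dict[str, float]]] = {
    "younger_area": {
        "share": {"under_65": 0.8, "65_plus": 0.2},
        "rate": {"under_65": 0.001, "65_plus": 0.05},
    },
    "older_area": {
        "share": {"under_65": 0.4, "65_plus": 0.6},
        "rate": {"under_65": 0.0005, "65_plus": 0.01},
    },
}


def area_death_rate(area: Dict[str, Dict[str, float]]) -> float:
    """Overall death rate for an area: age-band rates weighted by the age mix."""
    return sum(area["share"][band] * area["rate"][band] for band in area["share"])


for name, area in areas.items():
    older_at_higher_risk = area["rate"]["65_plus"] > area["rate"]["under_65"]
    print(f"{name}: overall rate {area_death_rate(area):.4f}, "
          f"older residents at higher risk: {older_at_higher_risk}")
```

Reading only the area-level rates, you would conclude that youth is the risk factor; reading the individual-level rates, you would conclude the opposite.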
The infuriating thing about the ecological fallacy is that it tends to sneak up on statisticians without much warning. And the type of geography-based analysis that the ONS and I have done is particularly susceptible to it.
There are good techniques to reduce the risk of falling victim to the ecological fallacy, but none is foolproof. The best way is to check results by looking at individuals, which is exactly what the ONS have done to explore Covid-19’s variable impact on people of different ethnicities, and what Ben Goldacre and his team have done to understand risk factors for Covid-19 mortality in hospitals. The latter is a mind-blowingly outstanding piece of work and you should read it. But beware of assuming that its risk factor finding for deprivation proves this work wrong. Ben’s work only looks at hospitalised patients who tested positive for Covid-19, while my results suggest that excess deaths that were never tested for Covid-19 are very important.
But looking at individuals cannot make us completely safe from the ecological fallacy. Deprivation is a property of a place, not of a person within it. We cannot escape the risk of the ecological fallacy, but I will try to avoid it in the results section below.
I estimate that Covid-19 has caused an equal number of deaths in all deprivation deciles within England.
This may seem like a different finding to the one that the ONS reported. I don’t think it is. There are two big reasons why our results look different.
There are lots of problems with my data. Here are two big ones.
If you understand it properly, this does not contradict the ONS’s finding that “as with all deaths, Covid-19’s effects are worse the more deprived an area is… however in the most deprived areas, Covid-19 has had a proportionally higher impact”. But I fear that many people repeating that phrase do not understand it.
Or maybe it’s me that’s wrong. That happens quite often.
Since you might have a lot of questions right now, here’s a table that might answer some of them.
The work I have done is shared on GitHub. The Power BI project I’m using for the visuals in this blog post is on there too, but it’s a mess, so please be careful if you use it. The project lets us look at excess deaths and confirmed Covid-19 deaths in every local authority and region of England. Here are four.
Within Leeds, all deciles of deprivation have suffered a similar number of deaths from Covid-19, by both the excess death measure and confirmed tests; the differences between deciles are not statistically significant.
Within Manchester, all deciles of deprivation have suffered a similar number of deaths from Covid-19, by both the excess death measure and confirmed tests; the differences between deciles are not statistically significant.
With the larger number of cases in London, a pattern does appear to be emerging. It’s still unclear, but it looks like Covid-19 may be causing more deaths in more deprived areas.
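The statements above don’t name a significance test. One simple option, shown here as an assumption rather than the method behind the charts, is a Pearson chi-square goodness-of-fit test of whether death counts are equal across the ten deciles. It leans on the deciles having nearly equal populations, and it suits counts such as confirmed Covid-19 deaths rather than excess deaths, which can be negative. The counts below are hypothetical.

```python
from typing import Sequence

# 95th percentile of the chi-square distribution with 9 degrees of freedom
# (10 deciles minus 1), from standard statistical tables.
CHI2_CRITICAL_9DF_5PCT = 16.919


def chi_square_equal_counts(counts: Sequence[int]) -> float:
    """Pearson chi-square statistic for 'every decile has the same expected count'."""
    if len(counts) != 10:
        raise ValueError("expected one count per deprivation decile")
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical confirmed Covid-19 deaths by local decile (1 = most deprived).
flat = [30, 25, 28, 22, 27, 31, 24, 26, 29, 23]
skewed = [60, 50, 40, 30, 25, 20, 15, 10, 8, 7]

for name, counts in [("flat", flat), ("skewed", skewed)]:
    stat = chi_square_equal_counts(counts)
    verdict = "differs" if stat > CHI2_CRITICAL_9DF_5PCT else "consistent with equal counts"
    print(f"{name}: chi-square {stat:.1f} -> {verdict}")
```

If SciPy is available, `scipy.stats.chisquare(counts)` computes the same statistic and also returns a p-value, because its default expected frequencies are equal across categories.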
I find this stuff really difficult. I have to constantly check that I understand what I’m writing about. It’s hard.
But the test that keeps working for me is going back to Birmingham. The data is clear: the people in Birmingham who have died of Covid-19 lived overwhelmingly in the most deprived parts of England.
But the people in Birmingham who have died of Covid-19 lived equally in places with the lowest, highest, and most middling levels of deprivation.
If you can understand why those two statements are both correct at the same time, you probably understand what deprivation means. If you can understand why an equal number of excess deaths in all deciles of Birmingham probably means that the increase in age-standardised mortality associated with Covid-19 is higher in the most deprived areas of Birmingham, you probably understand age-standardisation well enough to shout at me for not doing what you’d have done.
Most of the time I can understand both. But not always. So I’ll be keeping my eye on Birmingham’s data.