How many people in India have been infected with coronavirus?

In a briefing on June 11, the Indian Council of Medical Research presented estimates of levels of Covid-19 infection in India and its infection fatality rate – the proportion of those infected who are expected to die. These estimates were based on incomplete results from a seroprevalence survey conducted during May to measure the prevalence of antibodies to SARS-CoV-2, indicating likely past infection.

Some estimates were surprising, but there was no indication of their likely accuracy. They were, for the most part, reported uncritically in the press.

ICMR’s claims

High prevalence in April: The headline claim was that 0.73% of the population in 65 districts had already been exposed to Covid-19 by the end of April. These districts were presumed by most commentators to be representative of the country as a whole. If so, around 10 million people nationwide would have been infected by the end of April. This would mean that testing had detected just 1 in every 280 infections.
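The arithmetic behind these headline figures can be checked with a rough sketch. The population figure (about 1.38 billion) and the count of confirmed cases by April 30 (about 35,000) are assumed round numbers, not values from the ICMR briefing, and the function names are purely illustrative.

```python
# Rough check of the headline figures. POPULATION and CONFIRMED_CASES
# are assumed round numbers, not values from the ICMR briefing.
POPULATION = 1.38e9        # assumed population of India
PREVALENCE = 0.0073        # 0.73% prevalence claimed by the ICMR
CONFIRMED_CASES = 35_000   # assumed confirmed cases by April 30


def implied_infections(population: float, prevalence: float) -> float:
    """Infections implied if the surveyed prevalence held nationwide."""
    return population * prevalence


def infections_per_confirmed_case(infections: float, cases: float) -> float:
    """Number of infections for each confirmed case."""
    return infections / cases


infections = implied_infections(POPULATION, PREVALENCE)
ratio = infections_per_confirmed_case(infections, CONFIRMED_CASES)
print(f"Implied infections: {infections / 1e6:.1f} million")
print(f"One confirmed case per {ratio:.0f} infections")
```

Under these assumptions the result lands close to the 10 million infections and the roughly 1-in-280 detection rate quoted above.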

Low fatality rate: The ICMR also claimed that the infection fatality rate of Covid-19 in India is just 0.08%. Put simply, fewer than 1 in 1,000 of those infected would die. This would make Covid-19 in India far less deadly than elsewhere in the world, where most infection fatality rate estimates have been in the range 0.5% to 1%. The claim bolsters problematic narratives about Covid-19 fatality, in particular suggestions of Indian exceptionalism.

Were these claims credible, and how do they fit in with what we know of Covid-19 in India and elsewhere? In order to analyse the ICMR’s estimates, let’s look at the data from another angle.

Estimating prevalence and fatalities based on early data

Can we estimate the prevalence of Covid-19 in India without seroprevalence data? One approach is to use data on Covid-19 deaths, along with some assumptions about fatality rate and various delays. After all, deaths should reflect true infection levels.

The problem is that there are many ways in which Covid-19 deaths might go missing. Both narrative evidence and the data itself tell us this has happened on a fairly large scale. For example, evidence of significant underreporting of Covid-19 fatalities has come to light from West Bengal, Delhi, Tamil Nadu, Maharashtra and Madhya Pradesh.

So here we trust only the early fatality data, and assume that up to April 10, Covid-19 deaths were being accurately identified and recorded. The early data is then used to set various quantities needed for the estimates. The details and explanation of this approach are given in the supporting material.

A key quantity needed for the estimates is the case-fatality reporting delay. This is the average difference between the times after infection when cases are recorded and when deaths are recorded. If, say, this is currently 10 days, then a surge in cases reported today should result in a surge in deaths reported 10 days from now.

The case-fatality reporting delay in the early period is all we need in order to estimate fatalities and there is some evidence it was around eight to 10 days. But, since this is rather uncertain, we explore a range of values between two days and 10 days.
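To make the idea of a case-fatality reporting delay concrete, here is a minimal sketch on synthetic data. It is not the method used in the supporting material: it simply builds a death curve that mirrors the case curve 10 days later, then looks for the delay at which the ratio of deaths to lagged cases is steadiest. All numbers and names are hypothetical.

```python
from statistics import mean, pstdev

TRUE_DELAY = 10        # synthetic gap in days between case and death reports
FATALITY_SHARE = 0.03  # synthetic share of reported cases that die
DAYS = 40

# Synthetic daily case reports: flat at 100, then a surge to 300 from day 15.
new_cases = [100.0 if day < 15 else 300.0 for day in range(DAYS)]

# Daily deaths mirror the case curve TRUE_DELAY days later.
new_deaths = [
    FATALITY_SHARE * new_cases[day - TRUE_DELAY] if day >= TRUE_DELAY else 0.0
    for day in range(DAYS)
]


def ratio_spread(cases, deaths, delay):
    """Relative spread of deaths[t] / cases[t - delay] across days."""
    ratios = [deaths[t] / cases[t - delay] for t in range(delay, len(deaths))]
    return pstdev(ratios) / mean(ratios)


def best_delay(cases, deaths, candidates):
    """Candidate delay for which the lagged ratio is most stable."""
    return min(candidates, key=lambda d: ratio_spread(cases, deaths, d))


estimated = best_delay(new_cases, new_deaths, range(2, 15))
print(f"Estimated case-fatality reporting delay: {estimated} days")
```

The surge in cases on day 15 shows up in deaths on day 25, which is what lets the delay be recovered. With steady exponential growth and no change in pace, this simple check could not distinguish one delay from another, which is one reason real estimates of the delay are uncertain.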

To estimate infection levels, we additionally require the infection fatality rate, which we allow to vary between 0.3% and 1%. Some results, based on data from a crowd-sourced website, are in the table that follows. Entries in bold are plausible “intermediate” estimates.

Estimate	Case-fatality (C-F) delay: 2 days	C-F delay: 7 days	C-F delay: 10 days
Missing fatalities by April 30	16%	49%	62%
Missing fatalities by June 29	27%	61%	74%
Infections by April 30 at 0.3% IFR	1.6 million	2.7 million	3.9 million
Infections by April 30 at 0.5% IFR	0.94 million	1.6 million	2.3 million
Infections by April 30 at 1% IFR	0.47 million	0.81 million	1.16 million
Infections by June 8 at 0.3% IFR	7.7 million	14.4 million	21.7 million
Infections by June 8 at 0.5% IFR	4.7 million	8.7 million	13.0 million
Infections by June 8 at 1% IFR	2.3 million	4.3 million	6.5 million
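The infection rows in the table appear to scale inversely with the assumed infection fatality rate (IFR), as one would expect if each estimate is essentially an eventual death toll divided by the IFR. The sketch below checks this for the April 30 figures. The values are copied from the table; the helper names are illustrative, and the full method, including the delays involved, is in the supporting material and is not reproduced here.

```python
# April 30 infection estimates (millions) copied from the table above,
# keyed by assumed IFR, for C-F delays of 2, 7 and 10 days.
APRIL_30_INFECTIONS = {
    0.003: [1.6, 2.7, 3.9],
    0.005: [0.94, 1.6, 2.3],
    0.010: [0.47, 0.81, 1.16],
}


def implied_deaths(infections_millions: float, ifr: float) -> float:
    """Eventual deaths implied by an infection count and an IFR."""
    return infections_millions * 1e6 * ifr


def rescale(infections_millions: float, ifr_from: float, ifr_to: float) -> float:
    """Convert an infection estimate from one assumed IFR to another."""
    return infections_millions * ifr_from / ifr_to


for ifr, row in APRIL_30_INFECTIONS.items():
    deaths = [round(implied_deaths(x, ifr)) for x in row]
    print(f"IFR {ifr:.1%}: implied eventual deaths {deaths}")
```

Within each delay column, the implied death toll is nearly the same whichever IFR row is used, so the differences between rows come from the IFR assumption alone.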

Assumptions behind these estimates are given in detail in the supporting material. For the most part, they lead to underestimation of cases and fatalities. Nevertheless, to obtain 10 million infections at the end of April as suggested by the ICMR, we would need some combination of rapidly dropping case detection, a long case-fatality reporting delay, and a low infection fatality rate.

Credible claims?

Were the ICMR’s prevalence figures credible? The ICMR’s claim of 0.73% prevalence in late April lies well above the range of estimates in the table. The very low case detection and infection fatality rate estimates it implies should be a warning sign about its credibility. Since the ICMR’s analysis has not been published, one is forced to use some detective work to understand how this figure might have been calculated. The key finding is that the figure is highly sensitive to minor inaccuracies in the estimation of test-kit properties and should be treated with scepticism.
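To see why a low prevalence estimate can be so sensitive to test-kit properties, consider the standard Rogan-Gladen correction for imperfect tests. This is only an illustration: the ICMR’s actual calculation has not been published, and the raw positive rate, sensitivity and specificity used here are hypothetical values chosen for the example.

```python
def rogan_gladen(raw_positive_rate: float, sensitivity: float,
                 specificity: float) -> float:
    """Prevalence adjusted for imperfect test sensitivity and specificity."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (raw_positive_rate + specificity - 1.0) / denominator
    # Clamp to [0, 1]: the raw formula can go negative at low prevalence.
    return min(max(adjusted, 0.0), 1.0)


# Hypothetical survey result and kit sensitivity, for illustration only.
RAW_POSITIVE_RATE = 0.015
SENSITIVITY = 0.90

for specificity in (0.985, 0.990, 0.995):
    prevalence = rogan_gladen(RAW_POSITIVE_RATE, SENSITIVITY, specificity)
    print(f"specificity {specificity:.1%}: adjusted prevalence {prevalence:.2%}")
```

With these made-up inputs, shifting the assumed specificity by half a percentage point in either direction moves the adjusted prevalence from roughly zero to about 1.1%. When true prevalence is below 1%, false positives can account for much of the raw signal.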

Was disease surveillance really as poor as the ICMR’s numbers suggest? Case detection of under 5% is quite normal, but the 0.35% suggested by the ICMR’s numbers is very poor indeed. This would, if correct, be a huge indictment of India’s Covid-19 surveillance systems. Again, it seems more likely that prevalence was overestimated than that case detection was really so poor.

Was there community transmission by the end of April? The most glaring inconsistency in the ICMR’s briefing was the claim of high prevalence but no community transmission. To claim that there has been no community transmission is to assert that the source of most infections can be identified. But if, as suggested by the ICMR’s estimates, only one in every 280 infections had been recorded, the evidence for such an assertion would be flimsy indeed.

Ironically, the ICMR’s estimates suggest that lockdown failed to an extent beyond what even critics of lockdown have suggested. Covid-19 would need to have spread rapidly under the radar during lockdown to result in 10 million infections by the end of April.

A woman waits to give a sample at a Covid-19 testing centre. Credit: Arun Sankar/AFP

Did the ICMR acknowledge fatality undercounting? Curiously, the ICMR’s claims could be interpreted as an acknowledgement of significant death undercounting by mid-May. If, indeed, there had been about 10 million infections by late April then, at the ICMR-estimated infection fatality rate of 0.08%, we would expect a total of about 8,000 fatalities three weeks later. In fact, only 3,584 had been recorded. Thus the assertions about prevalence and infection fatality rate seem consistent only if about one in two fatalities had already been missed.
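The consistency check in this paragraph is simple enough to write out. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the text above.
INFECTIONS_LATE_APRIL = 10_000_000  # implied by the ICMR prevalence claim
ICMR_IFR = 0.0008                   # 0.08% infection fatality rate
RECORDED_DEATHS = 3_584             # deaths recorded about three weeks later

expected_deaths = INFECTIONS_LATE_APRIL * ICMR_IFR
share_recorded = RECORDED_DEATHS / expected_deaths
share_missed = 1 - share_recorded

print(f"Expected deaths: {expected_deaths:,.0f}")
print(f"Share recorded: {share_recorded:.0%}")
print(f"Share missed: {share_missed:.0%}")
```

On these numbers, somewhat more than half of the expected fatalities would have gone unrecorded.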

What is remarkable in this whole episode is how extreme and surprising claims were made without caveats or discussion. These were not questioned in most articles reporting the claims, and were not followed up with more detail or explanation. Almost a month on, no data has been shared. Overall, this seems to be another instance where potentially valuable data has been devalued by careless analysis and a lack of transparency, hindering the fight against Covid-19.

Murad Banaji is a mathematician with an interest in disease modelling.

We welcome your comments at letters@scroll.in.