Of 20 government datasets that IndiaSpend analysed, the collection of data or its public release has been delayed for 12. Crucial data, such as those from the census, the household expenditure survey and poverty estimates, which influence other datasets and policymaking, are over two years old.

Political manipulation in data releases and a weakened data infrastructure are the primary reasons for these delays, experts say. Data that are already in the public sphere are often hard to access, are released in formats that limit their usability, and suffer from other quality issues.

As the year comes to a close, we look at the problems with India’s data landscape.

Source: Author’s compilation based on government data. Credit: IndiaSpend

Data in India is always released with some lag, most of it with a year’s lag. For instance, crime statistics from the National Crime Records Bureau have, in recent years, been released with up to a year’s lag, while health and nutrition data such as the National Family Health Survey are collected with a periodicity of three years and released months later.

One of the reasons India puts out data with a year’s lag is the sheer number of entries and the time taken to collect nationwide data, according to this report by Citizens for Justice and Peace, a Mumbai-based human rights movement.

“Large data does take time [to collect and analyse] and India does not have adequate data management and analysis capacity for large data,” said Avani Kapur, a fellow at the Centre for Policy Research, where she leads the Accountability Initiative.

However, many data sets are seeing delays of more than a year. For instance, ‘Basic Road Statistics’, a report prepared by the Ministry of Road Transport and Highways, was last released in 2018-’19.

“Data (census data in particular) matters for voting, for Finance Commission transfers [of funds from the Union government to the states] and even for primary health centres, as these are decided on population-based norms,” says Professor S Chandrashekhar of the Indira Gandhi Institute of Development Research in Mumbai.

From determining the number of schools and primary health care centres to drawing voting constituencies, data are critical for public policy. Of the data sets listed above, the Census of India, Household Consumer Expenditure Survey and Poverty Estimates are the most crucial. These datasets have an impact on the creation of other datasets. For example, experts have raised questions about the future of the Socio-Economic Caste Census data given the delay in collecting census data.

The delay in data releases is affecting government schemes and programmes, and is resulting in unreliable estimates from other surveys on consumption, health and employment, which depend on census data to determine policy and welfare measures, experts say.

The Census is a decennial survey which determines, down to the village level, the population, literacy and migration, among other aspects of India’s population. The 2021 census was delayed due to the nationwide Covid-19-induced lockdowns, and even though the central and state governments have since lifted restrictions, the timeline for the next census is unclear.

On December 14, the Minister of State for Home Affairs, Nityanand Rai, told the Rajya Sabha that the government had spent, until then, Rs 24.8 crore in developing mobile and web applications, a portal, and other census-related activities.

The Household Consumer Expenditure Survey is carried out every five years by the National Sample Survey Office and provides insights into household consumption patterns and levels. These surveys assist the Union as well as state governments in planning and policy formulation. The latest such survey data are available from 2011-’12. The government had decided not to release the data for the 2017-’18 survey, pointing to “data quality issues”. Training for the data collection process for the 2022-’23 survey began in July 2022, according to this release from the Ministry of Statistics and Programme Implementation.

Poverty estimates: The erstwhile Planning Commission had been releasing data on the number of people below the poverty line every five to six years since 1973-’74. The last estimates for 2011-’12, based on the Tendulkar poverty line, were released in July 2013. “After this, no official poverty estimates in India have been released,” according to a 2020 working paper published by the Ministry of Rural Development.

Experts point to political manipulation by the government as one of the main reasons for the delay in data releases. “Data complicates policymaking because it makes decision-making that much more difficult, particularly for a party that comes to power with a rigid and an unrealistic agenda and this generates pressures to delay or even manipulate data,” explains Vikas Kumar, associate professor at the Azim Premji University who teaches a course on the political economy of government statistics.

Kumar also points out that political manipulation of surveys and their outcomes is not new; successive governments have done this, but the scale and frequency of interference have grown over the past decade. For example, the 2001 religion data were not released until after the 2004 Lok Sabha election that brought the United Progressive Alliance-I to power. The 2011 Census was conducted by the United Progressive Alliance-II, which delayed the release of the religion-based data that would have been ready as early as April 2013, Kumar said. It was released in 2015.

Delays in the release of data can be attributed to the unwillingness of the government statistical system to face public scrutiny and political interference, according to a 2020 paper co-authored by Vikas Kumar. He adds that the growing communalisation of politics has specifically affected the timeliness of the census data on identity.

Kumar also pointed out the progressive weakening of the infrastructure of government statistical agencies from the 1970s. “Deinstitutionalisation and politicisation of bureaucracy that began in the 1970s if not earlier picked up pace during the 1990s when, under structural adjustment, the government had to tighten its belt. This led to budget cuts for politically unimportant departments like statistics, which meant a hiring freeze, and delay in necessary reforms to cope with the changes in the political economy after liberalisation,” Kumar said.

Delays in government data in India are part of a larger issue. Experts also point to other concerns, such as the accessibility of the available data, formatting issues, data quality issues and the withdrawal of data after release.

Data Formats

Very often, the available data are unusable, since they are released in formats such as Portable Document Format and sometimes as scanned images. “There is a wealth of data available. The government needs to make it available in an easy, accessible manner, where one can download it and have it in Application Programming Interface format,” Chandrashekhar said. Most data, when first entered onto government websites, are available in Excel or comma-separated values file formats, yet they are not made available to the public in the same format, Chandrashekhar added.

Quality issues

The Health Management Information System of the Ministry of Health and Family Welfare is a public portal where data from over 2,00,000 public health facilities – most of them government-run and rural – across India are uploaded every month. The data currently available date back to 2008, and are available down to the sub-district level for every state, according to the analytical report of 2019-’20.

But the Health Management Information System data do not always match those from other datasets.

For instance, the National Family Health Survey-4 (2015-’16), an independently conducted survey, showed India’s immunisation coverage at 62%, with performance having worsened in many developed states. For a comparable period (2014-’15), the Health Management Information System showed the percentage of fully immunised children to be consistently above 100%, as IndiaSpend reported in May 2019.

Data quality issues have also been pointed out in other reports, such as the Medical Certification of Cause of Death reports published by the Civil Registration System. Only a fifth of deaths in 2019 were medically certified, and more often than not the underlying cause of death was recorded incorrectly, as IndiaSpend reported in June 2020.

Removal of data

There have been reports of the removal of data from the public domain as well. In August 2020, IndiaSpend used the Health Management Information System data to show that immunisations for children as well as access for adults to drugs and treatments for life-threatening diseases had declined sharply during March-May 2020. Following the publication of the article, the data became unavailable on the Health Management Information System website, as we reported in July 2021. The data were released again by August 2, 2021, but they remain ‘provisional’.

There have also been instances of data being collected but not released. For example, the government has made it clear that it has no proposal to release the Socio Economic and Caste Census.

The 2017-’18 data on household consumer expenditure were not released owing to “data quality issues”, as we said above.

This article first appeared on IndiaSpend, a data-driven and public-interest journalism non-profit.