
The crisis of data credibility in India and the resulting policy challenge

While flaws in collection methods could have a bearing on an individual's citizenship status, the larger issue is the use of such data in policy formulation and implementation

Abhishek Waghmare, Sanjeeb Mukherjee | New Delhi
Last Updated: Feb 26 2020 | 4:01 PM IST
Rahul Patil, a young and progressive farmer in his thirties, felt he was being confronted when a person holding a smartphone asked him for details about the properties he owns, the small businesses he runs, and so on. The enumerator was someone from his own village whom he already knew, one with whom he used to walk to school back in the day, in a well-to-do village 100 miles from the financial capital, Mumbai. And yet, he felt insecure sharing details (read: data) about himself and his family.

Though he found the questions familiar, he recalled that only teachers or known officials from the village council used to ask them. He felt confronted especially as he regarded the data enumerator as an unofficial or a private person, not a representative of the government.

The National Statistical Office (NSO) is, for the first time, carrying out the economic census, which collects data on enterprises from Indian households, using digital means. So, instead of school teachers coming with annotated sheets to enter data with a pencil, youngsters who have barely passed primary school, but are proficient in using smartphones, are recording household data for the 7th Economic Census.

“I know the enumerator personally. But what would a fifty-year-old farm labourer from adjoining village, who does not happen to know the enumerator, do? Will he share data correctly with the confidence as before…difficult,” says Patil, sounding worried on the telephone about the data he has already shared.

Taking a cue from the NSO’s Economic Census, the Registrar General of India is now embarking on the decadal Census, the biggest exhaustive data collection exercise in India, by giving scarcely qualified enumerators the discretion to choose between two options: either note down the household data on prescribed official forms, or record it on the smartphone app designed by the government. The Census data collection, which is set to begin in March 2020, will be a mix of paper-based and app-based data.

Picture 1: NPR mentioned on website, but notification unavailable

Screenshot of web page of Census Management and Monitoring System which has a hyperlink for NPR notification, but the actual notification appears to be removed (Accessed on 25 February, 2000 hrs)

This is but one piece of anecdotal evidence of the real situation on the ground, where household economic activity is recorded. If a small change in the way economic data is captured reduces trust, one can only imagine the plight of the data collection exercise, and the impact on the quality of the data so collected, when the controversial National Population Register (NPR) is rolled out. A section of the population, especially Muslims, fears that the NPR will put their citizenship in peril now that the new Citizenship Amendment Act has come into force.

But a crisis bigger than the one emanating from data collection is the one surrounding the use of data. Economic data of various kinds, whether collected from households, from banks or from other government sources such as Aadhaar, is used by the government for policy formulation and implementation.

The questions raised about the quality of data collection and use put policymaking in India in peril. Let us look at a few instances.

Data quality of national accounts 

Last year, the government scrapped its own data on consumer spending by Indian households for the year 2017-18, citing “data quality” issues. This is the only dataset available in India that comes reasonably close to capturing the income levels of Indian households. Further, it gives the NSO the very baseline for estimating consumption in any year when calculating the gross domestic product (GDP).

Then, the GDP data is used by the government to predict revenues for subsequent years, and by companies to project their output or sales. In the absence of reliable consumption data, the predictability of public finances and finances of private corporations comes under stress. 

In the current financial year, the government’s revenues are nearly stagnant compared to the previous year, and growth in the sales revenues of listed private companies is at a multi-year low, mirroring the economic slowdown. The initial expectation that nominal GDP would grow at 12 per cent, and government revenues at above 16 per cent, has been belied by more recent data.

These national accounts are the macro view of the ground situation, which is made up of the incomes of individual households. Those same households are also the beneficiaries of most government schemes.

Take the example of the government’s flagship health insurance scheme, Ayushman Bharat, launched in September 2018. Its beneficiary list was ready: it just had to take the bottom half of the population pyramid from the socioeconomic census (SECC), a version of Census 2011. In fact, the entire swathe of rural development schemes, from affordable housing to clean cooking fuel, is based on the data set of the socioeconomic census.

If a rural household owns a car, it is not eligible for, say, the health insurance scheme. With falling incomes and rising precariousness in the labour force, coupled with the loss of trust in the enumerator, many households tend not to “declare” that they own a car, or that they run a business. This affects the quality of data.

However, it must be noted that some such problems have been addressed by another data exercise of the government, namely Aadhaar. Citizens who express concern about data collection exercises also admit that Aadhaar is helping plug leakages, and that accurate identification of beneficiaries is reducing corruption in welfare schemes.

From giving monetary support to expectant mothers, to ensuring that money intended for anganwadi sevikas (women workers for child care) reaches them, to expanding the ambit of loans to women-led self-help groups, the confluence of two data sets, Aadhaar and SECC, has helped the government work more efficiently.

But is Aadhaar, and the correct identification of people for social security or the provision of private goods, the full extent to which data can help policymaking? On the face of it, that does not seem to be so.

Employment and policymaking 

Employment data is another such set that is supremely important for policymaking. Agriculture, labour-intensive manufacturing industries, and construction are the main employers in the country, and good growth in investments (capital formation) is essential to create new jobs. While the investment trend has clearly been pointing down in the past few years (investments are growing at a mere 1 per cent in the current year), two sets of data offer completely different conclusions on employment.

The NSO’s employment survey puts joblessness in India at 6.1 per cent in 2017-18, its highest level since Indira Gandhi’s prime ministership. On the other hand, the government has expressly voiced its support for payroll data from the Employees’ Provident Fund Organisation (EPFO), as it is real-time data and not a survey output.

But while the former has been consistent and comparable for decades, EPFO data is intermittent at best, and its interpretation is far from simple.

For instance, in the first few months of 2019-20, the number of people exiting the PF database was higher than the number of people joining it. This could mean that more jobs are being lost than created, or simply that employers are becoming increasingly incapable of paying employees’ provident fund contributions, or something else. As yet, there is no clarity on what the data means.

Further, new EPF joiners who were subscribers at some point in the past (re-subscribers) form a majority of the net new subscribers in recent months. This, too, clouds rather than clarifies the pace of job creation. In this scenario, it becomes hard to design policies, and those that are designed do not address the depth of the problem, affecting national development.

Backlog in the national data repository

The government, riding on the broadband revolution nearly a decade ago, came up with a portal with the name www.data.gov.in. It was supposed to be the one-stop shop for all public data in India, from national accounts to social sector schemes, and private ownership. 

But even a cursory look at the portal shows that basic data, such as the number of schools without an electricity connection, is available only up to 2015.

Picture 2: Official national data repository lacks recent data

Screenshot of the webpage of data.gov.in on electricity provision in Indian schools. Data for 2016-17 and later are not available. (Accessed on 26 February, 1200 hrs)


Quality issues in migration data: Problems in urban planning

A critical problem could arise while capturing migration data, which is the only data that gives an idea of the contours of growth in urban population, the size and spread of the rural-urban transformation, and thus the extent of planning needed to absorb it.

Noted economist Amitabh Kundu and former member of the National Statistical Commission, P C Mohanan, who quit the body in protest against the loss of data credibility, have noted serious anomalies in the 2001 and 2011 census data on migration. 

They found that the proportion of migrants who stayed at the place of enumeration for less than 10 years is lower than the number of “residual migrants” according to the 2011 Census.

“How could there be a very high growth in population of migrants who stayed at a place for more than 10 years, if the migration rate itself has gone up in the 2001-11 decade? The only logical explanation could be that many among decadal migrants recorded more than 10 years as their duration of stay at a destination, when it was not the case,” says Kundu.

Their study also finds an anomaly that supports their proposition: 68.7 per cent of women migrants reported that they had migrated between 10 and 20 years ago. This is much more than the women migrants who recorded less than 10 years as their duration in 2001 (65.4 million), which is “a logical impossibility,” they say.
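The cohort logic behind this check can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the function name and the later-census counts passed to it are hypothetical, and it assumes that anyone reporting a stay of 10 to 20 years in one census should already have been counted as a migrant of under 10 years in the census a decade earlier.

```python
def cohort_is_consistent(under_10_years_earlier: float,
                         ten_to_20_years_later: float) -> bool:
    """Check whether a later 10-to-20-year cohort fits inside the earlier cohort.

    Assumption (illustrative): people reporting a 10-to-20-year stay in the
    later census are a subset of those who reported under 10 years a decade
    earlier, so their count cannot exceed the earlier one.
    """
    return ten_to_20_years_later <= under_10_years_earlier


# 65.4 million is the 2001 figure quoted above; the later counts are hypothetical.
women_under_10_years_2001 = 65.4  # millions

print(cohort_is_consistent(women_under_10_years_2001, 60.0))  # a plausible cohort
print(cohort_is_consistent(women_under_10_years_2001, 70.0))  # flags an anomaly
```

A real check would also have to allow for deaths and return migration, both of which shrink the later cohort further and so make an excess even harder to explain.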

Mohanan, on the other hand, underlines a more recent threat that the data could face.

“I feel the biggest mistake that they did was to tag the population census with NPR, which has created the confusion, in a bid to save money and complete two surveys in one visit. The migration information in the population is likely to be highly biased now,” he says.

Now, especially with the sword of Damocles of the National Register of Citizens hanging over a section of the migrant population, there is a possibility that policy goals could be disturbed further.
