Or, how National Sample Survey data can be misinterpreted and misused
A few years ago, the Arjun Sengupta Committee misused National Sample Survey Office (NSSO) data to make a claim that was neither correct nor fair to the entity that generated the data. The Sachar Committee did something similar. The Tendulkar Committee did something stranger still. The government never rejected any of these findings and, in some cases, accepted the reports with all their flaws.
In all three cases at least some of the data came from the National Sample Survey's large-scale consumption expenditure survey. This survey covers more than 100,000 households every five years. It is a well-sampled survey with a tried and tested questionnaire, and its field staff are better trained than those of any private sector survey organisation. Yet despite all this, these surveys now capture less than half of total household expenditure. This data, which forms the basis for much socio-economic policy, explained India's total household expenditure far better in the past, but that ability has been worsening steadily.
What are the problems with these surveys? There are many, across many stages. Good sampling requires a base list of every household in the area being surveyed, from which households are then chosen randomly; such information is collected only every 10 years by the Census and every five years or so by the Election Commission. In the intervening period people shift houses, migrate, and new areas come up. Many households also refuse to participate, for want of time or inclination. Typically, those with higher incomes or fewer members are likelier to refuse, which introduces a selection bias. Further, the questions need to be asked and answered honestly. Some bad surveyors have been known to manufacture data. Respondents, for their part, may overstate, understate, forget, or simply refuse to answer certain questions. Incomes are one item on which such reporting problems are especially severe, but similar reluctance shows up in answers on savings, investment, jewellery ownership, household expenditure, sexual behaviour, religion, caste and gender-related perceptions, preferences or practices, and so on.
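The refusal pattern described above can be seen in a tiny simulation. Everything here is hypothetical — an illustrative population and made-up refusal rates, not NSSO figures — but it shows the mechanism: when richer households decline more often, the surveyed average falls below the true one.

```python
import random

random.seed(0)

# Hypothetical population of 100,000 households: monthly expenditure in Rs,
# drawn from a log-normal distribution (parameters purely illustrative).
population = [random.lognormvariate(9.0, 0.6) for _ in range(100_000)]

# Assumption: richer households refuse more often. Here the top quartile
# refuses half the time, everyone else 10 per cent of the time.
cutoff = sorted(population)[int(0.75 * len(population))]

respondents = [
    e for e in population
    if random.random() > (0.5 if e >= cutoff else 0.1)
]

true_mean = sum(population) / len(population)
survey_mean = sum(respondents) / len(respondents)

# Selective refusal drags the surveyed mean below the true mean.
print(f"true mean: Rs {true_mean:,.0f}  surveyed mean: Rs {survey_mean:,.0f}")
```

No amount of random sampling at the design stage fixes this: the bias enters after the sample is drawn, which is why the estimation stage discussed next matters so much.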
What does the raw data so collected mean? Only the naïve would believe it is good enough to use straightaway. The estimation process converts such admittedly flawed raw data into an estimate, and it is this estimate, not the raw data, that should inform policy. Good researchers, both within and outside government, are known to throw away some data, construct multipliers and scale-up factors, calibrate the data, and use many other methods to convert the raw information into good-quality estimates.
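One of the simplest such corrections is ratio calibration: scale the survey weights so the weighted total matches an external control total, say from national accounts. The sketch below is a minimal illustration of that idea; the households, weights and benchmark are all invented for the example.

```python
# Ratio calibration sketch. All figures below are hypothetical.
# Each entry is (design weight, reported monthly expenditure in Rs).
sample = [(120.0, 4500.0), (95.0, 12000.0), (150.0, 2800.0), (80.0, 25000.0)]

raw_total = sum(w * e for w, e in sample)

# Pretend an external benchmark says the survey captures only half the truth.
control_total = 2.0 * raw_total
multiplier = control_total / raw_total

# Apply one uniform scale-up factor to every weight.
calibrated = [(w * multiplier, e) for w, e in sample]
calibrated_total = sum(w * e for w, e in calibrated)

print(f"scale-up factor: {multiplier:.2f}")
print(f"raw total: {raw_total:,.0f}  calibrated total: {calibrated_total:,.0f}")
```

Real calibration is more refined — different multipliers for different item groups or household types — but the principle is the same: the raw numbers are an input, not the answer.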
No private entity puts its raw data in the public domain on a regular basis; most release aggregates only after doing their own estimations and corrections. The NSSO is more honest and transparent: it shares the raw data, as collected, with anyone and everyone, and leaves it to researchers to do their own estimations as they see fit. Good researchers do further work on this raw data before using it for analysis or debate. Bad researchers, however, derive estimates straight from the raw data and, depending on their biases and inclinations, report the results.
The Sengupta Committee's famous finding that 77 per cent of the population spent less than Rs 20 a day arose from exactly this poor use of data. The problem, which all researchers should know, is that people do not report their incomes and expenditures fully. The Sachar Committee was more honest in what it did. In trying to figure out whether biases existed against Muslims, it compared Muslim and Hindu households' spending on a per capita basis. But that too was poor economic research: a better measure of workplace bias would have been to compare household expenditure or income per income-earner. There are also occupational differences and so forth, and good research should have corrected for all of that.
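Why the choice of denominator matters can be shown with made-up numbers. In the sketch below — the households and figures are invented, not from any survey — two groups have the same number of earners per household but different household sizes, so a sizeable per capita gap largely disappears when spending is expressed per income-earner.

```python
# Illustrative only: (monthly expenditure in Rs, household size, earners).
group_a = [(20000, 4, 2), (15000, 5, 2)]
group_b = [(18000, 6, 2), (16000, 7, 2)]

def per_capita(households):
    return sum(e for e, size, _ in households) / sum(s for _, s, _ in households)

def per_earner(households):
    return sum(e for e, _, k in households) / sum(k for *_, k in households)

gap_pc = per_capita(group_a) / per_capita(group_b)  # gap using household size
gap_pe = per_earner(group_a) / per_earner(group_b)  # gap using earner count

print(f"per capita ratio: {gap_pc:.2f}  per earner ratio: {gap_pe:.2f}")
```

A per capita gap can thus reflect household composition rather than workplace bias, which is precisely why the per-earner comparison is the better measure here.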
The Tendulkar Committee was worse. Its whole report rests on a blanket statement that rural poverty is being under-estimated while urban poverty is not. Where did this come from? What did the NSSO do or not do to deserve this criticism? Why are its methods better for urban areas than for rural ones? In fact, surveying rural households is generally easier and tends to yield better raw data, as any surveyor will tell you. Yet all three reports were accepted by the government.
The point is that the government itself is misusing data, and its estimation and interpretation, for its own ends. To do so, its researchers and consultants criticise the raw data when the problem lies with the research itself.
But that does not absolve the NSSO. Why does its raw data capture less than 50 per cent of all household consumption, and why is this ratio falling over time? Why does it ask questions about items on which it knows it will not get correct responses? Why does it take so long to release the data? The list of problems is not short.
The problem with the NSSO is, in fact, systemic; we find it across the whole statistical set-up in the government of India. At one level there is a degree of honesty, rare in any developing country, in not messing with the data. At another there is callousness, delay, a stubborn adherence to the processes of the past, and little or no innovation for many years.
The answer lies on many fronts, and the fact that the estimates are steadily worsening means that change is required. I deal with it in the next article.
The first in this series “Wanted: New ways to figure the facts” appeared on August 6. The third and final part will appear on August 20