'Primitive' assumptions don't alter our payroll results

Even if we assume formalisation pushed up formal employment, can we deny that we were under-reporting such payroll data earlier?

Soumya Kanti Ghosh, Pulak Ghosh

Last Updated : Jan 23 2018 | 10:38 PM IST


We would like to thank the authors of the two articles that appeared in the last couple of days as commentary on our recently released payroll report. The criticism reassures us about the impact and potential of our work on the employment debate and, more broadly, on India’s policy making, as this is probably the first use of data-driven insight in policy decisions, in contrast to the speculative and impressionistic policy analysis rampant in the public domain.

While some of our assumptions have been termed “primitive” (including in an article in this paper on January 23), in our opinion they are a careful and conservative basis for calculation and have no effect on the end result. Let us explain in detail.

First, based on discussions with experts, we assumed that a haircut of 50 per cent would be appropriate to remove double counting between the Employees’ Provident Fund Organisation (EPFO) and the Employees’ State Insurance Corporation (ESIC). Anyone who wants to apply a steeper haircut is free to do so. The good thing is that we produced two estimates: one combining EPFO with ESIC (after the 50 per cent haircut) and one without any addition from ESIC. Even the estimate without any addition from ESIC comes to 6 million, a pretty big number. So the results remain valid. Interestingly, we have strictly excluded ESIC payrolls in the 22-plus age group (3.74 million, of which 1.1 million are in the 22-26 age group). Contrast this with the 50 per cent cut, which resulted in an omission of 2.36 million. Hence, we have been even more conservative in adjusting our ESIC database, deleting an additional 1.4 million (3.74 million net of 2.36 million). This is clearly the most conservative estimate.
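For readers who want to check the last step, the subtraction can be reproduced in a few lines. This is only a sketch of the arithmetic quoted above, not the method used in the report; the variable and function names are our own illustrative choices, and the two figures are the ones stated in the paragraph.

```python
# Figures in millions, taken from the paragraph above.
# Variable and function names are illustrative, not from the report.
excluded_22_plus = 3.74    # ESIC payrolls in the 22-plus age group, excluded outright
omitted_by_haircut = 2.36  # ESIC payrolls omitted under the 50 per cent haircut


def additional_exclusion(excluded: float, haircut_omission: float) -> float:
    """Return how much more was excluded than the haircut alone would remove."""
    # Round to two decimals to avoid floating-point noise in the difference.
    return round(excluded - haircut_omission, 2)


extra = additional_exclusion(excluded_22_plus, omitted_by_haircut)
print(f"Additional exclusion: {extra:.2f} million")
```

The difference is 1.38 million, which the text rounds to 1.4 million.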

Second, the 25 per cent dropout from the 8.8 million graduates is based on our discussions. Again, this does not impact the result in any way as it is just laying down the contours of the qualified and non-qualified labour force coming into the job market.

Third, even though we consider people in the 18-25 age group for EPFO, the key point is that the data cluster around the age of 22. A person typically graduates at 21, completes an MA at 23 and may change jobs at 24 or later. Hence, 22 is the most likely age for a first job, and it is hard to believe that at 22 people are getting into their second or third job and creating multiple PF accounts. One must also bear in mind that the EPFO data set refers mostly to people who are not drawing high salaries. We have also not yet included anyone above the age of 25 in our analysis; this group is significant for EPFO and amounts to more than 5 million. Hence, again, this assumption does not alter our results.

Next question: Why did we choose FY17 and FY18 for our analysis? The latest two years were picked because the most recent information is what matters for any policy. Earlier data was too difficult to obtain because automation took place two years ago; before that, the records are a mess of disparate databases with much duplication.

There is also the argument that the authors had privileged access to this data and that it is not in the public domain. The question is, why should it matter? Going by that logic, doctoral theses and research papers routinely use data that may not be in the public domain. Other data, like that used by the Centre for Monitoring Indian Economy, is not in the public domain; data used in the Economic Survey is not public data; Reserve Bank of India data is either not public or becomes public only after a lag. Does that, in any way, alter the published results?

Finally, how did the goods and services tax and demonetisation help in formalisation? We believe this argument is a non-starter. Even if we assume formalisation pushed up formal employment, can we deny that we were under-reporting such payroll data earlier, and that formal employment with social security benefits is a godsend for such existing employees?

Our main motivation for this work was to use state-of-the-art big data analytics and economics to understand the job scenario in India as it is, in real time. We deliberately avoided the extrapolation that sampling methods involve. We would be happy if India starts drawing insights from real-time data rather than impression economics.

Soumya Kanti Ghosh is group chief economic advisor, State Bank of India; Pulak Ghosh is professor, IIM Bangalore. Views are personal