Business Standard

The fault is not in our stats


Devangshu Datta New Delhi
STANDARD DEVIATIONS
Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
Gary Smith
Bloomsbury; 326 pages (paperback), including index; Rs 399

Chinese numerology claims "4" is unlucky. Is belief in this superstition self-fulfilling? An American study says so. Apparently, people of Chinese origin are more likely to suffer heart attacks on the fourth day of every month. Another study asserts that people with initials that make up nice acronyms (such as "ACE") do better than people with "nasty" initials (such as "GAS").

Academic research can head in many unusual directions. But every branch of science, and most of the social sciences, relies on the correct handling of data. The cross-disciplinary reliance on crunching data in multiple ways has led to many insights. But it can also lead to absurd conclusions, as in the studies described above, both of which involved convoluted methodology and cherry-picking.

Poor statistical logic or flawed methodology can cause more subtle errors. This book describes many cases where data were badly handled, or deliberately falsified, and some cases where those data were brilliantly analysed. As such, it can be read as a collection of interesting cautionary tales seasoned with a little moralising.

Gary Smith is the Fletcher Jones Professor of economics at Pomona College, California, and an expert on statistical methods. He has published papers, textbooks and software on capital markets, financial asset management, poker, American football, bowling and sports injuries. In his first book for general readers, Professor Smith uses charts and tables extensively in setting up case studies. But he has avoided arcane statistical methodology and focused on descriptive textual analysis.

The weakest part of the book is that each chapter ends with a homily entitled "Don't be fooled" summarising the material. Scepticism - a reluctance to be fooled - must be part and parcel of a correct statistical approach. But the lesson need not be driven home with sledgehammers.

That said, this is a fascinating set of cross-disciplinary case studies across centuries. For instance, the University of California, Berkeley, was accused of gender bias because, in percentage terms, it rejected more applications from female students than from males. Yet most departments in Berkeley accepted a higher proportion of female applicants than of male applicants. The paradox was resolved when it was figured out that rates of acceptance varied between departments: men applied in larger numbers to departments with the highest rates of acceptance, while more women applied to departments with high rejection rates.
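The reversal described above is known as Simpson's paradox. A minimal sketch in Python, using invented figures for two hypothetical departments (these are not the Berkeley data, and the function names are illustrative), shows how women can have the higher admission rate in every department and still the lower rate overall:

```python
# Hypothetical figures, invented for illustration only; NOT the Berkeley data.
# Layout: department -> {group: (applicants, admitted)}
applications = {
    "Dept A (high acceptance)": {"men": (800, 480), "women": (100, 70)},
    "Dept B (low acceptance)": {"men": (200, 40), "women": (900, 225)},
}


def department_rates(data):
    """Admission rate of each group within each department."""
    return {
        dept: {group: admitted / applied
               for group, (applied, admitted) in groups.items()}
        for dept, groups in data.items()
    }


def overall_rates(data):
    """Admission rate of each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applied, admitted) in groups.items():
            prev_applied, prev_admitted = totals.get(group, (0, 0))
            totals[group] = (prev_applied + applied, prev_admitted + admitted)
    return {group: admitted / applied
            for group, (applied, admitted) in totals.items()}


if __name__ == "__main__":
    for dept, rates in department_rates(applications).items():
        print(dept, {g: f"{r:.0%}" for g, r in rates.items()})
    print("Overall", {g: f"{r:.1%}" for g, r in overall_rates(applications).items()})
```

In this made-up example most women apply to the department that rejects most applicants of either sex, so the pooled figures make the institution look biased even though neither department is.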

An even more stunning tour de force was Abraham Wald's recommendations that led to lower aircraft losses in World War II. The Royal Air Force wished to selectively armour-plate bombers. Wald analysed damage patterns on aircraft that had survived enemy fire. He noted there was never damage to engines or cockpit areas and realised that aircraft hit in those spots did not survive.

Professor Smith also looks at John Snow's splendid epidemiological study, which pinpointed the cause of cholera by analysing water supply in 19th-century London. There were two companies supplying water to the same parts of London. One piped water in from an uncontaminated source upriver in the Thames, while the other supplied from highly contaminated local sources. Households using the latter supplier suffered 10 times as many cholera cases. Ergo, it had to be the water.

However, most of the case studies are about analysing flaws, errors and absurdities, as the title suggests. The author points out multiple cases where methodology was flawed, or the data cherry-picked to fit conclusions that can be published. Sometimes this may be deliberate, and sometimes it may be honest error. Either way, the damage can be huge.

A 1998 paper in The Lancet (Britain's foremost medical journal) claimed normal children developed autism after being administered the Measles-Mumps-Rubella (MMR) vaccine. It was based on poor data and outright falsification. The lead author was hoping to market an alternative vaccine. The Lancet retracted the study in 2010, saying "it was utterly false". But by then, millions of parents had refused to let their children take any vaccinations, citing garbled versions of the paper. The "vaccines cause autism" claims continue to proliferate.

Then, there was the Carmen Reinhart-Kenneth Rogoff paper that concluded gross domestic product (GDP) growth dipped when government debt crossed 90 per cent of GDP. That was cited as the equivalent of gospel since it came from two highly respected economists. Spreadsheet errors in the calculations rendered it moot.

In many cases, an academic may simply be too eager to find an unusual result and publish. Statistics can be deliberately, or unknowingly, misused and misinterpreted. Governments and politicians misuse stats to buttress campaigns. Financial advisors cherry-pick statistics to push products. Doctors cannot make sense of false positive and false negative results in tests.

Professor Smith suggests our evolutionary hard-wiring is designed to detect patterns - dark clouds are associated with rains, loud roar equals predator. But patterns may be created by pure chance. Toss a coin 10 times and there's a 47 per cent chance it will come up with a sequence of four successive heads (or tails). This book could help readers develop a radar to detect when a pattern may be spurious and when it may be significant.
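The coin-toss figure can be checked by brute force, since 10 tosses produce only 1,024 equally likely sequences. The sketch below (the function names are illustrative, and a fair coin is assumed) counts the sequences that contain a run of at least four identical outcomes. It finds 476 of 1,024, or roughly 46.5 per cent, close to the figure quoted above.

```python
from itertools import product


def has_run(sequence, length):
    """Return True if `sequence` contains `length` or more identical
    outcomes in a row (heads or tails)."""
    run = 0
    previous = None
    for side in sequence:
        # Extend the current streak, or start a new one of length 1.
        run = run + 1 if side == previous else 1
        if run >= length:
            return True
        previous = side
    return False


def probability_of_run(tosses, length):
    """Exact probability, for a fair coin, that `tosses` flips contain a
    run of at least `length` identical outcomes. Enumerates all 2**tosses
    sequences, so it is only practical for small `tosses`."""
    hits = sum(has_run(seq, length) for seq in product("HT", repeat=tosses))
    return hits / 2 ** tosses


if __name__ == "__main__":
    print(f"{probability_of_run(10, 4):.4f}")
```

Enumeration is used here instead of a formula because it is easy to verify by eye; for long sequences a recurrence or a simulation would be the more practical choice.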

Twitter: @devangshudatta
 


First Published: Feb 03 2015 | 9:25 PM IST
