Everybody Lies
Big Data, New Data and what the Internet Can Tell Us About Who We Really Are
Seth Stephens-Davidowitz
Dey Street Books
352 pages; Rs 884
In an age when data steers so much of our progress, it is vital to understand how it works. This book is for anyone curious about what makes people tick. Curiously, although the book is about Big Data, author Seth Stephens-Davidowitz deliberately refrains from defining the term. Instead, he lets its many facets reveal themselves over the course of the book.
Everybody Lies takes the reader through the powers of Big Data, its uses and shortcomings, and finally, those aspects of Big Data of which we should be wary. With its hilarious anecdotes, well-chosen case studies and clever writing, Everybody Lies makes Big Data seem like a lot of fun.
The conventional approach to data ignores a large chunk of information: behaviour patterns on the internet. Take door-to-door surveys. People are rarely completely candid in surveys, and this distorts the conclusions drawn from them. Think, for instance, of how many people are likely to admit to being racist to a door-to-door surveyor. Not many, surely. Conventional data also does not allow us to zoom in on specific subsets of the population. This is where Big Data swoops in. The author shows that running a search on Google Trends with the right keywords can reveal astounding insights into consumer behaviour, racism, and even criminal tendencies that surveys miss.
Mr Stephens-Davidowitz begins the book with a Thanksgiving anecdote about his family urging him to get married, and advising him on the kind of woman he should marry. Besides being completely relatable, this anecdote sets the tone for the rest of the book. Mr Stephens-Davidowitz uses the incident to explain that data science is intuitive, because it is all about spotting patterns in behaviour and predicting how one data point will affect another. In fact, our “gut feeling” is probably our most trusted subconscious dataset. In the chapters that follow, he adds nuance to this observation.
Just like the gut, Big Data is at its best when it is intuitive and simple; the more complicated the analysis, the more likely it is to fail. Mr Stephens-Davidowitz also places heavy emphasis on data available from Google, Facebook and other websites, turning innocuous information on the internet into a data goldmine. In fact, an entire chapter of the book is dedicated to Freudian slips, and to how data from the internet can be used to debunk the idea that ordinary slips of the tongue are Freudian slips revealing unconscious thoughts.
Mr Stephens-Davidowitz argues, and reiterates throughout the book, that Big Data has four big powers: it keeps offering new types of data; it is honest; it allows zooming in on small subsets; and it allows causation to be detected.
True to the data scientist in him, the author devotes several case studies to explaining the first holy tenet of data science: correlation is not causation. The gut can sometimes be wrong. He illustrates this with counter-intuitive case studies throughout the book, conceding that sometimes “the world works in precisely the opposite way as I would have guessed”.
In the second part, the author unravels Big Data’s prophetic powers. If you ask the right questions, a good dataset can tell you how successful you are likely to become. Big Data is also big on doppelgangers, the author shows: it relies on information about people similar to you to draw conclusions about you. Mr Stephens-Davidowitz submits that these discoveries can be harnessed to make powerful predictions about human behaviour.
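For readers curious about the mechanics, the doppelganger idea can be illustrated as a nearest-neighbour prediction: find the people most similar to you and take their average outcome as a guess for yours. The Python sketch below is purely illustrative and is not drawn from the book. It assumes each person can be reduced to a vector of numeric features, uses cosine similarity as the (assumed) measure of resemblance, and all function names and data are hypothetical.

```python
import math

# Hypothetical illustration: each person is (feature_vector, known_outcome).
# The features and outcomes are invented for the example, not taken from the book.

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector resembles nobody
    return dot / (norm_a * norm_b)

def find_doppelgangers(target, population, k=3):
    """Return the k (features, outcome) pairs most similar to target."""
    ranked = sorted(population,
                    key=lambda person: cosine_similarity(target, person[0]),
                    reverse=True)
    return ranked[:k]

def predict_outcome(target, population, k=3):
    """Predict target's outcome as the mean outcome of its k doppelgangers."""
    neighbours = find_doppelgangers(target, population, k)
    return sum(outcome for _, outcome in neighbours) / len(neighbours)

if __name__ == "__main__":
    population = [
        ([1.0, 0.0, 0.0], 10.0),
        ([0.9, 0.1, 0.0], 12.0),
        ([0.0, 1.0, 0.0], 50.0),
        ([0.0, 0.9, 0.1], 55.0),
        ([0.8, 0.2, 0.1], 11.0),
    ]
    # The three closest profiles have outcomes 10, 12 and 11.
    print(predict_outcome([1.0, 0.05, 0.0], population, k=3))  # prints 11.0
```

The prediction says nothing about the target as an individual beyond how their look-alikes fared, which is why such estimates work better for groups than for any one person.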
In the final part, Mr Stephens-Davidowitz skilfully addresses Big Data’s leading worry: does it threaten personal privacy? The author does not think so. He concludes that Big Data cannot predict an individual’s actions from their online history. While it may be possible to predict the actions of groups of people (for instance, which district is least likely to vote in the upcoming election), the same logic cannot be applied to individuals, not just because it would be unethical but also because it is impractical. This is probably why, even if a person googles how to murder someone, the police are unlikely to come after them immediately. Big Data thankfully leaves our embarrassing (and sometimes worrisome) searches alone.
Addressing a tangential concern, however, Mr Stephens-Davidowitz says nothing stops companies from using Google to learn more about a person. Banks can assess their borrowers’ creditworthiness, and potential employers can gauge a candidate’s employability, on the basis of search results. If it’s any consolation, Big Data empowers consumers too, potentially giving them leverage over corporations (the author observes, for instance, that customer reviews on Yelp have been shown to affect restaurants’ revenues significantly).
Big Data and data protection are arguably the topics of the future, but analyses of them are often too technical for lay readers to follow. Everybody Lies, by contrast, superbly demystifies Big Data. It breaks down the technical aspects of data science with ease and engages the reader with fascinating data experiments. Above all, the book reminds the reader that although everybody lies, Big Data is the powerful digital truth serum we need.
The reviewer is a Research Associate with The Takshashila Institution