A recent series of investigative reports alleges that a Chinese analytics company, Zhenhua Data Information Technology, has collected data on 10,000 Indians, among them several prominent personalities. How does one collect and analyse data of this nature? Much of it is public, or semi-public.
Any 21st-century individual spews data continuously. Note also that while the Supreme Court of India affirmed in 2017 that privacy is a fundamental right, India has no personal data privacy law. Moreover, the proposed legislation gives the government a free hand to collect whatever data it chooses, so the Bill, even if it is passed, will offer no protection against government surveillance.
Most people have social media profiles, with accounts on Facebook, Instagram, WhatsApp, Twitter, TikTok, YouTube and so on. We also have public email IDs, very often hosted on Gmail and associated with Google accounts. In addition, people keep blogs, or maintain personal or professional websites. Many people also have profiles on LinkedIn and similar professional sites. Some of us are signed up for gig sites like Upwork and We Work Remotely. Academics have citations and references to their papers, along with their university affiliations, visible online.
Much of this data can be collected easily and legally. The “collector” gets a good picture of the subjects’ work profiles, economic strata, educational backgrounds, tastes in entertainment, friends’ circles, political concerns, social attitudes, and so on.
Locational data is also useful. This is not considered private data under the proposed Indian personal data privacy law. Many businesses are built upon location — food-delivery services and taxi-hailing services, for instance.
Your handset has a unique International Mobile Equipment Identity (IMEI), or two IMEIs if it is dual-SIM. So that specific handset can be tracked along with the SIM, which is also unique. (IMEIs and SIMs can be cloned and spoofed, but this is neither common nor legal.)
If your cellphone is switched on, it connects to the nearest tower. Your telecom service provider knows your “coarse” location, 24x7. If you keep GPS switched on, your exact location is known. Aarogya Setu uses this data. Location is also available if Bluetooth is switched on. In current Android systems, location data is also logged if you search for open WiFi networks, so leaving WiFi on also gives away location. In addition, hardware beacons in malls, and radio-frequency identification (RFID) smart cards, such as Metro or toll cards, offer location data. Ad servers also know the locations where they served ads.
This data can be picked up in several ways. There are large, complex markets for anonymised data. Anonymised data can, with the help of a little additional information, often be de-anonymised and tagged to specific persons.
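To see how little "additional information" de-anonymisation can need, consider a minimal sketch in Python. Everything in it is hypothetical: the pseudonyms, the cell IDs, the named people and the function names are invented for illustration, and real data brokers or analysts may work quite differently. The idea is simply that a person's most frequent night-time and day-time locations (roughly, home and work) can act as a fingerprint that is matched against outside knowledge.

```python
from collections import Counter

# Hypothetical "anonymised" pings: (pseudonym, hour_of_day, cell_id).
ANONYMISED_PINGS = [
    ("user_a91", 2, "cell_17"), ("user_a91", 3, "cell_17"),
    ("user_a91", 11, "cell_42"), ("user_a91", 14, "cell_42"),
    ("user_f03", 1, "cell_08"), ("user_f03", 4, "cell_08"),
    ("user_f03", 12, "cell_42"), ("user_f03", 15, "cell_42"),
]

# Hypothetical side information: (home cell, work cell) of named people,
# of the kind that could be pieced together from public profiles.
KNOWN_PEOPLE = {
    "Person X": ("cell_17", "cell_42"),
    "Person Y": ("cell_08", "cell_42"),
}


def home_and_work(pings):
    """Map each pseudonym to (most common night cell, most common day cell)."""
    night, day = {}, {}
    for pseudonym, hour, cell in pings:
        # Assumption: 22:00 to 06:00 counts as "night", the rest as "day".
        bucket = night if hour < 6 or hour >= 22 else day
        bucket.setdefault(pseudonym, Counter())[cell] += 1
    return {
        p: (night[p].most_common(1)[0][0], day[p].most_common(1)[0][0])
        for p in night
        if p in day
    }


def reidentify(pings, known_people):
    """Link a pseudonym to a name when exactly one known person matches."""
    matches = {}
    for pseudonym, signature in home_and_work(pings).items():
        candidates = [n for n, pair in known_people.items() if pair == signature]
        if len(candidates) == 1:
            matches[pseudonym] = candidates[0]
    return matches


if __name__ == "__main__":
    print(reidentify(ANONYMISED_PINGS, KNOWN_PEOPLE))
```

Both pseudonyms here share a workplace, yet the home cell alone is enough to tell them apart, which is why stripping names from a dataset does not by itself make it anonymous.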
Many handsets also have “cloud backup” services offered by the manufacturer, and that cloud server may be logging location. Many apps also ask for and log location; Uber and Zomato are obvious examples. It is likely that several apps you’re using log location data, so check your permissions. If location data is overlaid on a digital map and tied to other information, we can make very accurate guesses as to what the user is doing 24x7.
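Overlaying location on a map can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real tracker: the place names, coordinates, the 200-metre threshold and the function names are all hypothetical. Each logged position is matched to the nearest known place using the standard haversine (great-circle) distance formula.

```python
import math

# Hypothetical points of interest on a digital map: name -> (lat, lon).
PLACES = {
    "hospital": (12.9716, 77.5946),
    "office": (12.9352, 77.6245),
    "gym": (12.9784, 77.6408),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def label_ping(point, places, max_km=0.2):
    """Name the nearest place, or None if nothing lies within max_km."""
    name, dist = min(
        ((n, haversine_km(point, p)) for n, p in places.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_km else None


def visit_log(pings, places):
    """Turn (timestamp, (lat, lon)) pings into (timestamp, place) entries."""
    return [(ts, label_ping(pt, places)) for ts, pt in pings]


if __name__ == "__main__":
    pings = [("09:00", (12.9352, 77.6245)), ("18:30", (12.9717, 77.5947))]
    print(visit_log(pings, PLACES))
```

A log of this kind, repeated over weeks, is what lets a collector infer routines: where someone works, when they exercise, how often they visit a hospital.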
In addition, there’s transaction data. Your credit card and debit card services are (we hope) secure, along with net banking services. But if you use e-commerce, the e-commerce site or app also logs card details and stores that specific transaction. If the “collector” has access to that app, it has access to some of your card and transaction data, too.
Stunning insights emerge when artificial intelligence (AI) searches for patterns in those data mountains. A lot of digital marketing involves using big data of this sort for micro-targeting. Netflix, YouTube and Amazon use algorithms to suggest the next video to watch, or the next movie, or the next book. Google targets ads according to the patterns it picks up from email, searches and so on.
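A toy version of pattern-based suggestion can be written as a simple co-occurrence count: recommend whatever is most often found alongside the items a person has already chosen. The Python sketch below is only that, a toy. The viewing histories and function names are hypothetical, and this is not a claim about how Netflix, YouTube or Amazon actually build their recommendations.

```python
from collections import Counter
from itertools import permutations

# Hypothetical viewing histories, one set of titles per user.
HISTORIES = [
    {"thriller_1", "thriller_2", "documentary_1"},
    {"thriller_1", "thriller_2"},
    {"thriller_1", "comedy_1"},
    {"documentary_1", "comedy_1"},
]


def co_occurrence(histories):
    """Count how often each ordered pair of titles shares a history."""
    counts = {}
    for history in histories:
        for a, b in permutations(sorted(history), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest_next(watched, histories):
    """Suggest the unwatched title most often seen with the watched ones."""
    counts = co_occurrence(histories)
    scores = Counter()
    for item in watched:
        for other, n in counts.get(item, {}).items():
            if other not in watched:
                scores[other] += n
    # No overlap with any known history means nothing to suggest.
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    print(suggest_next({"thriller_1"}, HISTORIES))
```

The same counting logic works on purchases, searches or locations instead of videos, which is why the raw data is valuable to so many different kinds of collector.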
Back in 2012, US retailer Target set off a huge row when it congratulated a customer on the impending birth of her child, on the basis of her search and purchase patterns. The customer was a minor whose parents didn’t know about her pregnancy; she had bought items that the store’s data crunching indicated are usually bought by expectant women.
A more recent example is a data analytics exercise that found common elements in the colour palettes of pictures posted by depressed, suicidal people.
The extent and specificity of the data that can be mined is mind-boggling. Let’s say you use your credit card in Thiruvananthapuram to order chilli beef fry. Since the bill is itemised, dogged data collectors can also access what, and not just where, you ate. Now imagine how vulnerable you’d feel travelling to a cow-vigilante area after that, if some smart data analytics program might be picking up on your diet. This is the kind of fear a malicious data collector could unleash.