For the past half-decade, tremendous progress has been made in the field of artificial intelligence, or AI, especially as deep learning and machine learning algorithms have become more sophisticated, and more computing power and data have become available. However, this progress was largely tracked only by those genuinely interested in AI or digital technology in general.
Then, in November last year, OpenAI released ChatGPT, which followed its image generator DALL-E, and showed people what such tools could do with simple commands in natural language. Overnight, ChatGPT and AI became conversation starters in drawing rooms and corner offices. People rushed to try out ChatGPT and many were struck by its wondrous abilities. Many who had only a vague idea of AI until recently suddenly started discussing jargon such as generative AI, large language models (LLMs), generative adversarial networks (GANs) and heuristics.
The debate on AI’s abilities and dangers also gathered steam. One camp insisted that AI would only help achieve great things and solve many of mankind’s hitherto unsolved problems. The other camp painted doomsday scenarios — talking about how AI could be a bigger danger to humanity than even climate change.
Many of the AI debates in India, however, are not focusing on the most critical issue — the question of data, its veracity, the intellectual property concerns, and above all, data privacy.
Shorn of all technical jargon, most of today’s AI programs and algorithms that are making waves typically learn by crunching very large data sets. Think of any person trying to learn a subject. The more content they consume on the subject and revise, the better they get. An AI algorithm learns in broadly the same way. And that is why the data or content it can learn from decides its success or failure. (There are other types of AI too but deep learning/machine learning and LLMs are the ones that are generating buzz currently.)
Generative AI programs such as OpenAI’s ChatGPT and DALL-E or Google’s Bard need humongous quantities of data before they can do the tricks they do to impress you. These models are pre-trained on vast quantities of textual (or sometimes image) data to be able to generate text and images in a human-like fashion. The better and larger the data, the better the program will learn, assuming the researchers have got their algorithm right. That is why good, high-quality data is critical for the success of the large language models and generative AI programs that are making so much news every single day.
Even before OpenAI and ChatGPT became household words, and before LLMs and generative AI came to public consciousness, policymakers worldwide had started worrying about the data privacy of citizens, given how Big Tech was vacuuming up data by giving away services for free, or at a throwaway price. For example, Google’s search remains free because it makes more than enough money from advertising that is sold based on the data it collects from users. Ditto for Facebook, Instagram, Twitter, and the other social media platforms. Even those that do not give away their services for free, Amazon or Apple, for instance, collect every kind of data every time you use them.
This was why data privacy laws were enacted and refined in many countries. Following the recent progress on the generative AI front, the issue has become even more critical. And that is why jurisdictions ranging from China to the EU are now talking of regulating AI, and a big part of that regulation includes ensuring safety and proper precautions in handling the data of their citizens. The EU’s General Data Protection Regulation (GDPR), for instance, is one of the most comprehensive data regulations and puts the citizen’s privacy and protection at its heart. It will also be supplemented with a detailed AI regulation, which is close to being finalised.
India is probably the second biggest generator of digital data currently, after China. Our population, along with the government’s thrust on broadband access and digital services, has made that possible. Yet we have been one of the laggards in setting rules that can protect data privacy. Multiple drafts of a data protection and privacy law have suffered from poor drafting. One more draft is ready now and is scheduled to be tabled in Parliament soon.
In terms of regulating AI, the policymakers in the country seem to be following a “wait and watch” approach, unlike many other countries that have been proactive. Indian policymakers do not seem unduly worried about the dangers that Indian citizens face because their data is largely unprotected.
That India needs to formulate and pass its data law as soon as possible is not even up for debate. But passing the law is only the first step. Unlike in Western countries, the institutional capacity to enforce laws in India has been low. In the digital arena, too, passing a law will not help if the capacity to enforce it properly is not built alongside. Both have become priorities that cannot be delayed any more.
The writer is a former editor of Business Today and Businessworld, and founder of Prosaic View, an editorial consultancy.