Business Standard

AI ecosystem for India

One of the challenges faced by AI researchers in India is the limited availability of reliable and accurate public data across the diversity of Indian languages to be able to train such language models

Illustration: AJAY MOHANTY
Shashi Shekhar Vempati
5 min read Last Updated : May 10 2023 | 9:44 PM IST
While researching my recent book, Collective Spirit, Concrete Action, focused on Prime Minister Narendra Modi’s Mann Ki Baat, I stumbled on a substantial body of academic papers published in peer-reviewed journals by artificial intelligence (AI) researchers across India who relied on the Mann Ki Baat corpus of text to train, test and improve machine learning models. From IIT Kanpur to IIIT Hyderabad, institutions across the country looking to develop artificial intelligence capabilities for natural language processing of Indian languages have found in the Mann Ki Baat corpus a rich and diverse dataset of substantial value in the development of ChatGPT-like language models for India.

One of the challenges faced by AI researchers in India is the limited availability of reliable and accurate public data across the diversity of Indian languages to be able to train such language models. To facilitate such efforts, Prasar Bharati had some time back made its corpus of audio-visual archives across Indian languages available to researchers at IIT Kanpur to develop artificial intelligence capabilities that can mine and learn from the archives. This, however, is a small subset of the vast knowledge pool that exists across India but remains poorly digitised and mostly inaccessible to researchers and AI developers.
Recognising this critical gap, a call to action was issued earlier in March, along with Dr Vivek Raghavan of EkStep Foundation, on the need for a public-private effort towards the development of an Indic large language model (LLM). This call to action emerged from a nascent effort to create a forum for conversations around artificial intelligence from an Indian perspective (AI4India.org), bringing together a diversity of stakeholders on a common platform. A similar notable effort in Bengaluru was the peopleplus.ai initiative by Nandan Nilekani, which sought to draw AI developers together to identify applications of public importance to India.
 
An important learning from the conversations within the AI4India.org forum has been how AI developers across the country are challenged by the steep costs associated with the high-performance computing infrastructure required for the development of ChatGPT-like LLMs. While academic and research institutes across India may procure between five and ten graphics processing units (GPUs) for their on-premise training, these are siloed in each institute and are not available as a cluster to the wider pool of smaller AI developers across the country. If the pool of computing power invested in by each institute is creatively combined and leveraged over the cloud, the cumulative computing power of GPUs could open up a lot of possibilities to train large deep learning models in India for the benefit of a wider pool of AI developers across the country.
 
A public initiative incentivising private cloud providers in India to make accessible GPU computing power in the form of credits to AI developers and researchers could go a long way in spurring the AI ecosystem in the country. To address the availability and accessibility of datasets, which are key to the development of large models by both academia and the AI start-up industry in India, there is a need to unlock the data within government and other public organisations, and make it accessible for research and building applications after suitably anonymising the datasets. Such a public pool of open datasets will be the springboard for AI innovation emerging from India, which at the moment is playing catch-up with the developments in AI globally, from the US to China. Global LLMs, which have predominantly been trained on datasets outside India, can also use this data to ensure the Indian context is included to a greater extent in future versions.
 
It is important that partnerships and investments find their way towards the development of a much-needed vibrant AI technology ecosystem in India that can strike a balance between commercial applications and social applications as a public good. This need becomes acute given the exponential pace at which AI is advancing, which threatens to further widen the digital divide between the traditional and the more tech-savvy new-age sectors of the economy. Democratising access to AI capabilities becomes essential so that application developers can bring the benefits of AI to a wide range of sectors of social importance such as healthcare, education and agriculture, apart from finance.
 
While AI4India.org and PeoplePlus.Ai are the beginning of such a long overdue public discourse on AI in India beyond the confines of the technology industry, an effective public-private enabler for nurturing the AI ecosystem is yet to emerge. A possible approach to realising such an enabler could be the creation of a cloud-based sandbox where datasets can be loaned, models can be openly put to test and a wide pool of applications of public and commercial interest to India can emerge, leveraging innovative frugal approaches to high-performance computing.
 
The rich pool of Indian start-ups focused on AI has devised uniquely innovative approaches to overcome the hurdles and limitations on account of their limited scale. This has, however, not deterred them from developing pathbreaking solutions in a wide range of areas, from applying computer vision to common industrial problems in manufacturing and retail to leveraging open-source models to accelerate drug research and development. This AI start-up ecosystem needs active nurturing if India is to reap exponential benefits from the exponential advancement being witnessed in AI technology development outside the country. The AI sandbox as a digital public infrastructure for AI innovation at a scale of a billion is an essential enabler to ensure the same.
 
The author is former CEO of Prasar Bharati

Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper
