The market for chatbots based on large language models (LLMs), the core software behind new artificial intelligence (AI) systems, is growing as technology giants Google, Microsoft-backed OpenAI, and Meta expand their services.
A key element of their expansion strategy is localised AI chatbots that support languages of a particular country. Last week, Google extended its Gemini app in India with support for nine Indian languages: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu and Urdu.
OpenAI’s ChatGPT already provides support for 10 Indian languages, besides English.
The strategy helps OpenAI, Google and other companies gain traction for their AI-enabled conversational agents in a linguistically diverse country like India. They need to support local languages because English is understood by only a few in the country.
AI fight
Apart from global technology giants, Indian startups have launched LLMs in local languages. Krutrim, Sarvam and HanoomanGPT are among the popular ones, besides government-led initiatives like Bhashini, Jugalbandhi by AI4Bharat and others.
The challenge for Indian AI companies is daunting, as they have to compete with bigger players that have far greater resources.
“Hyper-scalers build and manage large-scale data centres, invest in AI-specific chips, and develop platforms closely integrated with these technologies. They operate services like search, social media, and e-commerce, leveraging vast amounts of human-generated data to train models. Indian LLM players definitely face significant challenges competing with Silicon Valley's advanced technology and talent,” said Paramdeep Singh, co-founder of Shorthills AI, a solutions platform for the technology.
India has the second largest user base of AI chatbots after the US, according to estimates.
Indian LLM firms say their global rivals have the advantage of resources but it would be challenging for them to understand the local context and design products accordingly.
“Global companies possess vast resources, extensive datasets, and cutting-edge technology, giving them a competitive edge over smaller Indian firms. They also enjoy market trust, regulatory ease, and the ability to attract top talent. However, local players have consistently found ways to make a significant impact,” said Vishnu Vardhan, founder of SML India, the parent company of AI platform HanoomanGPT.
Vardhan said that in India, where there are 22 official languages and 85 per cent of the population does not speak English, a large, generic LLM from a technology giant like Google can only localise to a limited extent. “Despite advancements, services like maps often sound better in English than in Hindi or Telugu,” he said.
Indian companies can build LLMs that truly understand and replicate the way languages are used in real life, rather than relying on translation models. “By focusing on these hyper-localised LLMs, Indian companies can create solutions that resonate more deeply with the local population, expanding the market for generative AI and offering tailored, culturally relevant services,” he said.
Despite the challenge from giants like Google and Microsoft, Indian AI firms have sector-specific opportunities.
“My view is that there is space for multiple LLMs in a country like India. While there will be some global LLMs like ChatGPT and Gemini which will be all things to all people, there is plenty of scope to create LLMs and SLMs (small language models) focused on education, or health care, or even land records,” said Jaspreet Bindra, founder of Tech Whisperer, a technology consulting firm.
Knowing nuance
Indian AI companies should focus on dialects, cultural nuances and contextual understanding. “The key advantage that local players have here is the nuanced understanding of local dialects and meanings of different things to train, build and test these at scale,” said Rohit Pandharkar, partner, consulting, Generative AI, EY India.
Citing one example of linguistic diversity, Pandharkar said: “Marathi dialects in Konkan vs Central Maharashtra vs Vidarbha are very different and so will be the language constructs. Baking in these nuances in LLMs requires cultural context, access to local corpus of data and experts who can oversee at a high level what the models are getting trained on.”
Indian firms can do “hyperlocalisation” better because little of the public corpus and cultural understanding of language nuances is available digitally to train LLMs on. Training LLMs needs to go beyond purely crawling public data or just learning from user-interaction data.
But as local datasets become digitised and synthetic data generation capabilities improve, the unique advantages local LLM players enjoy may diminish, said Pandharkar.
“Hence, in the longer run, Indian LLMs and SLMs should go beyond linguistic capabilities and collaborate with large enterprises to develop specialised LLMs tailored to specific industries. By integrating proprietary enterprise data, domain knowledge, and local language variations, they can create a compelling value proposition.”
Prashanth Kaddi, partner, consulting, at Deloitte India, said all AI companies are prone to challenges when catering to local languages. These challenges include hallucination, in which an LLM generates false information because of inaccurate or incomplete data.