Artificial intelligence has changed perceptions of what machines can do. From answering queries to creating art, it has pushed the limits of imagination.
Adding to all this is a new conversational AI, Hume, which comes with emotional intelligence.
Hume.ai, a New York-based research lab and technology company, calls its conversational AI the Empathic Voice Interface (EVI).
The conversational chatbot can differentiate between 28 types of vocal expressions, including disappointment, disgust, excitement, fear, confusion and even anger.
Users can interact with it by asking questions, requesting recommendations or expressing frustration, the way they would with a friend or a loved one.
In return, Hume’s audio responses bear an uncanny resemblance to human speech: emotion is audible in its voice tonalities, its thoughtful vocalised pauses, and even its admissions of guilt over shortcomings. The result is conversational AI enhanced by emotion, or what the company calls an ‘empathic AI’.
“Emotional intelligence is the missing ingredient needed to build AI systems that proactively find ways to improve your quality of life,” reads the company’s vision statement. Hume did not respond to queries sent by Business Standard.
The company’s Empathic Voice Interface is an API powered by its empathic large language model (eLLM). It can understand and emulate tone of voice, emphasis on words and other vocal cues, with the aim of improving conversations between an AI and a human.
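Conceptually, such a system layers vocal-emotion recognition on top of the usual speech-to-text and response-generation steps. The sketch below illustrates that shape in Python; it does not use Hume’s actual API, and the open-source model checkpoints named are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: this is NOT Hume's API. It shows the general
# shape of an "empathic" voice pipeline: transcribe the speech, classify
# the speaker's vocal emotion, and condition the reply on both.
# Model checkpoints below are assumptions, not the article's claims.
from transformers import pipeline

# Speech-to-text: what was said
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Vocal emotion recognition: how it was said
emotion = pipeline("audio-classification", model="superb/hubert-large-superb-er")

def empathic_turn(audio_path: str) -> str:
    text = asr(audio_path)["text"]
    top = emotion(audio_path)[0]  # e.g. {"label": "ang", "score": 0.91}
    # A real eLLM would also shape the prosody of its spoken reply;
    # here we only condition the wording on the detected emotion.
    if top["label"] in ("ang", "sad") and top["score"] > 0.6:
        prefix = "I'm sorry this has been frustrating. "
    else:
        prefix = ""
    return prefix + f"You said: {text!r}. How can I help further?"

# print(empathic_turn("caller.wav"))  # path is a placeholder for a real clip
```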
Measuring human emotions
There is a clear use case for models like Hume’s. One is customer care, where it can not only bring much-needed ‘human emotion’ to calls but also act as a therapist.
We at Business Standard tried Hume.ai and found that it can accurately identify the emotions of the person speaking to it.
“Conversational internet is a big new wave of innovation that will happen, and conversational AI is a big driver for it. What we see now is that the foundational large language models (LLMs) are coming out, and multiple companies are going after it,” said Gaurav Kachhawa, chief product officer (CPO) of Gupshup, a San Francisco-based conversational messaging company.
He explains that technology players can fine-tune for emotion in conversational AI on top of an LLM’s basic capabilities.
“There is a base capability in an LLM, and the more you train it on data, different samples, and see how instructions are followed, you can work on top of it. Players like us can add value by fine-tuning emotions on the basic foundational models,” Kachhawa added.
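As a rough sketch of what fine-tuning for emotion on top of a base model can look like in practice (this is not Gupshup’s actual pipeline; the checkpoint and dataset named are illustrative assumptions):

```python
# Sketch: fine-tuning a pretrained base model to recognise emotion in text,
# the kind of layer players can add on top of a foundational model.
# The checkpoint and dataset are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("dair-ai/emotion")  # six emotion labels
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tok(batch["text"], truncation=True,
               padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="emotion-ft", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```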
Building context around a situation
Meanwhile, to get progressively better at expressing emotion, voice-enabled AI services need to rely on the context they have been trained on.
“If one builds a long-term memory of what the user has been seeing and doing, one will be able to build better context around a situation. What is changing with technologies like generative AI is that we are moving away from hard-coded lists of scenarios and rules, to providing history and helping the model derive an emotive response. The response is also more empathetic to the user’s situation,” said Abhijit Khasnis, chief technology officer (CTO) of Healthify, an AI health and fitness app.
This way, he said, we are no longer hard-coding context or inserting it algorithmically into the system.
“We are showing a series of events to a machine learning model, letting it understand what this means, and then generate an engaging response that is emotionally aware,” Khasnis told Business Standard.
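A minimal sketch of the shift Khasnis describes, with the event schema and prompt wording as illustrative assumptions: instead of a hard-coded rule table, recent events are serialised into the model’s context so the model itself derives the emotional framing of the reply.

```python
# Sketch of context-driven (rather than rule-driven) emotive responses:
# recent user events are serialised into the prompt so the model, not a
# hard-coded rule list, decides how empathetic the reply should be.
# The event schema and prompt wording are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Event:
    day: str
    action: str   # e.g. "logged_meal", "skipped_workout"
    detail: str

history = [
    Event("Mon", "logged_meal", "hit protein target"),
    Event("Tue", "skipped_workout", "reported feeling unwell"),
    Event("Wed", "skipped_workout", "no reason given"),
]

def build_prompt(events: list[Event], user_message: str) -> str:
    lines = [f"- {e.day}: {e.action} ({e.detail})" for e in events]
    return (
        "You are a supportive fitness coach. Recent user history:\n"
        + "\n".join(lines)
        + f"\n\nUser says: {user_message!r}\n"
        "Reply empathetically, acknowledging the pattern you see."
    )

# The assembled prompt would then be sent to any chat-capable LLM.
print(build_prompt(history, "I just don't feel motivated this week."))
```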
Conversational AI and Indic languages
India has 22 official languages, with dialects changing from place to place within regions.
Current models are built on widely spoken languages such as English and Spanish. The challenge with Indian languages is that the base of available data is small.
“The challenge with our languages and dialects is that the data set is itself pretty small. We need to figure out how we can transfer the learnings from languages like English, Spanish which are widely spoken to understand these languages,” Khasnis said.
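One common way to transfer such learnings, offered here as a hedged sketch rather than as Healthify’s method, is zero-shot cross-lingual transfer: fine-tune a multilingual encoder on English emotion labels and probe it directly on an Indic language.

```python
# Sketch of zero-shot cross-lingual transfer: fine-tune a multilingual
# encoder (XLM-R) on English emotion labels, then apply it to Hindi text
# for which no labelled data exists. Checkpoints here are assumptions.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=6)

# ... fine-tune on an English emotion dataset, as in the earlier sketch ...

clf = pipeline("text-classification", model=model, tokenizer=tok)
# Because XLM-R shares one multilingual embedding space, the English-trained
# head can be probed on Hindi input directly (quality varies by language).
print(clf("मैं आज बहुत खुश हूँ"))   # "I am very happy today"
```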
Kachhawa from Gupshup agrees with Khasnis’s view. “It is necessary to bring India-specific local models. While dialects are a challenge, we also have to figure out how to collect data, how to refine it, train the models,” he said.
Despite these challenges, he said, the cost of adopting conversational AI will decline as support for local Indian languages improves.
“The Indic languages will keep getting better and better for AI, and costs will go down since open source models are coming up. This brings down the cost of technology and eventually adoption picks up,” he added.
Use cases of ‘empathic AI’
As the technology picks up pace, brand-specific conversational avatars can be deployed to engage with customers and answer queries across sectors such as therapy, healthcare, call centres and education.
“Bringing tonality and emotions to such conversations is the problem that we have been solving. When you have a bot that is representing a brand, it needs to be the company’s voice. It becomes necessary to understand what the right levels of tonality and emotionality are,” Kachhawa said.
Healthify’s AI-enabled nutritionist Ria, for example, interprets conversations in the context of diet plans and health and fitness journeys. “What you are seeing as a result is much greater engagement with Ria. Depth of user conversations has increased by four times,” Khasnis said.
He added that the next evolution is something like Hume, which connects on an emotional level. Pairing such technology with impact-based actions, he explained, will help conversational AI services that exhibit emotion serve the last mile.