MIT scientists have developed a new artificially intelligent, wearable system that can predict if a conversation is happy, sad or neutral based on a person's speech patterns and vitals.
"Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious," said Tuka Alhanai, graduate student at Massachusetts Institute of Technology (MIT) in the US.
"Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket," said Alhanai.
As a participant tells a story, the system can analyse audio, text transcriptions and physiological signals to determine the overall tone of the story with 83 per cent accuracy.
Using deep-learning techniques, the system can also provide a "sentiment score" for specific five-second intervals within a conversation.
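To make the idea of five-second intervals concrete, here is a minimal sketch of how a recorded signal might be cut into consecutive windows before any scoring takes place. It is not the researchers' code: the function names, the sampling rate and the use of a plain per-window mean in place of a deep-learning sentiment score are all assumptions made for illustration.

```python
from typing import List


def split_into_windows(samples: List[float],
                       sample_rate_hz: int,
                       window_seconds: float = 5.0) -> List[List[float]]:
    """Split a 1-D signal into consecutive, non-overlapping windows.

    The final window may be shorter if the signal length is not an
    exact multiple of the window length.
    """
    window_len = int(sample_rate_hz * window_seconds)
    if window_len <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[i:i + window_len]
            for i in range(0, len(samples), window_len)]


def window_means(samples: List[float],
                 sample_rate_hz: int,
                 window_seconds: float = 5.0) -> List[float]:
    """Hypothetical stand-in for a per-window score: the mean of each window.

    A real system would feed each window to a trained model instead.
    """
    windows = split_into_windows(samples, sample_rate_hz, window_seconds)
    return [sum(w) / len(w) for w in windows]


if __name__ == "__main__":
    # Twelve samples at an assumed 2 Hz give one full five-second
    # window (10 samples) and one partial window (2 samples).
    signal = [float(x) for x in range(12)]
    print(window_means(signal, sample_rate_hz=2))
```

In this sketch the windows do not overlap; whether the actual system uses overlapping intervals is not stated in the report.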
"As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions," said Mohammad Ghassemi, PhD candidate at MIT.
"Our results show that it's possible to classify the emotional tone of conversations in real-time," Alhanai said.
The researchers say that the system's performance would be further improved by having multiple people in a conversation use it on their smartwatches, creating more data to be analysed by their algorithms.
The system was developed with privacy strongly in mind: The algorithm runs locally on a user's device as a way of protecting personal information, researchers said.
Many emotion-detection studies show participants "happy" and "sad" videos, or ask them to artificially act out specific emotive states.
However, in an effort to elicit more organic emotions, the team instead asked subjects to tell a happy or sad story of their own choosing.
Subjects wore a research device that captures high-resolution physiological waveforms to measure features such as movement, heart rate, blood pressure, blood flow and skin temperature.
The system also captured audio data and text transcripts to analyse the speaker's tone, pitch, energy and vocabulary.
After capturing 31 different conversations of several minutes each, the team trained two algorithms on the data: One classified the overall nature of a conversation as either happy or sad, while the second classified each five-second block of every conversation as positive, negative, or neutral.
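The two levels of output described above, one label per conversation and one label per five-second block, can be sketched as follows. This is only an illustration: the team trained two separate algorithms on their data, whereas the fixed thresholds, the score range of -1 to 1 and the function names below are assumptions, used here as simple rule-based stand-ins.

```python
from typing import List


def label_block(score: float, neutral_band: float = 0.2) -> str:
    """Map one five-second block's score to positive, negative or neutral.

    Assumes scores lie roughly in [-1, 1]; neutral_band is a
    hypothetical threshold, not a value from the study.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def label_conversation(block_scores: List[float]) -> str:
    """Label a whole conversation as happy or sad.

    Stand-in rule: the sign of the mean block score. The study used a
    separately trained classifier for this step.
    """
    if not block_scores:
        raise ValueError("need at least one block score")
    mean_score = sum(block_scores) / len(block_scores)
    return "happy" if mean_score >= 0 else "sad"


if __name__ == "__main__":
    scores = [0.5, -0.1, 0.3, -0.6]  # made-up example scores
    print([label_block(s) for s in scores])
    print(label_conversation(scores))
```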
"The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data," said Alhanai.
"It's quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers," Alhanai said.
Disclaimer: No Business Standard Journalist was involved in creation of this content