Hate speech-detecting AIs easily fooled by humans: Study

Last Updated : Sep 16 2018 | 6:10 PM IST

Artificial intelligence (AI) systems meant to screen out online hate speech can be easily duped by humans, a study has found.

Hateful text and comments are an ever-increasing problem in online environments, yet addressing the rampant issue relies on being able to identify toxic content.

Researchers from Aalto University in Finland have discovered weaknesses in many machine learning detectors currently used to recognise and keep hate speech at bay.

Many popular social media and online platforms use hate speech detectors. However, bad grammar and awkward spelling -- intentional or not -- might make toxic social media comments harder for AI detectors to spot.

The team put seven state-of-the-art hate speech detectors to the test. All of them failed.

Modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When faced with textual data that differs from the data used in their training, they begin to fumble.

"We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google's comment-ranking system Perspective," said Tommi Grondahl, a doctoral student at Aalto University.

Google Perspective ranks the 'toxicity' of comments using text analysis methods. In 2017, researchers from the University of Washington showed that Google Perspective can be fooled by introducing simple typos.

The Aalto researchers have now found that Perspective has since become resilient to simple typos, yet it can still be fooled by other modifications such as removing spaces or adding innocuous words like 'love'.

A sentence like 'I hate you' slipped through the sieve and was rated as non-hateful when modified into 'Ihateyou love'.
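The modifications described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the one-word `TOXIC_WORDS` lexicon and the keyword-matching `toy_word_detector` are hypothetical stand-ins, not the researchers' code and not how Perspective or any of the tested detectors works.

```python
import random

# Hypothetical toy lexicon; real detectors are learned models, not keyword lists.
TOXIC_WORDS = {"hate"}


def remove_spaces(text: str) -> str:
    """Word-boundary attack: delete all whitespace between words."""
    return "".join(text.split())


def append_word(text: str, word: str = "love") -> str:
    """Word-appending attack: add an innocuous word to the end."""
    return f"{text} {word}"


def swap_typo(text: str, rng: random.Random) -> str:
    """Typo attack: swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def toy_word_detector(text: str) -> bool:
    """Flag text if any whitespace-separated token is in the toy lexicon."""
    return any(tok.lower() in TOXIC_WORDS for tok in text.split())


original = "I hate you"
attacked = append_word(remove_spaces(original))

print(attacked)                      # Ihateyou love
print(toy_word_detector(original))   # True
print(toy_word_detector(attacked))   # False
```

Once the spaces are gone, the word-level toy no longer sees a token it recognises, which mirrors in miniature the failure the article describes.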

The researchers note that in different contexts the same utterance can be regarded either as hateful or merely offensive.

Hate speech is subjective and context-specific, which renders text analysis techniques insufficient as stand-alone solutions.

The researchers recommend that more attention be paid to the quality of the data sets used to train machine learning models -- rather than to refining the model design.

The results indicate that character-based detection could be a viable way to improve current applications, they said.
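Why character-level features can survive a removed space can be seen in a minimal sketch. The names `char_ngrams`, `TOXIC_NGRAMS` and `toy_char_detector` are hypothetical, and matching against a single hand-picked n-gram is an assumption made for illustration; the detectors in the study are trained models.

```python
def char_ngrams(text: str, n: int = 4) -> set:
    """Return the set of lowercase character n-grams, ignoring whitespace."""
    squashed = "".join(text.lower().split())
    return {squashed[i:i + n] for i in range(len(squashed) - n + 1)}


# Hypothetical toy feature set: the 4-grams of a single word.
TOXIC_NGRAMS = char_ngrams("hate")


def toy_char_detector(text: str) -> bool:
    """Flag text if it shares any character n-gram with the toy feature set."""
    return bool(char_ngrams(text) & TOXIC_NGRAMS)


print(toy_char_detector("I hate you"))     # True
print(toy_char_detector("Ihateyou love"))  # True
print(toy_char_detector("I love you"))     # False
```

Because the n-grams are computed over characters rather than whitespace-separated tokens, 'Ihateyou love' still contains the same sequence as 'I hate you'. The same toy would also flag a harmless word such as 'whatever', which contains the letters 'hate', so a practical detector would need to learn weights over many n-grams from good training data rather than match a fixed list.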

Disclaimer: No Business Standard Journalist was involved in creation of this content
