Fear-mongering about Artificial Intelligence has become commonplace as AI adoption has spread. In mid-February, there was a storm when an AI developer postponed the public release of one of its products, citing concerns about misuse.
OpenAI is a San Francisco-based non-profit organisation with a hundred-person team and funding from billionaires including Vinod Khosla, Elon Musk and Peter Thiel. Its mission is to “ensure that Artificial General Intelligence (AGI) benefits all of humanity”, as and when AGI arrives.
OpenAI defines AGI as “highly autonomous systems that outperform humans at most economically valuable work”. It has released papers on AI systems that achieve superhuman game scores, and has trained robotic hands to hold and manipulate objects with dexterity.
The programme in question was a text generator called GPT-2, trained through unsupervised machine learning (ML). This sounds anti-climactic. AI is extensively used in weapons that actually kill people, and autonomous vehicles have already logged quite a few fatal accidents. The hype around GPT-2 was, perhaps, over the top. OpenAI has merely postponed the full release by six months, while publishing a technical paper and explaining why it is withholding certain critical details.
What is GPT-2 and why is it so scary? OpenAI put together a 40GB database of text drawn from web pages linked in millions of “3 karma” Reddit social media posts (Reddit “karma” is the equivalent of Facebook “likes”). It then fed this database to its AI model and told the model to learn to predict the next (most probable) word.
We use text generators every day when we type email and SMS. But GPT-2 taught itself to perform tasks like question answering, reading comprehension, summarisation, and translation from the raw text, without any task-specific training. This is “unsupervised learning” in ML terminology: GPT-2 picks up the patterns of ordinary speech and writing on its own. It has more than 10 times as many parameters as GPT (the original model developed by OpenAI) and was trained on more than 10 times as much data.
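The “predict the next word” objective described above can be sketched with a toy, count-based model. This is purely illustrative: GPT-2 itself is a large neural network, not a frequency table, and the tiny corpus and function names here are invented for the example. But the training goal is the same idea — given the words so far, output the most probable next word.

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for each word, which words follow it.
next_words = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_words[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return next_words[word].most_common(1)[0][0]

print(predict_next("sat"))  # prints "on" -- "on" follows "sat" every time
```

A real language model replaces the raw counts with a neural network that conditions on the whole preceding context, which is what lets it sustain coherent paragraphs rather than just plausible word pairs.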
This is a serious improvement on the usual text generator, which may be able to come up with the next word in a sentence but produces gibberish if asked to do this in sustained fashion. OpenAI claims GPT-2 can “generate coherent paragraphs and perform rudimentary comprehension, machine translation, question answering, and summarisation — all without task-specific training”.
The full model is being withheld due to “concerns about the malicious applications of this technology”. In a blog post, OpenAI Policy Director Jack Clark explained the decision: “Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.” The smaller model is less capable of producing coherent text that could fool readers. OpenAI describes this decision, and the discussion it has sparked, as “experimental”.
GPT-2 displays broad capabilities. It scores well on standard language and comprehension tests. It can also generate conditional synthetic text samples of “human” quality when given an input and asked to expand on it.
One of the samples should explain why OpenAI is worried about malicious use. GPT-2 was told to expand on the human-written statement “Recycling is good for the world, no, you could not be more wrong”. It responded: “Recycling is NOT good for the world. It is bad for the environment, it is bad for our health, and it is bad for our economy. Recycling is not good for the environment. It is destructive to the earth and it is a major contributor to global warming. Recycling is not good for our health. It contributes to obesity and diseases like heart disease and cancer. Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. We pay a tremendous price for the privilege of having the world’s most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources.”
OK, it’s nonsense. But it’s not gibberish. It’s the sort of nonsense a human with some education might write. GPT-2 can indeed generate scary samples of fake news, and it could easily do this at scale.
OpenAI has been criticised for postponing the release, and the scaremongering about this programme may be overstated. Other developers will surely be able to produce text generators of similar quality. But the debate could be useful, and it may help us develop filters for flagging fake news.
Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper