OpenAI reveals Sora, a tool that makes instant videos from written prompts

Sora has a 'deep understanding of language' and can generate 'compelling characters that express vibrant emotions', said OpenAI on Thursday

Last Updated: Feb 16 2024 | 12:14 PM IST

Not to be outdone by competitors such as Google, who recently showcased a text-to-video tool, AI startup OpenAI on Thursday unveiled its own text-to-video model, Sora.

Like Google's Lumiere, Sora has limited availability. Unlike Lumiere, Sora can create videos up to one minute long.

Text-to-video has become the latest arms race in generative AI, as OpenAI, Google, Microsoft, and others look beyond text and image generation to cement their position in a sector expected to generate $1.3 trillion in revenue by 2032 -- and to win over consumers who have been intrigued by generative AI since ChatGPT debuted a little more than a year ago.

In a blog post on Thursday, OpenAI, the maker of both ChatGPT and Dall-E, said that Sora will be available to "red teamers", or experts in areas such as misinformation, hateful content and bias, who will be "adversarially testing the model." The company will also engage with visual artists, designers and filmmakers to gain additional feedback from creative professionals. That adversarial testing will be especially crucial in addressing the potential for convincing deepfakes, a major area of concern for the use of AI to generate images and video.

In addition to garnering feedback from outside the organisation, the AI startup stated that it wants to share its progress now to "give the public a sense of what AI capabilities are on the horizon."

Strengths of Sora

One feature that may distinguish Sora is its ability to interpret long prompts, including one example that clocked in at 135 words. The example videos published by OpenAI on Thursday show Sora creating a wide range of characters and scenes, including people, animals, and fluffy monsters, as well as cityscapes, landscapes, zen gardens, and even a submerged New York City.

This is due in part to OpenAI's previous work with the Dall-E and GPT models. The text-to-image generator Dall-E 3 was released in September 2023. Sora, in particular, uses Dall-E 3's recaptioning technique, which, according to OpenAI, generates "highly descriptive captions for the visual training data."

"Sora is able to generate complex scenes with multiple characters, specific types of motion and accurate details of the subject and background," the post said. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world," it added.

The sample videos OpenAI shared on X (formerly Twitter) are very lifelike, with the exception of a close-up of a human face and sea critters swimming. Otherwise, you may have difficulty distinguishing between what is real and what is not.

Similar to Lumiere, the model can generate video from still photos, extend existing videos, and fill in missing frames.

"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI," the post added.

AGI, or artificial general intelligence, is a more advanced version of AI that is closer to human-like intelligence and has the ability to perform a greater range of tasks. Meta and DeepMind have also shown an interest in reaching this benchmark.

Sora's weaknesses

OpenAI admitted that Sora has weaknesses: it can struggle to accurately depict the physics of a complex scene and to understand cause and effect.

"For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark," the post said.

And anyone who still has to make an L with their hands to figure out which one is left can take heart: Sora also mixes up left and right.

Is OpenAI's Sora available to the public?

OpenAI did not specify when Sora will be widely available but noted it wants to take "several important safety steps" first. That includes adhering to OpenAI's existing safety standards, which prohibit extreme violence, sexual content, hateful imagery, celebrity likeness and the IP of others.

"Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it," the post said, adding, "That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time."
