OpenAI’s Sora – Introducing the Next Generation of AI Advancement (2024)


OpenAI is teaching AI to understand and replicate the dynamics of the physical world in motion, with the goal of developing models that help people solve problems requiring real-world interaction.

Enter Sora, a model capable of transforming text into video. Sora can produce videos up to one minute in length while maintaining high visual quality and fidelity to the user’s prompt.

According to OpenAI, Sora can generate “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background”. The model understands how objects “exist in the physical world” and can “accurately interpret prompts and generate compelling characters that express vibrant emotions,” according to OpenAI’s blog post.

Nevertheless, OpenAI cautions that the model is still far from flawless and may struggle with more intricate prompts. Before any public launch, OpenAI will run an outreach program with security experts and policymakers to mitigate the risk of Sora generating misinformation and hateful content, among other concerns.

What makes Sora special?

While the generation of images and textual responses to prompts on GenAI platforms has improved markedly in recent years, the conversion of text to video has largely lagged behind. This is due to the added complexity of modeling moving objects in three-dimensional space.

According to OpenAI “the model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style”.

Is Sora available for everybody to use?

Sora is not yet available to everybody. The company has announced plans to implement “safety steps” before integrating Sora into OpenAI’s products. It will collaborate with red teamers (domain experts in areas such as misinformation, hateful content, and bias) who will conduct “adversarial” tests on the model.

The company is also granting access to a number of designers, filmmakers, and visual artists to gather feedback on how to make the model most useful for creative professionals. “We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora. We plan to include C2PA metadata in the future if we deploy the model in an OpenAI product,” OpenAI said.

The company states that it will apply the existing safety methods built for its products that use DALL·E 3, which are also relevant to Sora.

“Once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user,” according to OpenAI.


The company will also interact with policymakers, educators, and artists worldwide to “understand their concerns and to identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”

Are there any obvious shortcomings of the model?

OpenAI acknowledges that the current version of Sora has weaknesses. It may struggle to accurately simulate the physics of a complex scene or to understand specific instances of cause and effect. For instance, a person might be shown taking a bite out of a cookie, but afterward the cookie may not show the bite mark.

“The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory,” according to OpenAI.

