Humans intuitively understand how the world works, which makes it easier for people than for machines to envision how a scene will play out.
Objects in a still image could move and interact in a multitude of ways, which makes this feat very hard for machines to accomplish.
When its output was compared with real footage, the new deep-learning system was able to fool humans 20 per cent of the time.
When the researchers asked workers on Amazon's Mechanical Turk crowd-sourcing platform to pick which videos were real, the workers chose the machine-generated videos over genuine ones 20 per cent of the time, 'Live Science' reported.
The approach could eventually help robots and self-driving cars navigate dynamic environments and interact with humans, or let Facebook automatically tag videos with labels describing what is happening, researchers said.
Disclaimer: No Business Standard journalist was involved in the creation of this content.