Computer systems that predict actions would open up new possibilities ranging from robots that can better navigate human environments, to emergency response systems that predict falls, to virtual reality headsets that feed you suggestions for what to do in different situations.
Scientists at the Massachusetts Institute of Technology (MIT) in the US have made a breakthrough in predictive vision, developing an algorithm that can anticipate human interactions more accurately than ever before.
In a second scenario, it could also anticipate what object is likely to appear in a video five seconds later.
While human greetings may seem like arbitrary actions to predict, the task served as a more easily controllable test case for the researchers to study.
Researchers created an algorithm that can predict "visual representations," which are basically freeze-frames showing different versions of what the scene might look like.
The algorithm employs techniques from deep learning, a field of artificial intelligence that uses systems called "neural networks" to teach computers to pore over massive amounts of data and find patterns on their own.
Each of the algorithm's networks predicts a visual representation, which is automatically classified as one of four actions - in this case, a hug, handshake, high-five, or kiss.
For example, three networks might predict a kiss, while another might use the fact that another person has entered the frame as a rationale for predicting a hug instead.
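The voting scheme described above can be sketched roughly as follows. This is a minimal illustration, not MIT's actual code: the per-network predictions are stubbed in as fixed labels, and the combination rule is assumed to be a simple majority vote.

```python
from collections import Counter

# The four actions the article says each network's output is classified into.
ACTIONS = ["hug", "handshake", "high-five", "kiss"]

def predict_action(network_votes):
    """Combine per-network predictions by majority vote.

    network_votes: one predicted action label per network.
    Returns the most common label (ties broken by first occurrence).
    """
    assert all(vote in ACTIONS for vote in network_votes)
    return Counter(network_votes).most_common(1)[0][0]

# Mirroring the example above: three networks predict a kiss, while a
# fourth predicts a hug because another person has entered the frame.
votes = ["kiss", "kiss", "kiss", "hug"]
print(predict_action(votes))  # kiss
```

In this toy version the dissenting network is simply outvoted; the real system presumably merges its networks' predicted representations in a more sophisticated way.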
After training the algorithm on 600 hours of unlabelled video, the team tested it on new videos showing both actions and objects.
When shown a video of people who are one second away from performing one of the four actions, the algorithm correctly predicted the action more than 43 per cent of the time, compared with existing algorithms that managed only 36 per cent.