Researchers at Cornell University in New York call their project "RoboWatch."
Most how-to videos share a common underlying structure, and there is plenty of source material available, the researchers said.
YouTube offers 180,000 videos on "How to make an omelet" and 281,000 on "How to tie a bowtie."
By scanning multiple videos on the same task, a computer can find what they all have in common and reduce that to simple step-by-step instructions in natural language.
The work is aimed at a future when we may have "personal robots" to perform everyday housework - cooking, washing dishes, doing the laundry, feeding the cat - as well as to assist the elderly and people with disabilities, researchers said.
A key feature of the system is that it is "unsupervised," said Sener, who collaborated with colleagues at Stanford University, where he is currently a visiting researcher.
In the new method, a robot with a job to do can look up the instructions and figure them out for itself.
Faced with an unfamiliar task, the robot's computer brain begins by sending a query to YouTube to find a collection of how-to videos on the topic.
The algorithm includes routines to omit "outliers" - videos that fit the keywords but are not instructional.
Using these markers, it matches similar segments in the various videos and orders them into a single sequence. From the subtitles of that sequence, it can produce written instructions.
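As a rough illustration only, and not the researchers' actual algorithm, the idea of dropping outlier videos and merging the rest into one ordered sequence might be sketched in Python as follows. The sketch assumes each video has already been reduced to an ordered list of short step labels; the function names, the thresholds and the sample omelet data are all hypothetical.

```python
from collections import Counter, defaultdict


def drop_outliers(videos, min_overlap=0.5):
    """Discard videos that share few step labels with the majority.

    Assumption: a label is "majority" if it occurs in at least half the videos.
    """
    support = Counter(label for video in videos for label in set(video))
    majority = {label for label, n in support.items() if n >= len(videos) / 2}
    kept = []
    for video in videos:
        labels = set(video)
        overlap = len(labels & majority) / max(len(labels), 1)
        if overlap >= min_overlap:
            kept.append(video)
    return kept


def common_sequence(videos, min_support=0.5):
    """Return labels seen in at least `min_support` of the videos,
    ordered by their average relative position within a video."""
    support = Counter(label for video in videos for label in set(video))
    threshold = min_support * len(videos)
    positions = defaultdict(list)
    for video in videos:
        for index, label in enumerate(video):
            if support[label] >= threshold:
                # Normalise to [0, 1] so long and short videos are comparable.
                positions[label].append(index / max(len(video) - 1, 1))
    return sorted(
        positions,
        key=lambda label: sum(positions[label]) / len(positions[label]),
    )


# Hypothetical input: three omelet videos and one off-topic search hit.
videos = [
    ["crack eggs", "whisk", "heat pan", "pour", "fold"],
    ["heat pan", "crack eggs", "whisk", "pour", "fold", "serve"],
    ["crack eggs", "whisk", "add cheese", "heat pan", "pour", "fold"],
    ["unbox pan", "review price"],
]

kept = drop_outliers(videos)
steps = common_sequence(kept)
print(steps)
```

In this toy version the off-topic video is discarded because it shares no labels with the others, and steps that appear in only one video, such as "add cheese," are left out of the final sequence.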
In other research, robots have learned to perform tasks by listening to verbal instructions from a human. In the future, information from other sources such as Wikipedia might be added.
The knowledge learned from the YouTube videos is made available via RoboBrain, an online knowledge base that robots anywhere can consult to help them do their jobs.