Until now, viruses have been difficult to classify, said Gustavo Caetano-Anolles, professor at the University of Illinois, who led the new analysis.
Part of the confusion stems from the abundance and diversity of viruses. Fewer than 4,900 viruses have been identified and sequenced so far, even though scientists estimate there are more than a million viral species.
The new study focused on the vast repertoire of protein structures, called "folds," that are the structural building blocks of proteins, giving them their complex, three-dimensional shapes.
The researchers analysed all of the known folds in 5,080 organisms representing every branch of the tree of life, including 3,460 viruses.
Using advanced bioinformatics methods, they identified 442 protein folds that are shared between cells and viruses, and 66 that are unique to viruses.
"This tells you that you can build a tree of life, because you've found a multitude of features in viruses that have all the properties that cells have," Caetano-Anolles said.
The analysis also revealed genetic sequences in viruses that are unlike anything seen in cells, Caetano-Anolles said.
This contradicts one hypothesis that viruses captured all of their genetic material from cells. This and other findings also support the idea that viruses are "creators of novelty," he said.
Drawing on the protein-fold data available in online databases, the researchers used computational methods to build trees of life that included viruses.
The data suggest "that viruses originated from multiple ancient cells, and co-existed with the ancestors of modern cells," the researchers said.
Some scientists have argued that viruses are nonliving entities, bits of DNA and RNA shed by cellular life.
But much evidence supports the idea that viruses are not that different from other living entities, Caetano-Anolles said.
The study was published in the journal Science Advances.