"We demonstrated that the system was no worse than people on all the things we measured, and it was better in some categories," said Christopher Re, who guided the software development for the project while at the University of Wisconsin-Madison.
The development marks a milestone in the quest to rapidly and precisely summarise, collate and index the vast output of scientists around the globe, said first author Shanan Peters, a professor of geoscience at UW-Madison.
The knowledge produced by paleontologists is fragmented into hundreds of thousands of publications.
Yet many research questions require what Peters calls a "synthetic approach": for example, how many species were on the planet at any given time?
Teaming up with Re, now at Stanford University, and UW-Madison computer sciences professor Miron Livny, the group built on the DeepDive machine reading system and the HTCondor distributed job management system to create PaleoDeepDive.
"Getting started required a million hours of computer time," said Peters.
"We extracted the same data from the same documents and put it into the exact same structure as the human researchers, allowing us to rigorously evaluate the quality of our system, and the humans," Peters said.
Computers often have trouble deciphering even simple-sounding statements, Re said.
"Information that was manually entered into the Paleobiology Database by humans cannot be assessed or enhanced without going back to the library and re-examining original documents. Our machine system, on the other hand, can extend and improve results essentially on the fly as new information is added," said Re.