A new way to review scientific literature is being tested.
- By The Economist
How do you measure progress? Kyle Van Houtan, an ecologist at the Monterey Bay Aquarium, in California, found himself asking this when he faced the task of working out whether methods of boosting the populations of endangered species in the wild have improved over the years.
The usual way to judge how well a field's methods are working is to write a review of the scientific literature. In a flourishing field, though, this may involve reading and extracting information from hundreds, possibly thousands, of papers. That requires a large team, and brings problems of coordination. Van Houtan therefore wondered whether getting computers to do the heavy lifting might help.
It does. His study on the matter, published in ‘Patterns’, tapped into a branch of machine learning called natural-language processing. He and his colleagues identified five existing natural-language-processing systems and used them to search the abstracts of 4,313 papers on species-conservation projects published over four decades. The software looked for words associated with success, such as ‘protect’, ‘support’, ‘help’, ‘benefit’ and ‘growth’, and for words associated with failure, like ‘threaten’, ‘loss’, ‘kill’, ‘problem’ and ‘risk’. Each word carried a score reflecting how positive or negative the systems’ original developers had judged it to be.
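The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the lexicon and its weights are hypothetical, chosen only to mirror the example words in the text, and the real systems also deal with inflected forms, negation and intensifiers, all of which this sketch ignores (it matches whole words exactly, so ‘threatened’ would not count as ‘threaten’).

```python
import re

# Hypothetical lexicon mapping words to sentiment scores. The weights are
# illustrative only, not those of any system used in the study.
LEXICON = {
    "protect": 1.0, "support": 1.0, "help": 1.0, "benefit": 2.0, "growth": 1.0,
    "threaten": -2.0, "loss": -1.0, "kill": -3.0, "problem": -1.0, "risk": -1.0,
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into purely alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(abstract: str, lexicon: dict[str, float] = LEXICON) -> float:
    """Sum the lexicon scores of every word in an abstract.

    Words missing from the lexicon contribute nothing; matching is exact,
    with no stemming and no handling of negation.
    """
    return sum(lexicon.get(word, 0.0) for word in tokenize(abstract))
```

With these toy weights, "Habitat loss and poaching threaten the species." scores -3 (‘loss’ plus ‘threaten’), while "Corridors protect habitat and support growth." scores 3. Summing per abstract is the simplest choice; a real analysis might instead average per word so that long abstracts do not dominate.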
In total, the team analysed 1,030,558 words. They found that in papers published in the 1980s, when conservation science was in its infancy, terms from the negative list were much more common than those from the positive one. During the past decade, by contrast, terms associated with success became more frequent. Average sentiment scores increased during the study period by 140%.
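The decade-by-decade comparison and the percentage rise can likewise be sketched, again as a hypothetical illustration rather than the study's method. The sketch below groups per-paper scores by decade of publication, averages them, and expresses the change between two averages as a percentage. Because an early average may well be negative, it divides by the magnitude of the starting value so that an improvement always reads as a rise; whether the study computed its 140% this way is an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_decade(papers: list[tuple[int, float]]) -> dict[int, float]:
    """Group (publication year, sentiment score) pairs by decade and average each group."""
    by_decade: dict[int, list[float]] = defaultdict(list)
    for year, score in papers:
        by_decade[year // 10 * 10].append(score)  # e.g. 1987 -> 1980
    return {decade: mean(scores) for decade, scores in sorted(by_decade.items())}


def percent_change(old: float, new: float) -> float:
    """Change from old to new as a percentage of the magnitude of old."""
    if old == 0:
        raise ValueError("percentage change is undefined when the starting value is 0")
    return (new - old) / abs(old) * 100
```

For example, with the made-up inputs [(1983, 1.0), (1989, 1.5), (2012, 3.0), (2016, 3.0)], the 1980s average is 1.25 and the 2010s average is 3.0, which is 2.4 times as large: a rise of 140%.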
That is encouraging news for conservationists. It suggests that their methods are working in general, and are improving with experience. But more detailed analysis was also possible. Giant pandas, which numbered 1,864 when censused in 2014 and had their status upgraded from ‘endangered’ to merely ‘vulnerable’ in September, have seen the sentiment of the literature about them swing from negative to positive in a matching way. Papers on the California condor, by contrast, remain littered with negative sentiments, even though the bird’s numbers have risen from an extinction-threatening 22 to 446, according to a census in 2016. But only 276 of those birds were living in the wild, so the condor is still listed as ‘critically endangered’.
Whether Van Houtan’s method can be generalised to other fields of science is debatable. Conservation is, at bottom, an emotion-driven activity. People care about the results in a way that goes beyond professional amour propre. That researchers’ sentiments show up in their choice of words is little surprise, and might well not hold elsewhere. But the fact that Van Houtan has been able to use natural-language processing to expand the pool of papers that can be included in a review from hundreds to thousands suggests that reviewers in other fields might profit from looking at what he has done.