Cleaning up Google Search: How search engine has weeded out fake news

Written by Nandagopal Rajan

December 12, 2017 05:37 IST

Fake news and attempts to game search rankings are problems as old as the internet itself. However, they have never been under the spotlight as they were in 2017. Internet giants Google and Facebook both drew flak as vested interests gamed their algorithms to amplify false information from sources that lacked credibility. Facebook started hiring more people to clean up its feeds, and Google set about revamping its algorithm.

Interestingly, going by the share of queries affected, it was not such a big problem for Google: just 2% of the search volume. Ben Gomes, Google’s V-P for Engineering and Search, says these were important queries nonetheless and “we were very bothered by it.” Google fought back with changes to its rater guidelines and to the algorithm itself, in an effort it called Project Owl internally. “Whenever we make a change in the search algorithm, we show the 10,000-odd raters A and B and ask them which is better. We ask them to judge based on rater guidelines, which is essentially a description of what search does. Using these raters, we change the algorithm, hopefully making them better and better,” Gomes explains.
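To make the side-by-side comparison Gomes describes concrete, here is a minimal sketch of how rater votes on two result sets might be tallied. The function names and the vote data are hypothetical, and Google's actual evaluation pipeline is not described in this article.

```python
from collections import Counter

def preferred_variant(votes):
    """Return 'A', 'B', or 'tie' depending on which variant got more votes."""
    counts = Counter(votes)
    if counts["A"] > counts["B"]:
        return "A"
    if counts["B"] > counts["A"]:
        return "B"
    return "tie"

def win_rate(votes, variant):
    """Fraction of all votes that favored the given variant (0.0 if no votes)."""
    return votes.count(variant) / len(votes) if votes else 0.0

# Hypothetical votes from seven raters comparing result sets A and B.
votes = ["A", "B", "B", "A", "B", "B", "B"]
print(preferred_variant(votes))
print(round(win_rate(votes, "B"), 2))
```

In this toy version a change would ship only if the new variant wins; a real system would presumably also weigh how confident the raters were and which kinds of queries were affected.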

The problem also arose because Google had, to an extent, assumed that news would come from good sources. But the last US election campaign changed all that. Google usually asks its raters to look at the relevance of the result and how authoritative the source is. “Now we have asked them to weigh the authoritativeness of the source more than how exactly the words match,” Gomes says.
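The shift Gomes describes, giving authoritativeness more weight than exact word match, can be illustrated with a simple weighted score. This is only a sketch: the scores, the weights, and the example.com-style page names are invented for illustration, and the article does not say how Google actually combines the two.

```python
def score(relevance, authority, authority_weight):
    """Blend word-match relevance and source authority into one score."""
    return (1 - authority_weight) * relevance + authority_weight * authority

def rank(pages, authority_weight):
    """Return page names ordered from best to worst score."""
    ordered = sorted(
        pages,
        key=lambda p: score(p["relevance"], p["authority"], authority_weight),
        reverse=True,
    )
    return [p["name"] for p in ordered]

# Hypothetical pages: one matches the query words closely but is not
# authoritative; the other matches less exactly but is a trusted source.
pages = [
    {"name": "exact-match.example", "relevance": 0.95, "authority": 0.20},
    {"name": "trusted-source.example", "relevance": 0.70, "authority": 0.90},
]

print(rank(pages, authority_weight=0.2))  # word match dominates
print(rank(pages, authority_weight=0.6))  # authority dominates
```

With the low authority weight the closely matching page ranks first; raising the weight puts the trusted source on top, which is the behaviour the new rater guidance is meant to encourage.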

Google also made it easier to provide feedback on the results and started cracking the whip on sites publishing unverified content. In all, the search giant launched close to 2,000 changes last year, resulting in a huge improvement in the kind of results people were seeing.

With so much content being generated to game the system, this is not a problem that will go away easily. In fact, about 15% of the queries Google receives every day have never been seen before. There are also millions of new documents and pages to scan. “There are millions of things that will happen … people will try to deceive the algorithm.” Gomes says that is nothing new for Google, as people have been trying to game the system right from the early days of page ranking. “It is not new in that sense, we are just taking new approaches to tackling the particular problem.”

Meanwhile, the nature of search and the results offered are also changing drastically because of smartphones. Long pages of links don’t seem to be what mobile users want. They want answers, and they want them in a hurry. This is why Google has been offering answers for everything from the symptoms of dengue to the latest cricket score right in the search results as featured snippets. It is also now able to offer answers in your feed even before you ask, because it learns your interests over time.

The effort has been to offer a proper answer aided by concepts from the real world. “So we created the Knowledge Graph, which contains a billion people, places and things and about 70 billion connections between them. For instance, if you ask who is the prime minister of India, we know there is a current prime minister, what his name is, we know how he is connected to other people and can thus answer your question in a natural way,” Gomes explains.
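A knowledge graph of the kind Gomes describes can be pictured as a set of entities joined by labelled connections. The sketch below stores a handful of such connections as (subject, relation, object) triples and looks up an answer. The relation names and helper functions are hypothetical, and the three facts reflect the time of writing; the real Knowledge Graph is vastly larger and its internals are not described here.

```python
from collections import defaultdict

# A toy graph: each triple is (subject, relation, object).
triples = [
    ("India", "prime_minister", "Narendra Modi"),
    ("Narendra Modi", "member_of", "Bharatiya Janata Party"),
    ("India", "capital", "New Delhi"),
]

def build_index(triples):
    """Map each (subject, relation) pair to the list of connected objects."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index

def answer(index, subject, relation):
    """Look up what the subject is connected to through the relation."""
    return index.get((subject, relation), [])

def connections(triples, entity):
    """Return every triple in which the entity appears on either side."""
    return [t for t in triples if entity in (t[0], t[2])]

index = build_index(triples)
print(answer(index, "India", "prime_minister"))
print(connections(triples, "Narendra Modi"))
```

Because the answer is itself an entity in the graph, a follow-up question about that person can be answered by walking one more connection, which is what allows a reply “in a natural way.”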

In the future, machine learning is expected to play a significant role in improving the quality of results Google is able to show. Gomes says that many of the hundreds of signals in search are now becoming machine learnt. “Gradually, we would be using machine learning in many different parts of search, from language understanding to how we combine the various signals. There are different types of signals. PageRank being one. Then there are words on a page, fonts, who points to this page and what is their page rank and so on. There are many different signals that go into evaluating if this page is a good result for this query and better than another page. Each of these signals can have a machine learning component to it,” he explains.
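As a rough illustration of what it means for the combination of signals to be machine learnt, the sketch below fits a small logistic regression that learns how much weight to give two signals from labelled examples. Everything in it is an assumption made for illustration: the two signals (a link-based score and a word-match score), the training data, and the choice of model. The article does not say which techniques Google uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, signals):
    """Estimated probability that a page is a good result, given its signals."""
    return sigmoid(sum(w * s for w, s in zip(weights, signals)) + bias)

def train(examples, labels, epochs=2000, lr=0.5):
    """Learn signal weights with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for signals, label in zip(examples, labels):
            error = predict(weights, bias, signals) - label
            weights = [w - lr * error * s for w, s in zip(weights, signals)]
            bias -= lr * error
    return weights, bias

# Hypothetical training data: [link_based_score, word_match_score] per page,
# with label 1 if raters judged the page a good result and 0 otherwise.
examples = [[0.9, 0.8], [0.8, 0.3], [0.2, 0.9], [0.1, 0.2]]
labels = [1, 1, 0, 0]

weights, bias = train(examples, labels)
print(weights, bias)
print(predict(weights, bias, [0.85, 0.55]))
```

In this made-up data the good pages are the ones with a high link-based score, so training ends up weighting that signal more heavily than word match, without anyone setting the weights by hand.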

This article was first uploaded on December 12, 2017, at 05:37 IST.