Search engine major Google has been granted an Indian patent for an invention related to an information retrieval system that uses phrases to index, search, rank and describe documents in a document collection. The system is adapted to identify phrases whose usage is sufficiently frequent and distinguished to indicate that they are valid, or good, phrases.
Google claims the system avoids the problem of having to identify and index every possible phrase resulting from the combination of all of the possible sequences of a given number of words.
Google filed the patent application, titled ‘phrase identification in an information retrieval system’, at the Kolkata patent office in 2005. While granting the patent, Nirmalya Sinha, deputy controller of patents & design, Kolkata, observed that the technical solution, namely an index of good phrases that enables documents with related concepts to be provided in response to a query, is a technical advancement over the prior art.
According to a patent document filed by Google, the phrase spotting operation identifies good and bad phrases in the document collection that are useful for indexing and searching documents.
Good phrases are phrases that tend to occur in more than a certain percentage of documents in the document collection and are indicated as having a distinguished appearance in such documents. Another aspect of good phrases is that they are predictive and are not merely sequences of words that appear in the lexicon, the company submitted.
Citing an example to describe how the invention is useful, Google said the phrase ‘President of the United States’ predicts other phrases such as ‘George Bush’ and ‘Bill Clinton’. Other phrases, such as ‘fell down the stairs’, ‘top of the morning’ or ‘out of the blue’, are not predictive, since idioms and colloquialisms like these tend to appear with many different and unrelated phrases.
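The two tests described above, frequency across documents and predictiveness, can be illustrated with a small Python sketch. This is not Google's method as set out in the patent: the function names, the toy documents, the 0.3 threshold and the use of a simple co-occurrence ratio (observed rate divided by the rate expected if the two phrases were independent) are all assumptions made for illustration.

```python
# Illustrative sketch only: names, threshold and scoring are assumptions,
# not details taken from the patent document.

def doc_occurrences(docs, phrases):
    """Map each phrase to the set of document indices that contain it."""
    return {p: {i for i, d in enumerate(docs) if p in d.lower()} for p in phrases}


def frequent_phrases(docs, phrases, min_fraction):
    """Keep phrases that appear in at least min_fraction of the documents."""
    occ = doc_occurrences(docs, phrases)
    return [p for p in phrases if len(occ[p]) / len(docs) >= min_fraction]


def cooccurrence_ratio(docs, a, b):
    """Observed co-occurrence rate of a and b over the rate expected by chance.

    A value well above 1 suggests that a 'predicts' b; a value near or
    below 1 suggests the two phrases are unrelated.
    """
    occ = doc_occurrences(docs, [a, b])
    n = len(docs)
    expected = (len(occ[a]) / n) * (len(occ[b]) / n)
    if expected == 0:
        return 0.0
    observed = len(occ[a] & occ[b]) / n
    return observed / expected


# Hypothetical toy collection, invented for this example.
docs = [
    "the president of the united states george bush spoke today",
    "bill clinton, former president of the united states, visited",
    "george bush met the president of the united states staff",
    "he fell down the stairs out of the blue",
    "out of the blue the market rallied",
    "the weather was fine",
]

candidates = [
    "president of the united states",
    "george bush",
    "out of the blue",
    "fell down the stairs",
]

if __name__ == "__main__":
    print(frequent_phrases(docs, candidates, 0.3))
    print(cooccurrence_ratio(docs, "president of the united states", "george bush"))
    print(cooccurrence_ratio(docs, "out of the blue", "george bush"))
```

In this toy collection, ‘president of the united states’ co-occurs with ‘george bush’ more often than chance would suggest, while ‘out of the blue’ never does, which mirrors the distinction the company draws between predictive phrases and idioms.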
Arguing for the patent, the company said that the invention is neither a mathematical algorithm nor a computer programme per se, but provides a technical solution to the technical problem of how to automatically identify phrases in a document collection. It also helps to determine groups of phrases that identify or relate to a single concept. The end product, an index stored in memory that includes valid phrases, is inventive, the company said.