The technology that throws up suggestions is central to the efforts of internet companies to personalise the user experience and target advertisements as people read articles or listen to music online. The algorithms that power recommender systems, as they are called, have been around for over 15 years, driven initially by e-commerce companies such as Amazon. From the basic problem of recommending just one item to one user, the focus of research has moved to more sophisticated applications. Ronny Lempel, chief data scientist at Yahoo! Labs, was in Bangalore recently for the Yahoo! Big Thinkers India lecture series. He tells Ajay Sukumaran that the key problems researchers are looking at involve the ability to suggest sequences, frame the user's context, find people with similar interests and, finally, recommend to people things their friends have experienced.

As chief data scientist, what are your key focus areas?

It is about looking at the areas where Yahoo! Labs interacts with our product groups to develop our Yahoo! products. A lot of it has to do with personalisation and ad targeting, which rely heavily on recommendation technology. Machine learning, natural language processing and search technologies are all aspects that are important to me as chief data scientist. Beyond that, I also interact with our systems teams to define what the next generation of big data computational systems might look like, or what is missing in our current generation of systems.

How personalised can recommendations get? Can suggestions be specific to context, such as a particular time of day or even a user's mood?

Everybody is talking about that as something to work on in the future. It's the same with TV viewership. If you have a TV device, the viewership patterns on that TV in the evenings might be very different from the patterns at midday. So mood, time of day or situation: obviously we cannot always anticipate what it is you are doing when you want to listen to music. Are you going out for a run or are you reading a book? So perhaps interfaces might actually solicit feedback from the user. Systems will build upon those cues to then deeply personalise not only to you but to your current context.

What has been the key progress in developing algorithms for these tasks?

Netflix, five or six years ago, set the scientific community a challenge: improve its own recommender system. Netflix released 100 million ratings that users had given to movies and offered $1 million to scientific teams that could improve its current recommender technology by 10%. That was a magnet for many people. Hundreds of researchers who may have been in similar fields, but not focussed on recommender technology, suddenly looked at the problems in this field and started to make contributions. So new algebraic computational methods came into play, along with a greater emphasis on the precision of the systems and on ensemble learning. All that advanced the state of the art, and it got to the point where the vanilla recommendation problems are pretty much solved. But now there are domain-specific issues, like recommending sequences and sets, and the problem of cold starts. A cold start is when a new user comes in who hasn’t interacted with the system before. Questions such as how quickly the system can learn enough about that user to start giving personalised recommendations are at the forefront of research.

What are you doing to push the barriers there?

This is exactly where the big data buzzword comes in, because one of the key elements that improve recommendation technology is called collaborative filtering. It means finding people like you, whom you may not know. They may live in another part of the globe and you may never interact with them, but they exhibit consumption patterns of content that are very similar to yours. Now, the wider we can spread the net in terms of how many people use the systems, and Yahoo! has hundreds of millions of users, the higher the likelihood that some of them will share your interests. The more users we have, the greater the likelihood of someone like you being somewhere in that crowd. It is a matter of scale.
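The idea of finding "people like you" can be illustrated with a minimal sketch of user-based collaborative filtering. This is not Yahoo!'s method; the user names, song titles, ratings and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not from the interview.
ratings = {
    "ana":   {"song_a": 5, "song_b": 4, "song_c": 1},
    "ben":   {"song_a": 5, "song_b": 5, "song_d": 4},
    "chloe": {"song_c": 5, "song_e": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not seen by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

In this toy data, ben's tastes overlap most with ana's, so the item only he has rated ranks above the item that comes from chloe. With more users in the pool, the chance of finding a close match for any given person goes up, which is the scale argument made above.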

What are some of the key problems people are trying to solve, in terms of applications?

Today many sites are ranking streams: Facebook, Twitter, Google+ and the Yahoo! homepage. In terms of products, all of these companies are ranking streams of items, and one of the challenges everybody is facing, and trying to solve, is how to merge typically two or three types of signals. One is content-based signals. The second is the collaborative filtering aspect. And then there is the explicit social signal.

What would motivate you more to consume an item: if we tell you that a bunch of folks you have never met like it, or if we tell you that two of your friends saw this movie yesterday, read this post or liked this song? Obviously, once we can convey to you that your close social circle enjoyed an item, the trust and motivation to consume it yourself are much greater than if we just say some anonymous folk liked it. I think those are the aspects that everybody is looking into.
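One simple way to picture merging the three signals is a weighted sum, sketched below. The interview does not say how any of these companies combine signals; the weights, field names and scores here are hypothetical, and real systems would typically learn the combination from data.

```python
def blend_score(content, collaborative, social, weights=(0.3, 0.3, 0.4)):
    """Combine three signals (each assumed to lie in [0, 1]) into one score.

    The default weights are illustrative only; they give the social
    signal slightly more influence than the other two.
    """
    w_content, w_collab, w_social = weights
    return w_content * content + w_collab * collaborative + w_social * social


def rank_stream(items, weights=(0.3, 0.3, 0.4)):
    """Return item ids ordered from highest to lowest blended score."""
    ordered = sorted(
        items,
        key=lambda it: blend_score(
            it["content"], it["collab"], it["social"], weights
        ),
        reverse=True,
    )
    return [it["id"] for it in ordered]


# Hypothetical stream: post_2 matches the user's interests less well on
# content, but two friends have liked it, so its social signal is high.
stream = [
    {"id": "post_1", "content": 0.9, "collab": 0.8, "social": 0.0},
    {"id": "post_2", "content": 0.5, "collab": 0.4, "social": 1.0},
    {"id": "post_3", "content": 0.2, "collab": 0.3, "social": 0.0},
]

print(rank_stream(stream))
```

With these assumed weights, the item endorsed by friends moves ahead of one that scores better on content and collaborative filtering alone, which mirrors the point about trust in a close social circle.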

How mature would you say these systems are now?

I would say it is a semi-mature research field with a long way still to go. It is just like search technology, which has been around since 1994-95: there's tonnes of research, but there are still challenges.