
Data & Intelligence October 12, 2018

How machine learning serves up your dose of favourite content

350 million users, more than 10 million posts per month and tons of data: behind the scenes at social news website Reddit, there is never a dull moment. But amid the content overload, how do you ensure that your users see that one article they are looking for? At Dept Festival, Luis Bitencourt-Emilio explains how machine learning plays a major and indispensable role in this.

From health care to smart cars and from marketing to security: machine learning is increasingly being used across sectors, including the digital world, where it affects our lives in all sorts of areas. Self-learning algorithms, for example, are increasingly used to automate complex campaigns and dynamic content. Luis Bitencourt-Emilio knows exactly how this works behind the scenes: he previously worked for giants like Microsoft and IBM before joining Reddit.

For those not yet familiar with the platform, Reddit is a social news website and online community where you won’t get bored quickly. “You can endlessly look for cat videos, but you will also find heart-warming stories. For example, someone shared that she found her kidney donor through the platform,” Luis explains. So there is something for everyone. But how do you ensure that users find the content they are looking for, among thousands of different topics?

The cons of upvotes and downvotes

Although Reddit has a huge amount of data, it took a while before it became clear how machine learning could best be applied. The first step towards machine learning algorithms was the introduction of the ‘recommended’ tab. Using the collaborative filtering method, they created a vector for each individual Reddit user, based on whether you gave an upvote or a downvote. This way, you were linked to content from people who had already enjoyed the same things as you.
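Reddit has not published the details of this model, but the core idea of a vote-based user vector can be sketched in a few lines. Everything below is invented for illustration: the usernames, post IDs and votes are made up, and the real system would operate on far larger and sparser data than this toy example.

```python
import math

# Hypothetical vote data: +1 for an upvote, -1 for a downvote.
votes = {
    "alice": {"cats_post": 1, "politics_post": -1, "dogs_post": 1},
    "bob":   {"cats_post": 1, "politics_post": -1},
    "carol": {"cats_post": -1, "politics_post": 1, "europe_post": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse vote vectors (dicts)."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, votes):
    """Rank posts the user hasn't voted on, weighted by how similar
    the voters are to this user."""
    scores = {}
    for other, other_votes in votes.items():
        if other == user:
            continue
        sim = cosine(votes[user], other_votes)
        for post, vote in other_votes.items():
            if post not in votes[user]:
                scores[post] = scores.get(post, 0.0) + sim * vote
    return sorted(scores, key=scores.get, reverse=True)

# bob votes like alice, so alice's upvoted dogs_post ranks first.
print(recommend("bob", votes))
```

The weakness Luis describes is visible even here: the model only works for users who vote, and each user's vector has as many entries as they have cast votes.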

“One of the downsides was that only 10% of Redditors actually voted on posts they liked or disliked,” Luis commented. “And if they did, it was often only one vote a day.” Because there was too little data to build on, machines were not yet able to link the right content to the right user. What was needed was, indeed, a good dose of human insight.

From popular content to personal content

If anyone had that insight, it was Luis. Together with his team, he looked at how machine learning could further personalise Reddit’s various subpages. These pages, also called subreddits, range from politics and machines to sports and GIFs. By basing personalisation no longer on collaborative filtering but on content-based methods, a new model was developed that links users to the content they are specifically looking for, so that someone who scans the platform for a page full of high-fiving cats is not sent to a subreddit on political relations in Europe.
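The content-based idea can also be sketched minimally: instead of comparing users to users, compare what a user is looking for against a description of each subreddit. The subreddit names, descriptions and the simple word-overlap score below are all illustrative assumptions; a production system would use far richer text features.

```python
# Hypothetical subreddit descriptions, invented for illustration.
subreddits = {
    "r/highfivingcats": "cats high five paws funny pictures",
    "r/europepolitics": "political relations europe news debate",
    "r/dogvideos":      "dogs puppies training videos",
}

def content_score(query, description):
    """Fraction of the query's terms that appear in the description."""
    query_terms = set(query.lower().split())
    doc_terms = set(description.lower().split())
    return len(query_terms & doc_terms) / len(query_terms)

def best_match(query, subreddits):
    """Return the subreddit whose description best matches the query."""
    return max(subreddits, key=lambda s: content_score(query, subreddits[s]))

print(best_match("high fiving cats", subreddits))
```

Note that this matches the cat seeker to the cat subreddit even if no other user has voted on it yet, which is exactly the gap collaborative filtering left open.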

To include the renowned homepage as well, Luis and his team dug into how they could link specific posts to users there too. “For twelve years, our homepage depended on an algorithm that could only rank by popularity, so here, too, we wanted to use a smarter algorithm.”

Don’t neglect your retention rate

And the results show that good, personal content works. The new technical infrastructure had a positive effect on the user experience. Thanks to machine learning, people saw less ‘recommended content’ that did not match their interests and consequently stayed on the website longer. “Especially the latter is important, because it is precisely this ‘retention rate’ – in contrast to a ‘click rate’ – that shows you are not tiring your users with content they do not want to see,” Luis adds. “In other words, a high ‘click rate’ is quite meaningless when people leave your website in no time.”
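The distinction Luis draws can be made concrete with two toy metrics. The session data and the 60-second threshold below are invented for illustration; in practice, retention is usually measured over days or weeks rather than single sessions.

```python
def click_rate(sessions):
    """Share of sessions with at least one click on recommended content."""
    return sum(1 for s in sessions if s["clicks"] > 0) / len(sessions)

def retention_rate(sessions, min_seconds=60):
    """Share of sessions where the user stayed at least `min_seconds`."""
    return sum(1 for s in sessions if s["duration"] >= min_seconds) / len(sessions)

# Invented sessions: plenty of clicks, but half the users leave immediately.
sessions = [
    {"clicks": 3, "duration": 10},
    {"clicks": 1, "duration": 5},
    {"clicks": 0, "duration": 300},
    {"clicks": 2, "duration": 240},
]
print(click_rate(sessions))      # high click rate...
print(retention_rate(sessions))  # ...yet only half the sessions are retained
```

This is Luis’s point in miniature: the click rate looks healthy while the retention rate reveals that many users leave almost immediately.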

Suiting every taste

The abundance of data is both a blessing and a curse. It can provide us with a lot of insight, but without proper tooling, it becomes a huge task not to be overwhelmed. If there is one thing Luis has made clear, it is that machine learning is not something to be afraid of, but something we should embrace. It can offer us the specific content we prefer to read and see. So if you, as an active Redditor, see a page full of cat pictures, you no longer have to be surprised, because now you know what’s going on behind the scenes. Or maybe you can endlessly scroll through dog videos, like Luis. “Because, honestly,” he adds jokingly, “dogs are much more fun than cats, right?”
