A Basic Understanding of Recommender Systems

In 2006, Netflix famously offered a prize to solve a simple problem that had been around for years. Every online commerce giant is fighting to get better at solving the same problem.

PROBLEM: Given observations of a user's past behavior, predict which other things that same user will like. For example, based on Ali's viewing history (Jurassic Park, Annie Hall), will he like 'The Matrix' or 'The Grapes of Wrath'?

We can represent user preferences graphically as connections, with people on one side and things (such as movies) on the other. The strength of each connection is visually represented by the thickness of the line.

User and Movies

So the problem is to fill in all the missing connections we don't yet know (highlighted in red). Because of the messy, unpredictable nature of people's taste, we can never perfectly predict what they will like: it involves a guess about the future based on something we have never seen. The answer is also always in flux, because people's tastes change over time.

What we can do is estimate those values as best we can using whatever data we have access to. So regardless of the context, we can describe the problem in the following mathematical form.

We use a matrix, a collection of numbers organized into rows and columns, where the rows are users and the columns are items (in this case, movies). Each cell in this matrix contains a number describing how much that individual likes that movie. For example, we can use a scale from 0 to 4, where 0 means hated it, 2 means neutral, and 4 means loved it.

Users( A, B, C, D) and Movies( M1, M2, M3, M4, M5)

At any given moment there will be some set of preference data for users and movies, but it will be incomplete, because each user has only seen and rated a few movies. So the problem boils down to predicting these missing values. How should we make these predictions?
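As a concrete sketch, the incomplete preference matrix can be written out in code; the ratings below are made up for illustration, with NaN marking the connections we still need to predict:

```python
import numpy as np

# Hypothetical ratings: rows are users A-D, columns are movies M1-M5,
# on the 0-4 scale; np.nan marks a movie the user has not rated yet.
R = np.array([[4.0, 0.0, np.nan, 2.0, np.nan],
              [np.nan, 3.0, 4.0, np.nan, 1.0],
              [1.0, np.nan, 2.0, 4.0, np.nan],
              [np.nan, 2.0, np.nan, 3.0, 4.0]])

known = ~np.isnan(R)
print(known.sum(), "known ratings,", np.isnan(R).sum(), "to predict")
```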

It begins by labelling each person and movie with some known attributes, or what we call features, like 'comedy' and 'action'. That means if Ali likes comedy and hates action, we can represent him as [3, 0]. Each movie is mapped to the same features. For example, The Matrix has no comedy and plenty of action, so we can label it [0, 4].

To determine whether someone will like a movie, we multiply these factors together. So the strength of the connection between Ali and The Matrix is 3×0 + 0×4 = 0, and our estimate is that he will hate the movie.
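That multiply-and-sum is just a dot product of the two feature vectors; a minimal sketch using the made-up numbers from above:

```python
# Hypothetical feature vectors on the 0-4 scale: [comedy, action].
ali = [3, 0]         # likes comedy, hates action
the_matrix = [0, 4]  # no comedy, lots of action

# Predicted strength of connection = dot product of the two vectors.
score = sum(u * m for u, m in zip(ali, the_matrix))
print(score)  # 3*0 + 0*4 = 0 -> predicted to hate it
```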

So to make our predictions, we first need to gather this feature data for every user and movie. To simplify, let us return to our matrix representation: we can store this information in two matrices.

The first defines the mapping between users and features; the second maps movies to features. By multiplying these matrices together (matrix multiplication), we get an estimated strength of connection between any person and any movie.

We can normalize our data so that values are between 0 and 4.
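A minimal sketch of that multiplication, with made-up feature values for four users and five movies (the rescaling step at the end is one simple way to map the raw products back onto the 0-4 scale):

```python
import numpy as np

# Hypothetical user-feature matrix: rows are users A-D,
# columns are the features [comedy, action] on a 0-4 scale.
U = np.array([[3, 0],
              [0, 4],
              [2, 2],
              [4, 1]])

# Hypothetical movie-feature matrix: rows are [comedy, action],
# columns are movies M1-M5.
M = np.array([[0, 3, 4, 1, 2],
              [4, 0, 1, 3, 2]])

# One matrix multiplication estimates every user-movie connection.
P = U @ M                # shape (4, 5)

# Rescale so the estimates land back on the 0-4 rating scale.
P = 4 * P / P.max()
```

Note that the first cell (user A, movie M1) is exactly the Ali/Matrix dot product from before.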

Graphical representation of content-based filtering.

That is one way to solve this problem, known as content filtering.

The problem is that it's overly simplistic and not very accurate, because there are obviously more relevant features of a movie than just comedy or action. The obvious way to improve this is to include more features.

When Netflix started, it did just this: it would ask new users to fill out a laundry list of questions about their preferences before presenting them with suggested movies. This leads to the problem of having to collect all of this preference data from users. Not only is it a burden for users, it's also prone to failure, because we aren't always great at describing our own preferences. Sometimes we simply can't explain why we like things; we just do. Which brings us to the other approach, called collaborative filtering.

So to begin, we throw away the idea of dreaming up features to connect people and movies. Instead, we flip things around and use the user preference data itself to generate the features. For example, we might have an incomplete set of preference data, and we will instead learn, or discover, the relevant features from the patterns in that data. This is done by simply reversing the problem.

We first perform an approximate factorization of the preference matrix into two matrices, and we can do this using a machine learning approach. The job of the machine learning algorithm is to guess values for those matrices that match the existing entries of the preference matrix as closely as possible.

The simplest approach is to guess the numbers over and over until you arrive at a set that predicts the data with the lowest overall error. Once this estimation is finished, we can multiply the matrices as before to fill in all of the missing values. It's important to note that we won't know exactly what to label these discovered features, so we call them latent features: they arise out of the underlying patterns in the data. We can think of each one as an average, or weighted sum, of patterns in the data.
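A minimal sketch of that guess-and-refine loop, assuming the same made-up 4×5 preference matrix as before and two latent features, using plain gradient descent on the observed entries only (real systems use more robust methods, such as alternating least squares with regularization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical incomplete preference matrix (0-4 scale, np.nan = unknown).
R = np.array([[4.0, 0.0, np.nan, 2.0, np.nan],
              [np.nan, 3.0, 4.0, np.nan, 1.0],
              [1.0, np.nan, 2.0, 4.0, np.nan],
              [np.nan, 2.0, np.nan, 3.0, 4.0]])
known = ~np.isnan(R)

k = 2                                            # number of latent features
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors (guesses)
M = rng.normal(scale=0.1, size=(k, R.shape[1]))  # movie factors (guesses)

lr = 0.01
for _ in range(5000):
    err = np.where(known, U @ M - R, 0.0)  # error on known ratings only
    U -= lr * err @ M.T                    # nudge user factors downhill
    M -= lr * U.T @ err                    # nudge movie factors downhill

pred = U @ M  # every cell now filled in, including the former gaps
```

The columns of U and rows of M are the latent features: nothing in the loop ever names them "comedy" or "action"; they are whatever patterns best reproduce the known ratings.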

They are not based on human-defined features such as comedy and action. This is the key insight behind the method: with content filtering, the features come from the human mind, whereas with collaborative filtering, the features are extracted directly from the patterns in the data, and they predict the data in the same way but more accurately.
