Collaborative Filtering

Average Rating

The basic algorithm is to make the score s(j) the average rating for item j:

$s(j) = \frac{\sum_{i \in \omega_j} r_{ij}}{| \omega_j |}$

In general, a score s(i,j) can depend on both user i and item j; the average rating depends only on j.
omega_j = set of all users who rated item j
r_ij = rating user i gave item j
i: [1,N], N is the number of users
j: [1,M], M is the number of items
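As a minimal sketch of the average-rating score, assuming the known ratings are stored as (user, item, rating) triples (the data and the name `average_ratings` are made up for illustration):

```python
from collections import defaultdict

# Hypothetical toy data: only the ratings that actually exist.
ratings = [
    ("alice", "star_wars", 5.0),
    ("bob", "star_wars", 4.0),
    ("carol", "star_wars", 3.0),
    ("alice", "titanic", 2.0),
    ("bob", "titanic", 1.0),
]

def average_ratings(triples):
    """Return {item: mean rating over the users who rated it}."""
    totals = defaultdict(float)  # sum of r_ij over users i in omega_j
    counts = defaultdict(int)    # |omega_j|
    for _user, item, r in triples:
        totals[item] += r
        counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

item_avg = average_ratings(ratings)
```

Every user would see the same score for an item here, which is exactly the limitation discussed next.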

Some Issues of Average Rating

It treats everyone’s rating of the movie equally.

Sparsity

R_{N*M} = user-item ratings matrix of size N * M. The user-item matrix is sparse because most entries are empty: most users have rated only a few of the M items.

Goal of Collaborative Filtering

We want to make recommendations
Most r(i,j) don’t exist, and this is good
A dense matrix is convenient mathematically, but bad business-wise
It would mean every user has already seen every movie, so we would have nothing to recommend
Therefore, the matrix must be sparse in order to actually have items to recommend

Regression

Since we want to predict a real number, the objective is the mean squared error (MSE):

$MSE = \frac{1}{|\omega|} \sum_{(i,j) \in \omega}(r_{ij} - \hat{r}_{ij})^2$

omega is the set of pairs (i,j) where user i has rated item j, and $\hat{r}_{ij}$ is the predicted rating, i.e. the score s(i,j).
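A small sketch of this objective, assuming known ratings and predictions are both dictionaries keyed by (user, item) pairs (the function name `mse` and the numbers are illustrative only):

```python
def mse(actual, predicted):
    """Mean squared error over the (user, item) pairs present in `actual`."""
    if not actual:
        raise ValueError("need at least one known rating")
    squared_errors = [(r - predicted[pair]) ** 2 for pair, r in actual.items()]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical known ratings and model predictions for the same pairs.
actual = {("alice", "star_wars"): 5.0, ("bob", "titanic"): 1.0}
predicted = {("alice", "star_wars"): 4.0, ("bob", "titanic"): 3.0}
error = mse(actual, predicted)
```

Only pairs with a known rating contribute, so the empty entries of the sparse matrix never enter the sum.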

User-User Collaborative Filtering

For user-user collaborative filtering (UCF), the goal is to find the “users like me”. Intuitively, if they are “like me”, I’ll probably like the movies they’ve rated highly.

The user-item matrix is shown below:

We can see that Alice’s and Bob’s ratings are highly correlated.

Weighted Ratings

Intuitively, I want the weight to be small for users I don’t agree with and large for users I do agree with:

$s(i,j) = \frac{\sum_{i' \in \omega_j} W_{ii'} r_{i'j}}{\sum_{i' \in \omega_j} W_{ii'}}$

W_ii’ is the weight between user i and user i’, and both sums run over the users i’ who rated item j.
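The weighted average can be sketched as follows, assuming the weights for the target user are already known and positive (the function name `weighted_score` and the values are hypothetical):

```python
def weighted_score(weights, ratings_for_item):
    """Weighted average of other users' ratings for one item.

    weights:          {other_user: weight between the target user and other_user}
    ratings_for_item: {other_user: rating that user gave the item}
    """
    num = 0.0
    den = 0.0
    for other, r in ratings_for_item.items():
        w = weights.get(other)
        if w is None:
            continue  # no weight available for this user, so skip them
        num += w * r
        den += w
    if den == 0:
        return None  # no usable neighbours, the score is undefined
    return num / den

# Bob agrees with the target user far more than Carol does.
weights = {"bob": 0.9, "carol": 0.1}
ratings_for_item = {"bob": 5.0, "carol": 1.0}
score = weighted_score(weights, ratings_for_item)
```

The plain average of the two ratings would be 3.0; the weighted score lands much closer to Bob's rating because his weight dominates.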

Deviation

We care about how much a rating deviates from the user’s own average, not about the absolute rating:

$dev(i,j) = r_{ij} - \bar{r}_i$, for a known rating

The deviation for user i and item j is the rating user i gave item j minus the average rating of user i across all movies they rated.
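A minimal sketch of the deviation computation, assuming ratings are stored as a dictionary of dictionaries `ratings[user][item]` (this layout and the helper names are assumptions made for the example):

```python
def user_means(ratings):
    """Average rating per user over the items that user has rated."""
    return {
        user: sum(items.values()) / len(items)
        for user, items in ratings.items()
        if items  # skip users with no ratings to avoid dividing by zero
    }

def deviations(ratings):
    """dev(i, j) = r_ij minus user i's mean, defined only for known ratings."""
    means = user_means(ratings)
    return {
        user: {item: r - means[user] for item, r in items.items()}
        for user, items in ratings.items()
        if items
    }

# Hypothetical data: Alice rates generously, Bob rates harshly.
ratings = {
    "alice": {"star_wars": 5.0, "titanic": 3.0},
    "bob": {"star_wars": 2.0, "titanic": 1.0, "alien": 3.0},
}
dev = deviations(ratings)
```

Alice's 5 and Bob's 3 both come out as a deviation of +1, which is the point: each is one star above that user's own average.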

Deviation + Weighted Ratings

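The score function combining deviation and weighted ratings is shown below:

$s(i,j) = \bar{r}_i + \frac{\sum_{i' \in \omega_j} W_{ii'} (r_{i'j} - \bar{r}_{i'})}{\sum_{i' \in \omega_j} |W_{ii'}|}$

The deviations keep their sign, so a similar user who rated item j below their own average pulls the prediction down. The absolute value belongs on the weights in the denominator, since correlation-based weights can be negative.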
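A hedged sketch of this score, assuming the `ratings[user][item]` layout, precomputed weights in `weights[user][other]`, and a denominator that sums absolute weights (a common convention when weights can be negative). Falling back to the user's own average when no neighbour rated the item is a choice made for this example, not something the text prescribes:

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def predict(user, item, ratings, weights):
    """User's mean plus the weighted average of other users' deviations."""
    user_mean = mean(ratings[user].values())
    num = 0.0
    den = 0.0
    for other, w in weights.get(user, {}).items():
        other_ratings = ratings.get(other, {})
        if other == user or item not in other_ratings:
            continue  # only users who rated the item contribute
        num += w * (other_ratings[item] - mean(other_ratings.values()))
        den += abs(w)
    if den == 0:
        return user_mean  # assumption: fall back to the user's own average
    return user_mean + num / den

# Hypothetical data: Bob agrees with Alice, Carol tends to disagree with her.
ratings = {
    "alice": {"star_wars": 4.0, "titanic": 2.0},
    "bob": {"star_wars": 3.0, "alien": 5.0},
    "carol": {"titanic": 4.0, "alien": 2.0},
}
weights = {"alice": {"bob": 1.0, "carol": -0.5}}
score = predict("alice", "alien", ratings, weights)
```

Bob liked the item more than his average and Carol liked it less than hers; because Carol's weight is negative, both push Alice's predicted score above her own average of 3.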

How to Calculate Weights

Pearson Correlation Coefficient

$P(x,y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$

The problem is our matrix is sparse. So we could modify the formula to this:
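The modified formula is not reproduced in the text. A common form, assuming the sums are restricted to the items both users have rated, is the following, where $\psi_{ii'}$ (a symbol introduced here for illustration) is the set of items rated by both user i and user i’, and $\bar{r}_i$ is user i’s average rating as in the deviation definition:

$W_{ii'} = \frac{\sum_{j \in \psi_{ii'}} (r_{ij} - \bar{r}_i)(r_{i'j} - \bar{r}_{i'})}{\sqrt{\sum_{j \in \psi_{ii'}} (r_{ij} - \bar{r}_i)^2} \sqrt{\sum_{j \in \psi_{ii'}} (r_{i'j} - \bar{r}_{i'})^2}}$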

Cosine Similarity

$\cos \theta = \frac{x^T y}{ |x| |y| } = \frac{\sum_{i=1}^{N} x_i y_i}{ \sqrt{\sum_{i=1}^{N} x_i ^2} \sqrt{\sum_{i=1}^{N}y_i^2}}$

Cosine Similarity VS Pearson Correlation Coefficient

They are the same, except that Pearson centers each vector by subtracting its mean first. We want to center them anyway because we’re working with deviations, not absolute ratings.
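The relationship can be checked numerically. The sketch below uses two made-up, fully observed rating vectors and computes Pearson directly from its formula, then compares it with the cosine similarity of the mean-centered vectors (helper names are illustrative; zero-norm vectors are not handled):

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    den_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return num / (den_x * den_y)

def center(x):
    """Subtract the mean from every entry."""
    m = sum(x) / len(x)
    return [a - m for a in x]

x = [5.0, 3.0, 4.0, 1.0]
y = [4.0, 2.0, 5.0, 1.0]

pearson_xy = pearson(x, y)
cosine_centered = cosine(center(x), center(y))  # matches Pearson
cosine_raw = cosine(x, y)                       # differs, since x and y are not centered
```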

Problem

If 2 users have zero movies in common, don’t consider i’ in user i’s calculation: the weight can’t be calculated
If they have only a few movies in common (e.g. < 5), don’t use the weight either, since there is not enough data for it to be accurate
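These two rules could be folded into the weight computation roughly as follows. This is a sketch under assumptions: each user's ratings are a dictionary `{item: rating}`, the means are each user's overall average, and `min_common` is a hypothetical parameter name for the overlap threshold:

```python
import math

def pearson_weight(ratings_a, ratings_b, min_common=5):
    """Pearson-style weight over common items, or None if it should not be used."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < min_common:
        return None  # zero or too few movies in common
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    dev_a = [ratings_a[j] - mean_a for j in common]
    dev_b = [ratings_b[j] - mean_b for j in common]
    num = sum(x * y for x, y in zip(dev_a, dev_b))
    den = math.sqrt(sum(x * x for x in dev_a)) * math.sqrt(sum(y * y for y in dev_b))
    if den == 0:
        return None  # one user has no variation on the common items
    return num / den

alice = {"m1": 5.0, "m2": 1.0}
bob = {"m1": 4.0, "m2": 2.0}
too_few = pearson_weight(alice, bob)               # 2 common movies < 5
enough = pearson_weight(alice, bob, min_common=2)  # threshold lowered for the demo
```

Returning `None` rather than 0 lets the caller distinguish "no reliable weight" from "uncorrelated".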

Neighborhood

In practice, don’t sum over all users who rated movie j (takes too long)
It can help to pre-compute weights beforehand
Instead of summing over all users, sum only over the ones with the highest weights, e.g. the K nearest neighbors with K between 25 and 50
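Selecting the neighborhood can be sketched with the standard library's `heapq.nlargest`, assuming the weights for one user have already been precomputed into a dictionary (the function name `top_k_neighbors` and the numbers are made up):

```python
import heapq

def top_k_neighbors(weights_for_user, k=25):
    """Return the k (other_user, weight) pairs with the highest weight."""
    return heapq.nlargest(k, weights_for_user.items(), key=lambda pair: pair[1])

# Hypothetical precomputed weights for one user.
weights_for_user = {"bob": 0.9, "carol": 0.2, "dave": 0.7, "erin": -0.4}
neighbors = top_k_neighbors(weights_for_user, k=2)
```

This follows the text in ranking by the raw weight. Ranking by absolute weight, so that strongly negative correlations are also kept, would be a different design choice.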

Item-Item Collaborative Filtering

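Here we compare the column vectors (items) of the user-item matrix instead of the rows (users). When the correlation between two columns is high, the items are similar: if you like Power Rangers, you’ll probably also like Transformers.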

How to Calculate Weights for Item Correlation
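The formula for this section is not reproduced in the text. A plausible form, assuming it mirrors the user-user Pearson weight with the roles of users and items swapped, is the following, where $\omega_{jj'}$ (a symbol introduced here) is the set of users who rated both item j and item j’, and $\bar{r}_j$ is the average rating of item j:

$w_{jj'} = \frac{\sum_{i \in \omega_{jj'}} (r_{ij} - \bar{r}_j)(r_{ij'} - \bar{r}_{j'})}{\sqrt{\sum_{i \in \omega_{jj'}} (r_{ij} - \bar{r}_j)^2} \sqrt{\sum_{i \in \omega_{jj'}} (r_{ij'} - \bar{r}_{j'})^2}}$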

Item Score

The deviation means how much user i likes item j’, compared to how much everyone else likes j’. If user i really likes j’ (more than other users do) and j is similar to j’ (w_jj’ is high), then user i probably likes j too.
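Written out, and assuming the item score mirrors the user-user score with users and items swapped, the description above corresponds to a formula of the following form, where $\psi_i$ (a symbol introduced here) is the set of items user i has rated and $\bar{r}_j$ is the average rating of item j:

$s(i,j) = \bar{r}_j + \frac{\sum_{j' \in \psi_i} w_{jj'} (r_{ij'} - \bar{r}_{j'})}{\sum_{j' \in \psi_i} |w_{jj'}|}$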

UCF VS ICF

UCF: Choose items for a user, because those items have been liked by similar users
ICF: Choose items for a user, because this user has liked similar items in the past

UCF and ICF are mathematically identical: ICF is UCF applied to the transposed ratings matrix.

Summary

The average rating gives each item a single score, regardless of which user is looking. Some problems:

Not all ratings should be treated equally
Users who I agree with should be weighted higher
Users I disagree with should be weighted lower

With collaborative filtering, the score s(i,j) depends on both user i and item j. We used the Pearson correlation as the weights.

By transposing the rating matrix (swapping the roles of users and items), we can convert our UCF algorithm into an ICF algorithm.
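As a sketch of that conversion, assuming the sparse ratings are stored as a dictionary of dictionaries `ratings[user][item]` (an assumed layout; the function name `transpose` is illustrative), swapping the two levels gives `flipped[item][user]`. Any user-user routine written against this layout can then be run on the flipped data to compare items instead:

```python
def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    flipped = {}
    for user, items in ratings.items():
        for item, r in items.items():
            flipped.setdefault(item, {})[user] = r
    return flipped

# Hypothetical ratings; missing entries are simply absent.
ratings = {
    "alice": {"power_rangers": 5.0},
    "bob": {"power_rangers": 3.0, "transformers": 4.0},
}
by_item = transpose(ratings)
```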

But accuracy is not the only thing we care about: optimizing for it alone can lead to a lack of diversity, with recommendations that always suggest “similar products”.

The user-item matrix doesn’t have to be ratings at all!
