Collaborative Filtering presentation

1
Collaborative Filtering

Rong Jin
Department of Computer Science and Engineering
Michigan State University

2
Outline

Brief introduction to information filtering
Collaborative filtering
Major issues in collaborative filtering
Main methods for collaborative filtering
Flexible mixture model for collaborative filtering
Decoupling model for collaborative filtering

3
Short vs. Long Term Info. Need

Short-term information need (Ad hoc retrieval)
Temporary need, e.g., info about used cars
Information source is relatively static
User pulls information
Application examples: library search, Web search
Long-term information need (Filtering)
Stable need, e.g., new data mining algorithms
Information source is dynamic
System pushes information to user
Application example: news filtering

4
Examples of Information Filtering

News filtering
Email filtering
Movie/book/product recommenders
Literature recommenders
And many others

5
Information Filtering

Basic filtering question: Will user U like item X?
Two different ways of answering it
Look at what U likes
-> characterize X -> content-based filtering
Look at who likes X
-> characterize U -> collaborative filtering
Combine content-based filtering and collaborative filtering

6
Other Names for Information Filtering

Content-based filtering is also called
Adaptive Information Filtering in TREC
Selective Dissemination of Information (SDI) in Library Information Science
Collaborative filtering is also called
Recommender systems

7
Example: Content-based Filtering
History
8
Example: Collaborative Filtering
User 1   1  5  3  4  3
User 2   4  1  5  2  5
User 3   2  ?  3  5  4
9
Collaborative Filtering (CF) vs. Content-based Filtering (CBF)

CF does not need the content of items, while CBF relies on the content of items
CF is useful when the content of items
is not available or is difficult to acquire
is brief and insufficient
Example: movie recommendation
A movie may be preferred because of
its actor
its director
its popularity

10
Application of Collaborative Filtering
11
Collaborative Filtering

Goal: Making filtering decisions for an individual user based on the judgments of other users

utest 3 4 1
12
Collaborative Filtering

Goal: Making filtering decisions for an individual user based on the judgments of other users
General idea
Given a user u, find similar users u1, ..., um
Predict u's rating based on the ratings of u1, ..., um

13
Example: Collaborative Filtering
User 1   1  5  3  4  3
User 2   4  1  5  2  5
User 3   2  ?  3  5  4
14
Memory-based Approaches for CF

The key is to find users that are similar to the test user
Traditional approach
Measure the similarity in rating patterns between different users
Example: Pearson Correlation Coefficient

15
Pearson Correlation Coefficient for CF

Similarity between a training user y and a test
user y0
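
The formula on this slide did not survive in the transcript. As a sketch, a standard form of the Pearson correlation coefficient that reproduces the similarity values in the worked example (0.85 and -0.49) is given below. The symbols are assumptions rather than the slide's own notation: R_y(x) is the rating of user y on item x, \bar{R}_y is the mean of all ratings given by y, and X_y is the set of items rated by y.

```latex
w_{y, y_0} =
\frac{\sum_{x \in X_y \cap X_{y_0}} \left(R_y(x) - \bar{R}_y\right)\left(R_{y_0}(x) - \bar{R}_{y_0}\right)}
     {\sqrt{\sum_{x \in X_y \cap X_{y_0}} \left(R_y(x) - \bar{R}_y\right)^2}\;
      \sqrt{\sum_{x \in X_y \cap X_{y_0}} \left(R_{y_0}(x) - \bar{R}_{y_0}\right)^2}}
```

The sums run only over items rated by both users, while each mean is taken over all items that the user has rated.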

16
Pearson Correlation Coefficient for CF

Estimate ratings for the test user

Weighted vote of normalized rates
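
The prediction formula is also missing from the transcript. One common form of the weighted vote is sketched below; dividing by the sum of absolute weights is an assumption, as are the symbols: w_{y,y_0} is the similarity between training user y and test user y_0, R_y(x) is the rating of y on item x, and \bar{R}_y is the mean rating of y.

```latex
\hat{R}_{y_0}(x) = \bar{R}_{y_0} +
\frac{\sum_{y} w_{y, y_0}\left(R_y(x) - \bar{R}_y\right)}
     {\sum_{y} \left|w_{y, y_0}\right|}
```

Here the sum over y covers the training users who have rated item x.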
17
Example
User 1             1     5     3     4     3
Normalized Rate
User 2             4     1     5     2     5
Normalized Rate
User 3             2     ?     3     5     4
Normalized Rate
18
Example
User 1             1     5     3     4     3
Normalized Rate   -2.2   1.8  -0.2   0.8  -0.2
User 2             4     1     5     2     5
Normalized Rate    0.6  -2.4   1.6  -1.4   1.6
User 3             2     ?     3     5     4
Normalized Rate   -1.5        -0.5   1.5   0.5
19
Example
User 1             1     5     3     4     3
Normalized Rate   -2.2   1.8  -0.2   0.8  -0.2    similarity to User 3: 0.85
User 2             4     1     5     2     5
Normalized Rate    0.6  -2.4   1.6  -1.4   1.6    similarity to User 3: -0.49
User 3             2     ?     3     5     4
Normalized Rate   -1.5        -0.5   1.5   0.5
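
The numbers in this example can be checked with a short script. The sketch below is illustrative rather than the presenter's code. It mean-centers each user's ratings, computes the Pearson similarity over co-rated items, and then estimates User 3's missing rating with a weighted vote. The function names and the normalization by the sum of absolute similarities are assumptions.

```python
import math

# Ratings from the example slide; None marks the unknown rating.
ratings = {
    "User 1": [1, 5, 3, 4, 3],
    "User 2": [4, 1, 5, 2, 5],
    "User 3": [2, None, 3, 5, 4],
}


def mean_rating(row):
    """Mean over the items the user has actually rated."""
    known = [r for r in row if r is not None]
    return sum(known) / len(known)


def normalize(row):
    """Subtract the user's mean rating from every known rating."""
    m = mean_rating(row)
    return [None if r is None else r - m for r in row]


def pearson(row_a, row_b):
    """Pearson similarity computed over the items both users rated."""
    a, b = normalize(row_a), normalize(row_b)
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    den = math.sqrt(sum(x * x for x, _ in pairs)) * math.sqrt(
        sum(y * y for _, y in pairs)
    )
    return num / den if den else 0.0


def predict(test_name, item, all_ratings):
    """Weighted vote of normalized ratings (assumed normalization: sum of |w|)."""
    test_row = all_ratings[test_name]
    num = den = 0.0
    for name, row in all_ratings.items():
        if name == test_name or row[item] is None:
            continue
        w = pearson(row, test_row)
        num += w * (row[item] - mean_rating(row))
        den += abs(w)
    return mean_rating(test_row) + (num / den if den else 0.0)


w1 = pearson(ratings["User 1"], ratings["User 3"])
w2 = pearson(ratings["User 2"], ratings["User 3"])
prediction = predict("User 3", 1, ratings)  # item index 1 is the "?" column
print(round(w1, 2), round(w2, 2), round(prediction, 2))  # 0.85 -0.49 5.52
```

Under these assumptions the unclipped estimate is about 5.52, which is higher than any rating in the table, so a practical system would probably clip predictions to the valid rating range.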
20
Problems with Memory-based Approaches
User 1   ?  5  3  4  2
User 2   4  1  5  ?  5
User 3   5  ?  4  2  5
User 4   1  5  3  5  ?

Most users only rate a few items
Two similar users may not rate the same set of items
-> Clustering users and items

21
Flexible Mixture Model (FMM)

Cluster both users and items simultaneously

User 1   ?  5  3  4  2
User 2   4  1  5  ?  5
User 3   5  ?  4  2  5
User 4   1  5  3  5  ?
User clustering and item clustering are correlated!
22
Flexible Mixture Model (FMM)

Cluster both users and items simultaneously

User Class I:  1;  p(4)=1/4, p(5)=3/4;  3
User Class II:  p(4)=1/4, p(5)=3/4;  p(1)=1/2, p(2)=1/2;  p(4)=1/2, p(5)=1/2
Unknown ratings are gone!
23
Flexible Mixture Model (FMM)
Zu: user class; Zo: item class; U: user; O: item; R: rating
Hidden variables: Zu, Zo; observed variables: U, O, R
24
Flexible Mixture Model Estimation

Annealed Expectation Maximization (AEM) algorithm
E-step: calculate the posterior probability for the hidden variables Zu and Zo
b: temperature for the annealed EM algorithm
M-step: update the parameters
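
The update equations are not preserved in the transcript. As a rough sketch of the E-step for one observed (user, item, rating) triple, the code below assumes the joint factorizes as P(zu) P(zo) P(u|zu) P(o|zo) P(r|zu,zo), and that annealing raises this product to the power b before normalizing. The function name, argument layout, and parameter values are all hypothetical.

```python
def annealed_e_step(p_zu, p_zo, p_u_given_zu, p_o_given_zo, p_r_given_zu_zo, b):
    """Annealed posterior P(zu, zo | u, o, r) for a single observed triple.

    p_zu[i]               = P(zu = i)
    p_zo[j]               = P(zo = j)
    p_u_given_zu[i]       = P(u | zu = i) for the observed user u
    p_o_given_zo[j]       = P(o | zo = j) for the observed item o
    p_r_given_zu_zo[i][j] = P(r | zu = i, zo = j) for the observed rating r
    b                     = temperature (b = 1 gives the ordinary EM posterior)
    """
    joint = [
        [
            (p_zu[i] * p_u_given_zu[i] * p_zo[j] * p_o_given_zo[j]
             * p_r_given_zu_zo[i][j]) ** b
            for j in range(len(p_zo))
        ]
        for i in range(len(p_zu))
    ]
    total = sum(sum(row) for row in joint)
    return [[value / total for value in row] for row in joint]


# Hypothetical parameters: two user classes and two item classes.
p_zu = [0.5, 0.5]
p_zo = [0.6, 0.4]
p_u_given_zu = [0.2, 0.1]
p_o_given_zo = [0.3, 0.05]
p_r_given_zu_zo = [[0.7, 0.1], [0.2, 0.4]]

posterior = annealed_e_step(p_zu, p_zo, p_u_given_zu, p_o_given_zo,
                            p_r_given_zu_zo, b=1.0)
flat = annealed_e_step(p_zu, p_zo, p_u_given_zu, p_o_given_zo,
                       p_r_given_zu_zo, b=0.0)
```

With b = 0 every class pair gets the same posterior weight, and as b approaches 1 the posterior approaches the ordinary EM posterior. This is the smoothing effect that annealing is meant to provide.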

25
Flexible Mixture Model Prediction
Key issue: What user class does the test user belong to?

Fold-in process
Repeat the EM algorithm including ratings from the test user
Fix all the parameters except for P(ut|zu)

26
Another Prob. with Memory-based Approaches
User 1   2  5  3  4  2
User 2   4  1  4  1  3
User 3   5  2  5  2  5
User 4   1  4  2  3  1

Users with similar interests can have different rating patterns
-> Decoupling preference patterns from rating patterns

27
Decoupling Model (DM)
Zu: user class; Zo: item class; U: user; O: item; R: rating
28
Decoupling Model (DM)
Zu: user class; Zo: item class; U: user; O: item; R: rating
Zpref: whether users like items
29
Decoupling Model (DM)
Zu: user class; Zo: item class; U: user; O: item; R: rating
Zpref: whether users like items; ZR: rating class

Separating preference and rating patterns
User class Rating class -> rating R
Zu -> Zpref and ZR Zpref -> r

30
Experiment

Datasets: EachMovie and MovieRating
Evaluation
Mean Absolute Error (MAE): the average absolute deviation of the predicted ratings from the actual ratings on items
The smaller the MAE, the better the performance
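
As a small illustration of the metric, MAE can be computed as below. The function name and the sample numbers are made up for the example and are not taken from the experiments.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical predictions and true ratings for three test items.
example_mae = mean_absolute_error([4.0, 2.5, 5.0], [5, 2, 3])
print(example_mae)  # (1 + 0.5 + 2) / 3
```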

                               MovieRating   EachMovie
Number of users                500           2000
Number of items                1000          1682
Avg. rated items per user      87.7          129.6
Number of rating levels        5             6
31
Experiment Protocol

Test the sensitivity of the proposed model to the amount of training data
Vary the number of training users
MovieRating dataset: 100 and 200 training users
EachMovie dataset: 200 and 400 training users
Test the sensitivity of the proposed model to the information needed for the test user
Vary the number of rated items provided by the test user
5, 10, and 20 items are given with ratings

32
Experimental Results: FMM and other baseline algorithms
A smaller MAE indicates better performance
[Figures: MAE on MovieRating with 200 and 100 training users; MAE on EachMovie with 400 and 200 training users]
33
FMM vs. DM
Smaller value indicates better performance

Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
100              FMM         0.829           0.822            0.807
100              DM          0.791           0.774            0.751
200              FMM         0.800           0.787            0.768
200              DM          0.770           0.753            0.730
Results on MovieRating

Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
200              FMM         1.07            1.04             1.02
200              DM          1.06            1.02             1.00
400              FMM         1.05            1.03             1.01
400              DM          1.04            1.01             0.99
Results on EachMovie
