1
Collaborative Filtering
  • Rong Jin
  • Department of Computer Science and Engineering
  • Michigan State University

2
Outline
  • Brief introduction to information filtering
  • Collaborative filtering
  • Major issues in collaborative filtering
  • Main methods for collaborative filtering
  • Flexible mixture model for collaborative
    filtering
  • Decoupling model for collaborative filtering

3
Short vs. Long Term Info. Need
  • Short-term information need (Ad hoc retrieval)
  • Temporary need, e.g., info about used cars
  • Information source is relatively static
  • User pulls information
  • Application examples: library search, Web search
  • Long-term information need (Filtering)
  • Stable need, e.g., new data mining algorithms
  • Information source is dynamic
  • System pushes information to user
  • Application example: news filtering

4
Examples of Information Filtering
  • News filtering
  • Email filtering
  • Movie/book/product recommenders
  • Literature recommenders
  • And many others

5
Information Filtering
  • Basic filtering question: Will user U like item
    X?
  • Two different ways of answering it
  • Look at what U likes
  • → characterize X → content-based filtering
  • Look at who likes X
  • → characterize U → collaborative filtering
  • Combine content-based filtering and collaborative
    filtering

6
Other Names for Information Filtering
  • Content-based filtering is also called
  • Adaptive Information Filtering in TREC
  • Selective Dissemination of Information (SDI) in
    Library Information Science
  • Collaborative filtering is also called
  • Recommender systems

7
Example: Content-based Filtering
[Figure: the user's rating history]
8
Example: Collaborative Filtering

User 1:   1   5   3   4   3
User 2:   4   1   5   2   5
User 3:   2   ?   3   5   4
9
Collaborative Filtering (CF) vs. Content-based
Filtering (CBF)
  • CF does not need the content of items, while CBF
    relies on the content of items
  • CF is useful when the content of items
  • is unavailable or difficult to acquire
  • is brief and insufficient
  • Example: movie recommendation
  • A movie may be preferred because of
  • its actor
  • its director
  • its popularity

10
Application of Collaborative Filtering
11
Collaborative Filtering
  • Goal: Making filtering decisions for an
    individual user based on the judgments of other
    users

[Figure: the test user u_test with observed ratings 3, 4, 1]
12
Collaborative Filtering
  • Goal: Making filtering decisions for an
    individual user based on the judgments of other
    users
  • General idea
  • Given a user u, find similar users u1, ..., um
  • Predict u's rating based on the ratings of u1, ...,
    um

13
Example: Collaborative Filtering

User 1:   1   5   3   4   3
User 2:   4   1   5   2   5
User 3:   2   ?   3   5   4
14
Memory-based Approaches for CF
  • The key is to find users that are similar to the
    test user
  • Traditional approach
  • Measure the similarity in rating patterns between
    different users
  • Example: the Pearson correlation coefficient

15
Pearson Correlation Coefficient for CF
  • Similarity between a training user y and the test
    user y_0
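The formula image is not reproduced in the transcript. A standard form of the Pearson correlation between the test user y_0 and a training user y, computed over their co-rated items and consistent with the worked example on the following slides, is:

sim(y_0, y) = \frac{\sum_{o \in O(y_0) \cap O(y)} (r_{y_0,o} - \bar{r}_{y_0})(r_{y,o} - \bar{r}_{y})}{\sqrt{\sum_{o} (r_{y_0,o} - \bar{r}_{y_0})^2} \; \sqrt{\sum_{o} (r_{y,o} - \bar{r}_{y})^2}}

where O(y) is the set of items rated by user y and \bar{r}_{y} is y's average rating.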

16
Pearson Correlation Coefficient for CF
  • Estimate ratings for the test user

Weighted vote of normalized rates
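The prediction formula itself is likewise missing from the transcript. A common form of this weighted vote, matching the normalized-rate tables on the next slides, is:

\hat{r}_{y_0,o} = \bar{r}_{y_0} + \frac{\sum_{y} sim(y_0, y)\,(r_{y,o} - \bar{r}_{y})}{\sum_{y} |sim(y_0, y)|}

i.e., the test user's mean rating plus a similarity-weighted average of the other users' mean-centred ratings for item o.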
17
Example
User 1:             1     5     3     4     3
Normalized rate:
User 2:             4     1     5     2     5
Normalized rate:
User 3:             2     ?     3     5     4
Normalized rate:
18
Example
User 1:             1     5     3     4     3
Normalized rate:  -2.2   1.8  -0.2   0.8  -0.2
User 2:             4     1     5     2     5
Normalized rate:   0.6  -2.4   1.6  -1.4   1.6
User 3:             2     ?     3     5     4
Normalized rate:  -1.5         -0.5   1.5   0.5
19
Example
User 1:             1     5     3     4     3
Normalized rate:  -2.2   1.8  -0.2   0.8  -0.2    (similarity to User 3: 0.85)
User 2:             4     1     5     2     5
Normalized rate:   0.6  -2.4   1.6  -1.4   1.6    (similarity to User 3: -0.49)
User 3:             2     ?     3     5     4
Normalized rate:  -1.5         -0.5   1.5   0.5
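The numbers above can be reproduced with a short script. This is a minimal sketch: the item indices 0-4 stand in for the five unnamed items, and the final prediction step (applying the weighted vote from the earlier slide to the missing rating) is an illustrative addition, not shown on the slide.

import numpy as np

ratings = {
    "user1": {0: 1, 1: 5, 2: 3, 3: 4, 4: 3},
    "user2": {0: 4, 1: 1, 2: 5, 3: 2, 4: 5},
    "user3": {0: 2, 2: 3, 3: 5, 4: 4},    # the second item (index 1) is unrated
}

def mean_rating(r):
    return sum(r.values()) / len(r)

def pearson(a, b):
    # Pearson correlation over co-rated items, centring each user by their own mean rating
    common = sorted(set(a) & set(b))
    ma, mb = mean_rating(a), mean_rating(b)
    da = np.array([a[i] - ma for i in common])
    db = np.array([b[i] - mb for i in common])
    return float(da @ db / (np.linalg.norm(da) * np.linalg.norm(db)))

test = ratings["user3"]
sims = {u: pearson(test, ratings[u]) for u in ("user1", "user2")}
print(sims)   # roughly {'user1': 0.85, 'user2': -0.49}, matching the slide

# Weighted vote of the other users' normalized ratings for the unrated item (index 1)
item = 1
num = sum(s * (ratings[u][item] - mean_rating(ratings[u])) for u, s in sims.items())
den = sum(abs(s) for s in sims.values())
print(mean_rating(test) + num / den)   # prediction; values above 5 would normally be clipped to the rating scale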
20
Problems with Memory-based Approaches
User 1:   ?   5   3   4   2
User 2:   4   1   5   ?   5
User 3:   5   ?   4   2   5
User 4:   1   5   3   5   ?
  • Most users only rate a few items
  • Two similar users may not rate the same set
    of items
  • → Cluster users and items

21
Flexible Mixture Model (FMM)
  • Cluster both users and items simultaneously

User 1:   ?   5   3   4   2
User 2:   4   1   5   ?   5
User 3:   5   ?   4   2   5
User 4:   1   5   3   5   ?
User clustering and item clustering are
correlated!
22
Flexible Mixture Model (FMM)
  • Cluster both users and items simultaneously

                 Item Class I          Item Class II         Item Class III
User Class I     1                     p(4)=1/4, p(5)=3/4    3
User Class II    p(4)=1/4, p(5)=3/4    p(1)=1/2, p(2)=1/2    p(4)=1/2, p(5)=1/2
Unknown ratings are gone!
23
Flexible Mixture Model (FMM)
[Graphical model: Z_u = user class, Z_o = item class (hidden variables); U = user, O = item, R = rating (observed variables)]
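The model equations are not included in the transcript. A factorization consistent with this graphical model (as in Si and Jin's FMM formulation) is, up to notation:

P(u, o, r) = \sum_{z_u} \sum_{z_o} P(z_u)\,P(z_o)\,P(u \mid z_u)\,P(o \mid z_o)\,P(r \mid z_u, z_o)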
24
Flexible Mixture Model Estimation
  • Annealed Expectation-Maximization (AEM) algorithm
  • E-step: compute the posterior probability of the
    hidden variables Z_u and Z_o (one common form is
    shown below)
  • b: temperature parameter of the annealed EM algorithm
  • M-step: update the parameters
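The E-step and M-step equations are not reproduced in the transcript. One common form of the annealed E-step, which tempers the posterior with the parameter b (variants differ in exactly which factors are raised to the power b), is:

P(z_u, z_o \mid u, o, r) \propto \bigl[ P(z_u)\,P(z_o)\,P(u \mid z_u)\,P(o \mid z_o)\,P(r \mid z_u, z_o) \bigr]^{b}

with the M-step re-estimating P(z_u), P(z_o), P(u \mid z_u), P(o \mid z_o), and P(r \mid z_u, z_o) from the expected counts.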

25
Flexible Mixture Model Prediction
Key issue: Which user class does the test user
belong to?
  • Fold-in process
  • Repeat the EM algorithm, including the ratings from
    the test user
  • Fix all the parameters except for P(u_t | z_u)
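A minimal sketch of this fold-in step, assuming the mixture factorization above. The function name, array shapes, and the use of a normalized class-weight vector w in place of P(u_t | z_u) are illustrative simplifications, not the authors' implementation.

import numpy as np

def fold_in(test_ratings, p_zu, p_zo, p_o_given_zo, p_r_given_zuzo, n_iters=20):
    # test_ratings: list of (item_index, rating_index) pairs observed for the test user
    # p_zu: (K,) prior over user classes; p_zo: (L,) prior over item classes
    # p_o_given_zo: (n_items, L); p_r_given_zuzo: (n_rating_levels, K, L)
    # All parameters learned on the training users stay fixed; only the test
    # user's class weights are re-estimated.
    K = len(p_zu)
    w = np.full(K, 1.0 / K)              # stand-in for P(u_t | z_u)
    for _ in range(n_iters):
        resp = np.zeros(K)
        for item, rating in test_ratings:
            # E-step: posterior over (z_u, z_o) for this observed rating
            joint = (w[:, None] * p_zu[:, None]
                     * p_zo[None, :] * p_o_given_zo[item][None, :]
                     * p_r_given_zuzo[rating])
            resp += joint.sum(axis=1) / joint.sum()
        # M-step: update only the test user's class weights
        w = resp / resp.sum()
    return w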

26
Another Problem with Memory-based Approaches
User 1:   2   5   3   4   2
User 2:   4   1   4   1   3
User 3:   5   2   5   2   5
User 4:   1   4   2   3   1
  • Users with similar interests can have different
    rating patterns
  • → Decouple preference patterns from rating
    patterns

27
Decoupling Model (DM)
[Graphical model: Z_u = user class, Z_o = item class; U = user, O = item, R = rating]
28
Decoupling Model (DM)
[Graphical model: Z_u = user class, Z_o = item class; U = user, O = item, R = rating; Z_pref = whether the user likes the item]
29
Decoupling Model (DM)
[Graphical model: Z_u = user class, Z_o = item class; U = user, O = item, R = rating; Z_pref = whether the user likes the item; Z_R = rating class]
  • Separating preference and rating patterns
  • User class and rating class → rating R
  • Z_u → Z_pref and Z_R, Z_pref → r

30
Experiment
  • Datasets: EachMovie and MovieRating
  • Evaluation
  • Mean Absolute Error (MAE): the average absolute
    deviation of the predicted ratings from the actual
    ratings on items (formula below)
  • The smaller the MAE, the better the performance
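In symbols, over N test ratings with predicted values \hat{r}_i and actual values r_i:

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{r}_i - r_i \right|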

                             MovieRating   EachMovie
Number of users                  500          2000
Number of items                 1000          1682
Avg. rated items per user       87.7         129.6
Number of rating levels           5             6
31
Experiment Protocol
  • Test the sensitivity of the proposed model to the
    amount of training data
  • Vary the number of training users
  • MovieRating dataset: 100 and 200 training users
  • EachMovie dataset: 200 and 400 training users
  • Test the sensitivity of the proposed model to the
    amount of information available for the test user
  • Vary the number of rated items provided by the
    test user
  • 5, 10, and 20 items are given with ratings

32
Experimental Results: FMM and other baseline algorithms
[Four MAE plots (a smaller MAE indicates better performance): MovieRating with 100 and 200 training users, EachMovie with 200 and 400 training users]
33
FMM vs. DM
A smaller MAE indicates better performance

Results on MovieRating:
Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
100              FMM         0.829           0.822            0.807
100              DM          0.791           0.774            0.751
200              FMM         0.800           0.787            0.768
200              DM          0.770           0.753            0.730

Results on EachMovie:
Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
200              FMM         1.07            1.04             1.02
200              DM          1.06            1.02             1.00
400              FMM         1.05            1.03             1.01
400              DM          1.04            1.01             0.99