CSCI 7000 Modern Information Retrieval, Jim Martin

1
CSCI 7000 Modern Information Retrieval, Jim Martin
  • Lecture 23
  • 12/1/2008

2
Today 12/1
  • Collaborative Filtering
  • Recommender systems

3
Collaborative Filtering
  • Filtering in the IR context generally means
    separating things (documents) into categories.
    That is, classification.
  • Binary special case
  • Yes/No. I care about it or I don't.
  • As we've seen, it's typically done by training
    supervised or unsupervised systems based on the
    contents of the items being classified.
  • By contrast, in collaborative filtering it's done
    by looking primarily at the behavior of other
    users.

4
Recommender Systems
  • Recommender systems are a specific application of
    collaborative filtering.
  • Recommend a product (movie, book, music, etc.) to
    a user based on...
  • That user's past behavior with respect to other
    products
  • Other users' behavior with respect to other
    products
  • Behavior here could mean purchases, or could
    mean explicit ratings

5
Example CF Technology: iTunes Genius
6
Amazon
  • Probably the most well-developed system.
  • Multiple presentations to the user
  • Recommendations for you
  • People who bought X also bought Y
  • Of the people who looked at X, some bought the
    following...
  • Good recommendations drive sales
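A "People who bought X also bought Y" list can be approximated with plain co-occurrence counts over purchase histories. The sketch below is a minimal illustration with hypothetical data (`baskets` and the `also_bought` helper are made up for this example); it is not Amazon's actual algorithm, which the slide does not describe.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(baskets):
    """Map each item to a Counter of the other items bought by the same users."""
    co = defaultdict(Counter)
    for items in baskets.values():
        # set() ignores repeat purchases; permutations yields both (x, y) and (y, x).
        for x, y in permutations(set(items), 2):
            co[x][y] += 1
    return co


# Hypothetical purchase histories: user -> items bought.
baskets = {
    "ann": ["python_book", "sql_book", "lamp"],
    "bob": ["python_book", "sql_book"],
    "cat": ["python_book", "desk"],
    "dan": ["sql_book"],
}
co = also_bought(baskets)
print(co["python_book"].most_common(1))  # [('sql_book', 2)]
```

Raw counts favor popular items; a real system would likely normalize them, but that refinement is beyond what the slide covers.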

7
Netflix Prize
  • Improve the Netflix recommender system by 10% and
    win $1M.

8
Basic Approach
  • You have
  • A bunch of items
  • A bunch of users
  • And data recording users' behavior with respect to
    the items
  • Ratings (ordinals or reals)
  • Purchases (binary)
  • So let's do what we always do and make a matrix. In
    this case
  • A user x item matrix
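A minimal sketch of the user x item matrix, assuming the data arrives as hypothetical `(user, item, value)` triples. Since most cells will be empty, it stores only the observed cells as a dict of dicts rather than a dense array; `build_matrix` is an illustrative name, not a library function.

```python
def build_matrix(events, binary=False):
    """Build a sparse user x item matrix as a dict of dicts (user -> {item: value}).

    events: (user, item, value) triples. With binary=True the value is ignored
    and every observed (user, item) pair is stored as 1, e.g. for purchases.
    Pairs that never appear are simply absent: those are the empty cells.
    """
    matrix = {}
    for user, item, value in events:
        matrix.setdefault(user, {})[item] = 1 if binary else value
    return matrix


# Hypothetical data: explicit 1-5 star ratings and raw purchase events.
ratings = build_matrix([("ann", "Alien", 5), ("ann", "Heat", 3), ("bob", "Alien", 4)])
purchases = build_matrix([("ann", "Alien", None), ("bob", "Up", None)], binary=True)
print(ratings)    # {'ann': {'Alien': 5, 'Heat': 3}, 'bob': {'Alien': 4}}
print(purchases)  # {'ann': {'Alien': 1}, 'bob': {'Up': 1}}
```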

9
Basic Approach
  • As usual, the matrix is mostly empty.
  • These empty cells represent opportunities to make
    recommendations.

[Figure: a sparse user x item matrix, axes labeled Users and Items]
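To make the "mostly empty" point concrete, this sketch (hypothetical data and helper names) measures what fraction of cells is filled and lists the empty cells in one user's row, which are exactly the candidates for recommendation.

```python
# Hypothetical 3-user, 3-item ratings matrix; absent keys are empty cells.
matrix = {
    "ann": {"Alien": 5, "Heat": 3},
    "bob": {"Alien": 4, "Up": 2},
    "cat": {"Up": 5},
}


def all_items(matrix):
    return {item for row in matrix.values() for item in row}


def density(matrix):
    """Fraction of the user x item cells that hold an observed value."""
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * len(all_items(matrix)))


def candidates(matrix, user):
    """The empty cells in one user's row: items we could recommend."""
    return sorted(all_items(matrix) - set(matrix[user]))


print(density(matrix))            # 5 of 9 cells observed
print(candidates(matrix, "cat"))  # ['Alien', 'Heat']
```

In real data such as a large retailer's catalog the density is far lower than in this toy example.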
10
Basic Approaches
  • Given a User x Item matrix you can take three
    basic approaches.
  • User-based
  • Similarity among users
  • Item-based
  • Similarity among items
  • Model-based
  • Fill in the missing cells with plausible values
    based on the structure of the existing matrix

11
User Based Methods
  • For a given user of interest
  • i.e., a customer logged in to Amazon
  • Compute the similarity of all users to the
    current user and select a subset to use as
    predictors
  • Normalize ratings across users
  • Make predictions for current user by using a
    weighted combination of other user ratings.
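The user-based steps can be sketched as below, assuming the similarities between the current user and everyone else have already been computed (the `sims` dict is hypothetical input). Normalization here means mean-centering each user's ratings, one common choice; the prediction is the user's own mean plus a similarity-weighted average of the selected neighbors' deviations from their means.

```python
def user_mean(row):
    return sum(row.values()) / len(row)


def predict_user_based(matrix, sims, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it.

    sims[v] is a precomputed similarity between `user` and user v (hypothetical
    input; Pearson correlation is one choice). Only positively similar users are
    used. Each neighbor's rating is normalized by subtracting that neighbor's
    mean, and the weighted deviation is added back onto `user`'s own mean.
    """
    neighbors = [v for v in matrix
                 if v != user and item in matrix[v] and sims.get(v, 0) > 0]
    neighbors = sorted(neighbors, key=lambda v: sims[v], reverse=True)[:k]
    base = user_mean(matrix[user])
    if not neighbors:
        return base  # nothing to go on: fall back to the user's average rating
    num = sum(sims[v] * (matrix[v][item] - user_mean(matrix[v])) for v in neighbors)
    den = sum(sims[v] for v in neighbors)
    return base + num / den


matrix = {
    "ann": {"Alien": 5, "Heat": 3},
    "bob": {"Alien": 4, "Heat": 2, "Up": 3},
    "cat": {"Alien": 1, "Heat": 5, "Up": 2},
    "dan": {"Heat": 3, "Up": 5},
}
sims = {"bob": 0.9, "cat": -0.8, "dan": 0.5}  # hypothetical similarities to ann
print(predict_user_based(matrix, sims, "ann", "Up"))  # 4 + (0.9*0 + 0.5*1) / 1.4
```

Subtracting each neighbor's mean is what makes ratings comparable across users: a harsh rater's 3 and a generous rater's 5 can both mean "above my usual".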

12
User-Based Methods
  • Pearson correlation
  • Ranges from -1 to 1
  • 1 means the ratings between users are perfectly
    correlated
  • -1 means they're perfectly anti-correlated
  • 0 means that there's no observable correlation.
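A sketch of Pearson correlation between two users, computed only over the items both have rated (a common convention; the means here are taken over those shared items). Returning 0.0 when the correlation is undefined (fewer than two shared items, or no variance) is an assumption of this sketch, not a standard.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two users' ratings (dicts item -> rating)."""
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return 0.0  # assumption: treat "too little overlap" as no correlation
    ma = sum(a[i] for i in common) / n
    mb = sum(b[i] for i in common) / n
    cov = sum((a[i] - ma) * (b[i] - mb) for i in common)
    va = sum((a[i] - ma) ** 2 for i in common)
    vb = sum((b[i] - mb) ** 2 for i in common)
    if va == 0 or vb == 0:
        return 0.0  # a user who rates everything the same has no correlation
    return cov / sqrt(va * vb)


ann = {"Alien": 5, "Heat": 3, "Up": 1}
print(pearson(ann, {"Alien": 4, "Heat": 3, "Up": 2}))  # 1.0
print(pearson(ann, {"Alien": 1, "Heat": 3, "Up": 5}))  # -1.0
print(pearson(ann, {"Alien": 2, "Heat": 4, "Up": 2}))  # 0.0
```

Note that the first pair correlates perfectly even though the second user's ratings are compressed toward the middle: Pearson measures agreement in direction, not in scale.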

13
Item-Based Methods
  • For the user of interest, select a subset of
    highly rated items from their vector.
  • Compute the similarity between those items'
    vectors and the vectors of the items this user
    hasn't rated yet (the empty entries in their row).
  • Recommend the N closest unrated entries.
  • Basically the same method. Just flips users and
    items.
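An item-based sketch that flips the user-based idea: items are compared as column vectors over users. Cosine similarity (missing entries as 0) and scoring each unrated item by its best similarity to any highly rated item are assumptions chosen for brevity; `min_rating` and `n` are hypothetical parameters.

```python
from math import sqrt


def item_vectors(matrix):
    """Flip user -> {item: rating} into item -> {user: rating} (the columns)."""
    cols = {}
    for user, row in matrix.items():
        for item, rating in row.items():
            cols.setdefault(item, {})[user] = rating
    return cols


def cosine(u, v):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend_items(matrix, user, n=2, min_rating=4):
    """Recommend the n unrated items closest to the user's highly rated items."""
    cols = item_vectors(matrix)
    liked = [i for i, r in matrix[user].items() if r >= min_rating]
    unrated = [j for j in cols if j not in matrix[user]]
    # Score each unrated item by its best similarity to any liked item.
    scores = {j: max((cosine(cols[i], cols[j]) for i in liked), default=0.0)
              for j in unrated}
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]


matrix = {  # hypothetical ratings
    "ann": {"Alien": 5, "Heat": 2},
    "bob": {"Alien": 5, "Aliens": 5, "Up": 1},
    "cat": {"Alien": 4, "Aliens": 4, "Up": 2},
    "dan": {"Heat": 5, "Up": 5},
}
print(recommend_items(matrix, "ann"))  # ['Aliens', 'Up']
```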

14
Model-Based Methods
  • Treat the missing entries like the zero counts in
    a language model or other probabilistic model.
  • One approach is to just use LSA to do a
    dimensionality reduction on the matrix.
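A minimal LSA-style sketch with NumPy: treat missing entries as 0, take the SVD, keep the top `k` singular values, and read predictions for the empty cells off the rank-`k` reconstruction. The ratings matrix and `k=2` are arbitrary assumptions for illustration.

```python
import numpy as np


def low_rank_fill(R, k):
    """Rank-k SVD reconstruction of a ratings matrix in which 0 means missing."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical 4-user x 4-item ratings; 0 marks an empty cell.
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 1, 1, 4],
], dtype=float)
R_hat = low_rank_fill(R, k=2)

# Predicted scores for the cells that were empty in R.
for u, i in zip(*np.where(R == 0)):
    print(f"user {u}, item {i}: {R_hat[u, i]:.2f}")
```

Treating missing cells as literal zeros drags predictions toward 0; in practice people often subtract user or item means before the decomposition, or fit the factors to the observed entries only.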

15
Evaluation
  • Given a training set, the typical scenario is to
    use a held-out scheme
  • Typically based on holding out a rating, not an
    item or a user
  • For example
  • Hold out 1 rating per user (row) and predict the
    missing entries.
  • Compare predicted values to known values.
  • Average over the users
  • Or
  • Sample K ratings per user and predict the missing
    values.
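The hold-out-one-rating scheme can be sketched as follows, with a deliberately trivial baseline (each user's mean training rating) standing in for a real recommender. Mean absolute error is an assumed choice of comparison metric (RMSE is another common one), and all names here are hypothetical.

```python
import random


def hold_out_one(matrix, seed=0):
    """Hide one randomly chosen rating per user (users with at least 2 ratings).

    Returns (train, test): train is a copy of matrix without the hidden cells,
    test is a list of (user, item, true_rating) triples to predict.
    """
    rng = random.Random(seed)
    train, test = {}, []
    for user, row in matrix.items():
        row = dict(row)  # copy so the original matrix is untouched
        if len(row) >= 2:
            item = rng.choice(sorted(row))
            test.append((user, item, row.pop(item)))
        train[user] = row
    return train, test


def mae_averaged_over_users(test, predict):
    """Mean absolute error per user, then averaged over users."""
    errors = {}
    for user, item, true in test:
        errors.setdefault(user, []).append(abs(predict(user, item) - true))
    return sum(sum(e) / len(e) for e in errors.values()) / len(errors)


matrix = {
    "ann": {"Alien": 5, "Heat": 3},
    "bob": {"Alien": 4, "Heat": 3, "Up": 2},
    "cat": {"Up": 5},
}
train, test = hold_out_one(matrix)


# Baseline predictor: the user's mean rating in the training data.
def predict(user, item):
    return sum(train[user].values()) / len(train[user])


print(test)
print(mae_averaged_over_users(test, predict))
```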

16
Datasets
  • www.grouplens.org

17
Lots of Issues
  • Fundamental problem underlying all of these
    approaches and the whole paradigm...
  • Stationarity assumption
  • Basically, that things aren't changing in ways
    that matter with respect to the probabilities in
    question
  • But of course that's silly: tastes change and
    people change all the time.

18
Lots of Issues
  • The current way of casting the problem ignores
    the temporal dimension.
  • A user's future behavior depends on their past
    behavior in funny ways that the current model
    doesn't handle well.
  • For example, it doesn't capture marginal utility.
  • How many intro books on a given programming
    language does one person need?
  • The fact that a system might accurately predict
    my hypothetical rating of an item is not a
    prediction about whether or not I care about that
    item at all.

19
Lots of Issues
  • In many settings, various kinds of metadata are
    available
  • Demographic information about users (gender, age,
    location, etc.)
  • Information about the products (price, category,
    prices of similar items, age, features, etc.)
  • Incorporating such information into the basic
    scheme is problematic.

20
Lots of Issues
  • Privacy
  • Storing data on user choices is a tricky business
    with obvious privacy concerns
  • Most efforts rely on de-identification
    techniques...
  • Replace any real-world identifier with a unique
    meaningless term...
  • But that doesn't always work
  • AOL fiasco
  • Netflix partially hacked
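A sketch of the de-identification step described above, assuming a hypothetical rating log with a `user` field. Each real identifier is replaced, consistently, by a random token from Python's `secrets` module, so each user's rows stay linked and the data remains usable for collaborative filtering.

```python
import secrets


def pseudonymize(records, field="user"):
    """Replace each real identifier in `field` with a random, meaningless token.

    The same identifier always gets the same token, so a user's history stays
    linked together. Returns the rewritten records and the mapping, which
    would have to be kept secret (or discarded).
    """
    mapping = {}
    out = []
    for rec in records:
        real = rec[field]
        if real not in mapping:
            mapping[real] = secrets.token_hex(8)  # 16 hex characters
        out.append({**rec, field: mapping[real]})
    return out, mapping


records = [  # hypothetical rating log
    {"user": "alice@example.com", "item": "Alien", "rating": 5},
    {"user": "alice@example.com", "item": "Heat", "rating": 3},
    {"user": "bob@example.com", "item": "Alien", "rating": 4},
]
anon, mapping = pseudonymize(records)
for rec in anon:
    print(rec)
```

This also shows why it doesn't always work: the tokens hide names but not behavior. Each token still links a user's full history, and that history alone can be distinctive enough to point back to a real person.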

21
Readings for Collaborative Filtering
  • Read Chapter 2 of Collective Intelligence
  • And the Adomavicius et al. and McNee et al.
    papers linked to on the web page.

22
Quiz
  • Quiz is Wednesday
  • Focus on material since last quiz
  • Information extraction
  • Sentiment and opinion
  • Recommender systems
  • But you should remember the earlier stuff
  • Retrieval, classification and clustering