Yehuda Koren - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
Collaborative Filtering with Temporal Dynamics
  • Yehuda Koren

2
Recommender systems
We Know What You Ought To Be Watching This Summer
3
Collaborative filtering
  • Recommend items based on past transactions of
    users
  • Specific data characteristics are irrelevant
  • Domain-free
  • Can identify elusive aspects
  • Two popular approaches
      • Matrix factorization
      • Neighborhood

4
Movie rating data
Training data
Test data
5
Achievable RMSEs on the Netflix data
  Global average          1.1296
    [Find better items]
  User average            1.0651
  Movie average           1.0533
    [Personalization]
  Cinematch               0.9514  (baseline)
    [Algorithmics]
  Static neighborhood     0.9002
  Static factorization    0.8911
    [Time effects]
  Leader                  0.8558  (10.05% improvement)
  Inherent noise          ????
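The chart ranks methods by root-mean-square error (RMSE) on held-out ratings, and the percentage is the relative reduction from Cinematch's 0.9514. A minimal Python sketch of both calculations, using only numbers from the chart; the function names are hypothetical:

```python
import math

CINEMATCH_RMSE = 0.9514  # Cinematch, the baseline on the chart


def rmse(predictions, ratings):
    """Root-mean-square error between paired predictions and true ratings."""
    if not ratings or len(predictions) != len(ratings):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predictions, ratings)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_over_cinematch(score):
    """Percentage reduction in RMSE relative to the Cinematch baseline."""
    return 100.0 * (CINEMATCH_RMSE - score) / CINEMATCH_RMSE


print(f"{improvement_over_cinematch(0.8558):.2f}%")  # Leader: prints 10.05%
print(f"{improvement_over_cinematch(0.8563):.2f}%")  # Grand Prize: prints 10.00%
```

Because RMSE squares each error before averaging, a few large misses cost more than many small ones.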
6
Something Happened in Early 2004
[Figure: time plot with early 2004 marked]
7
Are movies getting better with time?
8
Multiple sources of temporal dynamics
  • Item-side effects
      • Product perception and popularity are constantly
        changing
      • Seasonal patterns influence item popularity
  • User-side effects
      • Customers continually redefine their taste
      • Transient, short-term bias; anchoring
      • Drifting rating scale
      • Change of rater within household

9
Temporal dynamics - challenges
  • Multiple sources: both items and users are
    changing over time
  • Multiple targets: each user/item forms a unique
    time series ⇒ scarce data per target
  • Inter-related targets: signal needs to be shared
    among users (the foundation of collaborative
    filtering) ⇒ cannot isolate multiple problems
  • ⇒ Common concept drift methodologies won't
    hold. E.g., underweighting older instances is
    unappealing

10
Basic matrix factorization model
[Figure: a users × items matrix approximated by user and item factors; caption: A rank-3 SVD approximation]
11
Estimate unknown ratings as inner-products of
factors
[Figure: users × items matrix with an unknown entry (?); caption: A rank-3 SVD approximation]
13
Estimate unknown ratings as inner-products of
factors
[Figure: users × items matrix with the unknown entry estimated as 2.4; caption: A rank-3 SVD approximation]
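The estimate on these slides is an inner product: each user and each item gets a short vector of factors, and the predicted rating is their dot product. A minimal sketch; the factor values below are hypothetical, chosen only so the result matches the 2.4 shown on the slide:

```python
from typing import Sequence


def predict_rating(user_factors: Sequence[float], item_factors: Sequence[float]) -> float:
    """Estimate a rating as the inner product of a user's and an item's factors."""
    if len(user_factors) != len(item_factors):
        raise ValueError("factor vectors must have the same rank")
    return sum(p * q for p, q in zip(user_factors, item_factors))


# Hypothetical rank-3 factors for one user and one item.
p_u = [1.0, 0.5, 2.0]
q_i = [0.8, 0.8, 0.6]
print(round(predict_rating(p_u, q_i), 2))  # prints 2.4
```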
14
Matrix factorization model
  • Properties
      • SVD isn't defined when entries are unknown ⇒ use
        specialized methods
      • Can easily overfit, sensitive to regularization
      • Need to separate main effects

15
Baseline predictors
  • Mean rating 3.7 stars
  • The Sixth Sense is 0.5 stars above avg
  • Joe rates 0.2 stars below avg
  • ⇒ Baseline prediction: Joe will rate The Sixth
    Sense 4 stars
  • No user-item interaction

16
Factor model correction
  • Both The Sixth Sense and Joe are placed high on
    the Supernatural Thrillers scale
  • ⇒ Adjusted estimate: Joe will rate The Sixth Sense
    4.5 stars
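The Joe example can be checked directly: the baseline is 3.7 + 0.5 - 0.2 = 4.0, and the factor model adds a user-item interaction on top. In the sketch below the one-dimensional "Supernatural Thrillers" factor values are hypothetical, picked so their product is the 0.5-star correction from the slide:

```python
from typing import Sequence

MU = 3.7       # mean rating
B_USER = -0.2  # Joe rates 0.2 stars below average
B_ITEM = 0.5   # The Sixth Sense is 0.5 stars above average


def baseline(mu: float, b_u: float, b_i: float) -> float:
    """Baseline predictor: no user-item interaction."""
    return mu + b_u + b_i


def with_interaction(mu: float, b_u: float, b_i: float,
                     p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """Baseline plus the user-item interaction (inner product of factors)."""
    return baseline(mu, b_u, b_i) + sum(p * q for p, q in zip(p_u, q_i))


# Hypothetical "Supernatural Thrillers" factors whose product is 0.5.
p_joe = [1.0]
q_sixth_sense = [0.5]

print(round(baseline(MU, B_USER, B_ITEM), 2))  # prints 4.0
print(round(with_interaction(MU, B_USER, B_ITEM, p_joe, q_sixth_sense), 2))  # prints 4.5
```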

17
Matrix factorization with biases
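Baseline predictors: $\mu$ = global average, $b_u$ = bias of user u, $b_i$ = bias of item i
User-item interaction: $p_u$ = user u's factors, $q_i$ = item i's factors
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$$
⇒ Minimization problem (the $\lambda$ term is the regularization):
$$\min_{b_*, q_*, p_*} \sum_{(u,i) \in \mathcal{K}} \left(r_{ui} - \mu - b_u - b_i - q_i^T p_u\right)^2 + \lambda\left(b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)$$
where $\mathcal{K}$ is the set of (u, i) pairs whose rating is known.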
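The slide does not say how the minimization is solved. A sketch using stochastic gradient descent, one common choice: each known rating nudges the biases and factors against its error, and the regularization weight shrinks them toward zero. The rank `k`, learning rate `lr`, weight `lam`, epoch count and toy data are all hypothetical:

```python
import random
from typing import Callable, List, Tuple


def train_biased_mf(
    ratings: List[Tuple[int, int, float]],  # (user, item, rating), integer ids
    n_users: int,
    n_items: int,
    k: int = 2,         # number of factors (hypothetical)
    lr: float = 0.05,   # learning rate (hypothetical)
    lam: float = 0.02,  # regularization weight lambda (hypothetical)
    epochs: int = 300,
    seed: int = 0,
) -> Callable[[int, int], float]:
    """Fit b_u, b_i, p_u, q_i by SGD on the regularized squared error."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global average, kept fixed
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    # Small random factors break the symmetry between dimensions.
    p = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u: int, i: int) -> float:
        return mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # One gradient step per rating; the lam terms shrink parameters toward 0.
            b_u[u] += lr * (err - lam * b_u[u])
            b_i[i] += lr * (err - lam * b_i[i])
            for f in range(k):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - lam * pf)
                q[i][f] += lr * (err * pf - lam * qf)
    return predict


# Hypothetical toy data: 3 users, 3 items, 6 known ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
predict = train_biased_mf(toy, n_users=3, n_items=3)
print(round(predict(0, 2), 2))  # estimate for an unrated (user, item) pair
```

Raising `lam` trades training fit for less overfitting, which is the sensitivity to regularization noted earlier.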
18
Addressing temporal dynamics
  • Factor model conveniently allows separately
    treating different aspects
  • We observe changes in
      • Rating scale of individual users
      • Popularity of individual items
      • User preferences

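$$\hat{r}_{ui}(t) = \underbrace{\mu + b_u(t) + b_i(t)}_{\text{baseline predictors}} + q_i^T \underbrace{p_u(t)}_{\text{user factors}}$$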
19
Parameterizing the model
  • Use functional forms $b_u(t) = f(u,t)$, $b_i(t) = g(i,t)$,
    $p_u(t) = h(u,t)$
  • Need to find adequate $f()$, $g()$, $h()$
  • General guidelines
      • Items show slower temporal changes
      • Users exhibit frequent and sudden changes
      • Factors $p_u(t)$ are expensive to model
      • Gain flexibility by heavily parameterizing the
        functions
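One way to read these guidelines, sketched for the biases only. The specific forms are an assumption for illustration, not taken from the slide: the item bias gets a coarse time-bin offset (items change slowly), while the user bias gets a linear drift plus a single-day offset (users change frequently and suddenly). Every field name (`bin_size`, `alpha_u`, `t_u`, `b_ut`, ...) and all numbers are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TimeAwareBaseline:
    """mu + b_u(t) + b_i(t) with illustrative (assumed) forms for f and g.

    Times t are integer day indices.
    """
    mu: float
    bin_size: int = 30  # days per item time bin (assumed)
    b_i: Dict[int, float] = field(default_factory=dict)                  # static item bias
    b_i_bin: Dict[Tuple[int, int], float] = field(default_factory=dict)  # (item, bin) offset
    b_u: Dict[int, float] = field(default_factory=dict)                  # static user bias
    alpha_u: Dict[int, float] = field(default_factory=dict)              # user drift per day
    t_u: Dict[int, float] = field(default_factory=dict)                  # user's mean rating day
    b_ut: Dict[Tuple[int, int], float] = field(default_factory=dict)     # (user, day) offset

    def item_bias(self, i: int, t: int) -> float:
        """g(i, t): slow change, captured by coarse time bins."""
        return self.b_i.get(i, 0.0) + self.b_i_bin.get((i, t // self.bin_size), 0.0)

    def user_bias(self, u: int, t: int) -> float:
        """f(u, t): gradual linear drift plus a sudden single-day offset."""
        drift = self.alpha_u.get(u, 0.0) * (t - self.t_u.get(u, t))
        return self.b_u.get(u, 0.0) + drift + self.b_ut.get((u, t), 0.0)

    def predict(self, u: int, i: int, t: int) -> float:
        return self.mu + self.user_bias(u, t) + self.item_bias(i, t)


model = TimeAwareBaseline(
    mu=3.7,
    b_i={7: 0.5}, b_i_bin={(7, 3): 0.1},
    b_u={1: -0.2}, alpha_u={1: 0.001}, t_u={1: 100.0}, b_ut={(1, 100): 0.3},
)
print(round(model.predict(1, 7, 100), 2))  # day 100 carries a single-day user offset
print(round(model.predict(1, 7, 130), 2))  # later bin, no day offset, some drift
```

Unknown users, items, bins or days fall back to zero offsets, so the estimate degrades gracefully to the global average.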

20
Achievable RMSEs on the Netflix data
  Global average          1.1296
    [Find better items]
  User average            1.0651
  Movie average           1.0533
    [Personalization]
  Cinematch               0.9514  (baseline)
    [Algorithmics]
  Static neighborhood     0.9002
  Static factorization    0.8911
    [Time effects]
  Dynamic factorization   0.8794
  Grand Prize             0.8563  (10% improvement)
  Inherent noise          ????
21
Neighborhood-based CF
  • Earliest and most common collaborative filtering
    method
  • Derive unknown ratings from those of similar
    items (item-item variant)

22
Neighborhood modeling
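Use item-item weights $w_{ij}$ to relate items:
$$\hat{r}_{ui} = b_{ui} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} \left(r_{uj} - b_{uj}\right) w_{ij}$$
  • $\hat{r}_{ui}$: the rating of user u for item i that we need to estimate
  • $b_{ui} = \mu + b_u + b_i$: baseline predictor
  • $r_{uj} - b_{uj}$: deviation from the baseline estimate for item j
  • $w_{ij}$: weight from j to i
  • $R(u)$: set of items rated by u
  • The $b_{uj}$ are constants; $b_u$, $b_i$, and $w_{ij}$ are learned from the data through optimization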
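A sketch of the item-item prediction: start from the baseline for (u, i), then add u's deviations on other items j, each scaled by the weight from j to i. It assumes a $|R(u)|^{-1/2}$ normalization that damps the sum for heavy raters, and for simplicity it uses the same biases for $b_{ui}$ and $b_{uj}$. All numbers are hypothetical:

```python
import math
from typing import Dict, Tuple


def neighborhood_predict(
    u: int,
    i: int,
    mu: float,
    b_u: Dict[int, float],
    b_i: Dict[int, float],
    user_ratings: Dict[int, Dict[int, float]],  # u -> {j: r_uj} for j in R(u)
    w: Dict[Tuple[int, int], float],            # (i, j) -> w_ij, weight from j to i
) -> float:
    """Baseline for (u, i) plus weighted deviations of u's other ratings."""

    def baseline(user: int, item: int) -> float:
        return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

    rated = {j: r for j, r in user_ratings.get(u, {}).items() if j != i}
    if not rated:
        return baseline(u, i)
    total = sum((r_uj - baseline(u, j)) * w.get((i, j), 0.0) for j, r_uj in rated.items())
    return baseline(u, i) + total / math.sqrt(len(rated))  # |R(u)|^(-1/2) normalization


# Hypothetical numbers: user 0 rated items 2 and 3; we estimate item 1.
pred = neighborhood_predict(
    u=0, i=1, mu=3.7,
    b_u={0: -0.2}, b_i={1: 0.5, 2: 0.0, 3: 0.3},
    user_ratings={0: {2: 4.5, 3: 3.8}},
    w={(1, 2): 0.4, (1, 3): 0.1},
)
print(round(pred, 3))
```

Item 3 was rated exactly at its baseline, so it contributes nothing regardless of its weight; only surprises relative to the baseline carry signal.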
23
Optimizing the model
Minimize the squared error function
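The slide does not name an optimizer. A sketch of one stochastic gradient descent pass, assuming a squared error plus an L2 penalty `lam` on $b_u$, $b_i$ and $w_{ij}$; the precomputed $b_{uj}$ stay fixed while $b_u$, $b_i$, $w_{ij}$ are updated. Excluding i from its own neighbor set during training is a choice made here so the target rating does not predict itself. Names and toy data are hypothetical:

```python
import math
import random
from typing import Dict, List, Tuple


def sgd_epoch(
    ratings: List[Tuple[int, int, float]],        # (u, i, r_ui) training triples
    mu: float,
    b_u: Dict[int, float],                         # learned, updated in place
    b_i: Dict[int, float],                         # learned, updated in place
    w: Dict[Tuple[int, int], float],               # learned w_ij, updated in place
    user_ratings: Dict[int, Dict[int, float]],     # u -> {j: r_uj}
    fixed_baseline: Dict[Tuple[int, int], float],  # (u, j) -> b_uj, held constant
    lr: float = 0.01,
    lam: float = 0.02,
    seed: int = 0,
) -> float:
    """One pass of stochastic gradient descent; returns the epoch's squared error."""
    order = list(ratings)
    random.Random(seed).shuffle(order)
    sse = 0.0
    for u, i, r in order:
        # Neighbors of i among u's ratings, excluding i itself (the target).
        neighbors = [j for j in user_ratings[u] if j != i]
        norm = 1.0 / math.sqrt(len(neighbors)) if neighbors else 0.0
        dev = {j: user_ratings[u][j] - fixed_baseline[(u, j)] for j in neighbors}
        pred = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0) + norm * sum(
            dev[j] * w.get((i, j), 0.0) for j in neighbors
        )
        err = r - pred
        sse += err * err
        # Gradient steps on the regularized squared error.
        b_u[u] = b_u.get(u, 0.0) + lr * (err - lam * b_u.get(u, 0.0))
        b_i[i] = b_i.get(i, 0.0) + lr * (err - lam * b_i.get(i, 0.0))
        for j in neighbors:
            w_ij = w.get((i, j), 0.0)
            w[(i, j)] = w_ij + lr * (err * norm * dev[j] - lam * w_ij)
    return sse


# Hypothetical toy data: three users who each rated the same three items.
user_ratings = {0: {0: 5.0, 1: 4.0, 2: 1.0}, 1: {0: 4.0, 1: 5.0, 2: 2.0}, 2: {0: 1.0, 1: 2.0, 2: 5.0}}
ratings = [(u, i, r) for u, items in user_ratings.items() for i, r in items.items()]
mu = sum(r for _, _, r in ratings) / len(ratings)
fixed_baseline = {(u, i): mu for u, i, _ in ratings}  # simplest possible constant b_uj
b_u, b_i, w = {}, {}, {}
errors = [sgd_epoch(ratings, mu, b_u, b_i, w, user_ratings, fixed_baseline) for _ in range(100)]
print(round(errors[0], 3), round(errors[-1], 3))
```

With the $b_{uj}$ held constant the prediction is linear in the learned parameters, so this cost is convex and small steps steadily reduce it.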
24
Making the model time-aware
  • A popular scheme: instance weighting, i.e., decay the
    significance of outdated events within the cost
    function

[Equation: cost function with a time-decay weight]
Don't do this!
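For contrast, a sketch of the instance-weighting scheme this slide warns against: each squared error is multiplied by a weight that shrinks with the rating's age, so old ratings barely influence training. The exponential form and the `decay` rate are assumptions, and the ratings are hypothetical:

```python
import math
from typing import Callable, List, Tuple


def time_decayed_cost(
    ratings: List[Tuple[int, int, float, float]],  # (user, item, rating, day rated)
    predict: Callable[[int, int], float],
    now: float,
    decay: float,                                  # hypothetical decay rate per day
) -> float:
    """Instance-weighted squared error: each term is scaled by exp(-decay * age)."""
    cost = 0.0
    for u, i, r, t in ratings:
        weight = math.exp(-decay * (now - t))
        cost += weight * (r - predict(u, i)) ** 2
    return cost


# Two ratings with the same 1-star error, one from today and one 1,000 days old.
history = [(0, 0, 4.0, 1000.0), (0, 0, 4.0, 0.0)]
cost = time_decayed_cost(history, lambda u, i: 3.0, now=1000.0, decay=0.01)
print(cost)  # the old rating contributes only about 4.5e-5
```

This is exactly the signal loss the next slide objects to: the old rating still says something about time-invariant item-item relations, but the weight throws it away.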
25
Why isn't instance weighting suitable?
  • Not enough data per user: need to exploit all
    signal, including old signal
  • The learned parameters $w_{ij}$ represent
    time-invariant item-item relations, which can
    also be deduced from older actions
  • Two items are related when users rated them
    similarly within a short timeframe, even if this
    happened long ago
  • How to do it right?

26
Time-aware neighborhood model
  • Decay item-item relations based on time distance
  • User-specific decay rate controlled by $\beta_u$
  • All past user behavior is equally considered
    through the cost function
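A sketch of the time-aware idea: the cost function is left unweighted, and instead each neighbor's contribution to the item-item term is multiplied by $e^{-\beta_u |t - t_j|}$, where $t_j$ is when u rated j. It reuses the assumed $|R(u)|^{-1/2}$ normalization; all names and numbers are hypothetical:

```python
import math
from typing import Dict, Tuple


def temporal_neighborhood_predict(
    u: int,
    i: int,
    t: float,                                            # day of the rating being predicted
    mu: float,
    b_u: Dict[int, float],
    b_i: Dict[int, float],
    beta_u: Dict[int, float],                            # user-specific decay rate
    history: Dict[int, Dict[int, Tuple[float, float]]],  # u -> {j: (r_uj, t_j)}
    w: Dict[Tuple[int, int], float],                     # (i, j) -> w_ij
) -> float:
    """Neighborhood estimate whose item-item terms decay with |t - t_j|."""

    def baseline(item: int) -> float:
        return mu + b_u.get(u, 0.0) + b_i.get(item, 0.0)

    rated = {j: rt for j, rt in history.get(u, {}).items() if j != i}
    if not rated:
        return baseline(i)
    beta = beta_u.get(u, 0.0)
    total = 0.0
    for j, (r_uj, t_j) in rated.items():
        decay = math.exp(-beta * abs(t - t_j))  # relation to j fades with time distance
        total += decay * (r_uj - baseline(j)) * w.get((i, j), 0.0)
    return baseline(i) + total / math.sqrt(len(rated))


params = dict(
    u=0, i=1, mu=3.7, b_u={0: -0.2}, b_i={1: 0.5, 2: 0.0},
    beta_u={0: 0.01}, history={0: {2: (4.5, 90.0)}}, w={(1, 2): 0.4},
)
for day in (90.0, 100.0, 1000.0):
    print(day, round(temporal_neighborhood_predict(t=day, **params), 3))
```

Unlike instance weighting, every past rating still enters training with full weight; only its influence on a prediction far away in time fades, at a rate each user controls through `beta_u`.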

27
Temporal neighborhood model delivers the same RMSE
improvement over its static version (0.0117) as the
temporal factor model (!)
  Global average          1.1296
    [Find better items]
  User average            1.0651
  Movie average           1.0533
    [Personalization]
  Cinematch               0.9514  (baseline)
    [Algorithmics]
  Static neighborhood     0.9002
  Static factorization    0.8911
  Dynamic neighborhood    0.8885
    [Time effects]
  Dynamic factorization   0.8794
  Grand Prize             0.8563  (10% improvement)
  Inherent noise          ????
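The 0.0117 is the drop in RMSE from each static model to its temporal version, which can be checked from the chart's own numbers:

```python
# RMSE values read off the chart.
static_rmse = {"neighborhood": 0.9002, "factorization": 0.8911}
dynamic_rmse = {"neighborhood": 0.8885, "factorization": 0.8794}

for model, before in static_rmse.items():
    gain = before - dynamic_rmse[model]
    print(f"{model}: RMSE drops by {gain:.4f}")  # 0.0117 for both
```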
28
Lessons
  • Modeling temporal effects is significant in
    improving recommender accuracy
  • Allow multiple time drifting patterns across
    users and items
  • Integrate all users within a single model to
    allow crucial cross-user collaboration
  • Model user behavior along full history, do not
    over-emphasize recent actions
  • Separate long-term values, while excluding
    transient fluctuations from the model
  • Sudden, single-day effects are significant
  • Modeling past temporal fluctuations helps in
    predicting future behavior, even though we do not
    extrapolate future temporal dynamics

29
Yehuda Koren Yahoo! Research yehuda_at_yahoo-inc.com