Learning User Preferences

Transcript and Presenter's Notes
1
Learning User Preferences
Jason Rennie, MIT CSAIL, jrennie@gmail.com
Advisor: Tommi Jaakkola
2
Information Extraction
  • Informal Communication: e-mail, mailing lists,
    bulletin boards
  • Issues:
    • Context switching
    • Abbreviations, shortened forms
    • Variable punctuation, formatting, grammar

3
Thesis Advertisement & Outline
  • Thesis is not an end-to-end IE system
  • We address some IE problems:
    • Identifying & Resolving Named Entities
    • Tracking Context
    • Learning User Preferences

4
Identifying Named Entities
  • "Rialto is now open until 11pm"
  • Facts/opinions are usually about a named entity
  • Tools typically rely on punctuation,
    capitalization, formatting, grammar
  • We developed a criterion to identify
    topic-oriented words using occurrence statistics

Rennie & Jaakkola, SIGIR 2005
5
Resolving Named Entities
  • "They're now open until 11pm"
  • What does "they" refer to?
  • Clustering:
    • Group noun phrases that co-refer
    • McCallum & Wellner (2005)
    • Excellent for proper nouns
  • Our contribution: better modeling of non-proper
    nouns (incl. pronouns)

6
Tracking Context
  • "The Swordfish was fabulous"
  • Indirect comment on a restaurant; the restaurant
    is identified by context
  • Use word statistics to find topic switches
  • Contribution: new sentence clustering algorithm

7
Learning User Preferences
  • Examples:
    • "I loved Rialto last night."
    • "Overall, Oleana was worth the money"
    • "Radius wasn't bad, but wasn't great"
    • "Om was purely pretentious"
  • Issues:
    • Translate text to a partial ordering or rating
    • Predict unobserved ratings

8
Preference Problems
  • Single User w/ Item Features
  • Multi-User, No Features
    • a.k.a. Collaborative Filtering

9
Single User, Item Features
[Figure: a single user's items, each with a feature vector and an observed rating]
10
Single User, Item Features
[Figure: preference scores for the items, with the unknown ("?") mapping from scores to ratings]
11
Many Users, No Features
[Figure: weights x features -> preference scores -> ratings; with many users and no observed features, both the weights and the features must be learned]
12
Collaborative Filtering
  • Possible goals
  • Predict missing entries
  • Cluster users or items
  • Applications
  • Movies, Books
  • Genetic Interaction
  • Network routing
  • Sports performance

[Figure: a partially observed users x items rating matrix]
13
Outline
  • Single User, Features
    • Loss functions, convexity, large margin
    • A loss function for ratings
  • Many Users, No Features
    • Feature selection, rank, SVD
    • Regularization: tie together multiple tasks
    • Optimization: scale to large problems
  • Extensions

14
This Talk: Contributions
  • Implementation and systematic evaluation of loss
    functions for single-user prediction
  • Scaling multi-user regularization to large
    (thousands of users/items) problems
  • Analysis of optimization
  • Extensions:
    • Hybrid: features + multiple users
    • Observation model; multiple ratings

15
Rating Classification
  • n ordered classes
  • Learn a weight vector and thresholds (prediction
    rule sketched below)

[Figure: items with ratings 1-3 projected onto the weight vector w; thresholds split the score line into rating regions]
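A minimal sketch of the prediction rule the figure illustrates (the 3-class example and all values are illustrative):

    import numpy as np

    def predict_rating(w, thetas, x):
        """Rating = 1 + number of (sorted) thresholds the score w.x exceeds."""
        score = np.dot(w, x)
        return 1 + int(np.sum(score > np.asarray(thetas)))

    # Two thresholds split the score line into three rating regions.
    w = np.array([1.0, -0.5])
    thetas = [-0.2, 0.9]
    print(predict_rating(w, thetas, np.array([0.3, 0.1])))  # -> 2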
16
Loss Functions
[Plot: loss as a function of margin agreement, comparing the 0-1, Hinge, Logistic, Smooth Hinge, and Modified Least Squares losses]
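For reference, a sketch of these losses as functions of the margin agreement z (standard forms: the smooth hinge is the piecewise-quadratic version from Rennie & Srebro, ICML 2005, and modified least squares is taken here to be the one-sided squared hinge):

    import numpy as np

    def zero_one(z):
        return (np.asarray(z) <= 0).astype(float)   # non-convex 0-1 loss

    def hinge(z):
        return np.maximum(0.0, 1.0 - np.asarray(z))

    def logistic(z):
        return np.log1p(np.exp(-np.asarray(z)))

    def mod_least_squares(z):
        return np.maximum(0.0, 1.0 - np.asarray(z)) ** 2

    def smooth_hinge(z):
        # Matches the hinge outside (0, 1); quadratic in between,
        # so the derivative is continuous everywhere.
        z = np.asarray(z, dtype=float)
        return np.where(z >= 1.0, 0.0,
                        np.where(z <= 0.0, 0.5 - z, 0.5 * (1.0 - z) ** 2))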
17
Convexity
  • Convex function => no local minima
  • A set is convex if all line segments between
    points in the set lie within the set

18
Convexity of Loss Functions
  • 0-1 loss is not convex:
    • Local minima, sensitive to small changes
  • Convex bound:
    • Large-margin solution with regularization
    • Stronger guarantees

19
Proportional Odds
  • McCullagh introduced the original rating model:
    • Linear interaction: weights · features
    • Thresholds
    • Maximum likelihood (model stated below)

McCullagh, 1980
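A standard statement of the proportional-odds model in this talk's notation (w for the weight vector, theta_j for the thresholds; the formula is reconstructed from the cited paper, not copied from the slide):

    \log \frac{P(y \le j \mid x)}{P(y > j \mid x)} = \theta_j - w \cdot x,
    \qquad j = 1, \dots, n-1

The ordered thresholds \theta_1 \le \dots \le \theta_{n-1} and the weights w are fit by maximum likelihood.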
20
Immediate-Thresholds
Shashua & Levin, 2003
21
Some Errors are Better than Others
22
Not a Bound on Absolute Diff.
[Figure: ratings 1-5 on the score line; the immediate-thresholds loss does not bound the absolute difference between predicted and true ratings]
23
All-Thresholds Loss
Srebro, Rennie & Jaakkola, NIPS 2004
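A sketch of the all-thresholds construction: every threshold contributes a margin penalty, not just the two adjacent to the true rating (the hinge here stands in for any convex margin loss f):

    import numpy as np

    def all_thresholds_loss(score, y, thetas,
                            f=lambda z: np.maximum(0.0, 1.0 - z)):
        """Penalize every threshold: those below the true rating y
        should sit below the score, the rest above it. thetas is
        sorted; thetas[j] separates rating j+1 from rating j+2."""
        thetas = np.asarray(thetas, dtype=float)
        below = thetas[: y - 1]             # want score > theta
        above = thetas[y - 1 :]             # want score < theta
        return f(score - below).sum() + f(above - score).sum()

Because a prediction far from the true rating violates every intervening threshold, the loss grows with the size of the error; the immediate-thresholds loss, which penalizes only the two adjacent thresholds, does not.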
24
Experiments
                 Multi-Class   Imm-Thresh   All-Thresh   p-value
MLS                 .7486         .7491        .6700     1.7e-18
Hinge               .7433         .7628        .6702     6.6e-17
Logistic            .7490         .7248        .6623     7.3e-22
Least Squares      1.3368

Rennie & Srebro, IJCAI 2005
25
Many Users, No Features
[Figure: weights x features -> preference scores -> ratings, repeated from slide 11]
26
Background Lp-norms
  • L0: number of non-zero entries; ||<0,2,0,3,4>||_0 = 3
  • L1: sum of absolute values; ||<2,-2,1>||_1 = 5
  • L2: Euclidean length; ||<1,-1>||_2 = sqrt(2)
  • General: ||v||_p = (sum_i |v_i|^p)^(1/p)
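These norms can be checked directly with numpy (a trivial sketch):

    import numpy as np

    print(np.linalg.norm([0, 2, 0, 3, 4], 0))  # 3.0, count of non-zero entries
    print(np.linalg.norm([2, -2, 1], 1))       # 5.0, sum of absolute values
    print(np.linalg.norm([1, -1], 2))          # 1.414..., Euclidean length sqrt(2)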

27
Background Feature Selection
  • Objective = Loss + Regularization

[Plots: the L1 penalty vs. the squared-L2 penalty]
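In symbols (a standard form of the objective; the trade-off parameter \lambda is notation introduced here, not from the slide):

    J(w) = \mathrm{Loss}(w) + \lambda \|w\|_1
    J(w) = \mathrm{Loss}(w) + \lambda \|w\|_2^2

The L1 penalty drives many weights exactly to zero, performing feature selection; the squared-L2 penalty shrinks weights but leaves them dense.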
28
Singular Value Decomposition
  • X = U S V^T
  • U, V orthogonal (rotations)
  • S diagonal, non-negative
  • Eigenvalues of X X^T = U S V^T V S U^T = U S^2 U^T
    are the squared singular values of X
  • Rank = ||s||_0 (number of non-zero singular values)
  • SVD used to obtain the least-squares low-rank
    approximation
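A minimal numpy sketch of the least-squares low-rank approximation via the SVD (the matrix and target rank are illustrative):

    import numpy as np

    X = np.random.randn(20, 10)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt

    print((s > 1e-10).sum())                  # rank of X = ||s||_0

    k = 3                                     # target rank
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approx (Frobenius norm)
    print(np.linalg.matrix_rank(X_k))         # 3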

29
Low Rank Matrix Factorization
[Figure: X (rank k) ≈ U V^T]
  • Sum-squared loss, fully observed Y:
    use the SVD to find the global optimum
  • Classification error loss, partially observed Y:
    non-convex; no explicit solution
30
Low-Rank Non-Convex Set
31
Trace Norm Regularization
Fazel et al., 2001
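The trace norm of X is the sum of its singular values; it is to rank what the L1 norm is to L0, a convex surrogate that encourages low rank. A two-line sketch (numpy also exposes it as ord='nuc'):

    import numpy as np

    def trace_norm(X):
        """Trace (nuclear) norm: sum of the singular values of X."""
        return np.linalg.svd(X, compute_uv=False).sum()

    X = np.random.randn(5, 4)
    assert np.isclose(trace_norm(X), np.linalg.norm(X, 'nuc'))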
32
Many Users, No Features
[Figure: the many-users picture as a factorization: preference scores X = U V^T, with U the per-user weights and V the item features; ratings Y are observed entries derived from X]
33
Max Margin Matrix Factorization
  • Objective: All-Thresholds Loss + Trace Norm
    regularization
  • Convex function of X and θ
  • Low rank in X

Srebro, Rennie & Jaakkola, NIPS 2004
34
Properties of the Trace Norm
The factorization U√S, V√S minimizes both quantities.
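The two quantities are the two standard variational forms of the trace norm (a reconstruction from the cited literature, not a verbatim copy of the slide):

    \|X\|_\Sigma = \min_{X = U V^\top} \|U\|_F \, \|V\|_F
                 = \min_{X = U V^\top} \tfrac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big)

Given the SVD X = U_0 S V_0^\top, setting U = U_0 S^{1/2} and V = V_0 S^{1/2} attains both minima, each equal to the sum of the singular values.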
35
Factorized Optimization
  • Factorized objective (a tight bound); see the
    sketch below
  • Gradient descent: O(n^3) per round
  • Stationary points, but no local minima

Rennie & Srebro, ICML 2005
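A minimal sketch of the factorized approach; to keep the gradient short it substitutes squared loss on observed entries for the all-thresholds loss, and lam, k, the step size and iteration count are illustrative choices, not the paper's:

    import numpy as np

    def factored_objective_descent(Y, mask, k=10, lam=0.1, lr=0.01, iters=500):
        """Gradient descent on J(U, V) = lam/2 (||U||_F^2 + ||V||_F^2)
        + 1/2 sum over observed entries of (U V^T - Y)^2; the Frobenius
        term is the factorized (tight) bound on the trace norm."""
        n, m = Y.shape
        rng = np.random.default_rng(0)
        U = 0.1 * rng.standard_normal((n, k))
        V = 0.1 * rng.standard_normal((m, k))
        for _ in range(iters):
            R = mask * (U @ V.T - Y)        # residual on observed entries only
            gU = lam * U + R @ V
            gV = lam * V + R.T @ U
            U -= lr * gU
            V -= lr * gV
        return U, V

    # Toy usage: recover a partially observed low-rank matrix.
    rng = np.random.default_rng(1)
    Y = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 20))
    mask = rng.random(Y.shape) < 0.5        # half the entries observed
    U, V = factored_objective_descent(Y, mask)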
36
Collaborative Prediction Results
                EachMovie (36656x1648,       MovieLens (6040x3952,
                 96% sparse)                  96% sparse)
Algorithm       Weak Error   Strong Error    Weak Error   Strong Error
URP               .8596         .8859          .6946         .7104
Attitude          .8787         .8845          .6912         .7000
MMMF              .8548         .8439          .6650         .6725

URP, Attitude: Marlin, 2004
MMMF: Rennie & Srebro, 2005
37
Extensions
  • Multi-user + features (hybrid)
  • Observation model:
    • Predict which restaurants a user will rate, and
    • the rating she will give
  • Multiple ratings per user/restaurant
    • E.g. Food, Service and Décor ratings
  • SVD parameterization

38
Multi-User Features
  • Feature parameters (V):
    • Some are fixed
    • Some are learned
  • Learn weights (U) for all features
  • Fixed part of V does not affect regularization

[Figure: V split into fixed and learned parts]
39
Observation Model
  • Common assumption: ratings are observed at random
  • Restaurant selection depends on geography,
    popularity, price, food style
  • Remove the bias: model the observation process

40
Observation Model
  • Model observation as binary classification
  • Add a binary classification loss
  • Tie together the rating and observation models:
    X = U_X V_X^T, W = U_W V_W^T
41
Multiple Ratings
  • Users may provide multiple ratings
    • E.g. Service, Décor, Food
  • Add the loss functions together
  • Stack the parameter matrices for regularization

42
SVD Parameterization
  • Too many parameters: (UA)(A^-1 V^T) = X is
    another factorization of X for any invertible A
  • Alternative: parameterize as U, S, V
    • U, V orthogonal; S diagonal
  • Advantages:
    • Not over-parameterized
    • Exact objective (not a bound)
    • No stationary points

43
Summary
  • A loss function for ratings
  • Regularization for multiple users
  • Scaled MMMF to large problems (e.g. > 1000x1000)
  • The trace norm is widely applicable
  • Extensions

Code: http://people.csail.mit.edu/jrennie/matlab
44
Thanks!
  • Helen, for supporting me for 7.5 years!
  • Tommi Jaakkola, for answering all my questions
    and directing me to the end!
  • Mike Collins and Tommy Poggio for additional
    guidance.
  • Nati Srebro & John Barnett for endless valuable
    discussions and ideas.
  • Amir Globerson, David Sontag, Luis Ortiz, Luis
    Perez-Breva, Alan Qi, Patrycja Missiuro, and all
    past members of Tommi's reading group for paper
    discussions, conference trips and feedback on my
    talks.
  • Many, many others who have helped me along the
    way!

45
Low-Rank Optimization