Recommender Systems Session B presentation

1
Recommender Systems: Session B

Robin Burke
DePaul University
Chicago, IL

2
Roadmap

Session A Basic Techniques I
Introduction
Knowledge Sources
Recommendation Types
Collaborative Recommendation
Session B Basic Techniques II
Content-based Recommendation
Knowledge-based Recommendation
Session C Domains and Implementation I
Recommendation domains
Example Implementation
Lab I
Session D Evaluation I
Evaluation
Session E Applications
User Interaction
Web Personalization
Session F Implementation II
Lab II

3
Content-Based Recommendation

Collaborative recommendation
requires only ratings
Content-based recommendation
all techniques that use properties of the items
themselves
usually refers to techniques that only use item
features
Knowledge-based recommendation
a sub-type of content-based
in which we apply knowledge
about items and how they satisfy user needs

4
Content-Based Profiling

Suppose we have no other users
but we know about the features of the items rated
by the user
We can imagine building a profile based on user
preferences
here are the kinds of things the user likes
here are the ones he doesn't like
Usually called content-based recommendation

5
Recommendation Knowledge Sources Taxonomy
Recommendation Knowledge
Collaborative
Opinion Profiles
Demographic Profiles
User
Opinions
Query
Demographics
Constraints
Requirements
Preferences
Content
Item Features
Context
Means-ends
Domain Knowledge
Feature Ontology
Contextual Knowledge
Domain Constraints
6
Content-based Profiling
[Figure: each item is a feature vector (a1, a2, a3, a4, ..., ak) with an unknown rating "?". Labeled steps: obtain rated items, build classifier, predict (Y / N) for the remaining items, recommend. Goal: to find relevant items.]
7
Origins

Began with earliest forms of user models
Grundy (Rich, 1979)
Elaborated in information filtering
Selecting news articles (Dumais, 1990)
More recently spam filtering

8
Basic Idea

Record user ratings for items
Generate a model of user preferences over
features
Give as recommendations other items with similar
content
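One way to make these three steps concrete is the sketch below. It assumes binary item features and ratings of +1 (liked) or -1 (disliked); the function names and the cosine-style score are illustrative choices, not something the slides prescribe.

```python
from math import sqrt

def build_profile(rated_items):
    """Sum ratings per feature. rated_items: list of (feature_set, rating),
    where rating is +1 (liked) or -1 (disliked). Hypothetical encoding."""
    profile = {}
    for features, rating in rated_items:
        for f in features:
            profile[f] = profile.get(f, 0) + rating
    return profile

def score(profile, features):
    """Cosine-style similarity between the profile and a binary feature set."""
    if not features or not profile:
        return 0.0
    dot = sum(profile.get(f, 0) for f in features)
    norm_profile = sqrt(sum(w * w for w in profile.values()))
    norm_item = sqrt(len(features))
    return dot / (norm_profile * norm_item) if norm_profile else 0.0
```

Items whose features overlap with liked items score above zero; items that resemble disliked ones score below zero.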

9
Movie Recommendation

Predictions for unseen (target) items are
computed based on their similarity (in terms of
content) to items in the user profile.
E.g., user profile Pu contains
recommend highly and recommend
mildly

10
Content-Based Recommender Systems
11
Personalized Search

How can the search engine determine the user's
context?

Query: "Madonna and Child"

Need to learn the user profile
User is an art historian?
User is a pop music fan?

12
Play List Generation

Music recommendations
Configuration problem
Must take into account other items already in
list

Example: Pandora
13
Algorithms

kNN
Naive Bayes
Neural networks
Any classification technique can be used

14
Naive Bayes

p(A): probability of event A
p(A,B): probability of event A and event B
joint probability
p(A|B): probability of event A given event B
we know B happened
conditional probability
Example
A is a student getting an "A" grade
p(A) = 20%
B is the event of a student coming to less than
50% of meetings
p(A|B) is much less than 20%
p(A,B) would be the probability of both things
how many students are in this category?
Recommender system question
Li is the event that the user likes item i
B is the set of features associated with item i
Estimate p(Li|B)

15
Bayes Rule

p(A|B) = p(B|A) p(A) / p(B)
We can always restate a conditional probability
in terms of
the reverse condition p(B|A)
and two prior probabilities
p(A)
p(B)
Often the reverse condition is easier to know
we can count how often a feature appears in items
the user liked
frequentist assumption
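A small numeric sketch of the rule, using counts invented for illustration: the user liked 10 of 20 movies, one actor appears in 4 of the 10 liked movies and in 6 of the 20 movies overall.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: p(A|B) = p(B|A) * p(A) / p(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical counts, not taken from the slides.
p_liked = 10 / 20                 # p(L): fraction of rated movies the user liked
p_actor_given_liked = 4 / 10      # p(B|L): how often the actor appears in liked movies
p_actor = 6 / 20                  # p(B): how often the actor appears at all

p_liked_given_actor = bayes(p_actor_given_liked, p_liked, p_actor)
```

The "easy" quantities on the right are all simple counts over the user's rated items.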

16
Naive Bayes

Probability of liking an item given its features
p(Li | a1, a2, ..., ak)
think of Li as the class for item i
By the theorem
p(Li | a1, ..., ak) = p(a1, ..., ak | Li) p(Li) / p(a1, ..., ak)

17
Naive Assumption

Independence
the features a1, a2, ... , ak are independent
independent means
p(A,B) = p(A) p(B)
Example
two coin flips: P(heads) = 0.5
P(heads, heads) = 0.25
Anti-example
appearance of the words "Recommendation" and
"Collaborative" in papers by Robin Burke
P("Recommendation") = 0.6
P("Collaborative") = 0.3
P("Recommendation", "Collaborative") = 0.3, not 0.18
In general
this assumption is false for items and their
features
but pretending it is true works well
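The two examples on this slide can be checked mechanically. The helper below is an illustrative sketch (its name and tolerance are assumptions): it compares the observed joint probability with the product of the marginals.

```python
def looks_independent(p_a, p_b, p_ab, tol=1e-9):
    """True if the joint probability equals the product of the marginals."""
    return abs(p_ab - p_a * p_b) <= tol

coin_flips = looks_independent(0.5, 0.5, 0.25)     # two fair coin flips
paper_words = looks_independent(0.6, 0.3, 0.3)     # word co-occurrence from the slide
```

The coin flips pass the check; the word pair does not, because the product of the marginals is 0.18 while the observed joint probability is 0.3.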

18
Naive Assumption

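For joint probability
p(a1, a2, ..., ak) = p(a1) p(a2) ... p(ak)
For conditional probability
p(a1, a2, ..., ak | Li) = p(a1 | Li) p(a2 | Li) ... p(ak | Li)
Bayes' Rule
p(Li | a1, ..., ak) = p(Li) p(a1 | Li) ... p(ak | Li) / (p(a1) ... p(ak))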

19
Frequency Table

Iterate through all examples
if example is "liked"
for each feature a
add one to the cell for that feature under L
similarly for ¬L (not liked)

       L    ¬L
a1
a2
...
ak
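The counting loop can be sketched as follows. The data layout (a list of feature-set / liked-flag pairs, with True standing for L and False for not liked) is an assumption made for the example.

```python
from collections import defaultdict

def build_frequency_table(examples):
    """Count, per feature, how often it appears in liked and not-liked items.

    examples: iterable of (features, liked) pairs, liked being True or False.
    Returns (table, class_counts); table[feature][liked] is a count.
    """
    table = defaultdict(lambda: {True: 0, False: 0})
    class_counts = {True: 0, False: 0}
    for features, liked in examples:
        class_counts[liked] += 1
        for a in features:
            table[a][liked] += 1
    return dict(table), class_counts
```

Updating the profile after a new rating is just one more pass through the loop body, which is why the table is cheap to maintain.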
20
Example

Total # of movies: 20
10 liked
10 not liked

21
Classification: MAP

Maximum a posteriori
Calculate the probabilities for each possible
classification
pick the one with the highest probability
Examples
"12 Monkeys": Pitt, Willis
p(L | 12 Monkeys) = 0.13
p(¬L | 12 Monkeys) = 1
not liked
"Devil's Own": Ford, Pitt
p(L | Devil's Own) = 0.67
p(¬L | Devil's Own) = 0.53
liked
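A minimal sketch of MAP classification under the naive assumption. The frequency table that produced the slide's numbers did not survive, so the counts below are invented; they reproduce the two verdicts but not the exact figures.

```python
def class_score(features, label, table, class_counts):
    """Unnormalized posterior: p(class) times the product of p(feature | class)."""
    total = sum(class_counts.values())
    result = class_counts[label] / total
    for a in features:
        result *= table.get(a, {}).get(label, 0) / class_counts[label]
    return result

def classify_map(features, table, class_counts):
    """Pick the class with the highest score (maximum a posteriori)."""
    return max(class_counts,
               key=lambda label: class_score(features, label, table, class_counts))

# Hypothetical counts for 20 rated movies, 10 liked and 10 not liked.
class_counts = {"liked": 10, "not liked": 10}
table = {
    "Pitt":   {"liked": 4, "not liked": 6},
    "Willis": {"liked": 1, "not liked": 5},
    "Ford":   {"liked": 6, "not liked": 2},
}
twelve_monkeys = classify_map(["Pitt", "Willis"], table, class_counts)
devils_own = classify_map(["Ford", "Pitt"], table, class_counts)
```

The shared denominator of Bayes' rule is dropped because it is the same for both classes and does not change which one wins.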

22
Classification: LL

Log likelihood
For two possibilities
Calculate probabilities
Compute ln( p(Li | a1, ..., ak) / p(¬Li | a1, ..., ak) )
If > 0, then classify as liked
Examples
"12 Monkeys": Pitt, Willis
ratio = 0.13
ln = -2.1
not liked
"Devil's Own": Ford, Pitt
p(L | Devil's Own) = 0.67
p(¬L | Devil's Own) = 0.53
ratio = 1.25
ln = 0.22
liked
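The decision rule can be sketched in a few lines, fed here with the rounded scores shown on the slide. The function names are illustrative.

```python
from math import log

def log_likelihood_ratio(p_liked, p_not_liked):
    """ln of the ratio between the liked and not-liked scores."""
    return log(p_liked / p_not_liked)

def classify_ll(p_liked, p_not_liked):
    """Liked if the log ratio is positive, otherwise not liked."""
    return "liked" if log_likelihood_ratio(p_liked, p_not_liked) > 0 else "not liked"

twelve_monkeys = classify_ll(0.13, 1.0)
devils_own = classify_ll(0.67, 0.53)
```

In practice the log of each factor is often summed instead of multiplying the raw probabilities, which avoids numeric underflow when there are many features.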

23
Smoothing

If a feature never appears in a class
p(aj | L) = 0
that means that it will always veto the
classification
Example
new movie director
cannot be classified as "liked"
because there are no liked instances in which he
is a feature
Solution
Laplace smoothing
add a small constant (e.g., 1) to every count before
starting
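Laplace smoothing is commonly implemented by adding a fixed pseudo-count to every cell of the frequency table. The sketch below assumes binary features (present or absent, hence n_values=2) and a pseudo-count k=1; both defaults are assumptions for illustration.

```python
def smoothed_probability(feature_count, class_count, k=1, n_values=2):
    """Estimate p(feature | class) with add-k smoothing.

    feature_count: items in the class that have the feature
    class_count:   total items in the class
    n_values:      number of values the feature can take (2 for binary)
    """
    return (feature_count + k) / (class_count + k * n_values)

# A new director who appears in none of 10 liked movies no longer vetoes "liked".
p_new_director = smoothed_probability(0, 10)
```

The estimate stays strictly between 0 and 1, so no single feature can force the product to zero.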

24
Naive Bayes

Works surprisingly well
used in spam filtering
Simple implementation
just counting and multiplying
requires O(F) space
where F is the feature set used
easy to update the profile
classification is very fast
Learned classifier can be hard-coded
used in voice recognition and computer games
Try this first

25
Neural Networks
26
Biological inspiration
dendrites
axon
synapses
The information transmission happens at the
synapses.
27
How it works

Source (pre-synaptic)
Tiny voltage spikes travel along the axon
At the axon terminal, neurotransmitter is released into the
synapse
Destination (post-synaptic)
Neurotransmitter absorbed by dendrites
Causes excitation or inhibition
Signals integrated
may produce spikes in the next neuron
Connections
Synaptic connections can be strong or weak

28
Artificial neurons
Neurons work by processing information. They
receive and provide information in the form of
voltage spikes.
[Figure: inputs x1, x2, x3, ..., xn-1, xn, each weighted by w1, w2, w3, ..., wn-1, wn, feed a single neuron that produces output y.]
The McCulloch-Pitts model
29
Artificial neurons
Nonlinear generalization of the McCulloch-Pitts
neuron
y is the neuron's output, x is the vector of
inputs, and w is the vector of synaptic
weights. Examples:
sigmoidal neuron, Gaussian neuron
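The formulas for the two example neurons did not survive in this transcript. The sketch below uses common textbook forms, which are an assumption: a logistic function of the weighted sum for the sigmoidal neuron, and a Gaussian of the distance between x and w (with width a) for the Gaussian neuron.

```python
from math import exp

def sigmoid_neuron(x, w):
    """y = 1 / (1 + exp(-w.x)): output rises smoothly from 0 to 1."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-z))

def gaussian_neuron(x, w, a=1.0):
    """y = exp(-||x - w||^2 / (2 a^2)): output peaks when x equals w."""
    squared_distance = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return exp(-squared_distance / (2.0 * a * a))
```

Both map an input vector x and a weight vector w to a single output y, which is the role the slide describes.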
30
Artificial neural networks
[Figure: network diagram with Inputs and Output labeled]
An artificial neural network is composed of many
artificial neurons that are linked together
according to a specific network architecture. The
objective of the neural network is to transform
the inputs into meaningful outputs.
31
Learning with Back-Propagation

Biological system
seems to modify many synaptic connections
simultaneously
we still don't totally understand this
A simplification of the learning problem
calculate first the changes for the synaptic
weights of the output neuron
calculate the changes backward starting from
layer p-1, and propagate backward the local error
terms
Still relatively complicated
much simpler than the original optimization
problem

32
Application to Recommender Systems

Inputs
features of products
binary features work best
otherwise tricky encoding is required
Output
liked / disliked neurons

33
NN Recommender
[Figure: network with item features as inputs and two output neurons, Liked and Disliked]

Calculate recommendation score as y_liked - y_disliked
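A minimal sketch of the scoring step, assuming a network with no hidden layer and hand-picked weights. A real recommender would learn the weights with back-propagation and would usually have at least one hidden layer; everything named here is hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def recommendation_score(features, w_liked, w_disliked):
    """Score = output of the 'liked' neuron minus output of the 'disliked' neuron."""
    y_liked = sigmoid(sum(w * x for w, x in zip(w_liked, features)))
    y_disliked = sigmoid(sum(w * x for w, x in zip(w_disliked, features)))
    return y_liked - y_disliked

# Invented weights over three binary item features.
w_liked = [2.0, -1.0, 1.0]
w_disliked = [-2.0, 1.0, -1.0]
good_match = recommendation_score([1, 0, 1], w_liked, w_disliked)
poor_match = recommendation_score([0, 1, 0], w_liked, w_disliked)
```

The score lies between -1 and 1, so items can be ranked by it directly.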

34
Issues with ANN

Often many iterations are needed
1000s or even millions
Overfitting can be a serious problem
No way to diagnose or debug the network
must relearn
Designing the network is an art
input and output coding
layering
often learning simply fails
system never converges
Stability vs plasticity
Learning is usually one-shot
Cannot easily restart learning with new data
(Actually many learning techniques have this
problem)

35
Overfitting

The problem of training a learner too much
the learner continues to improve on the training
data
but gets worse on the real task

36
Other classification techniques

Lots of other classification techniques have been
applied to this problem
support vector machines
fuzzy sets
decision trees
Essentials are the same
learn a decision rule over the item features
apply the rule to new items

37
Content-Based Recommendation

Advantages
useful for large information-based sites (e.g.,
portals) or for domains where items have
content-rich features
can be easily integrated with content servers
Disadvantages
may miss important pragmatic relationships among
items (based on usage)
avant-garde jazz / classical
not effective in small, specific sites or sites
which are not content-oriented
cannot achieve serendipity (novel connections)

38
Break

10 minutes

39
Roadmap

Session A Basic Techniques I
Introduction
Knowledge Sources
Recommendation Types
Collaborative Recommendation
Session B Basic Techniques II
Content-based Recommendation
Knowledge-based Recommendation
Session C Domains and Implementation I
Recommendation domains
Example Implementation
Lab I
Session D Evaluation I
Evaluation
Session E Applications
User Interaction
Web Personalization
Session F Implementation II
Lab II

40
Knowledge-Based Recommendation

Sub-type of content-based
we use the features of the items
Covers other kinds of knowledge, too
means-ends knowledge
how products satisfy user needs
ontological knowledge
what counts as similar in the product domain
constraints
what is possible in the domain and why

41
Recommendation Knowledge Sources Taxonomy
Recommendation Knowledge
Collaborative
Opinion Profiles
Demographic Profiles
User
Opinions
Query
Demographics
Constraints
Requirements
Preferences
Content
Item Features
Context
Means-ends
Domain Knowledge
Feature Ontology
Contextual Knowledge
Domain Constraints
42
Diverse Possibilities

Utility
some systems concentrate on representing the
user's constraints in the form of utility functions
Similarity
some systems focus on detailed knowledge-based
similarity calculations
Interactivity
some systems use knowledge to enhance the
collection of requirement information
For our purposes
concentrate on case-based recommendation and
constraint-based recommendation

43
Case-Based Recommendation

Based on ideas from case-based reasoning (CBR)
An alternative to rule-based problem-solving
A case-based reasoner solves new problems by
adapting solutions used to solve old problems
-- Riesbeck & Schank, 1987

44
CBR: Solving Problems
[Figure: the CBR cycle. A new problem arrives; similar cases are retrieved from the database, adapted, reviewed, and retained.]
45
CBR System Components

Case-base
database of previous cases (experience)
episodic memory
Retrieval of relevant cases
index for cases in library
matching most similar case(s)
retrieving the solution(s) from these case(s)
Adaptation of solution
alter the retrieved solution(s) to reflect
differences between new case and retrieved case(s)

46
Retrieval knowledge

Contents
features used to index cases
relative importance of features
what counts as similar
Issues
surface vs deep similarity

47
Analogy to the catalog

Problem
user need
Case
product
Retrieval
recommendation

48
Entree I
49
Entree II
50
Entree III
51
Critiquing Dialog

Mixed-initiative interaction
user offers input
system responds with possibilities
user critiques or offers additional input
Makes preference elicitation gradual
rather than all-at-once with a query
can guide user away from empty parts of the
product space

52
CBR retrieval

Knowledge-based nearest-neighbor
similarity metric defines distance between cases
usually on an attribute-by-attribute basis
Entree
cuisine
quality
price
atmosphere

53
How do we measure similarity?

complex multi-level comparison
goal sensitive
multiple goals
retrieval strategies
non-similarity relationships
Can be strictly numeric
weighted sum of similarities of features
local similarities
May involve inference
reasoning about the similarity of items
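The strictly numeric case, a weighted sum of local similarities, can be sketched as below. The attribute names follow the Entree example, but the local similarity functions, weights, and price scale are invented for illustration.

```python
def exact_match(a, b):
    """Local similarity for symbolic attributes: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0

def numeric_closeness(value_range):
    """Build a local similarity for numeric attributes on a known range."""
    def sim(a, b):
        return 1.0 - abs(a - b) / value_range
    return sim

def global_similarity(target, source, local_sims, weights):
    """Weighted average of the per-attribute (local) similarities."""
    total_weight = sum(weights.values())
    weighted = sum(weights[attr] * local_sims[attr](target[attr], source[attr])
                   for attr in weights)
    return weighted / total_weight

# Hypothetical setup: price on a 1-5 scale, cuisine weighted twice as heavily.
local_sims = {"cuisine": exact_match, "price": numeric_closeness(4)}
weights = {"cuisine": 2.0, "price": 1.0}
target = {"cuisine": "Thai", "price": 2}
other = {"cuisine": "French", "price": 4}
```

A knowledge-based system would typically replace exact_match with a metric that knows, for example, which cuisines are close to one another.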

54
Price metric
55
Cuisine Metric
European
Asian
French
Chinese
Japanese
Nouvelle Cuisine
Vietnamese
Thai
Pacific New Wave
56
Metrics

Goal-specific comparison
How similar is target product to the source with
respect to this goal?
Asymmetric
directional effects
A small number of general-purpose types

57
Metrics

If they generate a true metric space
approaches using space-partitioning techniques
BSP trees, quad-trees, etc.
Not always the case
Hard to optimize
storing n^2 distances vs. recalculating
FindMe calculates similarity at retrieval time

58
Combining metrics

Global metric
combination of attribute metrics
Hierarchical combination
lower metrics break ties in upper
Benefits
simple to acquire
easy to understand
Somewhat inflexible
More typical would be a weighted sum
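Hierarchical combination, where lower metrics only break ties in upper ones, amounts to a lexicographic sort. A sketch, with made-up restaurants and precomputed similarity values:

```python
def rank_hierarchically(candidates, metrics):
    """Sort by the first metric; later metrics only break ties in earlier ones."""
    return sorted(candidates,
                  key=lambda c: tuple(m(c) for m in metrics),
                  reverse=True)

# Hypothetical candidates with per-attribute similarities already computed.
restaurants = [
    {"name": "A", "cuisine_sim": 1.0, "price_sim": 0.2},
    {"name": "B", "cuisine_sim": 1.0, "price_sim": 0.9},
    {"name": "C", "cuisine_sim": 0.0, "price_sim": 1.0},
]
metrics = [lambda r: r["cuisine_sim"], lambda r: r["price_sim"]]
ranked = [r["name"] for r in rank_hierarchically(restaurants, metrics)]
```

C has the best price similarity but still ranks last, because price is consulted only when cuisine ties. That is the inflexibility a weighted sum avoids.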

59
Constraint-based Recommendation

Represent user's needs as a set of constraints
Try to satisfy those constraints with products

60
Example

User needs a car
Gas mileage > 25 mpg
Capacity > 5 people
Price < 18,000
A solution would be a list of models satisfying
these requirements
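The car example can be sketched as a filter over a catalog. The catalog entries are invented, and the comparison operators are read as strict inequalities, which is an assumption about the original slide.

```python
def satisfying(products, constraints):
    """Return the products that meet every constraint."""
    return [p for p in products if all(check(p) for check in constraints)]

# Hypothetical catalog.
cars = [
    {"model": "Hatchback H", "mpg": 32, "capacity": 5, "price": 16500},
    {"model": "Minivan M", "mpg": 27, "capacity": 7, "price": 17900},
    {"model": "SUV S", "mpg": 21, "capacity": 7, "price": 24000},
]
constraints = [
    lambda c: c["mpg"] > 25,
    lambda c: c["capacity"] > 5,
    lambda c: c["price"] < 18000,
]
solution = [c["model"] for c in satisfying(cars, constraints)]
```

If the list comes back empty, the user's requirements are over-constrained and some of them have to be relaxed.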

61
Configurable Products

Constraints important where products are
configurable
computers
travel packages
business services
(cars)
The relationships between configurable components
need to be expressed as constraints anyway
a GT 6800 graphics card needs a power supply > 300
W

62
Product Space
[Figure: product space plotted with Screen Size and Weight as axes; the constraints Weight < x and Screen > y bound the region of possible recommendations.]
63
Utility

In order to rank products
we need a measure of utility
can be slack
how much the product exceeds the constraints
can be another measure
price is typical
can be a utility calculation that is a function
of product attributes
but generally this is user-specific
value of weight vs screen size

64
Product Space
[Figure: the same product space (axes Screen Size and Weight, constraints Weight < x and Screen > y) with three products A, B, and C marked.]
65
Utility

Slack_A = (X - Weight_A) + (Size_A - Y)
not really commensurate
Price_A
ignores product differences
Utility_A = α (X - Weight_A) + β (Size_A - Y) + γ (X - Weight_A)(Size_A - Y)
usually we ignore γ and treat utilities as
independent
how do we know what α and β are?
make assumptions
infer from user behavior
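The slack and utility measures on this slide can be sketched as below. The operators lost in the transcript are read as subtraction and addition, and the weight symbols are named alpha, beta, and gamma here; both readings are assumptions, as are the example numbers.

```python
def slack(weight, size, x, y):
    """How far a product exceeds the constraints Weight < x and Screen > y."""
    return (x - weight) + (size - y)

def utility(weight, size, x, y, alpha, beta, gamma=0.0):
    """Weighted slack per attribute plus an optional interaction term.

    alpha and beta are user-specific weights; gamma defaults to 0, which
    treats the two attribute utilities as independent.
    """
    weight_slack = x - weight
    size_slack = size - y
    return alpha * weight_slack + beta * size_slack + gamma * weight_slack * size_slack

# Hypothetical laptop: 2.0 kg with a 15-inch screen; limits x = 3.0 kg, y = 13 inches.
laptop_slack = slack(2.0, 15, 3.0, 13)
laptop_utility = utility(2.0, 15, 3.0, 13, alpha=2.0, beta=0.5)
```

Plain slack adds kilograms to inches, which is the "not really commensurate" problem; alpha and beta put the two attributes on a common, user-specific scale.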

66
Knowledge-Based Recommendation

Hard to generalize
Advantages
no cold start issues
great precision possible
very important in some domains
Disadvantages
knowledge engineering required
can be substantial
expert opinion may not match user preferences
