Recommender Systems Session E - PowerPoint PPT Presentation


Slides: 33

Provided by: CTI79

1
Recommender Systems Session E

Robin Burke
DePaul University
Chicago, IL

2
Roadmap

Session A: Basic Techniques I
Introduction
Knowledge Sources
Recommendation Types
Collaborative Recommendation
Session B: Basic Techniques II
Content-based Recommendation
Knowledge-based Recommendation
Session C: Domains and Implementation I
Recommendation domains
Example Implementation
Lab I
Session D: Evaluation I
Evaluation
Session E: Applications
User Interaction
Web Personalization
Session F: Implementation II
Lab II

3
Web Personalization

A subtype of recommendation:
personalizing the browsing experience of a user
by dynamically tailoring the look, feel, and
content of a Web site to the user's needs and
interests.
Items: web pages
Data: log data from web sites, instead of ratings

4
Raw data
5
Challenges and Pitfalls

Technical Challenges
data collection and data preprocessing
defining actionable knowledge
choosing personalization algorithms
Implementation/Deployment Challenges
what to personalize
when to personalize
degree of personalization or customization
how to target information without being intrusive

6
Usage-Based Profiles

Characteristics
implicit ratings
preferences inferred from actions
passive stance
the system takes no action to gather additional
rating information
anonymity
many users will be anonymous
cannot rely on long-term identity
data characteristics
usually more voluminous than explicit rating data
noisier than explicit rating data

7
Heuristic Preference Indicators

How do we know whether a user "likes" something?
basic heuristic: length of time spent on a page
the longer the time, the more the page is liked
What if the user takes a phone call?
threshold: cap the time counted for any one page
What if the page is at the end of the session?
either ignore it
and possibly lose crucial information
or consider it positive
and add significant noise from failed interactions
What if the page is really short?
normalize on page length
What if the back button doesn't reload the page?
infer the missing session activity
What if the "user" is actually a web crawler?
recognize its pattern of behavior
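A minimal sketch of the time-spent heuristic with two of the fixes above (a threshold on long gaps and normalization on page length). It assumes each visit is a hypothetical `(url, timestamp_seconds, page_length)` tuple and that the 300-second cap is an illustrative value; the last page of the session is ignored because its dwell time is unknown. Back-button inference and crawler detection are not handled here.

```python
MAX_DWELL_SECONDS = 300.0  # hypothetical cap for "took a phone call" gaps


def implicit_weights(visits, max_dwell=MAX_DWELL_SECONDS):
    """Infer implicit preference weights from one session.

    visits: list of (url, timestamp_seconds, page_length) tuples in time order.
    Dwell time on a page is the gap until the next request. The last page has
    no next request, so it is ignored (one of the two options above).
    """
    weights = {}
    for (url, t, length), (_, t_next, _) in zip(visits, visits[1:]):
        dwell = min(t_next - t, max_dwell)   # threshold long idle gaps
        score = dwell / max(length, 1)       # normalize on page length
        weights[url] = weights.get(url, 0.0) + score
    return weights


visits = [("a.html", 0, 100), ("b.html", 50, 10), ("c.html", 1050, 200)]
print(implicit_weights(visits))  # {'a.html': 0.5, 'b.html': 30.0}
```

Here `b.html` is viewed for 1000 seconds, but only 300 count toward its weight; `c.html` gets no weight at all because it ends the session.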

8
Problems

Cold start
cannot count on long-term profiling
must be able to make predictions with short
profiles
Covert model
user has no direct input
can't express interests directly
even if they are willing to
Sparsity
much worse than for explicit profiles
many more users
many more items (pages)
profiles much shorter
"Hidden web"
many pages are produced by database queries
page differences not reflected in the log
can't recommend indistinguishable items

9
Advantages

No explicit user ratings or interaction with
users required
Helps preserve user privacy by making effective
use of anonymous data
Large user base makes CF effective
if we can implement it scalably
Content-based / knowledge-based approaches are
hard to use on web data

10
Web Personalization Process

Generate aggregate user models
not single users but clusters of similar users
guess why?
off-line process
Steps
Clustering user transactions
Clustering items / pageviews
Association rule mining
Sequential pattern discovery
Provide recommendation
on-line
match a user's active session
to provide dynamic content

11
Off-line Process
12
On-line Process
13
Representation
Pageview/objects
Session/user data
Raw weights are usually based on the time spent on
a page, but in practice they need to be normalized
and transformed.
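One way to sketch the representation step: each session becomes a row over a fixed pageview vocabulary, and each row is rescaled so that its largest weight is 1. Max-scaling is an assumption chosen for illustration; other normalizations and transforms are equally plausible.

```python
def session_vectors(sessions, pageviews=None):
    """Build a session-by-pageview matrix from raw weights.

    sessions: list of dicts {pageview: raw_weight}, e.g. time-based weights.
    Each row is divided by its maximum so weights fall in [0, 1]
    (one possible normalization, assumed here).
    """
    if pageviews is None:
        pageviews = sorted({p for s in sessions for p in s})
    rows = []
    for s in sessions:
        peak = max(s.values(), default=0.0)
        rows.append([s.get(p, 0.0) / peak if peak > 0 else 0.0 for p in pageviews])
    return pageviews, rows


pages, matrix = session_vectors([{"A": 30, "B": 60}, {"C": 5}])
print(pages)   # ['A', 'B', 'C']
print(matrix)  # [[0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
```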
14
Clustering

k-means clustering algorithm
specify the number of clusters, k
the system finds an assignment of items to clusters
that (locally) minimizes the total squared distance
between the items in each cluster and the cluster's
centroid
for each cluster
think of each item as a vector
the centroid is the mean of the cluster's vectors
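The k-means procedure above can be sketched in plain Python with Lloyd's algorithm: alternately assign each vector to its nearest centroid, then recompute each centroid as the mean of its members. Random initialization from the data points and the iteration limit are assumptions of this sketch.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k starting points
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return assignment, centroids


sessions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centers = kmeans(sessions, k=2)
```

Each entry of `centers` is a pageview-weight vector, which is exactly what the next slide treats as an aggregate profile.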

15
Clustering

Transaction clusters as Aggregate Profiles
Each transaction is viewed as a pageview vector
Each cluster contains a set of transaction
vectors with a centroid
Each centroid acts as an aggregate profile, with
each component representing the weight for
pageview pi in the profile
Compute similarity between the current user's
profile (or the active user session) and the
cluster centroids

16
Recommendation Algorithm

Keep track of the user's navigational history
through the site
a fixed-size sliding window over the active
session captures the current user's short-term
history; the window size sets the history depth
Match the current user's activity against the
discovered profiles
profiles can either be aggregate usage profiles
or be obtained directly from association rules
or sequential patterns
Dynamically generated recommendations are added
to the returned page
each pageview can be assigned a recommendation
score based on
matching score to user profiles (e.g., aggregate
usage profiles)
information value of the pageview based on
domain knowledge (e.g., link distance of the
candidate recommendation to the active session)
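A small sketch of the fixed-size sliding window, using `collections.deque` with `maxlen` so the oldest pageview drops out automatically. The class name and the default depth of 3 are hypothetical; the all-1s session vector is one of the weighting options mentioned on the next slide.

```python
from collections import deque


class ActiveSession:
    """Keeps only the last `window_size` pageviews of the current visit."""

    def __init__(self, window_size=3):  # hypothetical default depth
        self.window = deque(maxlen=window_size)

    def visit(self, pageview):
        self.window.append(pageview)  # oldest entry is discarded when full

    def as_vector(self):
        # all-1s weights for the pageviews currently in the window
        return {p: 1.0 for p in self.window}


session = ActiveSession(window_size=2)
for page in ["A.html", "B.html", "C.html"]:
    session.visit(page)
print(session.as_vector())  # {'B.html': 1.0, 'C.html': 1.0}
```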

17
Matching Sessions

Matching score is computed using cosine similarity
The user's active session (pageviews in the current
window) is compared to each aggregate profile
(both are viewed as pageview vectors)
Weight of items in the profile vector is the
significance weight of the item for that profile
Weight of items in the session vector can be all
1s, or based on some method for determining
their significance in the current session
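The matching step can be sketched with sparse vectors stored as `{pageview: weight}` dictionaries, which suits web data where most pageviews are absent from any one session. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors {pageview: weight}."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_profiles(session, profiles):
    """Return (profile_id, matching score) pairs, best match first."""
    scores = [(pid, cosine(session, profile)) for pid, profile in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


window = {"A": 1.0, "B": 1.0}                      # all-1s session weights
profiles = {"p1": {"A": 1.0, "B": 0.5}, "p2": {"C": 1.0}}
ranked = match_profiles(window, profiles)
```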

18
Recommendations

from each matching profile, recommend items that are
not already in the user's session window, and
not directly linked from the pages in the current
session window
the recommendation score for an item is based on the
profile matching score (similarity to the session
window) and
the weight of the item in that profile
scoring can include novelty:
weight items farther away from the user's current
location higher
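A sketch of the filtering and scoring rules above. Multiplying the matching score by the item weight is an assumed combination (the slide only says the score is based on both), and `linked_from_window` is a hypothetical input listing pages one click away from the window. A novelty factor based on link distance could be multiplied in at the same place.

```python
def recommend(window, matches, profiles, linked_from_window=frozenset(), top_n=3):
    """
    window: set of pageviews in the active session window
    matches: {profile_id: matching score}, e.g. cosine similarity
    profiles: {profile_id: {pageview: weight}}
    linked_from_window: pages directly linked from the window (hypothetical input)
    """
    scores = {}
    for pid, match in matches.items():
        for page, weight in profiles[pid].items():
            if page in window or page in linked_from_window:
                continue  # already seen, or reachable with one click anyway
            score = match * weight  # assumed way of combining the two factors
            scores[page] = max(scores.get(page, 0.0), score)  # best profile wins
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


recs = recommend(
    window={"A", "B"},
    matches={1: 0.5, 2: 0.2},
    profiles={1: {"B": 1.0, "F": 1.0, "C": 0.5}, 2: {"A": 1.0, "D": 1.0}},
    linked_from_window={"C"},
)
print(recs)  # [('F', 0.5), ('D', 0.2)]
```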

19
Example
Sample cluster centroid from dept. Web site
(cluster size 330)
20
Using Clusters for Personalization
Original Session/user data
Given an active session A → B, the best matching
profile is Profile 1. This may result in a
recommendation for page F.html, since it appears
with high weight in that profile.
Result of Clustering
PROFILE 0 (Cluster Size 3)
  1.00 C.html
  1.00 D.html
PROFILE 1 (Cluster Size 4)
  1.00 B.html
  1.00 F.html
  0.75 A.html
  0.25 C.html
PROFILE 2 (Cluster Size 3)
  1.00 A.html
  1.00 D.html
  1.00 E.html
  0.33 C.html
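The worked example can be checked with the three profiles from the clustering result on this slide. The sketch assumes all-1s session weights and scores candidates as matching score times item weight; under those assumptions Profile 1 matches best (cosine of about 0.764) and F.html is the top recommendation.

```python
import math

# Aggregate profiles from the clustering result above
PROFILES = {
    0: {"C.html": 1.00, "D.html": 1.00},
    1: {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    2: {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}


def cosine(u, v):
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


session = {"A.html": 1.0, "B.html": 1.0}  # active session A -> B
matches = {pid: cosine(session, prof) for pid, prof in PROFILES.items()}
best = max(matches, key=matches.get)
candidates = {p: matches[best] * w for p, w in PROFILES[best].items() if p not in session}
top = max(candidates, key=candidates.get)
print(best, top, round(matches[best], 3))  # 1 F.html 0.764
```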
21
Association Rules

An alternative to clusters is to build
association rules
Association rule
a tuple <i1, i2, ..., ik> (strictly, a frequent
itemset from which rules are derived)
all of whose items appear together with at least
some minimum frequency
Can be used for prediction
if a user has seen
<i1, i2, ..., i(k-1)>, then predict ik
<i2, i3, ..., ik>, then predict i1
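The prediction rule above can be sketched directly: if the user has seen every item of a frequent itemset except one, predict the missing item. Because itemsets are unordered, the same check covers both the "predict ik" and "predict i1" cases. The function name is hypothetical.

```python
def predict_from_itemsets(seen, itemsets):
    """Predict items that would complete a frequent itemset.

    seen: pageviews the user has visited
    itemsets: iterable of frequent itemsets (any iterable of items)
    """
    seen = set(seen)
    predictions = set()
    for itemset in itemsets:
        missing = set(itemset) - seen
        if len(missing) == 1:  # all but one entry already seen
            predictions |= missing
    return predictions


frequent = [("i1", "i2", "i3"), ("i2", "i4", "i5")]
print(predict_from_itemsets({"i1", "i2"}, frequent))  # {'i3'}
```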

22
Learning Association Rules

Multiple passes through a database of
transactions
1st pass: collect all items that occur with at
least a minimum frequency (support)
2nd pass: collect all frequent pairs
both items must be in the set above
3rd pass: collect all frequent triples
and so on, until no frequent itemsets are left
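The level-wise procedure described above is the Apriori approach. A compact sketch, with `min_support` given as an absolute transaction count: each level makes one pass over the data, and a size-k candidate is only counted if all of its (k-1)-subsets were frequent at the previous level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one pass through the database
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([i]) for t in transactions for i in t})  # pass 1
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # prune: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
itemsets = apriori(baskets, min_support=3)
```

With these toy transactions, every single item and every pair is frequent, but {A, B, C} appears only twice and is dropped.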

23
Recommending with Association Rules

Keep track of the user's navigational history
through the site
Match the current user's activity against the
association rules
find the longest rule with at least one
non-matching entry
sort by confidence
predict the entry or entries not in the user's
profile
Dynamically generated recommendations are added
to the returned page
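A sketch of the rule-matching steps, assuming supports come from a frequent-itemset miner and that each rule predicts a single item. For every frequent itemset with exactly one entry outside the window, the remaining entries form the antecedent; only the longest antecedents are kept, and candidates are sorted by confidence, computed as support(itemset) / support(antecedent). The function name and toy supports are hypothetical.

```python
def recommend_by_rules(window, supports):
    """
    window: pageviews in the active session window
    supports: {frozenset itemset: support count}
    Returns [(item, confidence)] from the longest matching rules, best first.
    """
    window = frozenset(window)
    best_len, recs = -1, {}
    for itemset, supp in supports.items():
        extra = itemset - window
        if len(extra) != 1:
            continue  # need exactly one non-matching entry
        antecedent = itemset - extra  # the entries that match the window
        if not antecedent or antecedent not in supports:
            continue
        confidence = supp / supports[antecedent]
        (item,) = extra
        if len(antecedent) > best_len:  # a longer rule replaces shorter ones
            best_len, recs = len(antecedent), {item: confidence}
        elif len(antecedent) == best_len:
            recs[item] = max(recs.get(item, 0.0), confidence)
    return sorted(recs.items(), key=lambda kv: kv[1], reverse=True)


fs = frozenset
toy_supports = {
    fs("A"): 4, fs("B"): 4, fs("C"): 4, fs("D"): 2,
    fs("AB"): 3, fs("AC"): 3, fs("BC"): 3, fs("AD"): 2, fs("ABC"): 3,
}
print(recommend_by_rules({"A", "B"}, toy_supports))  # [('C', 1.0)]
```

The shorter rules A => C, B => C, and A => D are discarded because {A, B} => C matches more of the window.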

24
Sequential methods

Sequential patterns as profiles
similar to association rules, but the ordering of
accessed items is taken into account
use Markov models
systems with discrete states and probabilistic
transitions
commonly used for pre-fetching pages in web
servers
Characteristics
high accuracy
but usually low coverage
few users get recommendations
sometimes this is OK
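A minimal first-order Markov model sketch: transition probabilities are estimated by counting consecutive page pairs in sessions, and a prediction is made only when the most likely transition clears a probability threshold. The 0.5 default is a hypothetical value; raising it is one way to trade coverage for accuracy.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        model[current] = {page: n / total for page, n in nexts.items()}
    return model


def predict_next(model, current, min_prob=0.5):
    """Most likely next page, or None if no transition is probable enough.

    Returning None for unseen states or weak transitions is where
    coverage is lost.
    """
    transitions = model.get(current)
    if not transitions:
        return None
    page, prob = max(transitions.items(), key=lambda kv: kv[1])
    return page if prob >= min_prob else None


model = train_markov([["A", "B", "C"], ["A", "B", "D"], ["A", "C"]])
print(predict_next(model, "A"))  # B
```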

25
Example: Frequent Itemsets
Sample Transactions
Frequent itemsets (using min. support frequency 4)
26
Example: Sequential Patterns
Sample Transactions
Contiguous sequential patterns (CSP, min. support frequency 4)
Sequential patterns (SP, min. support frequency 4)
27
Example: An Itemset Graph
Frequent Itemset Graph for the Example
Given an active session window <B, E>, the
algorithm finds items A and C with recommendation
scores of 1 and 4/5 (corresponding to the confidences
of the rules {B, E} => A and {B, E} => C).
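Assuming the standard definition of rule confidence, where σ(X) is the support count of itemset X, the two scores above correspond to the following ratios (the individual support counts come from the sample transactions, which are not reproduced here):

```latex
\mathrm{conf}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
\qquad
\mathrm{conf}(\{B,E\} \Rightarrow A) = \frac{\sigma(\{A,B,E\})}{\sigma(\{B,E\})} = 1,
\qquad
\mathrm{conf}(\{B,E\} \Rightarrow C) = \frac{\sigma(\{B,C,E\})}{\sigma(\{B,E\})} = \tfrac{4}{5}
```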
28
Example: Frequent Sequence Trie
Frequent Sequence Trie for the Example
Given an active session window <A, B>, the
algorithm finds item E with a recommendation score
of 1 (corresponding to the confidence of the rule
A, B => E).
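A sketch of recommending from a frequent sequence trie. It assumes every prefix of a frequent sequence is stored with its support count, matches the window as a path from the root, and scores each child page by support(window + page) / support(window). The sequence counts below are made-up toy values, not the slide's example data.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # page -> TrieNode
        self.support = 0    # support count of the sequence ending here


def build_trie(frequent_sequences):
    """frequent_sequences: {tuple of pages: support count}."""
    root = TrieNode()
    for sequence, support in frequent_sequences.items():
        node = root
        for page in sequence:
            node = node.children.setdefault(page, TrieNode())
        node.support = support
    return root


def recommend_next(root, window):
    """Candidate next pages for the window, best confidence first."""
    node = root
    for page in window:
        node = node.children.get(page)
        if node is None:
            return []  # window is not a frequent sequence: no prediction
    if not window or node.support == 0:
        return []
    scored = [(page, child.support / node.support)
              for page, child in node.children.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


toy = {("A",): 5, ("B",): 5, ("A", "B"): 4, ("A", "B", "E"): 4, ("A", "B", "C"): 3}
trie = build_trie(toy)
print(recommend_next(trie, ("A", "B")))  # [('E', 1.0), ('C', 0.75)]
```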
29
Impact of Window Size

Increasing the window size (using a larger portion
of the user's history) generally improves
precision

This example is based on the association rule
approach
30
Associations vs. Sequences

Comparison of recommendations based on
association rules, sequential patterns,
contiguous sequential patterns, and standard
k-nearest neighbor

Support threshold for association rules, SP, and CSP: 0.04
31
Problems with Web Usage Mining

New item problem
Patterns will not capture recently added
items
Bad for dynamic Web sites
Poor machine interpretability
Hard to generalize and reason about patterns
No domain knowledge used to enhance results
e.g., knowing that a user is interested in a
program, we could recommend the prerequisites,
core courses, or popular courses in that program
Poor insight into the patterns themselves
The nature of the relationships among items or
users in a pattern is not directly available

32
Roadmap