Recommender Systems Session E

Slides: 33
Provided by: CTI79

Transcript and Presenter's Notes


1
Recommender Systems Session E
  • Robin Burke
  • DePaul University
  • Chicago, IL

2
Roadmap
  • Session A: Basic Techniques I
  • Introduction
  • Knowledge Sources
  • Recommendation Types
  • Collaborative Recommendation
  • Session B: Basic Techniques II
  • Content-based Recommendation
  • Knowledge-based Recommendation
  • Session C: Domains and Implementation I
  • Recommendation domains
  • Example Implementation
  • Lab I
  • Session D: Evaluation I
  • Evaluation
  • Session E: Applications
  • User Interaction
  • Web Personalization
  • Session F: Implementation II
  • Lab II

3
Web Personalization
  • A subtype of recommendation
  • personalizing the browsing experience of a user
    by dynamically tailoring the look, feel, and
    content of a Web site to the user's needs and
    interests.
  • Items
  • web pages
  • Data
  • instead of ratings
  • log data from web sites

4
Raw data
5
Challenges and Pitfalls
  • Technical Challenges
  • data collection and data preprocessing
  • defining actionable knowledge
  • choosing personalization algorithms
  • Implementation/Deployment Challenges
  • what to personalize
  • when to personalize
  • degree of personalization or customization
  • how to target information without being intrusive

6
Usage-Based Profiles
  • Characteristics
  • implicit ratings
  • preferences inferred from actions
  • passive stance
  • system takes no action to gather more rating
    information
  • anonymity
  • many users will be anonymous
  • cannot rely on long-term identity
  • data characteristics
  • usually more voluminous than explicit rating data
  • more noise than explicit rating data

7
Heuristic Preference Indicators
  • How to know if user "likes" something?
  • basic heuristic: length of time spent
  • the longer the time, the more liked
  • What if the user takes a phone call?
  • threshold
  • What if the page is at the end of the session?
  • either ignore
  • possibly lose crucial information
  • or consider positive
  • significant noise in failed interactions
  • What if the page is really short?
  • normalize on page length
  • What if the back button doesn't reload page?
  • infer session activity
  • What if the "user" is actually a web crawler?
  • recognize pattern of behavior
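
The heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 600-second cap, the choice to ignore the last page of a session, and the function names `implicit_weight` and `session_weights` are all hypothetical and do not come from the slides.

```python
def implicit_weight(seconds, page_length_chars, max_seconds=600.0, is_last_page=False):
    """Heuristic preference weight for one pageview, or None if it cannot be measured."""
    if is_last_page:
        # No later request exists, so time spent is unknown.
        # Assumption: this sketch ignores the page (the slide's "either ignore" option).
        return None
    # Threshold: cap very long dwell times (the user may have taken a phone call).
    capped = min(seconds, max_seconds)
    # Normalize on page length so that short pages are not penalized.
    return capped / max(page_length_chars, 1)


def session_weights(pageviews):
    """pageviews: list of (url, seconds, page_length_chars) tuples in visit order."""
    weights = {}
    last_index = len(pageviews) - 1
    for index, (url, seconds, length) in enumerate(pageviews):
        weight = implicit_weight(seconds, length, is_last_page=(index == last_index))
        if weight is not None:
            # Keep the strongest signal if a page is visited more than once.
            weights[url] = max(weight, weights.get(url, 0.0))
    # Scale so that the most-liked page in the session has weight 1.0.
    top = max(weights.values(), default=0.0)
    return {url: (w / top if top > 0 else 0.0) for url, w in weights.items()}


# Hypothetical session: B.html was open for an hour and gets capped at 600 seconds.
example = session_weights([
    ("A.html", 30, 1000),
    ("B.html", 3600, 2000),
    ("C.html", 5, 500),   # last page of the session: ignored
])
```

Crawler detection and back-button handling are left out here; they would normally happen during preprocessing, before any weights are computed.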

8
Problems
  • Cold start
  • cannot count on long-term profiling
  • must be able to make predictions with short
    profiles
  • Covert model
  • user has no direct input
  • can't express interests directly
  • even if they are willing to
  • Sparsity
  • much worse than for explicit profiles
  • many more users
  • many more items (pages)
  • profiles much shorter
  • "Hidden web"
  • many pages are produced by database queries
  • page differences not reflected in the log
  • can't recommend indistinguishable items

9
Advantages
  • No explicit user ratings or interaction with
    users
  • Helps preserve user privacy, by making effective
    use of anonymous data
  • Large user base makes CF effective
  • if we can implement it scalably
  • Content-based / knowledge-based approaches hard
    to use on web data

10
Web Personalization Process
  • Generate aggregate user models
  • not single users but clusters of similar users
  • guess why?
  • off-line process
  • Steps
  • Clustering user transactions
  • Clustering items / pageviews
  • Association rule mining
  • Sequential pattern discovery
  • Provide recommendation
  • on-line
  • match a user's active session
  • to provide dynamic content

11
Off-line Process
12
On-line Process
13
Representation
Pageview/objects
Session/user data
Raw weights are usually based on time spent on a
page, but in practice they need to be normalized
and transformed.
14
Clustering
  • k-means clustering algorithm
  • specify a number of clusters
  • system finds an arrangement of items that
    minimizes the mean distance among the items
    within each cluster
  • for each cluster
  • think of each item as a vector
  • generate the centroid
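
A minimal k-means sketch in plain Python may help make the steps concrete. It uses the standard iterative (Lloyd-style) procedure, which only finds a local optimum; seeding with the first k vectors is an assumption made here so the run is deterministic, and the sample vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def k_means(vectors, k, iterations=20):
    # Assumption: seed with the first k vectors (real systems usually seed randomly).
    centroids = [list(v) for v in vectors[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each vector in the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            clusters[nearest].append(v)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        updated = [centroid(members) if members else centroids[i]
                   for i, members in enumerate(clusters)]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, clusters


# Hypothetical transactions over pageviews (A, B, C, D); 1 means "visited".
transactions = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
centroids, clusters = k_means(transactions, k=2)
```

Each returned centroid is a vector of per-pageview weights, which is what later slides treat as an aggregate profile.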

15
Clustering
  • Transaction clusters as Aggregate Profiles
  • Each transaction is viewed as a pageview vector
  • Each cluster contains a set of transaction
    vectors with a centroid
  • Each centroid acts as an aggregate profile, with
    each component representing the weight for
    pageview p_i in the profile
  • Compute similarity between the current user's
    profile (or the active user session) and the
    cluster centroids

16
Recommendation Algorithm
  • Keep track of the user's navigational history
    through the site
  • a fixed-size sliding window over the active
    session to capture the current user's
    short-term history depth
  • Match the current user's activity against the
    discovered profiles
  • profiles either can be based on aggregate usage
    profiles, or are obtained directly from
    association rules or sequential patterns
  • Dynamically generated recommendations are added
    to the returned page
  • each pageview can be assigned a recommendation
    score based on
  • matching score to user profiles (e.g., aggregate
    usage profiles)
  • information value of the pageview based on
    domain knowledge (e.g., link distance of the
    candidate recommendation to the active session)
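
The fixed-size sliding window can be sketched with a bounded deque. The class name `SessionWindow`, the default size of 3, and the choice to move a revisited page to the most recent position are assumptions made for illustration only.

```python
from collections import deque


class SessionWindow:
    """Keeps only the most recent pageviews of the active session."""

    def __init__(self, size=3):
        # A deque with maxlen silently drops the oldest entry when full.
        self._pages = deque(maxlen=size)

    def visit(self, url):
        # Assumption: a revisited page moves to the most recent position
        # instead of occupying two slots in the window.
        if url in self._pages:
            self._pages.remove(url)
        self._pages.append(url)

    def pages(self):
        """Pages currently in the window, oldest first."""
        return list(self._pages)


window = SessionWindow(size=2)
for url in ["A.html", "B.html", "C.html"]:
    window.visit(url)
# window.pages() now holds only the two most recent pages.
```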

17
Matching Sessions
  • Matching score computed using cosine similarity
  • User's active session (pageviews in the current
    window) is compared to each aggregate profile
    (both are viewed as pageview vectors)
  • Weight of items in the profile vector is the
    significance weight of the item for that profile
  • Weight of items in the session vector can be all
    1s, or based on some method for determining
    their significance in the current session
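
A small sketch of the matching step, assuming sparse vectors stored as dictionaries and session weights of all 1s. The profile weights are copied from the clustering example later in this deck; the names `cosine`, `best_profile`, and `PROFILES` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {pageview: weight} dicts."""
    dot = sum(weight * v.get(page, 0.0) for page, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Aggregate profiles from the "Using Clusters for Personalization" example.
PROFILES = {
    "Profile 0": {"C.html": 1.00, "D.html": 1.00},
    "Profile 1": {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    "Profile 2": {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}


def best_profile(window, profiles):
    """Return (name of best-matching profile, all matching scores)."""
    # Assumption: every page in the session window gets weight 1.
    session = {page: 1.0 for page in window}
    scores = {name: cosine(session, profile) for name, profile in profiles.items()}
    return max(scores, key=scores.get), scores


best, scores = best_profile(["A.html", "B.html"], PROFILES)
```

For the window A.html, B.html the highest score goes to Profile 1 (roughly 0.76), which agrees with the example slide.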

18
Recommendations
  • from each matching profile
  • recommend the items not already in the user
    session window, and
  • not directly linked from the pages in the current
    session window
  • the recommendation score for an item is based on
  • profile matching score (similarity to session
    window) and
  • the weight of the item in that profile
  • can include novelty
  • weight items farther away from the current
    location of user higher
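
One way to turn these rules into code is sketched below. The slide only says the score is "based on" the matching score and the item weight; combining them as the square root of their product is an assumption, as are the function name `recommend`, the `linked` parameter, and the sample matching scores. Novelty weighting is not included.

```python
import math

# Aggregate profiles from the "Using Clusters for Personalization" example.
PROFILES = {
    "Profile 0": {"C.html": 1.00, "D.html": 1.00},
    "Profile 1": {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    "Profile 2": {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}


def recommend(window, profiles, match_scores, linked=frozenset()):
    """Rank candidate pages for the current session window.

    match_scores: {profile name: similarity of that profile to the window}.
    linked: pages directly linked from the window (hypothetical input).
    """
    in_window = set(window)
    scores = {}
    for name, profile in profiles.items():
        match = match_scores.get(name, 0.0)
        if match <= 0.0:
            continue  # profile does not match the session at all
        for page, weight in profile.items():
            if page in in_window or page in linked:
                continue  # already seen, or one click away anyway
            # Assumption: geometric mean of matching score and item weight.
            score = math.sqrt(match * weight)
            scores[page] = max(score, scores.get(page, 0.0))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical matching scores for the window A.html, B.html.
ranked = recommend(
    ["A.html", "B.html"],
    PROFILES,
    {"Profile 0": 0.0, "Profile 1": 0.76, "Profile 2": 0.40},
)
```

With these inputs F.html ranks first, because it has full weight in the best-matching profile.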

19
Example
Sample cluster centroid from dept. Web site
(cluster size 330)
20
Using Clusters for Personalization
Original Session/user data
Given an active session A → B, the best matching
profile is Profile 1. This may result in a
recommendation for page F.html, since it appears
with high weight in that profile.
Result of Clustering
PROFILE 0 (Cluster Size 3)
--------------------------------------
1.00 C.html
1.00 D.html

PROFILE 1 (Cluster Size 4)
--------------------------------------
1.00 B.html
1.00 F.html
0.75 A.html
0.25 C.html

PROFILE 2 (Cluster Size 3)
--------------------------------------
1.00 A.html
1.00 D.html
1.00 E.html
0.33 C.html
21
Association Rules
  • An alternative to clusters is to build
    association rules
  • Association rule
  • a tuple <i1, i2, ..., ik>
  • all of which appear together with some frequency
  • Can be used for prediction
  • if a user has seen
  • <i1, i2, ..., ik-1> then predict ik
  • <i2, i3, ..., ik> then predict i1

22
Learning Association Rules
  • Multiple passes through a database of
    transactions
  • 1st time we collect all items that occur with a
    certain frequency
  • 2nd time we collect all pairs
  • both items must be in the set above
  • 3rd time collect all triples
  • until there are none left
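
The multi-pass procedure above is the level-wise (Apriori-style) search. A compact sketch follows; the function name `frequent_itemsets`, the sample transactions, and the minimum count of 3 are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {frozenset of items: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep those that occur often enough.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    size = 2
    while current:
        # Build candidates of the next size; every smaller subset must be frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Next pass over the transactions: count only the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        size += 1  # stops once no itemset of the current size is frequent
    return frequent


# Hypothetical sessions, each a set of visited pages.
sessions = [{"A", "B", "E"}, {"A", "B", "E"}, {"B", "C", "E"},
            {"A", "B", "C", "E"}, {"C", "D"}]
itemsets = frequent_itemsets(sessions, min_count=3)
```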

23
Recommending with Association Rules
  • Keep track of the user's navigational history
    through the site
  • Match the current user's activity against the
    association rules
  • find the longest rule with at least one
    non-matching entry
  • sort by confidence
  • predict the entry or entries not in the user's
    profile
  • Dynamically generated recommendations are added
    to the returned page
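
A rough sketch of this lookup, working directly from frequent itemsets and their support counts. Two things here are assumptions rather than statements from the slides: shortening the window from its oldest page when no match is found, and the support counts, which are hypothetical values chosen to agree with the itemset graph example later in this deck (scores of 1 and 4/5 for window <B, E>).

```python
def recommend_from_itemsets(window, support, min_confidence=0.0):
    """window: pages in visit order; support: {frozenset of pages: count}."""
    pages = list(window)
    while pages:
        base = frozenset(pages)
        if base in support:
            candidates = {}
            for itemset, count in support.items():
                # Look for itemsets that extend the window by exactly one page.
                if len(itemset) == len(base) + 1 and base < itemset:
                    (page,) = itemset - base
                    # Confidence of the rule "window => page".
                    confidence = count / support[base]
                    if confidence >= min_confidence:
                        candidates[page] = confidence
            if candidates:
                # Sort by confidence, highest first.
                return sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
        # Assumption: on failure, drop the oldest page and try a shorter window.
        pages = pages[1:]
    return []


def fs(pages):
    """Helper: build a frozenset from a string of one-letter page names."""
    return frozenset(pages)


# Hypothetical support counts.
SUPPORT = {fs("B"): 6, fs("E"): 5, fs("BE"): 5, fs("ABE"): 5, fs("BCE"): 4}

recommendations = recommend_from_itemsets(["B", "E"], SUPPORT)
```

Here the window <B, E> yields A with confidence 5/5 and C with confidence 4/5.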

24
Sequential methods
  • Sequential patterns as profiles
  • similar to association rules, but the ordering of
    accessed items is taken into account
  • use Markov models
  • systems with discrete states and probabilistic
    transitions
  • commonly used for pre-fetching pages in web
    servers
  • Characteristics
  • high accuracy
  • but usually low coverage
  • few users get recommendations
  • sometimes this is OK

25
Example: Frequent Itemsets
Sample Transactions
Frequent itemsets (using min. support frequency
= 4)
26
Example: Sequential Patterns
Sample Transactions
CSP (min. support frequency = 4)
SP (min. support frequency = 4)
27
Example: An Itemset Graph
Frequent Itemset Graph for the Example
Given an active session window <B, E>, the
algorithm finds items A and C with recommendation
scores of 1 and 4/5 (corresponding to the
confidences of the rules B,E → A and B,E → C).
28
Example: Frequent Sequence Trie
Frequent Sequence Trie for the Example
Given an active session window <A, B>, the
algorithm finds item E with a recommendation score
of 1 (corresponding to the confidence of the rule
A,B → E).
29
Impact of Window Size
  • Increasing the window size (using a larger
    portion of the user's history) generally
    improves precision

This example is based on the association rule
approach.
30
Associations vs. Sequences
  • Comparison of recommendations based on
    association rules, sequential patterns,
    contiguous sequential patterns, and standard
    k-nearest neighbor

Support threshold for Association, SP, CSP = 0.04
31
Problems with Web Usage Mining
  • New item problem
  • Patterns will not capture new items recently
    added
  • Bad for dynamic Web sites
  • Poor machine interpretability
  • Hard to generalize and reason about patterns
  • No domain knowledge used to enhance results
  • E.g., Knowing a user is interested in a program,
    we could recommend the prerequisites, core or
    popular courses in this program to the user
  • Poor insight into the patterns themselves
  • The nature of the relationships among items or
    users in a pattern is not directly available
