1
Crawling the Algorithmic Foundations of Recommendation Technologies
  • A presentation given in partial fulfillment of the requirements for the
    degree of Master of Science

Manos Papagelis
Computer Science Department, School of Sciences and Engineering, University of Crete
Institute of Computer Science, Foundation for Research and Technology - Hellas
Heraklion, Greece, March 24, 2005
Email: papaggel@csd.uoc.gr
Supervisor: Dimitris Plexousakis, Associate Professor
2
Presentation Outline
  • Part I: Recommendation Algorithms
  • Part II: Qualitative Analysis of Prediction Algorithms
  • Part III: Addressing the Scalability Problem
  • Part IV: Addressing the Sparsity Problem
  • Part V: Conclusions

3
Recommendation Algorithms
  • Part I

4
Motivation and Placement of the Research Topic
  • Motivation
  • Information overload creates the need for personalization: new items such
    as books, journals, research papers; TV programs, music CDs, movie titles;
    e-commerce products, matchmaking and other e-services; web pages, Usenet
    articles, emails
  • Research Topic Placement

5
Introduction to Recommendation Systems
  • Recommendation Systems were developed to address two problems:
  • Overwhelming numbers of on-topic documents
  • Filtering of non-text documents, mainly based on rating activity
  • Formulation of the Recommendation Problem
  • Estimation of user ratings for unseen items (Predictions)
  • Recommendation of the top-N predictions
  • Classification of Recommendation Algorithms
  • Content-based
  • Collaborative Filtering
  • Hybrid
  • Challenges and Limitations of Collaborative
    Filtering Methods
  • Scalability
  • Sparsity
  • Cold Start

6
Qualitative Analysis of Prediction Algorithms
  • Part II

7
Unfolding the Recommendation Process
  • Which items to recommend?
  • The list of N items corresponding to the top-N predictions
  • How can predictions be achieved?
  • By exploiting other users' activity
  • Whose activity should be taken into account?
  • That of users who share the same or relevant interests
  • It may be of benefit to one's search for information to consult the
    behavior of other users who share the same or relevant interests

[Figure: a user-item rating matrix with missing entries (?); predicting them
from the ratings of other users is the task of Collaborative Filtering]
8
Collaborative Filtering (CF)
Co-rated items: how similar are two users?
[Figure: in the user-item matrix, users u1 and u3 have co-rated ratings
(2, 5, 6) and (5, 2, 9)]
  • Candidate similarity measures:
  • Cosine Vector Similarity
  • Spearman Correlation
  • Mean-squared Difference
  • Entropy-based Uncertainty
  • Pearson Correlation Coefficient (the measure adopted here; a sketch
    follows below)
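Below is a minimal sketch (in Python) of the Pearson correlation between two
users restricted to their co-rated items. The item identifiers and the
convention of taking the user means over the co-rated items only are
illustrative assumptions; the two rating vectors come from the matrix above.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a, ratings_b: dicts mapping item id -> rating.
    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated items or zero variance on either side).
    """
    co_rated = sorted(set(ratings_a) & set(ratings_b))
    if len(co_rated) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in co_rated) / len(co_rated)
    mean_b = sum(ratings_b[i] for i in co_rated) / len(co_rated)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in co_rated)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in co_rated))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in co_rated))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (den_a * den_b)

# Co-rated ratings of u1 and u3 from the slide (item ids are illustrative).
u1 = {"i1": 2, "i4": 5, "i6": 6}
u3 = {"i1": 5, "i4": 2, "i6": 9}
print(round(pearson_similarity(u1, u3), 3))
```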
9
Rating Activity
Explicit Rating: a rating that expresses the preference of a user for a
specific item.
Implicit Rating: each explicit rating of a user on a specific item implicitly
identifies the user's preference for the categories that this item belongs to
(a small sketch follows the example below).
Example
[Figure: a user's explicit rating r on an item induces implicit ratings f(r)
on the categories CatA, CatB, CatC that the item belongs to]
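The slide does not spell out the mapping f(r); the sketch below assumes the
identity mapping and per-category averaging, and the item and category names
are hypothetical, just to make the explicit-to-implicit derivation concrete.

```python
from collections import defaultdict

# Hypothetical catalogue: which categories each item belongs to.
ITEM_CATEGORIES = {"item1": ["CatA", "CatC"], "item2": ["CatB"]}

def implicit_category_ratings(explicit_ratings, item_categories, f=lambda r: r):
    """Derive a user's implicit category ratings from explicit item ratings.

    Each explicit rating r on an item contributes f(r) to every category that
    the item belongs to; per-category contributions are averaged.  Both the
    identity f and the averaging are assumptions for illustration.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for item, r in explicit_ratings.items():
        for category in item_categories.get(item, []):
            sums[category] += f(r)
            counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

print(implicit_category_ratings({"item1": 8, "item2": 4}, ITEM_CATEGORIES))
```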
10
Similarity Measures
  • Distinctions
  • User-based vs. Item-based Similarity
  • Explicit Rating vs. Implicit Rating
  • Definition of three matrices
  • User-Item, User-Category, Item-Category Matrices

[Figure: example matrices. User-Item matrix (users u_x, u_y vs. items i_x,
i_y), User-Category matrix (users vs. positive/negative category columns
C1pos, C1neg, ..., Cppos, Cpneg), and Item-Category bitmap (items vs.
categories C1, C2, ..., with 0/1 entries)]
  • User-based Similarity derived from
  • Explicit Ratings (k_x,y)
  • Implicit Ratings
  • Item-based Similarity derived from
  • Explicit Ratings (µ_x,y)
  • Implicit Ratings

11
Prediction Algorithms
Prediction with average adjustment (a sketch follows the list below)
  • User-based Prediction algorithms
  • CF-UB-ER: Based on Explicit Ratings
  • CF-UB-ER-CB: Based on Explicit Ratings, Content-Boosted
  • CF-UB-IR: Based on Implicit Ratings
  • Item-based Prediction algorithms
  • CF-IB-ER: Based on Explicit Ratings
  • CF-IB-IR: Based on Implicit Ratings
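A sketch of the average-adjusted (deviation-from-mean) prediction referred to
above; the five variants differ mainly in how the similarity weights are
obtained (explicit or implicit ratings, user- or item-based, content
boosting), and the function and parameter names here are illustrative.

```python
def predict(active_user, item, ratings, sim):
    """User-based prediction with average adjustment.

    ratings: dict user -> dict item -> rating
    sim(u, v): similarity weight between users u and v (e.g. Pearson)
    """
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    numerator, normalizer = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == active_user or item not in other_ratings:
            continue
        weight = sim(active_user, other)
        numerator += weight * (other_ratings[item] - means[other])
        normalizer += abs(weight)
    base = means.get(active_user, 0.0)   # the active user's own average rating
    if normalizer == 0.0:
        return base                       # no usable neighbours
    return base + numerator / normalizer
```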

12
Experimental Evaluation: Results
  • Data Set
  • 2100 ratings (ranging from 1 to 10), 115 users, 650 items, 20 item
    categories
  • Sparsity: 97%
  • 300-item sample sets
  • Accuracy Metrics (a sketch of both metrics follows the list)
  • Mean Absolute Error (MAE)
  • Receiver Operating Characteristic (ROC)
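For reference, a small sketch of the two accuracy metrics; the relevance
threshold used for the ROC operating point is an assumption (ratings range
from 1 to 10, so 6 is used here purely for illustration).

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute deviation of predictions from the true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def roc_point(predicted, actual, threshold=6):
    """One ROC operating point: (true positive fraction, false positive fraction).

    An item counts as relevant when its true rating is >= threshold, and as
    recommended when its predicted rating is >= threshold.
    """
    relevant = [a >= threshold for a in actual]
    recommended = [p >= threshold for p in predicted]
    tp = sum(1 for rec, rel in zip(recommended, relevant) if rec and rel)
    fp = sum(1 for rec, rel in zip(recommended, relevant) if rec and not rel)
    positives = sum(relevant) or 1
    negatives = (len(relevant) - sum(relevant)) or 1
    return tp / positives, fp / negatives
```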

13
Mean Absolute Error (MAE)
We plot MAE vs. Sparsity.
[Figure: MAE vs. sparsity for the five prediction algorithms; representative
MAE values: 1.703, 1.385, 1.35, 1.34, 0.838]
14
Receiver Operating Characteristic (ROC)
We plot the True Positive Fraction vs. the False Positive Fraction.
[Figure: ROC results of the five prediction algorithms; representative
values: 0.71, 0.59, 0.55, 0.53, 0.39]
15
Addressing the Scalability Problem
  • Part III

16
The Scalability Challenge
  • Facts
  • Large numbers of users and items (e.g.
    Amazon.com)
  • CF requires expensive computations that grow with
    the number of items and users
  • Requirements
  • Need for quick formulation of recommendations
  • Need for immediate incorporation of new rating
    information
  • Need for preservation of CF's quality

17
Related Work
  • Clustering Approaches
  • -Breese et al. 1998, Ungar and Foster 1998
  • Dimensionality Reduction of the User-Item Matrix
  • -Sarwar et al. 2001
  • Data reduction or data focusing techniques
  • -Yu et al. 2002, Zeng et al. 2003
  • Offline Computations
  • -Linden et al. 2003

18
Intuition
[Figure: two pipelines, split into a rating process and a recommendation
process. In Classic Collaborative Filtering, the recommendation engine
computes user-to-user similarities at request time, then finds neighbors and
recommends highly rated items before responding. In our method, similarity
maintenance is moved into the rating process, so at request time the engine
only finds neighbors and recommends highly rated items.]
19
Incremental Collaborative Filtering (ICF)
  • Classic Collaborative Filtering (based on Pearson correlation):

    sim(u_a, u_y) = B / (\sqrt{C} \cdot \sqrt{D}), where
    B = \sum_{h=1}^{k} (r_{a,i_h} - \bar{r}_a)(r_{y,i_h} - \bar{r}_y),
    C = \sum_{h=1}^{k} (r_{a,i_h} - \bar{r}_a)^2,
    D = \sum_{h=1}^{k} (r_{y,i_h} - \bar{r}_y)^2
    k: the number of co-rated items between u_a and u_y; r_{a,i_h}, r_{y,i_h}:
    the actual ratings of u_a and u_y on item i_h; \bar{r}_a, \bar{r}_y: the
    average ratings of u_a and u_y
  • Incremental Collaborative Filtering (ICF)
  • Key Idea: incremental computation of the B, C, D factors after each
    single rating

20
ICF: Cases to be examined in the Rating Process
Case 1: Submission of a new rating by the active user u_a on item i_a
  • Item i_a has already been rated by user u_y
  • Item i_a has not been rated by user u_y
Case 2: Update of an existing rating by the active user u_a on item i_a
  • Item i_a has already been rated by user u_y
  • Item i_a has not been rated by user u_y
21
Caching
Computation of the factors that appear in the increments e, f, g
(a code sketch of the caching strategy follows below)

Factor | How it is obtained
B, C, D | Cached information (for all pairs of users)
m | Cached information (the number of items a user has rated)
User averages | Cached information (the average ratings of all users)
Pairwise sums | Cached information (for each pair of users, the sum of their ratings on co-rated items)
Active user's new average rating | Computed on submission of a new rating or update of an existing rating
Submitted rating | Provided via the interface
Difference of the previous and the new average rating | Computed from the cached previous average and the new one
Rating of user u_y on item i_a | Database query
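The sketch below illustrates the caching strategy in a simplified form: for
every pair of users it caches the aggregates over their co-rated items and
updates only the pairs touched by a newly submitted rating. It is not the
exact thesis method; ICF caches the B, C, D factors together with user
averages and applies the e, f, g correction increments above, whereas this
variant caches raw sums and computes Pearson with means over the co-rated
items.

```python
from collections import defaultdict
from math import sqrt

class IncrementalPairwiseSimilarity:
    """Cache per-pair sums over co-rated items and update them per rating."""

    def __init__(self):
        self.item_raters = defaultdict(dict)   # item -> {user: rating}
        # pair key (sorted user ids) -> [n, Sx, Sy, Sxx, Syy, Sxy]
        self.pairs = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])

    def submit_rating(self, user, item, rating):
        """Case 1 (a brand-new rating); an update would first subtract the
        old rating's contribution from the affected pair sums."""
        for other, other_rating in self.item_raters[item].items():
            x, y = (rating, other_rating) if user < other else (other_rating, rating)
            s = self.pairs[tuple(sorted((user, other)))]
            s[0] += 1
            s[1] += x;      s[2] += y
            s[3] += x * x;  s[4] += y * y;  s[5] += x * y
        self.item_raters[item][user] = rating

    def similarity(self, u, v):
        """Pearson correlation over co-rated items, from the cached sums."""
        n, sx, sy, sxx, syy, sxy = self.pairs[tuple(sorted((u, v)))]
        if n < 2:
            return 0.0
        den = sqrt(max(n * sxx - sx * sx, 0.0)) * sqrt(max(n * syy - sy * sy, 0.0))
        return (n * sxy - sx * sy) / den if den else 0.0
```

Each submit_rating call costs time proportional to the number of users who
have already rated the item, so similarities stay current without ever
recomputing them from the whole user-item matrix.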
22
Complexity Issues
Worst-case and approximation complexities of
Classic CF and Incremental CF
                                                 Classic CF              Classic CF         Incremental CF   Incremental CF
                                                 Worst                   Approximation      Worst            Approximation
Maintaining the similarity matrix                O(m²n)                  O(mm'n)            O(mn)            O(mn)
Providing a recommendation to the active user    O(mn)                   O(m'n) + O(n)      O(n)             O(n)
Providing a recommendation to the active user    Pre-computed offline    Pre-computed offline  O(n)          O(n)
Providing a recommendation to the active user    O(n)                    O(n)               O(n)             O(n)
  • m: the number of users
  • n: the number of items
  • m' << m: the number of users with at least one co-rated item with the
    active user
  • n' << n: the number of items that have not been rated by the active user
    and have been rated by at least one of its similar users
  • n'' << n: the number of co-rated items between the active user and
    another user

23
Experimental Evaluation of ICF
Evaluation metric: response time in relation to accuracy

User-Item matrix size     Samples (users)   Classic CF   Classic CF     Incremental CF   Incremental CF
                                            Time (sec)   Accuracy (%)   Time (sec)       Accuracy (%)
100 users x 100 items     10                0.17         22             0.045            100
100 users x 100 items     30                0.55         49.5           0.045            100
100 users x 100 items     50                0.765        67.5           0.045            100
100 users x 100 items     99                1.38         100            0.045            100
1000 users x 1000 items   100               6.81         26.7           0.46             100
1000 users x 1000 items   300               20           53.8           0.46             100
1000 users x 1000 items   500               33           66.8           0.46             100
1000 users x 1000 items   999               66           100            0.46             100
  • Remarks
  • Performance-accuracy tradeoff in Classic CF is
    confirmed
  • ICF proves to be highly scalable while retaining the full quality of CF
  • The running time of ICF grows linearly with the number of items only

24
Addressing the Sparsity Problem
  • Part IV

25
The Sparsity Challenge
  • Facts
  • Large number of users and items
  • Even active users end up rating only a fraction of the items in the
    database
  • It is possible that the similarity between two
    users cannot be defined
  • Negative impact on the effectiveness of CF
  • Requirements
  • Be able to define similarity between two users
  • Be able to recommend new and obscure items
  • Be able to recommend items to new users

26
Related Work
  • Use of profile information when calculating
    similarities (e.g. demographic filtering)
  • -Pazzani 1999
  • Dimensionality reduction (e.g. Singular Value
    Decomposition, Latent Semantic Indexing,
    Principal Component Analysis)
  • -Sarwar et al. 2000, Deerwester et al. 1990,
    Goldberg et al. 2001
  • Content-boosted Collaborative Filtering
  • -Melville et al. 2002
  • Item-based similarity
  • -Sarwar et al. 2001, Popescul et al. 2001

27
Social Networks in RS
  • Underlying Social Networks in Recommendation
    Systems
  • Associations based on trust
  • Trust through user-to-user similarity (Pearson
    correlation)


[Figure: the user-item matrix connects the user space and the item space; the
ratings r_x,x, r_x,y, r_y,x, r_y,y of users u_x, u_y on items i_x, i_y induce
the trust association between the two users]
28
Trust Inferences and Paths
  • Trust Inferences
  • are transitive associations between users in the
    underlying network
  • are sources of additional information for
    recommendation purposes
  • form trust paths between distant users

[Figure: users S and N are associated through co-rated items i1, i2, and N is
associated with T; transitive chains such as S → N1 → T and S → N2 → ... form
trust paths and a web of trust, yielding an inferred association between S
and T]
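To make trust paths concrete, here is a small sketch that finds one shortest
chain of direct associations between a source and a target user with
breadth-first search; the neighbour structure and names are illustrative, and
the propagation of trust and confidence along the path is the subject of the
following slides.

```python
from collections import deque

def trust_path(source, target, neighbours):
    """Return one shortest chain of trust associations from source to target.

    neighbours: dict user -> iterable of directly associated users (e.g. users
    with a defined Pearson similarity).  Returns the list of users on the
    path, or None if the two users are not connected.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if user == target:
            path = []
            while user is not None:
                path.append(user)
                user = previous[user]
            return path[::-1]
        for neighbour in neighbours.get(user, ()):
            if neighbour not in previous:
                previous[neighbour] = user
                queue.append(neighbour)
    return None
```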
29
Confidence, Uncertainty and Subjectiveness
Confidence and Uncertainty in Trust Paths
[Figure: confidence and uncertainty of associations as a function of the
number of co-rated items, relative to the user u_max_conf with the most
co-rated items; example values: confidence 0.57, uncertainty 0.43]
Subjectiveness
[Figure: confidence along the trust path from S to T, e.g. 0.57 and 0.34]
30
Managing Multiple Paths
  • Path Composition (a sketch of the composition and selection schemes
    follows the example below)
  • Average Composition
  • Weighted Average Composition
  • Path Selection
  • Maximum Path Confidence
  • Minimum Mean Absolute Deviation

Illustrating Example
[Figure: two inferred trust paths from S to T. Path p_A goes through N1 and
N2, with edge values T=0.5, C=0.7, n(I_S ∩ I_N1)=8; T=0.9, C=0.4,
n(I_N1 ∩ I_N2)=5; T=0.2, C=0.5, n(I_N2 ∩ I_T)=6, giving T_{S→T}(p_A)=0.44 and
C_{S→T}(p_A)=0.14. Path p_B goes through N3, with T=0.4, C=0.7,
n(I_S ∩ I_N3)=7; T=0.6, C=0.8, n(I_N3 ∩ I_T)=3, giving T_{S→T}(p_B)=0.46 and
C_{S→T}(p_B)=0.56.]
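A sketch of the composition and selection schemes named above, applied to the
two paths of the illustrating example. The exact weights of the weighted
average composition are not given on the slide, so weighting each path's
inferred trust by its confidence is an assumption, and minimum mean absolute
deviation selection is omitted because its criterion is not shown here.

```python
def average_composition(paths):
    """paths: list of (trust, confidence) pairs, one per inferred path S -> T."""
    return sum(trust for trust, _ in paths) / len(paths)

def weighted_average_composition(paths):
    """Weight each path's inferred trust by its confidence (assumed weighting)."""
    total_confidence = sum(conf for _, conf in paths)
    if total_confidence == 0:
        return average_composition(paths)
    return sum(trust * conf for trust, conf in paths) / total_confidence

def max_confidence_selection(paths):
    """Path selection: keep the trust of the path with the highest confidence."""
    return max(paths, key=lambda pair: pair[1])[0]

# The two paths of the illustrating example, as (trust, confidence).
paths = [(0.44, 0.14), (0.46, 0.56)]
print(weighted_average_composition(paths), max_confidence_selection(paths))
```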
31
Power-law Distribution of Users' Ratings
32
Trust Inference Impact
33
Statistical Accuracy of Our Method (MAE)
34
Decision-support Accuracy of Our Method (ROC)
35
Conclusions
  • Part V

36
Extensions of Recommendation Technologies (1/2)
  • More Advanced Profiling Techniques
  • Currently rely on rating information
  • E.g. data mining rules, sequences, signatures to describe users'
    interests
  • Adoption of advancements in mathematical
    approximation theory (e.g. radial basis
    functions)
  • Multidimensionality of Recommendations
  • Currently operates on the two-dimensional
    User-Item space
  • Need for contextual recommendations (taking into
    account time, conditions, etc.)
  • Multi-criteria Ratings
  • Need to incorporate ratings for a variety of
    criteria concerning a single item

37
Extensions of Recommendation Technologies (2/2)
  • Non-intrusiveness
  • Implicit Rating (e.g. time spent on a webpage), HCI issues
  • Flexibility in Integration of Recommendation
    Technologies
  • RECOMMEND Movie TO User
  • BASED ON Rating SHOW TOP 3
  • FROM MovieRecommender
  • WHERE Movie.Length > 120 AND User.City = 'Toronto'
  • Effectiveness of Recommendations
  • Need for metrics that adequately capture
    usefulness and quality
  • Trustworthiness and Online Feedback Mechanisms
    Issues
  • Privacy issues

38
Conclusions and Discussion
  • Qualitative Analysis of user- and item-based
    prediction algorithms
  • Incremental Collaborative Filtering (ICF) to deal
    with Scalability
  • Trust Inferences to deal with Sparsity and
    Cold-start
  • Roadmap to Future Research Work

39
Published Work
  1. Papagelis, M. and Plexousakis, D. Recommendation
    Based Discovery of Dynamic Virtual Communities.
    In Short Paper Proceedings of the 15th Conference
    on Advanced Information Systems Engineering, 2003
  2. Papagelis, M. and Plexousakis, D. Qualitative
    Analysis of User-based and Item-based Prediction
    Algorithms for Recommendation Agents. Eighth
    International Workshop on Cooperative Information
    Agents, 2004
  3. Papagelis, M. and Plexousakis, D. Qualitative
    Analysis of User-based and Item-based Prediction
    Algorithms for Recommendation Agents. Journal of
    Engineering Applications of Artificial
    Intelligence, 18(4), June, 2005
  4. Papagelis, M., Plexousakis, D., and Kutsuras, T. A method
    for alleviating the Sparsity Problem in
    Collaborative Filtering Using Trust Inferences.
    Proceedings of the 3rd International Conference
    on Trust Management, 2005
  5. Papagelis, M., Plexousakis, D., Rousidis, I., and
    Theoharopoulos, E. Qualitative Analysis of
    User-based and Item-based Prediction Algorithms
    for Recommendation Systems. 3rd Hellenic Data
    Management Symposium, 2004
  6. Papagelis, M., Rousidis, I., Plexousakis, D., and
    Theoharopoulos, E. Incremental Collaborative
    Filtering for Highly-Scalable Recommendation
    Algorithms. 15th International Symposium on
    Methodologies for Intelligent Systems, 2005

40
Questions?
41
Thanks!