Title: Crawling the Algorithmic Foundations of Recommendation Technologies
1. Crawling the Algorithmic Foundations of Recommendation Technologies
A presentation given in partial fulfillment of the requirements for the degree of Master of Science
Manos Papagelis
Computer Science Department, School of Sciences and Engineering, University of Crete
Institute of Computer Science, Foundation of Research and Technology, Hellas
Heraklion, Greece, March 24, 2005
Email: papaggel_at_csd.uoc.gr
Supervisor: Dimitris Plexousakis, Associate Professor
2. Presentation Outline
- Part I: Recommendation Algorithms
- Part II: Qualitative Analysis of Prediction Algorithms
- Part III: Addressing the Scalability Problem
- Part IV: Addressing the Sparsity Problem
- Part V: Conclusions
3. Recommendation Algorithms
4. Motivation: Placement of the Research Topic
Information Overload: Need for Personalization
- New Items, Books, Journals, Research Papers
- TV Programs, Music CDs, Movie Titles
- E-commerce products, Matchmaking and other e-Services
- Web pages, Usenet Articles, emails
5. Introduction to Recommendation Systems
- Recommendation Systems were developed to address two problems
  - Overwhelming numbers of on-topic documents
  - Filtering non-text documents, mainly based on rating activity
- Formulation of the Recommendation Problem
  - Estimation of user ratings for unseen items (Predictions)
  - Recommendation of the top-N predictions
- Classification of Recommendation Algorithms
  - Content-based
  - Collaborative Filtering
  - Hybrid
- Challenges and Limitations of Collaborative Filtering Methods
  - Scalability
  - Sparsity
  - Cold Start
6. Qualitative Analysis of Prediction Algorithms
7. Unfolding the Recommendation Process
- Which items to recommend?
  - The list of N items with the top-N predictions
- How could predictions be achieved?
  - Exploitation of other users' activity
- Which users' activity to take up?
  - Those who share the same or relevant interests
  - "It may be of benefit to one's search for information to consult the behavior of other users who share the same or relevant interests"
[Figure: Collaborative Filtering on a user-item matrix with items i1, i2, i3, ... The unknown ratings (?) in the active user's row (u1: 2 ? ? 5 ? 6) are predicted from the full matrix (u1: 2 - 4 5 - 6; u2; u3: 5 2 - 2 - 9).]
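8. Collaborative Filtering (CF)
How similar are two users? Similarity is computed over their co-rated items.
[Figure: user-item matrix with items i1, i2, i3, ... and rows u1: 2 - 4 5 - 6; u2; u3: 5 2 - 2 - 9. The co-rated items of u1 and u3 give the rating vectors (2, 5, 6) and (5, 2, 9).]
- Similarity measures
  - Cosine Vector Similarity
  - Spearman Correlation
  - Mean-squared Difference
  - Entropy-based Uncertainty
  - Pearson Correlation Coefficient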
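The co-rated vectors above can be plugged into a Pearson correlation directly. The following is a minimal sketch, not the presentation's own implementation: the function name `pearson` and the item ids `i4` and `i6` are hypothetical, and the means are taken over the co-rated items only, which is an assumption of this sketch.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Each argument maps item id -> rating. Means are taken over the
    co-rated items only (an assumption made for this sketch).
    Returns None when the correlation is undefined.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den else None

# Rows u1 and u3 of the slide's matrix; "-" entries are simply absent.
# Item ids beyond i3 are hypothetical labels for the unlabeled columns.
u1 = {"i1": 2, "i3": 4, "i4": 5, "i6": 6}
u3 = {"i1": 5, "i2": 2, "i4": 2, "i6": 9}
similarity = pearson(u1, u3)  # uses the co-rated vectors (2, 5, 6) and (5, 2, 9)
```

Only the three co-rated items contribute, so the two users end up weakly positively correlated even though each has rated four items.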
9. Rating Activity
- Explicit Rating: a rating that expresses the preference of a user for a specific item
- Implicit Rating: each explicit rating of a user for a specific item implicitly identifies the user's preference for the categories that this item belongs to
[Figure: example of an action and its result. Explicit: the user gives an item the rating r. Implicit: the user receives a rating f(r) for each of the categories CatA, CatB and CatC that the item belongs to.]
10. Similarity Measures
- Distinctions
  - User-based vs. Item-based Similarity
  - Explicit Rating vs. Implicit Rating
- Definition of three matrices
  - User-Item, User-Category, Item-Category Matrices

User-Item Matrix
     ix  iy  ...
ux   2   -   4   5   -
uy   5   2   -   2   -

User-Category Matrix
     C1pos  C1neg  ...  Cppos  Cpneg
ux   2      -      4    -      6
uy   5      2      -    -      9

Item-Category Bitmap
     C1  C2  Cx
ix   1   0   1
iy   1   0   0

- User-based Similarity derived from
  - Explicit Ratings (kx,y)
  - Implicit Ratings (?x,y)
- Item-based Similarity derived from
  - Explicit Ratings (µx,y)
  - Implicit Ratings (?x,y)
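One plausible reading of the Cpos/Cneg columns is that each explicit rating is propagated to the categories of the rated item and counted as positive or negative. The sketch below follows that reading only; the function name `user_category_counts`, the `threshold` parameter and the sample ratings are hypothetical, and the slides do not state how f(r) is actually defined.

```python
from collections import defaultdict

def user_category_counts(user_item, item_categories, threshold):
    """Build a User-Category matrix from explicit ratings.

    For every (user, item, rating), each category of the item gets one
    "pos" count if rating >= threshold, otherwise one "neg" count.
    The threshold rule is an assumption of this sketch.
    """
    counts = defaultdict(lambda: defaultdict(lambda: {"pos": 0, "neg": 0}))
    for user, ratings in user_item.items():
        for item, rating in ratings.items():
            side = "pos" if rating >= threshold else "neg"
            for category in item_categories.get(item, ()):
                counts[user][category][side] += 1
    return counts

# Item-Category bitmap from the slide: ix -> (1, 0, 1), iy -> (1, 0, 0).
item_categories = {"ix": ["C1", "Cx"], "iy": ["C1"]}
# Hypothetical explicit ratings on the 1-10 scale.
user_item = {"ux": {"ix": 2, "iy": 8}, "uy": {"ix": 7}}
counts = user_category_counts(user_item, item_categories, threshold=5)
```

Two users who never rated the same item can still be compared through rows of this matrix, which is what makes implicit ratings useful when the User-Item matrix is sparse.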
11. Prediction Algorithms
Prediction = Average + Adjustment
- User-based Prediction algorithms
  - CFUB-ER: Based on Explicit Ratings
  - CFUB-ER-CB: Based on Explicit Ratings, Content Boosted
  - CFUB-IR: Based on Implicit Ratings
- Item-based Prediction algorithms
  - CFIB-ER: Based on Explicit Ratings
  - CFIB-IR: Based on Implicit Ratings
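The "average plus adjustment" scheme can be illustrated with the common user-based form, where the adjustment is a similarity-weighted mean of the neighbors' deviations from their own averages. This is a sketch under that assumption; the function name `predict` and the sample numbers are hypothetical, and the exact adjustment used by each CFUB/CFIB variant is not given on the slide.

```python
def predict(active_mean, neighbors):
    """Predict the active user's rating for one item.

    neighbors is a list of (similarity, neighbor_rating, neighbor_mean)
    tuples for users who rated the item. The adjustment is the
    similarity-weighted average of the neighbors' deviations.
    """
    norm = sum(abs(sim) for sim, _, _ in neighbors)
    if norm == 0:
        return active_mean  # no usable neighbors: fall back to the average
    adjustment = sum(sim * (rating - mean) for sim, rating, mean in neighbors) / norm
    return active_mean + adjustment

# Hypothetical neighborhood: one neighbor liked the item, one did not.
prediction = predict(5.0, [(0.8, 9, 6.0), (0.4, 3, 5.0)])
```

Working with deviations rather than raw ratings compensates for users who rate systematically high or low.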
12. Experimental Evaluation Results
- Data Set
  - 2100 ratings (range from 1 to 10), 115 users, 650 items, 20 item categories
  - Sparsity: 97%
  - 300-item sample sets
- Accuracy Metrics
  - Mean Absolute Error (MAE)
  - Receiver Operating Characteristic (ROC) curve
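The sparsity figure follows from the data set sizes, and MAE is the average absolute gap between predicted and actual ratings. A small sketch of both; the function names and the three sample predictions are hypothetical.

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of empty cells in the User-Item matrix."""
    return 1.0 - num_ratings / (num_users * num_items)

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Data set from the slide: 2100 ratings, 115 users, 650 items.
data_sparsity = sparsity(2100, 115, 650)  # about 0.97

# Hypothetical predictions against hypothetical held-out ratings.
mae = mean_absolute_error([7.0, 4.5, 9.0], [8, 4, 6])
```

Lower MAE means better statistical accuracy; on a 1 to 10 scale an MAE of 1.5 means predictions are off by one and a half points on average.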
13. Mean Absolute Error (MAE)
[Figure: MAE plotted against sparsity. Values labeled on the plot: 1,703; 1,385; 1, 35; 1, 34; 0,838.]
14. Receiver Operating Characteristic (ROC) Curve
[Figure: True Positive Fraction plotted against False Positive Fraction. Values labeled on the plot: 0,71; 0,59; 0,55; 0,53; 0,39.]
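Each point on a ROC curve is a pair of fractions computed at one decision threshold. The sketch below assumes an item counts as "good" when its actual rating reaches the threshold and as "recommended" when its predicted rating does; the function name `roc_point`, the threshold and the sample ratings are hypothetical.

```python
def roc_point(predicted, actual, threshold):
    """Return (false positive fraction, true positive fraction)."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        relevant = a >= threshold      # the user actually liked the item
        recommended = p >= threshold   # the system would recommend it
        if recommended and relevant:
            tp += 1
        elif recommended and not relevant:
            fp += 1
        elif relevant:
            fn += 1
        else:
            tn += 1
    tpf = tp / (tp + fn) if tp + fn else 0.0
    fpf = fp / (fp + tn) if fp + tn else 0.0
    return fpf, tpf

# Hypothetical predictions and actual ratings on the 1-10 scale.
point = roc_point([8, 3, 7, 6], [9, 2, 4, 8], threshold=7)
```

Sweeping the threshold over the rating scale and plotting the resulting points yields the curve; the area under it summarizes decision-support accuracy.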
15. Addressing the Scalability Problem
16. The Scalability Challenge
- Facts
  - Large numbers of users and items (e.g. Amazon.com)
  - CF requires expensive computations that grow with the number of items and users
- Requirements
  - Need for quick formulation of recommendations
  - Need for immediate incorporation of new rating information
  - Need for preservation of CF's quality
17. Related Work
- Clustering Approaches
  - Breese et al. 1998, Ungar and Foster 1998
- Dimensionality Reduction of the User-Item Matrix
  - Sarwar et al. 2001
- Data reduction or data focusing techniques
  - Yu et al. 2002, Zeng et al. 2003
- Offline Computations
  - Linden et al. 2003
18. Intuitive
[Figure: the Rating Process and the Recommendation Process under two approaches]
- Classic Collaborative Filtering: Request -> Recommendation Engine (Compute User-to-User Similarities, Find Neighbors, Recommend High Rated Items) -> Response
- Our Method: Request -> Recommendation Engine (Find Neighbors, Recommend High Rated Items) -> Response
19. Incremental Collaborative Filtering (ICF)
- Classic Collaborative Filtering (based on Pearson Correlation)
  - [Equation: Pearson correlation between users ua and uy, annotated with the number of co-rated items between ua and uy, the actual ratings of ua and uy for item ih, and the average ratings of ua and uy]
- Incremental Collaborative Filtering (ICF)
  - Key idea: incremental computation of the B, C, D factors after each single rating
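For reference, the standard Pearson correlation over co-rated items can be written with the quantities annotated on this slide. Naming the numerator B and the two sums under the square roots C and D is an assumption here, since the original equation did not survive; n' stands for the number of co-rated items between ua and uy.

```latex
\mathrm{sim}(u_a, u_y) =
  \frac{\sum_{h=1}^{n'} \left(r_{u_a,i_h} - \bar{r}_{u_a}\right)\left(r_{u_y,i_h} - \bar{r}_{u_y}\right)}
       {\sqrt{\sum_{h=1}^{n'} \left(r_{u_a,i_h} - \bar{r}_{u_a}\right)^{2}}\;
        \sqrt{\sum_{h=1}^{n'} \left(r_{u_y,i_h} - \bar{r}_{u_y}\right)^{2}}}
  = \frac{B}{\sqrt{C}\,\sqrt{D}}
```

Under this reading, keeping B, C and D up to date after every rating is enough to read off the similarity at any time without rescanning the ratings.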
20. ICF: Cases to Be Examined in the Rating Process
- Case 1: Submission of a new rating (user ua rates item ia)
  - Item ia has been rated by user uy
  - Item ia has not been rated by user uy
- Case 2: Update of an existing rating (user ua re-rates item ia)
  - Item ia has been rated by user uy
  - Item ia has not been rated by user uy
21. Caching
Computation of the factors that appear in the increments e, f, g
Factors and how each is obtained:
- B, C, D: cached information (for all pairs of users)
- m: cached information (the number of items a user has rated)
- Average ratings of all users: cached information
- For each pair of users, the sum of their ratings to co-rated items: cached information
- Active user's new average rating: computed on submission of a new rating or update of an existing rating
- Via interface
- Difference of previous and active average rating
- The rating of the user uy to the item ia: database query
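To make the caching idea concrete, here is a minimal sketch of Case 1 (a new rating by the active user) for a single pair of users. It is derived from the definition of Pearson correlation with per-user averages, not copied from the presentation, so the update expressions are not necessarily the e, f, g increments the slides refer to. All function and key names are hypothetical.

```python
def factors_from_scratch(ratings_a, ratings_y):
    """Recompute B, C, D over the co-rated items (the classic approach)."""
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    common = set(ratings_a) & set(ratings_y)
    b = sum((ratings_a[i] - mean_a) * (ratings_y[i] - mean_y) for i in common)
    c = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    d = sum((ratings_y[i] - mean_y) ** 2 for i in common)
    return b, c, d

def build_cache(ratings_a, ratings_y):
    """Cache everything the incremental update needs for one user pair."""
    common = set(ratings_a) & set(ratings_y)
    b, c, d = factors_from_scratch(ratings_a, ratings_y)
    return {
        "B": b, "C": c, "D": d,
        "n": len(common),                             # co-rated items
        "sum_a": sum(ratings_a[i] for i in common),   # ua's ratings on them
        "sum_y": sum(ratings_y[i] for i in common),   # uy's ratings on them
        "mean_a": sum(ratings_a.values()) / len(ratings_a),
        "mean_y": sum(ratings_y.values()) / len(ratings_y),
        "m_a": len(ratings_a),                        # items rated by ua
    }

def add_rating(cache, ratings_a, ratings_y, item, rating):
    """Case 1: ua submits a new rating; update B, C, D without a rescan."""
    new_mean = (cache["mean_a"] * cache["m_a"] + rating) / (cache["m_a"] + 1)
    shift = new_mean - cache["mean_a"]  # difference of new and old average
    n = cache["n"]
    # Every existing deviation of ua moves by -shift.
    cache["B"] -= shift * (cache["sum_y"] - n * cache["mean_y"])
    cache["C"] += -2 * shift * (cache["sum_a"] - n * cache["mean_a"]) + n * shift ** 2
    if item in ratings_y:  # the item becomes a new co-rated item
        dev_y = ratings_y[item] - cache["mean_y"]
        cache["B"] += (rating - new_mean) * dev_y
        cache["C"] += (rating - new_mean) ** 2
        cache["D"] += dev_y ** 2
        cache["n"] += 1
        cache["sum_a"] += rating
        cache["sum_y"] += ratings_y[item]
    cache["mean_a"] = new_mean
    cache["m_a"] += 1
    ratings_a[item] = rating

# Hypothetical ratings for the active user ua and another user uy.
ratings_a = {"i1": 2, "i4": 5, "i6": 6}
ratings_y = {"i1": 5, "i2": 2, "i4": 2, "i6": 9}
cache = build_cache(ratings_a, ratings_y)
add_rating(cache, ratings_a, ratings_y, "i2", 8)  # i2 has been rated by uy
add_rating(cache, ratings_a, ratings_y, "i7", 3)  # i7 has not been rated by uy
```

Each update touches a constant number of cached values per user pair, which is the intuition behind replacing the full recomputation with increments.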
22. Complexity Issues
Worst-case and approximation complexities of Classic CF and Incremental CF

Complexity for | Classic CF (worst) | Classic CF (approximation) | Incremental CF (worst) | Incremental CF (approximation)
Maintaining the similarity matrix | O(m^2 n) | O(mmn) | O(mn) | O(mn)
Providing a recommendation to the active user | O(mn) | O(mn)O(n) | O(n) | O(n)
Providing a recommendation to the active user | Pre-computed offline | Pre-computed offline | O(n) | O(n)
Providing a recommendation to the active user | O(n) | O(n) | O(n) | O(n)

- m: the number of users
- n: the number of items
- m' << m: the number of users with at least one co-rated item with the active user
- n' << n: the number of items that have not been rated by the active user and have been rated by at least one of its similar users
- n'' << n: the number of co-rated items between the active user and another user
23. Experimental Evaluation of ICF
Evaluation metric: response time in relation to accuracy

User-Item matrix size | Classic CF samples (users) | Classic CF time (sec) | Classic CF accuracy (%) | Incremental CF time (sec) | Incremental CF accuracy (%)
100 users x 100 items | 10 | 0.17 | 22 | 0.045 | 100
100 users x 100 items | 30 | 0.55 | 49.5 | 0.045 | 100
100 users x 100 items | 50 | 0.765 | 67.5 | 0.045 | 100
100 users x 100 items | 99 | 1.38 | 100 | 0.045 | 100
1000 users x 1000 items | 100 | 6.81 | 26.7 | 0.46 | 100
1000 users x 1000 items | 300 | 20 | 53.8 | 0.46 | 100
1000 users x 1000 items | 500 | 33 | 66.8 | 0.46 | 100
1000 users x 1000 items | 999 | 66 | 100 | 0.46 | 100

- Remarks
  - The performance-accuracy tradeoff in Classic CF is confirmed
  - ICF proves to be highly scalable while retaining the best quality of CF
  - The response time of ICF grows linearly only with the number of items
24. Addressing the Sparsity Problem
25. The Sparsity Challenge
- Facts
  - Large number of users and items
  - Even active users end up rating only a fraction of the items in the database
  - It is possible that the similarity between two users cannot be defined
  - Negative impact on the effectiveness of CF
- Requirements
  - Be able to define similarity between two users
  - Be able to recommend new and obscure items
  - Be able to recommend items to new users
26. Related Work
- Use of profile information when calculating similarities (e.g. demographic filtering)
  - Pazzani 1999
- Dimensionality reduction (e.g. Singular Value Decomposition, Latent Semantic Indexing, Principal Component Analysis)
  - Sarwar et al. 2000, Deerwester et al. 1990, Goldberg et al. 2001
- Content-boosted Collaborative Filtering
  - Melville et al. 2002
- Item-based similarity
  - Sarwar et al. 2001, Popescul et al. 2001
27. Social Networks in RS
- Underlying Social Networks in Recommendation Systems
  - Associations based on trust
  - Trust through user-to-user similarity (Pearson correlation)
[Figure: the User-Item Matrix connects the User Space (ux, uy) with the Item Space (ix, iy) through the ratings rx,x, rx,y, ry,x, ry,y]
28. Trust Inferences and Paths
- Trust Inferences
  - are transitive associations between users in the underlying network
  - are sources of additional information for recommendation purposes
  - form trust paths between distant users
[Figure: panels "Trust Inferences", "Trust Paths" and "Web of Trusts", showing a source user S, a target user T, intermediate users N, N1 and N2, items i1 and i2, and an inferred association]
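Trust paths between distant users can be found with an ordinary graph search over the user network. A minimal sketch, assuming the network is stored as an adjacency mapping; the function name `trust_paths`, the `max_hops` limit and the example graph are hypothetical.

```python
from collections import deque

def trust_paths(graph, source, target, max_hops):
    """Enumerate simple paths from source to target with at most max_hops links.

    graph maps each user to the users they are directly associated with.
    Breadth-first search returns shorter paths first.
    """
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # path already uses the allowed number of links
        for neighbor in graph.get(node, ()):
            if neighbor not in path:  # keep paths free of cycles
                queue.append(path + [neighbor])
    return paths

# Hypothetical network: S reaches T through N3, or through N1 and N2.
graph = {"S": ["N1", "N3"], "N1": ["N2"], "N2": ["T"], "N3": ["T"]}
paths = trust_paths(graph, "S", "T", max_hops=3)
```

Bounding the number of hops keeps the search cheap and reflects the intuition that very long chains of inferred trust carry little information.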
29. Confidence, Uncertainty and Subjectiveness
[Figure: Confidence and Uncertainty in Trust Paths. For a source user S, the users u1, u2, u3, ..., un-1 are plotted against their number of co-rated items; umax_conf is the user with the most co-rated items. Values shown: confidence 0.57 with uncertainty 0.43 and, under Subjectiveness, 0.57 and 0.34 between S and T.]
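The figure suggests that confidence is measured relative to the user with the most co-rated items, and that uncertainty is its complement (0.57 + 0.43 = 1). The sketch below assumes exactly that ratio; the function name `confidence` and the co-rated counts are hypothetical, chosen so that the result matches the 0.57 shown on the slide.

```python
def confidence(co_rated_counts, user):
    """Confidence of the association between a source user and `user`.

    co_rated_counts maps each associated user to the number of items
    co-rated with the source. Dividing by the largest count is an
    assumption of this sketch.
    """
    most = max(co_rated_counts.values())
    return co_rated_counts[user] / most if most else 0.0

# Hypothetical co-rated counts between the source S and its neighbors.
counts = {"u1": 4, "u2": 2, "umax_conf": 7}
conf_u1 = confidence(counts, "u1")   # 4 / 7
uncertainty_u1 = 1.0 - conf_u1
```

By construction the user with the most co-rated items gets confidence 1, and every other association is scaled against it.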
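30. Managing Multiple Paths
- Path Composition
  - Average Composition
  - Weighted Average Composition
- Path Selection
  - Maximum Path Confidence
  - Minimum Mean Absolute Deviation
Illustrating Example (T: trust, C: confidence, n: number of co-rated items)
- Path PA: S → N1 → N2 → T
  - S → N1: T = 0.5, C = 0.7, n(IS ∩ IN1) = 8
  - N1 → N2: T = 0.9, C = 0.4, n(IN1 ∩ IN2) = 5
  - N2 → T: T = 0.2, C = 0.5, n(IN2 ∩ IT) = 6
  - Result: TS→T(pA) = 0.44, CS→T(pA) = 0.14
- Path PB: S → N3 → T
  - S → N3: T = 0.4, C = 0.7, n(IS ∩ IN3) = 7
  - N3 → T: T = 0.6, C = 0.8, n(IN3 ∩ IT) = 3
  - Result: TS→T(pB) = 0.46, CS→T(pB) = 0.56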
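The example's path confidences are consistent with multiplying the link confidences (0.7 x 0.4 x 0.5 = 0.14 and 0.7 x 0.8 = 0.56). Taking that as an assumption, the sketch below applies three of the listed strategies to the two paths; the function names are hypothetical, the path trust values 0.44 and 0.46 are taken from the slide as given, and weighting by path confidence in the weighted average is a further assumption.

```python
from math import prod  # available from Python 3.8

def path_confidence(link_confidences):
    """Confidence of a path as the product of its link confidences (assumed)."""
    return prod(link_confidences)

def average_composition(paths):
    """Plain average of the trust values inferred along each path."""
    return sum(trust for trust, _ in paths.values()) / len(paths)

def weighted_average_composition(paths):
    """Average of path trust values, weighted by path confidence (assumed)."""
    total = sum(conf for _, conf in paths.values())
    return sum(trust * conf for trust, conf in paths.values()) / total

def max_confidence_selection(paths):
    """Pick the single path with the highest confidence."""
    name = max(paths, key=lambda p: paths[p][1])
    return name, paths[name][0]

# (path trust from the slide, path confidence from the link confidences)
paths = {
    "pA": (0.44, path_confidence([0.7, 0.4, 0.5])),
    "pB": (0.46, path_confidence([0.7, 0.8])),
}
average = average_composition(paths)
weighted = weighted_average_composition(paths)
selected = max_confidence_selection(paths)
```

Composition blends all available evidence, while selection trusts only the most reliable path; here both lean toward pB because its confidence is four times that of pA.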
31. Power-law Distribution of Users' Ratings
32. Trust Inference Impact
33. Statistical Accuracy of Our Method (MAE)
34. Decision-support Accuracy of Our Method (ROC)
35. Conclusions
36. Extensions of Recommendation Technologies (1/2)
- More Advanced Profiling Techniques
  - Currently rely on rating information
  - E.g. data mining rules, sequences, signatures to describe users' interests
  - Adoption of advancements in mathematical approximation theory (e.g. radial basis functions)
- Multidimensionality of Recommendations
  - Currently operates on the two-dimensional User-Item space
  - Need for contextual recommendations (taking into account time, conditions, etc.)
- Multi-criteria Ratings
  - Need to incorporate ratings for a variety of criteria concerning a single item
37. Extensions of Recommendation Technologies (2/2)
- Non-intrusiveness
  - Implicit Rating (e.g. time spent on a web page), HCI issues
- Flexibility in Integration of Recommendation Technologies
  - RECOMMEND Movie TO User
    BASED ON Rating SHOW TOP 3
    FROM MovieRecommender
    WHERE Movie.Length > 120 AND User.City = "Toronto"
- Effectiveness of Recommendations
  - Need for metrics that adequately capture usefulness and quality
- Trustworthiness and Online Feedback Mechanisms Issues
- Privacy issues
38. Conclusions and Discussion
- Qualitative Analysis of user- and item-based prediction algorithms
- Incremental Collaborative Filtering (ICF) to deal with Scalability
- Trust Inferences to deal with Sparsity and Cold-start
- Roadmap to Future Research Work
39. Published Work
- Papagelis, M. and Plexousakis, D. Recommendation Based Discovery of Dynamic Virtual Communities. In Short Paper Proceedings of the 15th Conference on Advanced Information Systems Engineering, 2003
- Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Eighth International Workshop on Cooperative Information Agents, 2004
- Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Journal of Engineering Applications of Artificial Intelligence, 18(4), June 2005
- Papagelis, M., Plexousakis, D., and Kutsuras, T. A Method for Alleviating the Sparsity Problem in Collaborative Filtering Using Trust Inferences. Proceedings of the 3rd International Conference on Trust Management, 2005
- Papagelis, M., Plexousakis, D., Rousidis, I., and Theoharopoulos, E. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Systems. 3rd Hellenic Data Management Symposium, 2004
- Papagelis, M., Rousidis, I., Plexousakis, D., and Theoharopoulos, E. Incremental Collaborative Filtering for Highly-Scalable Recommendation Algorithms. 15th International Symposium on Methodologies of Intelligent Systems, 2005
40. Questions?
41. Thanks!