Crawling the Algorithmic Foundations of Recommendation Technologies


1
Crawling the Algorithmic Foundations of Recommendation Technologies

A presentation given in partial fulfillment of the requirements for the degree of Master of Science

Manos Papagelis
Computer Science Department, School of Sciences and Engineering, University of Crete
Institute of Computer Science, Foundation for Research and Technology - Hellas
Heraklion, Greece, March 24, 2005
Email: papaggel_at_csd.uoc.gr
Supervisor: Dimitris Plexousakis, Associate Professor
2
Presentation Outline

Part I: Recommendation Algorithms
Part II: Qualitative Analysis of Prediction Algorithms
Part III: Addressing the Scalability Problem
Part IV: Addressing the Sparsity Problem
Part V: Conclusions

3
Recommendation Algorithms

Part I

4
Motivation / Placement of the Research Topic

Motivation

Information overload → need for personalization in:
  News Items, Books, Journals, Research Papers
  TV Programs, Music CDs, Movie Titles
  E-commerce Products, Matchmaking and other e-Services
  Web Pages, Usenet Articles, E-mails

Research Topic Placement

5
Introduction to Recommendation Systems

Recommendation systems were developed to address two problems:
  Overwhelming numbers of on-topic documents
  Filtering of non-text documents, based mainly on rating activity
Formulation of the recommendation problem:
  Estimation of user ratings for unseen items (predictions)
  Recommendation of the top-N predictions
Classification of recommendation algorithms:
  Content-based
  Collaborative filtering
  Hybrid
Challenges and limitations of collaborative filtering methods:
  Scalability
  Sparsity
  Cold start
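A minimal Python sketch of the two-step formulation (predict ratings for unseen items, then recommend the top N). The dictionary layout and the `predict` callable are hypothetical stand-ins for any of the prediction algorithms discussed later, not an interface from the original work.

```python
from typing import Callable, Dict, List, Tuple


def recommend_top_n(
    user: str,
    items: List[str],
    ratings: Dict[str, Dict[str, float]],   # user -> {item: rating}
    predict: Callable[[str, str], float],    # hypothetical predictor
    n: int,
) -> List[Tuple[str, float]]:
    """Predict ratings for the items the user has not seen and return the N best."""
    seen = ratings.get(user, {})
    # Step 1: estimate ratings only for unseen items (the predictions).
    predictions = [(item, predict(user, item)) for item in items if item not in seen]
    # Step 2: keep the N highest predictions; ties are broken by item id.
    predictions.sort(key=lambda p: (-p[1], p[0]))
    return predictions[:n]
```

Any of the user-based or item-based predictors can be plugged in as `predict`; only the ranking step is fixed.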

6
Qualitative Analysis of Prediction Algorithms

Part II

7
Unfolding the Recommendation Process

Which items to recommend?
  The N items with the top-N predictions
How can predictions be made?
  By exploiting the activity of other users
Whose activity should be taken into account?
  That of users who share the same or relevant interests: it may be of benefit to one's search for information to consult the behavior of other users who share the same or relevant interests

Figure: Collaborative Filtering. The unknown ratings (?) of an active user, e.g. u1: 2 ? ? 5 ? 6, are estimated from the ratings of other users, e.g. u3: 5 2 - 2 - 9
8
Collaborative Filtering (CF)
Co-rated items: how similar are they? ("-" = not rated)
  u1: 2 - 4 5 - 6
  u3: 5 2 - 2 - 9

- Cosine Vector Similarity
- Spearman Correlation
- Mean-squared Difference
- Entropy-based Uncertainty
Pearson Correlation Coefficient

Co-rated ratings: u1 = (2, 5, 6), u3 = (5, 2, 9)
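The Pearson coefficient on the co-rated ratings of u1 and u3 can be checked with a short Python sketch. Assumptions: ratings are dictionaries keyed by item, the ids i1 to i6 simply label the six matrix columns, and the means are taken over the co-rated items only (some formulations use each user's overall mean instead).

```python
from math import sqrt
from typing import Dict, Optional


def pearson(ra: Dict[str, float], rb: Dict[str, float]) -> Optional[float]:
    """Pearson correlation of two users over their co-rated items.

    Returns None when the similarity is undefined
    (fewer than two co-rated items or zero variance).
    """
    common = sorted(ra.keys() & rb.keys())
    if len(common) < 2:
        return None
    xa = [ra[i] for i in common]
    xb = [rb[i] for i in common]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((a - ma) * (b - mb) for a, b in zip(xa, xb))
    den = sqrt(sum((a - ma) ** 2 for a in xa)) * sqrt(sum((b - mb) ** 2 for b in xb))
    return num / den if den > 0 else None


u1 = {"i1": 2, "i3": 4, "i4": 5, "i6": 6}
u3 = {"i1": 5, "i2": 2, "i4": 2, "i6": 9}
print(round(pearson(u1, u3), 3))  # 0.319
```

Only i1, i4 and i6 are co-rated, so the coefficient is computed on (2, 5, 6) and (5, 2, 9).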
9
Rating Activity
Explicit rating: a rating that expresses the preference of a user for a specific item
Implicit rating: each explicit rating of a user for a specific item implicitly identifies the user's preference for the categories that this item belongs to
Example (figure): the action "user rates item with r" results in an explicit rating r in the user-item matrix and in implicit ratings f(r) for each of the item's categories (CatA, CatB, CatC) in the user-category matrix
10
Similarity Measures

Distinctions
User-based vs. Item-based Similarity
Explicit Rating vs. Implicit Rating
Definition of three matrices: User-Item, User-Category and Item-Category

User-Item Matrix (columns ix, iy, ...; "-" = not rated):
  ux: 2 - 4 5 -
  uy: 5 2 - 2 -

User-Category Matrix (columns C1pos, C1neg, ..., Cppos, Cpneg):
  ux: 2 - 4 - 6
  uy: 5 2 - - 9

Item-Category Bitmap (columns C1, C2, ..., Cx):
  ix: 1 0 1
  iy: 1 0 0

User-based Similarity derived from
Explicit Ratings (kx,y)
Implicit Ratings (?x,y)

Item-based Similarity derived from
Explicit Ratings (µx,y)
Implicit Ratings (?x,y)
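One way to derive the User-Category matrix from explicit ratings and the Item-Category bitmap is sketched below. The slides only say that each explicit rating r contributes f(r) to the categories of the rated item; here f is assumed to be a simple count, split into the pos/neg columns by a hypothetical `threshold` on the 1 to 10 scale.

```python
from typing import Dict, List, Tuple


def user_category_matrix(
    ratings: Dict[str, Dict[str, float]],   # user -> {item: explicit rating}
    bitmap: Dict[str, List[int]],           # item -> 0/1 flag per category (non-empty)
    threshold: float = 6.0,                 # hypothetical positive/negative cut-off
) -> Dict[str, List[Tuple[int, int]]]:
    """Turn explicit user-item ratings into implicit (pos, neg) counts per category."""
    n_cat = len(next(iter(bitmap.values())))
    matrix: Dict[str, List[Tuple[int, int]]] = {}
    for user, rated in ratings.items():
        pos = [0] * n_cat
        neg = [0] * n_cat
        for item, r in rated.items():
            for c, member in enumerate(bitmap[item]):
                if member:  # the rating counts for every category of the item
                    if r >= threshold:
                        pos[c] += 1
                    else:
                        neg[c] += 1
        matrix[user] = list(zip(pos, neg))
    return matrix
```

With the bitmap above (ix in C1 and Cx, iy in C1 only), a high rating of ix and a low rating of iy give C1 one positive and one negative implicit rating.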

11
Prediction Algorithms
Prediction Average Adjustment

User-based prediction algorithms:
  CFUB-ER: based on explicit ratings
  CFUB-ER-CB: based on explicit ratings, content-boosted
  CFUB-IR: based on implicit ratings
Item-based prediction algorithms:
  CFIB-ER: based on explicit ratings
  CFIB-IR: based on implicit ratings
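The slide names prediction average adjustment without giving the formula. A common form, assumed here, adds to the active user's mean rating the similarity-weighted average of the neighbours' deviations from their own means. The function name and tuple layout are hypothetical.

```python
from typing import Iterable, Tuple


def predict_adjusted(
    mean_active: float,
    neighbours: Iterable[Tuple[float, float, float]],
) -> float:
    """Average-adjusted prediction for one item.

    neighbours yields (similarity, neighbour_rating_for_item, neighbour_mean).
    Each neighbour contributes its deviation from its own mean, weighted by
    similarity; the weighted deviation is added to the active user's mean.
    """
    num = 0.0
    den = 0.0
    for w, r, m in neighbours:
        num += w * (r - m)
        den += abs(w)
    # Without usable neighbours, fall back to the active user's mean.
    return mean_active if den == 0 else mean_active + num / den
```

Centering on each user's mean compensates for users who rate systematically high or low on the 1 to 10 scale.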

12
Experimental Evaluation Results

Data set:
  2100 ratings (range 1 to 10), 115 users, 650 items, 20 item categories
  Sparsity: 97%
  300-item sample sets
Accuracy metrics:
  Mean Absolute Error (MAE)
  Receiver Operating Characteristic (ROC)
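The 97% sparsity follows directly from the data-set counts: 2100 ratings fill 2100 of the 115 × 650 = 74,750 user-item cells.

```python
def sparsity(n_ratings: int, n_users: int, n_items: int) -> float:
    """Fraction of empty cells in the user-item matrix."""
    return 1.0 - n_ratings / (n_users * n_items)


print(f"{sparsity(2100, 115, 650):.1%}")  # 97.2%
```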

13
Mean Absolute Error (MAE)
Figure: MAE plotted against sparsity
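MAE is the average absolute difference between predicted and actual ratings over a test set; lower is better. A minimal sketch (the function name is hypothetical):

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = average of |prediction - actual rating| over the test ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```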
14
Receiver Operating Characteristic (ROC)
Figure: true positive fraction plotted against false positive fraction
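Each ROC point pairs the true positive fraction with the false positive fraction obtained at one decision threshold. The sketch below assumes an item is relevant when its actual rating reaches a hypothetical `good` cut-off and is recommended when its predicted rating reaches the threshold being evaluated.

```python
from typing import List, Sequence, Tuple


def roc_points(
    predicted: Sequence[float],
    actual: Sequence[float],
    good: float,
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """(false positive fraction, true positive fraction) for each threshold."""
    points = []
    for t in thresholds:
        tp = fp = fn = tn = 0
        for p, a in zip(predicted, actual):
            rec, rel = p >= t, a >= good  # recommended? relevant?
            if rec and rel:
                tp += 1
            elif rec:
                fp += 1
            elif rel:
                fn += 1
            else:
                tn += 1
        tpf = tp / (tp + fn) if tp + fn else 0.0
        fpf = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpf, tpf))
    return points
```

Sweeping the threshold from high to low traces the curve from (0, 0) towards (1, 1); a better predictor stays closer to the top-left corner.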
15
Addressing the Scalability Problem

Part III

16
The Scalability Challenge

Facts
  Large numbers of users and items (e.g. Amazon.com)
  CF requires expensive computations that grow with the number of items and users
Requirements
  Need for quick formulation of recommendations
  Need for immediate incorporation of new rating information
  Need to preserve the quality of CF

17
Related Work

Clustering Approaches
-Breese et al. 1998, Ungar and Foster 1998
Dimensionality Reduction of the User-Item Matrix
-Sarwar et al. 2001
Data reduction or data focusing techniques
-Yu et al. 2002, Zeng et al. 2003
Offline Computations
-Linden et al. 2003

18
Intuition
Rating process and recommendation process
Classic Collaborative Filtering (recommendation engine): request → compute user-to-user similarities → find neighbors → recommend high-rated items → response
Our method (recommendation engine): request → find neighbors → recommend high-rated items → response (user-to-user similarities are not computed at request time)
19
Incremental Collaborative Filtering (ICF)

Classic Collaborative Filtering (based on the Pearson correlation). The Pearson formula uses:
  the number of co-rated items between ua and uy
  the actual ratings of ua and uy to item ih
  the average ratings of ua and uy
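The Pearson formula itself did not survive extraction. Assuming the standard form over the $n_{a,y}$ co-rated items of $u_a$ and $u_y$, with $\bar r_a$ and $\bar r_y$ the users' average ratings, and reading B, C, D as the numerator and the two sums under the square roots (the factors named in the ICF key idea), it can be written as follows, where e, f, g are the increments caused by a single new or updated rating:

```latex
\mathrm{sim}(u_a,u_y)
  = \frac{\sum_{h=1}^{n_{a,y}} (r_{a,h}-\bar{r}_a)(r_{y,h}-\bar{r}_y)}
         {\sqrt{\sum_{h=1}^{n_{a,y}} (r_{a,h}-\bar{r}_a)^2}\;
          \sqrt{\sum_{h=1}^{n_{a,y}} (r_{y,h}-\bar{r}_y)^2}}
  = \frac{B}{\sqrt{C}\,\sqrt{D}},
\qquad
B' = B + e,\quad C' = C + f,\quad D' = D + g
```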

Incremental Collaborative Filtering (ICF)

Key idea: incremental computation of the B, C, D factors after each single rating
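A hedged Python sketch of the incremental idea. Rather than reproducing the exact e, f, g increment formulas, it caches, for each pair of users, sums over their co-rated items (similar in spirit to the cached co-rated sums of the caching step), plus each user's rating total. B, C and D are rebuilt from these sums in O(1) when a similarity is requested, so one rating only touches the pairs (ua, uy) where uy rated the same item. The class and method names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Optional, Set, Tuple


class IncrementalPearson:
    """Hypothetical ICF-style similarity store using each user's overall mean."""

    def __init__(self) -> None:
        self.ratings: Dict[str, Dict[str, float]] = defaultdict(dict)  # user -> {item: r}
        self.raters: Dict[str, Set[str]] = defaultdict(set)            # item -> users
        self.total: Dict[str, float] = defaultdict(float)              # user -> sum of ratings
        # (u, v) with u < v -> [n, Su, Sv, Suu, Svv, Suv] over co-rated items
        self.pair: Dict[Tuple[str, str], List[float]] = defaultdict(
            lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])

    def _apply(self, u: str, v: str, ru: float, rv: float, sign: int) -> None:
        """Add (sign=+1) or remove (sign=-1) one co-rated item from the pair sums."""
        a, b, ra, rb = (u, v, ru, rv) if u < v else (v, u, rv, ru)
        s = self.pair[(a, b)]
        s[0] += sign
        s[1] += sign * ra
        s[2] += sign * rb
        s[3] += sign * ra * ra
        s[4] += sign * rb * rb
        s[5] += sign * ra * rb

    def rate(self, user: str, item: str, r: float) -> None:
        """Covers Case 1 (new rating) and Case 2 (update of an existing rating)."""
        old = self.ratings[user].get(item)
        for other in self.raters[item]:  # only users uy who rated the item
            if other == user:
                continue
            ro = self.ratings[other][item]
            if old is not None:
                self._apply(user, other, old, ro, -1)  # retract the old co-rating
            self._apply(user, other, r, ro, +1)
        self.total[user] += r - (old if old is not None else 0.0)
        self.ratings[user][item] = r
        self.raters[item].add(user)

    def mean(self, user: str) -> float:
        return self.total[user] / len(self.ratings[user])

    def similarity(self, u: str, v: str) -> Optional[float]:
        """B / (sqrt(C) * sqrt(D)), with B, C, D rebuilt from the cached sums."""
        a, b = (u, v) if u < v else (v, u)
        n, sa, sb, saa, sbb, sab = self.pair.get((a, b), [0, 0, 0, 0, 0, 0])
        if n == 0:
            return None  # no co-rated items: similarity undefined
        ma, mb = self.mean(a), self.mean(b)
        B = sab - mb * sa - ma * sb + n * ma * mb
        C = saa - 2 * ma * sa + n * ma * ma
        D = sbb - 2 * mb * sb + n * mb * mb
        if C <= 1e-12 or D <= 1e-12:
            return None  # zero variance on the co-rated items
        return B / (sqrt(C) * sqrt(D))
```

A rating by ua shifts ua's average and therefore B, C, D for every pair involving ua; rebuilding the factors from cached sums at query time absorbs that shift without visiting every pair.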

20
ICF: Cases to Be Examined in the Rating Process
Case 1: Submission of a new rating (active user ua, item ia)
  Item ia has been rated by user uy
  Item ia has not been rated by user uy
Case 2: Update of an existing rating (active user ua, item ia)
  Item ia has been rated by user uy
  Item ia has not been rated by user uy
21
Caching
Computation of the factors that appear in the increments e, f, g
Factors calculation:
  B, C, D: cached information (for all pairs of users)
  m: cached information (the number of items a user has rated)
  Average ratings of all users: cached information
  Sum of the ratings to co-rated items, for each pair of users: cached information
  Active user's new average rating: computed differently for the submission of a new rating and for the update of an existing rating
  New rating of the active user: via the interface
  Difference between the previous and the current average rating of the active user
  Rating of user uy to item ia: database query
22
Complexity Issues
Worst-case and approximation complexities of Classic CF and Incremental CF

Complexity | Classic CF (worst) | Classic CF (approximation) | Incremental CF (worst) | Incremental CF (approximation)
Maintaining the similarity matrix | O(m²n) | O(mmn) | O(mn) | O(mn)
Providing a recommendation to the active user | O(mn) | O(mn)O(n) | O(n) | O(n)
Providing a recommendation to the active user | Pre-computed offline | Pre-computed offline | O(n) | O(n)
Providing a recommendation to the active user | O(n) | O(n) | O(n) | O(n)

m: the number of users
n: the number of items
m' << m: the number of users with at least one co-rated item with the active user
n' << n: the number of items that have not been rated by the active user and have been rated by at least one of its similar users
n'' << n: the number of co-rated items between the active user and another user

23
Experimental Evaluation of ICF
Evaluation metric: response time in relation to accuracy

User-item matrix size | Classic CF samples (users) | Classic CF time (sec) | Classic CF accuracy (%) | Incremental CF time (sec) | Incremental CF accuracy (%)
100 users x 100 items | 10 | 0.17 | 22 | 0.045 | 100
100 users x 100 items | 30 | 0.55 | 49.5 | 0.045 | 100
100 users x 100 items | 50 | 0.765 | 67.5 | 0.045 | 100
100 users x 100 items | 99 | 1.38 | 100 | 0.045 | 100
1000 users x 1000 items | 100 | 6.81 | 26.7 | 0.46 | 100
1000 users x 1000 items | 300 | 20 | 53.8 | 0.46 | 100
1000 users x 1000 items | 500 | 33 | 66.8 | 0.46 | 100
1000 users x 1000 items | 999 | 66 | 100 | 0.46 | 100

Remarks
The performance-accuracy tradeoff of Classic CF is confirmed
ICF proves highly scalable while retaining the best quality of CF
The response time of ICF grows linearly, and only with the number of items

24
Addressing the Sparsity Problem

Part IV

25
The Sparsity Challenge

Facts
  Large number of users and items
  Even active users rate only a fraction of the items in the database
  The similarity between two users may be undefined (e.g. when they have no co-rated items)
  Negative impact on the effectiveness of CF
Requirements
  Be able to define similarity between two users
  Be able to recommend new and obscure items
  Be able to recommend items to new users

26
Related Work

Use of profile information when calculating similarities (e.g. demographic filtering)
-Pazzani 1999
Dimensionality reduction (e.g. Singular Value Decomposition, Latent Semantic Indexing, Principal Component Analysis)
-Sarwar et al. 2000, Deerwester et al. 1990, Goldberg et al. 2001
Content-boosted Collaborative Filtering
-Melville et al. 2002
Item-based similarity
-Sarwar et al. 2001, Popescul et al. 2001

27
Social Networks in RS

Underlying social networks in recommendation systems:
  Associations based on trust
  Trust through user-to-user similarity (Pearson correlation)

User-Item Matrix (users form the user space, items the item space):
       ix     iy
  ux   rx,x   rx,y
  uy   ry,x   ry,y
28
Trust Inferences and Paths

Trust Inferences
are transitive associations between users in the
underlying network
are sources of additional information for
recommendation purposes
form trust paths between distant users

Figure panels: Trust Inferences, Trust Paths, Web of Trusts, Inferred Association (users S, N, N1, N2, T; items i1, i2)
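Trust paths between distant users can be found by walking the underlying user network. A minimal sketch, assuming an edge exists between two users whenever a direct trust association (a defined Pearson similarity) exists between them, and using a hypothetical `max_hops` limit to keep inferences short:

```python
from typing import Dict, List, Set


def trust_paths(
    graph: Dict[str, Set[str]],  # user -> directly associated users
    source: str,
    target: str,
    max_hops: int,
) -> List[List[str]]:
    """All simple paths source -> target with at most max_hops edges."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path[:])
            return
        if len(path) - 1 == max_hops:  # hop budget used up
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep paths simple (no cycles)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(source, [source])
    return paths
```

Depth-first enumeration is exponential in the worst case, which is one reason to cap the path length.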
29
Confidence, Uncertainty and Subjectiveness
Confidence and uncertainty in trust paths (figure): confidence is based on the number of co-rated items, with umax_conf, the user with the most co-rated items, at confidence 1; uncertainty is the complement of confidence (e.g. confidence 0.57, uncertainty 0.43), shown for users u1 to un-1
Subjectiveness (figure): association between S and T
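The figure suggests confidence is the number of co-rated items relative to umax_conf, with uncertainty as its complement (0.57 + 0.43 = 1). A sketch under that assumption; the function name and input layout are hypothetical:

```python
from typing import Dict, Tuple


def confidence_uncertainty(
    corated: Dict[str, int],  # neighbour -> number of items co-rated with the source
) -> Dict[str, Tuple[float, float]]:
    """(confidence, uncertainty) per neighbour, relative to umax_conf."""
    if not corated:
        return {}
    most = max(corated.values())  # co-rated count of umax_conf
    result: Dict[str, Tuple[float, float]] = {}
    for user, n in corated.items():
        conf = n / most if most else 0.0
        result[user] = (conf, 1.0 - conf)  # assumed: uncertainty = 1 - confidence
    return result
```

For example, a neighbour sharing 4 items when umax_conf shares 7 gets confidence 4/7 ≈ 0.57 and uncertainty ≈ 0.43.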

30
Managing Multiple Paths

Path Composition
Average Composition
Weighted Average Composition
Path Selection
Maximum Path Confidence
Minimum Mean Absolute Deviation

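Illustrating example: two trust paths from S to T
  pA: S → N1 (trust 0.5, confidence 0.7, n(IS ∩ IN1) = 8), N1 → N2 (trust 0.9, confidence 0.4, n(IN1 ∩ IN2) = 5), N2 → T (trust 0.2, confidence 0.5, n(IN2 ∩ IT) = 6); TS→T(pA) = 0.44, CS→T(pA) = 0.14
  pB: S → N3 (trust 0.4, confidence 0.7, n(IS ∩ IN3) = 7), N3 → T (trust 0.6, confidence 0.8, n(IN3 ∩ IT) = 3); TS→T(pB) = 0.46, CS→T(pB) = 0.56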
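In the example, the path confidences are consistent with multiplying the hop confidences (0.7 × 0.4 × 0.5 = 0.14 for pA, 0.7 × 0.8 = 0.56 for pB), so this sketch assumes that rule. Path trust values are taken as given from the example rather than recomputed, and the confidence weights in the weighted-average composition are an assumption.

```python
from math import prod
from typing import Dict, List, Tuple

Edge = Tuple[float, float]                  # (trust, confidence) of one hop
Paths = Dict[str, Tuple[float, List[Edge]]]  # name -> (path trust, hops)


def path_confidence(edges: List[Edge]) -> float:
    """Assumed rule: multiply the confidences of the hops along the path."""
    return prod(c for _, c in edges)


def select_max_confidence(paths: Paths) -> Tuple[str, float]:
    """Path selection by maximum path confidence; returns (name, path trust)."""
    name = max(paths, key=lambda p: path_confidence(paths[p][1]))
    return name, paths[name][0]


def weighted_average_trust(paths: Paths) -> float:
    """Path composition: path trusts averaged with confidence weights (assumed)."""
    weights = {p: path_confidence(hops) for p, (_, hops) in paths.items()}
    total = sum(weights.values())
    return sum(paths[p][0] * w for p, w in weights.items()) / total


example: Paths = {
    "pA": (0.44, [(0.5, 0.7), (0.9, 0.4), (0.2, 0.5)]),
    "pB": (0.46, [(0.4, 0.7), (0.6, 0.8)]),
}
print(select_max_confidence(example))  # ('pB', 0.46)
```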
31
Power-law Distribution of Users' Ratings
32
Trust Inference Impact
33
Statistical Accuracy of Our Method (MAE)
34
Decision-support Accuracy of Our Method (ROC)
35
Conclusions

Part V

36
Extensions of Recommendation Technologies (1/2)

More advanced profiling techniques
  Current techniques rely on rating information
  E.g. data mining rules, sequences and signatures to describe users' interests
  Adoption of advances in mathematical approximation theory (e.g. radial basis functions)
Multidimensionality of recommendations
  Current systems operate on the two-dimensional User-Item space
  Need for contextual recommendations (taking into account time, conditions, etc.)
Multi-criteria ratings
  Need to incorporate ratings for a variety of criteria concerning a single item

37
Extensions of Recommendation Technologies (2/2)

Non-intrusiveness
  Implicit rating (e.g. time spent on a web page), HCI issues
Flexibility in the integration of recommendation technologies, e.g. a declarative recommendation query:
    RECOMMEND Movie TO User
    BASED ON Rating SHOW TOP 3
    FROM MovieRecommender
    WHERE Movie.Length > 120 AND User.City = 'Toronto'
Effectiveness of recommendations
  Need for metrics that adequately capture usefulness and quality
Trustworthiness and online feedback mechanism issues
Privacy issues

38
Conclusions and Discussion

Qualitative Analysis of user- and item-based
prediction algorithms
Incremental Collaborative Filtering (ICF) to deal
with Scalability
Trust Inferences to deal with Sparsity and
Cold-start
Roadmap to Future Research Work

39
Published Work

Papagelis, M. and Plexousakis, D. Recommendation Based Discovery of Dynamic Virtual Communities. Short Paper Proceedings of the 15th Conference on Advanced Information Systems Engineering, 2003.
Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Eighth International Workshop on Cooperative Information Agents, 2004.
Papagelis, M. and Plexousakis, D. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Agents. Engineering Applications of Artificial Intelligence, 18(4), June 2005.
Papagelis, M., Plexousakis, D., and Kutsuras, T. A Method for Alleviating the Sparsity Problem in Collaborative Filtering Using Trust Inferences. Proceedings of the 3rd International Conference on Trust Management, 2005.
Papagelis, M., Plexousakis, D., Rousidis, I., and Theoharopoulos, E. Qualitative Analysis of User-based and Item-based Prediction Algorithms for Recommendation Systems. 3rd Hellenic Data Management Symposium, 2004.
Papagelis, M., Rousidis, I., Plexousakis, D., and Theoharopoulos, E. Incremental Collaborative Filtering for Highly-Scalable Recommendation Algorithms. 15th International Symposium on Methodologies for Intelligent Systems, 2005.