1
Survey of Recommendation Systems and Algorithms
Yuan Qu, Xiaoyun Yang, Tianping Huang
2
WHY?
  • The amount of available information is growing
    steadily, which calls for automated methods to locate
    and retrieve information according to each user's
    individual preferences.
  • The number of users accessing the Internet is
    also growing, which opens up new possibilities to
    organize and recommend information.

3
OBJECTIVES
  • Survey the recommendation systems available
    on the web
  • Introduce the algorithms behind well-known
    recommendation systems

4
INTRODUCTION
  • The techniques used in today's recommendation
    systems fall into two categories:
  • content-based filtering
  • uses the actual content features of products.
  • collaborative filtering
  • predicts the active user's preferences from other
    users' ratings, assuming that like-minded people
    tend to make similar choices.
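A minimal sketch of content-based filtering, assuming each product is described by a small dictionary of content features (the hypothetical genre weights below) and that a user's profile is simply the average feature vector of the items they liked. Unseen items are ranked by cosine similarity to that profile; note that no other user's ratings are consulted, which is what separates this approach from collaborative filtering.

```python
import math

# Hypothetical catalogue: each product is described by content features
# (genre weights here). Item names and numbers are illustrative only.
ITEM_FEATURES = {
    "item_a": {"jazz": 1.0, "rock": 0.0, "pop": 0.2},
    "item_b": {"jazz": 0.0, "rock": 1.0, "pop": 0.5},
    "item_c": {"jazz": 0.8, "rock": 0.1, "pop": 0.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_profile(liked_items):
    """User profile = average feature vector of the items the user liked."""
    profile = {}
    for item in liked_items:
        for feat, w in ITEM_FEATURES[item].items():
            profile[feat] = profile.get(feat, 0.0) + w / len(liked_items)
    return profile


def recommend_content_based(liked_items, k=1):
    """Rank unseen items by how closely their features match the profile."""
    profile = build_profile(liked_items)
    unseen = [i for i in ITEM_FEATURES if i not in liked_items]
    unseen.sort(key=lambda i: cosine(profile, ITEM_FEATURES[i]), reverse=True)
    return unseen[:k]


print(recommend_content_based(["item_a"]))  # ['item_c']
```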

5
Recommendation systems
6
Firefly
  • Recommends items based on the similarity between
    a user's interest profile and those of other users.
  • Initially used for music and movies; later
    extended to other media
  • Characteristics
  • system maintains a user profile
  • word-of-mouth recommendations
  • vector matching based on a simple rating scale

7
Tapestry
  • uses annotations for recommendation
  • Characteristics
  • the first collaborative filtering system
  • free-form annotations and explicit 'like it' or
    'hate it' annotations
  • depends on many people reading and voting
  • hard to explore new areas

8
GroupLens
  • People who agreed in their subjective evaluations
    of past articles are likely to agree again in the
    future; recommendations are made according to this
    similarity.
  • Characteristics
  • openness
  • ease of use: explicit votes on a 1-5 scale
  • scalability

9
Lotus Notes
  • Users in a group have similar goals and
    information interests. The system lets people
    annotate documents and send them to others.
  • Characteristics
  • closed system of similar users
  • document annotation
  • uses an agent to represent an individual and
    protect privacy

10
Phoaks (People Helping One Another Know Stuff)
  • automatically recognizes web resource
    references in newsgroup messages, attempts to
    classify them, and then presents them to
    other users
  • Characteristics
  • scans the URLs posted in newsgroup messages
    and surfaces the most important ones to users
  • uses implicit feedback
  • role specialization
  • reuse: it reuses recommendations from existing
    online conversations.
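A minimal sketch of the URL-scanning step, assuming messages arrive as (poster, text) pairs and that URLs are ranked by the number of distinct posters who mention them. The regular expression is a simplified assumption, the distinct-poster heuristic is an illustrative choice rather than a documented Phoaks rule, and the classification step from the slide is omitted.

```python
import re
from collections import defaultdict

# Simplified URL pattern (an assumption; real-world URL extraction is messier).
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def rank_urls(messages):
    """Rank URLs by how many distinct posters mention them (implicit feedback).

    `messages` is a list of (poster, text) pairs. Counting distinct posters
    rather than raw mentions keeps one enthusiastic poster from dominating.
    """
    posters_by_url = defaultdict(set)
    for poster, text in messages:
        for url in URL_RE.findall(text):
            # Drop trailing punctuation picked up from the sentence.
            posters_by_url[url.rstrip(".,)")].add(poster)
    ranked = sorted(posters_by_url, key=lambda u: (-len(posters_by_url[u]), u))
    return [(u, len(posters_by_url[u])) for u in ranked]


msgs = [
    ("alice", "See http://example.org/faq for details."),
    ("bob", "I agree, http://example.org/faq is great"),
    ("alice", "also http://example.com/x"),
    ("alice", "again http://example.com/x"),
]
print(rank_urls(msgs))
```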

11
Pointer
  • A mediator for distributing information. A pointer
    consists of a URL link, contextual information, and
    optional comments by the sender.
  • Characteristics
  • packages contextual information with hypertext links
  • easy to use (annotations are easy to add)
  • not anonymous

12
Mosaic
  • Idea: let users publish and distribute notes as
    comments added to web pages
  • Characteristics
  • users can publish or distribute bookmarks
  • comments added to web pages

13
WebWatcher
  • The server acts as a tour guide for the web. It
    communicates interactively with users and
    provides recommendations.
  • Characteristics
  • uses previous tours to calculate the similarity
    between users
  • reinforcement learning
  • not the same as a keyword-based search engine

14
GAB
  • Idea: collect and merge the bookmark/hotlist files
    of participating users, then serve the merged
    files to users
  • Characteristics
  • ability to obtain users' bookmarks
  • multi-tree data structure
  • sibling relations to avoid getting lost in hyperspace
  • cousin relations to avoid sparse connectivity in a
    merged subject-tree database
  • monitors changes in content
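A minimal sketch of merging several users' bookmark files into one subject tree, assuming each file is a nested dictionary whose keys are folder names and whose reserved key `"_urls"` holds that folder's links. Folders are matched purely by identical name, which is an illustrative assumption; the sibling and cousin relations listed on the slide are not modelled here.

```python
def merge_into(target, source):
    """Recursively merge one bookmark tree into `target`, de-duplicating URLs."""
    for key, value in source.items():
        if key == "_urls":
            bucket = target.setdefault("_urls", [])
            for url in value:
                if url not in bucket:
                    bucket.append(url)
        else:
            # Folders with the same name are treated as the same subject (assumption).
            merge_into(target.setdefault(key, {}), value)
    return target


def merge_bookmarks(trees):
    """Merge all participants' bookmark trees into a single subject tree."""
    merged = {}
    for tree in trees:
        merge_into(merged, tree)
    return merged


a = {"Science": {"_urls": ["http://a.org"], "Physics": {"_urls": ["http://p.org"]}}}
b = {"Science": {"_urls": ["http://a.org", "http://b.org"]},
     "Music": {"_urls": ["http://m.org"]}}
print(merge_bookmarks([a, b]))
```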

15
Yahoo!
  • Idea: a manual approach in which people use tools
    to update the index as quickly as possible. Every
    site submitted to Yahoo! is examined by an expert.
  • Characteristics
  • expert classifiers
  • user contributions: end users also suggest
    categories for sites

16
Fab
  • Idea: combine two filters to overcome
    problems such as cold start and changing
    user interests
  • Characteristics
  • updates the user profile based on content-based
    filtering
  • uses a 7-point rating scale
  • uses a series of agents to collect web pages
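A minimal sketch of a content-based profile update driven by a 7-point rating. The slide does not give Fab's exact update rule, so this uses a Rocchio-style stand-in: the rating is mapped onto -1..+1 (treating 4 as neutral, an assumption) and the page's term weights are added to the profile scaled by that signal and a hypothetical `learning_rate`.

```python
def update_profile(profile, doc_terms, rating, learning_rate=0.5):
    """Nudge a term-weight profile toward (or away from) a rated web page.

    `rating` is on a 1..7 scale; 4 is treated as neutral (assumption).
    """
    if not 1 <= rating <= 7:
        raise ValueError("rating must be on the 1..7 scale")
    signal = (rating - 4) / 3.0  # maps 1..7 onto -1..+1
    updated = dict(profile)      # leave the caller's profile untouched
    for term, weight in doc_terms.items():
        updated[term] = updated.get(term, 0.0) + learning_rate * signal * weight
    return updated


p = update_profile({}, {"jazz": 1.0}, 7)  # user loved a jazz page
p = update_profile(p, {"jazz": 1.0}, 1)   # then hated another one
print(p)
```

Because the update is additive, interests that drift over time are tracked gradually rather than overwritten, which is one way a profile can follow the "changing of users' interests" mentioned on the slide.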

17
Bayesian-Mixed
  • incorporates components of both content-based
    filtering and collaborative filtering
  • Characteristics
  • uses all of the available data
  • addresses the cold-start problem

18
Mark: combination of two filters
  • Idea: a weighted average of two
    filters (content-based and collaborative)
  • Characteristics
  • fully realizes the advantages of both filters
  • weights are determined on a per-user basis
  • weights are determined on a per-item basis
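A minimal sketch of the weighted-average combination, assuming each filter has already produced a numeric prediction and that a weight in [0, 1] is looked up per user or per item. The helper `pick_weight` and the fallback weight of 0.5 are hypothetical; how the weights are learned is not covered on the slide.

```python
def pick_weight(user, item, weights, per="user", default=0.5):
    """Look up the blending weight for this user (or item); hypothetical helper."""
    key = user if per == "user" else item
    return weights.get(key, default)


def combined_prediction(cb_score, cf_score, weight):
    """Weighted average of a content-based and a collaborative prediction."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * cb_score + (1.0 - weight) * cf_score


# Usage: hypothetical weights, one per user.
user_weights = {"alice": 0.25, "bob": 0.8}
w = pick_weight("alice", "movie_1", user_weights, per="user")
print(combined_prediction(4.0, 2.0, w))  # 2.5
```

A user with few ratings would typically lean on the content-based score (weight near 1), which is how such a blend can soften the cold-start problem.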

19
Trend
  • cold-start problem
  • easy for users to participate or vote
  • algorithm
  • privacy

20
Algorithms for Collaborative Filtering
Example of a user rating database
21
Collaborative filtering Algorithms
  • Breese et al. classified CF algorithms into two
    general classes.
  • Memory-based methods
  • Predictions are calculated over the entire
    database of user ratings.
  • Model-based methods
  • An underlying model of user preferences is first
    constructed, from which predictions are inferred.

22
Memory-based Methods
  • The prediction of the active user's rating is a
    weighted sum of other users' ratings (typically their
    deviations from their own mean ratings).
  • Weights are calculated based on:
  • Mean Squared Differences
  • Correlation (Pearson Correlation)
  • Vector Similarity
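A minimal sketch of a memory-based prediction on a hypothetical toy ratings database: the active user's mean rating plus a normalized, weighted sum of other users' deviations from their own means. The three weight functions mirror the list above; turning mean squared difference into a weight via 1 / (1 + MSD) is an assumption, since MSD itself is a distance.

```python
import math

# Hypothetical ratings database: user -> {item: vote on a 1-5 scale}.
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}


def mean(xs):
    return sum(xs) / len(xs)


def pearson(ra, ri):
    """Pearson correlation over the items both users rated."""
    common = [j for j in ra if j in ri]
    if len(common) < 2:
        return 0.0
    ma = mean([ra[j] for j in common])
    mi = mean([ri[j] for j in common])
    num = sum((ra[j] - ma) * (ri[j] - mi) for j in common)
    den = math.sqrt(sum((ra[j] - ma) ** 2 for j in common)
                    * sum((ri[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0


def vector_similarity(ra, ri):
    """Cosine of the vote vectors; each norm runs over that user's own votes."""
    na = math.sqrt(sum(v * v for v in ra.values()))
    ni = math.sqrt(sum(v * v for v in ri.values()))
    if na == 0 or ni == 0:
        return 0.0
    return sum(ra[j] * ri[j] for j in ra if j in ri) / (na * ni)


def msd_weight(ra, ri):
    """Mean squared difference turned into a weight via 1 / (1 + MSD) (assumption)."""
    common = [j for j in ra if j in ri]
    if not common:
        return 0.0
    return 1.0 / (1.0 + mean([(ra[j] - ri[j]) ** 2 for j in common]))


def predict(ratings, active, item, weight_fn=pearson):
    """Active user's mean plus a normalized weighted sum of neighbors' deviations."""
    ra = ratings[active]
    base = mean(list(ra.values()))
    num = norm = 0.0
    for user, ri in ratings.items():
        if user == active or item not in ri:
            continue
        w = weight_fn(ra, ri)
        num += w * (ri[item] - mean(list(ri.values())))
        norm += abs(w)
    return base + num / norm if norm else base


print(round(predict(RATINGS, "alice", "m4"), 2))
```

Note that the result can fall outside the 1-5 scale (Carol's negative correlation pushes Alice's estimate upward); clipping to the scale is a common practical fix.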

23
Improvements to Memory-based Algorithms
  • Default Voting
  • Inverse User Frequency
  • Case Amplification
  • Voting by Category
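A minimal sketch of three of these improvements, written as small helpers that could sit in front of a memory-based weight function. Inverse user frequency uses f_j = log(n / n_j), so items everyone rated carry no weight; case amplification raises a weight's magnitude to a power while keeping its sign (rho = 2.5 is an assumed default); default voting is simplified here to filling in items rated by only one of the two users with a chosen default vote.

```python
import math


def inverse_user_frequency(ratings):
    """f_j = log(n / n_j): items rated by every user get weight 0."""
    n = len(ratings)
    counts = {}
    for votes in ratings.values():
        for item in votes:
            counts[item] = counts.get(item, 0) + 1
    return {item: math.log(n / c) for item, c in counts.items()}


def amplify(w, rho=2.5):
    """Case amplification: keep the sign, shrink weak weights more than strong ones."""
    return w * abs(w) ** (rho - 1)


def with_default_votes(ra, ri, default):
    """Simplified default voting: fill items rated by only one user with `default`."""
    items = set(ra) | set(ri)
    return ({j: ra.get(j, default) for j in items},
            {j: ri.get(j, default) for j in items})


ratings = {"a": {"x": 5, "y": 4}, "b": {"x": 3}, "c": {"x": 4, "z": 1}}
iuf = inverse_user_frequency(ratings)
print(iuf["x"])  # 0.0: everyone rated x, so it carries no signal
```

In a vector-similarity weight, each vote v_ij would be multiplied by f_j before the cosine is taken, so agreement on rarely rated items counts for more.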

24
Voting by Category
25
Model-based Methods
  • Probabilistic Models
  • Cluster Models
  • standard cluster method: uses the EM algorithm to
    find clusters
  • Repeated Clustering
  • Gibbs Sampling
  • Bayesian Network Models
  • Neural Network Models
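A minimal sketch of the standard cluster model, assuming users belong to a hidden class and that, given the class, each item's vote is drawn independently (a naive-Bayes latent-class model). EM alternates between estimating class priors and per-class vote distributions (M-step, with Laplace smoothing `alpha` as an assumption) and recomputing each user's soft class membership from only the votes they actually gave (E-step). Votes are integers 1..5 with 0 meaning "not rated"; the toy matrix is hypothetical. Repeated clustering, Gibbs sampling, and the Bayesian and neural network models are not shown.

```python
import numpy as np


def one_hot(R, n_levels):
    """X[..., j, k] = 1 if vote k + 1 was given; 0 in R means 'not rated'."""
    return np.stack([(R == k + 1) for k in range(n_levels)], axis=-1).astype(float)


def fit_cluster_model(R, n_clusters=2, n_levels=5, n_iter=100, alpha=1.0, seed=0):
    """EM for a latent-class model; missing votes simply contribute nothing."""
    X = one_hot(R, n_levels)                                    # (U, J, K)
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_clusters), size=R.shape[0])  # (U, C) soft assignments
    for _ in range(n_iter):
        # M-step: class priors and per-class vote distributions per item.
        pi = resp.mean(axis=0)
        counts = np.einsum("uc,ujk->cjk", resp, X) + alpha
        theta = counts / counts.sum(axis=2, keepdims=True)      # (C, J, K)
        # E-step: posterior class membership given each user's observed votes.
        log_p = np.log(pi) + np.einsum("ujk,cjk->uc", X, np.log(theta))
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, theta


def predict_cluster(pi, theta, votes):
    """Expected vote on every item for a user with partial `votes` (0 = missing)."""
    x = one_hot(votes, theta.shape[2])                          # (J, K)
    log_post = np.log(pi) + np.einsum("jk,cjk->c", x, np.log(theta))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    expected = theta @ np.arange(1, theta.shape[2] + 1)         # (C, J)
    return post @ expected                                      # (J,)


R = np.array([[5, 5, 1, 1],
              [5, 4, 1, 2],
              [4, 5, 2, 1],
              [1, 1, 5, 5],
              [2, 1, 5, 4],
              [1, 2, 4, 5]])
pi, theta = fit_cluster_model(R)
print(predict_cluster(pi, theta, np.array([5, 5, 0, 0])).round(2))
```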

26
Repeated Clustering
27
Other Algorithms
  • A hybrid memory- and model-based approach
    (Personality Diagnosis)
  • Assumes that all users report their ratings with
    Gaussian noise.
  • Calculates the expectation of the active user's
    rating.
  • Has the same time complexity as memory-based methods.
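A minimal sketch of this idea on a hypothetical toy database: each other user is treated as a candidate "true personality" for the active user, weighted by how plausibly the active user's observed votes are that user's votes plus Gaussian noise. The result is a probability for each vote level and its expectation. The noise scale `sigma = 1.0` and the uniform prior over users are assumptions.

```python
import math

# Hypothetical ratings database: user -> {item: vote on a 1-5 scale}.
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}


def personality_diagnosis(ratings, active, item, sigma=1.0, levels=(1, 2, 3, 4, 5)):
    """Return (probability of each vote level, expected vote) for `active` on `item`."""
    ra = ratings[active]
    scores = dict.fromkeys(levels, 0.0)
    for user, ru in ratings.items():
        if user == active or item not in ru:
            continue
        # Likelihood that the active user's observed votes are this user's
        # votes corrupted by Gaussian noise (uniform prior over users).
        like = 1.0
        for j, v in ra.items():
            if j in ru:
                like *= math.exp(-((v - ru[j]) ** 2) / (2 * sigma ** 2))
        # Spread this user's vote on `item` over the scale with the same noise.
        for x in levels:
            scores[x] += like * math.exp(-((x - ru[item]) ** 2) / (2 * sigma ** 2))
    total = sum(scores.values())
    if total == 0.0:
        return None, None
    dist = {x: s / total for x, s in scores.items()}
    return dist, sum(x * p for x, p in dist.items())


dist, expected = personality_diagnosis(RATINGS, "alice", "m4")
print(round(expected, 2))
```

Like a memory-based method, each prediction makes a single pass over all users and their co-rated items, which is why the time complexity is the same.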

28
Problems and Improvements
  • Problems
  • Cold start
  • High dimensionality, sparse data
  • Missing data
  • Missing real similarity
  • Improvements
  • Predictions based on users' ratings of latent
    features.
  • Combination of CBF and CF.

29
Missing Similarity
30
Ratings to Latent Features
  • Assumes that users rate products based on the
    latent features of those products, and that all
    products in the database share a common set of
    features.
  • Singular Value Decomposition (SVD)
  • U represents each user's response to certain
    features.
  • V represents the amount of each feature present
    in each product.
  • S is a diagonal matrix related to the importance of
    each feature in determining the overall rating.
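A minimal sketch of capturing latent features with SVD in NumPy, assuming a small hypothetical user-by-item matrix where 0 means "not rated". Because plain SVD needs a complete matrix, missing entries are first filled with the item's mean observed rating (an assumed pre-processing step); keeping only the k largest singular values then gives a rank-k approximation U_k S_k V_k^T. NumPy returns V already transposed, named `Vt` below.

```python
import numpy as np


def fill_missing_with_item_means(R):
    """Replace 0 (= unrated) by the item's mean observed rating (assumed pre-step)."""
    R = R.astype(float)
    filled = R.copy()
    for j in range(R.shape[1]):
        observed = R[:, j][R[:, j] > 0]
        if observed.size:  # items nobody rated are left at 0
            filled[R[:, j] == 0, j] = observed.mean()
    return filled


def low_rank_approx(R, k):
    """Keep the k strongest latent features: R is approximated by U_k S_k V_k^T."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 5, 4],
              [0, 1, 4, 5]])
approx = low_rank_approx(fill_missing_with_item_means(R), k=2)
print(approx.round(1))
```

The entries of `approx` at the originally missing positions serve as predicted ratings; the choice of k trades noise removal against loss of detail.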

31
An Example of Dimension Reduction Using SVD
32
Recommendation Systems (diagram): Data Cleaning, Beginning, Capturing Latent Features (SVD), Content-based Filtering, Memory-based Models, Model-based Models