Transcript and Presenter's Notes

Title: Beyond Algorithms: A Human-Centered Evaluation of Recommender Systems

1
Beyond Algorithms: A Human-Centered Evaluation
of Recommender Systems
  • Kirsten Swearingen and Rashmi Sinha
  • SIMS 213, UC Berkeley
  • April 11, 2002

2
Overview
  • Introduction to recommender systems
  • Motivation for project
  • Method and findings - User Study 1
  • Method and findings - User Study 2
  • Discussion and design recommendations
  • Limitations of study
  • Future work

3
In the news: A bet on humans vs. machines
  • Ray Kurzweil maintains that a computer (i.e., a
    machine intelligence) will pass the Turing test
    by 2029. Mitchell Kapor believes this will not
    happen.
  • In a 1950 paper Alan Turing describes his concept
    of the Turing Test, in which one or more human
    judges interview computers and human foils using
    terminals (so that the judges won't be prejudiced
    against the computers for lacking a human
    appearance).
  • If the human judges are unable to reliably unmask
    the computers (as imposter humans) then the
    computer is considered to have demonstrated
    human-level intelligence.

4
Recommender systems are a technological proxy for
a social process
Which one should I read?
Recommendations from Online Systems
Recommendations from friends
5
Basic interaction paradigm of recommender systems
Input (ratings of books): "I recently enjoyed Snow Crash, Seabiscuit, The Soloist, and Love in a Cold Climate. What should I read next?"
Output (recommendations): "Books you might enjoy are ..."
6
Approaches: Back End
  • Content-based recommendations
  • Rely on metadata describing items
  • You like action-adventure movies and movies
    starring Meryl Streep.
  • Collaborative filtering
  • Rely on correlations between individual ratings.
  • You like most of the same movies Joe and Carol
    like, so you might like these other movies they
    liked.
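
(Not from the slides: a minimal content-based sketch in Python, assuming a toy catalog whose items carry metadata tags and a hypothetical user profile of preferred attributes.)

# Content-based sketch: score each item by how many of the user's preferred
# attributes its metadata matches. All item and attribute names are made up.

user_profile = {"action-adventure", "Meryl Streep"}

catalog = {
    "Movie A": {"action-adventure", "Meryl Streep"},
    "Movie B": {"romance", "Meryl Streep"},
    "Movie C": {"documentary"},
}

def content_score(attributes, profile):
    """Fraction of the user's preferred attributes that the item matches."""
    return len(attributes & profile) / len(profile) if profile else 0.0

ranked = sorted(catalog, key=lambda title: content_score(catalog[title], user_profile),
                reverse=True)
print(ranked)  # ['Movie A', 'Movie B', 'Movie C']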

7
Collaborative Filtering Algorithms Depend Upon
Correlations
Meg & David correlation: .52    Meg & Amy correlation: -.67    Meg & Joe correlation: .23
Recommendations for Meg: Books 7 and 8
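
(Not from the slides: a minimal user-user collaborative filtering sketch in Python 3.10+, mirroring the slide's idea of correlating Meg with other users. The rating values are hypothetical, chosen only so that the most correlated neighbour ends up recommending books 7 and 8.)

# Collaborative filtering sketch: Pearson-correlate the target user with every
# other user over the books they both rated, then recommend highly rated books
# from the most correlated neighbour that the target has not rated.
from statistics import correlation  # Pearson correlation (Python 3.10+)

ratings = {                          # user -> {book id: rating}
    "Meg":   {1: 5, 2: 3, 3: 4, 4: 2},
    "David": {1: 4, 2: 3, 3: 5, 4: 2, 7: 5, 8: 4},
    "Amy":   {1: 1, 2: 5, 3: 2, 4: 5, 5: 4},
    "Joe":   {1: 4, 2: 2, 3: 4, 4: 3, 6: 2},
}

def similarity(a, b):
    """Pearson correlation over the books both users rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0
    return correlation([ratings[a][k] for k in common],
                       [ratings[b][k] for k in common])

target = "Meg"
best = max((u for u in ratings if u != target), key=lambda u: similarity(target, u))
recs = [book for book, r in ratings[best].items()
        if book not in ratings[target] and r >= 4]
print(best, recs)  # e.g. David [7, 8]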
8
Approaches: Front End
  • Implicit rating (by browsing, clicking, or
    purchasing)
  • Explicit rating, with systems differing in the
  • Number of items to rate
  • Rating scale used
  • Number of items recommended
  • Amount of personal information required
  • Opportunity for feedback on recommended items

9
Amazon's Recommendation Process
Input: One artist/author name
Output: List of recommendations
Opportunity to explore / refine
10
Sleeper's Recommendation Process
Input: Ratings of the same 10 books from all users, on a continuous scale
Output: Displays 1 book at a time, with degree of confidence in the prediction
(System designed by Ken Goldberg, UC Berkeley)
11
Song Explorer's Recommendation Process
Input: 20 ratings
Output: List of songs/albums
12
I know what you'll read next summer (Amazon, Barnes & Noble)
  • what movies you should watch (Reel, RatingZone,
    Amazon)
  • what music you should listen to (CDNow, Mubu,
    Gigabeat)
  • what websites you should visit (Alexa)
  • what jokes you will like (Jester)
  • and who you should date (Yenta)

13
The recommendation process from the user's perspective
The user inputs preferences (time and effort to input; privacy concerns), receives recommendations, reviews them (time and effort to review), and decides if he/she will sample a recommendation.
In the end, a user benefits only if the recommendations turn out to be good ones.
14
What Users Want
RECOMMENDATIONS: good, new to me
PROCESS: fast, easy, engaging
To succeed, collaborative filtering recommender
systems need a LOT of motivated regular users.
15
Issues with Recommender Systems
  • Cold-start problem
  • Latency problem
  • Unusual users
  • Privacy concerns
  • Scalability
  • Speed of transaction
  • User interface

16
Motivation for Project
  • General need: plenty of research on recommender
    system back ends, little on interfaces
  • Personal interest
  • Kirsten -- designing Reading Tree
  • Rashmi -- interested in community-oriented sites

17
Our Project: Beyond Algorithms Only -- An HCI
Perspective on Recommender Systems
  • Compare the social recommendation process to
    online recommender systems
  • Understand the factors that go into an effective
    recommendation by studying user interaction with
    systems

18
Stages of Project
  • Study 1
  • Began as a class project for SIMS 271: a user
    study of 6 book and movie systems
  • Focused on humans vs. recommenders comparison
  • Study 2
  • User study comparing 5 music recommender systems
  • Focused on identifying factors that contribute to
    system success

19
General Methodology
  • Not an experiment, but designed like one.
    Conducted in lab environment.
  • Broad overview to start with, then zeroed in on
    some systems
  • Meshing of quantitative and qualitative methods
    (one informing the other)
  • Pre-test, pre-test, pre-test
  • User motivation ascertained before study
  • Within-subjects design used wherever possible
  • Multiple small studies, rather than 1 big study

20
General Methodology
  • Comprehensive data collection: observation,
    behavior logging with timestamps,
    questionnaires, post-test interviews
  • The Slim Logger: a simple, Excel-based tool for
    recording timed observations

21
Study 1: The Human vs. Recommenders Death Match
22
3 Book Systems
Amazon Books
Rating Zone
Sleeper
23
3 Movie Systems
Amazon Movies
Movie Critic
Reel
24
3 Friends Per Person
Participants were asked to choose friends who
knew their tastes in books or movies.
25
Method
  • 19 participants, ages 18 to 34 years
  • For each of 3 online systems
  • Registered at site
  • Rated items
  • Reviewed and evaluated recommendation set
  • Completed questionnaire
  • Also reviewed and evaluated sets of
    recommendations from 3 friends each

26
Defining Success
Good Recs. (Precision)
  • Items the user felt interested in
Useful Recs.
  • Subset of Good Recs.: items the user felt
    interested in and had not read / viewed yet
(Venn diagram: within all good recommendations, useful items are those new to the user; the rest were previously experienced.)
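
(Not from the slides: a small Python sketch of the two measures just defined, using made-up feedback for one participant.)

# "Good" = recommended items the user said they were interested in (precision);
# "useful" = the subset of good items the user had not already read / viewed.

recommended = ["A", "B", "C", "D", "E"]
interested = {"A", "B", "D"}      # items the participant marked as interesting
already_seen = {"B"}              # items previously read / viewed

good = [item for item in recommended if item in interested]
useful = [item for item in good if item not in already_seen]

precision = len(good) / len(recommended)        # 3/5 = 60% good
useful_rate = len(useful) / len(recommended)    # 2/5 = 40% useful
print(f"good: {precision:.0%}, useful: {useful_rate:.0%}")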
27
Comparing Human Recommenders to RS: Good and
Useful Recommendations
(Bar chart, 0-100%: good and useful recommendations per source.
Books: Amazon (15), Sleeper (10), Rating Zone (8), Friends (9).
Movies: Amazon (15), Reel (5-10), Movie Critic (20), Friends (9).
(x) = no. of recommendations; RS average shown for comparison.)
28
However, users like online RS.
This result was supported by post-test interviews.
29
Why systems over friends?
  • "Suggested a number of things I hadn't heard of,
    interesting matches."
  • "It was like going to Cody's, looking at that
    table up front for new and interesting books."
  • "Systems can pull from a large database; no one
    person knows about all the movies I might like."

30
Recommender systems broaden horizons
  • while friends mostly recommend familiar items.

31
Which of the systems did users prefer?
(Chart: users' yes/no ratings for each book and movie system.)
  • Sleeper and Amazon Books average the highest ratings
  • Split opinions on Reel and Movie Critic

32
Why did some systems
  • Provide useful recommendations but leave users
    unsatisfied?
  • RatingZone
  • Movie Critic
  • Reel

33
Searching for Reasons
  • Previously Liked Items and Adequate Item
    Description are correlated with Usefulness
    ratings
  • Time to Receive Recommendations and No. of Items
    to Rate: not important!

34
A Question of Trust
  • Post-test interviews showed that users trusted
    systems if they had already sampled (and enjoyed)
    some recommendations
  • Positive Experiences lead to trust
  • Negative Experiences with recommended items lead
    to mistrust of system

(Venn diagram: within all good recommendations, previously experienced items are trust-generating; new-to-user items are useful.)
35
A Question of Trust
(Chart: results for the book and movie systems.)
The difference between Amazon and Sleeper highlights
the fact that there are different kinds of "good"
recommender systems.
36
Adequate Item Description: The RatingZone Story
0% of Version 1 users and 60% of Version 2 users found
the item description adequate.
An adequate item description and links to other
sources of information about the item were crucial
factors in users being convinced by a
recommendation.
37
System Transparency
  • Do users think they understand why an item was
    recommended?

Users mentioned this factor in post-test
interviews; in Study 2, we explored it in
greater detail.
38
Study 2: Music Recommenders
Amazon, CDNow, MediaUnbound, MoodLogic, and
SongExplorer
39
Method
  • 12 participants
  • Very similar to Study 1 method
  • Registered at site
  • Rated items
  • Reviewed and evaluated recommendation set
  • Completed questionnaire
  • Focused on music systems only (eliminate domain
    differences)
  • Participants listened to clips and evaluated
    recommended items (this was not possible with
    book and movie systems)

40
Findings: Effect of Familiarity
  • Familiar recommendations liked more than
    unfamiliar ones for all five systems

41
Transparency Again
User perception that they understand why an item
was recommended
  • Transparent recommendations liked more than
    non-transparent ones for all five systems

42
Side note: once trust is established,
transparency may become less important
  • The serious-minded, 65-year-old father of one of
    my friends uses Netflix (DVDs)
  • Based on the items he had rented, he received
    this recommendation, and ordered the film!
  • His comment: "They think I'll like it and they
    have done pretty well in the past, so I'll take a
    chance."

"New Wave teens / 20-somethings search for love on
New Year's Eve 1981 in this episodic comedy."
43
2 Models of Recommender System Success
  • Recommendations from Amazon received the highest
    liking rating in Study 1 (for books & movies)
    and the second highest in Study 2 (Music)
  • Recommendations from MediaUnbound outperformed
    Amazon in Study 2 (Music)
  • Both systems were well liked but differed
    dramatically in interaction style

44
Amazon's Bare-Bones Recommendation Process
45
MediaUnbound's long, extended (35-question)
recommendation process
Genre Selection
46
Setting level of familiarity
Rating some songs
Feedback at every stage
47
Setting system expectations
More feedback about the user's tastes
48
Users find MediaUnbound's Recommendations More
Useful
Also, most users preferred MediaUnbound over
Amazon
But whose recommendations would they buy?
49
Users Express More Interest in Buying Amazon's
Recommendations
50
Different System Strengths
  • Amazon
  • Safe, conservative approach to recommendations
  • Recommendations are familiar--few new items
  • Users find system logic transparent
  • Users don't feel like they learnt anything new
  • MediaUnbound
  • Verifies with the user how familiar they want
    recommendations to be
  • Long input process seems to generate trust
  • Recommendations are often new, but well liked

51
Discussion and Design Recommendations
52
Justify Your Recommendations
  • Adequate Item Information: Provide enough detail
    about the item for the user to make a choice
  • System Transparency: Generate (at least some)
    recommendations which are clearly linked to the
    rated items
  • Explanation: Explain why the item was
    recommended
  • Community Ratings: Provide links to ratings /
    reviews by other users. If possible, present a
    numerical summary of ratings.

53
Accuracy vs. Less Input
  • Don't sacrifice accuracy for the sake of
    generating quick recommendations. Users don't
    mind rating more items to receive quality
    recommendations.
  • Multi-level recommendations: users can initially
    use the system by providing a single rating, and
    are offered subsequent opportunities to refine
    recommendations (see the sketch after this list)
  • Provide a happy medium between too little input
    (leading to low accuracy) and too much input
    (leading to user impatience)
  • Unlike with search engines, users are not willing
    to try again and again.
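
(Not from the slides: a toy Python sketch of the multi-level idea mentioned above. The "recommender" is a placeholder that just returns unrated items; the point is the loop that lets each round of optional feedback refine the next one.)

# Multi-level refinement sketch: start from a single rating, then give the user
# repeated chances to rate recommended items and tighten the profile.

def recommend(profile, catalog, k=3):
    """Placeholder recommender: return up to k items the user has not rated."""
    return [item for item in catalog if item not in profile][:k]

catalog = ["A", "B", "C", "D", "E", "F"]
profile = {"A": 5}                       # level 1: one initial rating

for level in range(1, 4):
    recs = recommend(profile, catalog)
    print(f"level {level}: {recs}")
    feedback = {recs[0]: 4}              # stand-in for the user's optional ratings
    profile.update(feedback)             # refine the profile for the next level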

54
Include New, Unexpected Items
  • Users like rec. systems because they provide
    information about new, unexpected items.
  • List of recommended items should include new
    items which the user might not learn about in any
    other way.
  • List could also include some unexpected items
    (e.g., from other topics / genres) which users
    might not have thought of themselves.

55
Trust Generating Items
  • Users (especially first time users) need to
    develop trust in the system.
  • Trust in system is enhanced by the presence of
    items that the user has already enjoyed.
  • Including some very popular items (which have
    probably been experienced previously) in the
    initial recommendation set might be one way to
    achieve this.

56
The Right Mix of Items
  • Transparent Items: At least some items for which
    the user can see the clear link between the items
    he/she rated and the recommendation
  • Unexpected Items: Some unexpected items, whose
    purpose is to allow users to broaden their
    horizons
  • New Items: Some items which are new / just
    released
  • Trust-Generating Items: A few very popular ones,
    in which the system has high confidence

Question: Should these be presented as a sorted
list, an unsorted list, or different categories of
recommendations?
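
(Not from the slides: a hedged Python sketch of assembling such a mix. The category pools, counts, and random sampling are all assumptions; whether and how to sort or label the result is exactly the open question above.)

# Draw a few items from each of the four pools and tag them with their category.
import random

def mixed_recommendations(transparent, unexpected, new, trust_generating, per_category=2):
    pools = {
        "transparent": transparent,
        "unexpected": unexpected,
        "new": new,
        "trust-generating": trust_generating,
    }
    mix = []
    for label, pool in pools.items():
        picks = random.sample(pool, min(per_category, len(pool)))
        mix.extend((label, item) for item in picks)
    return mix

print(mixed_recommendations(
    transparent=["Item T1", "Item T2"],
    unexpected=["Item U1", "Item U2", "Item U3"],
    new=["Item N1"],
    trust_generating=["Item P1", "Item P2"],
))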
57
Verify Degree of Familiarity User Wants
This can help produce the right mix of items
for each user.
58
What kind of system do you want to design?
  • One to sell as many items as possible?
  • Or one to help users explore and expand their
    tastes?
  • The 2 goals are often contradictory, at least in
    the short term.
  • Important for system designer to keep goals in
    mind while designing system.

59
Limitations of Study
  • Simulated a first-time visit; did not allow the
    system to learn user preferences over time
  • Fairly homogeneous group of subjects, no novice
    users
  • Study 1: the source of recommendations was known
    to subjects, which might have biased results
    towards friends

60
The Recommender Community Responds Favorably to
our Work
  • DELOS-NSF 2001 Workshop on Personalization and
    Recommender Systems in Digital Libraries (Dublin)
  • SIGIR 2001 Workshop on Recommender Systems (New
    Orleans)

61
Future Work
  • Develop model to describe interfaces for music
    discovery
  • Build our own system and manipulate interface to
    more fully test our hypotheses
  • Administer Turing test of music recommenders
  • Compare systems, friends and experts
  • Anonymize the source of recommendation

62
Thanks to...
  • Rashmi Sinha
  • Marti Hearst
  • All the user study participants (you know who you
    are)