Transcript and Presenter's Notes

Title: Beyond Algorithms: A Human-Centered Evaluation of Recommender Systems

1
Beyond Algorithms: A Human-Centered Evaluation
of Recommender Systems
  • Kirsten Swearingen and Rashmi Sinha
  • SIMS 213, UC Berkeley
  • April 11, 2002

2
Overview
  • Introduction to recommender systems
  • Motivation for project
  • Method and findings - User Study 1
  • Method and findings - User Study 2
  • Discussion and design recommendations
  • Limitations of study
  • Future work

3
In the news: A bet on humans vs. machines
  • Ray Kurzweil maintains that a computer (i.e., a
    machine intelligence) will pass the Turing test
    by 2029. Mitchell Kapor believes this will not
    happen.
  • In a 1950 paper Alan Turing describes his concept
    of the Turing Test, in which one or more human
    judges interview computers and human foils using
    terminals (so that the judges won't be prejudiced
    against the computers for lacking a human
    appearance).
  • If the human judges are unable to reliably unmask
    the computers (as imposter humans) then the
    computer is considered to have demonstrated
    human-level intelligence.

4
Recommender systems are a technological proxy for
a social process
Which one should I read?
Recommendations from Online Systems
Recommendations from friends
5
Basic interaction paradigm of recommender systems
Input (ratings of books): "I recently enjoyed Snow Crash, Seabiscuit, The Soloist, and Love in a Cold Climate. What should I read next?"
Output (recommendations): "Books you might enjoy are ..."
6
Approaches: Back End
  • Content-based recommendations
  • Rely on metadata describing items
  • You like action-adventure movies and movies
    starring Meryl Streep.
  • Collaborative filtering
  • Rely on correlations between individual ratings.
  • You like most of the same movies Joe and Carol
    like, so you might like these other movies they
    liked.
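
(Not from the slides: a minimal content-based sketch in Python, assuming a toy catalog whose items carry metadata tags and a hypothetical user profile of preferred attributes.)

# Content-based sketch: score each item by how many of the user's preferred
# attributes its metadata matches. All item and attribute names are made up.

user_profile = {"action-adventure", "Meryl Streep"}

catalog = {
    "Movie A": {"action-adventure", "Meryl Streep"},
    "Movie B": {"romance", "Meryl Streep"},
    "Movie C": {"documentary"},
}

def content_score(attributes, profile):
    """Fraction of the user's preferred attributes that the item matches."""
    return len(attributes & profile) / len(profile) if profile else 0.0

ranked = sorted(catalog, key=lambda title: content_score(catalog[title], user_profile),
                reverse=True)
print(ranked)  # ['Movie A', 'Movie B', 'Movie C']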

7
Collaborative Filtering Algorithms Depend Upon
Correlations
Meg & David correlation: .52    Meg & Amy correlation: -.67    Meg & Joe correlation: .23
Recommendations for Meg: Books 7 and 8
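
(Not from the slides: a minimal user-user collaborative filtering sketch in Python 3.10+, mirroring the slide's idea of correlating Meg with other users. The rating values are hypothetical, chosen only so that the most correlated neighbour ends up recommending books 7 and 8.)

# Collaborative filtering sketch: Pearson-correlate the target user with every
# other user over the books they both rated, then recommend highly rated books
# from the most correlated neighbour that the target has not rated.
from statistics import correlation  # Pearson correlation (Python 3.10+)

ratings = {                          # user -> {book id: rating}
    "Meg":   {1: 5, 2: 3, 3: 4, 4: 2},
    "David": {1: 4, 2: 3, 3: 5, 4: 2, 7: 5, 8: 4},
    "Amy":   {1: 1, 2: 5, 3: 2, 4: 5, 5: 4},
    "Joe":   {1: 4, 2: 2, 3: 4, 4: 3, 6: 2},
}

def similarity(a, b):
    """Pearson correlation over the books both users rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0
    return correlation([ratings[a][k] for k in common],
                       [ratings[b][k] for k in common])

target = "Meg"
best = max((u for u in ratings if u != target), key=lambda u: similarity(target, u))
recs = [book for book, r in ratings[best].items()
        if book not in ratings[target] and r >= 4]
print(best, recs)  # e.g. David [7, 8]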
8
Approaches: Front End
  • Implicit rating (by browsing, clicking, or
    purchasing)
  • Explicit rating, with systems differing in the
  • Number of items to rate
  • Rating scale used
  • Number of items recommended
  • Amount of personal information required
  • Opportunity for feedback on recommended items

9
Amazon's Recommendation Process
Input: One artist/author name
Output: List of recommendations
Opportunity to explore / refine
10
Sleeper's Recommendation Process
Input: Ratings of the same 10 books from all users, on a continuous scale
Output: Displays 1 book at a time, with degree of confidence in the prediction
(System designed by Ken Goldberg, UC Berkeley)
11
Song Explorer's Recommendation Process
Input: 20 ratings
Output: List of songs/albums
12
I know what you'll read next summer (Amazon, Barnes & Noble)
  • what movies you should watch (Reel, RatingZone,
    Amazon)
  • what music you should listen to (CDNow, Mubu,
    Gigabeat)
  • what websites you should visit (Alexa)
  • what jokes you will like (Jester)
  • and who you should date (Yenta)

13
The recommendation process from the user's perspective
The user inputs preferences (time and effort to input; privacy concerns), receives recommendations, reviews them (time and effort to review), and decides if he/she will sample a recommendation.
In the end, a user benefits only if the recommendations turn out to be good ones.
14
What Users Want
RECOMMENDATIONS: good, new to me
PROCESS: fast, easy, engaging
To succeed, collaborative filtering recommender
systems need a LOT of motivated regular users.
15
Issues with Recommender Systems
  • Cold-start problem
  • Latency problem
  • Unusual users
  • Privacy concerns
  • Scalability
  • Speed of transaction
  • User interface

16
Motivation for Project
  • General need: plenty of research on recommender
    system back ends, little on interfaces
  • Personal interest
  • Kirsten -- designing Reading Tree
  • Rashmi -- interested in community-oriented sites

17
Our Project: Beyond Algorithms Only -- An HCI
Perspective on Recommender Systems
  • Compare the social recommendation process to
    online recommender systems
  • Understand the factors that go into an effective
    recommendation by studying user interaction with
    systems

18
Stages of Project
  • Study 1
  • Began as a class project for SIMS 271: a user
    study of 6 book and movie systems
  • Focused on humans vs. recommenders comparison
  • Study 2
  • User study comparing 5 music recommender systems
  • Focused on identifying factors that contribute to
    system success

19
General Methodology
  • Not an experiment, but designed like one.
    Conducted in lab environment.
  • Broad overview to start with, then zeroed in on
    some systems
  • Meshing of quantitative and qualitative methods
    (one informing the other)
  • Pre-test, pre-test, pre-test
  • User motivation ascertained before study
  • Within-subjects design used wherever possible
  • Multiple small studies, rather than 1 big study

20
General Methodology
  • Comprehensive data collection: observation,
    behavior logging with timestamps,
    questionnaires, post-test interviews
  • The Slim Logger: a simple, Excel-based tool for
    recording timed observations

21
Study 1: The Human vs. Recommenders Death Match
22
3 Book Systems
Amazon Books
Rating Zone
Sleeper
23
3 Movie Systems
Amazon Movies
Movie Critic
Reel
24
3 Friends Per Person
Participants were asked to choose friends who
knew their tastes in books or movies.
25
Method
  • 19 participants, ages 18 to 34 years
  • For each of 3 online systems
  • Registered at site
  • Rated items
  • Reviewed and evaluated recommendation set
  • Completed questionnaire
  • Also reviewed and evaluated sets of
    recommendations from 3 friends each

26
Defining Success
Good Recs. (Precision)
  • Items the user felt interested in
Useful Recs.
  • Subset of Good Recs.: items the user felt
    interested in and had not read / viewed yet
(Venn diagram: within all good recommendations, useful items are those new to the user; the rest were previously experienced.)
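
(Not from the slides: a small Python sketch of the two measures just defined, using made-up feedback for one participant.)

# "Good" = recommended items the user said they were interested in (precision);
# "useful" = the subset of good items the user had not already read / viewed.

recommended = ["A", "B", "C", "D", "E"]
interested = {"A", "B", "D"}      # items the participant marked as interesting
already_seen = {"B"}              # items previously read / viewed

good = [item for item in recommended if item in interested]
useful = [item for item in good if item not in already_seen]

precision = len(good) / len(recommended)        # 3/5 = 60% good
useful_rate = len(useful) / len(recommended)    # 2/5 = 40% useful
print(f"good: {precision:.0%}, useful: {useful_rate:.0%}")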
27
Comparing Human Recommenders to RS: Good and
Useful Recommendations
(Bar chart, 0-100%: good and useful recommendations per source.
Books: Amazon (15), Sleeper (10), Rating Zone (8), Friends (9).
Movies: Amazon (15), Reel (5-10), Movie Critic (20), Friends (9).
(x) = no. of recommendations; RS average shown for comparison.)
28
However, users like online RS.
This result was supported by post-test interviews.
29
Why systems over friends?
  • "Suggested a number of things I hadn't heard of,
    interesting matches."
  • "It was like going to Cody's, looking at that
    table up front for new and interesting books."
  • "Systems can pull from a large database; no one
    person knows about all the movies I might like."

30
Recommender systems broaden horizons
  • while friends mostly recommend familiar items.

31
Which of the systems did users prefer?
(Chart: users' yes/no ratings for each book and movie system.)
  • Sleeper and Amazon Books average the highest ratings
  • Split opinions on Reel and Movie Critic

32
Why did some systems
  • Provide useful recommendations but leave users
    unsatisfied?
  • RatingZone
  • Movie Critic
  • Reel

33
Searching for Reasons
  • Previously Liked Items and Adequate Item
    Description are correlated with Usefulness
    ratings
  • Time to Receive Recommendations and No. of Items
    to Rate: not important!

34
A Question of Trust
  • Post-test interviews showed that users trusted
    systems if they had already sampled (and enjoyed)
    some recommendations
  • Positive Experiences lead to trust
  • Negative Experiences with recommended items lead
    to mistrust of system

(Venn diagram: within all good recommendations, previously experienced items are trust-generating; new-to-user items are useful.)
35
A Question of Trust
(Chart: results for the book and movie systems.)
The difference between Amazon and Sleeper highlights
the fact that there are different kinds of "good"
recommender systems.
36
Adequate Item Description: The RatingZone Story
0% of Version 1 users and 60% of Version 2 users found
the item description adequate.
An adequate item description and links to other
sources of information about the item were crucial
factors in users being convinced by a
recommendation.
37
System Transparency
  • Do users think they understand why an item was
    recommended?

Users mentioned this factor in post-test
interviews; in Study 2, we explored it in
greater detail.
38
Study 2: Music Recommenders
Amazon, CDNow, MediaUnbound, MoodLogic, and
SongExplorer
39
Method
  • 12 participants
  • Very similar to Study 1 method
  • Registered at site
  • Rated items
  • Reviewed and evaluated recommendation set
  • Completed questionnaire
  • Focused on music systems only (eliminate domain
    differences)
  • Participants listened to clips and evaluated
    recommended items (this was not possible with
    book and movie systems)

40
Findings: Effect of Familiarity
  • Familiar recommendations liked more than
    unfamiliar ones for all five systems

41
Transparency Again
User perception that they understand why an item
was recommended
  • Transparent recommendations liked more than
    non-transparent ones for all five systems

42
Side note: once trust is established,
transparency may become less important
  • The serious-minded, 65-year-old father of one of
    my friends uses Netflix (DVDs)
  • Based on the items he had rented, he received
    this recommendation, and ordered the film!
  • His comment: "They think I'll like it and they
    have done pretty well in the past, so I'll take a
    chance."

"New Wave teens / 20-somethings search for love on
New Year's Eve 1981 in this episodic comedy."
43
2 Models of Recommender System Success
  • Recommendations from Amazon received the highest
    liking rating in Study 1 (for books & movies)
    and the second highest in Study 2 (Music)
  • Recommendations from MediaUnbound outperformed
    Amazon in Study 2 (Music)
  • Both systems were well liked but differed
    dramatically in interaction style

44
Amazon's Bare-Bones Recommendation Process
45
MediaUnbound's long, extended (35-question)
recommendation process
Genre Selection
46
Setting level of familiarity
Rating some songs
Feedback at every stage
47
Setting system expectations
More feedback about the user's tastes
48
Users find MediaUnbound's Recommendations More
Useful
Also, most users preferred MediaUnbound over
Amazon
But whose recommendations would they buy?
49
Users Express More Interest in Buying Amazon's
Recommendations
50
Different System Strengths
  • Amazon
  • Safe, conservative approach to recommendations
  • Recommendations are familiar--few new items
  • Users find system logic transparent
  • Users don't feel like they learnt anything new
  • MediaUnbound
  • Verifies with the user how familiar they want
    recommendations to be
  • Long input process seems to generate trust
  • Recommendations are often new, but well liked

51
Discussion and Design Recommendations
52
Justify Your Recommendations
  • Adequate Item Information: Provide enough detail
    about the item for the user to make a choice
  • System Transparency: Generate (at least some)
    recommendations which are clearly linked to the
    rated items
  • Explanation: Explain why the item was
    recommended
  • Community Ratings: Provide links to ratings /
    reviews by other users. If possible, present a
    numerical summary of ratings.

53
Accuracy vs. Less Input
  • Don't sacrifice accuracy for the sake of
    generating quick recommendations. Users don't
    mind rating more items to receive quality
    recommendations.
  • Multi-level recommendations: users can initially
    use the system by providing a single rating, and
    are offered subsequent opportunities to refine
    recommendations (see the sketch after this list)
  • Provide a happy medium between too little input
    (leading to low accuracy) and too much input
    (leading to user impatience)
  • Unlike with search engines, users are not willing
    to try again and again.
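
(Not from the slides: a toy Python sketch of the multi-level idea mentioned above. The "recommender" is a placeholder that just returns unrated items; the point is the loop that lets each round of optional feedback refine the next one.)

# Multi-level refinement sketch: start from a single rating, then give the user
# repeated chances to rate recommended items and tighten the profile.

def recommend(profile, catalog, k=3):
    """Placeholder recommender: return up to k items the user has not rated."""
    return [item for item in catalog if item not in profile][:k]

catalog = ["A", "B", "C", "D", "E", "F"]
profile = {"A": 5}                       # level 1: one initial rating

for level in range(1, 4):
    recs = recommend(profile, catalog)
    print(f"level {level}: {recs}")
    feedback = {recs[0]: 4}              # stand-in for the user's optional ratings
    profile.update(feedback)             # refine the profile for the next level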

54
Include New, Unexpected Items
  • Users like rec. systems because they provide
    information about new, unexpected items.
  • List of recommended items should include new
    items which the user might not learn about in any
    other way.
  • List could also include some unexpected items
    (e.g., from other topics / genres) which users
    might not have thought of themselves.

55
Trust Generating Items
  • Users (especially first time users) need to
    develop trust in the system.
  • Trust in system is enhanced by the presence of
    items that the user has already enjoyed.
  • Including some very popular items (which have
    probably been experienced previously) in the
    initial recommendation set might be one way to
    achieve this.

56
The Right Mix of Items
  • Transparent Items: At least some items for which
    the user can see the clear link between the items
    he/she rated and the recommendation
  • Unexpected Items: Some unexpected items, whose
    purpose is to allow users to broaden their
    horizons
  • New Items: Some items which are new / just
    released
  • Trust-Generating Items: A few very popular ones,
    in which the system has high confidence

Question: Should these be presented as a sorted
list, an unsorted list, or different categories of
recommendations?
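
(Not from the slides: a hedged Python sketch of assembling such a mix. The category pools, counts, and random sampling are all assumptions; whether and how to sort or label the result is exactly the open question above.)

# Draw a few items from each of the four pools and tag them with their category.
import random

def mixed_recommendations(transparent, unexpected, new, trust_generating, per_category=2):
    pools = {
        "transparent": transparent,
        "unexpected": unexpected,
        "new": new,
        "trust-generating": trust_generating,
    }
    mix = []
    for label, pool in pools.items():
        picks = random.sample(pool, min(per_category, len(pool)))
        mix.extend((label, item) for item in picks)
    return mix

print(mixed_recommendations(
    transparent=["Item T1", "Item T2"],
    unexpected=["Item U1", "Item U2", "Item U3"],
    new=["Item N1"],
    trust_generating=["Item P1", "Item P2"],
))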
57
Verify Degree of Familiarity User Wants
This can help produce the right mix of items
for each user.
58
What kind of system do you want to design?
  • One to sell as many items as possible?
  • Or one to help users explore and expand their
    tastes?
  • The 2 goals are often contradictory, at least in
    the short term.
  • Important for system designer to keep goals in
    mind while designing system.

59
Limitations of Study
  • Simulated a first-time visit; did not allow the
    system to learn user preferences over time
  • Fairly homogeneous group of subjects, no novice
    users
  • Study 1: the source of recommendations was known
    to subjects, which might have biased results
    towards friends

60
The Recommender Community Responds Favorably to
our Work
  • DELOS-NSF 2001 Workshop on Personalization and
    Recommender Systems in Digital Libraries (Dublin)
  • SIGIR 2001 Workshop on Recommender Systems (New
    Orleans)

61
Future Work
  • Develop model to describe interfaces for music
    discovery
  • Build our own system and manipulate interface to
    more fully test our hypotheses
  • Administer Turing test of music recommenders
  • Compare systems, friends and experts
  • Anonymize the source of recommendation

62
Thanks to...
  • Rashmi Sinha
  • Marti Hearst
  • All the user study participants (you know who you
    are)