COLLATE - PowerPoint PPT Presentation

About This Presentation
Title:

COLLATE

Description:

Evaluation of digital libraries: Testbeds, measurements, and metrics. June 6-7, ... Navigability degree to which the user can move around the application. ... – PowerPoint PPT presentation


Transcript and Presenter's Notes


1
Virtual Agents for a Bookstore: an Empirical
Evaluation
P. Lops, V. Andersen, H.H.K. Andersen, F.
Abbattista and G. Semeraro
Dipartimento di Informatica, Università di Bari, Italy
Risoe National Laboratory, Denmark
2
Overview
  • Introduction
  • E-commerce problems and solutions
  • The COGITO project
  • Empirical evaluation
  • Conversation Log Analysis
  • Visual Behaviour Analysis (eye-tracking)
  • Questionnaire Analysis
  • Some general considerations

3
Introduction
  • A Digital Library is not merely a collection of
    electronic information. It is a distributed
    technology environment that dramatically reduces
    barriers to the creation, dissemination,
    manipulation, storage, integration and reuse of
    information by individuals and groups (Lesk).
  • Digital Libraries can play a relevant role in
    several key areas of the e-era, such as
    e-government, e-learning, e-publishing, and
    e-commerce.

4
DLs vs e-commerce: the (obvious) differences
  • DLs
      • You need to be a paying member
      • You have to stay on the same site
      • Often dedicated to specific domains
      • No specific stimulation needed
  • E-commerce
      • You may search freely until you choose to buy
      • You may jump from one site to another
      • Usually broader in domains
      • Stimulation is very important

5
DLs vs e-commerce: the evaluation
  • For DLs the evaluation is based on objective
    measures (way of sorting, recall, precision of
    documents compared with the requested topic)
  • For e-commerce the evaluation is highly based on
    subjective feelings (satisfaction of the
    individual customer related to user interface,
    functionality, and overall performance).
  • Trust in the correctness of information is much
    more important for DLs than for e-commerce.
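The objective measures named above, precision and recall, can be sketched for a single query as follows. This is a minimal illustration only: the function name and the document identifiers are made up and are not taken from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents returned by the system
    relevant:  documents judged relevant to the requested topic
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document identifiers, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of returned documents that are relevant; recall is the share of relevant documents that were returned.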

6
Problems concerning E-Commerce
  • Getting people started on the Web and making
    their first purchase
  • Using traditional metaphors for shopping on web
    sites
  • Users are forced to make their model of shopping
    fit into a web structure with which they are not
    familiar
  • Getting people to submit personal information
  • Summarizing: UNCERTAINTY

7
The goal of the COGITO project
  • COGITO aims at improving consumer-supplier
    relationships in e-commerce through intelligent
    personalized agents which can play the role of
    virtual assistants for users.

HOW?
  • Using an intelligent retrieval process integrated
    with chatterbot technology
  • Extracting and exploiting User Models

8
The COGITO application scenario
  • Virtual shop of books, CDs, DVDs, gifts (BOL.de)
  • User profiles: the key to personal
    recommendations
  • Improvement of search capabilities exploiting the
    knowledge about users (query expansion)
  • Interaction by means of a chatterbot

9
BOL web site
10
Scenario 1: unknown user
  • The user is not registered
  • The profile is not available to the system
  • The user requires a book written by King

11
Scenario 1: unknown user
List of books belonging to several categories by
authors whose last name is King
12
Scenario 2: registered user
  • The user is already registered
  • The profile is available to the system
  • The user likes Science Technique
  • The user dislikes Narrative
  • The user requires a book written by King

13
Scenario 2: registered user
List of books by authors whose last name is
King belonging to the book category
Naturwissenschaften
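The difference between the two scenarios can be sketched as a search that is narrowed by the user profile when one is available. This is a hedged illustration, not the COGITO implementation: the classes, the `search` function, the book titles and the "Travel" category are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author_last_name: str
    category: str


@dataclass
class Profile:
    likes: set = field(default_factory=set)
    dislikes: set = field(default_factory=set)


def search(catalog, last_name, profile=None):
    """Find books by author last name, narrowed by the profile if present."""
    matches = [b for b in catalog
               if b.author_last_name.lower() == last_name.lower()]
    if profile is None:
        # Scenario 1: unknown user, no narrowing is possible.
        return matches
    # Scenario 2: drop disliked categories, then prefer liked ones.
    kept = [b for b in matches if b.category not in profile.dislikes]
    preferred = [b for b in kept if b.category in profile.likes]
    return preferred or kept


# Made-up catalogue entries, for illustration only.
catalog = [
    Book("Hypothetical novel", "King", "Narrative"),
    Book("Hypothetical science book", "King", "Naturwissenschaften"),
    Book("Hypothetical travel guide", "King", "Travel"),
    Book("Unrelated book", "Smith", "Naturwissenschaften"),
]

anonymous = search(catalog, "King")
registered = search(
    catalog, "King",
    Profile(likes={"Naturwissenschaften"}, dislikes={"Narrative"}),
)
print(len(anonymous), [b.category for b in registered])
```

Falling back to `kept` when no liked category matches is a design choice of this sketch, so that a registered user never gets an empty list only because of the profile.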
14
Empirical evaluation: the framework
  • Used for evaluating the performance of the agent
    based on the means-end hierarchy.
  • Also used during the phase of requirements
    specification
  • Requirements are classified in three levels
      • the strategic level
      • the procedural level
      • the operational level

15
Means-end hierarchy
General hierarchy
Condensed hierarchy
Each level is specified by the next upper level,
which gives the reason for an action, and by the
next lower level, which describes how this action
may be supported.
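The why/how relation between adjacent levels can be sketched with an ordered list. This is only an illustration using the three requirement levels named in the framework slide; the function name is hypothetical.

```python
# Levels ordered from most abstract (top) to most concrete (bottom).
LEVELS = ["strategic", "procedural", "operational"]


def why_and_how(level):
    """Return (why, how) for a level.

    why: the next upper level, giving the reason for an action
    how: the next lower level, describing how the action is supported
    None is returned where no such neighbour exists.
    """
    i = LEVELS.index(level)
    why = LEVELS[i - 1] if i > 0 else None
    how = LEVELS[i + 1] if i < len(LEVELS) - 1 else None
    return why, how


print(why_and_how("procedural"))
```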
16
Empirical evaluation: example of requirements
  • Strategic requirements
      • Increase trust
      • Increase customer loyalty
      • Increase conversion rate
  • Procedural requirements
      • Improve naturalness and reduce the effort
        involved in the user giving information to
        the system
      • Support a natural dialogue users are happy to
        take part in
      • Provide guided tours of the system
  • Operational requirements
      • Encounter few dead ends in a conversation
      • Run without plug-ins
      • No login procedure

17
Evaluation of COGITO to check...
  • whether the interface succeeds in getting the
    user into a dialogue
  • whether users notice the tailored parts of the
    dialogue
  • whether users notice and accept the chatterbot
  • Performed by letting groups of test persons solve
    various tasks related to searching
    general/specific information using the agent on
    the BOL site

18
Empirical evaluation: methods
  • Partly based on quantitative measures
      • Analysis of the conversation log
      • Analysis of eye-tracking
  • Partly based on qualitative measures
      • Completion of detailed questionnaires
  • The COGITO proactive agent has been compared with
    the BOL agent, a state-of-the-art agent with no
    proactive features

19
The two agents
BOL: state-of-the-art agent (not proactive)
COGITO: proactive agent
20
Set-up of the experiment
Test person, moderator and the eye-tracking system
21
Session introduction
  • Give your honest opinion, we are not the
    programmers and will therefore not be personally
    offended
  • We are testing the system, you are not to be
    tested
  • The agent is not perfect; if it were, we would
    not need to test it
  • Please think aloud: indicate what you think and
    intend to do

22
Conversation log analysis
  • Purpose
      • Measure the conversation performance in terms
        of the number of
          • Correct text outputs
          • Fallback sentences
          • Proactive sentences
          • Search results
      • Average length of user queries

23
Conversation log analysis: measures
  • Correct text output
      • Manual analysis of successful elements of the
        agent-user dialogue, each consisting of one
        user text input string, e.g. a query, and one
        agent output text string, e.g. delivering a
        correct answer
  • Fallback sentences
      • Degree of heterogeneity of the conversations.
        A large number of fallback sentences indicates
        poor conversation performance
  • Proactive sentences
      • A contextually meaningful response to user input
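Once each agent turn in the log has been labelled by hand, the counts behind these measures can be tallied as in the sketch below. The label names and the sample labels are assumptions made for illustration; they are not data from the study.

```python
from collections import Counter


def summarize_turns(labels):
    """Map each label to (count, share of all labelled agent turns)."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: (n, n / total) for label, n in counts.items()}


# Hypothetical manual labels for six agent turns.
labels = ["correct", "fallback", "correct",
          "proactive", "fallback", "correct"]
summary = summarize_turns(labels)
for label, (n, share) in sorted(summary.items()):
    print(f"{label}: {n} turns ({share:.0%})")
```

A high share of `fallback` turns would correspond to the poor conversation performance described above.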

24
Conversation log analysis: measures
  • Search results: successful query
      • A query is successful if the agent, on the
        basis of the Query Expansion process, prompts
        the BOL search engine and produces a list of
        results that is correct in terms of relevance
        for a given task
      • The queries listed in the conversation log have
        been repeated and the results analysed with
        respect to the users' tasks.

25
Conversation log analysis: results
26
Conversation log analysis: results
  • Analysis of the average length of user queries
      • Queries to the Cogito system: average length
        5.05 words
      • Queries to the Excite system: average length
        2.21 words
      • Queries to traditional IR systems: average
        length 7-15 words
  • Good performance of the Cogito system (see search
    results) because the agent allows the users to
    type their queries in a conversational manner
    without the use of Boolean operators.
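The average query length can be computed from the logged user inputs as sketched below, assuming that words are separated by whitespace. The sample queries are invented for illustration and do not come from the COGITO logs.

```python
def average_query_length(queries):
    """Mean number of whitespace-separated words per non-empty query."""
    lengths = [len(q.split()) for q in queries if q.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical user queries, for illustration only.
queries = [
    "I am looking for a book by King",  # 8 words
    "books about physics",              # 3 words
    "King",                             # 1 word
]
avg = average_query_length(queries)
print(f"average length: {avg:.2f} words")
```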

27
Eye-tracking analysis
  • Measurement of the respondents' visual behaviour
    during the evaluation session
  • The device is non-intrusive
  • Video recording of the eye-movements together
    with the graphic signal from the computer
  • Division of the screen into 5 Areas of
    Interest (AOIs)
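Given fixations already assigned to AOIs, the share of viewing time per AOI can be computed as in this sketch. The record format `(aoi, duration_ms)`, the AOI names and the durations are assumptions for illustration; the slides do not give the eye-tracker's output format.

```python
from collections import defaultdict


def dwell_share(fixations):
    """Return each AOI's share of the total viewing time.

    fixations: iterable of (aoi_name, duration_ms) pairs
    """
    totals = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {aoi: t / grand_total for aoi, t in totals.items()}


# Hypothetical fixation records, for illustration only.
fixations = [
    ("text_output", 400),
    ("agent_animation", 200),
    ("text_output", 300),
    ("outside_display", 100),
]
shares = dwell_share(fixations)
for aoi, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{aoi}: {share:.0%}")
```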

28
Eye-tracking analysis: AOI
29
Eye-tracking analysis: results
  • The BOL prototype respondents spent more time
    checking their keystrokes than the Cogito
    respondents.
  • Much viewing time was spent outside the display
    because the agent requires written text input
    from the keyboard.
  • The BOL agent's deep links did not function well
    enough, so the BOL respondents more often used
    the BOL site on its own.
  • The BOL agent animation attracted twice the
    visual attention of the Cogito agent, probably
    due to its more photo-like appearance, obliging
    attitude and larger repertoire of gesticulations.
  • Most viewing time was spent on the text output
    field. This is not surprising, because users need
    to read the text.
30
Questionnaire analysis
  • Four groups of 8 persons each were recruited for
    the test session
      • 2 groups of novices
      • 2 groups of experienced users
  • The members of the test groups gave their
    impressions of the two agents, and compared them,
    by filling out a detailed questionnaire
  • Results need to be re-analysed in various ways,
    and it is necessary to consider the statistical
    significance of the results

31
Questionnaire analysis: measures
  • 7 evaluation criteria
      • Impression: the user's feelings or emotions
        when using the software.
      • Command: degree to which the user feels that
        she is in control.
      • Effectiveness: degree to which the user feels
        that she can complete the task while using the
        system.
      • Navigability: degree to which the user can move
        around the application.
      • Learnability: degree to which the user feels
        that the application is easy to become familiar
        with.
      • Aidability: degree to which the application
        assists the user to resolve a situation.
      • Comprehension: degree to which the interaction
        with the application is satisfying.
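Questionnaire answers can be averaged per agent and criterion as in the sketch below. The response format, the numeric scale and the scores are hypothetical; the slides do not describe how the questionnaire was coded.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(responses):
    """Average score for every (agent, criterion) pair."""
    buckets = defaultdict(list)
    for r in responses:
        buckets[(r["agent"], r["criterion"])].append(r["score"])
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up answers on an assumed 1-5 scale, for illustration only.
responses = [
    {"agent": "BOL", "criterion": "Impression", "score": 2},
    {"agent": "BOL", "criterion": "Impression", "score": 4},
    {"agent": "COGITO", "criterion": "Impression", "score": 3},
    {"agent": "COGITO", "criterion": "Impression", "score": 5},
    {"agent": "COGITO", "criterion": "Navigability", "score": 4},
]
averages = mean_scores(responses)
for (agent, criterion), value in sorted(averages.items()):
    print(f"{agent} / {criterion}: {value}")
```

Means alone say nothing about significance; with groups of 8 persons, a statistical test would still be needed before drawing conclusions.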

32
Questionnaire analysis: impression
  • The questions related to the impression of the
    agent are based on
      • whether the agent is enjoyable or a bit awkward
        to use
      • whether the user would recommend the agent to
        colleagues

33
Questionnaire analysis

BOL agent

COGITO agent
  • Novices had negative feelings about both agents,
    probably because they expect an agent to act
    flawlessly in all situations
  • Experienced users are aware that a new product
    needs time to mature

34
Some general considerations
  • General approach for DL evaluation
      • System specification
      • Methodology framework based on the means-end
        hierarchy
  • General measures of
      • Usability, by means of log analysis,
        eye-tracking analysis, and questionnaire
        analysis
      • Effectiveness of search functions, by means of
        log analysis and questionnaire analysis
      • User satisfaction, by means of log analysis,
        eye-tracking analysis, and questionnaire
        analysis

35
Contacts
  • Pasquale Lops
  • Dipartimento di Informatica
  • Università di Bari
  • Tel (39) 080 5442276
  • Fax (39) 080 5443196
  • Email lops_at_di.uniba.it