Title: Unobtrusively Tracking Information Needs: Implicit Solutions to Explicit Problems
1Unobtrusively Tracking Information Needs: Implicit Solutions to Explicit Problems
- Ryen W. White
- Information Retrieval Group
- Department of Computing Science
- University of Glasgow
- ryen_at_dcs.gla.ac.uk
2Me, me, me
- Final year Ph.D. student in the Department of Computing Science, University of Glasgow
- Thank you for inviting me!
- Main aim of visit to UW was to develop a plan for the final user evaluation of my Ph.D.
- Think I have done this!
- I will describe a bit of my work so far
3Aims
- Searching can be problematic: help struggling searchers find what they seek
- Develop a means of better representing searcher needs whilst minimising the burden of explicitly reformulating queries or directly providing relevance information
- Use implicit (unobtrusive, hidden) monitoring of interaction to generate an expanded query that estimates the information need of the searcher
4Information Seeking Metaphor
- Information seeking consists of three major components:
  - the user with their request for information
  - the document collection on which to apply their request
  - the response of the IR system to this request
- Interactive IR systems allow the user to conduct searching tasks dynamically and to react correspondingly to system responses over session time
[Figure: user, query, IR system, corpus, results]
5The Information Need
- The transformation of a user's information need into a query is known as query formulation
- One of the most challenging activities in information seeking
- Amplified if the information need is vague or knowledge of the collection is poor
- IR systems assume that the query is a close representation of the real information need; this is often not true
- Systems need a way of understanding relevance
6Ostension
- Searchers know what is relevant, but can have problems choosing terms to express relevance
- How would you describe red to a child?
  - Perhaps by using red things as examples
- We can describe relevance to an IR system in a similar way, by identifying which documents have relevant attributes
7Relevance Feedback
Relevance Feedback (RF) is an automatic, iterative process designed to produce improved query formulations following an initial retrieval operation.
RF typically expands a searcher's query.
[Figure: original query and revised query in the corpus; the query is moved closer to relevant documents (1 = relevant, 0 = not relevant)]
8RF Procedure
- 1. User poses initial request
- 2. The retrieval system returns a list of documents judged relevant
  - i. Based on query and internal document representations and retrieval algorithms
  - ii. Usually returns a list of titles and abstracts
- 3. User selects relevant documents from this set
- 4. Query is modified automatically and the search process is repeated
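The query modification in step 4 can be sketched in code. The slides do not name a specific algorithm, so the sketch below assumes a Rocchio-style update, one common way of moving a query towards documents marked relevant. The function names, the `alpha`, `beta` and `gamma` values, and the toy documents are all illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict


def centroid(docs):
    """Average the term weights over a list of term -> weight dicts."""
    if not docs:
        return {}
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update (an assumed stand-in for the RF step):
    move the query towards relevant and away from non-relevant documents."""
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    revised = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0:  # keep only positively weighted terms
            revised[term] = weight
    return revised


# Toy data (hypothetical): the searcher marks two documents as relevant.
query = {"dust": 1.0, "allergy": 1.0}
relevant = [{"dust": 1.0, "mite": 1.0},
            {"allergy": 1.0, "mite": 1.0, "symptoms": 1.0}]
non_relevant = [{"dust": 1.0, "storm": 1.0}]
revised = rocchio(query, relevant, non_relevant)
```

In this toy run the revised query gains "mite" and "symptoms" from the relevant documents, which is the expansion effect the slides describe.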
10Relevance Feedback
- The initial query is enhanced to become more attuned to the searcher's information need through an iterative process of feedback
- However:
  - relies on explicit relevance assessments
  - visiting documents to gauge relevance is a demanding and time-consuming process
  - uses a binary notion of relevance; what about partially relevant documents?
  - searchers may be unwilling/unable to provide feedback
11Implicit Feedback
- Search system unobtrusively monitors search behaviour
- Removes the need for the searcher to explicitly indicate which documents are relevant
- Searchers no longer required to assess the relevance of a number of documents
- System makes inferences based on interaction and selects terms that approximate searcher needs
12Implicit Feedback Approach
- Implicit Feedback traditionally uses surrogate measures as evidence of searcher interests
  - Document reading time, scrolling, mouse clicks, etc.
- or forms of document retention
  - Printing, saving, bookmarking, etc.
- Useful, but can be highly context-dependent and vary greatly between users
13Implicit Feedback Approach
- Our approach assumes only that searchers will view information that pertains to their information needs
- Whole documents can contain good and bad expansion terms
  - use smaller representations of documents and extract terms from these
  - reduce likelihood that erroneous terms will be chosen for query expansion
14Document Representations
- For each document there are five different representations
  - title, as created by the author
  - query-biased summary of the document
  - list of top-ranking sentences (TRS) from the top 30 documents, scored in relation to the query
    - each sentence is considered as a representation for that document
  - sentence in the query-biased summary
  - sentence in the context it occurs in the document
16Top-Ranking Sentences
- for each document in the top thirty retrieved
- pool all summaries from all docs, rank with score
[Figure: sentences of a Web document shown in document order and in query score order]
- Extract sentences from documents
- Score sentences in relation to query
- Choose top 4 sentences for summary
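The extract, score and choose steps can be sketched as follows. The slides do not give the scoring function, so this sketch assumes a simple one (the number of distinct query terms a sentence contains). The function names, the regex-based sentence splitter and the sample documents are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, query_terms):
    """Assumed scoring: count the distinct query terms in the sentence."""
    return len(set(tokenize(sentence)) & query_terms)


def summarise(text, query_terms, k=4):
    """Return the k best-scoring sentences as (score, position, sentence)."""
    scored = [(score_sentence(s, query_terms), pos, s)
              for pos, s in enumerate(split_sentences(text))]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


def top_ranking_sentences(documents, query, k=4, top_n=30):
    """Pool the summary sentences of the top_n documents and rank by score."""
    query_terms = set(tokenize(query))
    pooled = []
    for doc_id, text in enumerate(documents[:top_n]):
        for score, pos, sentence in summarise(text, query_terms, k):
            pooled.append((score, doc_id, pos, sentence))
    pooled.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(doc_id, sentence) for _, doc_id, _, sentence in pooled]


docs = [
    "Dust mites cause allergy symptoms. The building is old. "
    "Dust allergy is common.",
    "Cleaning reduces dust. Weather was fine.",
]
trs = top_ranking_sentences(docs, "dust allergy")
```

Ties are broken by document and sentence position here, which is a design choice of the sketch rather than something stated in the slides.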
17Relevance Path
- Searcher can view titles and access full-texts as in standard Web search interfaces
- Through their interaction searchers have control over which representations they view
- Distance travelled along a path can provide information on the relevance of terms used in path representations
[Figure: relevance path through the representations: TRS, Title, Summary, Summary Sentence, Sentence in Context, Doc]
18 Binary Voting Model (BVM)
- We choose terms to better represent information needs from representations viewed by the searcher
- Each representation votes for the terms it contains
- All terms are candidates in the voting process and these votes accumulate across all viewed representations
- Useful terms will be those contained in many of the representations the user chooses to view
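A minimal sketch of the voting idea, before any weighting is applied: each viewed representation casts one vote for every term it contains, and votes accumulate across representations. The function name and the sample term lists are hypothetical, and the representations are assumed to be already tokenised with stopwords removed.

```python
from collections import Counter


def binary_votes(viewed_representations):
    """Accumulate one vote per term per viewed representation."""
    votes = Counter()
    for representation in viewed_representations:
        # set(): the vote is binary, so repeated terms count only once
        votes.update(set(representation))
    return votes


# Three representations the searcher viewed (hypothetical term lists).
viewed = [
    ["dust", "allergy", "symptoms"],
    ["dust", "allergy", "advice"],
    ["dust", "mite", "allergy", "dust"],
]
votes = binary_votes(viewed)
best_terms = {term for term, _ in votes.most_common(2)}
```

Terms that occur in many viewed representations ("dust" and "allergy" in this toy data) collect the most votes and become the strongest expansion candidates.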
19Indicative worth
- Document representations can vary in length and can hence be regarded as being more or less indicative of document content
  - e.g. a top-ranking sentence is less indicative than a query-biased summary (typically 4 sentences)
  - it contains less information about the content of the document
- We weight the contribution of a representation's vote based on the indicative worth (typical length) of the representation
20Implementing the BVM
- Documents are represented by a vector containing all unique non-stemmed, non-stopword terms in the top 30 web documents
- The list is the vocabulary
[Figure: matrix of term weights w(t) for a relevance path, with four terms in the vocabulary. Each document D1, D2, ..., Dn has a separate row in the matrix; w(.) is the representation weight (based on indicativity).]
21Creating the new query
[Figure: worked example for one relevance path (TRS, Title, Summary) and four terms in the vocabulary]
TRS indicativity 0.2, Title indicativity 0.1, Summary indicativity 0.3
w(t1) = .2 + .1 + .3 = .6
w(t2) = .2 + .3 = .5
w(t3) = .2 + .3 = .5
w(t4) = .1 + .3 = .4
Take average w(.) across 10 paths and use top-scoring terms to expand query
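The weighted vote and the averaging across paths can be sketched as below. The indicativity values (TRS 0.2, Title 0.1, Summary 0.3) come from the slide. Which term occurs in which representation is an assumption, chosen so that the totals match the slide's example values (.6, .5, .5, .4). The function names, the term labels `t1` to `t4` and the second path are hypothetical.

```python
from collections import defaultdict

# Indicativity weights as given on the slide.
INDICATIVITY = {"TRS": 0.2, "Title": 0.1, "Summary": 0.3}


def path_weights(path):
    """Sum, per term, the indicativity of each representation containing it."""
    weights = defaultdict(float)
    for rep_type, terms in path:
        for term in set(terms):  # binary vote, scaled by indicativity
            weights[term] += INDICATIVITY[rep_type]
    return dict(weights)


def expansion_terms(paths, vocabulary, n_terms=1):
    """Average w(.) across paths and return the top-scoring terms."""
    per_path = [path_weights(p) for p in paths]
    averaged = {t: sum(w.get(t, 0.0) for w in per_path) / len(per_path)
                for t in vocabulary}
    ranked = sorted(vocabulary, key=lambda t: (-averaged[t], t))
    return ranked[:n_terms], averaged


vocabulary = ["t1", "t2", "t3", "t4"]

# Assumed term membership that reproduces the slide's totals.
example_path = [
    ("TRS", ["t1", "t2", "t3"]),
    ("Title", ["t1", "t4"]),
    ("Summary", ["t1", "t2", "t3", "t4"]),
]
weights = path_weights(example_path)

# A second, purely hypothetical path to illustrate the averaging step.
second_path = [("Title", ["t4"])]
top_terms, averaged = expansion_terms([example_path, second_path], vocabulary)
```

The slide averages over 10 paths; the sketch accepts any number of paths so the same function covers shorter interaction histories.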
22Using the expanded query
- Traditional relevance feedback systems require searcher to control relevance feedback
  - Instruct system to perform query modification and produce a new set of documents
- May not always be appropriate
- Information needs are dynamic and can develop in a dramatic or gradual manner
- Gradual changes → generation of a new result set is perhaps too severe
- Revisions that reflect the degree of development perhaps more suitable
23Different changes, different actions
- Use the evidence gathered to track potential changes in information need and tailor the results presentation to suit degree of change
- Large changes → new searches
- Small changes → less radical operations
  - Reordering the list of documents or reordering the top-ranking sentences
24Changing Needs
- We detect changes in terms suggested by the system for query expansion and, based on the degree of change, we decide how to use the new query
- The weights of all terms in the vocabulary change
- Vocabulary is static: terms in the list will not change, weights and order will
25Spearman rank-order correlation
- Tests for degree of similarity between two lists of rankings
- Non-parametric: ranks, not scores, used
[Figure: information viewed by the searcher changes the order of the term lists; the original order is compared with the order after 10 relevance paths]
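A minimal sketch of the coefficient for two orderings of the same static vocabulary. It uses the standard no-ties formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a term's positions in the two lists. The function name and the sample term lists are hypothetical.

```python
def spearman(order_a, order_b):
    """Spearman rank-order correlation between two orderings of the same terms.

    Uses the no-ties formula, which applies here because each list is a
    strict ordering of one static vocabulary.
    """
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("both lists must order the same distinct terms")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two terms")
    rank_b = {term: rank for rank, term in enumerate(order_b)}
    d_squared = sum((rank_a - rank_b[term]) ** 2
                    for rank_a, term in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


original = ["dust", "allergy", "mite", "symptoms"]
after_paths = ["allergy", "dust", "mite", "symptoms"]  # top two terms swapped

rho_same = spearman(original, original)
rho_reversed = spearman(original, list(reversed(original)))
rho_swapped = spearman(original, after_paths)
```

Identical lists give 1 and fully reversed lists give -1, matching the range described on the next slide.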
26Choosing the action
- We have a coefficient in the range -1 to 1, where a result closer to -1 means the term lists are dissimilar with respect to their rank ordering
- As coefficient gets closer to one, the lists become more similar, and the change in information need is assumed to be smaller
Use Spearman rank-order correlation coefficient to predict extent of change
[Figure: coefficient scale from -1 to 1, marked at 0, .2, .5 and .8, as the order of terms changes; actions: re-search, re-order documents, re-order TRS, no action]
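The choice of action can be sketched as a threshold function. The boundary values .2, .5 and .8 appear on the slide, but the assignment of each interval to an action is an assumption here: the largest change (lowest coefficient) is taken to trigger a new search and the smallest change no action. The function name is hypothetical.

```python
def choose_action(rho):
    """Map a Spearman coefficient to an action.

    The interval-to-action mapping is assumed, not confirmed by the slides:
    lower similarity implies a larger change in need and a more radical action.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("coefficient must lie in [-1, 1]")
    if rho < 0.2:
        return "re-search"
    if rho < 0.5:
        return "re-order documents"
    if rho < 0.8:
        return "re-order TRS"
    return "no action"


actions = [choose_action(r) for r in (-0.4, 0.3, 0.6, 0.95)]
```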
27Take stock
- We have an approach for
  - Detecting information needs
    - Through monitoring the information viewed by the searcher
  - Tracking information needs
    - Through the differences in information viewed over time
- We evaluate the success of the approach from the perspective of the searcher
28Pilot Study
- Assess how well our approach detects information needs and tracks changes in these needs
- Compared it against a baseline system that placed responsibility for query reformulation and action on the searcher
- Did not compare it with a traditional relevance feedback system
  - Test how well it detects/tracks information needs before claiming it can better relevance feedback
29Hypotheses
- Hypothesis 1
  - Terms selected by the system relate to the information need of the searcher
- Hypothesis 2
  - System successfully perceives developments in the information need and acts appropriately
30Subjects
- 24 subjects
- Inexperienced and experienced searchers
  - 12 in each group
- Differences in internet/computer usage and search experience
  - Inexperienced users: 3.1 hours online per week
  - Experienced users: 34.9 hours online per week
- Average age 26 yrs (max 54, min 16)
31Tasks
- Searchers asked to complete one task from each of four categories
- Fact search
  - finding a named person's email address
- Decision search
  - choosing the best financial instrument
- Background search
  - finding information on dust allergies
- Search for a number of items
  - finding contact details of some potential employers
32Example Task (Background search)
- Simulated work task situation: Imagine you work
in an old building and one of your colleagues has
developed a severe dust allergy that you believe
is caused by his working environment. He is
writing a letter to complain about the lack of
cleanliness in his work environment and has asked
you to help him find information about dust
allergies.
33Baseline System
- Searcher responsible for selecting expansion terms and the action
- Increased control, but also increased responsibility
- Term/strategy control panel added to interface
- Allowed us to evaluate the worth of the implicit feedback system; not a strict baseline!
[Figure: control panel with term selection and action selection]
34Methodology
- Presentation of tasks to subjects was held constant: each subject performed the tasks in the same order (factorial design)
- 10 minutes for each task
- Background logging was used to record user interaction
35Methodology
- 1. Short tutorial and training task
- 2. Collected background data on aspects such as subjects' experience and training in online searching
- 3. Introduced to tasks/systems
- 4. Attempted tasks, completed questionnaires
- 5. Final questionnaire
- 6. Informal interview
36Brief Results
- Searchers used the interface components
  - Relevance paths, top-ranking sentences, etc.
- Implicit query expansion produced good terms that searchers found useful
- Need tracking was helpful and selected appropriate actions
- Downside: interface at times erratic, took away searcher control
37More results
- White, R.W., Jose, J.M. and Ruthven, I. Adapting to Evolving Needs: Evaluating a Behaviour-Based Search Interface. Proceedings of the 17th Annual HCI Conference, 2003.
- White, R.W., Jose, J.M. and Ruthven, I. Implicitly Tracking Information Needs. Information Processing and Management, in preparation.
- White, R.W., Jose, J.M. and Ruthven, I. An Approach for Implicitly Detecting Information Needs. Proceedings of the 12th Annual CIKM Conference, 2003.
38Plans for the future
- Need to evaluate a similar interface
- 36 subjects
- Comparative evaluation between three systems
  - Implicit feedback system that recommends terms (IQE) and actions
  - Implicit feedback system that chooses terms and action
  - Explicit feedback system that gives searcher control over terms and action