CS 430: Information Discovery
1
CS 430: Information Discovery
Lecture 15: Usability 2
2
Course Administration
Preliminary version of Assignment 3 is on the
web site. Detailed submission instructions will
be added later.
3
Shared Work!!!
Some programs for Assignment 2 had sections of identical code! This is not acceptable.
1. If you incorporate code from other sources, it must be acknowledged.
2. If you work with a colleague:
   (a) You must write your own assignment.
   (b) You should acknowledge the joint preparation.
IF YOU HAVE NOT FOLLOWED THESE PRINCIPLES, CONTACT ME DIRECTLY.
4
Levels of Usability
interface design
functional design
data and metadata
computer systems and networks
conceptual model
5
Conceptual Model
The conceptual model is the user's internal model of what the system provides.
The desktop metaphor -- files and folders
The web model -- click on hyperlinks
Library models -- search and retrieve; search, browse and retrieve
6
Interface Design
The interface design is the appearance on the screen and the actual manipulation by the user:
Fonts, colors, logos, keyboard controls, menus, buttons
Mouse control or keyboard control?
Conventions (e.g., "back", "help")
Example: Screen space utilization in the American Memory page turner.
7
Functional Design
The functional design determines the functions that are offered to the user:
Selection of parts of a digital object
Searching a list or sorting the results
Help information
Manipulation of objects on a screen -- pan or zoom
8
Same functions, different interface
Example: the desktop metaphor
Mouse -- 1 button (Macintosh), 2 buttons (Windows), or 3 buttons (Unix)
Close button -- left of window (Macintosh), right of window (Windows)
9
Data and metadata
Structural data and metadata stored by the computer system enable the functions and the interface.
The desktop metaphor includes the concept of associating a file with an application. This requires a file type to be stored with each file:
-- extension to filename (Windows and Unix)
-- resource fork (Macintosh)
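The extension-to-application association can be sketched in a few lines. This is a minimal illustration with a hypothetical lookup table, not any real operating-system registry:

```python
import os

# Illustrative table only -- a real system consults an OS registry.
EXTENSION_TO_APP = {
    ".txt": "text editor",
    ".html": "web browser",
    ".pdf": "PDF viewer",
}

def application_for(filename):
    """Return the application associated with a file's extension."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_APP.get(ext.lower(), "unknown")
```

Note that the lookup is case-insensitive, matching the Windows convention that `report.PDF` and `report.pdf` open the same application.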
10
Computer systems and networks
The performance, reliability and predictability of computer systems and networks are crucial to usability.
Response time: instantaneous for mouse tracking and echo of keystrokes; 5 seconds for simple transactions
Example: Pipelined algorithm for the Mercury page turner
11
Croft's Top Ten Criteria
1. Integrated Solutions
"A text retrieval system is a tool that can be used to solve part of an organization's information management problems. It is not often, however, the complete solution."
"Typically, a complete solution requires other text-based tools such as routing and extraction, tools for handling multimedia and scanned documents such as OCR, a database management system for structured data, and workflow or other groupware systems for managing documents and their use in the organization." (Croft 1995)
12
Croft's Top Ten Criteria
2. Distributed Information Retrieval
There is a huge "demand for text retrieval systems that can work in distributed, wide-area network environments."
"The more general problems are locating the best databases to search in a distributed environment that may contain hundreds or even thousands of databases, and merging the results that come back from the distributed search."
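The merging problem Croft mentions arises because scores from different engines are not directly comparable. One deliberately naive approach is to min-max normalize each database's scores before interleaving; the sketch below illustrates that idea and is not how any particular system works:

```python
def merge_results(result_lists):
    """Merge ranked lists from several databases.

    Each list holds (doc_id, raw_score) pairs. Scores are min-max
    normalized to [0, 1] per database, then all pairs are merged
    and sorted by descending normalized score.
    """
    merged = []
    for results in result_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero
        merged.extend((doc, (s - lo) / span) for doc, s in results)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

Normalization like this ignores collection statistics, which is one reason result merging remained an open research problem.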
13
Croft's Top Ten Criteria
3. Efficient, Flexible Indexing and Retrieval
"One of the most frequently mentioned, and most highly rated, issues is efficiency. Many different aspects of a system can have an impact on efficiency, and metrics such as query response time and indexing speed are major concerns of virtually every company involved with text-based systems."
"The other aspect of indexing that is considered very important is the capability of handling a wide variety of document formats. This includes both standards such as SGML, HTML, Acrobat, and WordPerfect and the myriad formats used in text-based applications..."
14
Croft's Top Ten Criteria
4. 'Magic'
"One of the major causes of failures in IR systems is vocabulary mismatch. This means that the information need is often described using different words than are found in relevant documents. Techniques that address this problem by automatic expansion of the query are often regarded as a form of 'magic' by users and are viewed as highly desirable."
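Automatic query expansion can be sketched with a toy synonym table. The table and function below are illustrative assumptions only; real systems derive expansions from a thesaurus, co-occurrence statistics, or relevance feedback:

```python
# Toy synonym table -- purely illustrative.
SYNONYMS = {
    "capital punishment": ["death penalty"],
    "elderly": ["aged", "senior"],
}

def expand_query(query):
    """Return the original query plus any known synonymous phrases."""
    terms = [query]
    for phrase, alternatives in SYNONYMS.items():
        if phrase in query.lower():
            terms.extend(alternatives)
    return terms
```

With such a table, a query on "capital punishment" would also retrieve documents that only mention the "death penalty", addressing the vocabulary mismatch described above.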
15
Croft's Top Ten Criteria
5. Interfaces and Browsing
"Effective interfaces for text-based information systems are a high priority for users of these systems. The interface is a major part of how a system is evaluated, ... Interfaces must support a range of functions including query formulation, presentation of retrieved information, feedback, and browsing."
16
Croft's Top Ten Criteria
6. Routing and Filtering
"Information routing, filtering and clipping are all synonyms used to describe the process of identifying relevant documents in streams of information such as news feeds ... large number of archived profiles are compared to individual documents. Documents that match are sent to the users associated with the profile."
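The profile-matching step described here can be sketched as a simple Boolean filter. Real routing systems score the match rather than requiring every term, so treat this as an illustration only:

```python
def route(document, profiles):
    """Return the users whose profile terms all appear in the document.

    `profiles` maps a user name to a set of interest terms; a document
    matches a profile when every term occurs in its text.
    """
    words = set(document.lower().split())
    return [user for user, terms in profiles.items() if terms <= words]
```

In a running system this function would be applied to each incoming document in the stream, with the matched documents forwarded to the corresponding users.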
17
Croft's Top Ten Criteria
7. Effective Retrieval
"Contrary to some researchers' opinions, companies that sell and use IR systems are interested in effectiveness. It is not, however, the primary focus of their concerns."
"... companies are particularly interested in techniques that produce significant improvements (rather than a few percent average precision) and that avoid occasional major mistakes."
18
Croft's Top Ten Criteria
8. Multimedia Retrieval
"The perceived value of multimedia information systems is very high and, consequently, industry has a considerable interest in the development of these techniques."
19
Croft's Top Ten Criteria
9. Information Extraction
"Information extraction techniques are designed to identify database entities, attributes and relationships in full text." Also known as data mining.
20
Croft's Top Ten Criteria
10. Relevance Feedback
"Companies and government agencies that use IR systems also view relevance feedback as a desirable feature, but there are some practical difficulties that have delayed the general adoption of this technique."
21
See the paper by Croft, Cook, and Wilder in the CS 430 readings.
22
THOMAS
The documents
Full text of all legislation introduced in Congress since 1989.
Text of the Congressional Record.
Indexes
Bills are indexed by title, bill number, and the text of the bill.
The Congressional Record is indexed by title, document identifier, date, speaker, and page number.
Search system
InQuery -- developed by the University of Massachusetts, Amherst.
23
Weighting
Single-word Query
The more instances of that word in the document, the more relevant the document is considered. Occurrences of the term in the title are considered most relevant (weight x 20).
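The single-word rule can be sketched as follows. This illustrates the x20 title boost described on the slide and is not InQuery's actual scoring formula:

```python
TITLE_WEIGHT = 20  # title occurrences count 20x, per the slide

def term_weight(term, title, body):
    """Count occurrences of a term, boosting title hits by TITLE_WEIGHT."""
    term = term.lower()
    in_title = title.lower().split().count(term)
    in_body = body.lower().split().count(term)
    return TITLE_WEIGHT * in_title + in_body
```

A document whose title contains the query word therefore outranks a document that mentions it many times only in the body.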
24
Weighting
Multiple-word Queries
1. Documents containing instances of the search terms as a phrase, i.e., adjacent to each other.
2. Search terms occur near, but not next to, each other, and not necessarily in the same order as entered.
3. All search terms appear singly, not in proximity to each other.
4. Documents contain fewer than all of the words.
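The four tiers can be sketched as a simple classifier. The 5-word window used for "near" is an assumption, since the slide does not say how proximity is defined:

```python
def ranking_tier(query_terms, text):
    """Classify a document into the four relevance tiers above.

    1 = terms adjacent as a phrase, 2 = all terms near each other,
    3 = all terms present but scattered, 4 = only some terms present.
    The 5-word window for "near" is an assumption.
    """
    words = text.lower().split()
    terms = [t.lower() for t in query_terms]
    n = len(terms)
    # Tier 1: the terms occur consecutively, in order.
    if any(words[i:i + n] == terms for i in range(len(words) - n + 1)):
        return 1
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if all(positions):
        # Tier 2: every term occurs within 5 words of some occurrence
        # of the first term.
        for anchor in positions[0]:
            if all(any(abs(p - anchor) <= 5 for p in ps)
                   for ps in positions[1:]):
                return 2
        return 3
    return 4
```

Sorting documents by tier (and breaking ties with the single-word weights above) yields the ordering the slide describes.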
25
Language Problems
InQuery considers documents containing NO instances of any form of the search words to be of NO relevance.
A search for "capital punishment" does not find legislation about the "death penalty".
If there are no highly relevant documents, InQuery returns poorly relevant documents.
A search for "elderly black Americans" returned a bill on "black bears" as most relevant, followed by bills relating to "black colleges and universities". (There were no bills in any way related to "elderly black Americans".)
26
Queries
Words   Unique Queries
1                5,767
2                9,646
3                6,905
4                2,240
5                  656
6                   87
7                   19
8                    1
Total           25,321
Table showing the number of words in queries.
27
The Human in the Loop
[Diagram labels: Search index -- Return hits -- Browse repository -- Return objects]
28
D-Lib Working Group on Metrics
DARPA-funded attempt to develop a TREC-like approach to digital libraries (1997).
"This Working Group is aimed at developing a consensus on an appropriate set of metrics to evaluate and compare the effectiveness of digital libraries and component technologies in a distributed environment. Initial emphasis will be on (a) information discovery with a human in the loop, and (b) retrieval in a heterogeneous world."
Very little progress was made. See http://www.dlib.org/metrics/public/index.html
29
MIRA
Evaluation Frameworks for Interactive Multimedia Information Retrieval Applications
European study, 1996-99
Chair: Keith van Rijsbergen, Glasgow University
Expertise:
Multimedia Information Retrieval
Information Retrieval
Human Computer Interaction
Case Based Reasoning
Natural Language Processing
30
MIRA Starting Point
Information Retrieval techniques are beginning to be used in complex goal- and task-oriented systems whose main objectives are not just the retrieval of information.
New original research in IR is being blocked or hampered by the lack of a broader framework for evaluation.
31
MIRA Aims
Bring the user back into the evaluation process.
Understand the changing nature of IR tasks and their evaluation.
'Evaluate' traditional evaluation methodologies.
Consider how evaluation can be prescriptive of IR design.
Move towards a balanced approach (system versus user).
Understand how interaction affects evaluation.
Support the move from static to dynamic evaluation.
Understand how new media affect evaluation.
Make evaluation methods more practical for smaller groups.
Spawn new projects to develop new evaluation frameworks.
32
MIRA Approaches
Developing methods and tools for evaluating interactive IR -- possibly the most important activity of all.
User tasks: studying real users and their overall goals.
Improving user interfaces to widen the set of users.
Developing a design for a multimedia test collection.
Getting collaborative projects together. (TREC was organized as a competition.)
Pooling tools and data.