Title: INFO624 Week 9 Effective Information Retrieval
- Dr. Xia Lin
- Assistant Professor
- College of Information Science and Technology
- Drexel University
Effective Information Retrieval
- System perspectives
- Fast indexing and retrieval algorithms
- Inverted indexing, tree structures, hash tables
- Semantic indexing and mapping
- Subject indexing
- Latent semantic indexing
- Intelligent information retrieval
- Knowledge representation
- Logical inferences
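As a minimal sketch of the first item, assuming naive whitespace tokenization with no stemming or stop-word removal, an inverted index maps each term to the documents that contain it. Python's `dict` is itself a hash table, so each term lookup here is a hash-table lookup. The function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of the IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_and(index, query):
    """Boolean AND: IDs of documents that contain every query term."""
    term_sets = [set(index.get(term, [])) for term in query.lower().split()]
    if not term_sets:
        return []
    return sorted(set.intersection(*term_sets))


# Hypothetical three-document collection.
docs = {
    1: "fast indexing with hash tables",
    2: "tree structures for indexing",
    3: "latent semantic indexing",
}
index = build_inverted_index(docs)
print(index["indexing"])                   # [1, 2, 3]
print(search_and(index, "indexing tree"))  # [2]
```

Only the postings lists of the query terms are touched, which is what makes inverted files fast; a tree structure over the terms (such as a B-tree) would additionally support prefix or range lookups, which a hash table does not.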
Effective Information Retrieval
- User perspectives
- Iteration
- Relevance Feedback
- Use of User Profiles
- Graphical Display of Search Results
- Browsing/Interactive Searching
- We can't change the user; we should make the system adapt to the user's needs.
Iteration
- Most searches need to be done iteratively
- From the user's point of view
- The first query often does not retrieve what the user wants
- The user needs to see the output of previous queries to construct the next query
- The user often needs to reconstruct his/her information needs after reading or browsing search results
Iteration: User Strategies
- Modify queries repeatedly based on some goals
- Starting with high precision
- Use a specific query first
- Broaden queries to include more relevant documents
- "pearl growing"
- Starting with high recall
- Use a very broad query
- Improve precision gradually
- "onion peeling"
- Starting with known items
- Find documents similar to the known items
- Browsing/interactive searching
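The two query-modification strategies above can be illustrated with Boolean matching over a toy collection: a specific AND query is broadened with OR ("pearl growing"), or a broad query is narrowed by requiring more terms ("onion peeling"). The collection and helper names are hypothetical, and a real system would usually rank results rather than only filter them.

```python
# Hypothetical toy collection: each document is represented by its set of terms.
COLLECTION = {
    "d1": {"information", "retrieval", "feedback"},
    "d2": {"information", "visualization"},
    "d3": {"retrieval", "evaluation", "recall"},
    "d4": {"relevance", "feedback", "retrieval"},
}


def match_any(terms):
    """Broad (OR) query: documents containing at least one of the terms."""
    return sorted(d for d, t in COLLECTION.items() if t & set(terms))


def match_all(terms):
    """Specific (AND) query: documents containing every one of the terms."""
    return sorted(d for d, t in COLLECTION.items() if set(terms) <= t)


# Pearl growing: start with a specific query, then broaden it.
start = match_all(["relevance", "feedback"])             # ['d4']
grown = match_any(["relevance", "feedback", "recall"])   # ['d1', 'd3', 'd4']

# Onion peeling: start with a broad query, then narrow it.
broad = match_any(["retrieval"])                         # ['d1', 'd3', 'd4']
peeled = match_all(["retrieval", "feedback"])            # ['d1', 'd4']
```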
Iteration: System Strategies
- If the system can learn from the user's activities, it can likely retrieve results that better meet the user's needs
- Relevance feedback
- User profiles
- The system should provide better output representations to help the user
- Browse
- Conduct interactive searches.
Relevance Feedback
- Feedback: the user provides information that the system can use to modify its next search or next display
- Relevance feedback
- Users let the system know
- What documents are relevant to their information needs
- What concepts or terms are related to their information needs
- What weights they would like the system to put on each relevant document/term
Relevance Feedback: System Strategy
- The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted
- The system should use information from the user's feedback to conduct the next search
Designing IR Systems with Relevance Feedback
- Collect relevance feedback through
- Binary judgments vs. rating scales
- Positive and negative feedback
- Apply relevance feedback to
- Query
- Profile
- Document
- Retrieval algorithm
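One well-known way to apply positive and negative feedback to the query, not named on the slide, is Rocchio's formula: the new query vector is α·q + β·(mean of relevant document vectors) − γ·(mean of non-relevant document vectors). The sketch below assumes sparse term-weight vectors stored as dicts, uses illustrative parameter values (α = 1.0, β = 0.75, γ = 0.15), treats feedback as binary, and drops terms whose weight falls to zero or below, which is a common but not universal choice.

```python
from collections import Counter


def centroid(vectors):
    """Average a list of sparse term-weight vectors (dicts of term -> weight)."""
    if not vectors:
        return {}
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds weights for matching terms
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector from positive and negative feedback."""
    pos, neg = centroid(relevant), centroid(nonrelevant)
    new_query = {}
    for term in set(query) | set(pos) | set(neg):
        weight = (alpha * query.get(term, 0.0)
                  + beta * pos.get(term, 0.0)
                  - gamma * neg.get(term, 0.0))
        if weight > 0:  # discard terms pushed to zero or below
            new_query[term] = weight
    return new_query


# Hypothetical feedback: the user wants the animal, not the car.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]
new_q = rocchio(query, relevant, nonrelevant)
print(new_q)
```

Here "cat" and "habitat" enter the query from the relevant documents, while "car" is excluded because it appears only in the non-relevant one. Graded (scaled) feedback could weight each judged document before averaging, which this sketch does not do.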
User Profiles
- User profiles
- Information about the user's information needs that an IR system can use to modify its search process
- Simple user profiles
- A list of terms that the user selects to represent his/her information needs
- A list of terms with weights
- Extended user profiles
- More complex term structures
- Information use patterns
- Levels of interest
- User's background information
- User's browsing behaviors
- What pages the user visited last week, last month, etc.
- From which page to which page
Use of User Profiles
- Selective Dissemination of Information (SDI)
- The system regularly runs the search to get any new information that matches the user's profiles
- The user can set up several profiles
- Once they are set up, the queries are always the same
- The user can set the frequency of the update searches
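The SDI cycle described above can be sketched as a saved profile that is re-run on a schedule against documents added since the last run. The `Profile` fields, the `run_sdi` helper, the dates, and the sample documents are hypothetical, and matching is simple term overlap. The saved terms never change between runs, which is the static-query limitation listed among SDI's disadvantages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Profile:
    """A stored SDI profile (hypothetical): a fixed query plus an update frequency."""
    name: str
    terms: set            # the saved query terms; they never change between runs
    every: timedelta      # how often the user wants update searches
    last_run: datetime = datetime.min


def run_sdi(profile, documents, now):
    """Return IDs of new matching documents if an update search is due."""
    if now - profile.last_run < profile.every:
        return []  # not yet time for the next scheduled search
    hits = [doc["id"] for doc in documents
            if doc["added"] > profile.last_run and profile.terms & doc["terms"]]
    profile.last_run = now
    return hits


# Hypothetical profile and documents.
profile = Profile("IR news", {"retrieval", "visualization"}, timedelta(days=7))
docs = [
    {"id": "a", "added": datetime(2024, 1, 2), "terms": {"retrieval", "evaluation"}},
    {"id": "b", "added": datetime(2024, 1, 3), "terms": {"databases"}},
]
first = run_sdi(profile, docs, datetime(2024, 1, 5))      # ['a']
docs.append({"id": "c", "added": datetime(2024, 1, 6), "terms": {"visualization"}})
too_soon = run_sdi(profile, docs, datetime(2024, 1, 8))   # [] (only 3 days later)
later = run_sdi(profile, docs, datetime(2024, 1, 12))     # ['c']
print(first, too_soon, later)
```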
SDI
- Advantages of SDI
- Automatic retrieval of new information for the user
- Set up a profile once, use the profile for retrieval many times
- The user can change the profiles or the search frequency as needed
- Disadvantages of SDI
- The query based on the profile is static
- Timing problems
- Information in need is information indeed.
- Something I am very interested in may not arrive at the time I want to read it.
Using Profiles during the Search
- Modify the query
- When the user sends a query, the system automatically adds some terms from the user's profile to the query
- When the user sends a query, the system checks whether the query terms are in the user's profile; if they are, it increases the weights of those terms
- Organize the search results
- When the user sends a query, the system uses the profile information to organize the search results (such as clustering, ranking, ...)
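Both query-modification options above can be sketched with a simple weighted profile (a list of terms with weights, as described under User Profiles). The base weight of 1.0, adding the profile weight as the boost, and the reduced factor for appended terms are illustrative assumptions, not a prescribed scheme; `apply_profile` is a hypothetical helper.

```python
def apply_profile(query_terms, profile, added_factor=0.5):
    """Combine a user's query with a weighted profile (term -> weight).

    - A query term that also appears in the profile gets its profile weight
      added on top of a base weight of 1.0.
    - Profile terms missing from the query are appended at a reduced weight.
    """
    weighted = {term: 1.0 + profile.get(term, 0.0) for term in query_terms}
    for term, weight in profile.items():
        if term not in weighted:
            weighted[term] = added_factor * weight
    return weighted


# Hypothetical weighted profile and query.
profile = {"retrieval": 1.0, "visualization": 0.8}
expanded = apply_profile(["retrieval", "interface"], profile)
print(expanded)  # {'retrieval': 2.0, 'interface': 1.0, 'visualization': 0.4}
```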
Browsing
- Browsing is an act of human information seeking
- A mental process of identifying and choosing information
- A dynamic process that varies in time and depends on intermediate results
- Part of the process of decision making, problem solving, etc.
Browsing for Information Retrieval
- A kind of searching process in which the initial search criteria or goals are only partly defined
- general-purpose web browsing
- An art of not knowing what one wants until one finds it
- visual recognition
- content recognition
Browsing for Information Retrieval
- A learning activity that emphasizes structures and interactive processes
- exploratory
- movements based on feedback
- A process of finding and navigating in an unknown or unfamiliar information space
- becoming aware of new contents
- finding unexpected results
Search or Browse?
- Would you like to search using a search engine, or would you like to browse from page to page (or through a hierarchy)?
- What does it depend on?
Factors of Browsing
- Purposes
- Fact retrieval
- Concept formation or interpretation
- Current awareness
- Tasks
- Well-defined tasks
- Ill-defined tasks
- number of items to browse
Factors of Browsing
- Individual characteristics
- Motivation
- Experience and knowledge
- Cognitive styles
- Context
- Subject disciplines
- Organizational schemes
- Nature of text/information
- Medium
- Does the system support browsing?
IR Systems that Support Browsing
- Good navigation tools
- Easy to move from one item to another
- Links
- good structures
- fast access
- Easy to backtrack
- Correct any errors
- make new selections
IR Systems that Support Browsing
- Good displays
- easy to read
- meaningful orders of retrieval results
- graphical presentation
- Meaningful content organization
- contextual hierarchical structures
- Grouping of related items
- Contextual landmarks
Why Just Browse When You Can Fly?
- HotSauce is an innovative 3D fly-through
interface for navigating information spaces. It
was developed, largely as a one-man effort, by
Ramanathan V. Guha while at Apple Research in the
mid-1990s. HotSauce was a specific 3D
spatialization of the Meta Content Framework
(MCF) also developed by Guha.
HotSauce
Why Surf Alone?
- What if you had an assistant always looking ahead for you when browsing the web?
- The assistant could warn you if a page was irrelevant, or alert you if that link or some other link merited your attention.
- The assistant could save you time and frustration.
CACM, 44(8), p. 71, 2001
Information Agents
- Software that applies user profiles, dynamically and intelligently, to search tasks
- Search distributed, possibly heterogeneous information resources on the user's behalf
- Gather and integrate search results using artificial intelligence techniques
- Accept the user's feedback and use it to modify the user profiles and search strategies
Architecting Browsable Websites
- Design site structures
- Metaphor Exploration
- Organizational metaphors
- Functional metaphors
- Visual metaphors
- Define Navigation
- Global navigation
- Local navigation
- Design Document
Interactive Systems
- "When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure." (Ben Shneiderman)
Design Principles
- Offer informative feedback
- Relationships between the query and the documents retrieved
- Relationships among retrieved documents
- Relationships between metadata and documents
- Reduce working memory load
- Keep track of choices made during the search process
- Allow the user to return to temporarily abandoned strategies or jump from one strategy to another
- Retain information and context across search sessions
- Provide alternative interfaces for novice and expert users
- Simplicity vs. power
Output Presentation for Search Engines
- Two major issues
- What information to present?
- How to organize the output items?
- Information in the output display
- Traditional databases
- Document reference numbers (unique number)
- Citations (author, title, source)
- Document surrogate (citation plus abstract and/or indexing terms)
- Full text
- On the web
- title, url
- First few sentences/related sentences/summaries
- Dates / page sizes
- Degree of relevance
- special links
- find similar ones
- Types of links
- Related categories
- What other information might you wish to have in the retrieval output?
- Citations (or links from this document)?
- Critique or evaluation?
- Access information (how many times it was accessed in the last 6 months)?
- Links to this document?
- Author contact information?
- Why documents were retrieved?
Output Organization
- Linear
- a list of documents
- listed by
- best match
- alphabetical order
- dates
- order of selected fields (authors, titles, web
sites)
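The linear orderings listed above amount to sorting one result list by different keys. A minimal sketch, assuming each result carries a title, a match score, and a date; the result records, field names, and `order_results` helper are hypothetical. Ordering by another selected field, such as author or web site, would just add another entry to `ORDERINGS`.

```python
from datetime import date

# Hypothetical result list; "score" stands in for the system's match score.
results = [
    {"title": "Relevance feedback", "score": 0.62, "date": date(2004, 3, 1)},
    {"title": "Visual interfaces", "score": 0.91, "date": date(2001, 9, 1)},
    {"title": "Browsing strategies", "score": 0.47, "date": date(2003, 5, 1)},
]

# Each ordering is a sort key plus whether to sort in descending order.
ORDERINGS = {
    "best_match": (lambda r: r["score"], True),            # highest score first
    "alphabetical": (lambda r: r["title"].lower(), False),
    "date": (lambda r: r["date"], True),                   # newest first
}


def order_results(results, by="best_match"):
    """Return result titles in the requested linear order."""
    key, descending = ORDERINGS[by]
    return [r["title"] for r in sorted(results, key=key, reverse=descending)]


for by in ORDERINGS:
    print(by, order_results(results, by))
```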
- Linear display
- Practical and most popular
- easy to generate
- users know how to use it
- Does not show relationships among documents!
- Document relationships are more complex than a
linear one
- Hierarchical display
- Separate data into different levels or branches
- Branches can be expanded/collapsed.
- Show more data in less space
- Show the organization of the data
- Graphical displays
- Show more complex relationships
- Use location, colors, dimensions, etc. to represent documents, terms, or concepts
- Provide more interactive functions
What Is IV (Information Visualization)?
System-centered View
- The use of computer-supported, interactive, visual representations of abstract data
- to assist navigation in large information spaces
- to reveal complex information structures
- to amplify cognition
User-centered
IV and IR
- Both need to process a large amount of information
- Both are tools to assist the cognitive process of finding, learning, and understanding information
- Both face the challenge of uncertainty
- Not an exact science
- Both are subject to human interpretation
VIRI -- Visual Information Retrieval Interfaces
- 2-dimensional graphical display
- use graphical objects (icons, dots, etc.) to represent documents
- use spatial relationships to indicate document relationships
- use colors to group/differentiate documents
- use animation to assist interaction
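One simple way to let position express document relationships, assumed here for illustration rather than taken from any particular VIRI system, is to pin each keyword to a fixed point and place each document at the weighted average of the points of the keywords it contains. Documents with similar term weights then land near one another, and keywords and documents share the same 2-dimensional space. The anchor coordinates, weights, and `place` helper are hypothetical.

```python
# Hypothetical keyword anchors on a unit square (x, y).
ANCHORS = {
    "retrieval": (0.0, 0.0),
    "visualization": (1.0, 0.0),
    "users": (0.5, 1.0),
}


def place(doc_weights):
    """Position a document at the weight-averaged location of its keywords.

    Returns None when the document contains none of the anchor keywords.
    """
    total = sum(doc_weights.get(k, 0.0) for k in ANCHORS)
    if total == 0:
        return None
    x = sum(doc_weights.get(k, 0.0) * ANCHORS[k][0] for k in ANCHORS) / total
    y = sum(doc_weights.get(k, 0.0) * ANCHORS[k][1] for k in ANCHORS) / total
    return (x, y)


print(place({"retrieval": 1.0}))                        # (0.0, 0.0)
print(place({"retrieval": 1.0, "visualization": 1.0}))  # (0.5, 0.0)
print(place({"visualization": 1.0, "users": 1.0}))      # (0.75, 0.5)
```

This only computes positions; colors for grouping and animation for interaction would be layered on top by the display.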
Concept Visualization
- AltaVista LiveTopic
- HiBrowse Interface
- SemioMap
- Hyperbolic Trees
- Visual Thesaurus
- Visual Concept Explorer
AltaVista's LiveTopic
ConceptSpace
HiBrowse Interface
SemioMap
Inxight.com
Topic Maps
- Highwire http://www.highwire.org
Visual Thesaurus
Visual Concept Explorer
Concept Mapping
MedLine Search
IBM Visualization Space
- This information system understands the user.
- It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.
Visual Search Engines
- TheBrain
- Mooter
- Kartoo
- MapStan
- Grokker
- TouchGraph
- StarryNight
- NewsLink
WebBrain http://www.webbrain.com/
Mooter http://www.mooter.com/
Kartoo http://www.kartoo.com/
MapStan http://search.mapstan.com/
Grokker
- http://www.groxis.com/service/grok/g_products.html
TouchGraph
- http://www.touchgraph.com/
StarryNight from RHIZOME
Galaxy of News (Rennison 95)
Map of Information Scientists
Author Mapping
AuthorLink
NewsLink
- Integration and cross-mapping
- Mapping of topics, displayed by people
- Mapping of people, displayed by organization
- Etc.
- NewsLink http://project.cis.drexel.edu/lexislink/
Discussion
- Information Visualization
- What works and what does not?
VIRI
- Advantages
- More representational power
- show more information in a limited screen space
- many different ways to group documents
- can put both keywords and documents in the same 2-dimensional space
- Provide good overview
- Provide more interaction
VIRI
- Disadvantages
- Difficult to generate
- Not always easy to understand
- May not be specific enough
- Hard to use
Evaluation of IR Systems
- Using recall and precision
- Conduct query searches
- Try many different queries
- Results may depend on the sample of queries.
- Compare precision and recall results
- Recall and precision need to be considered together.
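For a single query, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. Because results depend on which queries are sampled, a sketch like the one below averages both measures over several test queries and reports them together. The document IDs and judgments are hypothetical, and at least one query is assumed.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given retrieved and relevant doc IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(runs):
    """Average precision and recall over (retrieved, relevant) pairs; needs >= 1 run."""
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical relevance judgments for two test queries.
runs = [
    (["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"]),  # P = 0.5, R = 2/3
    (["d5"], ["d5", "d6"]),                          # P = 1.0, R = 0.5
]
print(mean_precision_recall(runs))
```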
How to Calculate Recall
- Determine recall for the whole collection
- Take a random sample to estimate
- Use a broad query to select a sample collection for the estimation
- Use seed documents
- Use relative recall
- Use two or more expert searches as the base.
- Use one system as the base to estimate recall on other systems
- Use a small test collection
- Use experts to judge relevance of every document.
- Prepare special collections.
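Two of the estimation approaches above can be sketched briefly: relative recall divides each search's relevant hits by the pool of relevant documents found by all searches (experts or systems), and the random-sample approach scales the relevant fraction of a judged sample up to the collection size. Because the pool can only miss relevant documents, relative recall is never lower than true recall. All names and numbers here are hypothetical.

```python
def relative_recall(relevant_found):
    """Recall of each search relative to the pooled relevant documents.

    relevant_found maps a search name (expert or system) to the set of
    relevant documents that search retrieved.
    """
    pool = set().union(*relevant_found.values())
    if not pool:
        return {name: 0.0 for name in relevant_found}
    return {name: len(found) / len(pool) for name, found in relevant_found.items()}


def estimate_recall(relevant_retrieved, sample_judgments, collection_size):
    """Estimate recall by scaling the relevant fraction of a random sample.

    sample_judgments holds True/False relevance judgments for a random
    sample of the collection. Returns None if the sample has no relevant docs.
    """
    est_relevant = collection_size * sum(sample_judgments) / len(sample_judgments)
    if est_relevant == 0:
        return None
    return relevant_retrieved / est_relevant


# Hypothetical searches and their relevant hits.
searches = {"expert_A": {"d1", "d2", "d3"}, "expert_B": {"d3", "d4"}, "system_X": {"d1", "d4"}}
print(relative_recall(searches))  # {'expert_A': 0.75, 'expert_B': 0.5, 'system_X': 0.5}

# Hypothetical sample: 4 of 200 sampled documents judged relevant, 10,000 in total.
sample = [True] * 4 + [False] * 196
print(estimate_recall(50, sample, 10_000))  # 0.25
```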
Functionalities
- Precision and recall are particularly useful for evaluating searching/indexing algorithms and system features.
- Compare P & R with and without fuzzy search.
- Compare P & R with different types of indexing options
- Compare P & R across systems with the same features
- Precision and recall are query-oriented, not system-oriented.
Evaluation without P & R
- The emphasis should be on the user and the interaction.
- Be specific about data collection
- How are the data collected and indexed?
- Is there quality control for the data collection?
- Be creative with the test questions and methods
- Not just questionnaires
- Be selective about subject groups
Quality Evaluation
- Data quality
- Coverage of database
- It will not be found if it is not in the database.
- Completeness and accuracy of data
- Indexing methods and indexing quality
- It will not be found if it is not indexed.
- indexing types
- currency of indexing (Is it updated often?)
- indexing sizes
Interface Considerations
- User-friendly interface
- How long does it take for a user to learn advanced features?
- How well can the user explore or interact with the query output?
- How easy is it to customize output displays?
User Satisfaction
- User satisfaction
- The final test is the user!
- User satisfaction is more important than precision and recall
- Measuring user satisfaction
- Survey
- Usage statistics
- User experiments
User Experiments
- Observe and collect data on
- System behaviors
- User search behaviors
- User-system interaction
- Interpret experiment results
- for system comparisons
- for understanding users' information-seeking behaviors
- for developing new retrieval systems/interfaces