INFO624 Week 9 Effective Information Retrieval - PowerPoint PPT Presentation

Slides: 80
Provided by: xlin2

Transcript and Presenter's Notes


1
INFO624 -- Week 9: Effective Information Retrieval
  • Dr. Xia Lin
  • Assistant Professor
  • College of Information Science and Technology
  • Drexel University

2
Effective Information Retrieval
  • Systems perspectives
  • Fast indexing and retrieval algorithms
  • Inverted indexing, tree structures, hash tables
  • Semantic indexing and mapping
  • Subject indexing
  • Latent semantic indexing
  • Intelligent information retrieval
  • Knowledge representation
  • Logical inferences
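
As an illustration of the inverted indexing item above, here is a minimal sketch of a hash-table-backed inverted index. The function names and the whitespace tokenizer are assumptions made for this example; they are not taken from the slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it.

    docs is a dict of doc_id -> text. Tokenizing by whitespace is a
    simplifying assumption; real systems also stem and remove stopwords.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, terms):
    """Return ids of documents that contain every query term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()
```

Lookup cost depends on the number of query terms and the size of their posting sets, not on the size of the whole collection, which is why this structure supports fast retrieval.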

3
Effective Information Retrieval
  • Users' perspectives
  • Iteration
  • Relevance feedback
  • Use of user profiles
  • Graphical display of search results
  • Browsing/interactive searching
  • We can't change the user. We should make the
    system adapt to the user's needs.

4
Iteration
  • Most search needs to be done iteratively
  • From the user's point of view
  • The first query often does not retrieve what the
    user wants
  • The user needs to see the output of previous
    queries to construct the next query
  • Users often need to reformulate their
    information needs after they read/browse search
    results.

5
Iteration: Users' strategies
  • Modify queries repeatedly based on some goals
  • Starting with high precision
  • Use a specific query first
  • Broaden queries to include more relevant
    documents
  • "pearl growing"
  • Starting with high recall
  • Use a very broad query
  • Improve precision gradually
  • "onion peeling"
  • Starting with known items
  • Find documents similar to the known items
  • Browsing/interactive searching

6
Iteration: System's strategies
  • If the system can learn from the user's
    activities, it is more likely to retrieve
    results that meet the user's needs.
  • Relevance feedback
  • User profiles
  • The system should provide better output
    representations to help the user
  • Browse
  • Conduct interactive searches.

7
Relevance Feedback
  • Feedback: the user provides information that the
    system can use to modify its next search or next
    display
  • Relevance feedback
  • Users let the system know
  • What documents are relevant to their information
    needs
  • What concepts or terms are related to their
    information needs
  • What weights they would like the system to put on
    each relevant document/term

8
Relevance Feedback: System's Strategy
  • The system should invite the user to select
    relevant documents/terms from the retrieved
    results before the second retrieval is conducted
  • The system should use information from the user's
    feedback to conduct the next search.

9
Design IR Systems with relevance feedback
  • Collect relevance feedback through
  • Binary vs. scales
  • Positive and negative feedback
  • Apply relevance feedback to
  • Query
  • Profile
  • Document
  • Retrieval algorithm
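
One common way to apply positive and negative feedback to the query is the Rocchio formula. The slides do not name a specific method, so the sketch below is only an illustration; the weights alpha, beta and gamma are conventional default values assumed here, not values from the lecture.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a re-weighted query vector as a dict of term -> weight.

    query and every document in relevant/nonrelevant are dicts of
    term -> weight. alpha, beta and gamma are illustrative defaults.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Move toward the average relevant document and away from the
    # average non-relevant document.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coeff * weight / len(docs)
    # Terms whose weight is not positive are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}
```

Binary feedback fits this sketch directly (a document is either in the relevant list or not); scaled feedback could be approximated by multiplying each document's term weights by the user's rating before calling the function.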

10
User Profiles
  • User profiles
  • Information about the user's information needs
    that an IR system can use to modify its search
    process.
  • Simple user profiles
  • A list of terms that the user selects to
    represent his/her information needs
  • A list of terms with weights

11
  • Extended user profiles
  • More complex term structures
  • Information use patterns
  • Levels of interest
  • Users' background information
  • Users' browsing behaviors
  • What pages the user has visited last week, last
    month,
  • From which page to which page

12
Use of User Profiles
  • Selective Dissemination of Information (SDI)
  • The system regularly runs the search to get any
    new information that matches the user's profiles.
  • The user can set up several profiles
  • Once they are set up, the queries are always the
    same.
  • The user can set the frequency of the update
    searches.
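
The SDI cycle described above can be sketched as a stored profile that is re-run against documents added since the last run. The document fields ("id", "added", "text") and the any-term matching rule are assumptions made for this illustration.

```python
from datetime import date


def sdi_run(profile_terms, documents, last_run):
    """Return ids of documents added after last_run that match the profile.

    profile_terms is the stored (static) list of terms. Each document is a
    dict with hypothetical keys "id", "added" (a date) and "text".
    A document matches if it contains at least one profile term.
    """
    terms = {term.lower() for term in profile_terms}
    matches = []
    for doc in documents:
        words = set(doc["text"].lower().split())
        if doc["added"] > last_run and terms & words:
            matches.append(doc["id"])
    return matches
```

A scheduler would call sdi_run at the frequency the user chose and then set last_run to the date of that run.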

13
SDI
  • Advantages of SDI
  • Automatic retrieval of new information for the
    user
  • Set up a profile once, use the profile for
    retrieval many times.
  • The user can change the profiles or the search
    frequency as needed.
  • Disadvantages of SDI
  • The query based on the profile is static
  • Timing problems
  • Information in need is information indeed.
  • Something I am very interested in may not arrive
    at the time I want to read it.

14
Use profiles during the search
  • Modify the query
  • When the user sends a query, the system
    automatically adds some terms to the query from
    the user's profile.
  • When the user sends a query, the system checks
    whether the query terms are in the user's profile.
    If they are, it increases the weights of those terms.
  • Organize the search results
  • When the user sends a query, the system uses the
    profile information to organize the search
    results (such as clustering or ranking)
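
The two query-modification steps above can be sketched as follows. The function name and the boost and expand_top parameters are hypothetical and chosen only for illustration.

```python
def apply_profile(query_terms, profile, boost=2.0, expand_top=2):
    """Build a weighted query from raw query terms and a user profile.

    profile maps term -> interest weight. boost and expand_top are
    illustrative parameters, not values from the slides.
    """
    weighted = {}
    for term in query_terms:
        # Query terms that also appear in the profile get a higher weight.
        weighted[term] = boost if term in profile else 1.0
    # Add the strongest profile terms that the user did not type.
    extras = sorted(
        (term for term in profile if term not in weighted),
        key=lambda term: profile[term],
        reverse=True,
    )[:expand_top]
    for term in extras:
        weighted[term] = profile[term]
    return weighted
```

The resulting term weights could then be passed to whatever ranking function the system uses, which also covers the "organize the search results" step when results are ranked by weighted match.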

15
Browsing
  • Browsing is an act of human information seeking
  • a mental process of identifying and choosing
    information
  • a dynamic process that varies in time and depends
    on intermediate results.
  • a part of the process of decision making, problem
    solving, etc.

16
Browsing for Information Retrieval
  • A kind of searching process in which the initial
    search criteria or goals are only partly defined
  • general-purpose web browsing
  • An art of not knowing what one wants until one
    finds it
  • visual recognition
  • content recognition

17
Browsing for Information Retrieval
  • A learning activity that emphasizes structures
    and interactive process
  • exploratory
  • movements based on feedback
  • A process of finding and navigating in an unknown
    or unfamiliar information space
  • becoming aware of new contents
  • finding unexpected results

18
Search or Browse?
  • Would you like to search using a search engine or
    would you like to browse from page to page (or
    through a hierarchy)?
  • It depends on what?

19
Factors of browsing
  • Purposes
  • Fact retrieval
  • Concept formation or interpretation
  • Current awareness
  • Tasks
  • Well-defined tasks
  • Ill-defined tasks
  • number of items to browse

20
Factors of browsing
  • Individual characteristics
  • Motivation
  • Experience and knowledge
  • Cognitive styles
  • Context
  • Subject disciplines
  • Organizational schemes
  • Nature of text/information
  • Medium
  • Does the system support browsing?

21
IR Systems that support browsing
  • Good navigation tools
  • Easy to move from one item to another
  • Links
  • good structures
  • fast access
  • Easy to back track
  • Correct any errors
  • make new selections

22
IR Systems that support browsing
  • Good displays
  • easy to read
  • meaningful orders of retrieval results
  • graphical presentation
  • Meaningful content organization
  • contextual hierarchical structures
  • Grouping of related items
  • Contextual landmarks

23
Why just browse when you can fly?
  • HotSauce is an innovative 3D fly-through
    interface for navigating information spaces. It
    was developed, largely as a one-man effort, by
    Ramanathan V. Guha while at Apple Research in the
    mid-1990s. HotSauce was a specific 3D
    spatialization of the Meta Content Framework
    (MCF) also developed by Guha.

24
HotSauce
25
Why Surf alone?
  • What if you had an assistant always looking ahead
    for you when browsing the web?
  • The assistant could warn you if the page was
    irrelevant, could alert you if that link or some
    other link merited your attention.
  • The assistant could save you time and
    frustration.

CACM, 44(8), p. 71, 2001
26
Information Agents
  • Software that applies user profiles,
    dynamically and intelligently, to search tasks
  • Search distributed, possibly heterogeneous
    information resources on the user's behalf.
  • Gather and integrate search results using
    Artificial Intelligence techniques
  • Accept the user's feedback and use it to
    modify the user profiles and search strategies

27
Architecting Browsable Websites
  • Design site structures
  • Metaphor Exploration
  • Organizational metaphors
  • Functional metaphors
  • Visual metaphors
  • Define Navigation
  • Global navigation
  • Local navigation
  • Design Document

28
Interactive Systems
  • When an interactive system is well-designed, the
    interface almost disappears, enabling users to
    concentrate on their work, exploration, or
    pleasure.
  • Ben Shneiderman

29
Design Principles
  • Offer informative feedback
  • Relationships between query and documents
    retrieved
  • Relationships among retrieved documents
  • Relationships between metadata and documents
  • Reducing working memory load
  • Keep track of choices made during the search
    process
  • Allow users to return to temporarily abandoned
    strategies or jump from one strategy to another
  • Retain information and context across search
    sessions.

30
  • Provide alternative interfaces for novice and
    expert users.
  • Simplicity vs. power

31
Output Presentation for Search engines
  • Two major issues
  • What information to present?
  • How to organize the output items?
  • Information in the output display
  • Traditional databases
  • Document reference numbers (unique number)
  • Citations (author, title, source)
  • Document surrogate (citation plus abstract and/or
    indexing terms)
  • Full text

32
  • On the web
  • Title, URL
  • First few sentences/related sentences/summaries
  • Dates / page sizes
  • Degree of relevance
  • special links
  • Find similar ones
  • Types of links
  • Related categories

33
  • What other information might you wish to have in
    the retrieval output?
  • Citations (or links from this document)?
  • Critique or evaluation?
  • Access information (how many times it was
    accessed in the last 6 months)?
  • Links to this document?
  • Author contact information?
  • Why documents were retrieved?

34
Output organization
  • Linear
  • a list of documents
  • listed by
  • best match
  • alphabetical order
  • date
  • order of selected fields (authors, titles, web
    sites)
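
The orderings listed above all amount to sorting one result list by a different field. A minimal sketch, assuming each result is a dict with hypothetical "title", "score" and "year" fields:

```python
def order_results(results, field, descending=False):
    """Return a new list of results sorted by the given field.

    results is a list of dicts. The field names used by callers
    ("score", "title", "year") are assumptions for this example.
    """
    return sorted(results, key=lambda r: r[field], reverse=descending)
```

Best match would be order_results(results, "score", descending=True), alphabetical order would be order_results(results, "title"), and newest first would be order_results(results, "year", descending=True).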

35
  • Linear display
  • Practical and most popular
  • easy to generate
  • users know how to use it
  • Does not show relationships among documents!
  • Document relationships are more complex than a
    linear one

36
  • Hierarchical display
  • Separate data into different levels or branches
  • Branches can be expanded/collapsed.
  • Show more data in less space
  • Show the organization of the data

37
  • Graphical displays
  • Show more complex relationships
  • Use location, colors, dimensions, etc. to
    represent documents, terms or concepts.
  • Provide more interactive functions

38
What is IV?
System-centered View
  • The use of computer-supported, interactive,
    visual representations of abstract data
  • to assist navigation in large information spaces
  • to reveal complex information structures
  • to amplify cognition

User-centered
39
IV and IR
  • Both need to process a large amount of
    information
  • Both are tools to assist the cognitive process of
    finding, learning, and understanding information.
  • Both face the challenge of uncertainty
  • Not an exact science
  • Both are subject to human interpretation.

40
VIRI -- Visual Information Retrieval Interfaces
  • 2-dimensional graphical display
  • use graphical objects (icons, dots etc.) to
    represent documents
  • Use geographical relationships to indicate
    document relationships
  • use colors to group/differentiate documents
  • use animation to assist interaction

41
Concept Visualization
  • AltaVista LiveTopics
  • HiBrowse Interface
  • SemioMap
  • Hyperbolic Trees
  • Visual Thesaurus
  • Visual Concept Explorer

42
AltaVista's LiveTopics
43
ConceptSpace
44
HiBrowse Interface
45
SemioMap
46
Inxight.com
47
Topic Maps
  • HighWire: http://www.highwire.org

48
Visual Thesaurus
49
Visual Concept Explorer
50
Concept Mapping
51
(No Transcript)
52
MedLine Search
53
IBM Visualization Space
  • This information system understands the user.
  • It "hears" users' voice commands and "sees" their
    gestures and body positions. Interactions are
    natural, more like human-to-human interactions.

54
Visual Search Engines
  • TheBrain
  • Mooter
  • Kartoo
  • MapStan
  • Grokker
  • TouchGraph
  • Starrynight
  • NewsLink

55
WebBrain http://www.webbrain.com/
56
Mooter http://www.mooter.com/
57
Kartoo http://www.kartoo.com/
58
MapStan http://search.mapstan.com/
59
Grokker
  • http://www.groxis.com/service/grok/g_products.html

60
TouchGraph
  • http://www.touchgraph.com/

61
(No Transcript)
62
Starrynight from RHIZOME
63
Galaxy of News (Rennison 95)
64
Galaxy of News (Rennison 95)
65
Map of Information Scientists
66
Author Mapping
67
AuthorLink
68
NewsLink
  • Integrate
  • And cross mapping
  • Mapping on topics, displayed by people
  • Mapping on people, displayed by organization
  • Etc.
  • NewsLink: http://project.cis.drexel.edu/lexislink/

69
Discussion
  • Information Visualization
  • What works and what does not?

70
VIRI
  • Advantages
  • More representational power
  • show more information in a limited screen space
  • many different ways to group documents
  • can put both keywords and documents in the same
    2-dimensional space
  • Provide good overview
  • Provide more interaction

71
VIRI
  • Disadvantages
  • Difficult to generate
  • Not always easy to understand
  • May not be specific enough
  • Hard to use

72
Evaluation of IR Systems
  • Using Recall & Precision
  • Conduct query searches
  • Try many different queries
  • Results may depend on sampling queries.
  • Compare results of Precision & Recall
  • Recall & Precision need to be considered together.
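
For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both for one query and averages them over many queries; the function names are chosen for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved and relevant are collections of document ids.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(runs):
    """Average precision and recall over (retrieved, relevant) pairs,
    one pair per query."""
    if not runs:
        return 0.0, 0.0
    scores = [precision_recall(ret, rel) for ret, rel in runs]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

Reporting the pair together matters because either number alone is easy to inflate: retrieving everything gives perfect recall with poor precision, and retrieving one sure hit gives perfect precision with poor recall.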

73
How to calculate Recall
  • Determine recall for the whole collection
  • Take a random sample to estimate
  • Use a broad query to select a sample collection
    for the estimation
  • Use seed documents
  • Use relative recall
  • Use two or more expert searches as the base.
  • Use one system as the base to estimate recall on
    other systems
  • Use a small test collection
  • Use experts to judge relevance of every document.
  • Prepare special collections.
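
Relative recall, as listed above, replaces the unknown set of all relevant documents with the pool of relevant documents found by several searches or systems. A minimal sketch, assuming relevance judgments are available for the retrieved documents; the names are hypothetical.

```python
def relative_recall(runs, judged_relevant):
    """Estimate recall for each run against a pooled base.

    runs maps a run name (an expert search or a system) to the list of
    document ids it retrieved. judged_relevant is the set of ids judged
    relevant. The base is every relevant document found by any run.
    """
    judged_relevant = set(judged_relevant)
    pool = set()
    for docs in runs.values():
        pool |= set(docs) & judged_relevant
    if not pool:
        return {name: 0.0 for name in runs}
    return {name: len(set(docs) & pool) / len(pool) for name, docs in runs.items()}
```

Because relevant documents that no run retrieved are missing from the pool, these values tend to overestimate true recall; they are best used for comparing runs with each other.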

74
Functionalities
  • Precision and Recall are particularly useful for
    evaluating searching/indexing algorithms, and
    system features.
  • Compare P & R with and without fuzzy search.
  • Compare P & R with different types of indexing
    options
  • Compare P & R across systems with the same
    features
  • Precision and Recall are query-oriented, not
    system-oriented.

75
Evaluation without P & R
  • The emphasis should be on the user and the
    interaction.
  • Be specific on data collection
  • How are data collected and indexed?
  • Is there a quality control for the data
    collection?
  • Be creative on the test questions and methods
  • Not just questionnaires
  • Be selective on subject groups

76
Quality Evaluation
  • Data quality
  • Coverage of database
  • It will not be found if it is not in the
    database.
  • Completeness and accuracy of data
  • Indexing methods and indexing quality
  • It will not be found if it is not indexed.
  • indexing types
  • currency of indexing (Is it updated often?)
  • indexing sizes

77
Interface Consideration
  • User friendly interface
  • How long does it take for a user to learn
    advanced features?
  • How well can the user explore or interact with
    the query output?
  • How easy is it to customize output displays?

78
User Satisfaction
  • User satisfaction
  • The final test is the user!
  • User satisfaction is more important than
    precision and recall
  • Measuring user satisfaction
  • Survey
  • Use statistics
  • User experiments

79
User Experiments
  • Observe and collect data on
  • System behaviors
  • User search behaviors
  • User-system interaction
  • Interpret experiment results
  • for system comparisons
  • for understanding users' information-seeking
    behaviors
  • for developing new retrieval systems/interfaces