Information Retrieval - PowerPoint PPT Presentation

Slides: 15
Provided by: bert193
Learn more at: https://s2.smu.edu

Transcript and Presenter's Notes

Title: Information Retrieval


1
Information Retrieval
  • CSE 8337
  • Spring 2003
  • Introduction/Overview
  • Material for these slides obtained from:
    • Modern Information Retrieval by Ricardo
      Baeza-Yates and Berthier Ribeiro-Neto,
      http://www.sims.berkeley.edu/hearst/irbook/
    • Data Mining: Introductory and Advanced Topics by
      Margaret H. Dunham,
      http://www.engr.smu.edu/mhd/book

2
Motivation
  • IR: representation, storage, organization of, and
    access to information items
  • Focus is on the user information need
  • User information need:
    • Find all docs containing information on college
      tennis teams which (1) are maintained by a USA
      university and (2) participate in the NCAA
      tournament.
  • Emphasis is on the retrieval of information (not
    data)

3
DB vs IR
  • Records (tuples) vs. documents
  • Well defined results vs. fuzzy results
  • DB grew out of files and traditional business
    systems
  • IR grew out of library science and need to
    categorize/group/access books/articles

4
DB vs IR (cont'd)
  • Data retrieval
    • which docs contain a set of keywords?
    • well-defined semantics
    • a single erroneous object implies failure!
  • Information retrieval
    • information about a subject or topic
    • semantics is frequently loose
    • small errors are tolerated
  • IR system
    • interpret contents of information items
    • generate a ranking which reflects relevance
    • notion of relevance is most important
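
The contrast above between data retrieval (exact keyword match) and information retrieval (a ranking that reflects relevance) can be sketched in a few lines of Python. This is only an illustration: the documents, the function names, and the scoring rule (count of distinct query terms present) are assumptions made for the example, not something taken from the slides.

```python
def data_retrieval(docs, keywords):
    """Exact match: return ids of docs that contain every keyword."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in docs.items()
            if wanted <= set(text.lower().split())]


def ranked_retrieval(docs, query):
    """Rank docs by a toy relevance score (assumed for illustration):
    the number of distinct query terms that appear in the doc."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical mini-collection.
docs = {
    "d1": "college tennis teams in the ncaa tournament",
    "d2": "tennis rackets for sale",
    "d3": "ncaa basketball tournament schedule",
}
exact = data_retrieval(docs, ["tennis", "ncaa"])
ranked = ranked_retrieval(docs, "college tennis ncaa")
print(exact)
print(ranked)
```

The exact-match function either returns a document or not, while the ranked version tolerates partial matches and orders them, which is the "small errors are tolerated" behavior described for IR.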

5
Motivation
  • IR in the last 20 years:
    • classification and categorization
    • systems and languages
    • user interfaces and visualization
  • Still, area was seen as of narrow interest
  • Advent of the Web changed this perception once
    and for all
    • universal repository of knowledge
    • free (low cost) universal access
    • no central editorial board
    • many problems though: IR seen as key to finding
      the solutions!

6
Basic Concepts
  • The User Task
    • Retrieval
      • information or data
      • purposeful
    • Browsing
      • glancing around
      • interest may drift, e.g. cars, Le Mans, France,
        tourism

7
Basic Concepts
  • Logical view of the documents
  • Document representation viewed as a continuum:
    the logical view of docs might shift

8
The Retrieval Process

9
Fuzzy Sets and Logic
  • Fuzzy set: the set membership function is a
    real-valued function with output in the range [0,1].
  • f(x): probability x is in F.
  • 1-f(x): probability x is not in F.
  • Example:
    • T = {x | x is a person and x is tall}
    • Let f(x) be the probability that x is tall
    • Here f is the membership function
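
A minimal Python sketch of a membership function for the "tall" example follows. The slides do not define f, so the piecewise-linear shape and the cutoffs `low` and `high` are hypothetical choices made only to show a function with output in [0,1].

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Hypothetical membership function f for the fuzzy set T of tall people.

    Returns 0.0 at or below `low`, 1.0 at or above `high`,
    and rises linearly in between (cutoffs are assumed, not from the slides).
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def not_tall_membership(height_cm):
    """Complement 1 - f(x): degree to which x is not in T."""
    return 1.0 - tall_membership(height_cm)


print(tall_membership(150.0))      # below the lower cutoff
print(tall_membership(175.0))      # halfway between the cutoffs
print(not_tall_membership(175.0))  # complement of the value above
```

A classical (non-fuzzy) set would instead return only 0 or 1 at a single height cutoff.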

10
Fuzzy Sets

11
IR is Fuzzy
[Figure: two panels, Simple and Fuzzy, each with Reject and Accept regions]

12
Information Retrieval
  • Information Retrieval (IR): retrieving desired
    information from textual data.
    • Library Science
    • Digital Libraries
    • Web Search Engines
  • Traditionally keyword based
  • Sample query:
    • Find all documents about data mining.

13
Information Retrieval
  • Similarity: measure of how close a query is to a
    document.
  • Documents which are close enough are retrieved.
  • Metrics:
    • Precision = |Relevant and Retrieved| / |Retrieved|
    • Recall = |Relevant and Retrieved| / |Relevant|
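
The similarity threshold and the two metrics above can be sketched as follows. The slides do not name a similarity measure, so Jaccard overlap of word sets is used here purely as an assumption; the documents, the threshold value, and the relevance judgment are likewise hypothetical.

```python
def jaccard(query, text):
    """Assumed similarity measure: word-set overlap divided by word-set union."""
    a, b = set(query.lower().split()), set(text.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(docs, query, threshold):
    """Retrieve the docs whose similarity to the query is 'close enough'."""
    return {doc_id for doc_id, text in docs.items()
            if jaccard(query, text) >= threshold}


def precision_recall(relevant, retrieved):
    """Precision = |relevant & retrieved| / |retrieved|,
    Recall    = |relevant & retrieved| / |relevant|."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection and relevance judgment for the sample query.
docs = {
    "d1": "data mining methods",
    "d2": "mining for gold",
    "d3": "cooking recipes",
}
relevant = {"d1"}
retrieved = retrieve(docs, "data mining", threshold=0.2)
precision, recall = precision_recall(relevant, retrieved)
print(sorted(retrieved), precision, recall)
```

Raising the threshold in this toy setup drops d2 and improves precision; with a larger collection a higher threshold would typically cost recall, which is the usual trade-off between the two measures.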

14
IR Query Result Measures