Information Retrieval Systems Insys300 - PowerPoint PPT Presentation

Title:

Information Retrieval Systems Insys300

Description:

IR is a branch of applied computer science focusing on the acquisition, ... Where is bin Laden now? Challenges: Abstraction Principles. First Abstraction Principle ... – PowerPoint PPT presentation

Slides: 21
Provided by: xia52

Transcript and Presenter's Notes


1
Information Retrieval Systems Insys300
2
Information Retrieval
  • IR is a branch of applied computer science
    focusing on the acquisition, organization,
    storage, retrieval, and distribution of
    information.
  • IR involves helping users find information that
    matches their information needs.
  • IR has become a center of focus in the web era.
    Its theories, techniques, and applications have
    reached many fields where processing large
    amounts of information is essential.

3
Challenges of IR
4
Examples
  • Who taught this course three years ago?
  • Challenges
  • How do you translate the question to a query?
  • What info. needs to be stored in the system in
    order for a search engine to find the answer?
  • Who took this course three years ago?
  • Challenges

5
Examples
  • Which IST courses are most useful?
  • Challenges
  • Where is bin Laden now?
  • Challenges

6
Abstraction Principles
  • First Abstraction Principle
  • Abstract data from the real world
  • And make them available to the system.
  • Second Abstraction Principle
  • Abstract the user's information needs into a form
    the system understands.

7
Components of IR Systems
  • Human Components
  • Users -- who create the needs of the system (the
    user)
  • Organization -- who makes it possible to have the
    system (the funder)
  • Information professionals -- who operate the
    system and provide the services (the server)
  • System and Content Components
  • Data -- the content of the system
  • Devices and media -- hardware of the system
  • Algorithms and procedures -- software of the system

8
Users
  • The user
  • anyone who needs to find some information
  • The user groups
  • grouped by their knowledge of the system
  • novice users vs. experienced users
  • end users vs. information specialists
  • grouped by their domain knowledge
  • domain experts vs. general public
  • grouped by information needs
  • need to locate a particular item
  • need some information
  • need all information on a subject

9
Users' Information Needs
  • People depend on information to carry out their
    activities of daily life.
  • need to accomplish some goals
  • need to solve some problems
  • People realize a lack of information
  • perceive a gap in their knowledge state
  • ASK -- Anomalous State of Knowledge
  • desire to fill the gap

10
[Figure: diagram with the labels Reality, Goals, Problems, Info. Needs, Request, Queries, Data, and Info. Systems, annotated with the First Abstraction Principle and the Second Abstraction Principle]
11
Data and Information
  • Data
  • String of symbols associated with objects,
    people, and events
  • Values of an attribute
  • Data need not have meaning to everyone
  • Data must be interpreted with associated
    attributes.
  • Information
  • The meaning of the data interpreted by a person
    or a system
  • Data that changes the state of a person or system
    that perceives it.
  • Data that reduces uncertainty.
  • If data contain no uncertainty, there is no
    information in the data.
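
The idea that information is data that reduces uncertainty can be made concrete with Shannon entropy. This is one common formalization, not something the slides name. A minimal Python sketch, assuming a finite set of outcomes with known probabilities (the function name `entropy_bits` is hypothetical):

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing, so they are skipped.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# A fair coin: the outcome is uncertain, so observing it is informative.
fair_coin = entropy_bits([0.5, 0.5])

# A certain outcome: there is no uncertainty left to reduce.
certain = entropy_bits([1.0])

print(fair_coin, certain)
```

Under this measure the fair coin carries one bit and the certain outcome carries none, which matches the last bullet above.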

12
Information and Knowledge
  • Knowledge
  • Structured information
  • through structuring, information becomes
    understandable
  • Processed Information
  • through processing, information becomes
    meaningful and useful
  • information shared and agreed upon within a
    community

[Figure: diagram with the labels Data, information, knowledge, and wisdom]
13
Text and Information
  • Representation of natural language
  • both readers' and authors' language
  • Strings of ASCII symbols or Unicode
  • structured by the author
  • indexed by the publisher
  • Data or information?
  • If it can be understood, it's information.
  • By whom? A person or a system?

14
Textual Data
  • Repository of human intellect
  • Rich and diverse resources for all answers.
  • If it is written, it is there (in text)
  • Meaningful and understandable (to users).
  • Simple ASCII representation
  • Free of pre-formatted structures
  • continuous
  • separated into documents
  • Easy to process by the computer
  • Machine Intensive (not labor intensive)

15
Problems with Text
  • Massive
  • Any IR system needs the capability of large scale
    data processing.
  • Use of indexes and various representations is
    required.
  • Inconsistent
  • It's a human language
  • Same information expressed in different ways
  • Different information expressed in similar ways.
  • Incomplete
  • It uses common knowledge.
  • It's an open system.
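
The inconsistency problem can be illustrated with a small Python sketch. The three documents, the query, and the `normalize` helper below are made up for illustration. The sketch assumes a very simple normalization (lowercasing and splitting into alphanumeric tokens), which is only one of many possible representations:

```python
import re

def normalize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

docs = [
    "The instructor taught Information Retrieval.",
    "INFORMATION RETRIEVAL was taught by the instructor.",
    "The professor lectured on IR.",
]

query = "information retrieval"

# Literal, case-sensitive substring matching finds none of the documents.
literal_hits = [d for d in docs if query in d]

# Matching on normalized tokens finds the first two documents, but still
# misses the third, which expresses similar content in different words.
query_terms = set(normalize(query))
normalized_hits = [d for d in docs if query_terms <= set(normalize(d))]

print(len(literal_hits), len(normalized_hits))
```

Normalization handles surface variation such as capitalization. The third document shows why it is not enough when the same information is expressed with different vocabulary.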

16
Retrieval
  • Retrieval
  • What do we retrieve?
  • Data
  • Information
  • Knowledge
  • We can't retrieve information!
  • We can only retrieve documents that contain text
    which carries information.
  • Information can be anywhere
  • in the text, in the links, in the process of text.

17
Information Retrieval
  • Are they the same?
  • Text retrieval
  • Document retrieval
  • Information retrieval

18
Information Retrieval
  • Conceptually, information retrieval is used to
    cover all related problems in finding needed
    information
  • Historically, information retrieval is about
    document retrieval, emphasizing document as the
    basic unit
  • Technically, information retrieval refers to
    (text) string manipulation, indexing, matching,
    querying, etc.
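
The technical view above (string manipulation, indexing, matching, querying) can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization and a Boolean AND query over an inverted index. The function names and the three sample documents are hypothetical and not taken from the slides:

```python
from collections import defaultdict

def tokenize(text):
    """String manipulation: lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(".,;:!?()") for w in text.lower().split())
    return [w for w in words if w]

def build_index(documents):
    """Indexing: map each term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Querying and matching: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

documents = {
    1: "Text retrieval and document retrieval.",
    2: "Document indexing for large collections.",
    3: "Matching queries against an index.",
}
index = build_index(documents)
result = search(index, "document retrieval")
print(result)
```

The index is built once and reused for every query, so a lookup does not have to rescan the text of every document.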

19
(No Transcript)
20
Information Retrieval Systems
  • The goal of IR systems is to help users find
    information that satisfies their information
    needs.
  • The process of IR systems is to match two
    abstractions
  • data abstracted in the system
  • queries abstracted from users' information needs
  • Information retrieval is much more difficult than
    data retrieval
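
Matching the two abstractions can be sketched as follows. This is only one possible approach, assuming both documents and information needs are abstracted into sets of terms and compared by Jaccard overlap. The stop-word list, the function names, and the sample documents are all illustrative assumptions:

```python
# Illustrative stop-word list; real systems use larger, tuned lists.
STOP_WORDS = {"who", "this", "the", "a", "an", "of", "in", "is", "what"}

def abstract(text):
    """Reduce text to a set of lowercase content terms."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return {w for w in words if w and w not in STOP_WORDS}

def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (equal)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank(documents, need):
    """Order document names by overlap with the need, dropping zero scores."""
    query = abstract(need)
    scored = [(jaccard(abstract(text), query), name)
              for name, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]

documents = {
    "schedule": "Instructor who taught the course last year.",
    "roster": "Students enrolled in the course.",
    "menu": "Lunch menu of the cafeteria.",
}
ranking = rank(documents, "Who taught this course?")
print(ranking)
```

Both sides lose detail when reduced to term sets, which is one reason information retrieval is harder than data retrieval: the match is between two approximations, not between the user's actual need and the real world.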