Modern Information Retrieval - PowerPoint PPT Presentation


About This Presentation

Title:

Modern Information Retrieval

Description:

... file, the vocabulary is stored in lexicographical order with a pointer for each ... An array containing all the pointers to the suffixes in lexicographical order ... – PowerPoint PPT presentation

Slides: 23

Provided by: Ken124

Tags: information | lexicographical | modern | retrieval

Transcript and Presenter's Notes

Title: Modern Information Retrieval

1
Modern Information Retrieval

Chapter 8 Indexing and Searching

2

It is worthwhile building and maintaining an
index when the text collection is large and
semi-static
semi-static: not often updated
consider search cost, space overhead,
construction cost, and maintenance cost

3

Inverted file
a word-oriented index
vocabulary: the set of all different words in the
text
occurrences: lists of the text positions where
the words appear
the positions can refer to words or characters
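
As an illustration of the vocabulary and occurrence lists described above, here is a minimal sketch in Python. The function name build_inverted_index, the by_character flag, and the regular-expression tokenizer are assumptions made for this example and are not taken from the slides.

```python
import re
from collections import defaultdict


def build_inverted_index(text, by_character=False):
    """Map each distinct word to the list of positions where it occurs.

    Positions are word numbers by default, or character offsets
    when by_character is True.
    """
    index = defaultdict(list)
    for word_number, match in enumerate(re.finditer(r"\w+", text.lower())):
        position = match.start() if by_character else word_number
        index[match.group()].append(position)
    return dict(index)


index = build_inverted_index("to be or not to be")
vocabulary = sorted(index)  # the set of all different words, in order
```

Here index["to"] holds the word positions of "to", and vocabulary lists each distinct word once.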

4

5

the space required for the vocabulary is rather
small while the occurrences demand much more
space
between 30% and 40% of the text size
block addressing reduces space overhead to 5%

6

if the exact occurrence positions are required,
an online search over the qualifying blocks has
to be performed
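
The trade-off of block addressing can be sketched as follows: the index stores only block numbers, and exact positions are recovered by scanning the qualifying blocks. This is a simplified sketch that assumes blocks of a fixed number of words; the names build_block_index and exact_positions and the sample text are hypothetical.

```python
def build_block_index(words, words_per_block):
    """Map each word to the sorted list of block numbers that contain it."""
    index = {}
    for position, word in enumerate(words):
        block = position // words_per_block
        blocks = index.setdefault(word, [])
        if not blocks or blocks[-1] != block:
            blocks.append(block)  # store each block number only once
    return index


def exact_positions(words, index, word, words_per_block):
    """Scan only the qualifying blocks to recover exact word positions."""
    positions = []
    for block in index.get(word, []):
        start = block * words_per_block
        for offset, candidate in enumerate(words[start:start + words_per_block]):
            if candidate == word:
                positions.append(start + offset)
    return positions


words = "this is a text a text has many words words are made from letters".split()
block_index = build_block_index(words, 4)
```

A word that occurs several times in one block costs a single entry, which is where the space saving comes from; the price is the scan inside exact_positions.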

7

searching the inverted file
vocabulary search: the words present in the query
are separately searched in the vocabulary
retrieval of occurrences: the lists of the
occurrences of all the words found are retrieved

8

manipulation of occurrences: the lists are
traversed in synchronization to find places where
all the words appear in sequence (for a phrase
query) or appear close enough (for a proximity
query)
how to efficiently manipulate the occurrences
when block addressing is used?
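
The synchronized traversal can be sketched as below, assuming full word-position lists (not block addressing) stored in increasing order. The function names and the small hand-written index are assumptions for illustration only.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists, advancing both in step."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def phrase_positions(index, phrase_words):
    """Return start positions where the words appear consecutively."""
    result = None
    for offset, word in enumerate(phrase_words):
        # Shift each list so that a match lines up on the phrase start.
        shifted = [p - offset for p in index.get(word, [])]
        result = shifted if result is None else intersect_sorted(result, shifted)
        if not result:
            return []
    return result or []


def proximity_pairs(index, first, second, max_gap):
    """Return pairs (p, q) of positions at most max_gap words apart."""
    a, b = index.get(first, []), index.get(second, [])
    pairs, start = [], 0
    for p in a:
        while start < len(b) and b[start] < p - max_gap:
            start += 1  # these positions are too far left for any later p
        j = start
        while j < len(b) and b[j] <= p + max_gap:
            pairs.append((p, b[j]))
            j += 1
    return pairs


index = {"to": [0, 4], "be": [1, 5], "or": [2], "not": [3]}
```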

9

constructing the inverted file

10

once constructed, it is written to disk in two
files
the lists of occurrences are stored contiguously
in the first file
in the second file, the vocabulary is stored in
lexicographical order with a pointer for each
word to its list in the first file
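
A minimal sketch of this two-part layout follows. It keeps both parts in memory instead of writing real files, and it assumes positions fit in 32-bit unsigned integers; the record format and the names write_index and lookup are choices made for this example, not the format used in the slides.

```python
import bisect
import io
import struct


def write_index(index):
    """Serialize the occurrence lists contiguously.

    Returns (postings_bytes, vocabulary), where vocabulary is a
    lexicographically sorted list of (word, byte_offset, count).
    """
    postings = io.BytesIO()
    vocabulary = []
    for word in sorted(index):
        positions = index[word]
        vocabulary.append((word, postings.tell(), len(positions)))
        postings.write(struct.pack(f"<{len(positions)}I", *positions))
    return postings.getvalue(), vocabulary


def lookup(postings_bytes, vocabulary, word):
    """Binary-search the vocabulary, then follow the pointer to the list."""
    words = [entry[0] for entry in vocabulary]
    i = bisect.bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return []
    _, offset, count = vocabulary[i]
    return list(struct.unpack_from(f"<{count}I", postings_bytes, offset))


postings_bytes, vocabulary = write_index(
    {"to": [0, 4], "be": [1, 5], "or": [2], "not": [3]}
)
```

Because the vocabulary is sorted, a word is found by binary search, and its pointer gives direct access to its list without reading the other lists.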

11

Suffix tree and suffix array
can be used to index any text character
allow more complex queries to be answered efficiently
index points are selected from the text, which
point to the beginning of the text positions
which will be retrievable
each position is considered as a text suffix
each suffix is uniquely identified by its position

12

13

a suffix tree is a trie data structure built over
all the suffixes of the text
the pointers to the suffixes are stored at the
leaf nodes
the trie is compacted into a Patricia tree where
unary paths are compressed
an indication of the next character position to
consider is stored at the nodes which root a
compressed path
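
The idea of a trie over all suffixes, with pointers at the leaves, can be sketched as below. This sketch is deliberately not compacted into a Patricia tree, so it uses far more space than a real suffix tree. It also assumes that the terminator "$" does not occur in the text; the nested-dict representation and the function names are assumptions for illustration.

```python
def build_suffix_trie(text, index_points=None):
    """Build an uncompacted trie over the suffixes starting at index_points."""
    points = range(len(text)) if index_points is None else index_points
    root = {}
    for start in points:
        node = root
        # The terminator guarantees that every suffix ends at its own leaf.
        for ch in text[start:] + "$":
            node = node.setdefault(ch, {})
        node["pos"] = start  # pointer to the suffix, stored at the leaf
    return root


def find(root, pattern):
    """Return the sorted start positions of all suffixes that begin with pattern."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    found, stack = [], [node]
    while stack:  # collect every leaf pointer below the reached node
        current = stack.pop()
        for key, child in current.items():
            if key == "pos":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_suffix_trie("banana")
```

Every occurrence of a pattern is a prefix of some suffix, so a single walk from the root finds all of them.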

14

space overhead: 120% to 240% over the text size

15

suffix arrays provide the same functionality with
much lower space requirements
an array containing all the pointers to the
suffixes in lexicographical order
space requirements: close to 40% overhead

16

allows binary searches, done by comparing the
contents of each pointer
a supra-index over the suffix array is used to
reduce the number of disk accesses
compare with an inverted file
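
A suffix array and its binary search can be sketched as follows. The construction here simply sorts the suffixes, which is easy to read but slow for large texts; it is a sketch under that assumption, not an efficient construction algorithm, and the supra-index is left out. The function names are hypothetical.

```python
def build_suffix_array(text, index_points=None):
    """Return the suffix start positions sorted by the suffixes they point to."""
    points = range(len(text)) if index_points is None else index_points
    return sorted(points, key=lambda p: text[p:])


def search(text, suffix_array, pattern):
    """Find all occurrences of pattern using two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


text = "banana"
suffix_array = build_suffix_array(text)
```

Each comparison reads the text that a pointer refers to, which is why a search over a disk-resident array benefits from a supra-index.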

17

processing phrase queries by searching the first
words of the phrases
processing proximity queries by searching all the
words in the queries
post-processing needed

18

Signature files
use a hash function to map words to bit masks of
B bits
the text is divided into blocks of b words each
a bit mask of size B is assigned to each block by
bitwise ORing the signatures of all the words in
the block

19

if a word is present in a block, all the bits set
in its signature are also set in the bit mask of
the block
when a bit is set in the mask of the query word
but not in the mask of the block, the word is not
present in the block
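
The construction and the block test can be sketched as below. The hash scheme (SHA-256 with a counter prefix, three bits per word, B = 64) is an assumption chosen for reproducibility; the slides do not specify a hash function, and all names are hypothetical.

```python
import hashlib


def word_signature(word, num_bits=64, bits_per_word=3):
    """Hash a word to a mask of num_bits bits with up to bits_per_word bits set."""
    mask = 0
    for i in range(bits_per_word):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % num_bits)
    return mask


def block_masks(words, words_per_block, num_bits=64):
    """OR together the signatures of the words in each block."""
    masks = []
    for start in range(0, len(words), words_per_block):
        mask = 0
        for word in words[start:start + words_per_block]:
            mask |= word_signature(word, num_bits)
        masks.append(mask)
    return masks


def candidate_blocks(masks, query_mask):
    """Keep the blocks whose mask has every bit of query_mask set."""
    return [i for i, mask in enumerate(masks) if mask & query_mask == query_mask]


words = "this is a text a text has many words words are made from letters".split()
masks = block_masks(words, 4)
```

A block that really contains the word always passes the test, so the filter never misses an occurrence; it can only let through blocks that do not contain the word.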

20

21

false drop: all the corresponding bits are set
while the word is not in the block
signature file design principle: make the
probability of a false drop low while keeping the
signature file as short as possible
searching a single word: hash it to a bit
mask W, check for each block whether all the bits
set in W are also set in the block's bit mask
(W AND mask = W), and verify if the word
is actually there

22

a phrase search is processed by bitwise ORing the
signatures of all the words in the query
the probability of false drops is reduced
care has to be exercised at block boundaries by
overlapping words in consecutive blocks
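
Phrase search with overlapping blocks can be sketched as follows. The sketch builds the block masks at query time with an overlap of (phrase length - 1) words, which keeps the example short; a real signature file would fix the overlap when the index is built. The hash scheme and all names are assumptions for illustration.

```python
import hashlib


def signature(word, num_bits=64, bits_per_word=3):
    """Hash a word to a bit mask (assumed scheme, not from the slides)."""
    mask = 0
    for i in range(bits_per_word):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % num_bits)
    return mask


def overlapping_blocks(words, words_per_block, overlap):
    """Split words into blocks that also include the first words of the next block."""
    return [words[start:start + words_per_block + overlap]
            for start in range(0, len(words), words_per_block)]


def phrase_search(words, phrase, words_per_block=4):
    """Return the word positions where the phrase starts."""
    if not phrase:
        return []
    blocks = overlapping_blocks(words, words_per_block, len(phrase) - 1)
    query = 0
    for word in phrase:
        query |= signature(word)  # OR the signatures of all query words
    hits = []
    for number, block in enumerate(blocks):
        mask = 0
        for word in block:
            mask |= signature(word)
        if mask & query != query:
            continue  # a query bit is missing: the phrase cannot be here
        # Verification step: scan the block to rule out false drops.
        for i in range(len(block) - len(phrase) + 1):
            if block[i:i + len(phrase)] == phrase:
                hits.append(number * words_per_block + i)
    return hits


words = "this is a text a text has many words words are made from letters".split()
```

With blocks of four words, the phrase "many words" straddles the boundary between the second and third blocks and is found only because of the overlap.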
