DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT

1
DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED
ENVIRONMENT
  • MAYURI UMRANIKAR

2
CONTENTS
  • Introduction
  • Retrieval Environment
  • - The Vector Space Model
  • - INEX Environment
  • - Flexible Retrieval System
  • Method Used for Retrieval
  • - Document Tree Construction
  • - Ranking of Elements
  • - Output
  • Experiments
  • Conclusions

3
INTRODUCTION
  • Extensible Markup Language (XML) is the preferred
    format for representing documents; as the number of
    XML documents grows, the issue of element retrieval
    arises
  • Focus is on retrieving relevant elements rather
    than entire documents
  • INEX: INitiative for the Evaluation of XML Retrieval
  • Flexible Mechanisms
  • Different Approaches
  • Term Weighting

4
RETRIEVAL ENVIRONMENT
  • Two factors: the issues that arise when the focus
    moves from documents to components, and Salton's
    Vector Space Model
  • Vector Space Model: a term's weight is based on the
    number of times the term occurs in the document
  • Fox's Extended Vector Space Model: incorporates
    objective identifiers
  • A document vector consists of subvectors
  • Subvectors contain text and are independently
    indexed, weighted, searched and retrieved
  • Term weighting: applied within the subjective
    vectors
  • Smart: an experimental retrieval system
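
The extended vector idea above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts as weights; the class name ExtendedVector, the field names ("title", "body") and the identifier "year" are hypothetical and are not taken from Smart.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExtendedVector:
    # Subjective subvectors: subvector name -> term-frequency counts.
    subjective: dict = field(default_factory=dict)
    # Objective identifiers: identifier name -> value (not weighted).
    objective: dict = field(default_factory=dict)


def term_freq_vector(text):
    """Count how many times each lowercased token occurs in the text."""
    return Counter(text.lower().split())


def make_extended_vector(fields, identifiers):
    """Build one extended vector from text fields and objective identifiers."""
    return ExtendedVector(
        subjective={name: term_freq_vector(text) for name, text in fields.items()},
        objective=dict(identifiers),
    )


# Hypothetical document with two subjective subvectors and one identifier.
doc = make_extended_vector(
    {"title": "XML retrieval", "body": "element retrieval ranks XML elements"},
    {"year": "2005"},
)
```

Each subjective subvector can then be weighted and searched on its own, while the objective identifiers stay available for exact-match tests.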

5
INEX ENVIRONMENT
  • Content Only (CO): queries ignore document structure;
    like typical queries, they specify only the content
    of the search
  • Content and Structure (CAS): queries explicitly refer
    to structure; exhaustive and specific
  • CO query results go directly to the user; CAS queries
    need additional filtering and a search of the body
    portion
  • CAS returns a rank-ordered list of elements
  • INEX-EVAL uses measures of recall and precision
  • (fig.: exhaustivity and specificity mapped to a
    single relevance value)
  • Results are ranked
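
As a rough illustration of mapping exhaustivity and specificity to a single relevance value, the sketch below assumes a 0 to 3 scale for both dimensions and a strict mapping in which only the pair (3, 3) counts as relevant. The slides do not say which mapping was used, and the function names are hypothetical. Precision and recall are computed at a simple rank cutoff, which is a simplification of what INEX-EVAL reports.

```python
def strict_relevance(exhaustivity, specificity):
    """Assumed strict mapping: relevant only if both dimensions are maximal."""
    return 1 if (exhaustivity, specificity) == (3, 3) else 0


def precision_recall(ranked_ids, assessments, cutoff):
    """Precision and recall over the first `cutoff` ranked element ids.

    assessments maps element id -> (exhaustivity, specificity).
    """
    relevant = {eid for eid, (e, s) in assessments.items() if strict_relevance(e, s)}
    retrieved = ranked_ids[:cutoff]
    hits = sum(1 for eid in retrieved if eid in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical assessments for four elements.
assessments = {"a": (3, 3), "b": (2, 1), "c": (3, 3), "d": (0, 0)}
precision, recall = precision_recall(["a", "b", "c", "d"], assessments, cutoff=2)
```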

6
FLEXIBLE RETRIEVAL SYSTEM
  • Smart format: documents and topics are translated
    and indexed as extended vectors
  • Subjective vectors contain content-bearing
    terms
  • Objective vectors serve as filters on the results
    returned by CAS queries
  • Extended vector: subjective vector, terms having
    a paragraph in body subvector
  • Lnu-ltu weighting
  • Dynamic flexible retrieval: tree representation,
    rank-ordered list by Lnu weights
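
A minimal sketch of how objective identifiers could act as a filter on a ranked CAS result list. The function name, the exact-match test and the identifier "tag" are assumptions made for illustration; the slides do not describe the filter in this detail.

```python
def filter_by_objective(ranked, objective_of, constraints):
    """Keep ranked (element_id, score) pairs whose objective identifiers
    match every constraint; the original rank order is preserved."""
    return [
        (eid, score)
        for eid, score in ranked
        if all(objective_of[eid].get(key) == value for key, value in constraints.items())
    ]


# Hypothetical ranked list and objective identifiers.
ranked = [("e1", 0.9), ("e2", 0.7), ("e3", 0.4)]
objective_of = {"e1": {"tag": "sec"}, "e2": {"tag": "p"}, "e3": {"tag": "sec"}}
sections_only = filter_by_objective(ranked, objective_of, {"tag": "sec"})
```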

7
METHOD FOR FLEXIBLE RETRIEVAL
  • Input: given a query Q and the paragraph index,
    retrieve a rank-ordered list of terminal nodes
  • The n top-ranked paragraphs are selected as input
  • The set of paragraphs is used to identify documents;
    elements are generated and returned as output
  • Document tree: needs information about the structure
  • Terminal nodes
  • Pre-order traversal
  • Terminal nodes are found in the paragraph index
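
The pre-order traversal that locates terminal nodes can be sketched with Python's standard xml.etree.ElementTree. The sample document, its tag names (article, sec, p) and the positional path format are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def preorder(element, path):
    """Yield (path, element) pairs in pre-order: the node first, then its children."""
    yield path, element
    seen = {}
    for child in element:
        # Position of this child among siblings with the same tag (1-based).
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from preorder(child, f"{path}/{child.tag}[{seen[child.tag]}]")


def terminal_nodes(root):
    """Return the paths of all elements that have no child elements."""
    return [path for path, el in preorder(root, f"/{root.tag}[1]") if len(el) == 0]


# Hypothetical document: two sections holding three paragraphs in total.
xml_text = "<article><sec><p>one</p><p>two</p></sec><sec><p>three</p></sec></article>"
leaves = terminal_nodes(ET.fromstring(xml_text))
```

The resulting paths are the keys one would look up in a paragraph index.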

8
SIMPLE XML DOCUMENT AND ITS SCHEMA
9
CONSTRUCTION OF DOCUMENT TREE
  • For query Q, the n top-ranked paragraphs are used to
    build the trees
  • Leaf elements (terminal nodes) are paragraph nodes
  • Each leaf is represented by a term-frequency
    weighted vector
  • First, gather all leaf nodes; this completes the
    terminal nodes
  • Second, merge the children's vectors to form the
    parents' vectors
  • The document schema determines the merging
  • Parent: the unique terms of its children form a
    term-frequency weighted parent vector (it holds the
    content of the children)
  • The process is repeated recursively until done
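
The bottom-up merge described above can be sketched as a short recursion. This sketch assumes that merging means summing the children's term frequencies and that the tree is given as nested dictionaries; the keys "name", "terms", "children" and "vector" are hypothetical.

```python
from collections import Counter


def merge_up(node):
    """Fill node["vector"] for this node and all descendants, bottom-up.

    Leaves carry their own term counts in node["terms"]; a parent's vector
    is the sum of its children's vectors (an assumed merge rule).
    """
    children = node.get("children")
    if not children:
        node["vector"] = Counter(node["terms"])
        return node["vector"]
    merged = Counter()
    for child in children:
        merged.update(merge_up(child))  # Counter.update adds counts
    node["vector"] = merged
    return merged


# Hypothetical tree: one section with two paragraphs.
tree = {
    "name": "sec",
    "children": [
        {"name": "p1", "terms": {"xml": 2, "tree": 1}},
        {"name": "p2", "terms": {"xml": 1, "rank": 1}},
    ],
}
section_vector = merge_up(tree)
```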

10
RANKING OF ELEMENTS
  • The set of elements of the document tree is generated
  • Problem: structured retrieval needs a rank-ordered
    list of elements
  • Method used: All-element index (a separate
    representation for each element of each document,
    plus weighting information)
  • Lnu weights: suit elements of variable length and do
    not require global frequency information
  • Normalization: failing to account for length
    results in biased values
  • Pivot: the document length at which the probability
    of relevance equals the probability of retrieval
  • Slope: the amount of tilting
  • Pivoted normalization reduces the difference
  • Lnu term weights:
  • ((1 + log(term_freq)) / (1 + log(avg_term_freq)))
    / ((1 - slope) + slope * (no_unique_terms / pivot))
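
A minimal sketch of the Lnu weight applied to one element's term-frequency vector. It assumes natural logarithms and takes avg_term_freq as the mean term frequency within the element; the slope and pivot values in the example are illustrative, not values from the experiments.

```python
import math


def lnu_weights(term_freqs, slope, pivot):
    """Return term -> Lnu weight for one element.

    term_freqs maps each unique term to its frequency (>= 1) in the element.
    """
    if not term_freqs:
        return {}
    n_unique = len(term_freqs)
    avg_tf = sum(term_freqs.values()) / n_unique
    # Pivoted unique normalization: equals 1 when n_unique == pivot.
    norm = (1 - slope) + slope * (n_unique / pivot)
    return {
        term: ((1 + math.log(tf)) / (1 + math.log(avg_tf))) / norm
        for term, tf in term_freqs.items()
    }


# Two terms, each occurring once; pivot equals the number of unique terms.
weights = lnu_weights({"xml": 1, "tree": 1}, slope=0.25, pivot=2)
```

Only quantities local to the element appear in the formula, which is why these weights can be computed for tree nodes built at query time.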

11
  • Ltu weighting: N = collection size, nk = number of
    elements containing the term
  • ((1 + log(term_freq)) * log(N / nk))
    / ((1 - slope) + slope * (no_unique_terms / pivot))
  • N and nk are element dependent and would normally be
    known through indexing
  • N: as we move up the tree, count the elements of
    each type
  • nk: from the inverted file entry in the paragraph
    index, with the mapping between identifiers and
    XPaths (given)
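
A minimal sketch of ltu weighting for a query, assuming natural logarithms and the usual form in which the term-frequency factor is multiplied by log(N/nk). The counts and parameter values in the example are illustrative, and skipping terms that occur in no element is an assumption.

```python
import math


def ltu_weights(term_freqs, element_freq, n_elements, slope, pivot):
    """Return term -> ltu weight for a query.

    element_freq maps a term to nk, the number of elements containing it;
    n_elements is N, the number of elements of the type being searched.
    """
    n_unique = len(term_freqs)
    if n_unique == 0:
        return {}
    norm = (1 - slope) + slope * (n_unique / pivot)
    weights = {}
    for term, tf in term_freqs.items():
        nk = element_freq.get(term, 0)
        if nk == 0:
            continue  # assumed: a term found in no element gets no weight
        weights[term] = (1 + math.log(tf)) * math.log(n_elements / nk) / norm
    return weights


# "tree" occurs in every element, so its weight is zero.
query = ltu_weights(
    {"xml": 1, "tree": 1},
    element_freq={"xml": 10, "tree": 100},
    n_elements=100,
    slope=0.25,
    pivot=2,
)
```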

12
OUTPUT OF FLEXIBLE RETRIEVAL
  • Select another leaf node, gather its siblings,
    construct the document tree, calculate Lnu term
    weights, and correlate with the ltu-weighted query to
    produce another rank-ordered list
  • After the n top-ranked paragraphs are exhausted and
    the last list is produced, merge the lists
  • Result: a single set of elements, rank ordered by
    correlation with Q
  • Comparison: flexible retrieval vs. the all-element
    index
  • With an identical set of n paragraphs as input to
    flexible retrieval, all paragraphs have the same
    values used for Lnu-ltu
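
The per-tree ranking and the final merge can be sketched as follows. The sketch assumes that correlation is the inner product of already weighted element and query vectors, and that ties are broken by element id; the element ids and weights are made up for illustration.

```python
def inner_product(element_vec, query_vec):
    """Sum of weight products over the terms the two vectors share."""
    return sum(w * query_vec[t] for t, w in element_vec.items() if t in query_vec)


def rank_tree(element_weights, query_weights):
    """Rank the elements of one tree: list of (score, element_id), best first."""
    scored = [(inner_product(vec, query_weights), eid) for eid, vec in element_weights.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


def merge_ranked(lists):
    """Merge the per-tree lists into a single rank-ordered list."""
    merged = [item for ranked in lists for item in ranked]
    return sorted(merged, key=lambda item: (-item[0], item[1]))


# Hypothetical weighted vectors for two trees and one query.
query = {"xml": 2.0, "tree": 1.0}
tree1 = {"d1/sec[1]": {"xml": 1.0, "tree": 0.5}, "d1/sec[1]/p[1]": {"xml": 0.5}}
tree2 = {"d2/p[1]": {"tree": 2.0}}
final = merge_ranked([rank_tree(tree1, query), rank_tree(tree2, query)])
```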

13
ALGORITHM
14
EXPERIMENTS
  • Paragraph result: a set of extended vectors
    representing paragraphs
  • CO: a subvector represents the subjective portion;
    the body subvector is the important one (the content
    of an element, not its type, is contained in the body)
  • Tree Representation

15
FACTORS OF INTEREST
  • Slope and pivot for Lnu-ltu
  • Effective structured retrieval
  • Can be determined empirically and applied from one
    collection to another (generic)
  • n: the number of paragraphs input; sets an upper
    bound on the number of trees per query
  • The actual number of trees depends on how many
    paragraphs belong to the same group or same document
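
A small sketch of why n is only an upper bound on the number of trees: paragraphs from the same document share one tree. The document ids, the paths and the function name are hypothetical.

```python
from collections import defaultdict


def group_by_document(ranked_paragraphs, n):
    """Group the n top-ranked (doc_id, xpath) pairs by document.

    One tree is built per group, so the number of trees is at most n.
    """
    groups = defaultdict(list)
    for doc_id, xpath in ranked_paragraphs[:n]:
        groups[doc_id].append(xpath)
    return dict(groups)


# Hypothetical ranked paragraphs; the first and third share a document.
ranked_paragraphs = [
    ("d1", "/article[1]/sec[1]/p[1]"),
    ("d2", "/article[1]/sec[2]/p[1]"),
    ("d1", "/article[1]/sec[1]/p[2]"),
    ("d3", "/article[1]/sec[1]/p[1]"),
]
trees = group_by_document(ranked_paragraphs, n=3)
```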

16
EXPERIMENTS DONE
  • All-element and dynamic/flexible retrieval
    experiments and results
  • - body-only retrieval
  • Correlation between element and query vectors is
    computed as the correlation of body elements only
  • Table 1

17
RESULTS
  • Tables

18
(No Transcript)
19
  • Results are equivalent
  • Flexible retrieval is more efficient in file space
  • Time required for indexing is half
  • Dynamic: on a per-query basis it costs more (n total
    trees; the exact number required is not specified)
  • Another factor: the value of nk

20
DISCUSSIONS AND CONCLUSIONS
  • Flexible retrieval dynamically produces a rank-ordered
    list of elements from a single index at the level of
    the basic indexing node (paragraph)
  • Basic functions: SMART extended vector model
  • Results show flexible capabilities
  • Attempt to incorporate other subvectors, internal
    nodes, weights
  • INEX exhaustivity and specificity: research is still
    going on, and the results are a reflection of that
  • It is a better way of retrieval than
    all-element indexing

21
THANK YOU!!!