A Ranking Scheme for XML Information Retrieval Based on Benefit and Reading Effort

1
A Ranking Scheme for XML Information Retrieval
Based on Benefit and Reading Effort
  • Toshiyuki Shimizu (Kyoto University)
  • Masatoshi Yoshikawa (Kyoto University)

ICADL 2007 12th December
2
XML-IR systems
  • Growing demand for XML Information Retrieval
    (XML-IR) Systems
  • We can identify meaningful document fragments by
    encoding documents in XML
  • ex) Sections, subsections and paragraphs
    in scholarly articles
  • Browsing only document fragments relevant to a
    certain topic
  • The simplest form of query for XML-IR is
    just a set of keywords
  • Simple, intuitively understandable, yet useful
    form of queries, especially for unskilled
    end-users
  • Active research area, as seen in INEX

INEX: INitiative for the Evaluation of XML Retrieval
(http://inex.is.informatik.uni-duisburg.de/)
3
Results of XML-IR Systems
<?xml version="1.0"?>
<article>
  <sec>
    <p>XML labeling</p>
    <p>The structure of XML is a tree, and each node in the XML is labeled.</p>
    <p>We can get tag name of each XML element.</p>
  </sec>
  <sec>
    <p>Tree index</p>
    <p>XML index is constructed using the labels</p>
  </sec>
</article>
  • Document fragment (element)
  • With relevance degree (Score)
  • ex) Query term was XML

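[Figure: the document as a tree with a score per element.
e0 article (0.56); e1 sec (0.64) with children e2 p (0.4),
e3 p (0.9), e4 p (0.33); e5 sec (0.35) with children
e6 p (0), e7 p (0.8)]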
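As a minimal sketch of how the labels e0 to e7 can be derived, the following parses the sample document and numbers its elements in document order. The function name label_elements is hypothetical, and numbering in document order is an assumption that happens to match the labels used in the example.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<article>
  <sec>
    <p>XML labeling</p>
    <p>The structure of XML is a tree, and each node in the XML is labeled.</p>
    <p>We can get tag name of each XML element.</p>
  </sec>
  <sec>
    <p>Tree index</p>
    <p>XML index is constructed using the labels</p>
  </sec>
</article>"""


def label_elements(xml_text):
    """Return {label: (tag, text)} with labels e0, e1, ... in document order."""
    root = ET.fromstring(xml_text)
    labels = {}
    # Element.iter() visits the root and all descendants in document order.
    for i, elem in enumerate(root.iter()):
        # Join all text below the element and normalise whitespace.
        text = " ".join("".join(elem.itertext()).split())
        labels[f"e{i}"] = (elem.tag, text)
    return labels


labels = label_elements(DOC)
for label, (tag, text) in labels.items():
    print(label, tag, text[:40])
```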
4
Naïve XML-IR System
  • Thorough strategy of INEX 2005
  • Simply retrieves relevant elements from all
    elements and ranks them in order of relevance

Ranked result: e3 (0.9), e7 (0.8), e1 (0.64), e0 (0.56),
e2 (0.4), e5 (0.35), e4 (0.33)
[Figure: the scored XML tree from slide 3]
  • The Thorough strategy is intended for system evaluation
  • How users browse search results must also be
    considered
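The Thorough ranking can be sketched in a few lines: keep every element with a non-zero score and sort by score. The scores are the ones from the running example; the function name thorough is hypothetical.

```python
# Relevance scores of the running example for the query term "XML".
SCORES = {"e0": 0.56, "e1": 0.64, "e2": 0.4, "e3": 0.9,
          "e4": 0.33, "e5": 0.35, "e6": 0.0, "e7": 0.8}


def thorough(scores):
    """Rank all relevant elements by score, ignoring nesting and size."""
    relevant = [(elem, s) for elem, s in scores.items() if s > 0]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


ranking = thorough(SCORES)
print(ranking)
```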

5
Problems of Thorough Retrieval for XML-IR
  • Nesting elements
  • Browsing both elements of a nested pair is wasteful
  • Ancestor element ea → descendant element ed:
    ed has already been fully seen
  • Descendant element ed → ancestor element ea:
    ea has already been partially seen
  • Element size
  • Elements retrieved by XML-IR systems vary
    widely in size
  • Large element, such as article (whole document)
  • Small element, such as p (paragraph)
  • Total output size of top-k elements is
    uncontrollable by simply giving an integer k
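To make the nesting problem concrete, the sketch below lists every ancestor/descendant pair that appears together in a result list. The PARENT map is an assumption read off the sample document (two sec elements under article, paragraphs under each sec), and nested_pairs is a hypothetical helper.

```python
# Parent of each element in the running example (e0 is the root).
PARENT = {"e1": "e0", "e5": "e0",
          "e2": "e1", "e3": "e1", "e4": "e1",
          "e6": "e5", "e7": "e5"}


def ancestors(elem, parent):
    """Yield the ancestors of elem, nearest first."""
    while elem in parent:
        elem = parent[elem]
        yield elem


def nested_pairs(ranked, parent):
    """Return the set of (ancestor, descendant) pairs both present in ranked."""
    chosen = set(ranked)
    return {(a, d) for d in ranked for a in ancestors(d, parent) if a in chosen}


top4 = ["e3", "e7", "e1", "e0"]  # top-4 of the Thorough ranking
print(sorted(nested_pairs(top4, PARENT)))
```

In this example the top-4 list already contains four overlapping pairs, so a user reading it in order would see the same paragraphs several times.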

6
Overview of our Approach
  • Introduction of the concepts of benefit and
    reading effort
  • Users can control the total output size
  • Systems can retrieve non-overlapping elements

7
Properties of Benefit and Reading Effort (1/2)
  • Benefit
  • The benefit of an element is the amount of information
    about the query gained by reading the element
  • Assumption 1: The benefit of an element is
    greater than or equal to the sum of the benefits
    of its child elements
  • Information complementation among sibling
    elements
  • ex) For two query terms A and B, e6 contains
    topics about A and e7 contains topics about B →
    the benefit of e5 seems to be greater than
    the sum of the benefits of e6 and e7

[Figure: e5 (sec) with child elements e6 (p) and e7 (p)]
8
Properties of Benefit and Reading Effort (2/2)
  • Reading Effort
  • The reading effort of an element is the cost
    incurred by reading the content of the element
  • Assumption 2: The reading effort of an element is
    less than or equal to the sum of the reading
    efforts of its child elements
  • Readability of continuous reading
  • ex) Users can read the same content more easily
    by reading e5 than by reading e6 and e7 separately

[Figure: e5 (sec) read as one element versus e6 (p) and
e7 (p) read separately]
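The two assumptions can be checked mechanically on any tree. The sketch below does so for e5 and its children; the benefit and reading effort numbers are hypothetical and serve only to illustrate the inequalities, and satisfies_assumptions is not a name from the presentation.

```python
def satisfies_assumptions(children, benefit, effort):
    """Check Assumption 1 (benefit) and Assumption 2 (reading effort)."""
    for parent, kids in children.items():
        # Assumption 1: benefit(parent) >= sum of the children's benefits.
        if benefit[parent] < sum(benefit[k] for k in kids):
            return False
        # Assumption 2: effort(parent) <= sum of the children's efforts.
        if effort[parent] > sum(effort[k] for k in kids):
            return False
    return True


children = {"e5": ["e6", "e7"]}
benefit = {"e5": 10, "e6": 4, "e7": 5}   # hypothetical values
effort = {"e5": 18, "e6": 10, "e7": 10}  # hypothetical values

print(satisfies_assumptions(children, benefit, effort))
```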
9
Overview of our Approach
  • Introduction of the concepts of benefit and
    reading effort
  • Users can control the total output size
  • Systems can retrieve non-overlapping elements
  • Flexible retrieval
  • Users specify a threshold for the total amount of
    reading effort
  • The system returns relevant elements that provide
    a large benefit and that can be read within the
    specified reading effort

10
Flexible Retrieval
  • Systems calculate benefit and reading effort
  • A variant of the knapsack problem
  • ex) Threshold of reading effort: 15

→ Retrieve e2, e3 (total benefit: 11)
[Figure: XML tree of e0 (article), e1 and e5 (sec), and
e2, e3, e4, e6, e7 (p)]
11
Flexible Retrieval
  • Systems calculate benefit and reading effort
  • A variant of the knapsack problem
  • ex) Threshold of reading effort: 20

→ Retrieve e3, e7 (total benefit: 17)
[Figure: XML tree of e0 (article), e1 and e5 (sec), and
e2, e3, e4, e6, e7 (p)]
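The knapsack view can be sketched as an exhaustive search over non-overlapping element sets. The per-element (benefit, reading effort) values below are hypothetical: the transcript does not preserve them, so they were chosen only to be consistent with the totals and scores quoted in the examples. The names VALUES, PARENT and best_selection are likewise assumptions.

```python
from itertools import combinations

# Hypothetical (benefit, reading effort) per element; not taken from the slides.
VALUES = {"e0": (28, 50), "e1": (18, 28), "e2": (2, 5), "e3": (9, 10),
          "e4": (5, 15), "e5": (8, 23), "e6": (0, 15), "e7": (8, 10)}
PARENT = {"e1": "e0", "e5": "e0", "e2": "e1", "e3": "e1", "e4": "e1",
          "e6": "e5", "e7": "e5"}


def ancestors(elem):
    while elem in PARENT:
        elem = PARENT[elem]
        yield elem


def overlaps(a, b):
    """True if one element is nested inside the other."""
    return a in ancestors(b) or b in ancestors(a)


def best_selection(threshold):
    """Brute-force the non-overlapping set with maximum total benefit."""
    elems = sorted(VALUES)
    best, best_benefit = (), 0
    for k in range(1, len(elems) + 1):
        for combo in combinations(elems, k):
            if any(overlaps(a, b) for a, b in combinations(combo, 2)):
                continue  # nested elements must not be returned together
            if sum(VALUES[e][1] for e in combo) > threshold:
                continue  # exceeds the reading effort threshold
            total = sum(VALUES[e][0] for e in combo)
            if total > best_benefit:
                best, best_benefit = combo, total
    return set(best), best_benefit


print(best_selection(15))
print(best_selection(20))
```

Exhaustive search is only feasible for a toy tree like this one; it is shown here to make the objective explicit, not as a practical retrieval method.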
12
Search Result Continuity
ex) reading effort 15 → Retrieve e2, e3 (benefit 11)
    reading effort 20 → Retrieve e3, e7 (benefit 17)
  • The running example violates search result
    continuity
  • The content of the element set for reading effort r
    must be contained in the content of the element set
    for reading effort r' if r < r'
  • Finding the optimal solution
  • is NP-hard (a variant of the knapsack problem)
  • may violate search result continuity
  • Greedy retrieval algorithm
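Search result continuity can be tested by comparing the content covered by two result sets. In this sketch the content of an element is approximated by the set of paragraph (leaf) elements below it, which is an assumption; the CHILDREN map follows the sample document and the function names are hypothetical.

```python
CHILDREN = {"e0": ["e1", "e5"], "e1": ["e2", "e3", "e4"], "e5": ["e6", "e7"]}


def leaves(elem):
    """Return the leaf elements covered by elem (elem itself if it is a leaf)."""
    kids = CHILDREN.get(elem, [])
    if not kids:
        return {elem}
    return set().union(*(leaves(k) for k in kids))


def content(selection):
    """Union of the leaves covered by every element in the selection."""
    covered = set()
    for elem in selection:
        covered |= leaves(elem)
    return covered


def is_continuous(smaller, larger):
    """True if everything shown for the smaller effort is still shown."""
    return content(smaller) <= content(larger)


# Results for reading effort 15 and 20 in the running example.
print(is_continuous({"e2", "e3"}, {"e3", "e7"}))
```

The check fails for the running example because e2 is returned at reading effort 15 but is no longer covered at reading effort 20.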

13
Retrieval Algorithm
  • Based on the result of Thorough strategy
  • Adjust the benefit and reading effort of elements
    nested with a retrieved element, then rerank
  • Remove content that overlaps because of nesting

Result of Thorough (simply retrieves relevant elements
from all elements and ranks them in order of relevance):
e3 (0.9), e7 (0.8), e1 (0.64), e0 (0.56), e2 (0.4),
e5 (0.35), e4 (0.33)
[Figure: the scored XML tree from slide 3]
14
Retrieval Algorithm
Threshold of reading effort: 40
Result of Thorough: e3 (0.9), e7 (0.8), e1 (0.64 → 0.5),
e0 (0.56 → 0.48), e2 (0.4), e5 (0.35), e4 (0.33)
Our result: e3 (0.9)
Amount of benefit: 9
Amount of reading effort: 10
Adjust e1, e0
[Figure: the XML tree with the adjusted scores]
15
Retrieval Algorithm
Threshold of reading effort: 40
Result of Thorough: e3 (0.9), e7 (0.8), e1 (0.5),
e0 (0.48 → 0.37), e2 (0.4), e5 (0.35 → 0), e4 (0.33)
Our result: e3 (0.9), e7 (0.8)
Amount of benefit: 9 → 17
Amount of reading effort: 10 → 20
Adjust and rerank e5, e0
[Figure: the XML tree with the adjusted scores]
16
Retrieval Algorithm
Threshold of reading effort: 40
Result of Thorough: e3 (0.9), e7 (0.8), e1 (0.5),
e2 (0.4), e0 (0.37 → 0.17), e4 (0.33), e5 (0)
Our result: e3 (0.9), e7 (0.8), e1 (0.5)
Amount of benefit: 17 → 26
Amount of reading effort: 20 → 38
Adjust and rerank e0
[Figure: the XML tree with the adjusted scores]
17-19
Retrieval Algorithm
Threshold of reading effort: 40
Result of Thorough: e3 (0.9), e7 (0.8), e1 (0.5),
e2 (0.4), e4 (0.33), e0 (0.17), e5 (0)
Our result: e7 (0.8), e1 (0.5)
Amount of benefit: 26
Amount of reading effort: 38
[Figure: the XML tree with the adjusted scores]
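The walk-through above can be approximated by the following greedy sketch. It is not the authors' implementation. The (benefit, reading effort) values are hypothetical, chosen so that benefit divided by effort reproduces the scores in the example. Stopping at the first element that does not fit is an assumption made so that a larger threshold always extends the result of a smaller one.

```python
# Hypothetical (benefit, reading effort) per element; not taken from the slides.
VALUES = {"e0": (28, 50), "e1": (18, 28), "e2": (2, 5), "e3": (9, 10),
          "e4": (5, 15), "e5": (8, 23), "e6": (0, 15), "e7": (8, 10)}
PARENT = {"e1": "e0", "e5": "e0", "e2": "e1", "e3": "e1", "e4": "e1",
          "e6": "e5", "e7": "e5"}


def ancestors(elem):
    while elem in PARENT:
        elem = PARENT[elem]
        yield elem


def greedy_retrieve(threshold):
    """Return (result elements, total benefit, total reading effort)."""
    benefit = {e: b for e, (b, _) in VALUES.items()}  # adjusted as we go
    effort = {e: r for e, (_, r) in VALUES.items()}
    candidates = set(VALUES)
    result, total_benefit, total_effort = [], 0, 0

    def score(e):
        return benefit[e] / effort[e] if effort[e] > 0 else 0.0

    while candidates:
        # Rerank: take the candidate with the best adjusted score.
        best = max(sorted(candidates), key=score)
        if benefit[best] <= 0 or total_effort + effort[best] > threshold:
            break  # nothing left to gain, or the next element does not fit
        candidates.discard(best)
        # Descendants of best are now fully covered by best itself.
        covered = {e for e in VALUES if best in ancestors(e)}
        result = [e for e in result if e not in covered] + [best]
        candidates -= covered
        total_benefit += benefit[best]
        total_effort += effort[best]
        # Ancestors keep only the part that has not been read yet.
        for a in ancestors(best):
            benefit[a] -= benefit[best]
            effort[a] -= effort[best]
    return result, total_benefit, total_effort


print(greedy_retrieve(40))
```

With a threshold of 40 the sketch selects e3, then e7, then replaces e3 by its parent e1, which matches the final result of the walk-through (e7 and e1, benefit 26, reading effort 38).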
20
Evaluation Metrics
  • Based on benefit and reading effort
  • b/e graph (benefit/effort graph)
  • Comparison with BTIL (Best Thorough Input List)
  • The BTIL system is the system that uses the actual
    benefit and reading effort
  • Actual benefit is calculated using manually
    constructed assessments (e.g. INEX)
  • We can observe the relative effectiveness in terms
    of benefit as the specified threshold of reading
    effort changes
  • Use the same values for reading effort between
    implemented system and BTIL system

21
[Figure: two copies of the XML tree, one annotated with
calculated benefit / reading effort and one with
actual benefit / reading effort]
For the threshold value 30 of reading effort:
BTIL system retrieves e3, e6. Obtained actual
benefit is 23.
Implemented system retrieves e3, e7. Obtained
actual benefit is 10.
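One plausible reading of "relative effectiveness" is the ratio between the actual benefit obtained by the implemented system and that obtained by the BTIL system at the same threshold. The sketch below computes this ratio for the example above; both the ratio definition and the function name are assumptions, not something stated in the presentation.

```python
def relative_effectiveness(system_benefit, btil_benefit):
    """Ratio of obtained actual benefit to the BTIL benefit (0 if BTIL is 0)."""
    if btil_benefit == 0:
        return 0.0
    return system_benefit / btil_benefit


# Threshold 30 in the example: implemented system 10, BTIL system 23.
observations = {30: (10, 23)}
curve = {t: relative_effectiveness(s, b) for t, (s, b) in observations.items()}
print(curve)
```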
22
Examples of b/e Graph using INEX 2005 Test
Collection (1/2)
  • XML document set, Topics, Assessments
  • Calculate actual benefit and reading effort from
    Assessments
  • ex (Exhaustivity): Highly exhaustive (HE) → 1,
    Partially exhaustive (PE) → 0.5, Not
    exhaustive (NE) → 0
  • rsize: relevant text length (in number of
    characters)
  • size: element length (in number of characters)
  • We implemented a system using tf-ief
  • ief stands for inverse element frequency
  • It satisfies the assumptions for benefit and
    reading effort

23
Examples of b/e Graph using INEX 2005 Test
Collection (2/2)
[Figures: b/e graphs for Topic 206 and Topic 207]
  • We can observe relative effectiveness of
    implemented systems against BTIL system

24
Conclusions and Future Works
  • Conclusions
  • Introduction of benefit and reading effort
  • Handling nesting elements
  • Variety of element size
  • Algorithm for flexible retrieval
  • Result elements change depending on the specified
    reading effort
  • System evaluation
  • Future Works
  • Introduction of switching effort
  • Cost of switching a result item in the results
    list
  • Retrieving numerous results increases the cost of
    browsing
  • Integration with user interface