The Index-based XXL Search Engine for Querying XML Data with Relevance Ranking

1
The Index-based XXL Search Engine for Querying
XML Data with Relevance Ranking

Anja Theobald and Gerhard Weikum

Presented by Jianbin Wei CSC 8710 Nov. 4, 2003
2
Motivation

Proposed XML query languages, such as XQuery, are
of limited value for XML documents from different
sources
Traditional Web search engines make little use of the
structure of XML documents
XXL (Flexible XML Search Language) takes both
into account.

3
Outline

Similarity query
Index support for XXL query
Query processing
Architecture of XXL
Conclusion

4
XXL Query

Exact-match condition
Animal-specimen = cat
Similarity condition
Animal-species ~ lion

5
Ontology-based Similarity
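The distance between two terms is measured through their lowest common parent
sim(lion, brown bear) = 1 / (1 + dist(lion, brown bear)) = 1/3.5
dist(lion, brown bear) = siblingdist(big cat, bear) + length(lion, big cat) + length(brown bear, bear) = 0.5 + 1 + 1 = 2.5
siblingdist(big cat, bear) = |1 - 2| / 2 = 0.5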
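
The numbers on this slide can be reproduced with a short sketch. It assumes, beyond what the slide states, that siblingdist is the difference between the two sibling positions divided by the number of siblings under the lowest common parent, and that each parent-child edge has length 1. The `ONTOLOGY` table and all function names are hypothetical.

```python
# Hypothetical ontology: term -> (parent, 1-based position among its siblings).
ONTOLOGY = {
    "predator": (None, 1),
    "big cat": ("predator", 1),
    "bear": ("predator", 2),
    "lion": ("big cat", 1),
    "tiger": ("big cat", 2),
    "brown bear": ("bear", 1),
    "polar bear": ("bear", 2),
}


def ancestors(term):
    """Return [term, parent, grandparent, ...] up to the root."""
    path = []
    while term is not None:
        path.append(term)
        term = ONTOLOGY[term][0]
    return path


def sibling_count(parent):
    """Number of direct children of `parent`."""
    return sum(1 for p, _ in ONTOLOGY.values() if p == parent)


def dist(a, b):
    """Distance via the lowest common parent (assumed definition, see lead-in)."""
    if a == b:
        return 0.0
    path_a, path_b = ancestors(a), ancestors(b)
    lcp = next(t for t in path_a if t in path_b)  # lowest common parent
    up_a, up_b = path_a.index(lcp), path_b.index(lcp)
    if up_a == 0 or up_b == 0:
        # One term is an ancestor of the other: plain path length (assumption).
        return float(up_a + up_b)
    child_a, child_b = path_a[up_a - 1], path_b[up_b - 1]
    sibling_dist = abs(ONTOLOGY[child_a][1] - ONTOLOGY[child_b][1]) / sibling_count(lcp)
    # Path lengths are counted from each term up to the sibling below the lcp.
    return sibling_dist + (up_a - 1) + (up_b - 1)


def sim(a, b):
    return 1.0 / (1.0 + dist(a, b))


d = dist("lion", "brown bear")
s = sim("lion", "brown bear")
```

With this table, `d` is 2.5 and `s` is 1/3.5, matching the slide.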
6
Index Support for XXL Query

Element path index (EPI)
Element content index (ECI)
Ontology index (OI)

7
Element Path Index

Element name
List of its occurrences
Parents and children of each occurrence (depth is
2)
Attribute as children

8
Element Path Index (cont)
Example element tree in zoo.xml:
predator
    big cat
        lion
        tiger
    bear
        brown bear
        polar bear
Index entry: big cat -> (zoo.xml, parent predator, children lion, tiger)
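
A minimal sketch of how such an index could be built from the example tree. The slides do not specify the data layout, so the `Occurrence` record, the nested-tuple tree encoding, and `build_epi` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Occurrence:
    """One occurrence of an element name, with its depth-2 neighborhood."""
    document: str
    parent: Optional[str]
    children: List[str] = field(default_factory=list)


# Hypothetical encoding of the example tree: (name, [child nodes]).
TREE = ("predator", [
    ("big cat", [("lion", []), ("tiger", [])]),
    ("bear", [("brown bear", []), ("polar bear", [])]),
])


def build_epi(document, tree):
    """Map each element name to the list of its occurrences."""
    index = defaultdict(list)

    def visit(node, parent):
        name, children = node
        index[name].append(Occurrence(document, parent, [c[0] for c in children]))
        for child in children:
            visit(child, name)

    visit(tree, None)
    return index


epi = build_epi("zoo.xml", TREE)
entry = epi["big cat"][0]
```

Here `entry` holds the document zoo.xml, the parent predator, and the children lion and tiger, which is the entry shown on the slide.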
9
Element Content Index

Index of every word
Inverse document frequency: (# of elements containing this word) / (# of total elements)
Occurrences (the elements in which the word appears), with the word's frequency for each occurrence
10
Element Content Index (cont)
zoo.xml contains three elements: big cat (content mentions Africa), bear (content mentions Africa), and bird
Index entry: Africa -> 2/3; big cat 10/100, bear 1/100
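
The entry above can be sketched as follows. The element texts are invented toy data, and the per-occurrence frequency is assumed to be the word count divided by the number of words in the element; `build_eci` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction


def build_eci(elements):
    """elements: list of (element_name, text) pairs.

    Returns word -> (element ratio, [(element_name, frequency), ...]).
    """
    total = len(elements)
    postings = {}
    for name, text in elements:
        words = text.lower().split()
        for word, count in Counter(words).items():
            # Assumed frequency: occurrences of the word / words in the element.
            postings.setdefault(word, []).append((name, Fraction(count, len(words))))
    # Ratio as on the slide: elements containing the word / total elements.
    return {word: (Fraction(len(occ), total), occ) for word, occ in postings.items()}


# Toy contents (invented for illustration only).
elements = [
    ("big cat", "lives in Africa"),
    ("bear", "Africa and Asia"),
    ("bird", "flies"),
]
eci = build_eci(elements)
ratio, occurrences = eci["africa"]
```

For this toy input, `ratio` is 2/3 because two of the three elements mention Africa, as in the slide's example.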
11
Ontology Index

Used for similarity search and result ranking

ele -> term_1, ..., term_k
term_1, ..., term_k are the terms most similar to element ele
Example: lion -> tiger, bear, brown bear
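
One way to precompute such entries is to keep, for each term, the k highest-scoring terms from a similarity table. In the sketch below only the 1/3.5 score for brown bear comes from the slides; the other scores, the table layout, and `build_ontology_index` are illustrative assumptions.

```python
import heapq


def build_ontology_index(similarity, k):
    """similarity: term -> {other_term: score}. Keep the k best per term."""
    index = {}
    for term, scores in similarity.items():
        index[term] = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return index


# Illustrative scores; only lion/brown bear (1/3.5) is taken from the slides.
SIMILARITY = {
    "lion": {
        "tiger": 0.67,
        "bear": 0.40,
        "brown bear": 1 / 3.5,
        "bird": 0.10,
    },
}

ontology_index = build_ontology_index(SIMILARITY, k=3)
similar_to_lion = [term for term, _ in ontology_index["lion"]]
```

With k = 3 the entry for lion lists tiger, bear, and brown bear in descending order of similarity.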
12
Query Processing

Query decomposition
Evaluation order
Index-based sub-query evaluation
Result composition

13
Query Decomposition

Decompose query into sub-queries

Select Z From zoo.xml Where zoo.name =
detroit as Z and Z.species ~ lion
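
A minimal sketch of the decomposition step, assuming that exact-match conditions are written with `=` and similarity conditions with `~`, and that conditions are joined by `and`. The `SubQuery` record, the regular expression, and `decompose` are hypothetical and cover only this simple form of Where clause.

```python
import re
from typing import NamedTuple, Optional


class SubQuery(NamedTuple):
    path: str               # element path, e.g. "zoo.name"
    operator: str           # "=" (exact match) or "~" (similarity), both assumed
    value: str              # comparison value
    binding: Optional[str]  # variable bound with "as", if any


CONDITION = re.compile(
    r'^\s*([\w.]+)\s*(=|~)\s*"?(\w+)"?(?:\s+as\s+(\w+))?\s*$',
    re.IGNORECASE,
)


def decompose(where_clause):
    """Split a Where clause on 'and' and parse each condition."""
    subqueries = []
    for part in re.split(r"\s+and\s+", where_clause, flags=re.IGNORECASE):
        match = CONDITION.match(part)
        if match is None:
            raise ValueError(f"unsupported condition: {part!r}")
        subqueries.append(SubQuery(*match.groups()))
    return subqueries


subqueries = decompose("zoo.name = detroit as Z and Z.species ~ lion")
```

The example yields two sub-queries: an exact-match condition on zoo.name bound to Z, and a similarity condition on Z.species.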
14
Evaluation Order

Sub-queries are evaluated in the order in which
they appear in the original query
Inside a sub-query, either top-down or bottom-up
matching can be used.

15
Sub-query Evaluation

Sub-query: elementary condition without the similarity operator (~)
Element Path Index returns exactly matched
results
Sub-query: elementary condition with the similarity operator (~)
Ontology Index returns the similar terms, and
then the Element Path Index returns results for these
terms
Example: lion -> tiger, bear, ...
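
The two cases can be sketched as a single lookup routine. The index contents, the occurrence identifiers, the similarity scores, and the function name are hypothetical. Scoring an exact match as 1.0 is also an assumption.

```python
def evaluate_element_condition(name, similar, epi, ontology_index):
    """Return (term, occurrence, score) triples for an elementary condition.

    similar=False: exact match only, answered from the element path index.
    similar=True: expand the name through the ontology index first, then
    look up every candidate term in the element path index.
    """
    candidates = [(name, 1.0)]  # exact match scored 1.0 (assumption)
    if similar:
        candidates += ontology_index.get(name, [])
    results = []
    for term, score in candidates:
        for occurrence in epi.get(term, []):
            results.append((term, occurrence, score))
    return results


# Toy indexes with invented occurrence identifiers and scores.
epi = {"lion": ["zoo.xml#e3"], "tiger": ["zoo.xml#e4"], "bear": ["zoo.xml#e5"]}
ontology_index = {"lion": [("tiger", 0.67), ("bear", 0.40)]}

exact = evaluate_element_condition("lion", False, epi, ontology_index)
expanded = evaluate_element_condition("lion", True, epi, ontology_index)
```

In the exact case only the lion occurrence is returned. In the similarity case the tiger and bear occurrences are added, each carrying the similarity score that can later feed result ranking.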

16
Sub-query Evaluation (cont)

Sub-query: content condition without the similarity operator (~)
Element Content Index returns exactly matched results
Sub-query: content condition with the similarity operator (~)
Ontology Index returns the similar terms, and
then the Element Content Index returns results for
these terms
Example: Africa -> Asia, Europe, ...

17
Sub-query Evaluation (cont)