Efficient processing of XPath queries with structured overlay networks

1
Efficient processing of XPath queries with
structured overlay networks
OnTheMove - OTM 2005 Federated Conferences
Agia Napa, Cyprus. 2 November 2005.
  • Gleb Skobeltsyn, Manfred Hauswirth, Karl Aberer

Presenter: Gleb Skobeltsyn
This work was (partially) funded by the EU
project BRICKS (http://www.brickscommunity.org)
2
Contents
  • Motivation & Problem statement
  • P-Grid short overview
  • Indexing strategy
  • Basic Index
  • Caching strategy
  • Simulation results
  • Conclusions

3
Motivation
  • Complex queries are easy to answer in
    unstructured P2P networks, e.g., Edutella.
  • But this approach doesn't scale because of its
    high bandwidth consumption.
  • Structured P2P networks typically offer
    logarithmic search complexity, but require a
    special index.
  • Open question: which indexing structure can support
    XPath queries over a distributed XML warehouse?

4
Problem statement (1/2)
  • Problem: to be able to answer structured queries
    (e.g., XPath) in an XML warehouse distributed in a
    structured P2P network.
  • We assume two different indices:
      • one for indexing structure (e.g., pure XML paths),
      • one for indexing values.
  • In this paper we concentrate on the first issue.

5
Problem statement (2/2)
  • We support XPath{*,//} queries, i.e., queries
    containing:
      • child axes (/),
      • descendant axes (//),
      • wildcards (*).
  • Example: //A/B//C
  • We propose an indexing structure to answer such
    queries in a large distributed P2P XML warehouse
  • We try to minimize the consumed bandwidth
    measured in P2P overlay hops

6
P-Grid (1/3) introduction
  • P-Grid is a trie-based P2P DHT, similar to Chord,
    Pastry, etc. (more info at http://www.p-grid.org/).
  • In P-Grid each peer is responsible for a set of
    binary keys which start with the peer's prefix.
  • Routing is based on longest prefix matching
    (logarithmic search cost for skewed trees).

[Figure: P-Grid routing example; a query for key 100 is forwarded among peers A, C and D until the key is found]
7
P-Grid (2/3) storing indexing information
  • Information is stored in data items.
  • A data item is a (key, data) tuple.
  • Each peer in a P-Grid network stores the data items
    whose keys start with the peer's prefix.

[Figure: example with binary keys 1100, 1101 and 11011 and the prefix 110]
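
The storage rule on this slide can be illustrated with a minimal sketch. It assumes peers are identified only by their binary prefixes; `responsible_peers` and the prefix list are hypothetical names for illustration, not part of the P-Grid software.

```python
def responsible_peers(peer_prefixes, key):
    """Return the peers whose key space overlaps the given binary key."""
    # A peer stores a key if the key starts with the peer's prefix.
    # If the key is shorter than the peer prefixes, every peer below
    # the key in the trie may hold items whose keys extend it.
    return [p for p in peer_prefixes
            if key.startswith(p) or p.startswith(key)]

# Hypothetical peer prefixes forming a complete binary trie.
peers = ["00", "01", "10", "110", "111"]

print(responsible_peers(peers, "11011"))  # a single responsible peer
print(responsible_peers(peers, "11"))     # a sub-tree of peers
```

The second call shows why a short key can map to several peers, which is the situation the shower broadcast later addresses.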
8
P-Grid (3/3) order preserving hash function
  • Keys are generated using a P-Grid order-preserving
    hash function h().
  • Example: the key h(comp) is a prefix of the keys
    h(computer), h(complexity), h(comp*).
  • Routing to the key h(comp) may lead to one of two
    cases: a single responsible peer, or a sub-tree of
    responsible peers.
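
As a rough illustration of the prefix property, the sketch below uses a toy stand-in for h(): it simply concatenates the 8-bit codes of ASCII characters. This is an assumption made for the example; the slides do not specify how the actual P-Grid hash function is built.

```python
def h(text: str) -> str:
    """Toy order- and prefix-preserving hash: 8 bits per ASCII character.

    Stand-in for illustration only, not the real P-Grid function.
    """
    return "".join(format(b, "08b") for b in text.encode("ascii"))

# The key of a prefix is a prefix of the key.
print(h("computer").startswith(h("comp")))
print(h("complexity").startswith(h("comp")))

# Lexicographic order of the inputs is preserved by the keys.
words = ["computer", "comp", "complexity", "data"]
print(sorted(words) == sorted(words, key=h))
```

With fixed-width character codes, comparing two keys bit by bit is the same as comparing the strings character by character, which is what makes the mapping order-preserving.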

9
Basic Index (1/4) introduction
  • We index the XML paths found in the document.
  • Given a path P = l1/.../lm, m data items are
    stored in P-Grid, using the hashes of the following
    sub-paths (suffixes) as keys:
    l1/l2/.../lm, l2/.../lm, ..., lm.
  • Each data item stores the path and the URI.
  • Example: given a path P = store/book/title,
    3 data items are created:

Basic index
Key                    Original Path       URI
h(store/book/title)    store/book/title    Link to the document
h(book/title)          store/book/title    Link to the document
h(title)               store/book/title    Link to the document
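
The construction of the basic index entries can be sketched as follows. The function name and the tuple layout are hypothetical, and the key is kept symbolic as the string "h(...)" because the concrete hash function is not given in the slides.

```python
def basic_index_items(path: str, uri: str):
    """Create one data item per suffix of an XML path."""
    labels = path.split("/")
    items = []
    for i in range(len(labels)):
        suffix = "/".join(labels[i:])
        # Each item: (key, original path, URI of the document).
        items.append((f"h({suffix})", path, uri))
    return items

# "doc-uri" is a placeholder for the link to the document.
for item in basic_index_items("store/book/title", "doc-uri"):
    print(item)
```

A path with m labels yields exactly m items, so the index size grows linearly with path depth.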
10
Basic Index (2/4) search
  • Given an XPath query Q = l1 s1 l2 ... s(k-1) lk, where
    each separator si ∈ {/, //}.
  • The longest sequence of labels separated by / (the
    first one, if several have the same length) is
    defined as qB.
  • Example: for A//C/D//E, qB = C/D.
  • The query is answered by routing to the peer
    responsible for h(qB).
  • There are 2 cases:
      • There is one peer responsible for h(qB): it
        answers the query.
      • There is a set (sub-tree) of peers responsible
        for h(qB): a shower broadcast is executed over
        this set.
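
A minimal sketch of how qB could be extracted from a query string is shown below. It assumes that both descendant axes (//) and wildcard steps (*) end a label sequence, and that ties go to the first sequence; `longest_subpath` is a hypothetical helper name.

```python
def longest_subpath(query: str) -> str:
    """Return the first longest run of labels joined only by '/'."""
    best = []
    for piece in query.split("//"):
        run = []
        # The trailing "*" acts as a sentinel that flushes the last run.
        for label in piece.split("/") + ["*"]:
            if label in ("*", ""):
                if len(run) > len(best):  # strict: first run wins ties
                    best = run
                run = []
            else:
                run.append(label)
    return "/".join(best)

print(longest_subpath("A//C/D//E"))
print(longest_subpath("//A/B//C"))
```

The returned sub-path would then be hashed and used as the routing key h(qB).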

11
Basic index (3/4) shower broadcast
  • Shower broadcast propagates a message (query)
    among all peers in the sub-tree
  • It is a recursive algorithm that works in a parallel fashion.
  • Each peer in the sub-tree is visited only once.


[Figure: shower broadcast over a P-Grid sub-tree; trie nodes 0, 1, 00, 01, 10, 11, 010, 011, 110, 111, 1100, 1101]
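
The idea of the shower broadcast can be sketched as a simulation. The sketch assumes a complete binary trie whose leaves are the peers, and it runs sequentially although the real algorithm forwards in parallel; the function name and the peer list are hypothetical.

```python
def shower_broadcast(peers, start, subtree_prefix):
    """Visit every peer under subtree_prefix exactly once.

    peers: leaf prefixes of a complete binary trie (assumption).
    start: a peer inside the sub-tree that first receives the query.
    """
    visited = []

    def forward(peer, level):
        visited.append(peer)
        # For each deeper level, hand the message to one peer in the
        # complementary sub-tree; that peer only forwards further down.
        for i in range(level, len(peer)):
            flipped = "1" if peer[i] == "0" else "0"
            complement = peer[:i] + flipped
            target = next(p for p in peers if p.startswith(complement))
            forward(target, i + 1)

    forward(start, len(subtree_prefix))
    return visited

peers = ["00", "010", "011", "10", "1100", "1101", "111"]
print(shower_broadcast(peers, "1100", "1"))
```

Because the complementary sub-trees at different levels are disjoint, no peer receives the message twice.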
12
Basic Index (4/4) properties
  • The basic index is sufficient to answer XPath{*,//}
    queries.
  • The shower broadcast consumes bandwidth, although it
    is efficient in time and distributes the computation.
  • The improvement is to cache the most frequent
    queries locally and avoid shower broadcasts for
    them.

13
Caching strategy (1/4) introduction
  • Types of queries:
      • Type 1: queries that can be answered by one peer
        locally.
        Example: A/B/C//E at the peer responsible for
        h(A/B).
      • Type 2: queries that require an additional
        broadcast and contain only one sub-path (q = qB).
        Example: A at the peer responsible for
        h(A/B).
      • Type 3: queries that require an additional
        broadcast and contain more than one sub-path
        (q ≠ qB).
        Example: A//C//E at the peer responsible for
        h(A/C).
  • We suggest caching the most popular queries of
    type 3 to reduce the number of shower
    broadcasts.

14
Caching strategy (2/4) search
  • The key used for routing is no longer h(qB),
    but qC = concat(Pl1, Pl2, ..., Plk), where qB = Pl1.
  • Example:

[Figure: example of building the key qC from the sub-paths P of a query over the labels A, C, D and E]
  • The query is routed to a relevant peer, which may
    (or may not) answer the query from its cache.
  • If the query is of type 3 and cannot be
    answered locally, its result can be cached.
  • Similarly, an existing cache entry can be deleted.

15
Caching strategy (3/4) example
[Figure: caching example for the query A//C/D//E]
16
Caching strategy (4/4) analysis
  • A query is profitable to cache if
    UpdateCost × UpdateRate(subtree) < SearchCost(subtree) × SearchRate(query)
  • where
      • UpdateCost: the cost of one cache update (log N),
      • UpdateRate: the average update rate in the sub-tree,
      • SearchCost: the cost of a search
        (routing + broadcast),
      • SearchRate: the query's frequency (estimated
        locally).
  • The indexing strategy adapts to the
    search/update ratio and tries to keep the
    messaging costs optimal.
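
The caching condition above can be written as a small check. The sketch assumes that the update cost is log2 of the number of peers and that all rates use the same time unit; the function name and the sample numbers are hypothetical and not taken from the simulations.

```python
import math

def worth_caching(n_peers, update_rate, search_cost, search_rate):
    """Return True if caching a query is expected to save messages."""
    # Assumption: one cache update costs about log2(N) overlay hops.
    update_cost = math.log2(n_peers)
    return update_cost * update_rate < search_cost * search_rate

# Frequently searched, rarely updated sub-tree: caching pays off.
print(worth_caching(1024, update_rate=1.0, search_cost=30, search_rate=2.0))
# Heavily updated sub-tree, rarely searched query: it does not.
print(worth_caching(1024, update_rate=10.0, search_cost=30, search_rate=1.0))
```

A peer could re-evaluate this check periodically, which is one way to read the adaptive behaviour described on the slide.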

[Slide annotation: gathered from neighbours]
17
Simulations (1/4) testbed
  • Java application, stores data locally in a DBMS.
  • 50 XML documents, >5k unique paths
  • 20k data items
  • In each experiment we used 10k queries randomly
    generated from the paths

18
Simulations (2/4) search cost
  • Parameter t: the fraction of cacheable queries
  • All cacheable queries are cached

19
Simulations (3/4) search cost
  • 1000 peers
  • t = 0.5 (50% of queries can be cached)

20
Simulations (4/4) average costs
  • 1000 peers, t = 0.5, Zipf s = 1.2.
  • For a given search/update ratio there is an
    optimal point

21
Conclusions
  • An efficient solution for indexing XML structure
    in structured overlay networks is proposed.
  • The presented solution can be used in a P2P XML
    querying engine for answering structural (sub)
    queries.

22
Last slide
  • Thank you for your attention!
  • Questions?