Information Management in P2P, Serge Abiteboul, INRIA-Futurs and Univ. Paris 11

P2P Data Management, 2006, S. Abiteboul. 3 /89. Success ... Napster (emule, bearshare, etc.): music database. Flickr: picture database. Wikipedia: dictionary ... – PowerPoint PPT presentation

1
Information Management in P2P
Serge Abiteboul, INRIA-Futurs and Univ. Paris 11
2
Introduction
3
Success stories at the time of the Internet
bubble
  • Google: management of Web pages
  • Mapquest: management of maps
  • Amazon: book catalogue
  • eBay: product catalogue
  • Napster (emule, bearshare, etc.): music database
  • Flickr: picture database
  • Wikipedia: dictionary
  • del.icio.us: annotations
  • In France
      • Meetic: dating database
      • Kelkoo: comparative shopping

They are all about publishing some database
4
The trend is towards peer-to-peer and
interactivity
  • P2P: a large and varying number of computers
    cooperate to solve some particular task without
    any centralized authority
  • Goal: build an efficient, robust, scalable system
    based (typically) on inexpensive, unreliable
    computers distributed in a wide area network
  • seti@home, kazaa, cabal
  • Switch from centralized servers to communities
    and syndication
  • Interaction and Web 2.0
  • Motivations: social, organizational

5
Information management in a P2P network
  • Private terminology: data ring
  • Information is heterogeneous, distributed,
    replicated, dynamic
  • Which info: data, meta-data, knowledge,
    services
  • Peers are heterogeneous, autonomous and possibly
    mobile
  • From sensors to PDAs to mainframes
  • Typically a very large number of peers
  • Variety of requirements: QoS, performance,
    security, etc.

6
Acknowledgement
  • Xyleme: scalable XML warehousing
      • Sophie Cluet, Guy Ferran (Xyleme), many others
  • ActiveXML: language for P2P data management
      • Omar Benjelloun (Google), Ioana Manolescu, Tova
        Milo (Tel Aviv), many others
  • KadoP: P2P scalable XML indexing
      • Ioana Manolescu, Nicoleta Preda, others
  • Data Ring: infrastructure for P2P data management
      • Alkis Polyzotis (UC Santa Cruz)

7
Outline
  • Introduction: the data ring
  • Calculus for P2P data management (ActiveXML)
  • Algebra for P2P data management (ActiveXML
    algebra)
  • Indexing in P2P (KadoP)
  • Conclusion
  • Goal of the tutorial: present issues and
    technology in P2P information management
  • Warning: it is very biased; it is not a survey

8
Outline
  1. Introduction: the data ring
  2. Calculus for P2P data management (ActiveXML)
  3. Algebra for P2P data management (ActiveXML
    algebra)
  4. Indexing in P2P (KadoP)
  5. Conclusion

9
1. Introduction: the data ring
10
The information in a data ring
  • Data: tuples, collections, documents, relations
  • Services: data sources, possibly some processing
  • Meta-data about resources: attribute/value
    pairs, annotations
  • Ontologies: to explain data and metadata
  • View definitions
  • Data integration information, e.g., mappings
    between ontologies
  • Physical data: indices and materialized views

11
Functionalities of the data ring
  • Storage, persistence, replication
  • Indexing, caching, querying, updating,
    optimization
  • Schema management, access control
  • Fault tolerance, self tuning, monitoring
  • Resource discovery, history, provenance,
    annotations, multilingualism
  • Semantic enrichment, uncertain data
  • Each functionality may be achieved by a peer or
    by the network

12
And now, what is a peer?
  • A mainframe database
  • A file system
  • A Web server
  • A PC
  • A PDA
  • A telephone
  • A sensor
  • A home appliance
  • A car
  • A manufacturing tool
  • A telecom equipment
  • A toy
  • Another data ring

Any connected device or software with some
information to share
A net address and some names of resources (e.g.
document or service)
13
Advantages and disadvantages of P2P
Advantages
  • Scaling
  • Performance
      • Optimization of parallelism
      • Avoid bottleneck
      • Replication
  • Availability
      • Replication
  • Cost
      • Avoid the cost of a server
      • Share operational cost
  • Dynamicity
      • Add/remove data sources
Disadvantages
  • Complexity
  • Performance
      • Cost for complex queries
      • Communication cost
  • Availability
      • Peers can leave
  • Consistency maintenance
      • Difficult to support transactions
  • Quality
      • Difficult to guarantee quality

14
Crash course on Web standards
  • Data exchange format: XML
      • Labeled ordered trees
      • Its main asset: XML Schema
      • There is much more
  • Distributed computing protocol: Web services
      • SOAP: Simple Object Access Protocol
      • WSDL: Web Services Description Language
      • UDDI: Universal Description, Discovery and
        Integration
      • BPEL: Business Process Execution Language
  • Query languages: XPath and XQuery
      • Declarative query language for XML, full-text,
        update language
  • Knowledge representation: OWL or RDF/S
15
Information used to live in islands, but with the
Web this is changing: uniform access to
information, the dream for distributed data
management
16
Do you like the standards?
  • It is the wrong question!
  • Correct questions: What can you do with them? What
    is missing?
  • Is XQuery the ultimate query language for the
    Web? No
  • It is a language for querying centralized XML
  • We will see what it is missing
  • We will not talk much about semantics

17
Automatic and distributed management of the data
ring
  • No centralized server
  • No information administrator (no info manager)
  • Most users are non-experts
  • E.g., scientists
  • Requirements
  • Ease of deployment (zero-effort)
  • Ease of administration (zero-effort)
  • Ease of publication (epsilon-effort)
  • Ease of exploitation (epsilon-effort)
  • Participation in community building, notably via
    annotations

Happy database admin
18
What should be made automatic
  • Self-statistics from the monitoring of the data
    ring
  • In particular, define the statistics that are
    needed
  • Self-tuning based on the self-statistics
  • Choose the most appropriate organization
  • Decide to install access structures: indexes,
    views, etc.
  • Control replication of data and services
  • Self-healing
  • Recovery from errors
  • E.g., replacement of a failing Web service
  • And automatic file management

19
Any hope?
  • Technology exists (database self-tuning, machine
    learning, etc.)
  • But self-tuning for databases has advanced very
    slowly
  • Why can this work?
  • There is no alternative (for db, this was just a
    cool gadget)
  • KISS (keep it simple, stupid!)
  • The power of parallelism
  • This assumes that lots of machines have free
    cycles (true) and that bandwidth is generous (not
    always true)

20
Distributed access control
  • Goal: control access to ring resources
  • Access to resources is based on access rights
    (ACL)
  • Who is controlling ACLs?
  • Option 1: a node manages ACLs for a collection of
    distributed resources
      • Easy, but against the P2P spirit and a possible
        bottleneck
  • Option 2: the network manages access control
      • Anybody can get the data
      • The data is published with encryption and
        signatures; only nodes with proper access rights
        can perform reads/writes
  • Some techniques exist

21
Monitoring
  • What is monitored?
  • Web service calls and database updates
  • The Web
  • Web pages
  • RSS feed
  • What is produced?
  • A stream of events
  • As a continuous service
  • As an RSS feed
  • As a Web site/page
  • Info-surveillance
  • Self-statistics and tracing
  • Basis for error diagnosis

22
Streams are everywhere
  • In query processing
  • In indexing (KadoP)
  • In recursive queries (AXML-QSQ)
  • In messaging, monitoring and pub/sub
  • That is why we will use an algebra over streams
    of trees and not simply an algebra over trees

23
Example: Edos distribution system
  • A system for the management of Linux distributions
  • Joint work with Mandriva Software and U. Tel Aviv
  • Community of open-source developers: thousands
  • System releases: about 10 000 software packages,
    plus metadata
  • Functionalities
  • Query the metadata
  • Query subscription
  • Retrieve packages
  • Publish a new release or update an existing one

24
Example: WebContent
  • WebContent: an ANR platform for the management of
    web content
  • Web surveillance
  • Business, technical, web watching
  • Participation of Gemo
      • WP3: knowledge
      • WP5: P2P content management
  • Partners: CEA, EADS, Thales, Bongrain, Xyleme,
    Exalead, many research groups (UVSQ, Grenoble,
    Paris 6, etc.)

25
Taxonomy of such applications
  • Parameters
  • Number of peers and quantity of data
  • How volatile the peers are
  • The query/update workload
  • The functionalities that are desired
  • Edos: peers and documents in the thousands, updates
    mostly appends, peers not too volatile
  • An extreme: a Google search engine in P2P for
    billions of documents using millions of
    hyper-volatile peers
  • Mostly interested in the first case

26
Thesis
  • XQuery is fine for local XML processing and
    publishing
  • Not sufficient for distributed data management
  • The success of the relational model, i.e., of
    tables on a server, rests on:
      • A logic for defining tables
      • An algebra for describing query plans over tables
  • By analogy, for trees in a P2P system we need:
      • A logic for defining distributed tree data and
        data services
      • An algebra for describing query plans over these
  • Proposal: ActiveXML logic and algebra

27
Outline
  1. Introduction: the data ring
  2. Calculus for P2P data management (ActiveXML)
  3. Algebra for P2P data management (ActiveXML
    algebra)
  4. Indexing in P2P (KadoP)
  5. Conclusion

28
2. Active XML: a logic for distributed data
management
29
The basis
  • AXML is a declarative language for distributed
    information management and an infrastructure to
    support the language in a P2P framework
  • Simple idea: XML documents with embedded service
    calls
  • Intensional data
  • Some of the data is given explicitly whereas for
    some, its definition (i.e. the means to acquire
    it when needed) is given
  • Dynamic data
  • If the data sources change, the same document
    will provide different information

30
Example (omitting syntactic details)
<resorts state="Colorado">
  <resort>
    <name> Aspen </name>
    <sc> Unisys.com/snow("Aspen") </sc>
    <depth unit="meter">1</depth>
    <hotels ID="AspHotels">
      … Yahoo.com/GetHotels(<city name="Aspen"/>)
    </hotels>
  </resort>
</resorts>
  • May contain calls
      • to any SOAP web service
        (e-bay.net, google.com)
      • to any AXML web service
        (to be defined)
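
To make the idea of embedded service calls concrete, here is a minimal sketch in Python. It is not the ActiveXML implementation: the `<sc service="..." arg="..."/>` attribute syntax, the `SERVICES` table and the returned hotel names are assumptions made for illustration, and the services are local stubs rather than SOAP calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical local stubs standing in for remote Web services.
SERVICES = {
    "Unisys.com/snow": lambda resort: "<depth unit='meter'>1</depth>",
    "Yahoo.com/GetHotels": lambda city: "<hotel>Hotel A</hotel><hotel>Hotel B</hotel>",
}

# An intensional document: part of the data is given as calls (assumed syntax).
DOC = """
<resorts state="Colorado">
  <resort>
    <name>Aspen</name>
    <sc service="Unisys.com/snow" arg="Aspen"/>
    <hotels ID="AspHotels">
      <sc service="Yahoo.com/GetHotels" arg="Aspen"/>
    </hotels>
  </resort>
</resorts>
"""


def materialize(node):
    """Replace every <sc> child by the XML forest returned by its service."""
    new_children = []
    for child in list(node):
        if child.tag == "sc":
            result = SERVICES[child.get("service")](child.get("arg"))
            # Wrap the result so that a forest of several trees can be parsed.
            forest = ET.fromstring(f"<r>{result}</r>")
            new_children.extend(list(forest))
        else:
            materialize(child)
            new_children.append(child)
    for child in list(node):
        node.remove(child)
    node.extend(new_children)
    return node


tree = materialize(ET.fromstring(DOC))
print(ET.tostring(tree, encoding="unicode"))
```

The sketch activates every call eagerly and does not re-examine results for further calls; when to activate a call and what to do with its result are exactly the issues raised on the following slides.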

31
Marketing ? Philosophy
Active answer: intensional and dynamic and
flexible. Embedding calls in data is an old idea
in databases.
Manon: What's the capital of Brazil?
Dad: Let's ask Wikipedia.com!
Manon: How do I get a cheap ticket to Galapagos?
Dad: Let's place a subscription on LastMinute.com!
Manon: What are the countries in the EC?
Dad: France, Germany, Holland, Belgium, and, hmm...
Let's ask YouLists.com for more!
32
Active XML peer
AXML peer
soap
  • Peer-to-peer architecture
  • Each Active XML peer
  • Repository: manages Active XML data
  • Web client: calls the services inside a document
  • Web server: provides (parameterized)
    queries/updates over the repository as web
    services
  • Exchange of AXML instead of XML

33
What is an AXML peer?
  • PC
      • Now open source (ObjectWeb); queries in OQL
  • Peer on a mass storage system
      • eXist (open source XML database): queries in
        XQuery
      • Xyleme: queries in XyQL
  • PDA or cell phone
      • Persistence in the file system, and XPath
  • Ongoing: the entire network
      • Data is stored in a P2P network: KadoP
  • More: Java Card, a relational database

34
A key issue: call activation
  • When to activate the call?
      • Explicit pull mode: active databases
      • Implicit pull mode: deductive databases
      • Push mode: query subscription
  • What to do with its result?
  • How long is the returned data valid?
      • Mediation and caching
  • Where to find the arguments?
      • Under the service call: XML, XPath or a service
        call

35
Another key issue: what to send?
  • Send some AXML tree t
  • As the result of a query or as a parameter of a call
  • The tree t contains calls; do we have to evaluate
    them?
  • If we do, we may introduce new service calls; do we
    have to evaluate all these calls before
    transmitting the data?
  • "Hi John, what is the phone number of the Prime
    Minister of France?"
  • Find his name at whoswho.com, then look in the
    phone directory
  • Look in the yellow pages for de Villepin, in the
    phone directory of www.gov.fr
  • (33) 01 56 00 01

36
Active XML: cool idea, complex problems
  • Blasphemous claim
      • Active XML is the proper paradigm for data
        exchange!
      • Not XML, not XQuery
  • Brings into a single setting
      • distributed db, deductive db, active db, stream
        data
      • warehousing, mediation
  • Is this unreasonable? Yes!
  • Plenty of work ahead to make it work
  • But first, the algebra

37
Outline
  • Introduction: the data ring
  • Calculus for P2P data management (ActiveXML)
  • Algebra for P2P data management (ActiveXML
    algebra)
  • Query processing
  • Query optimization
  • Indexing in P2P (KadoP)
  • Conclusion

38
3. Active XML algebra
39
Motivation
  • Relational model: centralized tables
      • optimization: algebraic expressions and
        rewriting
  • Active XML model: distributed trees
      • optimization: algebraic expressions and
        rewriting
  • Distributed query optimization based on algebraic
    rewriting of Active XML trees
  • Based on experiences with AXML optimization

40
Active XML peers
output stream
  • We focus on positive AXML
  • Set-oriented data
  • Positive/monotone services
  • Services: tree-pattern-query-with-join (TPQJ) queries
  • Services produce streams
  • Optimized by a local query optimizer
  • Evaluated by a local query processor
  • Out of our scope

[Figure: local query processing at a peer p, with a join over two input streams]
41
The problem
  • An AXML system
  • A set of peers
  • For each peer: a set of documents and services
  • Extensional data is distributed
  • Intensional data (knowledge) is distributed
  • Defined using query services (TPQJ queries)
  • These services are generic: any peer can evaluate
    a query
  • A query q to some peer
  • Evaluate the answer to q with optimal response
    time

42
AXML algebra
  • (AXML) algebraic expressions

AXML logic
d@p
Each such expression lives at some peer. Includes
the AXML trees.
43
Algebraic expressions annotations
  • Executing service call ?
  • Terminated service call ?
  • Subtlety
  • q_at_p(5) definition of intensional data
  • eval(q_at_p(5)) request to evaluate it during
    query optimization
  • ? q_at_p(5) query is being evaluated during query
    processing
  • ? q_at_p(5) query evaluation is complete

44
Evaluation rules local rules
for l ? sc, s ? send, receive
45
Evaluation rules transfer rules
?
  • Site p asks p to do the work and send the result
    to p

46
Evaluation rules more transfer rules
x_at_p
?


Z
  • When a query is evaluated, results appear
  • They are sent to the place that requested them
  • Also some rules for eof

47
Evaluation
  • Reminder setting
  • An AXML system
  • A request to evaluate query q at peer p: eval@p(q)
  • Rewrite the trees in peer workspaces until
    termination of the process
  • Results
  • For positive AXML, this process converges to a
    possibly infinite state
  • This process computes the answer to q
  • May be fairly inefficient: need for optimization!

48
Optimization
  • More rewrite rules to evaluate a query more
    efficiently

49
Query optimization
  • Well-known optimization techniques for
    distributed data management
  • Pushing selections
  • Semijoin reducers
  • Horizontal, vertical, hybrid decomposition
  • Recursive query processing and query-subquery
  • Some specific AXML optimizations
  • Pushing queries over service calls
  • Lazy service call evaluation
  • Optimizing subscription management
  • All are captured by the algebraic framework

50
Example: pushing selections
Suppose q = q1(σ(q2))
  • The same rule applies if d@p2 is replaced by a
    continuous query

51
Example: interleaving of processing and
optimization
  • At peer i: di = ri ∪ d(i+1)
  • Query at p1: σ(d1)
  • σ(d1) → σ(r1) ∪ σ(d2);
    eval@p1(σ(r1) ∪ σ(d2)) → eval@p1(σ(r1)) ∪ eval@p1(σ(d2));
    eval@p1(σ(r1)) → σ(r1) executing (starts streaming data)
  • σ(d2) → σ(r2) ∪ σ(d3); σ(r2) starts streaming
    data
  • σ(d3) → σ(r3) ∪ σ(d4)

52
Transfer and load balancing rules
Peer p1 delegates the evaluation of E to p2
53-57
Transfer and load balancing rules
[Figure, built up over five slides: the tree x@p1 holding eval@p1 over eval(E) is rewritten step by step; node labels include send@p1, send@p2, newRoot@p2() and eval@p2]
Peer p1 delegates the evaluation of E to p2
58
Back to interleaved execution and optimization

[Figure: the streams over r1, r2, r3 and r4; labels: repeated transfers, hierarchical stream merging]
Data transfers are reduced. More work for p1: merging
all the streams.
59
Example: horizontal and vertical decomposition
  • A relation d over ABC that is split both
    horizontally and vertically
  • d = (d1 ∪ d2) ⋈ d3
  • d1 = σ_{B<5}(d') and d2 = σ_{B>5}(d')
  • d', d1, d2 are over AB and d3 is over BC; each di is at
    a peer pi
  • Consider the query σ_{B=0}@p(d)
  • → σ_{B=0}@p( (σ_{B<5}(d') ∪ σ_{B>5}(d')) ⋈ d3@p3 )
  • → σ_{B=0}@p( d1@p1 ⋈ d3@p3 )
    (d2 is dropped: B = 0 contradicts B > 5)
  • → ⋈@p( x@p, y@p ), where x@p receives d1@p1 and
    y@p receives d3@p3
  • → send@p1(x@p, σ_{B=0}@p1(d1@p1))
  • → send@p3(y@p, d3@p3)
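
The rewriting above can be checked on a toy instance. The following Python sketch is only an illustration: the tuples, the `FRAGMENTS` table and the function names are invented, and peers are simulated by plain lists. It compares the naive plan (rebuild d, then select) with the rewritten plan (prune the fragment whose predicate contradicts B = 0 and push the selection to the remaining peer).

```python
# Toy data, invented for illustration.
D_PRIME = [(1, 0), (2, 3), (3, 7), (4, 9)]   # d' over (A, B)
D3 = [(0, "u"), (3, "v"), (7, "w")]          # d3 over (B, C), held by p3

d1 = [t for t in D_PRIME if t[1] < 5]        # horizontal fragment at p1
d2 = [t for t in D_PRIME if t[1] > 5]        # horizontal fragment at p2

# Each fragment is described by the predicate on B that defines it.
FRAGMENTS = {
    "p1": (lambda b: b < 5, d1),
    "p2": (lambda b: b > 5, d2),
}


def join(ab, bc):
    """Natural join on B of a relation over (A, B) with one over (B, C)."""
    return [(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2]


def naive(b_value):
    """Ship every fragment to p, rebuild d, then apply the selection."""
    d = join(d1 + d2, D3)
    return [t for t in d if t[1] == b_value]


def optimized(b_value):
    """Skip fragments that cannot match; select at the source peer."""
    shipped = []
    for peer, (admits, fragment) in FRAGMENTS.items():
        if admits(b_value):  # otherwise the fragment is pruned entirely
            shipped += [t for t in fragment if t[1] == b_value]
    return join(shipped, D3), len(shipped)


answer, tuples_shipped = optimized(0)
print(answer, tuples_shipped)
```

Both plans return the same answer, while the rewritten plan ships a single tuple from p1 and never contacts p2.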

60
Common sub-expression elimination
  • eval_at_p(E), x_at_p?receive_at_p(E)? ?
  • eval_at_p(x_at_p), x_at_p?receive_at_p(E)?

eval_at_p
x_at_p
?


receive_at_p
x_at_p
61
Common sub-expression elimination
62
Example recursive query processing
  • Using a pseudo-Datalog syntax
  • s1@p(x, y) ← d2@p'(x, z), s2@p'(z, y)
  • s2@p'(x, y) ← d1@p(x, z), s1@p(y, z)
  • After rewriting
  • on p x_at_p? ? receive_at_p(q1_at_p'(d2_at_p', s2_at_p') ) ?
  • root_at_p? ? send_at_p(y_at_p', q2_at_p(? d1_at_p, ?x_at_p) ) ?
  • on p' root_at_p'? ? send_at_p'(x_at_p,
  • ? q1_at_p'(d2_at_p', y_at_p'? ? receive_at_p'(s2_at_p') ? ) )
    ?

63
Generic and global services
  • q@any, where q is a query
  • Any peer that has some query processor for q can
    do it
  • f@any, where f is a processing service call
  • Example: decryption or gene comparison
  • q over a P2P collection

[Figure: eval@p of a generic query q is rewritten into eval@p of q@p1 or q@p2; a query q over a collection is rewritten using its index]
64
The AXML algebra conclusion
  • Captures distributed XML query
    processing/optimization
  • Based on a communication model à la CCS
  • Algebraic, stream-oriented
  • Orthogonal to the local XML query optimizer
  • Orthogonal to the network support (DHT, small
    world, etc.)
  • What is not yet available? A cost model and
    heuristics

65
Outline
  • Introduction: the data ring
  • Calculus for P2P data management (ActiveXML)
  • Algebra for P2P data management (ActiveXML
    algebra)
  • Indexing in P2P (KadoP)
  • Conclusion

66
4. P2P XML indexing and query processing
67
Efficient evaluation of tree-pattern-queries
  • Many optimization techniques
  • We are interested here in distributed query
    evaluation/optimization
  • 1) We consider XML indexing
  • 2) Holistic twig join that is based on indexing
  • 3) P2P indexing
  • 4) P2P query processing
  • 5) Optimizing P2P indexing

68
XML indexing: structural identifiers
[Figure: an example document tree (element nodes A to G and the text node John), each node annotated with its (prefix, postfix, level) identifier]
X ancestor of Y <=> pre(X) < pre(Y) and post(X) > post(Y)
X parent of Y <=> X ancestor of Y and level(X) = level(Y) - 1
Structural IDs: Prefix-Postfix-Level
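
A minimal Python sketch of such identifiers, assuming a small invented document whose tags are unique so that they can serve as dictionary keys. One depth-first traversal assigns the prefix number on entering a node and the postfix number on leaving it.

```python
import xml.etree.ElementTree as ET

# Invented example document; tags are unique on purpose.
DOC = "<A><B><D/><E/></B><C><F><G/></F></C></A>"


def label(root):
    """Return {tag: (pre, post, level)} from one depth-first traversal."""
    ids = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, level):
        counters["pre"] += 1          # numbered on entry
        pre = counters["pre"]
        for child in node:
            visit(child, level + 1)
        counters["post"] += 1         # numbered on exit
        ids[node.tag] = (pre, counters["post"], level)

    visit(root, 0)
    return ids


def is_ancestor(x, y):
    """x and y are (pre, post, level) triples."""
    return x[0] < y[0] and x[1] > y[1]


def is_parent(x, y):
    return is_ancestor(x, y) and x[2] == y[2] - 1


ids = label(ET.fromstring(DOC))
print(ids)
```

The point of the encoding is that ancestor and parent tests need only the two identifiers, not the document, which is what lets an index answer structural conditions.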
69
Holistic Twig Join
  • Input: a document and a tree pattern query
  • Find the bindings of the query in the document
  • Holistic: the whole and not just the parts
  • Twig: a small branch
  • Join: you know
  • Sounds like Harry Potter?

70
Query evaluation over a document
[Figure: a tree pattern query over A, C, D and John, with one stream of Ids per query node, e.g. Ids for A: (1,8,0)]
Ids are sorted in lexicographical order. The goal is
to find matching Ids.
71
The Holistic Twig Join Algorithm
[Figure: an example document tree drawn against a level axis (0 to 4), with nodes r (1,25); b (10,11), a (16,17), b (19,22); c (11,11), c (17,17), b (20,21); c (22,22), c (21,21)]
72
The Holistic Twig Join Algorithm
[Figure: a run of the algorithm over the streams a1 to a7, b1 to b6 and c1 to c11, with one stack per query node (a, b, c); matches produced include (a7, b4, c8), (a7, b5, c8), (a7, b4, c9) and (a7, b6, c11)]
Legend: end of the stream; head of the stream; find
the match for the query sub-tree determined by this
node; the ID is also present in the stack.
73
P2P XML processing
74
XML indexing in Xyleme
  • History
  • 1999: INRIA research project
  • 2000: creation of a spin-off
  • 2006: about 25 people
  • Technology
  • A scalable XML repository
  • A content warehouse
  • On a cluster of Linux PCs
  • XML query processing
  • Twig join
  • Index is distributed
  • Keyword-based vs. document-based

hash(C)
LAN
hash(John)
Put(Cd,p,6,6,1)
Put(Johnd,p,3,1,2)
75
Query processing over a distributed collection
[Figure: the same tree pattern query over A, C, D and John, with one stream of Ids per query node, e.g. Ids for A: (p12,d456, 1,7,0)]
Ids include peerId and docId. Ids are sorted in
lexicographical order. The goal is to find matching
Ids in the collection.
76
XML indexing in KadoP
  • Use structural Ids
  • Publish them via a DHT
  • Distributed Hash Table
  • Peers come and go
  • Locate(k): log(n) messages to find the peer in
    charge of key k
  • Put(k,v)
  • Get(k): retrieves all the values for k
  • We use Pastry
  • We also tried P2PSim and JXTA

hash(C)
DHT
posting for C
hash(John)
put(Cd,p,6,6,1)
put(Johnd,p,3,1,2)
put(Cd,p,6,6,1)
77
XML query processing in KadoP
  • Given a tree pattern query Q
  • Evaluate an index query indexQ to locate the
    peers that can provide some answers
  • indexQ is a twig join
  • Ship Q to these peers and evaluate it there
  • If indexQ is imprecise, there are many false positives
  • Example: ship Q to all peers (maximal
    parallelism)
  • Example: instead of structural Ids, just use
    (label/word,peerId,docId)
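
The two-step evaluation can be sketched for the imprecise variant, where the index stores only (label/word, peerId, docId). The index contents and function names below are invented for illustration; a real indexQ over structural Ids would be a twig join rather than a set intersection.

```python
# label/word -> set of (peerId, docId); contents invented for illustration.
INDEX = {
    "A": {("p1", "d1"), ("p2", "d7")},
    "C": {("p1", "d1"), ("p2", "d7"), ("p3", "d9")},
    "John": {("p1", "d1"), ("p3", "d9")},
}


def index_query(labels):
    """Documents containing every label of the query (labels is non-empty)."""
    postings = sorted((INDEX.get(label, set()) for label in labels), key=len)
    candidates = set(postings[0])      # start from the shortest posting
    for posting in postings[1:]:
        candidates &= posting
    return candidates


def peers_to_contact(labels):
    """Peers to which the tree pattern query Q is then shipped."""
    return sorted({peer for peer, _doc in index_query(labels)})


print(index_query(["A", "C", "John"]), peers_to_contact(["A", "C", "John"]))
```

A candidate document contains all the labels but may still fail the structural conditions of Q, so each contacted peer has to evaluate Q itself; those are the false positives mentioned above.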

78
KadoP architecture
KadoP peer publish query
Semantic layer
Web interface
External Layer
ActiveXML engine
KadoP Engine
Indexing
Logical Layer
Query processing
DHT: locate, put, get, delete
Physical Layer
Index
79
Some technical issues
  • Our goal: manage millions of documents with a
    large number of peers
  • First experiments were a disaster
  • Replaced the file-system index storage of the DHT
    with storage in a database (Berkeley DB)
  • Extended the API of the DHT to have Append and not
    only Read/Write
  • Extended the API of the DHT to have a streaming
    exchange of postings
  • Useful because the XML algebra works better with
    streams
  • Now it scales, but there is the issue of long
    postings

80
The issue of long postings: Google in P2P
  • Using keyword distribution
  • Suppose
      • The peer for Ullman is in Europe
      • The peer for XML is in the US
  • We have to ship one long posting between the US and
    Europe
  • For a large number of users, we absorb all the
    bandwidth of the Internet backbone
  • Need for replication
  • Even for thousands of peers, the exchange of long
    postings is an issue

[Figure: the query "Ullman xml?" sent to the DHT, where one peer holds the posting for Ullman and another the posting for xml]
81
Intensional indexing in KadoP
Distributed B-tree
  • Long posting: bad response time
  • No long posting
      • get h(name), then parallel fetch
  • Possibility to optimize further
      • f(docId55..docId75)
      • maybe it does not match
      • then there is no need to call f
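
A sketch of the idea in Python, under stated assumptions: the posting for a name is stored as a short list of calls (named f, g, h, i as in the figure), each annotated with the docId range it covers. The docIds, `make_chunk` and `matching_docs` are invented for illustration, and the fetches run sequentially here although the slide suggests fetching in parallel.

```python
calls_made = []  # records which chunks were actually fetched


def make_chunk(name, doc_ids):
    """Return (first docId, last docId, fetch function) for one chunk."""
    def fetch():
        calls_made.append(name)
        return doc_ids
    return (min(doc_ids), max(doc_ids), fetch)


# The intensional posting stored under h(Name): ranges plus calls, no data.
INTENSIONAL_POSTING = [
    make_chunk("f", [55, 60, 75]),
    make_chunk("g", [80, 91]),
    make_chunk("h", [120, 130]),
    make_chunk("i", [200]),
]


def matching_docs(candidates):
    """Intersect the posting with candidate docIds from another term."""
    result = []
    for first, last, fetch in INTENSIONAL_POSTING:
        if not any(first <= doc <= last for doc in candidates):
            continue  # the range cannot match: no need to call this chunk
        result.extend(doc for doc in fetch() if doc in candidates)
    return result


print(matching_docs({80, 125, 300}), calls_made)
```

Only the chunks whose range contains a candidate are fetched, so a selective second term keeps most of the long posting from ever crossing the network.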

[Figure: on one side, h(Name) points to a single long posting; on the other, h(Name) points to a short list of calls f, g, h, i]
82
More optimization
  • Standard for P2P keyword search
      • Gap compression and adaptive set intersection
  • Standard distributed query optimization
    techniques
      • Ship the smallest list
      • Load balancing
      • Caching
      • Replication
      • Semi-join techniques, notably Bloom semi-join
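
As an illustration of the last item, here is a minimal Bloom semi-join sketch in Python. The filter size, the number of hash functions and the docId lists are arbitrary assumptions, and both peers are simulated in one process.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter kept in one integer used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means certainly absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bloom_semi_join(small_posting, large_posting):
    """Intersect two postings held by different peers."""
    # Peer 1 builds the filter and ships it instead of its posting.
    bloom = BloomFilter()
    for doc in small_posting:
        bloom.add(doc)
    # Peer 2 ships back only the entries that pass the filter.
    survivors = [doc for doc in large_posting if bloom.might_contain(doc)]
    # Peer 1 removes the false positives with an exact check.
    exact = set(small_posting)
    return [doc for doc in survivors if doc in exact], len(survivors)


result, shipped = bloom_semi_join([3, 8, 21], [1, 3, 5, 8, 13, 21, 34])
print(result, shipped)
```

The filter never produces false negatives, so the result is exact; what varies with the filter size is only how many extra entries are shipped back.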

83
Outline
  • Introduction: the data ring
  • Calculus for P2P data management (ActiveXML)
  • Algebra for P2P data management (ActiveXML
    algebra)
  • Indexing in P2P (KadoP)
  • Conclusion

84
5. Conclusion
85
Conclusion
  • Logic for distributed data management
      • Opinion: XQuery is a language for local XML
        management
      • Proposal: ActiveXML
  • Algebraic foundation of distributed query
    optimization
      • Proposal: ActiveXML algebra
  • P2P (Active) XML indexing
      • KadoP is now being tested and we are working on
        optimization
  • Software
      • ActiveXML is open-source: see activexml.net
      • KadoP: soon; already available upon
        request
      • EDOS distribution system as well

86
Lots of related work and related systems
  • This is going very fast in system developments
  • Structured P2P nets: Pastry, Chord
  • Content delivery nets: Coral, Akamai
  • XML repositories: Xyleme, MonetDB
  • Multicast systems: Avalanche, Bullet
  • File sharing systems: BitTorrent, Kazaa
  • Pub/Sub systems: Scribe, Hyper
  • Distributed storage systems: OceanStore,
    GoogleFS
  • Etc.
  • Fundamental research is somewhat left behind

87
Issues
  • P2P query optimization
  • P2P access control
  • P2P archiving
  • P2P self tuning
  • P2P monitoring
  • P2P knowledge management: SomeWhere
  • Also analysis and verification of these systems
  • E.g., termination, error detection, diagnosis

88
Find your own topic
  • Pick your favorite problem for data or knowledge
    management and study it in a P2P setting
      • with gigabytes of data and thousands of peers
  • If you find it boring, consider it
      • with terabytes of data and millions of peers

89
Thank you