MAPPING DATA IN PEER-TO-PEER SYSTEMS: SEMANTICS AND ALGORITHMIC ISSUES

1
MAPPING DATA IN PEER-TO-PEER SYSTEMS: SEMANTICS AND ALGORITHMIC ISSUES
Department of Computer Science, University of Toronto
Anastasios Kementsietsidis, Marcelo Arenas, Renee J. Miller
Presented by Ahmet OLGUN and Suzan BAYHAN
2
OUTLINE
  • 1-ABSTRACT
  • 2-INTRODUCTION
  • 3-MOTIVATING EXAMPLE
  • 4-MAPPING TABLES
  • 5-MAPPING AS CONSTRAINTS
  • 6-CONSISTENCY AND INFERENCE
  • 7-THE ALGORITHM
  • 8-EXPERIMENTAL RESULTS
  • 9-CONCLUSIONS

3
ABSTRACT
  • PROBLEM OF MAPPING DATA IN PEER-TO-PEER DATA
    SHARING SYSTEMS(PPDSS)
  • MAPPING TABLES LISTING CORRESPONDING VALUES IN A
    PPDSS
  • WHY TABLES ARE APPROPRIATE
  • A LANGUAGE TO SPECIFY MAPPING TABLES UNDER
    DIFFERENT SEMANTICS
  • COMPLEXITY OF THE PROBLEM
  • AN EFFICIENT ALGORITHM FOR ITS SOLUTION
  • IMPLEMENTATION WITH EXPERIMENTAL RESULTS
  • HYPERION PROJECT

4
INTRODUCTION
  • Traditionally, data integration and exchange between
    heterogeneous data sources is provided mainly
    through the use of views, i.e., queries
  • Sources share their schemas and cooperate
  • BUT IN OUR WORK SUCH CLOSE COOPERATION IS EITHER
  • Not desirable (PRIVACY), or
  • Not feasible (maybe due to resource limitations)

5
SIMILARITY WITH FILE-SHARING SYSTEMS
  • TO FIND DATA WHEN THERE IS NO AGREEMENT ON THE
    LOGICAL DESIGN OF DATA,
  • FOCUS ON VALUES AND HOW THEY CORRESPOND
  • IN FILE-SHARING SYSTEMS LIKE NAPSTER AND GNUTELLA,
    QUERYING IS DONE BY SIMPLE VALUE SEARCH ON FILE
    NAMES
  • QUERIES ARE OF THE FORM
  • RETRIEVE ALL FILES NAMED X
  • EASY BECAUSE THERE IS A CONSENSUS ON
    NAMES

6
WHAT IF NO ACCEPTED NAMING STANDARD???
  • Each peer has to develop its own naming standard
  • Conforming to external standards is time-consuming
    and expensive
  • So, to search data in such environments
    → MAPPING TABLES that store the correspondence between
    values.
  • At their simplest, mapping tables are binary tables
    that associate corresponding identifiers from two
    different sources
  • Mapping Tables represent EXPERT KNOWLEDGE

7
MOTIVATING EXAMPLE
  • DOMAIN: BIOLOGICAL DATABASES
  • GENE DATABASE → GDB
  • PROTEIN DATABASE → SwissProt
  • GENETIC DISORDERS AND RELATED GENES
    DATABASE → MIM

8
EXAMPLE (CONTD)
  • Integration of these resources is extremely
    desirable for scientists to have uniform access,
    BUT SEEMS UNATTAINABLE due to political, financial
    and technical reasons.
  • Among the technical reasons: heterogeneity of
    sources, such as formatted files, spreadsheets and
    relational databases

9
MAIN CHARACTERISTICS AND USE OF MAPPING TABLES
  • Associations within and Across Domains
  • Peer Autonomy
  • Semantics
  • Automated discovery of mappings

10
Association within and Across Domains
  • A mapping table is not necessarily a function
  • With mapping tables we associate seemingly
    unconnected databases
  • Disjoint worlds can be associated since the
    corresponding worlds are semantically close to
    each other
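
The point that a mapping table is a relation rather than a function can be illustrated with a minimal sketch. The identifiers below are invented for illustration and do not come from GDB or SwissProt; the helper names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical identifiers, invented for illustration only.
GENE_TO_PROTEIN = {
    ("gene:G1", "prot:P1"),
    ("gene:G1", "prot:P2"),   # one gene id associated with two protein ids
    ("gene:G2", "prot:P3"),
}

def build_index(table):
    """Index a binary mapping table by its left-hand values."""
    index = defaultdict(set)
    for x, y in table:
        index[x].add(y)
    return index

def translate(index, x):
    """Return every value associated with x (empty set if x is unknown)."""
    return set(index.get(x, set()))

def is_function(table):
    """True only if every left-hand value has exactly one associated value."""
    return all(len(ys) == 1 for ys in build_index(table).values())

index = build_index(GENE_TO_PROTEIN)
```

A value search at one peer can then be rewritten for the other peer by calling `translate(index, value)` and searching for each returned value.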

11
Peer Autonomy
  • Autonomy has high importance in peer-to-peer
    systems.
  • Mapping tables do not restrict the operation of
    peers in any way beyond the agreement on values
    expressed in the tables.

12
Mapping Table 1
Figure 1
13
Semantics
  • Experts have varying degrees of expertise, so we
    should show the confidence level of
    mapping tables
  • Consider a tuple (X,Y)
  • If an X value appearing in a mapping table follows
    open-world semantics, then it can be
    associated with any Y value: the table holds only
    partial information about X

14
Closed World
  • If X follows closed-world semantics, then X values
    in the table can only be associated with the
    specified Y values.
  • 4 alternatives
  • 1-OO (No specific information, no practical
    interest)
  • 2-OC (Partial knowledge)
  • 3-CO (Partial knowledge)
  • 4-CC (Complete knowledge)
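
The four alternatives can be sketched as a pair of flags. This is one possible reading, stated here as an assumption: the first letter gives the semantics for X values present in the table, the second for X values missing from it. Under this reading a missing X value is unrestricted when open and cannot be associated with anything when closed. The function and dictionary names are hypothetical.

```python
def allowed(table, x, y, present_closed, missing_closed):
    """Decide whether the association (x, y) is permitted by `table`.

    `table` is a set of (x, y) pairs. The two flags encode the assumed
    reading of the two-letter codes (present X values, missing X values).
    """
    listed_ys = {b for a, b in table if a == x}
    if listed_ys:
        # x appears in the table: closed means only the listed y values.
        return (y in listed_ys) if present_closed else True
    # x does not appear in the table: closed means no association at all.
    return not missing_closed

# (present_closed, missing_closed) for each alternative, under the assumption above.
SEMANTICS = {
    "OO": (False, False),
    "OC": (False, True),
    "CO": (True, False),
    "CC": (True, True),
}
```

With a table `{("a", "1")}`, the pair `("a", "2")` is rejected under CO and CC, while `("b", "2")` is accepted under CO and rejected under CC.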

15
Open/Closed World
Table 1: Alternative open/closed world semantics
16
Automated Discovery
  • Given a semantics for mapping tables, to reason
    about them, treat mapping tables as constraints on
    the exchange of information.
  • Simplest way to combine tables: CONJUNCTION

17
Example Mapping Tables
18
MAPPING TABLES
  • A, B, C, D: individual attributes
  • dom(A): domain of A, like integers or characters
  • U, X, Y: sets of attributes
  • R: a relational schema
  • R[U]: the attributes U of schema R
  • r: relation instance
  • t: tuple

19
MAPPING TABLES(contd)
  • $t[X]$: the values of tuple $t$ on the attributes of $X$
  • If $X = \{A_1, A_2, \ldots, A_k\}$, then
    $dom(X) = dom(A_1) \times dom(A_2) \times \cdots \times dom(A_k)$
  • To represent different semantics of mapping
    tables, it is necessary to introduce variables
  • $V$: a set of variables, where $V \cap dom(A) = \emptyset$
    for each attribute $A$
20
DEFINITION 1
  • Given a set of attributes $U$, $t$ is a mapping over $U$
    if for each $A \in U$, $t[A]$ is either a constant in
    $dom(A)$, a variable in $V$, or an expression of the
    form $v - S$, where $v \in V$ and $S$ is a finite subset of
    $dom(A)$

21
DEFINITION 2
  • Let $X$ and $Y$ be nonempty disjoint sets of
    attributes. A mapping table $m$ from $X$ to $Y$ is a
    finite set of mappings over $X \cup Y$ such that each
    variable appears in at most one mapping

22
DEFINITION 2
  • Set of mappings → mapping table
  • Table → relation containing variables
  • RESTRICTION: Each variable appears in at most one
    mapping
  • TWO DIFFERENT MAPPINGS ARE COMPLETELY INDEPENDENT

23
DEFINITION 3
  • A valuation $\rho$ over a mapping table $m$ is a
    function that maps each constant value in $m$ to
    itself and each variable $v$ of $m$ to a value in
    the intersection of the domains of the attributes
    where $v$ appears. Furthermore, if $v$ appears in an
    expression of the form $v - S$, then $\rho(v)$ is not an
    element of $S$.
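
Definitions 1 to 3 can be made concrete with a small sketch that enumerates every ground tuple a mapping table stands for. It assumes finite domains so that enumeration terminates; the `Var` class, the `extension` function and the sample values are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Var:
    """A variable entry; `excluded` plays the role of S in "v - S"."""
    name: str
    excluded: frozenset = frozenset()

def extension(table, attrs, dom):
    """Return all ground tuples obtained by applying a valuation to a mapping.

    `table` is a list of mappings, each a tuple aligned with `attrs`;
    `dom` maps each attribute name to a finite set of values (an assumption).
    Constants are copied unchanged, as a valuation maps them to themselves.
    """
    result = set()
    for t in table:
        # Candidate values per variable: intersect the domains of the
        # attributes where it appears, minus the excluded set S.
        candidates = {}
        for attr, entry in zip(attrs, t):
            if isinstance(entry, Var):
                allowed = set(dom[attr]) - entry.excluded
                candidates[entry.name] = candidates.get(entry.name, allowed) & allowed
        names = sorted(candidates)
        for values in product(*(sorted(candidates[n]) for n in names)):
            rho = dict(zip(names, values))  # one valuation of the variables
            result.add(tuple(rho[e.name] if isinstance(e, Var) else e for e in t))
    return result

dom = {"A": {"a", "b", "c"}, "B": {"a", "b", "c", "d"}}
v = Var("v", frozenset({"a"}))
table = [("a", "b"), (v, v)]   # second mapping: identity on every value except "a"
ext = extension(table, ["A", "B"], dom)
```

Because each variable appears in at most one mapping, the mappings can be expanded independently, which is what the outer loop relies on.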

24
MAPPING AS CONSTRAINTS
  • View mapping tables as constraints on the
    exchange of information between sources
  • Given a set of mapping constraints,we are able to
    infer new mapping constraints and check the
    consistency of the constraints

25
(No Transcript)
26
CONSISTENCY AND INFERENCE
  • Infer new mapping tables
    Combine the knowledge from mapping tables
    available in a network of peers
  • Determine consistency of mapping tables
    Automated inference and consistency checks will help a
    curator to see whether the semantics are valid

27
Problem Definition
  • Given a mapping constraint formula (MCF) F over
    a set of attributes U, F is consistent if there
    exists a nonempty relation r of U satisfying F.
  • Inference problem is the problem of verifying
    whether a set of MCFs implies another MCF
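
For a conjunction of constraints, consistency amounts to finding a single tuple that satisfies all of them. The brute-force sketch below makes two assumptions that are not stated on the slide: domains are finite, and a tuple satisfies a constraint when its projection on the constraint's attributes belongs to a given set of ground tuples. The names are hypothetical, and the exhaustive search is only meant to show the problem, not to solve it efficiently.

```python
from itertools import product

def find_witness(attrs, dom, constraints):
    """Search for one tuple over `attrs` that satisfies every constraint.

    Each constraint is a pair (constraint_attrs, allowed), where `allowed`
    is a set of ground tuples over `constraint_attrs`.
    Returns the witness as a dict, or None if the conjunction is inconsistent.
    """
    for values in product(*(sorted(dom[a]) for a in attrs)):
        t = dict(zip(attrs, values))
        if all(tuple(t[a] for a in c_attrs) in allowed
               for c_attrs, allowed in constraints):
            return t
    return None

attrs = ["A", "B", "C"]
dom = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1", "c2"}}
consistent = [(("A", "B"), {("a1", "b1")}), (("B", "C"), {("b1", "c1")})]
inconsistent = [(("A", "B"), {("a1", "b1")}), (("B", "C"), {("b2", "c1")})]
```

The search space grows with the product of the domain sizes, so this sketch is usable only on toy inputs.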

28
Theorems
  • Theorem: The consistency problem for
    conjunctions of mapping constraints is
    NP-complete.
  • Theorem: Even if the length of the paths or the number of
    mapping constraints is fixed, the consistency
    problem for conjunctions of mapping
    constraints is NP-complete.

29
Assumptions
  • Assumptions to solve the consistency problem
  • Number of mapping constraints per peer is small
  • The length of paths is small
  • For example in Gnutella paths have maximum
    size of 7

30
THE ALGORITHM
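  • $\theta = P_1, P_2, \ldots, P_n$: a path of peers
  • $U_i$: set of attributes at each peer $P_i$
  • $S$: set of constraints over path $\theta$
  • $\mu : X \to Y$: a mapping constraint with mapping table $m$
  • $ext(\mu) = \{\rho(t) \mid t \in m \text{ and } \rho \text{ is a valuation over } m\}$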

31
THE ALGORITHM
  • 1- $S$ is consistent iff there exists $t \in ext(\mu)$
  • 2- $\forall \mu' : X \to Y$, $S \models \mu'$ iff $ext(\mu) \subseteq ext(\mu')$
  • For inference, check 2 to decide whether $S \models \mu'$
  • For consistency, check 1.
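
Both checks reduce to simple set operations once extensions are available as finite sets of ground tuples. The sketch below assumes that the first extension is the one computed for the whole set of constraints and that check 2 is a containment between extensions; the slide's symbols are ambiguous on this point, so this reading and the function names are assumptions.

```python
def is_consistent(cover_ext):
    """Check 1: at least one ground tuple exists in the extension."""
    return len(cover_ext) > 0

def implies(cover_ext, candidate_ext):
    """Check 2 (assumed reading): every tuple allowed by the computed
    constraint is also allowed by the candidate constraint."""
    return cover_ext <= candidate_ext

cover_ext = {("a1", "d1"), ("a1", "d2")}
weaker = {("a1", "d1"), ("a1", "d2"), ("a2", "d1")}
stronger = {("a1", "d1")}
```

Here `implies(cover_ext, weaker)` holds because the candidate allows everything the computed constraint allows, while `implies(cover_ext, stronger)` does not.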

32
Design Decisions
P1, P2, P3, P4 path
33
Algorithm for computing the cover
  • P1 sends all mapping constraints to P2
  • P2 uses those constraints with its own to create
    a cover between P1 and P3
  • P2 forwards the cover to P3
  • P3 does the same thing to create a cover between P1
    and P4
  • P3 sends the computed cover back to P1
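
The forwarding scheme above can be sketched as a left-to-right fold over per-hop tables. This is a simplification under stated assumptions: each hop is a binary table of ground values (no variables, no open-world entries), and combining two tables is a join on the shared middle value. The function names and sample values are hypothetical.

```python
def compose(left, right):
    """Join two binary tables on the value they share in the middle."""
    by_mid = {}
    for mid, z in right:
        by_mid.setdefault(mid, set()).add(z)
    return {(x, z) for x, mid in left for z in by_mid.get(mid, ())}

def cover_along_path(tables):
    """Fold the per-hop tables from the first peer toward the last one."""
    cover = tables[0]
    for hop in tables[1:]:
        cover = compose(cover, hop)
    return cover

# Hypothetical tables for the hops P1-P2, P2-P3 and P3-P4.
t12 = {("a1", "b1"), ("a2", "b2")}
t23 = {("b1", "c1")}
t34 = {("c1", "d1"), ("c1", "d2")}
cover = cover_along_path([t12, t23, t34])
```

Note that `a2` drops out of the result because `b2` has no counterpart at the next hop, which is the kind of information the first peer only learns once the whole fold has finished.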

34
Problems
  • Unnecessary computation
  • Cover involving A6 can be done locally
  • Does not work in streaming fashion
  • P1 has to wait for the whole computation to
    finish to get the cover between itself and P4
  • So ?...

35
Partitions
(Figure: partitions p1 to p9 distributed over peers P1, P2 and P3)
36
Description of the Algorithm
  • Two phases
  • Information gathering
  • Computation

37
Information Gathering
  • P1 sends to P2 the set of attributes in each
    partition, BUT NO MAPPINGS
  • P2 computes inferred partitions
  • Inferred partitions reveal interdependencies,
    or the lack thereof, between partitions
  • Then the computation phase starts
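
One way to picture inferred partitions is as groups of attribute sets that are merged whenever they share an attribute, so that groups left apart are known to be independent. This is an assumed interpretation of the slide, not a statement of the authors' procedure; the function name and attribute labels are hypothetical.

```python
def merge_partitions(attribute_sets):
    """Merge attribute sets that overlap, directly or transitively.

    Only attribute names are needed, which matches the idea that no
    mappings are exchanged during this phase.
    """
    merged = []
    for attrs in attribute_sets:
        attrs = set(attrs)
        overlapping = [m for m in merged if m & attrs]
        for m in overlapping:
            attrs |= m
            merged.remove(m)   # groups in `merged` are pairwise disjoint
        merged.append(attrs)
    return merged

# Hypothetical attribute sets received from one peer and held by the next.
received = [{"A1", "A2"}, {"A3"}]
local = [{"A2", "A4"}, {"A5", "A6"}]
inferred = merge_partitions(received + local)
```

Groups that stay separate, such as the one holding `A5` and `A6` here, can be processed without waiting for any other group.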

38
Inferred Partitions
(Figure: inferred partitions at peers P1 and P2)
39
Computation Phase
  • The computation starts at the penultimate peer
  • Cover between P3 and P4 computed and sent to P2
  • Cover between P2 and P4 computed and streamed to
    P1
  • Cover between P1 and P4 computed
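
The backward, streamed computation can be sketched with generators: each peer joins its own table with the cover tuples arriving from the next peer and passes results on as soon as they are produced. As before, this assumes binary tables of ground values; the function names and sample values are hypothetical.

```python
def stream_compose(left, right_stream):
    """Yield cover tuples as each tuple of the downstream cover arrives."""
    by_mid = {}
    for x, mid in left:
        by_mid.setdefault(mid, set()).add(x)
    seen = set()
    for mid, z in right_stream:
        for x in by_mid.get(mid, ()):
            if (x, z) not in seen:   # avoid emitting duplicates
                seen.add((x, z))
                yield (x, z)

def backward_cover(tables):
    """Start from the last hop and move toward the first peer.

    tables[0] links the first two peers, tables[-1] the last two.
    """
    stream = iter(tables[-1])
    for table in reversed(tables[:-1]):
        stream = stream_compose(table, stream)
    return stream

# Hypothetical tables for the hops P1-P2, P2-P3 and P3-P4.
t12 = {("a1", "b1"), ("a2", "b2")}
t23 = {("b1", "c1")}
t34 = {("c1", "d1"), ("c1", "d2")}
streamed = set(backward_cover([t12, t23, t34]))
```

Because generators are lazy, the first peer can start consuming tuples before the peers further along the path have finished.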

40
EXPERIMENTAL RESULTS
  • Do our solutions provide added value for
    communities that already use mapping tables
    extensively?
  • Are characteristics of our algorithm appropriate
    and effective in a peer-to-peer environment?

41
Implementation
  • Geographically distributed machines with one peer
    per machine
  • Each peer has 2 modules
  • The first module interacts with the storage
    manager to retrieve mappings and compute covers
  • The second is the peer-to-peer networking protocol

42
Implementation
  • Each peer decides how much cache to use
  • Biology domain: 6 biological databases used
  • GDB, MIM, SwissProt, Hugo, Locus, Unigene
  • Table sizes range from 7000 to 28000 mappings, with
    an average of 13000.
  • B2B domain: business-to-business setting

43
Results
  • Cache sizes from 64 to 128 mappings result in
    the best running times for those data characteristics
  • B2B
  • Complex semantics for tables, but new mappings
    are still computed efficiently
  • Total execution time scales linearly with
    the number of computed mappings

44
(No Transcript)
45
(No Transcript)
46
CONCLUSION
  • Problem of managing collections of mapping tables
  • Alternative semantics for tables
  • A language that allows specification of mapping
    tables under different semantics
  • Complexity of Inference and consistency
  • An algorithm to solve the problem

47
  • ANY QUESTIONS?
  • THANK YOU...