Reconciling While Tolerating Disagreement in Collaborative Data Sharing

1
Reconciling While Tolerating Disagreement in
Collaborative Data Sharing
Nicholas Taylor, Zachary Ives
Department of Computer and Information Science
University of Pennsylvania
  • ACM SIGMOD
  • International Conference on Management of Data
  • June 27, 2006

2
Data Exchange is Needed Everywhere
  • Cell phone and PDA address books
      • contain slightly different data
  • Collaborators' citation databases
      • different abbreviation styles
      • different citation programs
  • Biologists' databases
      • collect new information from published databases
      • store information from own experiments
      • disagree about key data points

3
Traditional Data Integration
4
Conflicting Data is Inevitable
  • Independent sources, conflicting data
      • common in collaborative settings
      • not well accommodated by traditional data integration
  • Schema constraints often reveal conflicts
      • e.g. name → rating
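
As a rough illustration of how a constraint such as name → rating exposes disagreement, the sketch below groups tuples by key and reports keys that carry more than one value. The function name `fd_conflicts` and the restaurant data are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

def fd_conflicts(tuples):
    """Return {key: set of values} for keys that violate key -> value."""
    values_by_key = defaultdict(set)
    for key, value in tuples:
        values_by_key[key].add(value)
    # A key mapped to more than one value breaks the functional dependency.
    return {k: vs for k, vs in values_by_key.items() if len(vs) > 1}

# Hypothetical (name, rating) tuples from two independent sources.
source_a = [("Cafe Uno", 4), ("Deli Two", 3)]
source_b = [("Cafe Uno", 2), ("Deli Two", 3)]

conflicts = fd_conflicts(source_a + source_b)
print(conflicts)
```

Only "Cafe Uno" is reported, because the two sources agree on "Deli Two".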

5
A Model for Data Sharing
  • Global instance not possible, but conflicts are
    localized
  • Collaborative Data Sharing System (CDSS)
  • Synchronize databases by sharing transactions
  • Each participant creates its own global
    instance by deciding which transactions to apply
  • ORCHESTRA is our implementation of a CDSS

6
CDSS Overview
[Diagram: CDSS (ORCHESTRA), updates (Δ1), RDBMS, Queries and Answers]
  • User interacts with standard database
  • CDSS coordinates with other participants
  • Ensures availability of published updates
  • Finds consistent set of trusted updates
    (reconciliation)
  • This paper assumes a single schema

7
Trust Policies in a CDSS
8
Challenges of Reconciliation
  • Updates in atomic transactions
  • Causal dependencies (antecedents)
  • Intermittent participation
  • Maximal progress at each step
  • Consistent, predictable behavior
  • All transaction acceptances are final
  • Always prefer higher priority transactions
  • Frequent conflict resolution can be frustrating
  • Allow user decisions to be deferred

9
Data Sharing Operations
  • Operations involve only one participant
  • Publishing
  • Reconciliation
  • Participant applies consistent subset of updates
  • May get its own unique instance

[Diagram: Publish New Updates, Reconciliation Requests, Published Updates, Local Instance, Update Log; arrows labeled δ and request]
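
The two operations can be sketched as follows. This is a toy model under strong assumptions, not ORCHESTRA's design: `UpdateStore`, `Participant`, and the `trusts` set are hypothetical names, updates are single key/value writes, and "consistent" is reduced to "does not overwrite a value the participant already holds".

```python
class UpdateStore:
    """Hypothetical shared store that keeps published updates in order."""
    def __init__(self):
        self.log = []

    def publish(self, peer, updates):
        self.log.extend((peer, u) for u in updates)

    def fetch_since(self, position):
        # Return the unseen updates and the new read position.
        return self.log[position:], len(self.log)


class Participant:
    def __init__(self, name, store, trusts):
        self.name, self.store, self.trusts = name, store, trusts
        self.instance, self.pending, self.seen = {}, [], 0

    def update(self, key, value):
        # Local edits are applied immediately and logged for publishing.
        self.instance[key] = value
        self.pending.append((key, value))

    def publish(self):
        self.store.publish(self.name, self.pending)
        self.pending = []

    def reconcile(self):
        new, self.seen = self.store.fetch_since(self.seen)
        for peer, (key, value) in new:
            # Apply only trusted updates that do not clash with local data.
            if peer != self.name and peer in self.trusts and key not in self.instance:
                self.instance[key] = value


store = UpdateStore()
alice = Participant("alice", store, trusts={"bob"})
bob = Participant("bob", store, trusts=set())
alice.update("A", 1)
alice.publish()
bob.update("A", 2)
bob.update("B", 3)
bob.publish()
alice.reconcile()
bob.reconcile()
```

Each call touches only the calling participant and the store. After reconciling, the two participants hold different instances: alice keeps her own value for A and adopts bob's B, while bob trusts nobody and keeps his own data.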
10
Reconciliation in ORCHESTRA
  • Group transactions with antecedents and accept
    highest priority chains

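[Figure: worked example over relation R(X,Y) with dependency X → Y, spanning Reconciliation 1 and Reconciliation 2. Transactions with updates such as (A,4) (B,4); (A,3); (B,3) (C,5); (B,3) → (B,4); (A,2); (B,4) (C,5); (D,8); (D,9); and (C,6) are each marked with a decision: ✓ Accept, ✗ Reject, ⌛ Defer.]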
11
Consistent Reconciliations
  • Applied transactions may not:
      • modify non-present values
      • cause constraint violations
      • have an unapplied antecedent transaction
      • interact with each other
  • Want to avoid transient conflicts
      • Therefore, flatten chains of antecedent transactions
[Figure: flattening example with Peer 1's updates (C,6) → (D,6), (C,6), (D,6) and Peer 3's update (C,5)]
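
One way to picture flattening is to collapse a chain of updates into its net effect, so that intermediate values never reach the instance. The sketch below assumes a hypothetical encoding in which each update is an `(old, new)` pair, with `old=None` for an insertion and `new=None` for a deletion. It illustrates the idea and is not the algorithm from the paper.

```python
def flatten(chain):
    """Collapse an ordered list of (old, new) updates into their net effect."""
    net = []  # [old, new] pairs, in the order each tuple was first touched
    for old, new in chain:
        for entry in net:
            if old is not None and entry[1] == old:
                entry[1] = new  # extend an earlier update of the same tuple
                break
        else:
            net.append([old, new])
    # Drop no-ops such as insert-then-delete.
    return [(o, n) for o, n in net if o != n]


# Insert (C,6), then modify it to (D,6): the net effect is inserting (D,6).
chain = [(None, ("C", 6)), (("C", 6), ("D", 6))]
flat = flatten(chain)
print(flat)
```

Because the intermediate tuple (C,6) is gone from the flattened chain, it cannot produce a transient conflict with another peer's (C,5).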
12
Reconciliation Algorithm
  • Input: flattened, trusted, applicable transaction chains
  • Output: set A of accepted transactions
  • For each priority p from p_max down to 1:
      • Let C be the set of chains for priority p
      • If some t in C conflicts with a non-subsumed u in A, REJECT t
      • If some t in C
          • uses a deferred value, DEFER t
          • conflicts with a non-subsumed, non-rejected u in C, DEFER t
      • Otherwise, ACCEPT t by adding it to A
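
The loop above can be sketched in a few lines. This is a simplified reading of the slide, with several assumptions: chains are already flattened into key/value writes, two chains conflict when they assign different values to the same key, "uses a deferred value" means touching a key written by a chain deferred at a higher priority, and subsumption is ignored. `Chain`, `conflicts`, and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chain:
    name: str
    priority: int
    writes: tuple  # flattened net effect as ((key, value), ...)

def conflicts(t, u):
    """Two chains conflict if they assign different values to the same key."""
    theirs = dict(u.writes)
    return any(k in theirs and theirs[k] != v for k, v in t.writes)

def reconcile(chains):
    accepted, decisions, deferred_keys = [], {}, set()
    for p in sorted({c.priority for c in chains}, reverse=True):
        level = [c for c in chains if c.priority == p]
        # REJECT anything that conflicts with an already accepted chain.
        live = []
        for t in level:
            if any(conflicts(t, u) for u in accepted):
                decisions[t.name] = "REJECT"
            else:
                live.append(t)
        # DEFER chains that touch a deferred key or clash with a peer
        # of the same priority that was not rejected.
        deferred_now = [
            t for t in live
            if any(k in deferred_keys for k, _ in t.writes)
            or any(conflicts(t, u) for u in live if u is not t)
        ]
        for t in deferred_now:
            decisions[t.name] = "DEFER"
            deferred_keys.update(k for k, _ in t.writes)
        # ACCEPT everything else at this priority.
        for t in live:
            if decisions.get(t.name) != "DEFER":
                decisions[t.name] = "ACCEPT"
                accepted.append(t)
    return decisions

chains = [
    Chain("t1", 2, (("A", 3),)),
    Chain("t2", 1, (("A", 2),)),            # clashes with accepted t1
    Chain("t3", 1, (("D", 8),)),            # clashes with t4 at equal priority
    Chain("t4", 1, (("D", 9),)),
    Chain("t5", 1, (("B", 4), ("C", 5))),   # no clash
]
decisions = reconcile(chains)
print(decisions)
```

In this run t1 and t5 are accepted, t2 is rejected because a higher-priority chain already fixed A, and t3 and t4 are deferred because neither outranks the other.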

13
Flattening and Antecedents
[Figure: flattening and antecedents example over R(X,Y) with X → Y. Updates shown include (A,2) (D,6) → (D,7); (D,6); (A,2) (D,7); (A,1) (B,3) (C,4); (B,3) → (B,4) (C,5) → (E,5); (C,5); and (A,1) (B,4) (E,5). Decision legend: ✓ Accept, ✗ Reject, ⌛ Defer.]
14
System Architecture
  • Reconciliation algorithm at each participant
  • Centralized and distributed update stores
  • Hold updates
  • Compute antecedent chains

[Diagram: Publish New Updates, Reconciliation Requests, Reconciliation Algorithm, Published Updates, RDBMS]
15
Experimental Overview
  • Experimental goals
      • demonstrate feasibility of CDSS concept
      • explore efficiency of system
  • Target domain
      • bioinformatics databases, 10s to 100s of sites
      • low GBs of data, MBs of updates
      • periodic updates from multiple sites
  • Synthetic workloads
      • no real workloads with conflicts exist; stress test
      • tuples generated using skewed distribution (hot items)
      • modification if value present, otherwise insertion
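
A workload of this shape could be generated roughly as follows. The slide does not say which skewed distribution was used, so the Zipf-like weights, the `skew` parameter, the value range, and the function name `generate_updates` are all assumptions made for illustration.

```python
import random

def generate_updates(n_updates, n_keys, seed=0, skew=1.0):
    """Generate updates over a skewed key distribution (a few hot keys)."""
    rng = random.Random(seed)
    keys = list(range(n_keys))
    # Assumed Zipf-like weights: key i is picked with weight 1 / (i + 1) ** skew.
    weights = [1.0 / (i + 1) ** skew for i in keys]
    instance, updates = {}, []
    for _ in range(n_updates):
        key = rng.choices(keys, weights=weights, k=1)[0]
        value = rng.randrange(1000)
        if key in instance:
            # Value already present: emit a modification of the old value.
            updates.append(("modify", key, instance[key], value))
        else:
            # Key not present yet: emit an insertion.
            updates.append(("insert", key, value))
        instance[key] = value
    return updates, instance

updates, instance = generate_updates(200, 20, seed=1)
inserts = [u for u in updates if u[0] == "insert"]
print(len(inserts), "insertions,", len(updates) - len(inserts), "modifications")
```

Since hot keys are drawn repeatedly, most generated updates are modifications, which is what makes conflicts between peers likely.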

16
Result Quality is Robust
  • Effect of reconciliation interval on synchronicity
      • synchronicity: average number of values per key
      • ten peers each publish 500 transactions of one update
  • Infrequent reconciliation slowly changes synchronicity
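
Read literally, the synchronicity measure counts how many distinct values the peers hold for each key and averages that count. The sketch below follows that reading. The paper may define the measure more precisely, and the function name and sample instances are hypothetical.

```python
def synchronicity(instances):
    """Average number of distinct values per key across peer instances."""
    values_by_key = {}
    for instance in instances:
        for key, value in instance.items():
            values_by_key.setdefault(key, set()).add(value)
    if not values_by_key:
        return 0.0
    return sum(len(v) for v in values_by_key.values()) / len(values_by_key)

# Three hypothetical peers: all agree on A, two disagree on B.
peers = [{"A": 1, "B": 2}, {"A": 1, "B": 3}, {"A": 1}]
score = synchronicity(peers)
print(score)
```

A score of 1.0 means every peer that holds a key agrees on its value, and larger scores mean more disagreement.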

17
Fetch Times Dominate Cost
  • Effect of reconciliation interval on running time
  • ten peers each publish 500 single-update
    transactions
  • Infrequent reconciliation more efficient
  • Fetch times (i.e. network latency) dominate

18
Summary of Experiments
  • CDSS concept is feasible
  • Infrequent reconciliation has minimal effects
  • Distributed implementation is practical
  • Reconciliation is not an expensive operation
  • See paper for system stability experiments
      • Effect of increasing transaction size
          • negligible on synchronicity after size two
      • Effect of adding peers
          • worsens synchronicity sublinearly
          • increases execution time linearly

19
Related Work
  • Inconsistency repair
      • [Bry97], [ABC99]
  • Causal ordering in distributed DBs with replication
      • Optimistic Concurrency Control [KR81], Version vectors [PPR83]
  • Distributed file systems
      • Ivy [MMGC02], Coda [Braam98, KS95], Bayou [TTP96]
  • File synchronization
      • Unison [PV04], Harmony [BVP06]
  • Version control (CVS, Subversion, etc.)

20
Future Work & Conclusions
  • Future Work: Completing the ORCHESTRA platform
      • Improved performance and reliability in distributed store
      • Support for multiple schemas
      • Evaluation with real users
  • Conclusions
      • Conflicts are inevitable and irresolvable
      • Collaborative Data Sharing Systems handle conflicts using update-centric semantics for consistency
      • Performance evaluations validate CDSS approach
      • A fully distributed implementation is feasible