Title: An Efficient, Low-Cost Inconsistency Detection Framework for Data and Service Sharing in an Internet-Scale System
An Efficient, Low-Cost Inconsistency Detection
Framework for Data and Service Sharing in an
Internet-Scale System
Yijun Lu, Hong Jiang, and Dan Feng
University of Nebraska-Lincoln, USA
Huazhong University of Science and Technology, China
Introduction
- Consistency control is important
- Active replication is essential to data security
- Systems need to handle updates
- Thus, consistency needs to be maintained
- Challenges
- Consistency requirements are difficult to predict
- The overhead of maintaining consistency is high
- In Grid-like systems, the network is unreliable
Two Flavors
- Inconsistency avoidance
- Aims to avoid inconsistency in the first place. Incurs a high maintenance cost and supports only a specific application.
- Examples
- Strong consistency
- NFS consistency
- etc.
- Optimistic consistency protocol?
- Pre-defined
- Inconsistency detection
- Our new approach
- There is no need to define consistency protocols
Inconsistency Detection
- Features
- No need to pre-define consistency level
- Detect inconsistency among nodes in a timely manner
- Resolve inconsistencies based on application semantics
- Advantages
- Efficient: timely inconsistency detection
- Low-cost: no prohibitive cost associated with a given consistency protocol
- Versatile: several applications with different consistency requirements can run simultaneously
Overview of IDF
Efficient Detection
Focus of this paper
Outline
- Background
- Design
- Evaluation
- Inconsistency resolution
- Related work
- Current status
Background
- RanSub
- Locate disjoint content within a system
- Two processes: collect and distribute
- Used to exchange node information among nodes
- Gossip-based data dissemination
- A node disseminates non-duplicate packets to a random set of neighbors every T seconds.
- Each message travels a certain number of hops
- Used to distribute updates
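The gossip-based dissemination described above can be illustrated with a minimal simulation sketch. It is not the framework's implementation: the `adjacency` overlay, the `fanout` parameter, and the rule that only newly reached nodes forward in the next round are assumptions made for illustration, and each loop iteration stands in for one period of T seconds.

```python
import random


def gossip(adjacency, source, max_hops, fanout, seed=0):
    """Simulate hop-limited gossip and return the set of nodes reached.

    adjacency: dict mapping node -> list of neighbor nodes (hypothetical overlay)
    max_hops:  how many hops a message may travel before it is dropped
    fanout:    size of the random neighbor set chosen in each round (assumption)
    """
    rng = random.Random(seed)
    seen = {source}        # nodes that already hold the packet
    frontier = [source]    # nodes that received the packet in the last round
    for _ in range(max_hops):          # one iteration per gossip period T
        next_frontier = []
        for node in frontier:
            neighbors = adjacency[node]
            targets = rng.sample(neighbors, min(fanout, len(neighbors)))
            for peer in targets:
                if peer not in seen:   # never deliver a duplicate packet
                    seen.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return seen
```

With a larger hop limit the update reaches more of the overlay, at the cost of more messages per update.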
Design of Timely Detection
- Basic idea
- Two layers
- Top layer captures most inconsistencies fast
- Bottom layer catches all the missed inconsistencies
- Terms
- Temperature: the frequency with which a user updates a certain file in a given period of time.
1. Measure the Updating Patterns
- Importance
- Use nodes' updating patterns as an indicator of their interest in a certain file, called temperature.
- The higher the temperature, the more likely a node is the troublemaker: it causes most inconsistencies.
- Strategy
- A node tracks its updating history for a certain
file during a certain period of time.
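One way to picture the strategy above is a per-file sliding window of update timestamps. This is a sketch under assumptions: the class name `TemperatureTracker`, the window length, and the choice to express temperature as a raw update count within the window are all hypothetical and not taken from the paper.

```python
from collections import defaultdict, deque


class TemperatureTracker:
    """Tracks one node's update history per file over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.history = defaultdict(deque)   # file name -> update timestamps

    def record_update(self, filename, timestamp):
        # Assumes timestamps arrive in non-decreasing order.
        self.history[filename].append(timestamp)

    def temperature(self, filename, now):
        """Number of updates to `filename` within the last `window` seconds."""
        stamps = self.history[filename]
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()                # forget updates outside the window
        return len(stamps)
```

A node would call `record_update` on every local write and report `temperature(...)` when temperatures are collected.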
2. Learning the Updating Patterns
- Use RanSub
- Collect nodes' updating patterns
- Each node learns a random disjoint set with each distribution
- Possible improvement
- RanSub uses a single multicasting tree
- This cannot tolerate a single interior node failure
- Deploy a multicasting forest?
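The collect/distribute idea can be sketched as one simplified round over a tree. This is deliberately not RanSub itself: it gathers every report at the root instead of passing fixed-size random subsets between tree levels, and the names `tree`, `reports`, and `subset_size` are assumptions for illustration only.

```python
import random


def collect(tree, root, reports):
    """Collect phase: gather every node's report in the sub-tree under `root`.

    tree:    dict mapping node -> list of children (hypothetical overlay tree)
    reports: dict mapping node -> that node's updating-pattern report
    """
    gathered = {root: reports[root]}
    for child in tree.get(root, []):
        gathered.update(collect(tree, child, reports))
    return gathered


def distribute(gathered, subset_size, rng):
    """Distribute phase: give each node a random subset of other nodes' reports."""
    views = {}
    for node in gathered:
        others = [n for n in gathered if n != node]
        chosen = rng.sample(others, min(subset_size, len(others)))
        views[node] = {n: gathered[n] for n in chosen}
    return views
```

Because the sketch relies on a single tree, it shares the weakness noted above: losing one interior node cuts off that node's whole sub-tree.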
3. Temperature Collection/Distribution
- Why does this matter?
- Network bandwidth cost could be prohibitive
- Think of the total number of files on a computer
- Interest-group based approach
- Nodes only report the temperature of files that they are interested in.
- In distribution, an interior node only relays the temperature of files that nodes in its sub-tree are interested in
- Result
- It can be supported by any type of connectivity, including a dial-up connection.
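The relay rule of the interest-group approach amounts to a filter applied at each interior node. The sketch below assumes a hypothetical representation in which temperatures are a dict keyed by file name and each descendant's interests are a set of file names.

```python
def relay_filter(temperatures, subtree_interests):
    """Return only the entries an interior node needs to forward downward.

    temperatures:      dict mapping file name -> temperature from the parent
    subtree_interests: iterable of sets, one per node in the sub-tree, each
                       holding the file names that node is interested in
    """
    wanted = set().union(*subtree_interests)   # files anyone below cares about
    return {name: temp for name, temp in temperatures.items() if name in wanted}
```

Files that nobody in the sub-tree is interested in are never relayed, which is what keeps the bandwidth cost low.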
4. Two-layer detection
- Two layers
- Solid line: top layer
- Dotted line: bottom layer
- Version vectors are used to detect inconsistencies
- Mechanism
- Traverse the top layer first
- If no inconsistency found in top layer
- Go to the bottom layer
An example
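The two-layer mechanism can be sketched with a standard version-vector comparison. Several details are assumptions for illustration: version vectors are modelled as dicts of per-node counters, each layer is a dict from peer id to that peer's vector, and any divergence from the local replica (not only concurrent updates) is reported as an inconsistency.

```python
def compare(vv_a, vv_b):
    """Compare two version vectors (dict: node id -> update counter).

    Returns "equal", "a_newer", "b_newer", or "conflict" (concurrent updates).
    """
    nodes = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_ahead = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def detect(local_vv, top_layer, bottom_layer):
    """Check the top layer first; visit the bottom layer only if the top
    layer shows no inconsistency. Returns (layer name, list of findings)."""
    for layer_name, layer in (("top", top_layer), ("bottom", bottom_layer)):
        findings = [(peer, compare(local_vv, vv)) for peer, vv in layer.items()]
        findings = [(peer, rel) for peer, rel in findings if rel != "equal"]
        if findings:
            return layer_name, findings
    return None, []
```

When the top layer already reports a finding, the slower bottom-layer traversal is skipped entirely.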
5. Caching and Garbage Collection
- Caching
- Cache temperature information
- Cache routing information among the top layer, so that smart decisions can be made to save traversal time
- Garbage collection
- Keep the temperature fresh
- Assign a time stamp to each piece of temperature information
- Temperature information expires when it is older than a threshold.
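The time-stamp-based expiry can be sketched as a small cache. The class name `TemperatureCache`, the `max_age` threshold, and the choice to run garbage collection on every lookup are assumptions made for this sketch.

```python
class TemperatureCache:
    """Caches temperature entries and drops those older than `max_age` seconds."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.entries = {}   # (node id, file name) -> (temperature, time stamp)

    def put(self, node, filename, temperature, timestamp):
        self.entries[(node, filename)] = (temperature, timestamp)

    def collect_garbage(self, now):
        """Remove every entry whose time stamp is older than the threshold."""
        expired = [key for key, (_, ts) in self.entries.items()
                   if now - ts > self.max_age]
        for key in expired:
            del self.entries[key]
        return len(expired)

    def get(self, node, filename, now):
        """Return the cached temperature, or None if it is missing or expired."""
        self.collect_garbage(now)
        entry = self.entries.get((node, filename))
        return entry[0] if entry else None
```

A shorter `max_age` keeps temperatures fresher but forces more frequent re-collection.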
6. Discussion
- Until now, we have treated the term "update" generically
- Only one kind of update
- In fact, several forms of update exist
- Creating
- Modifying
- Deleting
- It does not matter in the detection part, but
does matter when we design the APIs for
applications
Evaluation 1: Failure rate
- Why do we care about it?
- Top layer detects inconsistencies much faster than bottom layer
- It is desirable that most inconsistencies are captured by the top layer
- Analysis result
- In the worst-case scenario, two sub-cases exist
- Case 1: failure rate 0.04
- Case 2: failure rate 18.9
- See paper for clarification
- Main message
- Top layer captures the vast majority of
inconsistencies!
Evaluation 2: Maintenance Cost
- Metric
- Number of messages received by each node that are incurred by the maintenance process
- Simulation setup
- 1000 nodes in the network.
- Simulation runs for 800 seconds.
- Result
- Max bandwidth cost < 6 KB/s
Inconsistency Resolution
- Overview
- Utilize detection result
- Support multiple applications with different requirements for consistency control
- Semantic-based resolution (ongoing and future work)
- Get semantics
- Hint-based
- Middleware detection
- Resolution schemes
- Middleware automatically resolves inconsistency
- Ask for the user's preference before reacting
Related Work
- TACT
- Explore trade-off between consistency level and performance
- DENO
- Peer-to-peer scheme, but designed to maintain strong consistency
- Lpbcast
- Pure gossip-based protocol
- Quorum system
- Could fail in the presence of node failures
Current Status
- Dealing with inconsistency resolution
- Support applications.
- Implementing a prototype on Planet-Lab
- Investigating the implications of the new framework for large-scale distributed systems in general