1
Caching and Data Consistency in P2P
  • Dai Bing Tian
  • Zeng Yiming

2
Caching and Data Consistency
  • Why caching?
  • Caching helps use bandwidth more efficiently
  • Data consistency in this context is different
    from consistency in distributed databases
  • It refers to the consistency between a cached
    copy and the data on the server.

3
Introduction
  • Caching is built on top of existing P2P
    architectures such as CAN, BestPeer, Pastry, etc.
  • The caching layer sits between the application
    layer and the P2P layer.
  • Every peer has its own cache control unit and
    local cache, and publishes its cache contents

4
Presentation Order
  • We will present four papers:
  • Squirrel
  • PeerOLAP
  • Caching for Range Queries
  • With CAN
  • With DAG

5
Overview
  Paper        | Based on      | Caching | Consistency
  Squirrel     | Pastry        | Yes     | Yes
  PeerOLAP     | BestPeer      | Yes     | No
  RQ with CAN  | CAN           | Yes     | Yes
  RQ with DAG  | Not specified | Yes     | Yes
6
Squirrel
  • Enables web browsers on desktop machines to share
    their local caches
  • Uses a self-organizing peer-to-peer network,
    Pastry, as its object location service
  • Pastry is fault resilient, and so is Squirrel

7
Web Caching
  • The web browser generates HTTP GET requests
  • If the object is in the local cache, it is
    returned directly if it is fresh enough
  • Freshness can be checked by submitting a
    conditional GET (cGET) request
  • If there is no such object, a GET request is
    issued to the origin server
  • For simplicity, we assume all objects are
    cacheable (a sketch of this request path follows
    below)
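  A minimal sketch of this request path in Python. The cache layout
  (url -> (object, expiry time)) and the fetch/revalidate callables standing
  in for GET and cGET are illustrative assumptions, not Squirrel's actual
  interface:

    import time

    def handle_get(url, local_cache, fetch, revalidate):
        """Browser-side request path from the slide above (sketch only).
        local_cache maps url -> (object, expires_at); fetch(url) and
        revalidate(url) stand in for the GET and conditional GET (cGET)."""
        entry = local_cache.get(url)
        if entry is not None:
            obj, expires_at = entry
            if time.time() < expires_at:       # cached and still fresh: serve locally
                return obj
            obj, expires_at = revalidate(url)  # cGET: revalidate the cached copy
        else:
            obj, expires_at = fetch(url)       # not cached: issue a plain GET
        local_cache[url] = (obj, expires_at)   # store or refresh the cached copy
        return obj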

8
Home Node
  • As described in Pastry, every peer (node) has a
    nodeID
  • objectID = SHA-1(object URL)
  • An object is assigned to the node whose nodeID is
    numerically closest to the objectID
  • The node that owns an object is called the home
    node of that object (see the sketch below)
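  A minimal sketch of how the home node can be determined, assuming 128-bit
  IDs drawn from the SHA-1 digest and a known set of live nodeIDs (both are
  assumptions for illustration; Pastry routing locates the closest node
  without global knowledge):

    import hashlib

    def object_id(url, bits=128):
        # objectID = SHA-1(object URL), truncated to the nodeID length
        digest = hashlib.sha1(url.encode()).hexdigest()
        return int(digest, 16) >> (160 - bits)

    def home_node(url, node_ids):
        # The home node is the node whose nodeID is numerically closest
        # to the objectID (ties broken arbitrarily here).
        oid = object_id(url)
        return min(node_ids, key=lambda nid: abs(nid - oid))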

9
Two approaches
  • Squirrel has two approaches:
  • Home-store
  • Directory
  • Home-store stores the object directly in the
    cache of the home node
  • Directory stores, at the home node, pointers to
    the nodes that have the object in their caches;
    these nodes are called delegates

10
Home-store
[Figure: Home-store request flow. The requester's request for object A is
routed through Pastry to the home node over the LAN; the home node checks
freshness with the origin server over the WAN ("Is my copy of A fresh?" /
"Yes, it is fresh") and sends A back to the requester.]
11
Directory
[Figure: Directory request flow. The requester's request for object A is
routed through Pastry to the home node. If the home node keeps a directory
for A, it redirects the requester to a delegate ("Get it from D"); the
delegate revalidates its copy with the origin server if needed and sends A
to the requester, while the home node updates its meta-information and keeps
the directory. If there is no directory, the requester is told to fetch A
from the origin server ("Get it from Server") and then registers itself as a
delegate ("I'm your delegate").]
12
Conclusion
  • The home-store approach is less complicated, but
    it does not involve any collaboration
  • The directory approach is more collaborative: it
    can store more objects on peers with larger cache
    capacity by setting pointers to those peers in
    the directory

13
PeerOLAP
  • An OnLine Analytical Processing (OLAP) query
    typically involves large amounts of data
  • Each peer has a cache containing some results
  • An OLAP query can be answered by combining
    partial results from many peers
  • PeerOLAP acts as a large distributed cache

14
Data Warehouse & Chunk
  • A data warehouse is based on a multidimensional
    data model which views data in the form of a data
    cube.
  • Han & Kamber,

http://www.cs.sfu.ca/~han/dmbook
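  As a rough illustration of chunking (the chunk sizes and dimension names
  below are made up, not from the paper), a cube can be cut into chunks by
  dividing each dimension's domain into fixed-width ranges; a cached result
  is then identified by its chunk coordinates:

    def chunk_of(cell, chunk_sizes):
        # cell and chunk_sizes have one entry per dimension of the cube;
        # the chunk id is the tuple of range indices the cell falls into.
        return tuple(value // size for value, size in zip(cell, chunk_sizes))

    # Hypothetical dimensions (day, store_id) with 30-day x 100-store chunks:
    chunk_of((75, 432), (30, 100))   # -> (2, 4)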
15
PeerOLAP network
  • LIGLO servers provide global name lookup and
    maintain a list of active peers
  • Except for LIGLO servers, the network is fully
    distributed without any centralized
    administration point

16
Query Processing
  • Assumption 1: Only chunks at the same aggregation
    level as the query are considered
  • Assumption 2: The selection predicates are a
    subset of the group-by predicates

17
Cost Model
  • Every chunk is associated with a cost value
    indicating how long it takes to obtain that chunk

18
Eager Query Processing (EQP)
  • Peer P sends requests for the missing chunks to
    all its neighbors Q1, Q2, ..., Qk
  • Each Qi provides as many of the desired chunks as
    possible and returns them to P with a cost for
    each chunk
  • Each Qi then propagates the request to all of its
    neighbors recursively
  • To avoid flooding, hmax limits the depth of the
    search

19
EQP (Contd.)
  • P collects (chunk, cost) pairs from all its
    neighbors
  • It randomly selects one chunk ci and finds the
    peer Qi that can provide it at the lowest cost
  • For each subsequent chunk, it takes the cheaper of
    two cases: the lowest-cost peer that is not yet
    connected, or an already-connected peer that can
    also provide this chunk
  • It asks the chosen peers for their chunks and
    fetches the remaining missing chunks from the
    warehouse (see the sketch below)
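  A minimal sketch of this greedy selection. The function name, the flat
  connection cost charged for contacting a peer not yet selected, and the
  fixed warehouse cost are assumptions for illustration:

    import random

    def plan_chunk_sources(missing_chunks, offers, connect_cost, warehouse_cost):
        """offers maps peer -> {chunk: cost}. Returns a map from each missing
        chunk to the peer it will be fetched from, or "warehouse"."""
        plan, connected = {}, set()
        chunks = list(missing_chunks)
        random.shuffle(chunks)                   # start from a randomly selected chunk
        for chunk in chunks:
            best_cost, best_peer = warehouse_cost, "warehouse"
            for peer, chunk_costs in offers.items():
                if chunk not in chunk_costs:
                    continue
                # Contacting a peer we have not already selected costs extra.
                cost = chunk_costs[chunk] + (0 if peer in connected else connect_cost)
                if cost < best_cost:
                    best_cost, best_peer = cost, peer
            plan[chunk] = best_peer
            if best_peer != "warehouse":
                connected.add(best_peer)
        return plan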

20
Lazy Query Processing (LQP)
  • Instead of propagating the request from each Qi
    to all of its neighbors, each Qi selects only its
    most beneficial neighbor and forwards the request
    there.
  • If each peer has k neighbors on average, EQP
    visits O(k^hmax) nodes, while LQP visits only
    O(k * hmax) nodes

21
Chunk Replacement
  • Least Benefit First (LBF)
  • Similar to LRU, but every chunk has a weight
  • When a chunk is used by P, its weight is reset to
    its original benefit value
  • Every time a new chunk comes in, the weights of
    the old chunks are reduced (see the sketch below)
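  A minimal sketch of the LBF idea, assuming weights decay by a fixed amount
  whenever a new chunk is admitted (the paper defines its own benefit and
  aging formulas; this only shows the shape of the policy):

    class LBFCache:
        """Least Benefit First replacement (sketch). The chunk with the
        smallest weight is evicted first; a hit restores a chunk's weight
        to its original benefit."""

        def __init__(self, capacity, decay=1.0):
            self.capacity, self.decay = capacity, decay
            self.data, self.weight, self.benefit = {}, {}, {}

        def get(self, chunk):
            if chunk in self.data:
                self.weight[chunk] = self.benefit[chunk]   # hit: reset weight
                return self.data[chunk]
            return None

        def put(self, chunk, value, benefit):
            for c in self.weight:                          # age every cached chunk
                self.weight[c] -= self.decay
            while len(self.data) >= self.capacity:         # evict least-benefit chunks
                victim = min(self.weight, key=self.weight.get)
                del self.data[victim], self.weight[victim], self.benefit[victim]
            self.data[chunk] = value
            self.weight[chunk] = self.benefit[chunk] = benefit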

22
Collaboration
  • LBF is the local chunk replacement algorithm
  • Three variations of global behavior:
  • Isolated Caching Policy: non-collaborative
  • Hit Aware Caching Policy: collaborative
  • Voluntary Caching: highly collaborative

23
Network Reorganization
  • Optimization can be done by creating virtual
    neighborhoods of peers with similar query
    patterns
  • This gives P a high probability of getting
    missing chunks directly from its neighbors
  • Each connection is assigned a benefit value, and
    the most beneficial connections are selected as
    the peer's neighbors

24
Conclusion
  • PeerOLAP is a distributed caching system for OLAP
    results
  • By sharing the contents of individual caches,
    PeerOLAP constructs a large virtual cache which
    can benefit all peers
  • PeerOLAP is fully distributed and highly scalable

25
Caching For Range Queries
  • Range query, e.g.:
  • SELECT Student.name
  • WHERE 20 < Student.age < 30
  • Why cache?
  • The data source is too far away from the
    requesting node
  • The data source is overloaded with queries
  • The data source is a single point of failure
  • What is cached?
  • All tuples falling in the range
  • Who caches?
  • The peers responsible for the range

26
Problem Definition
  • Given a relation R and a range attribute A, we
    assume that the results of prior range-selection
    queries of the form R.A(LOW, HIGH) are stored at
    the peers. When a query issued at a peer requires
    the retrieval of tuples from R in the range
    R.A(low, high), we want to locate a peer in the
    system that already stores tuples that can be
    accessed to compute the answer.

27
A P2P Framework for Caching Range Queries
  • Based on CAN.
  • Data is mapped into a 2d-dimensional virtual
    space, where d is the number of dimensions
    (attributes) of the relation.
  • Each dimension/attribute with domain [a, b] is
    mapped to a square virtual hash space whose corner
    coordinates are (a,a), (b,a), (b,b) and (a,b).
  • The virtual hash space is further partitioned
    into rectangular areas, each of which is called a
    zone (see the sketch below).
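  A minimal sketch of the mapping for a single attribute. The helper names
  are ours; the zone coordinates follow the example on the next slide:

    def hash_range(low, high):
        # A cached or queried range [low, high] maps to the point (low, high)
        # in the square virtual hash space over the attribute's domain.
        return (low, high)

    def zone_contains(zone, point):
        # A zone is a rectangle <(x1, y1), (x2, y2)> of the hash space.
        (x1, y1), (x2, y2) = zone
        x, y = point
        return x1 <= x <= x2 and y1 <= y <= y2

    # Example with zone-5 = <(10,48), (25,56)> from the next slide:
    zone_contains(((10, 48), (25, 56)), hash_range(20, 50))   # -> True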

28
Example
  • Virtual hash space for an attribute whose domain
    is [10, 70]
  • zone-1: <(10,56), (15,70)>
  • zone-5: <(10,48), (25,56)>
  • zone-8: <(47,10), (70,54)>

29
Terminology
  • Each zone is assigned to a peer.
  • Active peer
  • Owns a zone
  • Passive peer
  • Does not participate in the partitioning;
    registers itself with an active peer
  • Target point
  • A range [low, high] is hashed to the point with
    coordinates (low, high)
  • Target zone
  • The zone where the target point resides
  • Target node
  • The peer that owns the target zone
  • Stores the tuples falling into the ranges that
    are mapped to its zone
  • Either caches the tuples in its local cache OR
    stores a pointer to the peer that caches them

30
Zone Maintenance
  • Initially, only the data source is the active
    node and the entire virtual hash space is its
    zone
  • A zone split happens under two conditions
  • Heavy Answering Load
  • Heavy Routing Load

31
Example of Zone Splits
  • If a zone has too many queries to answer:
  • It finds the x-median and y-median of the stored
    results, and determines whether a split at the
    x-median or the y-median gives a more even
    distribution of the stored answers and the space.
  • If a zone is overloaded because of routing
    queries:
  • It splits the zone at the midpoint of its longer
    side (see the sketch below).
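  A rough sketch of the two split rules, assuming stored results are kept as
  their (low, high) target points; the tie-breaking between the x- and
  y-median is simplified here to whichever axis balances the stored answers
  more evenly:

    import statistics

    def split_for_answer_load(points, zone):
        # Overloaded with answers: split at the x- or y-median of the stored
        # target points, preferring the axis that balances them better.
        (x1, y1), (x2, y2) = zone
        x_med = statistics.median(p[0] for p in points)
        y_med = statistics.median(p[1] for p in points)
        x_skew = abs(sum(p[0] <= x_med for p in points) - len(points) / 2)
        y_skew = abs(sum(p[1] <= y_med for p in points) - len(points) / 2)
        if x_skew <= y_skew:
            return ((x1, y1), (x_med, y2)), ((x_med, y1), (x2, y2))
        return ((x1, y1), (x2, y_med)), ((x1, y_med), (x2, y2))

    def split_for_routing_load(zone):
        # Overloaded with routing: split at the midpoint of the longer side.
        (x1, y1), (x2, y2) = zone
        if x2 - x1 >= y2 - y1:
            mid = (x1 + x2) / 2
            return ((x1, y1), (mid, y2)), ((mid, y1), (x2, y2))
        mid = (y1 + y2) / 2
        return ((x1, y1), (x2, mid)), ((x1, mid), (x2, y2))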

32
Answering A Range Query
  • If an active node poses the query, the query is
    initiated from the corresponding zone; if a
    passive node poses the query, it contacts an
    active node, from where the query starts routing.
  • Two steps are involved:
  • Query routing
  • Query forwarding

33
Query Routing
  • If the target point falls in this zone:
  • Return this zone
  • Else:
  • Route the query to the neighbor that is closest
    to the target point (see the sketch below)

[Figure: the query is routed hop by hop through neighboring zones toward the
target point (26,30)]
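  A minimal sketch of one routing step, assuming zones are rectangles and
  "closest" means the smallest Euclidean distance from the target point to
  the neighbor's rectangle (the distance metric is an assumption):

    def next_hop(zone, neighbors, target):
        # Stay in this zone if it contains the target point; otherwise
        # forward to the neighbor zone closest to the target point.
        (x1, y1), (x2, y2) = zone
        tx, ty = target
        if x1 <= tx <= x2 and y1 <= ty <= y2:
            return zone
        def dist(z):
            # squared distance from the target point to rectangle z
            (a1, b1), (a2, b2) = z
            dx = max(a1 - tx, 0, tx - a2)
            dy = max(b1 - ty, 0, ty - b2)
            return dx * dx + dy * dy
        return min(neighbors, key=dist)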
36
Forwarding
  • If the results are stored at the target node,
    they are sent back to the querying node
  • Otherwise, zones lying in the upper-left area of
    the target point may still store the results, so
    the query also needs to be forwarded to those
    zones.

37
Example
  • If no results are found in zone-7, the shaded
    region may still contain the results.
  • Reason: any prior range query q whose range
    subsumes (x, y) must be hashed into the shaded
    region.

38
Forwarding (Cont.)
  • How far should forwarding go?
  • For a range (low, high), we restrict the search to
    results falling in (low - offset, high + offset),
    where offset = AcceptableFit × |domain|
  • AcceptableFit ∈ [0, 1]
  • The shaded square defined by the target point and
    the offset is called the acceptable region

39
Forwarding (Cont.)
  • Flood forwarding
  • A naïve approach: forward to the left and top
    neighbors if they fall in the acceptable region
  • Directed forwarding
  • Forward to the neighbor that maximally overlaps
    with the acceptable region
  • The number of forwards can be bounded by a limit
    d, which is decremented on every forward (see the
    sketch below).
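  A minimal sketch combining the acceptable region with directed forwarding;
  measuring overlap by area and the parameter names are assumptions:

    def acceptable_region(low, high, domain, acceptable_fit):
        # offset = AcceptableFit * |domain|; the acceptable region is the
        # square to the upper-left of the target point (low, high).
        offset = acceptable_fit * (domain[1] - domain[0])
        return (low - offset, high), (low, high + offset)

    def directed_forward_step(neighbors, region, d):
        # One directed-forwarding hop: pick the neighbor zone that overlaps
        # the acceptable region the most; d is the remaining forwarding
        # budget, decremented by the caller on every hop.
        if d <= 0 or not neighbors:
            return None
        (rx1, ry1), (rx2, ry2) = region
        def overlap(zone):
            (x1, y1), (x2, y2) = zone
            return (max(0.0, min(x2, rx2) - max(x1, rx1))
                    * max(0.0, min(y2, ry2) - max(y1, ry1)))
        best = max(neighbors, key=overlap)
        return best if overlap(best) > 0 else None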

40
Discussion
  • Improvements
  • Lookup during routing
  • Warm-up queries
  • Peer soft-departure & failure events
  • Update & cache consistency
  • If a tuple t with range attribute value k is
    updated in the data source, then the target zone
    of the point (k, k) and all zones lying in its
    upper-left region have to update their caches.

41
Range Addressable Network A P2P Cache
Architecture for Data Ranges
  • Assumptions
  • Tuples stored in the system are labeled 1, 2, ..., N
    according to the range attribute
  • A range [a, b] is a contiguous subset of
    {1, 2, ..., N}, where 1 ≤ a ≤ b ≤ N
  • Objective
  • Given a query range [a, b], peers cooperatively
    find results for the shortest cached superset of
    [a, b], if one is cached anywhere.

42
Overview
  • Based on a Range Addressable DAG (Directed
    Acyclic Graph)
  • Every active node in the P2P system is mapped to
    a group of nodes in the DAG
  • Each DAG node is responsible for storing results
    and answering queries falling into a specific
    range

43
Range Addressable DAG
  • The entire universe [1, N] is mapped to the root.
  • Each node is recursively divided into 3
    overlapping intervals of equal length (see the
    sketch below).
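  A small sketch of the interval hierarchy, assuming closed integer
  intervals and that the three children of [a, b] are its left half, middle
  half, and right half (the stopping length is an arbitrary choice for
  illustration):

    def children(interval):
        # Three overlapping sub-intervals of equal length,
        # e.g. [1,16] -> [1,8], [5,12], [9,16].
        a, b = interval
        half = (b - a + 1) // 2
        quarter = (b - a + 1) // 4
        return [(a, a + half - 1),
                (a + quarter, a + quarter + half - 1),
                (a + half, b)]

    def build_levels(root, min_len=2):
        # Expand the root [1, N] level by level; children of adjacent nodes
        # overlap, so duplicates are merged within each level.
        levels, current = [[root]], [root]
        while current[0][1] - current[0][0] + 1 > min_len:
            current = sorted({c for node in current for c in children(node)})
            levels.append(current)
        return levels

    # Example with N = 16: build_levels((1, 16))[1] == [(1, 8), (5, 12), (9, 16)]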

44
Range Lookup
[Figure: example lookup involving the ranges [7,13] and [5,12] and the query
Q = [7,10]]
  • Input: a query range q = [a, b] and a node v in
    the DAG
  • Output: the shortest stored range in the DAG that
    contains q

    boolean down = true

    search(q, v):
        if q ⊄ i(v):                        // q not inside v's interval: go up
            search(q, parent(v))
        else if q ⊆ i(child(v)) and down:   // q fits inside a child: go down
            search(q, child(v))
        else:
            if some range stored at v is a superset of q:
                return the shortest range containing q that is
                       stored at v or parent(v)
            else:
                down = false
                search(q, parent(v))
45
Peer Protocol
  • Maps the logical DAG structure to physical peers
  • Two components:
  • Peer management
  • Handles peer joins, departures, and failures
  • Range management
  • Deals with query routing and updates

46
Peer Management
  • It ensures that at any time:
  • every node in the DAG is assigned to some peer
  • the nodes belonging to one peer, called a zone,
    form a connected component of the DAG
  • This is done by handling join requests, leave
    requests, and failure events properly.

47
Join Request
  • The first peer to join the system takes over the
    entire DAG
  • A new peer joining the system contacts one of the
    existing peers and takes over one of its child
    zones. The default strategy is: left child, then
    middle child, then right child.

51
Leave Request
  • When a peer wants to leave (soft departure), it
    hands over its zone to the smallest neighboring
    zone.
  • Neighboring zones: two zones are neighbors if
    some node in one is the parent of some node in
    the other

53
Failure Event
  • A zone maintains information about all its
    ancestors, so when it finds out that one of its
    ancestors has failed, it contacts the nearest
    live ancestor to take over the failed zone.

54
Range Management
  • Range Lookup
  • Range Update
  • When a tuple is updated in the data source, we
    locate the peer with the shortest range
    containing that tuple, then update this peer and
    all its ancestors.

55
Improvement
  • Cross Pointers
  • If a node v is the left child of its parent, it
    keeps cross pointers to all the left children of
    the nodes at its parent's level.
  • Similarly for middle children.

56
Improvement (Cont.)
[Figure: peers P1, P2, P3 and the corresponding collapsed DAG]
  • Load balancing by peer sampling
  • Collapsed DAG: collapse each peer's zone to a
    single node.
  • The system is balanced if the collapsed DAG is
    balanced.
  • Lookup time is O(h), where h is the height of the
    collapsed DAG, so a balanced system gives optimal
    performance.
  • When a new peer joins, it polls k peers at random
    and sends a join request to the one whose zone is
    rooted nearest to the root.

58
Conclusion
  • Caching range queries based on CAN
  • Maps every attribute into a 2D space
  • The space is divided into zones
  • Peers manage their respective zones
  • A range [low, high] is mapped to the point
    (low, high) in the 2D space
  • Query routing & query forwarding

59
Conclusion (Cont.)
  • Range Addressable Network
  • Models ranges as a DAG
  • Every peer takes responsibility for a group of
    nodes in the DAG
  • Querying involves a traversal of the DAG