Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks

1
Ph.D. Thesis Proposal: Data Caching in Ad Hoc
and Sensor Networks
  • Bin Tang
  • Computer Science Department
  • Stony Brook University

2
Summary of My Work
  • Data Caching
  • Under an update cost constraint:
  • Optimal algorithm for tree; approximation
    algorithm for general graph
  • Under a memory constraint with multiple data items:
  • Approximation algorithm for general graph
  • Under a number constraint with read/write/storage cost:
  • Optimal algorithm for tree
  • Localized distributed implementations
  • Comparison with existing work

3
Motivation
  • Ad hoc and sensor networks are resource
    constrained
  • Limited bandwidth, battery energy, and memory
  • Caching can save access (communication) cost, and
    thus, bandwidth and energy
  • Under update cost, memory, or number constraints

4
Rooted in
  • Facility location problem: set up facilities in a
    network to minimize total access cost plus
    setup cost
  • K-median problem: set up k facilities to minimize
    total access cost

5
1. Cache Placement in Sensor Networks Under
Update Cost Constraint
6
Problem Statement
  • Sensor Network Model
  • A data item stored at a server node.
  • Updated at a certain frequency.
  • Other nodes access the data item at a certain
    frequency.
  • Problem Statement
  • Select nodes at which to cache the data item
  • Goal: minimize total access cost
  • Constraint: total update cost

7
Why update cost constraint?
  • Nodes close to the server bear most of the
    update cost.

8
Problem Formulation
  • Given
  • Network graph G(V,E).
  • A data item stored at a server node
  • Update frequency
  • Access frequency for each other node
  • Update cost constraint Δ
  • Goal
  • Select cache nodes to minimize the total access
    cost
  • Total update cost is at most Δ

9
Total Access/Update Cost
  • Total access cost =
    Σ_{i ∈ V} (hop length between i and its nearest
    cache) × (access frequency of i)
  • Total update cost = cost of the optimal Steiner
    tree over the server and all caches
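The access-cost sum above can be sketched in Python: a multi-source BFS gives each node's hop distance to its nearest cache, with the network given as an adjacency list. This is an illustrative sketch; all names are hypothetical, not from the proposal.

```python
from collections import deque

def hop_distances(adj, caches):
    """Multi-source BFS: hop distance from every node to its nearest cache."""
    dist = {c: 0 for c in caches}
    queue = deque(caches)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def total_access_cost(adj, caches, access_freq):
    """Sum over all nodes of (hops to nearest cache) x (access frequency)."""
    dist = hop_distances(adj, caches)
    return sum(dist[i] * access_freq.get(i, 0) for i in adj)
```

For example, on a 4-node path with the server's copy at node 0, each added cache shortens the hop distances that enter the sum.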

10
Algorithm Design Outline
  • Tree Networks
  • Optimal dynamic programming algorithm.
  • General Networks
  • Multiple-unicast update model -- approximation
    algorithm
  • Steiner-tree update model -- heuristic and
    distributed algorithms

11
Tree Networks
12
Subtree notation
  • Server r
  • Consider a subtree Tv
  • Let path (v,x) on its leftmost branch be all
    caches
  • Let C_v be the optimal access cost in Tv using
    additional update cost d
  • Next: recursive equation for C_v

(Figure: subtree Tv rooted at v within the full tree Tr; the path (v,x) of caches lies on Tv's leftmost branch)
13
Dynamic Programming Algorithm for Tv under update
cost constraint d
  • Let u = the leftmost deepest node in the optimal
    set of caches in Tv
  • Path(v,u) can be all caches (update cost doesn't
    increase)
  • For a fixed u:
  • C_v = (constant) + (optimal access cost in Rv,u
    for constraint d − d_u)
  • Here, d_u is the cost to update u (using
    path(v,x))

(Figure: Tv partitioned into Lv,u, Tu, and Rv,u)
14
DP recursive equation for Tv
  • C_v = min_{u ∈ Tv} ( access cost in Lv,u using
    path(v,x) or path(v,u)
    + access cost in Tu using u
    + optimal cost in Rv,u with constraint d − d_u )
  • Here, d_u is the cost of updating u (using
    path(v,x))
  • Note that Rv,u has a path (v, parent(u)) of
    caches on its leftmost branch

15
Time complexity
  • Time complexity: O(n^4 + n^3 Δ)
  • Analysis:
  • Precomputation takes O(n^4)
  • Lv,u with cache path (v,x): O(n^4) over all v, u, x
  • Tu: O(n^2) over all u
  • The recursive equation takes O(n^3 Δ)
  • n^2 Δ entries: one for each pair (v,x) and each
    value of the constraint up to Δ
  • Each entry takes O(n): n possible choices of u

16
General Graph Network
  • Two Update Cost Models
  • Multiple-Unicast
  • Optimal Steiner Tree

17
Multiple-Unicast Update Model
  • Update cost: sum of shortest-path lengths from
    the server to each cache node
  • Benefit of node A: decrease in total access cost
    due to selecting A as a cache
  • Greedy metric: benefit per unit update cost

18
Greedy Algorithm
  • Iteratively select the node with the highest
    benefit per unit update cost, until the update
    cost budget is exhausted
  • Theorem: the greedy solution's benefit is at
    least 63% of the optimal benefit
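The greedy loop described above might be sketched as follows. The `benefit` and `update_cost` oracles are hypothetical placeholders assumed to be supplied by the caller (in the multiple-unicast model, `update_cost` would be the shortest-path length from the server to the candidate).

```python
def greedy_cache_selection(candidates, benefit, update_cost, budget):
    """Repeatedly pick the node with the best benefit per unit update
    cost among those still affordable under the remaining budget."""
    caches = []
    remaining = budget
    while True:
        best, best_ratio = None, 0.0
        for v in candidates:
            if v in caches:
                continue
            c = update_cost(v, caches)
            if c <= 0 or c > remaining:
                continue  # free or unaffordable candidates are skipped
            ratio = benefit(v, caches) / c
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            break  # budget exhausted or no beneficial candidate left
        remaining -= update_cost(best, caches)
        caches.append(best)
    return caches
```

Passing the current cache set to both oracles lets them account for diminishing marginal benefit as caches accumulate.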

19
Steiner-Tree Update Cost Model
  • Steiner-tree update cost: cost of a
    2-approximation Steiner tree over the cache nodes
  • Incremental Steiner update cost of node A:
    increase in the Steiner-tree update cost due to A
    becoming a cache
  • Greedy-Steiner Algorithm:
  • Iteratively select the node with the highest
    benefit per unit of the above-defined update cost

20
Distributed Greedy-Steiner Algorithm
  • Each non-cache node estimates its benefit per
    unit update cost
  • If the estimate is the maximum among all its
    non-cache neighbors, the node decides to cache
  • Algorithm:
  • In each round, each node decides whether to cache
    based on the above rule
  • The server gathers the new cache node information
    and computes the total update cost
  • The remaining update cost budget is broadcast to
    the network, and a new round begins

21
Performance Evaluation
  • (i) Network-related parameters: number of nodes
    and transmission radius
  • (ii) Application-related parameter: number of
    clients
  • Random networks of 2,000 to 5,000 nodes in a 30 x
    30 region

22
Compared Caching Schemes
  • Centralized Greedy
  • Centralized Greedy-Steiner
  • Distributed Greedy-Steiner
  • Dynamic Programming on Shortest Path Tree of
    Clients
  • Dynamic Programming on Steiner Tree over Clients
    and Server

23
Varying Network Size: transmission radius = 2,
percentage of clients = 50%, update cost budget = 25% of
the Steiner tree cost
24
Varying Transmission Radius: network size =
4000, percentage of clients = 50%, update cost budget
= 25% of the Steiner tree cost
25
Varying Number of Clients: transmission radius
= 2, update cost budget = 50% of the Steiner tree cost,
network size = 3000
26
To Recap
  • Data caching problem under an update cost
    constraint
  • Optimal algorithm for tree; an approximation
    algorithm for general graph
  • Efficient distributed implementations
  • Next: a more general cache placement problem,
  • (a) under memory constraint, (b) with multiple
    data items

27
2. Data Caching under Memory Constraint
28
Problem Addressed
  • In a general ad hoc network with limited memory
    at each node, where should data items be cached
    so that the total access (communication) cost is
    minimized?

29
Problem Formulation
  • Given
  • Network graph G(V,E)
  • Multiple data items
  • Access frequencies (for each node and data item)
  • Memory constraint at each node
  • Select data items to cache at each node under
    memory constraint
  • Minimize total access cost =
    Σ_{nodes} Σ_{data items} (distance from node to the
    nearest cache of that item) × (access frequency)
30
Related Work
  • Related to the facility-location and K-median
    problems, which have no memory constraint
  • Baev and Rajaraman:
  • 20.5-approximation algorithm for uniform-size
    data items
  • For non-uniform sizes, no polynomial-time
    approximation exists unless P = NP
  • We circumvent the intractability by approximating
    benefit instead of access cost

31
Related Work - continued
  • Two major empirical works on distributed caching
  • Hara [INFOCOM '99]
  • Yin and Cao [INFOCOM '04] (we compare our work
    with theirs)
  • Our work is the first to present a distributed
    caching scheme based on an approximation algorithm

32
Algorithms
  • Centralized Greedy Algorithm (CGA)
  • Delivers a solution whose benefit is at least
    1/2 of the optimal benefit
  • Distributed Greedy Algorithm (DGA)
  • Purely localized

33
Centralized Greedy Algorithm (CGA)
  • Benefit of caching a data item at a node:
  • the reduction in total access cost,
  • i.e., (total access cost before caching)
  •       − (total access cost after caching)

34
Centralized Greedy Algorithm (CGA)
  • CGA iteratively selects the most beneficial
    (data item, node to cache at) pair
  • I.e., at each stage we pick the pair with the
    maximum benefit
  • Theorem: CGA is (1/2)-approximate for
    uniform-size data items,
  • and (1/4)-approximate for non-uniform-size data items
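The iterative pair selection might look like the minimal sketch below. The `benefit` oracle, which values caching item `d` at node `v` given the current placement, is a hypothetical placeholder, as are all other names.

```python
def centralized_greedy(nodes, items, size, memory, benefit):
    """Greedily cache (item, node) pairs with maximum benefit until no
    pair with positive benefit fits in any node's remaining memory."""
    placement = {v: set() for v in nodes}  # items cached at each node
    free = dict(memory)                    # remaining memory per node
    while True:
        best, best_benefit = None, 0
        for v in nodes:
            for d in items:
                if d in placement[v] or size[d] > free[v]:
                    continue  # already cached here, or does not fit
                b = benefit(v, d, placement)
                if b > best_benefit:
                    best, best_benefit = (v, d), b
        if best is None:
            break  # no remaining pair has positive benefit and fits
        v, d = best
        placement[v].add(d)
        free[v] -= size[d]
    return placement
```

Re-evaluating `benefit` against the current placement in every pass is what captures the diminishing returns the 1/2-approximation proof relies on.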

35
CGA Approximation Proof Sketch
  • G′: a modified G in which each node
  • has twice the memory it has in G
  • caches the data items selected by both CGA and
    the optimal solution
  • B(Optimal in G)
  •   ≤ B(Greedy + Optimal in G′)
  •   = B(Greedy) + B(Optimal w.r.t. Greedy)
  •   ≤ B(Greedy) + B(Greedy)   (due to the greedy choice)
  •   = 2 × B(Greedy)

36
Distributed Greedy Algorithm (DGA)
  • Each node caches the most beneficial data items,
    where the benefit is based on local traffic
    only
  • Local traffic includes:
  • The node's own data requests
  • Requests for the data items it caches
  • Data requests it forwards to others

37
DGA Nearest Cache Table
  • Why do we need it?
  • To forward requests to the nearest cache
  • For local benefit calculation
  • What is it?
  • Each node keeps the ID of the nearest cache for
    each data item
  • Entries of the form: (data item, nearest cache)
  • The table sits on top of the routing table
  • Maintenance: next slide

38
Maintenance of Nearest-cache Table
  • When node i caches data item Dj:
  • it broadcasts (i, Dj) to its neighbors
  • and notifies the server, which keeps a list of caches
  • On receiving (i, Dj):
  • if i is nearer than the current nearest cache of
    Dj, update the entry and forward the message
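The receive-side rule on this slide might be sketched as below, assuming the routing layer exposes a hop-count function `dist` between node IDs; the function and all names are hypothetical.

```python
def on_cache_announcement(table, dist, self_id, cache_id, item):
    """Handle a broadcast announcement that cache_id now caches item.
    Returns True if the table entry changed, i.e. the message should
    be forwarded to neighbors; False means it is dropped here."""
    current = table.get(item)
    if current is None or dist(self_id, cache_id) < dist(self_id, current):
        table[item] = cache_id   # nearer cache found: adopt it
        return True
    return False                 # existing entry is at least as near
```

Returning a forward/drop decision is what bounds the flood: once an announcement reaches nodes that already know a nearer cache, it stops propagating.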

39
Maintenance of Nearest-cache Table -II
  • When node i deletes Dj:
  • it gets the list of caches Cj from the server of Dj
  • and broadcasts (i, Dj, Cj) to its neighbors
  • On receiving (i, Dj, Cj):
  • if i is the current nearest cache for Dj, update
    the entry using Cj and forward the message

40
Maintenance of Nearest-cache Table -III
  • More details pertaining to
  • Mobility
  • Second-nearest cache entries (needed for benefit
    calculation for cache deletions)
  • Benefit thresholds

41
Performance Evaluation
  • CGA vs. DGA Comparison
  • DGA vs. HybridCache Comparison

42
CGA vs. DGA
  • Summary of simulation results
  • DGA performs quite close to CGA over a wide
    range of parameter values

43
Varying Number of Data Items and Memory Capacity:
transmission radius = 5, number of nodes = 500
44
DGA vs. Yin and Cao's work
  • Yin and Cao [INFOCOM '04]:
  • CacheData: caches passing-by data items
  • CachePath: caches the path to the nearest cache
  • HybridCache: caches the data if its size is small
    enough, otherwise caches the path to the data
  • The only prior purely distributed cache placement
    algorithm under a memory constraint

45
DGA vs. HybridCache
  • Simulation setup:
  • ns-2, with DSDV as the routing protocol
  • Random waypoint model: 100 nodes moving at speeds
    within (0, 20 m/s) in a 2000 m x 500 m area
  • Transmission radius = 250 m, bandwidth = 2 Mbps
  • Performance metrics
  • Average query delay
  • Query success ratio
  • Total number of messages

46
  • Server Model
  • 1000 data items, divided between two servers
  • Data item size: 100 to 1500 bytes
  • Data access models:
  • Random: each node accesses 200 data items chosen
    randomly from the 1000 data items
  • Spatial: (details skipped)
  • Naïve caching algorithm: caches any passing-by
    data and uses LRU for cache replacement

47
Varying query generation time under the random access
pattern
48
Summary of Simulation Results
  • Both HybridCache and DGA outperform the naïve
    approach
  • DGA outperforms HybridCache in all metrics
  • Especially for frequent queries and small cache
    sizes
  • Under high mobility, DGA has slightly worse average
    delay, but a much better query success ratio

49
To Recap
  • Data caching problem for multiple items under a
    memory constraint
  • Centralized approximation algorithm
  • Localized distributed implementation
  • No update or storage cost is considered
    (otherwise, there is no performance guarantee)
  • Can we also consider, and minimize, the total
    read/write/storage cost?

50
3. Data Caching Under Number Constraint
51
Problem Formulation
  • Given:
  • Network graph G(V,E)
  • A data item to be stored in the network
  • Access (read) frequency for each node
  • Write frequency for each node
  • Caching (storage) cost for each node
  • Number of allowable cache nodes: P
  • Goal:
  • Select cache nodes to minimize the total cost
  • under the number constraint (at most P caches)

52
Total Cost
  • Total cost = total read cost + total write cost
    + total storage cost
  • Read: Σ_{i ∈ V} (hop length between i and its
    nearest cache) × (access frequency of i)
  • Write: Σ_{i ∈ V} (cost of the optimal Steiner tree
    over i and all caches) × (write frequency of i)
  • Storage: Σ_{i ∈ cache nodes} (storage cost at i)
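The three-term objective above might be computed as in this sketch, assuming `hop_dist` and `steiner_cost` helpers are supplied by the caller (e.g. `steiner_cost` from a Steiner-tree approximation, since exact Steiner trees are NP-hard); all names are hypothetical.

```python
def total_cost(nodes, caches, hop_dist, steiner_cost,
               read_freq, write_freq, storage_cost):
    """Total cost = read + write + storage, as on the slide."""
    # Read: each node reads from its nearest cache.
    read = sum(min(hop_dist(i, c) for c in caches) * read_freq[i]
               for i in nodes)
    # Write: each writer updates every cache via a Steiner tree
    # spanning itself and all the caches.
    write = sum(steiner_cost(set(caches) | {i}) * write_freq[i]
                for i in nodes)
    # Storage: paid once per cache node.
    storage = sum(storage_cost[c] for c in caches)
    return read + write + storage
```

On a path graph the Steiner tree over a node set is just the segment spanning it, which makes a quick sanity check easy.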

53
Related Work
  • K-median problem (access and storage cost)
  • Tamir's algorithm attains the best known time
    complexity on trees
  • We generalize it to include write cost, on both
    trees ( O(n^2 P^3) ) and general graphs
  • Kalpakis et al. solve the same problem, with
    time complexity O(n^6 P^3)

54
Tree Topology
55
Tamir's DP Algorithm on a tree Tr
  • Transform the arbitrary tree into a full binary tree
  • Each non-leaf node v has two children v1, v2
  • For each v in the binary tree, compute and sort the
    distances from v to all nodes
  • Leaves-to-root dynamic programming algorithm

56
Our DP Algorithm
  • Idea: for each node v in Tr, compute
  • the cost of the subtree Tv:
  • access cost of the nodes in Tv
  • + storage cost of the caching nodes in Tv
  • + write cost of all the writer nodes in Tr due to
    edges in Tv

57
DP Algorithm - Definitions
  • G(v, q, r): optimal cost for subtree Tv with
    exactly q caches in Tv, the one closest to v
    being at most r hops away
  • F(v, q, r): optimal cost for Tv with exactly q
    caches in Tv plus some cache nodes outside of Tv,
    the closest cache to v being r hops away
  • F(v, r): optimal cost for Tv with no cache in Tv
    but some cache nodes outside of Tv, the closest
    being r hops away

58
Recursive DP Equations (P cache nodes allowed)
  • 1. G(v, q, 0) -- v is a cache node:
  • storage cost at v
  • + the costs of Tv1, Tv2
  • + the write cost on edges (v, v1), (v, v2)
  • 2. G(v, q < P, r > 0) -- there may be some cache
    node outside of Tv:
  • min { G(v, q, r-1),   // there is a cache in Tv
    within r-1 hops of v
  • cost when the closest cache to v is exactly r hops away }

59
Recursive DP Equations - continued
  • 3. G(v, q = P, r > 0) -- no cache node outside of Tv:
  • min { G(v, q, r-1),
  • cost when the closest cache is exactly r hops
    away }
  • 4. F(v, q, r) -- there is a cache node outside of Tv:
  • min { G(v, q, r-1),
  • cost when the closest cache to v is exactly r hops
    away }

60
  • Minimum total cost of the original tree Tr:
  • min_{1 ≤ p ≤ P} G(r, p, L), where L is the hop
    distance from r to the farthest node in Tr
  • Time complexity: O(n^2 P^3)
  • For each p, vary q from 1 to p
  • For each (v, q), vary the closest cache node to v (n
    possibilities) and split q between Tv1 and Tv2 (q such
    possibilities)

61
Conclusion
  • We design optimal, near-optimal, and heuristic
    algorithms for data caching under different
    constraints in ad hoc and sensor networks
  • We show that our algorithms can be implemented in a
    distributed way

62
Questions?