Title: Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks
1. Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks
- Bin Tang
- Computer Science Department
- Stony Brook University
2. Summary of My Work
- Data caching under three kinds of constraints:
  - Update cost constraint
    - Optimal algorithm for trees; approximation algorithm for general graphs
  - Memory constraint, with multiple data items
    - Approximation algorithm for general graphs
  - Number constraint, with read/write/storage costs
    - Optimal algorithm for trees
- Localized distributed implementations
- Comparison with existing work
3. Motivation
- Ad hoc and sensor networks are resource constrained
  - Limited bandwidth, battery energy, and memory
- Caching can save access (communication) cost, and thus bandwidth and energy
  - Under update cost, memory, or number constraints
4. Rooted in
- Facility location problem: set up facilities in a network to minimize total access cost plus setup cost
- K-median problem: set up k facilities to minimize total access cost
5. Part 1: Cache Placement in Sensor Networks under an Update Cost Constraint
6. Problem Statement
- Sensor network model
  - A data item is stored at a server node
  - The item is updated at a certain frequency
  - Other nodes access the data item at certain frequencies
- Problem statement
  - Select nodes at which to cache the data item
  - Goal: minimize the total access cost
  - Constraint: a bound on the total update cost
7. Why an Update Cost Constraint?
- Every cache must be kept up to date, so caching at more nodes increases update traffic
- Nodes close to the server bear most of the update cost
8. Problem Formulation
- Given
  - Network graph G(V,E)
  - A data item stored at a server node
  - Update frequency
  - Access frequency for each other node
  - Update cost constraint Δ
- Goal
  - Select cache nodes to minimize the total access cost
  - Such that the total update cost is at most Δ
9. Total Access/Update Cost
- Total access cost = Σ_{i ∈ V} (hop distance between i and its nearest cache) × (access frequency of i)
- Total update cost = cost of the optimal Steiner tree over the server and all cache nodes
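The access-cost term above can be sketched in a few lines of Python (a hypothetical illustration; `dist` is a precomputed all-pairs hop-distance table, e.g., obtained by BFS on the network graph):

```python
# Sketch of the total access cost: each node i pays
# (hop distance to its nearest cache) x (access frequency of i).
# `dist` is a hypothetical precomputed all-pairs hop-distance table.

def total_access_cost(nodes, caches, access_freq, dist):
    cost = 0
    for i in nodes:
        nearest = min(dist[i][c] for c in caches)  # hops to nearest cache
        cost += nearest * access_freq[i]
    return cost

# Tiny example: a 4-node path 0-1-2-3 with the server (the only cache) at 0.
dist = {i: {j: abs(i - j) for j in range(4)} for i in range(4)}
freq = {0: 0, 1: 2, 2: 1, 3: 1}
print(total_access_cost(range(4), {0}, freq, dist))  # 1*2 + 2*1 + 3*1 = 7
```

The update-cost term is deliberately left out: it requires a Steiner tree over the server and caches, which the slides compute via a 2-approximation.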
10. Algorithm Design Outline
- Tree networks
  - Optimal dynamic programming algorithm
- General networks
  - Multiple-unicast update model: approximation algorithm
  - Steiner-tree update model: heuristic and distributed algorithms
11. Tree Networks
12. Subtree Notation
- Server r; consider a subtree Tv of Tr
- Let all nodes on the path (v,x) along the leftmost branch of Tv be caches
- Let C_v be the optimal access cost in Tv using additional update cost d
- Next: a recursive equation for C_v
(Figure: subtree Tv of Tr, with server r and the cache path (v,x) on the leftmost branch)
13. Dynamic Programming Algorithm for Tv under Update Cost Constraint d
- Let u be the leftmost deepest node in the optimal set of caches in Tv
- All nodes on path(v,u) can be made caches (the update cost doesn't increase)
- For a fixed u:
  - C_v = constant + optimal access cost in Rv,u under constraint (d − d_u)
- Here, d_u is the cost of updating u (using path(v,x))
(Figure: Tv partitioned into Lv,u, Tu, and Rv,u)
14. DP Recursive Equation for Tv
- C_v = min over u ∈ Tv of:
  (access cost in Lv,u using path(v,x) ∪ path(v,u) as caches)
  + (access cost in Tu using u)
  + (optimal cost in Rv,u with constraint d − d_u)
- Here, d_u is the cost of updating u (using path(v,x))
- Note that Rv,u has a path (v, parent(u)) of caches on its leftmost branch
15. Time Complexity
- Time complexity: O(n⁴ + n³Δ)
- Analysis
  - Precomputation takes O(n⁴)
    - Lv,u with cache path (v,x): O(n⁴) over all v, u, x
    - Tu: O(n²) over all u
  - The recursive equation takes O(n³Δ)
    - n²Δ entries: one for each pair (v,x) and each value of the remaining budget up to Δ
    - Each entry takes O(n): n possible choices of u
16. General Graph Networks
- Two update cost models
  - Multiple-unicast
  - Optimal Steiner tree
17. Multiple-Unicast Update Model
- Update cost: sum of shortest-path lengths from the server to each cache node
- Benefit of node A: decrease in total access cost due to selecting A as a cache
- Greedy metric: benefit per unit update cost
18. Greedy Algorithm
- Iteratively select the node with the highest benefit per unit update cost, until the update cost budget is exhausted
- Theorem: the greedy solution's benefit is at least 63% of the optimal benefit
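A minimal sketch of this greedy selection, assuming hypothetical `benefit` and `update_cost` helpers (in the real algorithm, the benefit depends on the current cache set and on network distances, and the update cost is the server-to-node shortest-path length):

```python
# Sketch of the greedy selection: repeatedly add the node with the
# highest (benefit / update cost) ratio while the budget allows.
# benefit(a, caches) and update_cost(a) are hypothetical helpers.

def greedy_placement(candidates, benefit, update_cost, budget):
    caches, spent = set(), 0
    while True:
        best, best_ratio = None, 0.0
        for a in candidates - caches:
            c = update_cost(a)
            if c == 0 or spent + c > budget:
                continue                      # would exceed the budget
            ratio = benefit(a, caches) / c
            if ratio > best_ratio:
                best, best_ratio = a, ratio
        if best is None:                      # budget exhausted or no benefit
            break
        caches.add(best)
        spent += update_cost(best)
    return caches

# Tiny example with fixed (hypothetical) benefits and costs, budget 3.
ucost = {1: 1, 2: 2, 3: 5}
bene = {1: 3, 2: 2, 3: 10}
picked = greedy_placement({1, 2, 3}, lambda a, _: bene[a],
                          lambda a: ucost[a], budget=3)
print(picked)  # node 3 never fits the budget; greedy picks {1, 2}
```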
19. Steiner-Tree Update Cost Model
- Steiner-tree update cost: cost of a 2-approximate Steiner tree over the cache nodes
- Incremental Steiner update cost of node A: increase in the Steiner-tree update cost due to A becoming a cache
- Greedy-Steiner algorithm
  - Iteratively select the node with the highest benefit per unit incremental Steiner update cost
20. Distributed Greedy-Steiner Algorithm
- Each non-cache node estimates its benefit per unit update cost
- If its estimate is the maximum among all its non-cache neighbors, it decides to become a cache
- Algorithm
  - In each round, each node decides whether to cache based on the above
  - The server gathers the new cache node information and computes the total update cost
  - The remaining update cost budget is broadcast to the network, and a new round begins
21. Performance Evaluation
- Parameters varied:
  - (i) network-related: number of nodes and transmission radius
  - (ii) application-related: number of clients
- Random networks of 2,000 to 5,000 nodes in a 30 x 30 region
22. Compared Caching Schemes
- Centralized Greedy
- Centralized Greedy-Steiner
- Distributed Greedy-Steiner
- Dynamic programming on the shortest-path tree of clients
- Dynamic programming on the Steiner tree over clients and server
23. Varying Network Size (transmission radius = 2, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost)
24. Varying Transmission Radius (network size = 4000, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost)
25. Varying Number of Clients (transmission radius = 2, update cost budget = 50% of the Steiner tree cost, network size = 3000)
26. To Recap
- Data caching problem under an update cost constraint
- Optimal algorithm for trees; an approximation algorithm for general graphs
- Efficient distributed implementations
- Next: a more general cache placement problem
  - (a) under memory constraints; (b) with multiple data items
27. Part 2: Data Caching under a Memory Constraint
28. Problem Addressed
- In a general ad hoc network with limited memory at each node, where should data items be cached so that the total access (communication) cost is minimized?
29. Problem Formulation
- Given
  - Network graph G(V,E)
  - Multiple data items
  - Access frequencies (for each node and data item)
  - Memory constraint at each node
- Select data items to cache at each node under the memory constraint
- Minimize total access cost = Σ_nodes Σ_data items (distance from the node to the nearest cache of that data item) × (access frequency)
30. Related Work
- Related to the facility-location and K-median problems, which have no memory constraint
- Baev and Rajaraman
  - 20.5-approximation algorithm for uniform-size data items
  - For non-uniform sizes, no polynomial-time approximation exists unless P = NP
- We circumvent the intractability by approximating benefit instead of access cost
31. Related Work (continued)
- Two major empirical works on distributed caching
  - Hara (INFOCOM '99)
  - Yin and Cao (INFOCOM '04); we compare our work with theirs
- Our work is the first to present a distributed caching scheme based on an approximation algorithm
32. Algorithms
- Centralized Greedy Algorithm (CGA)
  - Delivers a solution whose benefit is at least 1/2 of the optimal benefit
- Distributed Greedy Algorithm (DGA)
  - Purely localized
33. Centralized Greedy Algorithm (CGA)
- Benefit of caching a data item at a node
  - The reduction in total access cost
  - i.e., (total access cost before caching) − (total access cost after caching)
34. Centralized Greedy Algorithm (CGA)
- CGA iteratively selects the most beneficial (data item, node to cache at) pair
  - i.e., at each stage we pick the pair with the maximum benefit
- Theorem: CGA is (1/2)-approximate for uniform-size data items
  - and (1/4)-approximate for non-uniform-size data items
35. CGA Approximation Proof Sketch
- G′: a modified G in which each node
  - has twice the memory of its counterpart in G
  - caches the data items selected by both CGA and the optimal solution
- B(Optimal in G)
  ≤ B(Greedy ∪ Optimal in G′)
  = B(Greedy) + B(Optimal w.r.t. Greedy)
  ≤ B(Greedy) + B(Greedy)   (by the greedy choice)
  = 2 × B(Greedy)
36. Distributed Greedy Algorithm (DGA)
- Each node caches the most beneficial data items, where benefit is computed from local traffic only
- Local traffic includes
  - The node's own data requests
  - Data requests for its cached data items
  - Data requests it forwards to others
37. DGA: Nearest-Cache Table
- Why do we need it?
  - To forward requests to the nearest cache
  - For local benefit calculation
- What is it?
  - Each node keeps the ID of the nearest cache for each data item
  - Entries of the form (data item, nearest cache)
  - Maintained on top of the routing table
- Maintenance: next slide
38. Maintenance of the Nearest-Cache Table
- When node i caches data item Dj
  - It broadcasts (i, Dj) to its neighbors
  - It notifies the server, which keeps a list of caches
- On receiving (i, Dj)
  - If i is nearer than the current nearest cache of Dj, update the entry and forward the message
39. Maintenance of the Nearest-Cache Table (II)
- When node i deletes Dj
  - It gets the list of caches Cj from the server of Dj
  - It broadcasts (i, Dj, Cj) to its neighbors
- On receiving (i, Dj, Cj)
  - If i is the current nearest cache for Dj, update the entry using Cj and forward the message
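The cache-addition case from the previous slide can be sketched as a small handler (hypothetical names; `dist` and `forward` stand in for the hop distances and flooding primitive supplied by the routing layer):

```python
# Sketch of the cache-addition handler: on receiving a notice (i, Dj),
# update the nearest-cache entry for Dj if i is closer, then forward.

def on_cache_added(node, i, item, nearest, dist, forward):
    current = nearest.get(item)
    if current is None or dist(node, i) < dist(node, current):
        nearest[item] = i       # i becomes the new nearest cache for item
        forward(i, item)        # propagate the notice to neighbors

# Tiny example on a line of nodes, where hop distance is |a - b|.
nearest = {'D1': 0}             # node 5 currently points at cache node 0
forwarded = []
on_cache_added(5, 4, 'D1', nearest, lambda a, b: abs(a - b),
               lambda i, d: forwarded.append((i, d)))
print(nearest, forwarded)       # cache 4 is closer: entry updated, forwarded
```

A notice from a farther cache would fail the distance test and be dropped, which is what keeps the flooding bounded.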
40. Maintenance of the Nearest-Cache Table (III)
- More details pertain to
  - Mobility
  - Second-nearest-cache entries (needed for benefit calculation on cache deletions)
  - Benefit thresholds
41. Performance Evaluation
- CGA vs. DGA comparison
- DGA vs. HybridCache comparison
42. CGA vs. DGA
- Summary of simulation results
  - DGA performs quite close to CGA over a wide range of parameter values
43. Varying Number of Data Items and Memory Capacity (transmission radius = 5, number of nodes = 500)
44. DGA vs. Yin and Cao's Work
- Yin and Cao (INFOCOM '04)
  - CacheData: caches passing-by data items
  - CachePath: caches the path to the nearest cache
  - HybridCache: caches the data item if it is small enough, otherwise caches the path to it
- Theirs is the only other purely distributed cache placement algorithm with a memory constraint
45. DGA vs. HybridCache
- Simulation setup
  - ns-2; the routing protocol is DSDV
  - Random waypoint model; 100 nodes moving at speeds within (0, 20 m/s) in a 2000m x 500m area
  - Transmission radius = 250m; bandwidth = 2 Mbps
- Performance metrics
  - Average query delay
  - Query success ratio
  - Total number of messages
46.
- Server model
  - 1000 data items, divided between two servers
  - Data item sizes: 100 to 1500 bytes
- Data access models
  - Random: each node accesses 200 data items chosen randomly from the 1000
  - Spatial (details skipped)
- Naïve caching algorithm: caches any passing-by data and uses LRU for cache replacement
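The naïve baseline's LRU replacement can be sketched with `collections.OrderedDict` (a minimal illustration; capacity here is counted in items, whereas the simulation accounts for byte sizes):

```python
# Sketch of the naive baseline: cache every passing-by item, evicting
# the least recently used item when capacity is reached.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key, value=None):
        if key in self.items:
            self.items.move_to_end(key)         # mark as most recently used
            return self.items[key]
        if value is not None:                   # a passing-by item: cache it
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict least recently used
            self.items[key] = value
        return value

cache = LRUCache(2)
cache.access('a', 1)
cache.access('b', 2)
cache.access('a')         # touch 'a', making 'b' the eviction candidate
cache.access('c', 3)      # evicts 'b'
print(list(cache.items))  # ['a', 'c']
```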
47. Varying Query Generation Time under the Random Access Pattern
48. Summary of Simulation Results
- Both HybridCache and DGA outperform the naïve approach
- DGA outperforms HybridCache in all metrics
  - Especially for frequent queries and small cache sizes
- Under high mobility, DGA has slightly worse average delay but a much better query success ratio
49. To Recap
- Data caching problem for multiple items under a memory constraint
- Centralized approximation algorithm
- Localized distributed implementation
- No update or storage costs are considered (otherwise, there is no performance guarantee)
- Can we consider and minimize the total cost of reads, writes, and storage?
50. Part 3: Data Caching under a Number Constraint
51. Problem Formulation
- Given
  - Network graph G(V,E)
  - A data item to be stored in the network
  - Access (read) frequency for each node
  - Write frequency for each node
  - Caching (storage) cost for each node
  - Number of allowed caching nodes, P
- Goal
  - Select cache nodes to minimize the total cost
  - Under the number constraint (at most P caches)
52. Total Cost
- Total cost = total read cost + total write cost + total storage cost
  - Total read cost = Σ_{i ∈ V} (hop distance between i and its nearest cache) × (read frequency of i)
  - Total write cost = Σ_{i ∈ V} (cost of the optimal Steiner tree over i and all caches) × (write frequency of i)
  - Total storage cost = Σ_{i ∈ cache nodes} (storage cost at i)
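The three cost terms can be combined in a short sketch (hypothetical helpers; `steiner_cost` is assumed given, since optimal Steiner trees are hard to compute in general and a real implementation would plug in an approximation, as the earlier slides do):

```python
# Sketch of the three-part total cost: read + write + storage.
# steiner_cost(i, caches) is a hypothetical helper returning the cost
# of a Steiner tree over {i} and the cache nodes.

def total_cost(nodes, caches, dist, steiner_cost,
               read_freq, write_freq, storage):
    read = sum(min(dist[i][c] for c in caches) * read_freq[i]
               for i in nodes)
    write = sum(steiner_cost(i, caches) * write_freq[i] for i in nodes)
    store = sum(storage[c] for c in caches)
    return read + write + store

# Tiny example: 3-node path 0-1-2 with a single cache at node 1 (with
# one cache, the Steiner tree over {i, cache} is just the shortest path).
dist = {i: {j: abs(i - j) for j in range(3)} for i in range(3)}
cost = total_cost(range(3), {1}, dist,
                  lambda i, C: min(dist[i][c] for c in C),
                  read_freq={0: 1, 1: 1, 2: 1},
                  write_freq={0: 1, 1: 0, 2: 0},
                  storage={0: 0, 1: 2, 2: 0})
print(cost)  # read 2 + write 1 + storage 2 = 5
```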
53. Related Work
- K-median problem (access and storage costs)
  - Tamir attains the best known time complexity on trees
  - We generalize it to include write cost, on both trees (O(n²P³)) and general graphs
- Kalpakis et al. solve the same problem with time complexity O(n⁶P³)
54. Tree Topology
55. Tamir's DP Algorithm on a Tree Tr
- Transform the arbitrary tree into a full binary tree
  - Each non-leaf node v has two children v1, v2
- For each v in the binary tree, compute and sort the distances from v to all nodes
- Leaves-to-root dynamic programming algorithm
56. Our DP Algorithm
- Idea: for each node v in Tr, compute the cost of the subtree Tv:
  - access cost of the nodes in Tv
  - storage cost of the caching nodes in Tv
  - write cost of all the writer nodes in Tr due to edges in Tv
57. DP Algorithm: Definitions
- G(v, q, r): optimal cost for subtree Tv with exactly q caches in Tv, the closest of which is at most r hops from v
- F(v, q, r): optimal cost for Tv with exactly q caches in Tv plus some cache nodes outside Tv, the closest of which is r hops from v
- F(v, r): optimal cost for Tv with no caches in Tv but some cache nodes outside Tv, the closest of which is r hops from v
58. Recursive DP Equations (P cache nodes allowed)
- 1. G(v, q, 0): v is a cache node
  = storage cost at v + the costs of Tv1 and Tv2 + the write costs on edges (v,v1) and (v,v2)
- 2. G(v, q < P, r > 0): there may be a cache node outside Tv
  = min { G(v, q, r−1),   // the nearest cache in Tv is within r−1 hops of v
          cost when the closest cache to v is exactly r hops away }
59. Recursive DP Equations (continued)
- 3. G(v, q = P, r > 0): no cache node outside Tv
  = min { G(v, q, r−1),
          cost when the closest cache is exactly r hops away }
- 4. F(v, q, r): there is a cache node outside Tv
  = min { G(v, q, r−1),
          cost when the closest cache to v is exactly r hops away }
60.
- Minimum total cost of the original tree Tr
  = min over 1 ≤ p ≤ P of G(r, p, L), where L is the number of hops from r to the farthest node in Tr
- Time complexity: O(n²P³)
  - For each p, vary q from 1 to p
  - For each (v, q), vary the closest cache node to v (n possibilities) and split q between Tv1 and Tv2 (q such possibilities)
61. Conclusion
- We design optimal, near-optimal, and heuristic algorithms for data caching under different constraints in ad hoc and sensor networks
- We show that our algorithms can be implemented in a distributed way
62. Questions?