Title: Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks
1. Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks
- Bin Tang
- Computer Science Department
- Stony Brook University
2. Summary of My Work
- Data caching under three kinds of constraints:
  - Update cost constraint
    - Optimal algorithm for trees; approximation algorithm for general graphs
  - Memory constraint, with multiple data items
    - Approximation algorithm for general graphs
  - Number constraint, with read/write/storage costs
    - Optimal algorithm for trees
- Localized distributed implementations
- Comparison with existing work
3. Motivation
- Ad hoc and sensor networks are resource constrained
  - Limited bandwidth, battery energy, and memory
- Caching can save access (communication) cost, and thus bandwidth and energy
  - Under update cost, memory, or number constraints
4. Rooted in
- Facility location problem: set up facilities in a network to minimize total access cost plus setup cost
- K-median problem: set up k facilities to minimize total access cost
5. Part 1: Cache Placement in Sensor Networks under an Update Cost Constraint
6. Problem Statement
- Sensor network model
  - A data item is stored at a server node
  - The item is updated at a certain frequency
  - Other nodes access the data item at certain frequencies
- Problem statement
  - Select nodes at which to cache the data item
  - Goal: minimize the total access cost
  - Constraint: a bound on the total update cost
7. Why an Update Cost Constraint?
- Every cache must be kept up to date, so caching at more nodes increases update traffic
- Nodes close to the server bear most of the update cost
8. Problem Formulation
- Given
  - Network graph G(V,E)
  - A data item stored at a server node
  - Update frequency
  - Access frequency for each other node
  - Update cost constraint Δ
- Goal
  - Select cache nodes to minimize the total access cost
  - Such that the total update cost is at most Δ
9. Total Access/Update Cost
- Total access cost = Σ_{i ∈ V} (hop distance between i and its nearest cache) × (access frequency of i)
- Total update cost = cost of the optimal Steiner tree over the server and all cache nodes
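The access-cost term above can be sketched in a few lines of Python (a hypothetical illustration; `dist` is a precomputed all-pairs hop-distance table, e.g., obtained by BFS on the network graph):

```python
# Sketch of the total access cost: each node i pays
# (hop distance to its nearest cache) x (access frequency of i).
# `dist` is a hypothetical precomputed all-pairs hop-distance table.

def total_access_cost(nodes, caches, access_freq, dist):
    cost = 0
    for i in nodes:
        nearest = min(dist[i][c] for c in caches)  # hops to nearest cache
        cost += nearest * access_freq[i]
    return cost

# Tiny example: a 4-node path 0-1-2-3 with the server (the only cache) at 0.
dist = {i: {j: abs(i - j) for j in range(4)} for i in range(4)}
freq = {0: 0, 1: 2, 2: 1, 3: 1}
print(total_access_cost(range(4), {0}, freq, dist))  # 1*2 + 2*1 + 3*1 = 7
```

The update-cost term is deliberately left out: it requires a Steiner tree over the server and caches, which the slides compute via a 2-approximation.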
10. Algorithm Design Outline
- Tree networks
  - Optimal dynamic programming algorithm
- General networks
  - Multiple-unicast update model: approximation algorithm
  - Steiner-tree update model: heuristic and distributed algorithms
11. Tree Networks
12. Subtree Notation
- Server r; consider a subtree Tv of Tr
- Let all nodes on the path (v,x) along the leftmost branch of Tv be caches
- Let C_v be the optimal access cost in Tv using additional update cost d
- Next: a recursive equation for C_v
(Figure: subtree Tv of Tr, with server r and the cache path (v,x) on the leftmost branch)
13. Dynamic Programming Algorithm for Tv under Update Cost Constraint d
- Let u be the leftmost deepest node in the optimal set of caches in Tv
- All nodes on path(v,u) can be made caches (the update cost doesn't increase)
- For a fixed u:
  - C_v = constant + optimal access cost in Rv,u under constraint (d − d_u)
- Here, d_u is the cost of updating u (using path(v,x))
(Figure: Tv partitioned into Lv,u, Tu, and Rv,u)
14. DP Recursive Equation for Tv
- C_v = min over u ∈ Tv of:
  (access cost in Lv,u using path(v,x) ∪ path(v,u) as caches)
  + (access cost in Tu using u)
  + (optimal cost in Rv,u with constraint d − d_u)
- Here, d_u is the cost of updating u (using path(v,x))
- Note that Rv,u has a path (v, parent(u)) of caches on its leftmost branch
15. Time Complexity
- Time complexity: O(n⁴ + n³Δ)
- Analysis
  - Precomputation takes O(n⁴)
    - Lv,u with cache path (v,x): O(n⁴) over all v, u, x
    - Tu: O(n²) over all u
  - The recursive equation takes O(n³Δ)
    - n²Δ entries: one for each pair (v,x) and each value of the remaining budget up to Δ
    - Each entry takes O(n): n possible choices of u
16. General Graph Networks
- Two update cost models
  - Multiple-unicast
  - Optimal Steiner tree
17. Multiple-Unicast Update Model
- Update cost: sum of shortest-path lengths from the server to each cache node
- Benefit of node A: decrease in total access cost due to selecting A as a cache
- Greedy metric: benefit per unit update cost
18. Greedy Algorithm
- Iteratively select the node with the highest benefit per unit update cost, until the update cost budget is exhausted
- Theorem: the greedy solution's benefit is at least 63% of the optimal benefit
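A minimal sketch of this greedy selection, assuming hypothetical `benefit` and `update_cost` helpers (in the real algorithm, the benefit depends on the current cache set and on network distances, and the update cost is the server-to-node shortest-path length):

```python
# Sketch of the greedy selection: repeatedly add the node with the
# highest (benefit / update cost) ratio while the budget allows.
# benefit(a, caches) and update_cost(a) are hypothetical helpers.

def greedy_placement(candidates, benefit, update_cost, budget):
    caches, spent = set(), 0
    while True:
        best, best_ratio = None, 0.0
        for a in candidates - caches:
            c = update_cost(a)
            if c == 0 or spent + c > budget:
                continue                      # would exceed the budget
            ratio = benefit(a, caches) / c
            if ratio > best_ratio:
                best, best_ratio = a, ratio
        if best is None:                      # budget exhausted or no benefit
            break
        caches.add(best)
        spent += update_cost(best)
    return caches

# Tiny example with fixed (hypothetical) benefits and costs, budget 3.
ucost = {1: 1, 2: 2, 3: 5}
bene = {1: 3, 2: 2, 3: 10}
picked = greedy_placement({1, 2, 3}, lambda a, _: bene[a],
                          lambda a: ucost[a], budget=3)
print(picked)  # node 3 never fits the budget; greedy picks {1, 2}
```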
19. Steiner-Tree Update Cost Model
- Steiner-tree update cost: cost of a 2-approximate Steiner tree over the cache nodes
- Incremental Steiner update cost of node A: increase in the Steiner-tree update cost due to A becoming a cache
- Greedy-Steiner algorithm
  - Iteratively select the node with the highest benefit per unit incremental Steiner update cost
20. Distributed Greedy-Steiner Algorithm
- Each non-cache node estimates its benefit per unit update cost
- If its estimate is the maximum among all its non-cache neighbors, it decides to become a cache
- Algorithm
  - In each round, each node decides whether to cache based on the above
  - The server gathers the new cache node information and computes the total update cost
  - The remaining update cost budget is broadcast to the network, and a new round begins
21. Performance Evaluation
- Parameters varied:
  - (i) network-related: number of nodes and transmission radius
  - (ii) application-related: number of clients
- Random networks of 2,000 to 5,000 nodes in a 30 x 30 region
22. Compared Caching Schemes
- Centralized Greedy
- Centralized Greedy-Steiner
- Distributed Greedy-Steiner
- Dynamic programming on the shortest-path tree of clients
- Dynamic programming on the Steiner tree over clients and server
23. Varying Network Size (transmission radius = 2, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost)
24. Varying Transmission Radius (network size = 4000, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost)
25. Varying Number of Clients (transmission radius = 2, update cost budget = 50% of the Steiner tree cost, network size = 3000)
26. To Recap
- Data caching problem under an update cost constraint
- Optimal algorithm for trees; an approximation algorithm for general graphs
- Efficient distributed implementations
- Next: a more general cache placement problem
  - (a) under memory constraints; (b) with multiple data items
27. Part 2: Data Caching under a Memory Constraint
28. Problem Addressed
- In a general ad hoc network with limited memory at each node, where should data items be cached so that the total access (communication) cost is minimized?
29. Problem Formulation
- Given
  - Network graph G(V,E)
  - Multiple data items
  - Access frequencies (for each node and data item)
  - Memory constraint at each node
- Select data items to cache at each node under the memory constraint
- Minimize total access cost = Σ_nodes Σ_data items (distance from the node to the nearest cache of that data item) × (access frequency)
30. Related Work
- Related to the facility-location and K-median problems, which have no memory constraint
- Baev and Rajaraman
  - 20.5-approximation algorithm for uniform-size data items
  - For non-uniform sizes, no polynomial-time approximation exists unless P = NP
- We circumvent the intractability by approximating benefit instead of access cost
31. Related Work (continued)
- Two major empirical works on distributed caching
  - Hara (INFOCOM '99)
  - Yin and Cao (INFOCOM '04); we compare our work with theirs
- Our work is the first to present a distributed caching scheme based on an approximation algorithm
32. Algorithms
- Centralized Greedy Algorithm (CGA)
  - Delivers a solution whose benefit is at least 1/2 of the optimal benefit
- Distributed Greedy Algorithm (DGA)
  - Purely localized
33. Centralized Greedy Algorithm (CGA)
- Benefit of caching a data item at a node
  - The reduction in total access cost
  - i.e., (total access cost before caching) − (total access cost after caching)
34. Centralized Greedy Algorithm (CGA)
- CGA iteratively selects the most beneficial (data item, node to cache at) pair
  - i.e., at each stage we pick the pair with the maximum benefit
- Theorem: CGA is (1/2)-approximate for uniform-size data items
  - and (1/4)-approximate for non-uniform-size data items
35. CGA Approximation Proof Sketch
- G′: a modified G in which each node
  - has twice the memory of its counterpart in G
  - caches the data items selected by both CGA and the optimal solution
- B(Optimal in G)
  ≤ B(Greedy ∪ Optimal in G′)
  = B(Greedy) + B(Optimal w.r.t. Greedy)
  ≤ B(Greedy) + B(Greedy)   (by the greedy choice)
  = 2 × B(Greedy)
36. Distributed Greedy Algorithm (DGA)
- Each node caches the most beneficial data items, where benefit is computed from local traffic only
- Local traffic includes
  - The node's own data requests
  - Data requests for its cached data items
  - Data requests it forwards to others
37. DGA: Nearest-Cache Table
- Why do we need it?
  - To forward requests to the nearest cache
  - For local benefit calculation
- What is it?
  - Each node keeps the ID of the nearest cache for each data item
  - Entries of the form (data item, nearest cache)
  - Maintained on top of the routing table
- Maintenance: next slide
38. Maintenance of the Nearest-Cache Table
- When node i caches data item Dj
  - It broadcasts (i, Dj) to its neighbors
  - It notifies the server, which keeps a list of caches
- On receiving (i, Dj)
  - If i is nearer than the current nearest cache of Dj, update the entry and forward the message
39. Maintenance of the Nearest-Cache Table (II)
- When node i deletes Dj
  - It gets the list of caches Cj from the server of Dj
  - It broadcasts (i, Dj, Cj) to its neighbors
- On receiving (i, Dj, Cj)
  - If i is the current nearest cache for Dj, update the entry using Cj and forward the message
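The cache-addition case from the previous slide can be sketched as a small handler (hypothetical names; `dist` and `forward` stand in for the hop distances and flooding primitive supplied by the routing layer):

```python
# Sketch of the cache-addition handler: on receiving a notice (i, Dj),
# update the nearest-cache entry for Dj if i is closer, then forward.

def on_cache_added(node, i, item, nearest, dist, forward):
    current = nearest.get(item)
    if current is None or dist(node, i) < dist(node, current):
        nearest[item] = i       # i becomes the new nearest cache for item
        forward(i, item)        # propagate the notice to neighbors

# Tiny example on a line of nodes, where hop distance is |a - b|.
nearest = {'D1': 0}             # node 5 currently points at cache node 0
forwarded = []
on_cache_added(5, 4, 'D1', nearest, lambda a, b: abs(a - b),
               lambda i, d: forwarded.append((i, d)))
print(nearest, forwarded)       # cache 4 is closer: entry updated, forwarded
```

A notice from a farther cache would fail the distance test and be dropped, which is what keeps the flooding bounded.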
40. Maintenance of the Nearest-Cache Table (III)
- More details pertain to
  - Mobility
  - Second-nearest-cache entries (needed for benefit calculation on cache deletions)
  - Benefit thresholds
41. Performance Evaluation
- CGA vs. DGA comparison
- DGA vs. HybridCache comparison
42. CGA vs. DGA
- Summary of simulation results
  - DGA performs quite close to CGA over a wide range of parameter values
43. Varying Number of Data Items and Memory Capacity (transmission radius = 5, number of nodes = 500)
44. DGA vs. Yin and Cao's Work
- Yin and Cao (INFOCOM '04)
  - CacheData: caches passing-by data items
  - CachePath: caches the path to the nearest cache
  - HybridCache: caches the data item if it is small enough, otherwise caches the path to it
- Theirs is the only other purely distributed cache placement algorithm with a memory constraint
45. DGA vs. HybridCache
- Simulation setup
  - ns-2; the routing protocol is DSDV
  - Random waypoint model; 100 nodes moving at speeds within (0, 20 m/s) in a 2000m x 500m area
  - Transmission radius = 250m; bandwidth = 2 Mbps
- Performance metrics
  - Average query delay
  - Query success ratio
  - Total number of messages
46.
- Server model
  - 1000 data items, divided between two servers
  - Data item sizes: 100 to 1500 bytes
- Data access models
  - Random: each node accesses 200 data items chosen randomly from the 1000
  - Spatial (details skipped)
- Naïve caching algorithm: caches any passing-by data and uses LRU for cache replacement
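The naïve baseline's LRU replacement can be sketched with `collections.OrderedDict` (a minimal illustration; capacity here is counted in items, whereas the simulation accounts for byte sizes):

```python
# Sketch of the naive baseline: cache every passing-by item, evicting
# the least recently used item when capacity is reached.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key, value=None):
        if key in self.items:
            self.items.move_to_end(key)         # mark as most recently used
            return self.items[key]
        if value is not None:                   # a passing-by item: cache it
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict least recently used
            self.items[key] = value
        return value

cache = LRUCache(2)
cache.access('a', 1)
cache.access('b', 2)
cache.access('a')         # touch 'a', making 'b' the eviction candidate
cache.access('c', 3)      # evicts 'b'
print(list(cache.items))  # ['a', 'c']
```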
47. Varying Query Generation Time under the Random Access Pattern
48. Summary of Simulation Results
- Both HybridCache and DGA outperform the naïve approach
- DGA outperforms HybridCache in all metrics
  - Especially for frequent queries and small cache sizes
- Under high mobility, DGA has slightly worse average delay but a much better query success ratio
49. To Recap
- Data caching problem for multiple items under a memory constraint
- Centralized approximation algorithm
- Localized distributed implementation
- No update or storage costs are considered (otherwise, there is no performance guarantee)
- Can we consider and minimize the total cost of reads, writes, and storage?
50. Part 3: Data Caching under a Number Constraint
51. Problem Formulation
- Given
  - Network graph G(V,E)
  - A data item to be stored in the network
  - Access (read) frequency for each node
  - Write frequency for each node
  - Caching (storage) cost for each node
  - Number of allowed caching nodes, P
- Goal
  - Select cache nodes to minimize the total cost
  - Under the number constraint (at most P caches)
52. Total Cost
- Total cost = total read cost + total write cost + total storage cost
  - Total read cost = Σ_{i ∈ V} (hop distance between i and its nearest cache) × (read frequency of i)
  - Total write cost = Σ_{i ∈ V} (cost of the optimal Steiner tree over i and all caches) × (write frequency of i)
  - Total storage cost = Σ_{i ∈ cache nodes} (storage cost at i)
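The three cost terms can be combined in a short sketch (hypothetical helpers; `steiner_cost` is assumed given, since optimal Steiner trees are hard to compute in general and a real implementation would plug in an approximation, as the earlier slides do):

```python
# Sketch of the three-part total cost: read + write + storage.
# steiner_cost(i, caches) is a hypothetical helper returning the cost
# of a Steiner tree over {i} and the cache nodes.

def total_cost(nodes, caches, dist, steiner_cost,
               read_freq, write_freq, storage):
    read = sum(min(dist[i][c] for c in caches) * read_freq[i]
               for i in nodes)
    write = sum(steiner_cost(i, caches) * write_freq[i] for i in nodes)
    store = sum(storage[c] for c in caches)
    return read + write + store

# Tiny example: 3-node path 0-1-2 with a single cache at node 1 (with
# one cache, the Steiner tree over {i, cache} is just the shortest path).
dist = {i: {j: abs(i - j) for j in range(3)} for i in range(3)}
cost = total_cost(range(3), {1}, dist,
                  lambda i, C: min(dist[i][c] for c in C),
                  read_freq={0: 1, 1: 1, 2: 1},
                  write_freq={0: 1, 1: 0, 2: 0},
                  storage={0: 0, 1: 2, 2: 0})
print(cost)  # read 2 + write 1 + storage 2 = 5
```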
53. Related Work
- K-median problem (access and storage costs)
  - Tamir attains the best known time complexity on trees
  - We generalize it to include write cost, on both trees (O(n²P³)) and general graphs
- Kalpakis et al. solve the same problem with time complexity O(n⁶P³)
54. Tree Topology
55. Tamir's DP Algorithm on a Tree Tr
- Transform the arbitrary tree into a full binary tree
  - Each non-leaf node v has two children v1, v2
- For each v in the binary tree, compute and sort the distances from v to all nodes
- Leaves-to-root dynamic programming algorithm
56. Our DP Algorithm
- Idea: for each node v in Tr, compute the cost of the subtree Tv:
  - access cost of the nodes in Tv
  - storage cost of the caching nodes in Tv
  - write cost of all the writer nodes in Tr due to edges in Tv
57. DP Algorithm: Definitions
- G(v, q, r): optimal cost for subtree Tv with exactly q caches in Tv, the closest of which is at most r hops from v
- F(v, q, r): optimal cost for Tv with exactly q caches in Tv plus some cache nodes outside Tv, the closest of which is r hops from v
- F(v, r): optimal cost for Tv with no caches in Tv but some cache nodes outside Tv, the closest of which is r hops from v
58. Recursive DP Equations (P cache nodes allowed)
- 1. G(v, q, 0): v is a cache node
  = storage cost at v + the costs of Tv1 and Tv2 + the write costs on edges (v,v1) and (v,v2)
- 2. G(v, q < P, r > 0): there may be a cache node outside Tv
  = min { G(v, q, r−1),   // the nearest cache in Tv is within r−1 hops of v
          cost when the closest cache to v is exactly r hops away }
59. Recursive DP Equations (continued)
- 3. G(v, q = P, r > 0): no cache node outside Tv
  = min { G(v, q, r−1),
          cost when the closest cache is exactly r hops away }
- 4. F(v, q, r): there is a cache node outside Tv
  = min { G(v, q, r−1),
          cost when the closest cache to v is exactly r hops away }
60.
- Minimum total cost of the original tree Tr
  = min over 1 ≤ p ≤ P of G(r, p, L), where L is the number of hops from r to the farthest node in Tr
- Time complexity: O(n²P³)
  - For each p, vary q from 1 to p
  - For each (v, q), vary the closest cache node to v (n possibilities) and split q between Tv1 and Tv2 (q such possibilities)
61. Conclusion
- We design optimal, near-optimal, and heuristic algorithms for data caching under different constraints in ad hoc and sensor networks
- We show that our algorithms can be implemented in a distributed way
62. Questions?