Title: Dynamic Adaptation of Data Distribution Policies in a Shared Data Space System
1 Dynamic Adaptation of Data Distribution Policies in a Shared Data Space System

Giovanni Russello, Michel Chaudron (Dept. of Mathematics and Computing Science, Eindhoven University of Technology)
Maarten van Steen (Faculty of Science, Dept. of Computer Science, Vrije Universiteit Amsterdam)
2 Problem Context: Perspectives

- Distributed system in which
  - application-level software components join and leave the system dynamically
  - the communication profile of applications is irregular/unpredictable
- Middleware designer
  - How can I best cater for the communication needs of an assembly of components? → optimize resource use
- Component designer
  - Can I design my component such that it is independent of design choices in the communication mechanisms of the middleware? → hence enhancing reusability
3 Shared Data Space Model: Overview

[Diagram: components A, B, and C interact with the shared data space via put, read, and take operations]

- Tuple: ordered sequence of typed fields with specified values
  - <str name, int age> → <Giovanni, 28>
- Template: ordered sequence of typed fields with or without a specified value
  - <str name, int age> → <Giovanni, int ?>
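As an illustration of how a template matches a tuple (the function below is our own sketch, not GSpace's API), a formal field such as `int ?` matches any value of that type, while a specified field must match by value:

```python
# Sketch of tuple/template matching in a shared data space.
# A template field is either a concrete value or a bare type (a wildcard).

def matches(template, tup):
    """Return True if tup matches template field by field."""
    if len(template) != len(tup):
        return False
    for t_field, value in zip(template, tup):
        if isinstance(t_field, type):          # formal field, e.g. int ?: match by type only
            if not isinstance(value, t_field):
                return False
        elif t_field != value:                 # actual field: match by value
            return False
    return True

# Template <Giovanni, int ?> against tuple <Giovanni, 28>
print(matches(("Giovanni", int), ("Giovanni", 28)))  # True
print(matches(("Giovanni", int), ("Michel", 28)))    # False
```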
4 Shared Data Space Model: Features

- Small, yet powerful, API
- Uncoupling in time: applications do not need to communicate at the same time in order to exchange data
- Uncoupling in space: applications can cooperate even if they do not know each other's location
- Computation is separated from coordination
5 Advantages for Component-Based Systems

- Application components are not bound to any specific application interface
- Support for run-time dynamic (de)composition
- Absence of referential information among application components
6 Existing Shared Data Space Implementations

- Shared data spaces typically employ a static, system-wide scheme for distributing data.
- Often, the distribution scheme is dictated by
  - the application characteristics
  - the target platform (HW)
- Examples of distribution schemes
  - Centralized (JavaSpaces)
  - Uniform distribution (Corradi et al.)
  - Hash-based distribution (Rowstron)
7 Problem Statement

- Generally, applications have different needs for the different types of data they use.
- Examples of data usage patterns in a Process Farm application:
  - Master-Workers: job data
  - Write-many: partial-result data
  - Read-most: result data

How can we maintain the simple programming model of the shared data space, yet also cater for different quality needs?
8 A Solution: Separation of Concerns

- Specify application functionality separately from extra-functional concerns (such as data distribution)
- A precondition is that computation is separated from coordination
- Treat different data types using different distribution policies ..
- .. in order to distribute data more efficiently. Huh, more efficiently?!
9 Our Approach: GSpace

- Distributed shared data space system
- Separation of functionality from extra-functional requirements
- Differentiation of distribution policy per tuple type
- Dynamic adaptation of distribution policies
- Extensible suite of distribution policies
10 GSpace Kernel Deployment

[Diagram: a GSpace kernel runs on each of Node 1, Node 2, ..., Node n, connected by the network]
11 Examples of Distribution Policies
- Store locally (SL)
- Full replication (FR)
- Caching with invalidation (CI)
- Caching with verification (CV)
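To make the differences concrete, two of these policies can be sketched behind a common put/read shape over in-memory per-node stores. This is our own illustrative model (class names, method signatures, and the dict-of-lists stores are assumptions, not GSpace's implementation):

```python
# Hypothetical sketch of two distribution policies over in-memory
# node stores. `nodes` maps a node id to that node's local tuple list.

class StoreLocally:
    """SL: a tuple stays on the node that wrote it; reads may go remote."""
    def put(self, node, nodes, tup):
        nodes[node].append(tup)

    def read(self, node, nodes, match):
        # Search the local store first, then the remote ones.
        stores = [nodes[node]] + [s for n, s in nodes.items() if n != node]
        for store in stores:
            for tup in store:
                if match(tup):
                    return tup
        return None

class FullReplication:
    """FR: every put is replicated to all nodes; reads are always local."""
    def put(self, node, nodes, tup):
        for store in nodes.values():
            store.append(tup)

    def read(self, node, nodes, match):
        for tup in nodes[node]:
            if match(tup):
                return tup
        return None

nodes = {"n1": [], "n2": []}
FullReplication().put("n1", nodes, ("Giovanni", 28))
print(FullReplication().read("n2", nodes, lambda t: t[0] == "Giovanni"))
# -> ('Giovanni', 28)
```

The trade-off the cost model later quantifies is already visible here: FR pays bandwidth and memory on every put so that reads stay local, while SL does the opposite.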
12 Separating Concerns in GSpace

[Diagram: the application/computation layer sits on top of the coordination middleware; a policy descriptor specifies the mapping from tuple type Ti to distribution policy Pj, and its implementation is downloaded into the distribution middleware layer above the network level]
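At its core, the policy descriptor specifies a mapping Ti → Pj from tuple types to distribution policies. The fragment below only illustrates that mapping idea; the dict format, type names, and lookup helper are our assumptions, not GSpace's actual descriptor syntax:

```python
# Hypothetical policy descriptor: tuple type -> distribution policy.
# The Process Farm types and policy choices are illustrative only.

policy_descriptor = {
    "JobTuple":           "SL",  # store locally
    "PartialResultTuple": "FR",  # full replication
    "ResultTuple":        "CI",  # caching with invalidation
}

def policy_for(tuple_type, descriptor, default="SL"):
    """Look up the distribution policy for a tuple type."""
    return descriptor.get(tuple_type, default)

print(policy_for("ResultTuple", policy_descriptor))  # CI
```

Because this mapping lives outside the application code, a policy can be swapped per tuple type without touching the components that put and read those tuples.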
13 More Efficient, Huh?

- Minimize the costs involved in data distribution.
- A cost function captures the performance of a given distribution policy during a period of time:

CF(p) = w1·m1,p + w2·m2,p + ... + wn·mn,p,  with Σ wi = 1

- The policy that produces the lowest CF value is the best policy.
- The mi,p are performance indicators (to be measured from the running system).
14 Performance Metrics

- Read latency (rl): time spent reading a tuple
- Take latency (tl): time spent taking a tuple
- Bandwidth usage (bu): amount of bandwidth used for distributing tuples and synchronization messages
- Memory usage (mu): amount of memory used for storing tuples

CF(p) = w1·rl_p + w2·tl_p + w3·mu_p + w4·bu_p,  with Σ wi = 1
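Evaluating CF for each candidate policy is then a weighted sum over the measured metrics; a minimal sketch (the metric values below are invented for illustration and assumed to be normalized to comparable scales):

```python
# Compute CF(p) = w1*rl + w2*tl + w3*mu + w4*bu per policy and pick
# the policy with the lowest cost. Metric values are made up.

weights = {"rl": 0.25, "tl": 0.25, "mu": 0.25, "bu": 0.25}  # sum to 1

# Normalized metrics measured per policy over the last period (illustrative).
metrics = {
    "SL": {"rl": 0.9, "tl": 0.8, "mu": 0.1, "bu": 0.2},
    "FR": {"rl": 0.1, "tl": 0.3, "mu": 0.9, "bu": 0.8},
    "CI": {"rl": 0.3, "tl": 0.4, "mu": 0.5, "bu": 0.4},
}

def cost(policy_metrics, weights):
    """Weighted sum of the performance indicators for one policy."""
    return sum(weights[m] * v for m, v in policy_metrics.items())

best = min(metrics, key=lambda p: cost(metrics[p], weights))
print(best)  # CI has the lowest weighted sum with these numbers
```

Changing the weights shifts the trade-off: a memory-constrained deployment would raise w3, penalizing full replication.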
15 GSpace Kernel: Internal Structure

[Diagram: at the middleware level, the GSpace kernel handles application requests coming from the application level and inter-kernel communication at the network/OS level]
16 OPS Modules

[Diagram: the OPS modules sit at the middleware level alongside the GSpace kernel, between the application level and the network/OS level]
17 Adaptation System Modules

For each tuple type there is one AM working in master mode; all the others work in slave mode.

[Diagram: the Adaptation System (AS) at the middleware level comprises the Adaptation Module, Logger, Cost Computation Module, and Adapt-Comm Module; it receives input from the Controller and connects to the AddTable and PolTable, to the LocDataSpace and DPS, and to the DPCM within the GSpace kernel, above the network/OS level]
18 Adaptation Mechanism: Phases
- Logging
- Evaluation
- Adaptation (optional)
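The three phases could be wired together roughly as follows. This is a sketch under our own assumptions about module interfaces (the function names, the stand-in cost function, and the threshold parameter are ours, not GSpace's code):

```python
# Sketch of the logging -> evaluation -> (optional) adaptation cycle
# for one tuple type. Log entries and policies are simplified stand-ins.

def evaluation_cycle(log, policies, current, cost_fn, threshold=0.0):
    """Return the policy to use next; switch only if the gain exceeds threshold."""
    costs = {p: cost_fn(p, log) for p in policies}     # evaluation phase
    best = min(costs, key=costs.get)
    if costs[current] - costs[best] > threshold:       # adaptation phase (optional)
        return best
    return current

# Illustrative cost model: replication-style policies suit read-heavy logs.
def cost_fn(policy, log):
    reads = sum(1 for op in log if op == "read")
    takes = len(log) - reads
    return {"SL": reads, "FR": takes, "CI": 0.5 * len(log)}[policy]

log = ["read"] * 8 + ["take"] * 2                      # logging phase output
print(evaluation_cycle(log, ["SL", "FR", "CI"], "SL", cost_fn))  # FR
```

The threshold makes adaptation optional in practice: if the best alternative is only marginally cheaper than the current policy, the switch (and its overhead) is skipped.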
19 Logging Phase (MSC)

20 Evaluation Phase (MSC)

21 Adaptation Phase (MSC)
22 Experiment Settings

[Diagram: application model with a coordinator and node1, node2, ..., noden, where n = 2..10]

- Application usage patterns simulated by the application model:
  - Local Usage Pattern (LUP)
  - Write-many Usage Pattern (WUP)
  - Read-mostly Usage Pattern (RUP), variants (i) and (ii)

Example of an operation run: (p,l1) (r,l2) (r,l2) (r,l2) (t,l1) ..
23 Experiment Settings

[Diagram: the coordinator generates an operation run, which is then executed in run phase 1, run phase 2, and run phase 3]

- Static settings: adaptation disabled
- Dynamic settings: adaptation enabled
24 Experiment Settings: Static Settings (Adaptation Disabled)

For a given operation run:
- While there are more distribution policies:
  - assign the next distribution policy to the tuple type
  - execute the operation run
  - compute the CF value
- Select the policy with the minimum CF value
25 Experiment Settings: Dynamic Settings (Adaptation Enabled)

Same operation run as in the previous experiments.
- While there are more threshold values:
  - execute the operation run for the selected threshold
  - during each evaluation phase, store the min CF value and the CF value of the actual policy
- On termination, aggregate the min CF and actual CF values
26 CF of Adaptive and Static Settings

wi = 0.25 for all i; run-phase length 500

27 CF of Adaptive and Static Settings

wi = 0.25 for all i; run-phase length 8000
28 Accuracy of the Cost Model

29 Adaptation Mechanism Overhead

Percentage of time spent in the different modules of a kernel
30 Conclusions and Future Work

- C1: Architecture and distributed implementation of a shared data space
  - High flexibility at small programming effort
  - Adaptivity caters for changing application behavior
- C2: Separation of concerns enhances reusability of application components and distribution policies
- C3: Experimental validation
- F: Extend support to other extra-functional concerns (real-time, availability, ...)