Title: First NASA Workshop on Performance-Engineered Information Systems

1. A Performance Evaluation Model for Scheduling in Global Computing Systems
Kento Aida (TIT), Atsuko Takefusa (Ochanomizu Univ.), Hidemoto Nakada (ETL), Satoshi Matsuoka (TIT), Satoshi Sekiguchi (ETL), Umpei Nagashima (National Institute of Materials and Chemical Research)
http://ninf.etl.go.jp/
2. Global Computing Systems
- Proposed global computing systems
- Globus, NetSolve, Ninf, Legion, RCS, etc.
3. Scheduling for Global Computing Systems
Effective scheduling is required to achieve high-performance global computing!
- Scheduling under a dynamic, heterogeneous environment
- computing server performance / load
- network topology / bandwidth / congestion
- multiple users at multiple sites
- Software systems for scheduling
- AppLeS, NetSolve agent, Nimrod, Ninf Metaserver, Prophet, etc.
4. Frameworks to Evaluate Scheduling Algorithms
- Benchmarking on real systems
- practical measurement
- difficult to perform large-scale experiments
- a small number of replications
- partial solution
There is no effective framework to evaluate the performance of scheduling in global computing systems!
5. Performance Evaluation Model
- Objectives
- modeling various global computing systems
- large-scale simulation
- reproducibility
- Contents
- overview of the model
- verification of the model
- evaluation of a scheduling algorithm on the model
6. General Architecture of a Global Computing System
- Clients
- Computing servers
- Scheduling system
- Schedulers (e.g., AppLeS, Prophet)
- perform scheduling according to system / user policy
- Directory service (e.g., Globus MDS)
- central database of resource information
- Monitors/Predictors (e.g., NWS)
- monitor and predict server and network status
7. Canonical Model of Task Execution
(0) The monitor periodically observes server and network status.
(1) The client queries the scheduling system about a suitable server.
(2) The scheduler assigns a suitable server.
(3) The client requests execution on the assigned server.
(4) The server returns the computed result.
[Figure: clients and servers at multiple sites (Client A-C, Server A-C) connected over a WAN, with the scheduling system (scheduler, directory service, monitor) mediating steps (0)-(4).]
8. Requirements for the Model
- Modeling
- various topologies
- clients, servers, networks
- server
- performance, load (congestion), variance over time
- network
- bandwidth, throughput (congestion), variance over time
- Performing
- large-scale simulation
- reproducible simulation
9. Proposed Performance Evaluation Model: Queueing Network
- Global computing system
- Qs: computational servers
- Qns: network from the client to the server
- Qnr: network from the server to the client
- Congestion on servers and networks
- other tasks
- tasks that are invoked from other processes and enter Qs
- other data
- data that are transmitted from other processes and enter Qns or Qnr
10. Example of the Proposed Model
[Figure: the clients and servers of the canonical model mapped onto the queueing network — Server A-C modeled by server queues Qs1-Qs3, and the client-server network paths at Site 1 and Site 2 modeled by request/reply queue pairs Qns1/Qnr1 through Qns4/Qnr4.]
11. Client
- Task invoked by a client
- data transmitted to the server (Dsend)
- computation of the task
- data transmitted from the server, i.e., the computed result (Drecv)
- Procedure to invoke tasks
- query the scheduler about a suitable server
- The scheduler assigns a server.
- decompose Dsend into logical packets and transmit these packets to the Qns connected to the assigned server
- The server completes the execution of the task.
- receive Drecv from Qnr
12. Parameters for the Client
- Packet transmission rate
- λ_packet = Tnet / Wpacket
- Tnet: bandwidth of the network between the client and the server
- Wpacket: logical packet size
13. Queue as a Network (Qns)
[Figure: packets from the client and "other data" enter Qns, which feeds Qs.]
single-server queue with finite buffer, FCFS
- Procedure
- A packet transmitted from the client enters Qns.
- A packet is retransmitted when the buffer is full.
- A packet in Qns is processed for Wpacket / Tnet time.
- A packet of the client's task leaves for Qs.
- The arrival rate of other data indicates congestion of the network.
14. Parameters for Qns
- Arrival rate of other data
- determines network throughput
- Arrival is currently assumed to be Poisson.
- λ_ns_others = (Tnet / Tact - 1) × λ_packet
- Tact: ave. actual throughput of the network to be simulated
- Buffer size of the queue
- determines network latency
- N = Tlatency × Tnet / Wpacket
- Tlatency: ave. actual latency of the network to be simulated
15. Example
- Simulated condition
- bandwidth Tnet = 1.0 MB/s
- ave. actual throughput Tact = 0.1 MB/s
- latency Tlatency = 0.1 sec.
- logical packet size Wpacket = 0.01 MB
- Arrival rate of other data and buffer size
- λ_packet = Tnet / Wpacket = 1.0 / 0.01 = 100
- λ_ns_others = (Tnet / Tact - 1) × λ_packet = (1.0 / 0.1 - 1) × 100 = 900
- N = Tlatency × Tnet / Wpacket = 0.1 × 1.0 / 0.01 = 10
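The worked example above can be reproduced with a short sketch (the function and variable names are ours, mirroring the slide's symbols):

```python
# Derive the Qns parameters from the simulated network conditions (slide 15).
def qns_parameters(t_net, t_act, t_latency, w_packet):
    """Return (packet rate, arrival rate of other data, buffer size)."""
    lam_packet = t_net / w_packet                     # lambda_packet = Tnet / Wpacket
    lam_ns_others = (t_net / t_act - 1) * lam_packet  # models throughput degradation
    n_buffer = t_latency * t_net / w_packet           # buffer size N sets latency
    return lam_packet, lam_ns_others, n_buffer

lam_packet, lam_others, n = qns_parameters(
    t_net=1.0, t_act=0.1, t_latency=0.1, w_packet=0.01)
print(round(lam_packet), round(lam_others), round(n))  # 100 900 10
```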
16. Queue as a Server (Qs)
[Figure: tasks from Qns and "other tasks" enter Qs, which feeds Qnr.]
single-server queue, FCFS or other strategies
- Procedure
- The computation of the client's task enters Qs after all associated data arrive at Qs.
- A queued task waits for its turn and is processed for Wc / Tser time. (Tser: server performance, Wc: ave. computation size)
- Data of the computed result are decomposed into logical packets, and these packets are transmitted to Qnr.
- The arrival rate of other tasks indicates congestion on the server.
17. Parameters for Qs
- Arrival rate of other tasks
- determines server utilization
- Arrival is currently assumed to be Poisson.
- λ_s_others = (Tser / Ws_others) × U
- Tser: performance of the server
- Ws_others: ave. computation size of other tasks
- U: ave. actual utilization on the server to be simulated
- Packet transmission rate
- λ_packet = Tnet / Wpacket
18. Example
- Simulated condition
- server performance Tser = 100 MFlops
- ave. actual utilization U = 10%
- ave. computation size Ws_others = 0.1 MFlops
- Arrival rate of other tasks
- λ_s_others = (Tser / Ws_others) × U = (100 / 0.1) × 0.1 = 100
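As with the network queue, the server's background-task rate follows directly from the slide's formula (a sketch; names are ours):

```python
# Arrival rate of other tasks on the server (slides 17-18).
def qs_other_task_rate(t_ser, w_others, utilization):
    """lambda_s_others = (Tser / Ws_others) * U."""
    return t_ser / w_others * utilization

# Tser = 100 MFlops, Ws_others = 0.1 MFlops, U = 10%
rate = qs_other_task_rate(t_ser=100.0, w_others=0.1, utilization=0.10)
print(rate)  # ~100 other tasks per unit time
```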
19. Queue as a Network (Qnr)
[Figure: result packets from Qs and "other data" enter Qnr, which feeds the client.]
single-server queue with finite buffer, FCFS
- Procedure
- A packet transmitted from Qs enters Qnr.
- A packet is retransmitted when the buffer is full.
- A packet in Qnr is processed for Wpacket / Tnet time.
- A packet of the computed result leaves for the client.
- The arrival rate of other data indicates congestion of the network.
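Slides 13, 16, and 19 all describe single-server FCFS queues, with a finite buffer for the network queues. The following discrete-event sketch is our own simplified code, not the authors' simulator: it runs one such network queue with the slide-15 parameters, where the 10× offered load should push the accepted client throughput down toward Tact = 0.1 MB/s.

```python
import random
from collections import deque

def poisson_arrivals(rate, t_end, tag, rng):
    """Generate (time, tag) arrivals of a Poisson stream up to t_end."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= t_end:
            return out
        out.append((t, tag))

def simulate_queue(arrivals, service, buffer_size):
    """Single-server FCFS queue with a finite buffer.
    arrivals: time-sorted (time, tag) pairs.
    Returns tag -> [accepted, dropped, total delay of accepted packets]."""
    in_system = deque()  # departure times of packets not yet departed
    stats = {}
    for t, tag in arrivals:
        while in_system and in_system[0] <= t:
            in_system.popleft()            # finished packets leave the queue
        rec = stats.setdefault(tag, [0, 0, 0.0])
        if len(in_system) >= buffer_size:
            rec[1] += 1                    # buffer full: packet retransmitted
            continue
        start = in_system[-1] if in_system else t   # FCFS: wait for predecessor
        dep = start + service              # service time = Wpacket / Tnet
        in_system.append(dep)
        rec[0] += 1
        rec[2] += dep - t
    return stats

# Qns parameters from slide 15: lambda_packet = 100, lambda_ns_others = 900,
# service = Wpacket / Tnet = 0.01 s, buffer N = 10.
rng = random.Random(42)
arrivals = sorted(poisson_arrivals(100.0, 50.0, "client", rng)
                  + poisson_arrivals(900.0, 50.0, "other", rng))
stats = simulate_queue(arrivals, service=0.01, buffer_size=10)
accepted, dropped, _ = stats["client"]
# accepted packets * Wpacket / simulated time ~ actual throughput Tact
print("client throughput ~", accepted * 0.01 / 50.0, "MB/s")
```

With client packets making up 10% of arrivals and the server draining at most 100 packets/s, the accepted client rate is about 10 packets/s, i.e., roughly the 0.1 MB/s that λ_ns_others was derived from.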
20. Verification of the Proposed Model
- Comparison between
- results of simulation on the proposed model
- results of experiments on an actual global computing system, the Ninf system
21. Ninf System
[Figure: client programs issue Ninf RPCs over the Internet to Ninf computational servers via metaservers, with a Ninf DB server and links to other systems.]
22. Simulation Parameters (1)
- Client
- invoking tasks repeatedly
- Linpack (problem size n = 600, 1000, 1400)
- comput. O(2/3 n^3 + 2n^2), comm. 8n^2 + 20n + O(1)
- invocation rate of Ninf_call at the client
- λ_request = 1 / (worst response time + interval)
- packet size 10, 50, 100 KB
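The Linpack cost model above can be evaluated per problem size; the units below are our assumption (floating-point operations for computation, bytes for communication, i.e., 8-byte doubles for the n × n matrix):

```python
# Per-task cost model for a Linpack task of size n (slide 22).
def linpack_cost(n):
    comp = (2.0 / 3.0) * n**3 + 2.0 * n**2  # O(2/3 n^3 + 2n^2) operations
    comm = 8 * n**2 + 20 * n                # 8n^2 + 20n (+ O(1)) bytes assumed
    return comp, comm

for n in (600, 1000, 1400):
    comp, comm = linpack_cost(n)
    print(n, comp / 1e6, comm / 1e6)  # Mop and MB per task
```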
23. Simulation Parameters (2)
- Network
- bandwidth 1.5 MB/s
- other data
- ave. packet size 10, 50, 100 KB (Exp. Dist.)
- Poisson arrival
- Server
- CPU performance 500 MFlops
- ave. actual utilization 4%
- other tasks
- ave. computation size 10 MFlops (Exp. Dist.)
- Poisson arrival
24. Performance of a Client's Tasks
client: WS at Ochanomizu Univ.; server: J90 at ETL
- The performance of the client's tasks in the simulation closely matches the experimental results.
- The effect of different packet sizes is almost negligible.
- Simulation cost could therefore be reduced.
25. Performance of Clients' Tasks
clients: WSs at U-Tokyo, NITech, and TITech; server: J90 at ETL
- The performance of tasks invoked by multiple clients in the simulation closely matches the experimental results.
- The effect of different packet sizes is almost negligible.
- Simulation cost could therefore be reduced.
26. Evaluation of Scheduling Algorithms
- Evaluation
- evaluation of basic scheduling algorithms in an imaginary environment, simulated on the proposed model
- Scheduling algorithms
- RR: round robin
- LOAD: server load
- assign the server minimizing (L + 1) / P (L: ave. load, P: server performance)
- LOTH: server load + network congestion
- assign the server minimizing Comp / (P / (L + 1)) + Comm / Tnet
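The three selection rules can be sketched as follows; the server attributes, the round-robin counter, and the example task sizes are our own illustrative framing of the slide's formulas:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Server:
    name: str
    perf: float   # P: server performance (e.g., Mops)
    load: float   # L: average load
    t_net: float  # Tnet: bandwidth of the path to this server (MB/s)

_rr = count()

def round_robin(servers, task=None):
    """RR: cycle through the servers regardless of their state."""
    return servers[next(_rr) % len(servers)]

def load_based(servers, task=None):
    """LOAD: pick the server minimizing (L + 1) / P."""
    return min(servers, key=lambda s: (s.load + 1) / s.perf)

def load_throughput(servers, task):
    """LOTH: pick the server minimizing Comp / (P / (L + 1)) + Comm / Tnet."""
    comp, comm = task
    return min(servers,
               key=lambda s: comp / (s.perf / (s.load + 1)) + comm / s.t_net)

# Illustrative setup loosely based on slide 27: a fast server on a slow
# link vs. a slow server on a fast link.
servers = [Server("A", perf=400.0, load=1.0, t_net=0.05),
           Server("B", perf=100.0, load=0.5, t_net=0.2)]
task = (150.0, 3.0)  # (Comp in Mop, Comm in MB) -- hypothetical values
print(load_based(servers).name, load_throughput(servers, task).name)  # A B
```

LOAD picks the fast server A, while LOTH accounts for the 3 MB transfer over A's slow link and picks B, mirroring the slide-30 result that LOTH wins on communication-heavy tasks.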
27. Imaginary Environment
[Figure: four clients (Client 1-4), Server A (400 Mops), and Server B (100 Mops), connected by 50 KB/s and 200 KB/s network links.]
28. Simulation Parameters (1)
- Client
- invoking tasks repeatedly
- Linpack (problem size 600)
- comput. O(2/3 n^3 + 2n^2), comm. 8n^2 + 20n + O(1)
- EP (problem size 2^21)
- comput. proportional to the number of random numbers, comm. O(1)
- invocation rate of Ninf_call at the client
- λ_request = 1 / (worst response time + interval)
- interval: Linpack 5 sec., EP 20 sec.
- Poisson arrival
- packet size 100 KB
29. Simulation Parameters (2)
- Network
- bandwidth 1.5 MB/s
- other data
- ave. packet size 100 KB (Exp. Dist.)
- Poisson arrival
- Server
- ave. actual utilization 10%
- other tasks
- ave. computation size 10 Mops (Exp. Dist.)
- Poisson arrival
30. Scheduling Performance
- RR
- performs worst
- LOAD
- performs well for EP
- causes network congestion and degrades performance for Linpack
- LOTH
- performs best
31. Conclusions
- Proposal
- a performance evaluation model for scheduling in global computing systems
- Verification of the model
- The proposed model could effectively simulate the performance of clients' tasks in a simple setup of an actual global computing system, the Ninf system.
- Evaluation on the model
- Dynamic information on both servers and networks should be employed for scheduling.
32. Future Work
- Modeling
- parallel task execution
- invocation of parallel tasks at the client
- inter-server communication / synchronization
- co-allocation of parallel tasks
- arrival of other data / tasks
- Developing scheduling algorithms
- prediction of server load and network congestion