Title: First NASA Workshop on Performance-Engineered Information Systems

1. A Performance Evaluation Model for Scheduling in Global Computing Systems
Kento Aida (TIT), Atsuko Takefusa (Ochanomizu Univ.), Hidemoto Nakada (ETL), Satoshi Matsuoka (TIT), Satoshi Sekiguchi (ETL), Umpei Nagashima (National Institute of Materials and Chemical Research)
http://ninf.etl.go.jp/
2. Global Computing Systems
- Proposed global computing systems
- Globus, NetSolve, Ninf, Legion, RCS, etc.
3. Scheduling for Global Computing Systems
Effective scheduling is required to achieve high-performance global computing!
- Scheduling under a dynamic, heterogeneous environment
- computing server performance / load
- network topology / bandwidth / congestion
- multiple users at multiple sites
- Software systems for scheduling
- AppLeS, NetSolve agent, Nimrod, Ninf Metaserver, Prophet, etc.
4. Frameworks to Evaluate Scheduling Algorithms
- Benchmarking on real systems
- practical measurement
- difficult to perform large-scale experiments
- a small number of replications
- partial solution
There is no effective framework to evaluate the performance of scheduling in global computing systems!
5. Performance Evaluation Model
- Objectives
- modeling various global computing systems
- large-scale simulation
- reproducibility
- Contents
- overview of the model
- verification of the model
- evaluation of a scheduling algorithm on the model
6. General Architecture of a Global Computing System
- Clients
- Computing servers
- Scheduling system
- Schedulers (e.g., AppLeS, Prophet)
- perform scheduling according to system / user policy
- Directory service (e.g., Globus MDS)
- central database of resource information
- Monitors/Predictors (e.g., NWS)
- monitor and predict server and network status
7. Canonical Model of Task Execution
(0) The monitor periodically observes server and network status.
(1) The client queries the scheduling system about a suitable server.
(2) The scheduler assigns a suitable server.
(3) The client requests execution on the assigned server.
(4) The server returns the computed result.
[Figure: clients and servers at multiple sites (Client A-C, Server A-C) connected over a WAN, with the scheduling system (scheduler, directory service, monitor) mediating steps (0)-(4).]
8. Requirements for the Model
- Modeling
- various topologies
- clients, servers, networks
- server
- performance, load (congestion), variance over time
- network
- bandwidth, throughput (congestion), variance over time
- Performing
- large-scale simulation
- reproducible simulation
9. Proposed Performance Evaluation Model: Queueing Network
- Global computing system
- Qs: computational servers
- Qns: network from the client to the server
- Qnr: network from the server to the client
- Congestion on servers and networks
- other tasks
- tasks that are invoked from other processes and enter Qs
- other data
- data that are transmitted from other processes and enter Qns or Qnr
10. Example of the Proposed Model
[Figure: the clients and servers of the canonical model mapped onto the queueing network — Server A-C modeled by server queues Qs1-Qs3, and the client-server network paths at Site 1 and Site 2 modeled by request/reply queue pairs Qns1/Qnr1 through Qns4/Qnr4.]
11. Client
- Task invoked by a client
- data transmitted to the server (Dsend)
- computation of the task
- data transmitted from the server, i.e., the computed result (Drecv)
- Procedure to invoke tasks
- query the scheduler about a suitable server
- The scheduler assigns a server.
- decompose Dsend into logical packets and transmit these packets to the Qns connected to the assigned server
- The server completes the execution of the task.
- receive Drecv from Qnr
12. Parameters for the Client
- Packet transmission rate
- λ_packet = Tnet / Wpacket
- Tnet: bandwidth of the network between the client and the server
- Wpacket: logical packet size
13. Queue as a Network (Qns)
[Figure: packets from the client and "other data" enter Qns, which feeds Qs.]
single-server queue with finite buffer, FCFS
- Procedure
- A packet transmitted from the client enters Qns.
- A packet is retransmitted when the buffer is full.
- A packet in Qns is processed for Wpacket / Tnet time.
- A packet of the client's task leaves for Qs.
- The arrival rate of other data indicates congestion of the network.
14. Parameters for Qns
- Arrival rate of other data
- determines network throughput
- Arrival is currently assumed to be Poisson.
- λ_ns_others = (Tnet / Tact - 1) × λ_packet
- Tact: ave. actual throughput of the network to be simulated
- Buffer size of the queue
- determines network latency
- N = Tlatency × Tnet / Wpacket
- Tlatency: ave. actual latency of the network to be simulated
15. Example
- Simulated condition
- bandwidth Tnet = 1.0 MB/s
- ave. actual throughput Tact = 0.1 MB/s
- latency Tlatency = 0.1 sec.
- logical packet size Wpacket = 0.01 MB
- Arrival rate of other data and buffer size
- λ_packet = Tnet / Wpacket = 1.0 / 0.01 = 100
- λ_ns_others = (Tnet / Tact - 1) × λ_packet = (1.0 / 0.1 - 1) × 100 = 900
- N = Tlatency × Tnet / Wpacket = 0.1 × 1.0 / 0.01 = 10
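The worked example above can be reproduced with a short sketch (the function and variable names are ours, mirroring the slide's symbols):

```python
# Derive the Qns parameters from the simulated network conditions (slide 15).
def qns_parameters(t_net, t_act, t_latency, w_packet):
    """Return (packet rate, arrival rate of other data, buffer size)."""
    lam_packet = t_net / w_packet                     # lambda_packet = Tnet / Wpacket
    lam_ns_others = (t_net / t_act - 1) * lam_packet  # models throughput degradation
    n_buffer = t_latency * t_net / w_packet           # buffer size N sets latency
    return lam_packet, lam_ns_others, n_buffer

lam_packet, lam_others, n = qns_parameters(
    t_net=1.0, t_act=0.1, t_latency=0.1, w_packet=0.01)
print(round(lam_packet), round(lam_others), round(n))  # 100 900 10
```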
16. Queue as a Server (Qs)
[Figure: tasks from Qns and "other tasks" enter Qs, which feeds Qnr.]
single-server queue, FCFS or other strategies
- Procedure
- The computation of the client's task enters Qs after all associated data arrive at Qs.
- A queued task waits for its turn and is processed for Wc / Tser time. (Tser: server performance, Wc: ave. computation size)
- Data of the computed result are decomposed into logical packets, and these packets are transmitted to Qnr.
- The arrival rate of other tasks indicates congestion on the server.
17. Parameters for Qs
- Arrival rate of other tasks
- determines server utilization
- Arrival is currently assumed to be Poisson.
- λ_s_others = (Tser / Ws_others) × U
- Tser: performance of the server
- Ws_others: ave. computation size of other tasks
- U: ave. actual utilization on the server to be simulated
- Packet transmission rate
- λ_packet = Tnet / Wpacket
18. Example
- Simulated condition
- server performance Tser = 100 MFlops
- ave. actual utilization U = 10%
- ave. computation size Ws_others = 0.1 MFlops
- Arrival rate of other tasks
- λ_s_others = (Tser / Ws_others) × U = (100 / 0.1) × 0.1 = 100
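As with the network queue, the server's background-task rate follows directly from the slide's formula (a sketch; names are ours):

```python
# Arrival rate of other tasks on the server (slides 17-18).
def qs_other_task_rate(t_ser, w_others, utilization):
    """lambda_s_others = (Tser / Ws_others) * U."""
    return t_ser / w_others * utilization

# Tser = 100 MFlops, Ws_others = 0.1 MFlops, U = 10%
rate = qs_other_task_rate(t_ser=100.0, w_others=0.1, utilization=0.10)
print(rate)  # ~100 other tasks per unit time
```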
19. Queue as a Network (Qnr)
[Figure: result packets from Qs and "other data" enter Qnr, which feeds the client.]
single-server queue with finite buffer, FCFS
- Procedure
- A packet transmitted from Qs enters Qnr.
- A packet is retransmitted when the buffer is full.
- A packet in Qnr is processed for Wpacket / Tnet time.
- A packet of the computed result leaves for the client.
- The arrival rate of other data indicates congestion of the network.
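Slides 13, 16, and 19 all describe single-server FCFS queues, with a finite buffer for the network queues. The following discrete-event sketch is our own simplified code, not the authors' simulator: it runs one such network queue with the slide-15 parameters, where the 10× offered load should push the accepted client throughput down toward Tact = 0.1 MB/s.

```python
import random
from collections import deque

def poisson_arrivals(rate, t_end, tag, rng):
    """Generate (time, tag) arrivals of a Poisson stream up to t_end."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= t_end:
            return out
        out.append((t, tag))

def simulate_queue(arrivals, service, buffer_size):
    """Single-server FCFS queue with a finite buffer.
    arrivals: time-sorted (time, tag) pairs.
    Returns tag -> [accepted, dropped, total delay of accepted packets]."""
    in_system = deque()  # departure times of packets not yet departed
    stats = {}
    for t, tag in arrivals:
        while in_system and in_system[0] <= t:
            in_system.popleft()            # finished packets leave the queue
        rec = stats.setdefault(tag, [0, 0, 0.0])
        if len(in_system) >= buffer_size:
            rec[1] += 1                    # buffer full: packet retransmitted
            continue
        start = in_system[-1] if in_system else t   # FCFS: wait for predecessor
        dep = start + service              # service time = Wpacket / Tnet
        in_system.append(dep)
        rec[0] += 1
        rec[2] += dep - t
    return stats

# Qns parameters from slide 15: lambda_packet = 100, lambda_ns_others = 900,
# service = Wpacket / Tnet = 0.01 s, buffer N = 10.
rng = random.Random(42)
arrivals = sorted(poisson_arrivals(100.0, 50.0, "client", rng)
                  + poisson_arrivals(900.0, 50.0, "other", rng))
stats = simulate_queue(arrivals, service=0.01, buffer_size=10)
accepted, dropped, _ = stats["client"]
# accepted packets * Wpacket / simulated time ~ actual throughput Tact
print("client throughput ~", accepted * 0.01 / 50.0, "MB/s")
```

With client packets making up 10% of arrivals and the server draining at most 100 packets/s, the accepted client rate is about 10 packets/s, i.e., roughly the 0.1 MB/s that λ_ns_others was derived from.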
20. Verification of the Proposed Model
- Comparison between
- results of simulation on the proposed model
- results of experiments on an actual global computing system, the Ninf system
21. Ninf System
[Figure: client programs issue Ninf RPCs over the Internet to Ninf computational servers via metaservers, with a Ninf DB server and links to other systems.]
22. Simulation Parameters (1)
- Client
- invoking tasks repeatedly
- Linpack (problem size n = 600, 1000, 1400)
- comput. O(2/3 n^3 + 2n^2), comm. 8n^2 + 20n + O(1)
- invocation rate of Ninf_call at the client
- λ_request = 1 / (worst response time + interval)
- packet size 10, 50, 100 KB
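The Linpack cost model above can be evaluated per problem size; the units below are our assumption (floating-point operations for computation, bytes for communication, i.e., 8-byte doubles for the n × n matrix):

```python
# Per-task cost model for a Linpack task of size n (slide 22).
def linpack_cost(n):
    comp = (2.0 / 3.0) * n**3 + 2.0 * n**2  # O(2/3 n^3 + 2n^2) operations
    comm = 8 * n**2 + 20 * n                # 8n^2 + 20n (+ O(1)) bytes assumed
    return comp, comm

for n in (600, 1000, 1400):
    comp, comm = linpack_cost(n)
    print(n, comp / 1e6, comm / 1e6)  # Mop and MB per task
```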
23. Simulation Parameters (2)
- Network
- bandwidth 1.5 MB/s
- other data
- ave. packet size 10, 50, 100 KB (Exp. Dist.)
- Poisson arrival
- Server
- CPU performance 500 MFlops
- ave. actual utilization 4%
- other tasks
- ave. computation size 10 MFlops (Exp. Dist.)
- Poisson arrival
24. Performance of a Client's Tasks
client: WS at Ochanomizu Univ.; server: J90 at ETL
- The performance of the client's tasks in the simulation closely matches the experimental results.
- The effect of different packet sizes is almost negligible.
- Simulation cost could therefore be reduced.
25. Performance of Clients' Tasks
clients: WSs at U-Tokyo, NITech, and TITech; server: J90 at ETL
- The performance of tasks invoked by multiple clients in the simulation closely matches the experimental results.
- The effect of different packet sizes is almost negligible.
- Simulation cost could therefore be reduced.
26. Evaluation of Scheduling Algorithms
- Evaluation
- evaluation of basic scheduling algorithms in an imaginary environment, simulated on the proposed model
- Scheduling algorithms
- RR: round robin
- LOAD: server load
- assign the server minimizing (L + 1) / P (L: ave. load, P: server performance)
- LOTH: server load + network congestion
- assign the server minimizing Comp / (P / (L + 1)) + Comm / Tnet
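The three selection rules can be sketched as follows; the server attributes, the round-robin counter, and the example task sizes are our own illustrative framing of the slide's formulas:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Server:
    name: str
    perf: float   # P: server performance (e.g., Mops)
    load: float   # L: average load
    t_net: float  # Tnet: bandwidth of the path to this server (MB/s)

_rr = count()

def round_robin(servers, task=None):
    """RR: cycle through the servers regardless of their state."""
    return servers[next(_rr) % len(servers)]

def load_based(servers, task=None):
    """LOAD: pick the server minimizing (L + 1) / P."""
    return min(servers, key=lambda s: (s.load + 1) / s.perf)

def load_throughput(servers, task):
    """LOTH: pick the server minimizing Comp / (P / (L + 1)) + Comm / Tnet."""
    comp, comm = task
    return min(servers,
               key=lambda s: comp / (s.perf / (s.load + 1)) + comm / s.t_net)

# Illustrative setup loosely based on slide 27: a fast server on a slow
# link vs. a slow server on a fast link.
servers = [Server("A", perf=400.0, load=1.0, t_net=0.05),
           Server("B", perf=100.0, load=0.5, t_net=0.2)]
task = (150.0, 3.0)  # (Comp in Mop, Comm in MB) -- hypothetical values
print(load_based(servers).name, load_throughput(servers, task).name)  # A B
```

LOAD picks the fast server A, while LOTH accounts for the 3 MB transfer over A's slow link and picks B, mirroring the slide-30 result that LOTH wins on communication-heavy tasks.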
27. Imaginary Environment
[Figure: four clients (Client 1-4), Server A (400 Mops), and Server B (100 Mops), connected by 50 KB/s and 200 KB/s network links.]
28. Simulation Parameters (1)
- Client
- invoking tasks repeatedly
- Linpack (problem size 600)
- comput. O(2/3 n^3 + 2n^2), comm. 8n^2 + 20n + O(1)
- EP (problem size 2^21)
- comput. proportional to the number of random numbers, comm. O(1)
- invocation rate of Ninf_call at the client
- λ_request = 1 / (worst response time + interval)
- interval: Linpack 5 sec., EP 20 sec.
- Poisson arrival
- packet size 100 KB
29. Simulation Parameters (2)
- Network
- bandwidth 1.5 MB/s
- other data
- ave. packet size 100 KB (Exp. Dist.)
- Poisson arrival
- Server
- ave. actual utilization 10%
- other tasks
- ave. computation size 10 Mops (Exp. Dist.)
- Poisson arrival
30. Scheduling Performance
- RR
- performs worst
- LOAD
- performs well for EP
- causes network congestion and degrades performance for Linpack
- LOTH
- performs best
31. Conclusions
- Proposal
- a performance evaluation model for scheduling in global computing systems
- Verification of the model
- The proposed model could effectively simulate the performance of clients' tasks in a simple setup of an actual global computing system, the Ninf system.
- Evaluation on the model
- Dynamic information on both servers and networks should be employed for scheduling.
32. Future Work
- Modeling
- parallel task execution
- invocation of parallel tasks at the client
- inter-server communication / synchronization
- co-allocation of parallel tasks
- arrival of other data / tasks
- Developing scheduling algorithms
- prediction of server load and network congestion