Title: Pipeline and Batch Sharing in Grid Workloads
1Pipeline and Batch Sharing in Grid Workloads
- Douglas Thain, John Bent, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Miron Livny, 2003
- Reviewed by Zhicheng Qiu
2Overview
- What do they want to do?
- Characterize workloads composed of pipelines of sequential processes
- How do they do it?
- Measurements of the memory, CPU, and I/O requirements of individual components
- Analyses of I/O sharing within complete batches
- The study does not address scheduling
3Pipeline and Batch Sharing
How do pipeline and batch sharing work?
4Workload categories
- Endpoint,
- which represents the initial input and final output,
- Pipeline-shared,
- which is shared in a write-then-read fashion within a single pipeline,
- Batch-shared,
- which comprises input I/O shared across pipelines.
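The three categories above can be illustrated with a small classifier. This is a minimal sketch, not the authors' tooling: the trace format `(pipeline_id, filename, mode)` and the function name `classify_files` are hypothetical, and the write-then-read ordering is simplified to "written and read by the same single pipeline".

```python
from collections import defaultdict


def classify_files(accesses):
    """Label each file as 'endpoint', 'pipeline-shared', or 'batch-shared'.

    accesses: iterable of (pipeline_id, filename, mode) tuples, where mode
    is 'r' or 'w'. This trace format is an assumption made for illustration.
    """
    readers = defaultdict(set)
    writers = defaultdict(set)
    for pipeline_id, filename, mode in accesses:
        target = writers if mode == "w" else readers
        target[filename].add(pipeline_id)

    labels = {}
    for filename in set(readers) | set(writers):
        r = readers.get(filename, set())
        w = writers.get(filename, set())
        if len(w) == 1 and r == w:
            # Written and read back inside one pipeline.
            labels[filename] = "pipeline-shared"
        elif not w and len(r) > 1:
            # Read-only input used by more than one pipeline.
            labels[filename] = "batch-shared"
        else:
            # Initial input or final output of a single pipeline.
            labels[filename] = "endpoint"
    return labels


# Hypothetical trace: two pipelines sharing one read-only database.
trace = [
    (1, "in.db", "r"), (2, "in.db", "r"),
    (1, "tmp1", "w"), (1, "tmp1", "r"),
    (1, "out1", "w"),
    (2, "cfg2", "r"),
]
labels = classify_files(trace)
print(labels)
```

A real classifier would also need to check the order of writes and reads, which this sketch ignores.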
5Application pipelines
[Figure: application pipelines, labeled by domain: Biology, Earth sys., Physics, Physics, Chemistry, Astrophysics]
6I/O amounts by I/O type
Appl.      Endpoint I/O           Pipeline I/O           Batch I/O
           Files  Traffic (MB)    Files  Traffic (MB)    Files  Traffic (MB)
SETI           2          0.34       12         75.43        0          0
BLASTP         2          0.12        0          0           9        329.99
IBIS          20        179.92       99        148.27       17          7.89
CMS            6         63.56        2         12.99        9       3729.67
HF             3          1.96        7       4654.34        1          0
NAUTILUS     124         14.06      369        785.37        8          3.24
AMANDA         6          5.22       11        264.31       29        508.52
The paper does not report how much time is spent on I/O.
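To see how the traffic splits between endpoint and shared (pipeline plus batch) I/O, the traffic columns of the table above can be tabulated directly. The sketch below only rearranges the reported numbers; the names `TRAFFIC_MB` and `shared_fraction` are made up for this example.

```python
# (endpoint, pipeline, batch) traffic in MB, copied from the table above.
TRAFFIC_MB = {
    "SETI": (0.34, 75.43, 0.0),
    "BLASTP": (0.12, 0.0, 329.99),
    "IBIS": (179.92, 148.27, 7.89),
    "CMS": (63.56, 12.99, 3729.67),
    "HF": (1.96, 4654.34, 0.0),
    "NAUTILUS": (14.06, 785.37, 3.24),
    "AMANDA": (5.22, 264.31, 508.52),
}


def shared_fraction(app):
    """Fraction of an application's traffic that is pipeline- or batch-shared."""
    endpoint, pipeline, batch = TRAFFIC_MB[app]
    total = endpoint + pipeline + batch
    return (pipeline + batch) / total


for app in TRAFFIC_MB:
    print(f"{app:9s} shared = {shared_fraction(app):.1%}")
```

By this measure, IBIS is the one application in the table where endpoint traffic exceeds shared traffic.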
7Amdahl's ratios
App        CPU/IO (MIPS/MBPS)   MEM/CPU (MB/MIPS)   CPU/IO (instr/op)
SETI                    45888                0.15              8737 K
BLASTP                     37               26.77               144 K
IBIS                    34530                0.20            109823 K
CMS                       190                2.09               396 K
HF                         74                0.16               353 K
NAUTILUS                 2287                1.20              8238 K
AMANDA                    785                3.77               551 K
I/O traffic is heavy for some applications, notably BLASTP and HF, whose CPU/IO ratios (37 and 74 MIPS/MBPS) are the lowest in the table.
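The three ratios in the table follow from a handful of per-run measurements. The sketch below shows one way to compute them. The function name, its parameters, the sample values, and the choice of 1 MB = 10^6 bytes are all assumptions for illustration, not taken from the paper.

```python
def amdahl_ratios(instructions, io_bytes, io_ops, mem_mb, runtime_s):
    """Compute the three ratios used in the table from raw measurements.

    Assumes 1 MB = 1e6 bytes and that all inputs describe the same run.
    """
    mips = instructions / 1e6 / runtime_s   # millions of instructions per second
    mbps = io_bytes / 1e6 / runtime_s       # megabytes of I/O per second
    return {
        "cpu_io_mips_per_mbps": mips / mbps,
        "mem_cpu_mb_per_mips": mem_mb / mips,
        "cpu_io_instr_per_op": instructions / io_ops,
    }


# Hypothetical run: 2e9 instructions, 50 MB of I/O in 10,000 operations,
# 100 MB of memory, 10 seconds of runtime.
ratios = amdahl_ratios(
    instructions=2e9, io_bytes=50e6, io_ops=10_000, mem_mb=100, runtime_s=10
)
print(ratios)
```

Note that the runtime cancels in the first ratio, so CPU/IO in MIPS/MBPS is simply millions of instructions per megabyte of I/O.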
8Scalability is limited by the I/O bandwidth
[Figure: bandwidth milestones marked: storage center, commodity disk]
Why are the bandwidth milestones storage devices rather than network bandwidth or RAM access bandwidth?
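A back-of-the-envelope estimate shows why storage bandwidth caps scalability. Assuming every node sends all of its I/O to one shared server, the CPU/IO ratio gives each node's bandwidth demand, and the server bandwidth divided by that demand bounds the node count. The server bandwidths (1500 and 15 MB/s) and the 1000 MIPS node speed below are hypothetical values chosen for illustration. The ratios 37 and 190 are the BLASTP and CMS entries from the Amdahl's ratios table.

```python
import math


def io_demand_mbps(node_mips, cpu_io_ratio):
    """I/O bandwidth (MB/s) one node needs to keep its CPU busy."""
    return node_mips / cpu_io_ratio


def max_nodes(server_mbps, node_mips, cpu_io_ratio):
    """Largest node count a server of the given bandwidth can feed.

    Assumes all I/O goes to the server, with no caching or local storage.
    """
    return math.floor(server_mbps / io_demand_mbps(node_mips, cpu_io_ratio))


# Hypothetical 1000 MIPS nodes.
print(max_nodes(server_mbps=1500, node_mips=1000, cpu_io_ratio=37))
print(max_nodes(server_mbps=15, node_mips=1000, cpu_io_ratio=190))
```

Under these assumptions, an application with a low CPU/IO ratio saturates even a fast server with a modest number of nodes.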
9Conclusions
- Shared I/O is the dominant component of all I/O traffic, and it causes serious scalability problems for applications.
- Effort should go into eliminating shared I/O traffic.
- New file systems are required to minimize I/O traffic.
10How to optimize I/O traffic?
- Minimize the I/O traffic
- Application design (sequential I/O vs. random I/O, replication)
- Speed up endpoint I/O
- File system (caching, database, RAM disk)
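As one illustration of the caching idea, a node-local read-through cache lets many pipelines on the same node read a batch-shared input while fetching it from the endpoint server only once. This is a minimal sketch. The class `ReadThroughCache` and the stand-in `fetch_from_server` function are hypothetical and do not represent any particular file system.

```python
class ReadThroughCache:
    """Node-local cache: fetch each file from the server at most once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that reads a file from the server
        self._store = {}      # filename -> cached contents
        self.misses = 0       # number of reads that reached the server

    def read(self, name):
        if name not in self._store:
            self.misses += 1
            self._store[name] = self._fetch(name)
        return self._store[name]


server_reads = []


def fetch_from_server(name):
    """Stand-in for an expensive read from the endpoint server."""
    server_reads.append(name)
    return f"contents of {name}"


cache = ReadThroughCache(fetch_from_server)

# Five pipelines on the same node read the same batch-shared input.
results = [cache.read("shared.db") for _ in range(5)]
print(len(server_reads), cache.misses)
```

The sketch never evicts or invalidates entries, which is only reasonable for read-only batch-shared inputs.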
11Another question
- Why are the CPU and memory not the bottleneck for system scalability?