Flexible and Efficient I/O Optimization for Parallel Applications

2
Hiding Periodic I/O Costs in Parallel Applications

Xiaosong Ma
Department of Computer Science
University of Illinois at Urbana-Champaign
Spring 2003

3
Roadmap

Introduction
Active buffering: hiding recurrent output cost
Ongoing work: hiding recurrent input cost
Conclusions

4
Introduction

Fast-growing technology propels high-performance applications
Scientific computation
Parallel data mining
Web data processing
Games, movie graphics
Individual components grow in an uncoordinated way
Manual performance tuning is needed

5
We Need Adaptive Optimization

Flexible and automatic performance optimization desired
Efficient high-level buffering and prefetching for parallel I/O in scientific simulations

6
Scientific Simulations

Important
Detail and flexibility
Save money and lives

Challenging
Multi-disciplinary
High performance crucial

7
Parallel I/O in Scientific Simulations

Write-intensive
Collective and periodic
Poor stepchild
Bottleneck-prone
Existing collective I/O optimizations focused on data transfer

[Figure: timeline of alternating computation and I/O phases]

8
My Contributions

Idea: I/O optimizations in a larger scope
Parallelism between I/O and other tasks
Individual simulation's I/O needs
I/O-related self-configuration
Approach: hide the I/O cost
Results
Publications, technology transfer, software

9
Roadmap

Introduction
Active buffering: hiding recurrent output cost
Ongoing work: hiding recurrent input cost
Conclusions

10
Latency Hierarchy on Parallel Platforms
local memory access
inter-processor communication
disk I/O
wide-area transfer

Further along the path of data transfer:
Lower throughput
Less parallelism and less scalability

11
Basic Idea of Active Buffering

Purpose: maximize overlap between computation and I/O
Approach: buffer data as early as possible
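To make the payoff of overlap concrete, here is a small timing model. It is a sketch under stated assumptions: an unbounded buffer, one background writer, and constant per-snapshot compute, copy, and write times. `simulate` and all numbers are hypothetical, not taken from Panda or the experiments in this talk.

```python
def simulate(num_snapshots, compute_s, write_s, copy_s):
    """Return (synchronous_total, active_buffering_total) run times in seconds.

    Model assumptions (hypothetical): every snapshot fits in an unbounded buffer,
    a single background writer drains snapshots in order, and times are constant.
    """
    # Without buffering, each snapshot's write blocks the computation.
    sync_total = num_snapshots * (compute_s + write_s)

    app_clock = 0.0    # when the application can start its next compute phase
    writer_free = 0.0  # when the background writer finishes its current snapshot
    for _ in range(num_snapshots):
        app_clock += compute_s + copy_s      # compute, then copy output into the buffer
        start = max(app_clock, writer_free)  # writer picks up the buffered snapshot
        writer_free = start + write_s
    # The run ends once the last snapshot has actually reached storage.
    active_total = max(app_clock, writer_free)
    return sync_total, active_total


print(simulate(3, compute_s=10, write_s=4, copy_s=1))  # (42, 37.0)
```

In this toy run the first two writes are hidden behind computation; only the three memory copies and the final 4-second write remain visible.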

12
Challenges

Accommodate multiple I/O architectures
No assumption on buffer space
Adaptive
Buffer availability
User request patterns
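One way to make no assumption on buffer space is to buffer whatever fits and route the rest to a slower path. The sketch below is hypothetical: `plan_snapshot` and its in-order overflow rule are illustrative assumptions, not the policy of the actual implementation.

```python
def plan_snapshot(block_sizes, buffer_capacity):
    """Split one snapshot's output blocks into buffered and overflow blocks.

    Blocks are considered in output order. Once one block does not fit, it and
    every later block overflow (an assumed rule that keeps overflow data in order).
    Returns (buffered_indices, overflow_indices).
    """
    buffered, overflow = [], []
    free = buffer_capacity
    for i, size in enumerate(block_sizes):
        if not overflow and size <= free:
            buffered.append(i)
            free -= size
        else:
            overflow.append(i)
    return buffered, overflow


print(plan_snapshot([16, 16, 16, 16, 16, 16], buffer_capacity=64))
# ([0, 1, 2, 3], [4, 5])
```

Because the capacity is just a parameter, the same code adapts to whatever buffer happens to be available at run time.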

13
Roadmap

Introduction
Active buffering: hiding recurrent output cost
With client-server I/O architecture (IPDPS 02)
With server-less architecture
Ongoing work: hiding recurrent input cost
Related work and future work
Conclusions

14
Client-Server I/O Architecture
[Figure: client-server I/O architecture; labels: compute processors, I/O servers]
15
Client State Machine
16
Server State Machine
[Figure: server state machine. States: init., alloc. buffers, idle-listen, busy-listen, prepare, recv., fetch, write, exit. Actions and transition labels include: got write req., enough buffer space, receive a block, out of buffer space, write a block, all data received, data to fetch, fetch a block, no data to fetch, write done, no request, exit msg.]
17
Maximize Apparent Throughput

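Ideal apparent throughput per server:

$$T_{\text{ideal}} = \frac{D_{\text{total}}}{\dfrac{D_{\text{c-buffered}}}{T_{\text{mem-copy}}} + \dfrac{D_{\text{c-overflow}}}{T_{\text{msg-passing}}} + \dfrac{D_{\text{s-overflow}}}{T_{\text{write}}}}$$

where the D terms are data volumes (buffered at the clients, overflowing the client buffers, overflowing the server buffers) and the T terms are throughputs.
More expensive data transfer only becomes visible when overflow happens
Efficiently masks the difference in write speeds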
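The formula translates directly into a helper. The function name and the sample volumes and throughputs (MB and MB/s) are hypothetical; they only show how each overflow term pulls the apparent throughput down from memory-copy speed.

```python
def ideal_apparent_throughput(d_total, d_c_buffered, d_c_overflow, d_s_overflow,
                              t_mem_copy, t_msg_passing, t_write):
    """Ideal apparent throughput per server: D_total divided by the visible time,
    where each portion of the data moves at the throughput of the path it takes."""
    visible_time = (d_c_buffered / t_mem_copy
                    + d_c_overflow / t_msg_passing
                    + d_s_overflow / t_write)
    return d_total / visible_time


# Hypothetical numbers: 256 MB/s copy, 128 MB/s messaging, 32 MB/s disk.
print(ideal_apparent_throughput(64, 64, 0, 0, 256, 128, 32))    # 256.0
print(ideal_apparent_throughput(96, 64, 32, 0, 256, 128, 32))   # 192.0
print(ideal_apparent_throughput(96, 64, 32, 16, 256, 128, 32))  # 96.0
```

When all data fits in the client buffers, the result equals the memory-copy throughput; client overflow adds time at messaging speed, and server overflow adds time at disk speed.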

18
Write Throughput without Overflow

Panda Parallel I/O library
SGI Origin 2000, SHMEM
Per client: 16MB output data per snapshot, 64MB buffer
Two servers, each with 256MB buffer

19
Write Throughput with Overflow

Panda Parallel I/O library
SGI Origin 2000, SHMEM, MPI
Per client: 96MB output data per snapshot, 64MB buffer
Two servers, each with 256MB buffer

20
Give Feedback to Application

Softer I/O requirements
Parallel I/O libraries have been passive
Active buffering allows I/O libraries to take a more active role
Find optimal output frequency automatically
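As one possible reading of finding the output frequency automatically, a library could pick the shortest snapshot interval whose background write still finishes before the next snapshot arrives. This heuristic and `min_output_interval` are hypothetical; the 160 MB snapshot size echoes the GEN1 slide, while the bandwidth and step time are made up.

```python
import math


def min_output_interval(snapshot_mb, write_mb_per_s, step_s):
    """Smallest number of timesteps between snapshots that lets a background
    writer finish one snapshot before the next one is produced (hypothetical
    heuristic; ignores memory-copy time)."""
    write_s = snapshot_mb / write_mb_per_s
    return max(1, math.ceil(write_s / step_s))


print(min_output_interval(160, write_mb_per_s=20, step_s=3))  # 3
```

An 8-second write needs at least three 3-second steps of computation to hide behind, so outputting every third step keeps the write cost invisible in this model.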

21
Server-side Active Buffering
[Figure: server state machine (same diagram as slide 16)]
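A simplified reading of server-side active buffering: the server receives blocks into its buffer, writes the oldest blocks out when it runs out of space, and drains the buffer while clients are back to computing. `run_server`, its event names, and the drain-after-each-snapshot rule are hypothetical simplifications of the state machine, not the Panda code.

```python
from collections import deque


def run_server(snapshots, capacity):
    """Simulate one I/O server over a sequence of snapshots.

    Each snapshot is a list of (block_name, size) pairs sent by the clients.
    Returns the ordered list of (action, block_name) events.
    """
    buffer = deque()  # blocks received but not yet written, oldest first
    used = 0
    events = []
    for blocks in snapshots:
        for name, size in blocks:
            # Out of buffer space: write the oldest buffered blocks to make room.
            while buffer and used + size > capacity:
                old_name, old_size = buffer.popleft()
                used -= old_size
                events.append(("write", old_name))
            if size > capacity:
                # A block larger than the whole buffer is written straight through.
                events.append(("write-direct", name))
                continue
            buffer.append((name, size))
            used += size
            events.append(("recv", name))
        # All data for this snapshot received: clients are computing again,
        # so the otherwise idle server drains its buffer to disk.
        while buffer:
            name, size = buffer.popleft()
            used -= size
            events.append(("write", name))
    return events


print(run_server([[("a", 2), ("b", 2)], [("c", 3)]], capacity=4))
```

With capacity 4 both blocks of the first snapshot are received before any write; shrinking the capacity to 3 forces block "a" out to disk before "b" can be received.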
22
Performance with Real Applications

Application overview: GENX
Large-scale, multi-component, detailed rocket simulation
Developed at the Center for Simulation of Advanced Rockets (CSAR), UIUC
Multi-disciplinary, complex, and evolving
Providing parallel I/O support for GENX
Identification of parallel I/O requirements (PDSECA 03)
Motivation and test case for active buffering

23
Overall Performance of GEN1

SDSC IBM SP (Blue Horizon)
64 clients, 2 I/O servers with active buffering (AB)
160MB output data per snapshot (in HDF4)

24
Aggregate Write Throughput in GEN2

LLNL IBM SP (ASCI Frost)
1 I/O server per 16-way SMP node
Write in HDF4

25
Scientific Data Migration

Output data need to be moved
Online migration
Extend active buffering to migration
Local storage becomes another layer in the buffer hierarchy

[Figure: timeline of alternating computation and I/O phases]
26
I/O Architecture with Data Migration
[Figure: I/O architecture with data migration; label: compute processors]
27
Active Buffering for Data Migration

Avoid unnecessary local I/O
Hybrid migration approach
memory-to-memory transfer
disk staging
Combined with data compression (ICS 02)
Self-configuration for online visualization
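A rough sketch of a hybrid migration decision combined with compression, assuming a fixed memory budget for data awaiting transfer: compressed snapshots that still fit are sent memory-to-memory, and the rest are staged on local disk first. `plan_migration` and the budget rule are hypothetical; only `zlib.compress` is a real standard-library call.

```python
import zlib


def plan_migration(snapshots, memory_budget):
    """Route each compressed snapshot either memory-to-memory or via disk staging.

    Snapshots whose compressed size still fits in the remaining memory budget are
    sent directly from memory; the rest are staged on local disk first.
    Returns a list of (route, compressed_size) pairs.
    """
    plan = []
    remaining = memory_budget
    for data in snapshots:
        packed = zlib.compress(data)
        if len(packed) <= remaining:
            remaining -= len(packed)
            plan.append(("memory", len(packed)))
        else:
            plan.append(("disk", len(packed)))
    return plan
```

For example, highly compressible output (such as long runs of zeros) shrinks enough to stay in memory, while incompressible data that exceeds the budget falls back to disk staging, so local I/O happens only when memory runs out.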

28
Roadmap

Introduction
Active buffering: hiding recurrent output cost
With client-server I/O architecture
With server-less architecture (IPDPS 03)
Ongoing work: hiding recurrent input cost
Conclusions

29
Server-less I/O Architecture
[Figure: server-less I/O architecture; labels: I/O thread, compute processors]
30
Making ABT (Active Buffering plus Threads) Transparent and Portable

Unchanged interfaces
High-level and file-system independent
Design and evaluation (IPDPS 03)
Ongoing transfer to ROMIO
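The idea of unchanged interfaces can be illustrated with a Python file-object analogy (not the MPI-IO or ROMIO interface): `write()` keeps its usual signature and return value but returns once the data is copied into memory, and a background I/O thread performs the real write. `ActiveBufferedWriter` is a hypothetical name, and the buffer here is unbounded, which the real system does not assume.

```python
import io
import queue
import threading


class ActiveBufferedWriter:
    """Wrap a binary file object; write() returns after copying the data into
    memory, and a background I/O thread writes it to the wrapped object."""

    _STOP = object()  # sentinel telling the I/O thread to exit

    def __init__(self, raw):
        self._raw = raw
        self._queue = queue.Queue()  # unbounded buffer of pending blocks
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is self._STOP:
                    return
                self._raw.write(item)
            finally:
                self._queue.task_done()

    def write(self, data):
        self._queue.put(bytes(data))  # copy, so the caller may reuse its buffer
        return len(data)

    def flush(self):
        self._queue.join()  # wait until every buffered block has been written
        self._raw.flush()

    def close(self):
        """Flush and stop the I/O thread; the wrapped object is left open."""
        self.flush()
        self._queue.put(self._STOP)
        self._thread.join()


sink = io.BytesIO()
writer = ActiveBufferedWriter(sink)
writer.write(b"snapshot-1;")
writer.write(b"snapshot-2;")
writer.close()
print(sink.getvalue())  # b'snapshot-1;snapshot-2;'
```

Because callers see the same `write` and `close` calls, existing code needs no changes; only the timing of the underlying write moves into the background thread.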

31
Active Buffering vs. Asynchronous I/O
32
Roadmap

Introduction
Active buffering: hiding recurrent output cost
Ongoing work: hiding recurrent input cost
Conclusions

33
I/O in Visualization

Periodic reads
Dual modes of operation
Interactive
Batch mode
Harder to overlap reads with computation

[Figure: timeline of alternating computation and I/O phases]
34
Efficient I/O Through Data Management

In-memory database of datasets
Manage buffers or values
Hub for I/O optimization
Prefetching for batch mode
Caching for interactive mode
User-supplied read routine
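A rough sketch of such a dataset manager, assuming the user-supplied read routine takes a dataset index and returns its contents. `DatasetManager`, `read_fn`, and `prefetch_next` are hypothetical names. Batch mode is modeled as prefetching the next index on a background thread, and interactive mode as an LRU cache of recently used datasets.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class DatasetManager:
    """In-memory hub for datasets: caches reads and can prefetch the next one."""

    def __init__(self, read_fn, cache_size=4):
        self._read = read_fn         # user-supplied read routine: index -> data
        self._cache = OrderedDict()  # index -> Future, least recently used first
        self._cache_size = cache_size
        self._pool = ThreadPoolExecutor(max_workers=1)  # background I/O thread

    def _submit(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # cache hit: mark as recently used
            return self._cache[index]
        future = self._pool.submit(self._read, index)
        self._cache[index] = future
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return future

    def get(self, index, prefetch_next=False):
        future = self._submit(index)
        if prefetch_next:
            self._submit(index + 1)  # batch mode: start reading ahead
        return future.result()

    def close(self):
        self._pool.shutdown(wait=True)
```

In batch mode a visualization loop would call `get(i, prefetch_next=True)` so the read of step i + 1 overlaps with rendering step i; in interactive mode, revisiting a recent dataset is served from the cache without calling the read routine again.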

35
Related Work

Overlapping I/O with computation
Replacing synchronous calls with async calls: Agrawal et al. ICS 96
Threads: Dickens et al. IPPS 99; More et al. IPPS 97
Automatic performance optimization
Optimization with performance models: Chen et al. TSE 00
Graybox optimization: Arpaci-Dusseau et al. SOSP 01

36
Roadmap

Introduction
Active buffering: hiding recurrent output cost
Ongoing work: hiding recurrent input cost
Conclusions

37
Conclusions

If we can't shrink it, hide it!
Performance optimization can be done
more actively
at a higher level
in a larger scope
Make I/O part of data management

38
References

IPDPS 03: Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, "Improving MPI-IO Output Performance with Active Buffering Plus Threads," 2003 International Parallel and Distributed Processing Symposium.
PDSECA 03: Xiaosong Ma, Xiangmin Jiao, Michael Campbell and Marianne Winslett, "Flexible and Efficient Parallel I/O for Large-Scale Multi-component Simulations," The 4th Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications.
ICS 02: Jonghyun Lee, Xiaosong Ma, Marianne Winslett and Shengke Yu, "Active Buffering Plus Compressed Migration: An Integrated Solution to Parallel Simulations' Data Transport Needs," the 16th ACM International Conference on Supercomputing.
IPDPS 02: Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, "Faster Collective Output through Active Buffering," 2002 International Parallel and Distributed Processing Symposium.
