Flexible and Efficient I/O Optimization for Parallel Applications




2
Hiding Periodic I/O Costs in Parallel Applications
  • Xiaosong Ma
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign
  • Spring 2003

3
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • Ongoing work: hiding recurrent input cost
  • Conclusions

4
Introduction
  • Fast-growing technology propels high-performance
    applications
  • Scientific computation
  • Parallel data mining
  • Web data processing
  • Games, movie graphics
  • Growth of individual components is uncoordinated
  • Manual performance tuning needed

5
We Need Adaptive Optimization
  • Flexible and automatic performance optimization
    desired
  • Efficient high-level buffering and prefetching
    for parallel I/O in scientific simulations

6
Scientific Simulations
  • Important
  • Detail and flexibility
  • Save money and lives
  • Challenging
  • Multi-disciplinary
  • High performance crucial

7
Parallel I/O in Scientific Simulations
  • Write-intensive
  • Collective and periodic
  • Poor stepchild
  • Bottleneck-prone
  • Existing collective I/O focused on data transfer


[Figure: execution timeline alternating Computation and I/O phases]

8
My Contributions
  • Idea: I/O optimizations in a larger scope
  • Parallelism between I/O and other tasks
  • Individual simulations' I/O needs
  • I/O-related self-configuration
  • Approach: hide the I/O cost
  • Results
  • Publications, technology transfer, software

9
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • Ongoing work: hiding recurrent input cost
  • Conclusions

10
Latency Hierarchy on Parallel Platforms
  • Levels along the data path: local memory access,
    inter-processor communication, disk I/O, wide-area transfer
  • Moving along the path of data transfer:
  • Smaller throughput
  • Lower parallelism and less scalable

11
Basic Idea of Active Buffering
  • Purpose: maximize overlap between computation and
    I/O
  • Approach: buffer data as early as possible
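
The benefit of overlap can be illustrated with a simplified timing model. This is a sketch under stated assumptions, not the presentation's own analysis: every step has the same computation time and output time, the buffer holds exactly one snapshot, and the background write of a snapshot proceeds during the following computation phase. The function names and parameters (`t_comp`, `t_io`, `t_copy`) are hypothetical.

```python
def run_time_blocking(steps, t_comp, t_io):
    """Total run time when every output phase blocks the computation."""
    return steps * (t_comp + t_io)


def run_time_buffered(steps, t_comp, t_io, t_copy):
    """Total run time when output is copied to a buffer and written in the background.

    Assumption: the buffer holds one snapshot, so a new snapshot can only be
    copied in after the previous background write has finished.
    """
    clock = 0.0           # application time
    writer_free_at = 0.0  # time at which the background writer becomes idle
    for _ in range(steps):
        clock += t_comp                     # computation phase
        clock = max(clock, writer_free_at)  # stall only if the buffer is still in use
        clock += t_copy                     # cheap copy into the buffer
        writer_free_at = clock + t_io       # background write starts now
    return max(clock, writer_free_at)       # wait for the last write to drain


if __name__ == "__main__":
    print(run_time_blocking(10, t_comp=5.0, t_io=2.0))
    print(run_time_buffered(10, t_comp=5.0, t_io=2.0, t_copy=0.5))
```

In this model the output cost stays hidden as long as one snapshot can be written within one computation phase; when `t_io` exceeds `t_comp`, the stall at the buffer makes the remaining cost visible again.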

12
Challenges
  • Accommodate multiple I/O architectures
  • No assumption on buffer space
  • Adaptive
  • Buffer availability
  • User request patterns

13
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • With client-server I/O architecture [IPDPS 02]
  • With server-less architecture
  • Ongoing work: hiding recurrent input cost
  • Related work and future work
  • Conclusions

14
Client-Server I/O Architecture
[Figure: client-server I/O architecture; labels: compute processors, I/O servers]
15
Client State Machine
16
Server State Machine
[Figure: server state machine. Recoverable labels: states init., prepare, recv., fetch, write, idle-listen, busy-listen, exit; transition labels include "alloc. buffers", "data to receive, enough buffer space", "out of buffer space", "receive a block", "fetch a block", "write a block", "all data received", "write done", "no request"]
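
The diagram labels on this slide suggest a server that receives blocks while it has buffer space, writes a block when it runs out of space, and drains its buffer while no request is pending. The following is a minimal sketch of that reading, not the actual Panda implementation: the event names (`"block"`, `"idle"`, `"exit"`) and the function `run_server` are hypothetical, and the fetch path that appears in the diagram is omitted.

```python
def run_server(events, buffer_blocks):
    """Simulate a simplified active-buffering server loop.

    events: sequence of "block" (a client has a block to send),
            "idle" (no request is pending), or "exit" (exit message).
    buffer_blocks: number of blocks the server buffer can hold.
    Returns the list of actions the server takes, in order.
    """
    buffered = 0
    actions = []
    for event in events:
        if event == "block":
            if buffered == buffer_blocks:  # out of buffer space
                actions.append("write")    # write a block to make room
                buffered -= 1
            actions.append("recv")         # receive a block into the buffer
            buffered += 1
        elif event == "idle":
            if buffered > 0:               # use idle time to drain the buffer
                actions.append("write")
                buffered -= 1
        elif event == "exit":
            break
    actions.extend(["write"] * buffered)   # flush what is left before exiting
    actions.append("exit")
    return actions


if __name__ == "__main__":
    print(run_server(["block", "block", "block", "idle", "exit"], buffer_blocks=2))
```

Writes appear in the action list only when the buffer is full or when the server would otherwise be idle, which is the behavior the buffering scheme aims for.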
17
Maximize Apparent Throughput
  • Ideal apparent throughput per server:

    $$T_{ideal} = \frac{D_{total}}{\dfrac{D_{c\text{-}buffered}}{T_{mem\text{-}copy}} + \dfrac{D_{c\text{-}overflow}}{T_{msg\text{-}passing}} + \dfrac{D_{s\text{-}overflow}}{T_{write}}}$$

  • More expensive data transfer only becomes visible
    when overflow happens
  • Efficiently masks the difference in write speeds
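
A small worked example may make the formula on this slide concrete. It is a sketch under assumptions: the D terms are read as data volumes (buffered at clients, overflowing client buffers, overflowing server buffers) that add up to the total, the T terms are read as throughputs of memory copy, message passing, and disk write, and all numbers are hypothetical rather than measurements from the talk.

```python
def ideal_apparent_throughput(d_c_buffered, d_c_overflow, d_s_overflow,
                              t_mem_copy, t_msg_passing, t_write):
    """Ideal apparent throughput: total data divided by the visible transfer time.

    Data volumes in MB, throughputs in MB/s (assumed units).
    """
    d_total = d_c_buffered + d_c_overflow + d_s_overflow
    visible_time = (d_c_buffered / t_mem_copy        # absorbed by client buffers
                    + d_c_overflow / t_msg_passing   # sent on to the servers
                    + d_s_overflow / t_write)        # written straight to disk
    return d_total / visible_time


if __name__ == "__main__":
    # No overflow: the application only sees memory-copy speed.
    print(ideal_apparent_throughput(64, 0, 0, 256, 64, 16))
    # Client overflow: message passing becomes visible for the excess data.
    print(ideal_apparent_throughput(64, 32, 0, 256, 64, 16))
```

With no overflow the result equals the memory-copy throughput; each level of overflow pulls the apparent throughput toward the next, slower transfer rate.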



18
Write Throughput without Overflow
  • Panda Parallel I/O library
  • SGI Origin 2000, SHMEM
  • Per client: 16 MB output data per snapshot, 64 MB
    buffer
  • Two servers, each with 256MB buffer

19
Write Throughput with Overflow
  • Panda Parallel I/O library
  • SGI Origin 2000, SHMEM, MPI
  • Per client: 96 MB output data per snapshot, 64 MB
    buffer
  • Two servers, each with 256MB buffer

20
Give Feedback to Application
  • Softer I/O requirements
  • Parallel I/O libraries have been passive
  • Active buffering allows I/O libraries to take
    a more active role
  • Find optimal output frequency automatically
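
As a purely hypothetical illustration of the kind of feedback a library could give, the sketch below picks the shortest interval between snapshots for which one snapshot's background write finishes before the next snapshot is produced. The slides do not describe how the output frequency is actually chosen; the function name and both parameters are assumptions.

```python
import math


def min_snapshot_interval(t_step, t_write_snapshot):
    """Smallest number of timesteps between snapshots that keeps writes hidden.

    t_step: measured computation time per timestep (seconds).
    t_write_snapshot: measured background write time per snapshot (seconds).
    """
    if t_step <= 0:
        raise ValueError("t_step must be positive")
    # The write must fit inside the computation between two snapshots.
    return max(1, math.ceil(t_write_snapshot / t_step))


if __name__ == "__main__":
    print(min_snapshot_interval(t_step=2.0, t_write_snapshot=7.0))
```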

21
Server-side Active Buffering
[Figure: server state machine, repeated from the Server State Machine slide; states include init., prepare, recv., fetch, write, idle-listen, busy-listen, exit]
22
Performance with Real Applications
  • Application overview: GENX
  • Large-scale, multi-component, detailed rocket
    simulation
  • Developed at Center for Simulation of Advanced
    Rockets (CSAR), UIUC
  • Multi-disciplinary, complex, and evolving
  • Providing parallel I/O support for GENX
  • Identification of parallel I/O requirements
    [PDSECA 03]
  • Motivation and test case for active buffering

23
Overall Performance of GEN1
  • SDSC IBM SP (Blue Horizon)
  • 64 clients, 2 I/O servers with AB
  • 160MB output data per snapshot (in HDF4)

24
Aggregate Write Throughput in GEN2
  • LLNL IBM SP (ASCI Frost)
  • 1 I/O server per 16-way SMP node
  • Write in HDF4

25
Scientific Data Migration
  • Output data need to be moved
  • Online migration
  • Extend active buffering to migration
  • Local storage becomes another layer in buffer
    hierarchy


[Figure: execution timeline alternating Computation and I/O phases]
26
I/O Architecture with Data Migration
[Figure: I/O architecture with data migration; label: compute processors]
27
Active Buffering for Data Migration
  • Avoid unnecessary local I/O
  • Hybrid migration approach
  • Memory-to-memory transfer
  • Disk staging
  • Combined with data compression [ICS 02]
  • Self-configuration for online visualization

28
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • With client-server I/O architecture
  • With server-less architecture [IPDPS 03]
  • Ongoing work: hiding recurrent input cost
  • Conclusions

29
Server-less I/O Architecture
[Figure: server-less I/O architecture; labels: I/O thread, compute processors]
30
Making ABT Transparent and Portable
  • Unchanged interfaces
  • High-level and file-system independent
  • Design and evaluation [IPDPS 03]
  • Ongoing transfer to ROMIO
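
The idea of keeping the write interface unchanged while a thread does the slow work can be sketched as follows. This is an illustrative write-behind wrapper, not the ABT or ROMIO code: the class `BufferedWriter`, its `sink` callable, and the block-count limit are hypothetical, and when the buffer is full this sketch simply blocks the caller instead of applying any overflow strategy.

```python
import queue
import threading


class BufferedWriter:
    """Write-behind wrapper: write() returns once the block is buffered."""

    def __init__(self, sink, max_buffered_blocks):
        self._sink = sink  # callable that performs the slow write
        self._queue = queue.Queue(maxsize=max_buffered_blocks)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, block):
        # Copy the data so the caller may reuse its array immediately.
        # put() blocks only while the buffer is full.
        self._queue.put(bytes(block))

    def _drain(self):
        # Background I/O thread: perform the real writes in arrival order.
        while True:
            block = self._queue.get()
            if block is None:  # sentinel: no more data
                break
            self._sink(block)

    def close(self):
        # Flush everything that is still buffered, then stop the thread.
        self._queue.put(None)
        self._thread.join()


if __name__ == "__main__":
    written = []
    writer = BufferedWriter(written.append, max_buffered_blocks=2)
    for step in range(5):
        writer.write(bytearray([step]))  # returns quickly; computation continues
    writer.close()
    print(len(written))
```

The application calls `write` exactly as it would call a blocking write, which is the sense in which the interface stays unchanged.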

31
Active Buffering vs. Asynchronous I/O
32
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • Ongoing work: hiding recurrent input cost
  • Conclusions

33
I/O in Visualization
  • Periodic reads
  • Dual modes of operation
  • Interactive
  • Batch-mode
  • Harder to overlap reads with computation


[Figure: execution timeline alternating Computation and I/O phases]
34
Efficient I/O Through Data Management
  • In-memory database of datasets
  • Manage buffers or values
  • Hub for I/O optimization
  • Prefetching for batch mode
  • Caching for interactive mode
  • User-supplied read routine
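
One way to picture such a hub is a small in-memory dataset manager that caches what interactive users revisit and prefetches what a batch run will need next. The sketch below is an assumption-laden illustration, not the system described in the talk: the class `DatasetManager`, the LRU eviction policy, and the synchronous `prefetch` (a real one would likely run in a background thread) are all hypothetical choices.

```python
from collections import OrderedDict


class DatasetManager:
    """In-memory store of datasets, filled through a user-supplied read routine."""

    def __init__(self, read_routine, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._read = read_routine    # user-supplied: name -> data
        self._capacity = capacity    # maximum number of datasets kept in memory
        self._cache = OrderedDict()  # least recently used entry first
        self.reads = 0               # number of calls to the read routine

    def _load(self, name):
        self.reads += 1
        self._cache[name] = self._read(name)
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used

    def get(self, name):
        """Interactive mode: serve from the cache when possible."""
        if name in self._cache:
            self._cache.move_to_end(name)    # mark as recently used
        else:
            self._load(name)
        return self._cache[name]

    def prefetch(self, names):
        """Batch mode: load datasets that are known to be needed soon."""
        for name in names:
            if name not in self._cache:
                self._load(name)


if __name__ == "__main__":
    manager = DatasetManager(lambda name: name.upper(), capacity=2)
    manager.prefetch(["step1", "step2"])
    print(manager.get("step1"), manager.reads)
```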

35
Related Work
  • Overlapping I/O with computation
  • Replacing synchronous calls with async calls
    [Agrawal et al., ICS 96]
  • Threads [Dickens et al., IPPS 99; More et al.,
    IPPS 97]
  • Automatic performance optimization
  • Optimization with performance models [Chen et al.,
    TSE 00]
  • Graybox optimization [Arpaci-Dusseau et al., SOSP
    01]

36
Roadmap
  • Introduction
  • Active buffering: hiding recurrent output cost
  • Ongoing work: hiding recurrent input cost
  • Conclusions

37
Conclusions
  • If we can't shrink it, hide it!
  • Performance optimization can be done
  • more actively
  • at a higher level
  • in a larger scope
  • Make I/O part of data management

38
References
  • [IPDPS 03] Xiaosong Ma, Marianne Winslett,
    Jonghyun Lee, and Shengke Yu. "Improving MPI-IO
    Output Performance with Active Buffering Plus
    Threads." 2003 International Parallel and
    Distributed Processing Symposium.
  • [PDSECA 03] Xiaosong Ma, Xiangmin Jiao, Michael
    Campbell, and Marianne Winslett. "Flexible and
    Efficient Parallel I/O for Large-Scale
    Multi-component Simulations." The 4th Workshop on
    Parallel and Distributed Scientific and
    Engineering Computing with Applications.
  • [ICS 02] Jonghyun Lee, Xiaosong Ma, Marianne
    Winslett, and Shengke Yu. "Active Buffering Plus
    Compressed Migration: An Integrated Solution to
    Parallel Simulations' Data Transport Needs." The
    16th ACM International Conference on
    Supercomputing.
  • [IPDPS 02] Xiaosong Ma, Marianne Winslett,
    Jonghyun Lee, and Shengke Yu. "Faster Collective
    Output through Active Buffering." 2002
    International Parallel and Distributed Processing
    Symposium.