Transcript: Parallel I/O problems that you are likely to encounter in CSAR
1
Parallel I/O problems that you are likely to
encounter in CSAR
  • drl.cs.uiuc.edu/pubs/pio.html
  • Marianne Winslett
  • Department of Computer Science

2
Outline
  • Why parallel I/O?
  • Application needs
  • Traditional common I/O solutions
  • Why is parallel I/O hard?
  • Parallel I/O functionality
  • Parallel I/O performance
  • MPI-IO for portable I/O support

3
Parallel I/O needs from a simulation's point of view
  • "Why work on processors when I/O is where the
    action is?" - David Patterson
  • Large amounts of data to write (GB)
  • Not much reading unless you are out of core
  • I/O is often 1/2 of run time (you'd like 1 MB/MIP
    but you aren't getting it)
  • Certain I/O operations, access patterns, and data
    types are typical

4
I/O quantities
  • Large data sets
  • > 200 MB per output operation
  • Frequent data dumps: checkpoints, timestep outputs
  • Large number of files
  • Larger amounts as the application size and
    machine size increase

5
I/O performance
  • Need fast data dump techniques
  • Tens or hundreds of snapshots of data per run
  • Need support for fast restart
  • Resume from scheduled/unscheduled system
    breakdowns
  • Need support for efficient I/O during data
    analysis and visualization
  • Higher bandwidth and shorter response time as the
    application size and machine size increase

6
Common I/O operations
  • Compulsory (initialization, mostly reads)
  • Checkpoint/restart
  • Timestep output data
  • Writing/reading temporary data generated in the
    computational phases
  • Out-of-core reads/writes
  • Data migration
  • Post-processing for data analysis and
    visualization

7
Collective I/O
  • Processors are relatively closely synchronized
  • Must exchange data with neighbors before resuming
    computation
  • All ready to output their parts of the data set
    at roughly the same time
  • Potential to cooperate with one another during
    I/O, to reduce total I/O time
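To make that cooperation concrete, here is a minimal sketch of a collective dump using MPI-IO, where every rank writes its contiguous slice of one shared file in a single collective call; the file name and per-rank buffer size are illustrative assumptions.

/* Minimal sketch of a collective data dump: every rank writes its
 * contiguous slice of a 1D array into one shared file at about the
 * same time, letting the I/O layer merge the requests. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n_local = 1 << 20;                 /* 1M doubles per rank (assumed) */
    double *buf = malloc(n_local * sizeof(double));
    for (int i = 0; i < n_local; i++) buf[i] = rank;  /* dummy data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "snapshot.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank's slice starts at rank * n_local doubles; the _all call
     * lets the MPI-IO layer turn many requests into few large writes. */
    MPI_Offset offset = (MPI_Offset)rank * n_local * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, n_local, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}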

8
Common parallel I/O patterns
  • Typical I/O access patterns revealed in a single
    run
  • Reading a small amount of initial data for
    application computations
  • Mostly collective, sometimes non-collective
  • Generating large amounts of intermediate
    computational results for post-processing
  • Mostly collective, may involve reorganization
  • Reading intermediate results
  • Mostly collective, may involve reorganization
  • Data analysis and visualization
  • Collective/non-collective

9
Common data types
  • 2D/3D dense/sparse multidimensional arrays
  • Each array element of fixed length
  • Fine/coarse grained data distributions
  • HPF-style Cyclic and Block distributions spread
    data evenly across processors
  • AMR-style distributions don't
  • Affects communication overhead and load balance
    during I/O
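As a sketch of how such a distribution is described to an I/O layer, the MPI darray datatype below captures an HPF-style BLOCK decomposition of a 2D array; the 1024x1024 global size and the derived process grid are illustrative assumptions.

/* Sketch: describing an HPF-style BLOCK distribution of a 2D array so
 * the I/O library knows which file elements belong to which rank. */
#include <mpi.h>

void make_block_filetype(int rank, int nprocs, MPI_Datatype *filetype)
{
    int gsizes[2]   = {1024, 1024};              /* global array size (assumed) */
    int distribs[2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_BLOCK};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG};
    int psizes[2]   = {0, 0};
    MPI_Dims_create(nprocs, 2, psizes);          /* e.g. 4x4 grid for 16 ranks */

    MPI_Type_create_darray(nprocs, rank, 2, gsizes, distribs, dargs,
                           psizes, MPI_ORDER_C, MPI_DOUBLE, filetype);
    MPI_Type_commit(filetype);
    /* Swapping MPI_DISTRIBUTE_BLOCK for MPI_DISTRIBUTE_CYCLIC describes
     * the cyclic case; AMR-style irregular patches need hand-built types. */
}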

10
I/O interfaces on parallel platforms
  • Unix-like file system interfaces
  • Pleasantly familiar
  • Don't take the parallel environment into account
  • Performance will be poor unless you have a
    library to help you
  • System-dependent interfaces
  • Offer many different I/O modes and options
  • May offer high performance
  • Your I/O code will not be portable
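For concreteness, here is a sketch of the Unix-style pattern that tends to perform poorly: each rank opens the shared file itself and issues many small seek-plus-write pairs that the file system cannot merge. The interleaved layout and element size are illustrative assumptions.

/* Sketch of the naive Unix-style pattern: one small strided write per
 * element, issued independently by every rank, so the disks spend most
 * of their time seeking instead of streaming. */
#include <unistd.h>
#include <fcntl.h>

void naive_dump(int rank, int nprocs, const double *buf, long n_local)
{
    int fd = open("snapshot.dat", O_WRONLY | O_CREAT, 0644);
    for (long i = 0; i < n_local; i++) {
        /* Cyclic layout: rank's i-th element lands at position rank + i*nprocs. */
        off_t pos = (off_t)(rank + i * (long)nprocs) * (off_t)sizeof(double);
        lseek(fd, pos, SEEK_SET);
        write(fd, &buf[i], sizeof(double));      /* 8-byte write: mostly seek cost */
    }
    close(fd);
}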

11
I/O interface problem
12
Case studies of current I/O systems
  • Examples
  • IBM SP
  • Intel Paragon
  • Origin 2000
  • Cray T3E
  • Workstation clusters

13
Workstation clusters
  • E.g., an FDDI-connected HP workstation cluster with
    a small number of nodes
  • I/O support on the HP workstation cluster
  • HP-UX file system
  • Fast access to local disk
  • No shared file system

14
Parallel I/O approaches for clusters
15
Parallel I/O approaches for clusters
(Diagram: cluster nodes and their disks connected by a network)
16
Multiple independent I/O nodes
  • If each node sends its data to its local disk
  • Fast
  • Many files
  • Non-canonical input/output format
  • Need tools to help with data loading, migration,
    postprocessing
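A sketch of this file-per-process variant, assuming a node-local scratch path: each rank gets one fast sequential write, but the resulting files are in non-canonical order and must be reorganized later.

/* Sketch of "each node writes to its local disk": fast, but one file
 * per process per timestep, in non-canonical order. Path and sizes are
 * illustrative assumptions. */
#include <stdio.h>

void local_dump(int rank, const double *buf, long n_local, int step)
{
    char path[256];
    snprintf(path, sizeof(path),
             "/local/scratch/snap_step%04d_rank%04d.dat", step, rank);
    FILE *f = fopen(path, "wb");
    fwrite(buf, sizeof(double), (size_t)n_local, f); /* one large sequential write */
    fclose(f);
    /* A post-processing tool must later gather and reorder these files
     * into the canonical global layout. */
}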

17
Multiple independent I/O nodes
  • If only some nodes act as I/O nodes
  • Opens possibility of canonical output formats
    (e.g., rearrange data during I/O, concatenate
    resulting files)
  • But if multiple compute nodes send I/O requests
    to the same I/O node at the same time, you will
    probably not get anything near peak disk
    performance, because seeks are expensive
  • And the interconnect may be your bottleneck
  • Conclusion: can be fast, but you need a library to
    help you
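A sketch of the simplest version of this idea, with rank 0 standing in for a dedicated I/O node: data is gathered over the interconnect and written as one file in canonical order. The gather-to-rank-0 layout is an assumption, and the interconnect traffic is exactly where the bottleneck can move.

/* Sketch of "only some nodes act as I/O nodes": ranks forward data to a
 * designated writer, which emits one large sequential file. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

void gathered_dump(MPI_Comm comm, const double *local, int n_local,
                   const char *path)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    double *all = NULL;
    if (rank == 0)                               /* rank 0 plays the I/O node */
        all = malloc((size_t)nprocs * n_local * sizeof(double));

    /* Interconnect traffic replaces disk seeks; it may become the bottleneck. */
    MPI_Gather(local, n_local, MPI_DOUBLE,
               all,   n_local, MPI_DOUBLE, 0, comm);

    if (rank == 0) {
        FILE *f = fopen(path, "wb");
        fwrite(all, sizeof(double), (size_t)nprocs * n_local, f);
        fclose(f);
        free(all);
    }
}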

18
IBM SP system architecture
  • High-performance switch-connected workstation
    cluster
  • Each node is a RS6000 workstation that can have
    local disk
  • I/O support on IBM SP
  • PIOFS parallel file system
  • 2D file layouts, multiple access modes
  • Very slow (1/2 of disk throughput), unreliable
  • AIX JFS on each node's local disk
  • Fast, but no shared file system

19
Parallel I/O on the SP
  • If you don't use PIOFS, your options are the same
    as with the HP cluster
  • And you probably don't want to use PIOFS

20
Origin 2000 system architecture
  • A distributed shared memory and I/O system
  • Each node consists of 2 CPUs, caches, memory, and a
    directory
  • Interconnection of nodes: binary n-cube
  • I/O support on 16-node Origin 2000 at NCSA
  • 15 MB/sec sustained throughput for writes
  • 2 SCSI-2 RAID adapters striped via the XFS file
    system's volume manager, XLV (max 40 MB/s)
  • 8 9-GB 7200-RPM disks per RAID

21
Your I/O options on the NCSA O2000
  • You don't really have any; I/O will be a
    bottleneck for you if you save a lot of data
  • Shared file systems
  • Make it simple to create output in canonical
    format
  • But it will be very slow if each processor seeks to
    the right spot and does a small write
  • Usually show little speedup if you add a library
    that supports multiple I/O nodes writing to the
    file system: there is no true parallelism with the
    shared file system. It doesn't have to be this way,
    but it is
  • Expose you to the I/O costs of other applications

22
Intel Paragon system architecture
  • Distributed memory platform
  • I/O support on Intel Paragon
  • Certain nodes are dedicated to I/O
  • PFS parallel file system on the Caltech Paragon
  • Can sustain 84 MB/sec with 512 compute nodes
  • Multiple access modes (M_SYNC, M_UNIX, etc.)
    intended to provide high performance for
    different situations, but mainly add complexity
    for users
  • Data automatically striped across 92 I/O nodes

23
Cray T3E
  • Distributed memory machine
  • Separate compute and (system-controlled) I/O
    nodes
  • I/O support on Cray T3E at PSC
  • Shared Cray Unix file system
  • Separate channels for I/O and communication
  • File system can support over 40 MB/sec sustained
    with a single requester
  • The larger the write request, the better
  • Additional requesters increase throughput only
    slightly

24
Your options on the Cray T3E
  • Very hard to reach near-peak throughput
  • Performance very sensitive to activities of other
    applications
  • Small write requests are still a bad idea
  • Still, if you don't have an I/O library to help
    you, you will probably get faster I/O here than
    on other platforms (e.g., 20 MB/s with large
    write requests)
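A sketch of the kind of write aggregation that pays off here: stage small records in a large memory buffer and flush it in multi-megabyte chunks, which is what this file system rewards. The 16 MB staging size is an illustrative assumption.

/* Sketch of write aggregation: many small application writes become a
 * few large file-system writes. */
#include <string.h>
#include <unistd.h>

#define STAGE_BYTES (16 * 1024 * 1024)

static char   stage[STAGE_BYTES];
static size_t staged = 0;

void buffered_flush(int fd)                      /* call before closing the file */
{
    if (staged > 0) {
        write(fd, stage, staged);
        staged = 0;
    }
}

void buffered_write(int fd, const void *data, size_t len)
{
    if (len >= STAGE_BYTES) {                    /* huge record: write it directly */
        buffered_flush(fd);
        write(fd, data, len);
        return;
    }
    if (staged + len > STAGE_BYTES)              /* buffer full: one big write */
        buffered_flush(fd);
    memcpy(stage + staged, data, len);
    staged += len;
}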

25
Summary of what you will find
  • Simulations are often I/O intensive
  • Poor parallel I/O system support out there
  • Need parallel I/O solutions!

26
Understanding the parallel I/O problems
  • Poor I/O performance
  • Non-portable I/O code
  • Complex parallel I/O systems
  • Complex and changing I/O access patterns

27
Causes of the parallel I/O problems
  • Poor I/O performance
  • Unsuitable I/O interfaces
  • Cannot capture application I/O semantics, e.g.,
    "I want to write this distributed array"
  • Conceptually simple I/O operations are
    transformed into inefficient and complex low-level
    I/O requests, with many seeks, buffering
    errors, and partial writes of disk blocks
  • Full I/O parallelism lacking

28
Full I/O parallelism
  • I/O approach must scale up as number of
    processors increases
  • Shared file systems can become centralized
    bottlenecks
  • Each I/O node should be writing at top speed:
    needs special support for collective I/O
  • Requires careful load balancing
  • Communication costs must also be balanced

29
Causes of the parallel I/O problems
  • Non-portable I/O codes
  • System-dependent interfaces
  • The MPI-IO Standard
  • Portable interfaces
  • High-performance implementations on different
    platforms

30
MPI-IO
  • I/O interface of the MPI-2 standard
  • Goals
  • Application portability
  • I/O performance
  • File interoperability
  • Support common I/O access patterns
  • Support different storage device hierarchies
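As a sketch of how "I want to write this distributed array" can be expressed through MPI-IO: each rank describes its block of the global array once with a subarray file view, then a single collective write moves the whole array. The file name and C ordering are illustrative assumptions.

/* Sketch: writing a block-distributed 2D array with one collective call. */
#include <mpi.h>

void write_distributed_array(MPI_Comm comm, const double *local,
                             int gsizes[2], int lsizes[2], int starts[2])
{
    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(comm, "array.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* One collective write; the library can turn it into large, seek-free I/O. */
    MPI_File_write_all(fh, local, lsizes[0] * lsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}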

31
Parallel I/O interface
(Diagram: multiple applications sit on top of a portable parallel I/O interface (MPI-IO))
32
MPI-IO implementations
  • ROMIO from ANL
  • www.mcs.anl.gov/mpi/mpi2
  • MPI-IO from LLNL
  • www.llnl.gov/people/trj/goddard
  • MPI-IO from IBM
  • www.research.ibm.com/p/prost/sections/mpiio.html
  • MPI-IO from NASA Ames
  • parallel.nas.nasa.gov/MPI-IO/pmpio/pmpio.html

33
Causes of the parallel I/O problems
  • Complex parallel I/O resources
  • Many interdependent system modules
  • Processors, memory, disks, tapes, interconnects, ...
  • Many options to consider simultaneously
  • File layouts, access modes
  • Many performance factors
  • Disk/file system utilization
  • Communication system utilization
  • Load-balancing
  • Parallelism

34
Causes of the parallel I/O problems
  • Complex and changing I/O access patterns
  • Mixture of reads/writes
  • Mixture of fine-grained/coarse-grained data
    distributions
  • Trouble with balancing
  • checkpoint/restart operations
  • timestep output/data analysis and visualization
    operations

35
Our observations
  • Parallel I/O systems need to provide
  • Ease-of-use
  • Simple I/O interfaces
  • Automatic parallelism for I/O and data migration
  • Application portability
  • High-performance I/O strategies for a wide range
    of system conditions: automatic I/O strategy
    selection without human intervention

36
Parallel I/O strategies
(Diagram: multiple applications sit on top of an automatic I/O strategy selection layer)
37
References
  • Parallel I/O archive
  • http://www.cs.dartmouth.edu/pario/bib
  • Panda
  • http://www.drl.cs.uiuc.edu/panda

38
The state-of-the-art parallel I/O system
(Diagram: a rocket simulation produces timestep and checkpoint data; parallel I/O clients on the compute nodes use the parallel I/O interface and send data over the network to parallel I/O servers, which manage secondary and tertiary storage)