I/O Strategies for the T3E - PowerPoint PPT Presentation

I/O Strategies for the T3E Jonathan Carter NERSC User Services – PowerPoint PPT presentation

Slides: 39
Provided by: MJD51
Learn more at: https://www.nersc.gov

Transcript and Presenter's Notes

Title: I/O Strategies for the T3E


1
I/O Strategies for the T3E
  • Jonathan Carter
  • NERSC User Services

2
T3E Overview
  • T3E is a set of Processing Elements (PE)
    connected by a fast 3D torus.
  • PEs do not have local disk
  • All PEs access all filesystems equivalently
  • Path for I/O generally looks like
    • user buffer space
    • system buffer space
    • I/O device buffer space

3
Filesystems
  • /usr/tmp
    • fast
    • subject to 14-day purge, not backed up
    • check quota with quota -s /usr/tmp (usually 75 GB
      and 6000 inodes)
  • TMPDIR
    • fast
    • purged at end of job or session
    • shares quota with /usr/tmp
  • HOME
    • slower
    • permanent, backed up
    • check quota with quota (usually 2 GB and 3500
      inodes)

4
Types of I/O
  • Language I/O: Fortran or C (ANSI or POSIX)
  • Cray FFIO library (can be used from Fortran or C)
  • MPI I/O
  • Cray extensions to Fortran and C I/O (mostly for
    compatibility with PVP systems)

5
I/O Strategies - Exclusive access files
  • Each PE reads and writes to a separate file
  • Language I/O
  • MPI I/O
  • Increase language I/O performance with FFIO
    library (C must use POSIX style calls)
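
As a rough illustration of the one-file-per-PE pattern, the sketch below models each PE as a rank number and derives a private file name from it. The `restart` prefix, the zero-padded suffix, and both helper names are assumptions made for illustration, not NERSC or Cray conventions; in a real code the rank would come from the message-passing library.

```python
import os
import tempfile


def per_pe_filename(prefix, rank, width=4):
    """Return a file name unique to one PE, e.g. 'restart.0003' (hypothetical scheme)."""
    return f"{prefix}.{rank:0{width}d}"


def write_exclusive(directory, prefix, rank, payload):
    """Write this PE's data to its own file; no other PE touches it."""
    path = os.path.join(directory, per_pe_filename(prefix, rank))
    with open(path, "wb") as f:
        f.write(payload)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rank in range(3):  # stand-in for three PEs
            print(write_exclusive(tmp, "restart", rank, bytes([rank]) * 8))
```

With one file per PE the file count grows with the PE count, which is worth checking against the inode quotas listed on the Filesystems slide.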

6
I/O Strategies - Communication and I/O PE
  • One PE coordinates reading and writing and
    communicates data back and forth between other
    PEs via message passing
  • Language I/O
  • MPI I/O
  • Increase language I/O performance with FFIO
    library

7
I/O Strategies - Shared files
  • All PEs read and write the same file
    simultaneously
  • Language I/O with FFIO library global layer
  • MPI I/O
  • Language I/O with FFIO library global layer and
    Cray extensions for additional flexibility

8
Cray FFIO library
  • FFIO is a set of I/O layers tuned for different
    I/O characteristics
  • Buffering of data (configurable size)
  • Caching of data (configurable size)
  • Available to regular Fortran I/O without
    reprogramming
  • Available for C through POSIX-like calls, e.g.
    ffopen, ffwrite

9
The assign command
  • the assign command controls
    • which FFIO layer is active
    • striping across multiple partitions
    • lots more
  • scope of assign
    • File name
    • Fortran unit number
    • File type (e.g. all sequential unformatted files)

10
assign Examples
  • read and write to file restart.file from all PEs
    by using the FFIO library global layer
    • assign -F global:128:2 f:restart.file
  • use the FFIO library bufa layer to improve
    performance for file opened on Fortran unit 10
    • assign -F bufa:128:2 u:10
  • use the FFIO library bufa layer to improve
    performance for all unformatted sequential
    Fortran files
    • assign -F bufa:128:2 g:su

11
assign Examples
  • To see all active assigns
    • assign -V
  • To remove all active assigns
    • assign -R

12
bufa FFIO layer
  • bufa is an asynchronous buffering layer
  • performs read-ahead, write-behind
  • specify buffer size with -F bufa:bs:nbufs where
    bs is the buffer size in units of 4 KB blocks,
    and nbufs is the number of buffers
  • buffer space increases your application's memory
    requirements
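
The memory cost of a buffering layer follows directly from its spec: block count times block size times number of buffers. The sketch below assumes the spec is written as colon-separated fields (`layer:bs:nbufs`) and that one block is exactly 4096 bytes; the function name is hypothetical and is not part of FFIO.

```python
BLOCK_BYTES = 4096  # assumption: one "4 KB block" is 4096 bytes


def ffio_buffer_bytes(spec):
    """Bytes of buffer space implied by a layer spec such as 'bufa:128:2'."""
    _layer, bs, nbufs = spec.split(":")
    return int(bs) * BLOCK_BYTES * int(nbufs)


if __name__ == "__main__":
    print(ffio_buffer_bytes("bufa:128:2"))
```

Under these assumptions, `bufa:128:2` amounts to 1,048,576 bytes (1 MiB) of buffer space.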

13
global FFIO layer
  • global is a caching and buffering layer which
    enables multiple PEs to read and write to the
    same file
  • if one PE has already read the data, an
    additional read request from another PE will
    result in a remote memory copy
  • file open is a synchronizing event
  • By default, all PEs must open a global file; this
    can be changed by calling GLIO_GROUP_MPI(comm)
  • specify buffer size with -F global:bs:nbufs where
    bs is the buffer size in units of 4 KB blocks,
    and nbufs is the number of buffers per PE

14
File positioning with the global FFIO layer
  • Positioning of a read or write is your
    responsibility
  • File pointers are private
  • Fortran
    • Use a direct access file, and read/write(rec=num)
    • Use Cray extensions setpos and getpos to
      position file pointer (not portable)
  • C
    • Use ffseek
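
Because file pointers are private, each PE has to compute where its own data belongs. The following is a minimal sketch of the direct-access idea, assuming fixed-length records numbered from 1 and an ordinary local file standing in for the shared file. `record_offset`, `write_record`, and the rank-to-record mapping are illustrative and not part of FFIO.

```python
import os
import tempfile


def record_offset(recnum, reclen):
    """Byte offset of a 1-based, fixed-length record (Fortran-style numbering)."""
    if recnum < 1:
        raise ValueError("record numbers start at 1")
    return (recnum - 1) * reclen


def write_record(path, recnum, data):
    """Seek explicitly, then write; nothing relies on a shared file pointer."""
    with open(path, "r+b") as f:
        f.seek(record_offset(recnum, len(data)))
        f.write(data)


if __name__ == "__main__":
    reclen, npes = 8, 3
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        with open(path, "wb") as f:
            f.truncate(reclen * npes)        # pre-size the file
        for pe in reversed(range(npes)):     # write order does not matter
            write_record(path, pe + 1, bytes([65 + pe]) * reclen)
        with open(path, "rb") as f:
            print(f.read())
```

Each simulated PE lands in its own record regardless of the order in which the writes happen, which is the property the slide asks the programmer to guarantee.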

15
FFIO considerations
  • The examples above use an unblocked file structure;
    normal Fortran files are blocked. To read the
    file without the global or bufa layers you must
    use
    • assign -s unblocked f:filename
  • bufa and global do not allow backspace, or
    skipping over a partially read record. You can
    allow this behavior by using the cos layer in
    addition to bufa or global, but then setpos
    does not work.
    • assign -F cos:128,bufa:128:2 f:filename

16
More on FFIO
  • There are many other FFIO layers, some pretty
    obscure
  • cache and cachea layers, good for random access
    files
  • man intro_ffio for a terse description
  • Cray Publication - Application Programmer's I/O
    Guide

17
More on assign
  • Many text processing options
  • Switch between Fortran 77 and Fortran 90 namelist
  • File pre-allocation
  • File striping

18
Further Information
  • I/O on the T3E Tutorial by Richard Gerber at
    http://home.nersc.gov/training/tutorials
  • Cray Publication - Application Programmer's I/O
    Guide
  • Cray Publication - Cray T3E Fortran Optimization
    Guide
  • man assign

19
MPI I/O
  • Part of MPI-2
  • Interface for High Performance Parallel I/O
  • data partitioning
  • collective I/O
  • asynchronous I/O
  • portability and interoperability

20
MPI I/O Definitions
  • An MPI file is an ordered collection of MPI
    types.
  • A file may be opened individually or collectively
    by a group of processes
  • The fileview defines a template for accessing the
    file and is used to partition the file amongst
    processes

21
Fileviews
  • A fileview is composed of three pieces
    • a displacement (in bytes) from the beginning of
      the file
    • an elementary datatype (etype), which is the unit
      of data access and positioning within the file
    • a filetype, which defines a template for
      accessing the file. A filetype can contain etypes
      or holes of the same extent as etypes.

22
Fileviews (cont.)
  • The filetype pattern is repeated, tiling the
    file
  • Only the non-empty slots are available to read or
    write
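
The tiling rule can be made concrete with a small model. It assumes the filetype is described as a list of booleans, one per etype-sized slot (True for data, False for a hole). That representation and the function name are illustrative only and are not an MPI interface.

```python
def view_offset(index, disp, etype_size, filetype):
    """Absolute byte offset of the index-th etype visible through a fileview.

    filetype is a list of booleans: True marks a slot this process may
    access, False marks a hole. The pattern repeats from byte disp onward.
    """
    visible = [slot for slot, used in enumerate(filetype) if used]
    if not visible:
        raise ValueError("filetype has no accessible slots")
    tile, pos = divmod(index, len(visible))
    return disp + (tile * len(filetype) + visible[pos]) * etype_size


if __name__ == "__main__":
    # Three processes, each owning one slot of a three-slot filetype.
    for rank in range(3):
        filetype = [slot == rank for slot in range(3)]
        print(rank, [view_offset(i, 0, 4, filetype) for i in range(4)])
```

In this model a process never sees the holes: its element 0, 1, 2, ... simply maps to successive accessible slots, skipping to the next tile when the current one is exhausted.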

23
Fileviews (cont.)
  • Each process can have a different filetype
    (figure: filetypes for Process 0, Process 1, and
    Process 2)

24
MPI_File_set_view
  • Called after MPI_File_open to set fileview
  • MPI_File_set_view(fh, disp, etype, filetype,
    datarep, info)
    • fh is a file handle
    • disp, etype, and filetype define the fileview
    • datarep is one of 'native', 'internal', or
      'external32'
    • info is a set of hints to optimize performance

25
MPI Info object
  • An info object bundles up a set of parameters
    • integer finfo
    • call MPI_Info_create(finfo, ierr)
    • call MPI_Info_set(finfo, 'access_style',
      'write_mostly', ierr)
  • MPI I/O defines a set of parameters used to help
    optimize I/O performance
  • MPI_INFO_NULL can be used instead of an info
    object

26
Open and Close
  • MPI_File_open(comm, filename, amode, info, fh)
    • comm, open is collective over this communicator
    • filename, string or character variable
    • amode, file access mode (MPI_MODE_RDONLY,
      MPI_MODE_RDWR, etc.)
    • info, info object used to pass hints to open
    • fh, file handle
  • MPI_File_close(fh)

27
Utility routines
  • MPI_File_delete
  • MPI_File_set_size
  • MPI_File_preallocate
  • MPI_File_set_info

28
Query routines
  • MPI_File_get_size
  • MPI_File_get_group
  • MPI_File_get_amode
  • MPI_File_get_info
  • MPI_File_get_view

29
Data access routines
  • Positioning
    • Explicit, each call has an offset
    • Individual, each PE maintains an individual file
      pointer
    • Shared, the file pointer is maintained globally
  • Synchronism
    • Blocking, routine returns when complete
    • Non-blocking, must call a termination routine to
      ensure completion
  • Coordination
    • Non-collective
    • Collective
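
The three axes combine into the data access routine names defined by MPI-2. The lookup below rebuilds those names from the axes; the Python helper itself is only an illustration. For the non-blocking collective case it returns the `_begin` half of the split-collective pair, which a matching `_end` call completes.

```python
def access_routine(positioning, blocking, collective, op="write"):
    """Name of the MPI-2 data access routine for one combination of axes."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    # (non-collective suffix, collective suffix) for each positioning style
    suffixes = {
        "explicit": ("_at", "_at_all"),
        "individual": ("", "_all"),
        "shared": ("_shared", "_ordered"),
    }
    name = op + suffixes[positioning][1 if collective else 0]
    if blocking:
        return "MPI_File_" + name
    if collective:
        # split collective: a matching ..._end call completes the operation
        return "MPI_File_" + name + "_begin"
    return "MPI_File_i" + name


if __name__ == "__main__":
    for positioning in ("explicit", "individual", "shared"):
        for blocking in (True, False):
            for collective in (False, True):
                print(positioning, blocking, collective,
                      access_routine(positioning, blocking, collective))
```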

30
Summary of access routines

31
Summary of access routines (cont.)
  • MPI_File_seek
  • MPI_File_get_position
  • MPI_File_get_byte_offset
  • MPI_File_seek_shared (collective)
  • MPI_File_get_position_shared

32
T3E Implementation
  • No shared file pointers
  • No non-blocking collective (split collective)
  • SPR filed on non-blocking read
  • Work in progress

33
Examples
  • All the program fragments are available as
    working programs on the T3E
  • Do module load training, then look in
    EXAMPLES/mpi_io
  • All examples are of a distributed dot product
  • initialize data with random numbers
  • compute dot product of whole vector
  • write out data into a shared file
  • read back in and check dot product
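
The four steps can be modeled without MPI to show the data layout. The sketch below assumes each rank owns `m` consecutive single-precision values and writes them at byte offset `rank * m * itemsize` in one shared file. It uses small whole numbers instead of random data so the round trip through 32-bit floats is exact; all function names are hypothetical, and a plain loop stands in for the PEs.

```python
import array
import os
import tempfile


def write_slice(path, rank, values):
    """Write one rank's values at offset rank * m * itemsize (explicit positioning)."""
    data = array.array("f", values)
    with open(path, "r+b") as f:
        f.seek(rank * len(data) * data.itemsize)
        f.write(data.tobytes())


def read_slice(path, rank, m):
    """Read back the m values owned by one rank."""
    data = array.array("f")
    with open(path, "rb") as f:
        f.seek(rank * m * data.itemsize)
        data.frombytes(f.read(m * data.itemsize))
    return list(data)


def dot(values):
    """Dot product of a vector with itself."""
    return sum(v * v for v in values)


if __name__ == "__main__":
    nprocs, m = 3, 4
    slices = [[float(r * m + i) for i in range(m)] for r in range(nprocs)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vector")
        open(path, "wb").close()                 # create the shared file
        for rank, values in enumerate(slices):
            write_slice(path, rank, values)
        before = sum(dot(s) for s in slices)     # plays the role of the reduction
        after = sum(dot(read_slice(path, r, m)) for r in range(nprocs))
        print(before == after)
```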

    (figure: PE 0, PE 1, PE 2)

34
Naming convention
  • First letter is positioning: explicit,
    individual, or shared
  • Second letter is synchronism: blocking or
    non-blocking
  • Third letter is coordination: non-collective or
    collective
  • ebn.f90 is the explicit, blocking, non-collective
    example
  • There are several ibn examples dealing with
    different fileviews
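
The convention is easy to decode mechanically. In the sketch below only `e`, `b`, and the final `n` are confirmed by the `ebn.f90` example; the remaining letters (`i`, `s`, `n` for non-blocking, `c` for collective) are inferred from the first letter of each word and should be treated as assumptions. Only the first three characters of the file name are examined.

```python
POSITIONING = {"e": "explicit", "i": "individual", "s": "shared"}
SYNCHRONISM = {"b": "blocking", "n": "non-blocking"}          # 'n' is assumed
COORDINATION = {"n": "non-collective", "c": "collective"}     # 'c' is assumed


def decode_example_name(filename):
    """Expand the three-letter prefix of an example name such as 'ebn.f90'."""
    prefix = filename[:3]
    if len(prefix) != 3:
        raise ValueError("expected at least three letters")
    p, s, c = prefix
    return (POSITIONING[p], SYNCHRONISM[s], COORDINATION[c])


if __name__ == "__main__":
    print(decode_example_name("ebn.f90"))
```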

35
Filetype Example
  • (figure: filetypes for Process 0, Process 1, and
    Process 2)

36
Filetype Example
  filemode = MPI_MODE_RDWR + MPI_MODE_CREATE
  call MPI_INFO_CREATE(finfo, ierr)
  call MPI_INFO_SET(finfo, 'access_style', 'write_mostly', ierr)
  call MPI_FILE_OPEN(MPI_COMM_WORLD, 'vector', filemode, finfo, fhv, ierr)
  ! filetype: this PE's slice of m reals out of m*nprocs, starting at m*me
  call MPI_TYPE_CREATE_SUBARRAY(1, m*nprocs, m, m*me, &
       MPI_ORDER_FORTRAN, MPI_REAL, mpi_fileslice, ierr)
  ! MPI requires a derived type to be committed before it is used
  call MPI_TYPE_COMMIT(mpi_fileslice, ierr)
  disp = 0
  call MPI_FILE_SET_VIEW(fhv, disp, MPI_REAL, mpi_fileslice, 'native', &
       MPI_INFO_NULL, ierr)

37
Individual, blocking, non-collective
  call MPI_FILE_WRITE(fhv, b, m, MPI_REAL, status, ierr)
  lresult = sdot(m, b, 1, b, 1)
  call MPI_REDUCE(lresult, result, 1, MPI_REAL, MPI_SUM, &
       0, MPI_COMM_WORLD, ierr)
  if (me.eq.0) then
    write(6,*) 'dot product ', result
  end if
  ! zero vector and read it back in
  b = 0.0
  disp = 0
  call MPI_FILE_SEEK(fhv, disp, MPI_SEEK_SET, ierr)
  call MPI_FILE_READ(fhv, b, m, MPI_REAL, status, ierr)

38
Further Information on MPI I/O
  • MPI-The Complete Reference
  • Volume 1, The MPI Core
  • Volume 2, The MPI Extensions