1
Parallel Programming: Fortran M & HPF
  • September 24, 2002

2
FM Overview
  • FM (like CC++) is a set of extensions to the
    basic Fortran language
  • Language constructs to create tasks and channels
  • Constructs to send and receive messages
  • Ensures deterministic execution
  • Mapping decisions do not affect the design

3
Fortran - a short history
  • First successful high-level language
  • Developed at IBM in the mid-1950s; the first
    compiler shipped in 1957
  • Adopted by most scientific and military
    institutions (over Assembly)
  • ASA standard published in 1966 became FORTRAN 66
  • FORTRAN 77 became the stable standard and is
    still used by many compilers today

4
Fortran - a short history
  • FORTRAN 90/95 (unofficial names) added many of
    the properties common in other HLLs
  • Free format source code form (column independent)
  • Modern control structures (CASE, DO WHILE)
  • Records (structures)
  • Array notation (array sections, array operators,
    etc.)
  • Dynamic memory allocation
  • Derived types and operator overloading
  • Keyword argument passing, INTENT (in, out, inout)
  • Numeric precision and range control
  • Modules

5
Fortran - a short history
  • Scientists still use it because it offers
  • Variable-dimension array arguments in subroutines
  • Built-in complex arithmetic
  • A compiler-supported infix exponentiation
    operator which is generic with respect to both
    precision and type
  • Array notation that allows operations on array
    sections (see the sketch below)
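  • A minimal Fortran 90 sketch of these features (the
    subroutine and its names are illustrative, not from
    the original slides)
  • subroutine smooth(n, X) ! Variable-dimension array argument
  • integer n
  • real X(n)
  • complex z ! Built-in complex arithmetic
  • z = (1.0, 2.0)
  • z = z**2 ! Generic infix exponentiation operator
  • X(2:n-1) = 0.5*(X(1:n-2) + X(3:n)) ! Array-section operation
  • end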

6
FM Introduction
  • Any valid Fortran program is a valid FM program,
    with one exception
  • COMMON blocks must be renamed PROCESS COMMON
  • Compilers usually do this for you
  • Emphasizes modularity

7
FM Introduction
  • Processes
  • Single- and multiple-producer channels
  • Process blocks and do-loops
  • Sending and receiving messages
  • Mapping
  • Variable passing

8
(Figure slide; no transcript)
9
Concurrency - Defining Processes
  • Tasks are implemented in FM as processes
  • A process definition defines the process's
    interface to its environment
  • program fm_bridge_construction
  • INPORT (integer) pi
  • OUTPORT (integer) po
  • CHANNEL(in=pi, out=po)

10
Concurrency - Defining Processes
  • Typed port variables
  • INPORT (integer, real) p1
  • INPORT (real x(128)) p2
  • INPORT (integer m, real x(m)) p3

11
Concurrency - Creating Processes
  • An FM program starts out as a single process that
    spawns additional processes (like CC++)
  • PROCESSES
  • statement_1
  • .
  • .
  • .
  • statement_n
  • ENDPROCESSES

12
Concurrency - Creating Processes
  • At most one ordinary subroutine call can be made
    (e.g. call subroutine_1(po1))
  • All other subroutine calls must be to processes
    and are made using the PROCESSCALL command
  • PROCESSES
  • PROCESSCALL worker(pi1)
  • PROCESSCALL worker(pi2)
  • PROCESSCALL process_master(po1,po2)
  • ENDPROCESSES

13
Concurrency - Creating Processes
  • Statements in a PROCESSES block execute
    concurrently
  • The block terminates when all of the child
    processes return

14
Concurrency - Creating Processes
  • Multiple instances of the same process can be
    created with the PROCESSDO statement
  • PROCESSDO i = 1,10
  • PROCESSCALL myprocess
  • ENDPROCESSDO

15
Concurrency - Creating Processes
  • PROCESSDO can be nested
  • PROCESSES
  • PROCESSCALL master
  • PROCESSDO i = 1,10
  • PROCESSCALL worker
  • ENDPROCESSDO
  • ENDPROCESSES

16
Communication
  • FM processes cannot share data directly
  • Channels can be single-producer, single-consumer
    or multiple-producer, single-consumer

17
Communication - Creating Channels
  • Channels are created using the CHANNEL statement
  • CHANNEL(in=inport, out=outport)
  • Defines both the input port and the output port

18
Communication - Creating Channels
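  • The slide's figure is not in the transcript; a
    minimal sketch of the pattern it presumably shows
    (the producer/consumer processes are illustrative)
  • INPORT (integer) pi ! Receiving end
  • OUTPORT (integer) po ! Sending end
  • CHANNEL(in=pi, out=po) ! Connect the two ports
  • PROCESSES
  • PROCESSCALL producer(po) ! Writes to the channel
  • PROCESSCALL consumer(pi) ! Reads from the channel
  • ENDPROCESSES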
19
Communication - Sending Messages
  • A process sends a message by applying the SEND
    statement to an outport
  • OUTPORT (integer, real x(10)) po
  • ...
  • SEND(po) i, a

20
Communication - Sending Messages
  • ENDCHANNEL is used to send an end-of-channel
    (EOC) message and set the outport variable to
    null
  • SEND is non-blocking (asynchronous)
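  • A sketch of a producer that sends ten values and
    then signals EOC (the process and its names are
    illustrative)
  • PROCESS producer(po) ! Process definition
  • OUTPORT (integer) po ! Argument: outport
  • integer i ! Local variable
  • do i = 1,10 ! Send ten messages
  • SEND(po) i ! Non-blocking send
  • enddo !
  • ENDCHANNEL(po) ! Send EOC; po becomes null
  • end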

21
Communication - Receiving Messages
  • A process receives a message by using the RECEIVE
    statement on an inport
  • INPORT (integer n, real x(n)) pi
  • integer num
  • real a(128, 128)
  • RECEIVE(pi) num, a(1,offset)

22
Communication - Receiving Messages
  • The end=label specifier causes execution to continue
    at the given label when an EOC message is received
  • PROCESS bridge(pi) ! Process definition
  • INPORT (integer) pi ! Argument: inport
  • integer num ! Local variable
  • do while(.true.) ! While not done
  • RECEIVE(port=pi, end=10) num ! Receive message
  • call use_girder(num) ! Process message
  • enddo !
  • 10 end ! End of process

23
Unstructured Communication
  • The identities of communicating processes can change
    during program execution
  • Many-to-one
  • Many-to-many
  • Dynamic creation of channels

24
Many-to-One Communication
  • FM's MERGER statement creates a FIFO message
    queue
  • Allows multiple outports to reference it
  • MERGER(in=inport, out=outport_specifier)
  • The outport_specifier can be a single outport, a
    list of outport specifiers, or an array section
    from an outport array

25
Many-to-One Communication
  • INPORT (integer) pi ! Single inport
  • OUTPORT(integer) pos(4) ! Four outports
  • MERGER(in=pi, out=pos(:)) ! Merger
  • PROCESSES
  • call consumer(pi) ! Single consumer
  • PROCESSDO i = 1,4
  • PROCESSCALL producer(pos(i))
  • ENDPROCESSDO
  • ENDPROCESSES

26
Many-to-Many Communication
  • Similar in implementation to Many-to-one code
    using multiple mergers
  • OUTPORT(integer) pos(3,4) ! 3x4 outports
  • INPORT (integer) pis(3) ! 3 inports
  • do i = 1,3 ! 3 mergers
  • MERGER(in=pis(i), out=pos(i,:)) !
  • enddo !
  • PROCESSES !
  • PROCESSDO i = 1,4 !
  • PROCESSCALL producer(pos(:,i)) ! 4 producers
  • ENDPROCESSDO !
  • PROCESSDO i = 1,3 !
  • PROCESSCALL consumer(pis(i)) ! 3 consumers
  • ENDPROCESSDO !
  • ENDPROCESSES !

27
Dynamic Channel Structures
  • Port variables can themselves be sent in messages,
    via inports or outports
  • INPORT (OUTPORT (integer)) pi ! Inport carrying an outport
  • OUTPORT (integer) qo ! Outport
  • RECEIVE(pi) qo ! Receive an outport
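  • A common use is a reply channel; a sketch, with
    illustrative names: create a new channel, hand its
    outport to a partner process, then receive the
    reply locally
  • OUTPORT (OUTPORT (integer)) qo ! Connection to partner
  • INPORT (integer) ri ! Reply inport
  • OUTPORT (integer) ro ! Reply outport
  • integer reply
  • CHANNEL(in=ri, out=ro) ! Create the reply channel
  • SEND(qo) ro ! Send its outport to the partner
  • RECEIVE(ri) reply ! Receive the partner's reply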

28
(Figure slide; no transcript)
29
Asynchronous Communication
  • Specialized data tasks can be used to service read
    and write requests on shared data
  • These can be implemented with the methodology just
    discussed
  • A distributed-data version can be built by using
    PROBE to poll an inport for pending requests

30
Asynchronous Communication
  • PROBE sets a logical flag indicating whether there
    is a message waiting in an inport's queue
  • INPORT (T) requests ! T = an arbitrary type
  • logical eflag
  • do while (.true.) ! Repeat
  • call advance_local_search ! Compute
  • PROBE(requests, empty=eflag) ! Poll for requests
  • if(.not. eflag) call respond_to_requests
  • enddo

31
Determinism
  • MERGER and PROBE are non-deterministic constructs
  • Any program not using these is guaranteed to be
    deterministic
  • These two fragments are equivalent ways of creating
    two concurrent instances of proc
  • PROCESSDO i = 1,2
  • PROCESSCALL proc(i,x)
  • ENDPROCESSDO
  • PROCESSES
  • PROCESSCALL proc(1,x)
  • PROCESSCALL proc(2,x)
  • ENDPROCESSES

32
Argument Passing
  • By default, variables passed as process arguments
    are copied into the call and copied back on return
  • If a process only reads a value, there is no
    need to copy the variable back to the calling
    process
  • Use the INTENT statement to suppress the copy back
    to the calling process (see the sketch below)
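  • A sketch of a read-only process argument (the
    process and render are illustrative)
  • PROCESS display(A) ! Process definition
  • INTENT(in) A ! A is only read: no copy back
  • real A(128) ! Argument array
  • call render(A) ! Hypothetical read-only use
  • end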

33
Mapping
  • Mapping in FM changes execution time but not the
    correctness of an algorithm
  • PROCESSORS specifies the shape and dimension of a
    virtual processor array
  • LOCATION maps processes to processors
  • SUBMACHINE specifies that a process should
    execute in a subset of the array

34
Mapping - Virtual Computers
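  • The slide's figure is not in the transcript; typical
    declarations of virtual computers look like this
  • PROCESSORS(16) ! 1-D virtual computer, 16 nodes
  • PROCESSORS(4,4) ! 2-D 4x4 virtual processor array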
35
Mapping - Process Placement
  • The LOCATION annotation is similar in form to the
    PROCESSORS statement
  • statement LOCATION(index_into_processor_array)

36
Mapping - Process Placement
  • Example
  • program ring
  • parameter(P=3)
  • PROCESSORS(P)
  • ...
  • PROCESSDO i = 1,P
  • PROCESSCALL ringnode(i, P, pi(i), po(i))
    LOCATION(i)
  • ENDPROCESSDO

37
Mapping - Submachines
  • Same format as the LOCATION annotation
  • SUBMACHINE sets up a new virtual computer within
    the current virtual computer (set up by a
    PROCESSORS statement or another SUBMACHINE)
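  • A sketch of the annotation, assuming two
    hypothetical processes that each get half of an
    8-node virtual computer
  • PROCESSORS(8)
  • PROCESSES
  • PROCESSCALL task_a(po) SUBMACHINE(1:4) ! Nodes 1-4
  • PROCESSCALL task_b(pi) SUBMACHINE(5:8) ! Nodes 5-8
  • ENDPROCESSES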

38
Performance Issues
  • Because FM directly uses the task/channel
    metaphor, previous techniques can be applied
    directly
  • A SEND incurs only one communication cost (not
    two) although FM code tends to have more SENDs

39
Performance Issues
  • Process creation
  • Cost depends on the compiler
  • If Unix processes are created then they can be
    expensive
  • If threads are created then they are relatively
    cheap
  • Fairness
  • Compiler optimization

40
Break
  • Turn in proposals if you haven't already

41
Data Parallelism with HPF
  • Data parallelism arises when the same operation is
    applied to the elements of a data ensemble
  • A data-parallel program is a sequence of such
    operations

42
Data Parallelism - Concurrency
  • Data structures operated on in a data-parallel
    program can be regular (arrays) or irregular
    (trees, sparse matrices, etc.)
  • HPF requires that they be arrays

43
Data Parallelism - Concurrency
  • Explicit vs. Implicit parallel constructs
  • Explicit
  • A = B*C ! A, B, C are arrays
  • Implicit
  • do i = 1,m
  • do j = 1,n
  • A(i,j) = B(i,j)*C(i,j)
  • enddo
  • enddo

44
Data Parallelism - Concurrency
  • HPF compilation can introduce additional
    communication depending on how data elements are
    distributed
  • real y, s, X(100)
  • X = X*y
  • do i = 2,99
  • X(i) = (X(i-1) + X(i+1))/2
  • enddo
  • s = SUM(X)

(Figure: possible communication for each statement)
45
Data Parallelism - Locality
  • Obviously, data location can drastically affect
    the performance of a program
  • HPF allows the programmer to dictate how a data
    structure is to be distributed
  • !HPF$ PROCESSORS pr(16)
  • real X(1024)
  • !HPF$ DISTRIBUTE X(BLOCK) ONTO pr

46
Data Parallelism - Design
  • Higher level
  • Not required to specify communications
  • The compiler determines communication from the
    specified data distribution
  • More restrictive
  • Not all algorithms can be specified as
    data-parallel

47
Data Parallelism - Languages
  • Fortran 90
  • High Performance Fortran

48
Fortran 90
  • This is a complex language that extends Fortran
    77
  • Pointers
  • User-defined types
  • Dynamic storage
  • More
  • Array assignment
  • Array intrinsic functions

49
Fortran 90 Array Assignment Statement
  • A typical scalar operation can be applied to
    arrays
  • integer A(10,10), B(10,10), c
  • A = B*c

50
Fortran 90 Array Assignment Statement
  • Subsets of an array can also be referenced (see the
    sketch below)
  • Masked array assignment
  • WHERE(X /= 0) X = 1.0/X
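  • A short sketch combining section references and
    masked assignment (array names are illustrative)
  • real A(10,10), B(10,10)
  • A(1:5,3) = B(6:10,4) ! Assign one section to another
  • WHERE(A /= 0) A = 1.0/A ! Update only nonzero elements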

51
Fortran 90 Array Intrinsic Functions
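  • The slide's table is not in the transcript; a few of
    the common array intrinsics, as a sketch
  • real X(100), A(8,8), s, m
  • s = SUM(X) ! Reduction: sum of all elements
  • m = MAXVAL(X) ! Reduction: largest element
  • A = TRANSPOSE(A) ! Matrix transpose
  • X = CSHIFT(X, 1) ! Circular shift by one position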
52
Fortran 90 Finite Difference
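  • The slide's code is not in the transcript; a typical
    Fortran 90 formulation replaces each interior point
    with the average of its four neighbors in a single
    array assignment
  • real A(100,100), B(100,100)
  • B(2:99,2:99) = (A(1:98,2:99) + A(3:100,2:99) + &
  •     A(2:99,1:98) + A(2:99,3:100))/4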
53
Data Distribution
  • Fortran 90's array capabilities provide
    opportunities for concurrency but not locality
  • HPF adds directives that give the programmer
    some control over locality
  • PROCESSORS
  • ALIGN
  • DISTRIBUTE

54
Data Distribution - Processors
  • Same as the PROCESSORS statement in FM, except that
    it is written as a directive
  • !HPF$ PROCESSORS P(32)
  • !HPF$ PROCESSORS Q(4,8)

55
Data Distribution - Alignment
  • Data elements of different arrays may relate to
    one another
  • The ALIGN directive is used to specify which
    elements should, if possible, be collocated
  • !HPF$ ALIGN array WITH target

56
Data Distribution - Alignment
  • Simplest form
  • real B(50), C(50)
  • !HPF$ ALIGN C(:) WITH B(:)

57
Data Distribution - Alignment
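  • The slide's figure is not in the transcript; two
    more general alignments, as a sketch with
    illustrative arrays
  • real D(50,50), E(50,50), X(50), Y(100)
  • !HPF$ ALIGN D(i,j) WITH E(j,i) ! Transposed alignment
  • !HPF$ ALIGN X(i) WITH Y(2*i) ! Strided alignment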
58
Data Distribution - the DISTRIBUTE directive
  • Each dimension of an array can be distributed to
    processors in one of three ways
  • * : none (the dimension is not distributed)
  • BLOCK(n) : block (default n = N/P)
  • CYCLIC(n) : cyclic (default n = 1)

59
Data Distribution - the DISTRIBUTE directive
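  • The slide's figure is not in the transcript; a
    sketch of BLOCK and CYCLIC applied to illustrative
    arrays
  • !HPF$ PROCESSORS pr(4)
  • real X(16), Y(16)
  • !HPF$ DISTRIBUTE X(BLOCK) ONTO pr ! X(1:4) on pr(1), X(5:8) on pr(2), ...
  • !HPF$ DISTRIBUTE Y(CYCLIC) ONTO pr ! Y(1) on pr(1), Y(2) on pr(2), round robin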
60
HPF Finite Difference
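  • The slide's code is not in the transcript; a sketch
    of the Fortran 90 version above with HPF mapping
    directives added
  • real A(100,100), B(100,100)
  • !HPF$ PROCESSORS pr(4)
  • !HPF$ ALIGN B(:,:) WITH A(:,:)
  • !HPF$ DISTRIBUTE A(*,BLOCK) ONTO pr ! Column blocks of 25
  • B(2:99,2:99) = (A(1:98,2:99) + A(3:100,2:99) + &
  •     A(2:99,1:98) + A(2:99,3:100))/4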
61
More Complex Data Mapping
  • Fortran 90 array operations generally require
    conformable (same-shaped) arrays
  • This is not always the mapping an algorithm needs

62
FORALL statement
  • Allows more general assignments to sections of an
    array
  • FORALL (i=1:m, j=1:n) X(i,j) = i*j
  • FORALL (i=1:n, j=1:n, i<j) Y(i,j) = 0.0
  • FORALL (i=1:n) Z(i,i) = 0.0

63
FORALL statement
  • To maintain determinism, a FORALL statement may
    write to each element at most once
  • FORALL (i=1:n) A(Index(i)) = B(i) ! Valid only if
    Index contains no duplicate values

64
INDEPENDENT directive
  • Modifies a do-loop
  • Tells the compiler that each iteration of a
    do-loop is independent
  • !HPF$ INDEPENDENT
  • do i = 1,n
  • A(Index(i)) = B(i)
  • enddo

65
Discovering Physical Processors
  • The NUMBER_OF_PROCESSORS() inquiry function returns
    the number of physical processors
  • !HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())

66
Discovering Processor Configuration
  • The PROCESSORS_SHAPE() inquiry function returns the
    shape of the physical processor arrangement
  • integer Q(SIZE(PROCESSORS_SHAPE()))

67
Discovering Processors - Examples
68
Performance Issues
  • Programming skill
  • Compiler
  • Sequential Bottlenecks
  • Communication Costs

69
Performance Issues - Compilation
  • HPF compilers typically use the owner-computes
    rule to decide which processor performs each
    operation
  • Communication operations are then optimized; for
    example, message passing is hoisted out of loops
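  • A sketch of the rule, with illustrative arrays: the
    owner of each A(i) performs the assignment, so
    elements of B must be communicated to it
  • real A(100), B(100)
  • !HPF$ DISTRIBUTE A(BLOCK)
  • !HPF$ DISTRIBUTE B(CYCLIC)
  • A = B ! Owner of A(i) computes; most B(i) are fetched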

70
Performance Issues - Sequential Bottlenecks
  • These occur when
  • the programmer has not provided enough
    opportunities for parallelization
  • concurrency is implicit and the compiler
    fails to recognize it

71
Performance Issues - Communication Costs
  • F90 and HPF can incur substantial communication
    costs
  • Intrinsics and array operations can use values
    from across an entire array or multiple arrays
  • Non-aligned arrays

72
(Figure slide; no transcript)
73
Performance Issues - Communication Costs
  • Switching decompositions at procedure boundaries
  • The compiler may ignore mapping suggestions made
    by the programmer (directives are advisory)

74
Summary
  • Fortran M
  • Data parallelism
  • Fortran 90
  • High Performance Fortran