Introduction to PETSc - PowerPoint PPT Presentation

1
Introduction to PETSc
  • VIGRE Seminar, Wednesday, November 8, 2006

2
Parallel Computing
  • How (basically) does it work?

3
Parallel Computing
  • How (basically) does it work?
  • Assign each processor a number

4
Parallel Computing
  • How (basically) does it work?
  • Assign each processor a number
  • The same program goes to all

5
Parallel Computing
  • How (basically) does it work?
  • Assign each processor a number
  • The same program goes to all
  • Each uses separate memory

6
Parallel Computing
  • How (basically) does it work?
  • Assign each processor a number
  • The same program goes to all
  • Each uses separate memory
  • They pass information back and forth as necessary

7
Parallel Computing
  • Example 1 Matrix-Vector Product

8
Parallel Computing
  • Example 1 Matrix-Vector Product

The matrix and the vector are inputs into the program. (Processors 0, 1, and 2 hold no data yet.)
9
Parallel Computing
  • Example 1 Matrix-Vector Product

The control node (0) reads in the matrix and distributes the rows amongst the processors.
  0: (a, b, c)
  1: (d, e, f)
  2: (g, h, i)
10
Parallel Computing
  • Example 1 Matrix-Vector Product

The control node also sends the vector to each processor's memory.
  0: (a, b, c)  (j, k, l)
  1: (d, e, f)  (j, k, l)
  2: (g, h, i)  (j, k, l)
11
Parallel Computing
  • Example 1 Matrix-Vector Product

Each processor computes its own dot product.
  0: (a, b, c) · (j, k, l) = aj + bk + cl
  1: (d, e, f) · (j, k, l) = dj + ek + fl
  2: (g, h, i) · (j, k, l) = gj + hk + il
12
Parallel Computing
  • Example 1 Matrix-Vector Product

The processors send their results to the control node, which outputs the result vector.
  0: (a, b, c) (j, k, l)  aj + bk + cl
  1: (d, e, f) (j, k, l)  dj + ek + fl
  2: (g, h, i) (j, k, l)  gj + hk + il
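A hypothetical illustration, not from the slides: Example 1 written with raw MPI in C, assuming a 3x3 matrix and exactly 3 processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
      int    rank;
      double A[9] = {1,2,3, 4,5,6, 7,8,9};  /* whole matrix, used only on node 0 */
      double row[3], x[3] = {1,1,1};        /* my row and the shared vector      */
      double mine, y[3];                    /* my dot product; gathered result   */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* The control node (0) distributes one row to each processor... */
      MPI_Scatter(A, 3, MPI_DOUBLE, row, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      /* ...and sends the whole vector to each processor's memory.     */
      MPI_Bcast(x, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      /* Each processor computes its own dot product. */
      mine = row[0]*x[0] + row[1]*x[1] + row[2]*x[2];

      /* The results are collected on the control node, which outputs them. */
      MPI_Gather(&mine, 1, MPI_DOUBLE, y, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      if (rank == 0) printf("(%g, %g, %g)\n", y[0], y[1], y[2]);

      MPI_Finalize();
      return 0;
    }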
13
Parallel Computing
  • Example 2 Matrix-Vector Product

Suppose for memory reasons each processor only has part of the vector.
  0: (a, b, c)  j
  1: (d, e, f)  k
  2: (g, h, i)  l
14
Parallel Computing
  • Example 2 Matrix-Vector Product

Before the multiply, each processor sends the necessary information elsewhere.
  0: (a, b, c)  j, (k from 1), (l from 2)
  1: (d, e, f)  (j from 0), k, (l from 2)
  2: (g, h, i)  (j from 0), (k from 1), l
15
Parallel Computing
  • Example 2 Matrix-Vector Product

After the multiply, the space is freed again for other uses.
  0: (a, b, c)  j
  1: (d, e, f)  k
  2: (g, h, i)  l
16
Parallel Computing
  • Example 3 Matrix-Matrix Product

The previous case illustrates how to multiply matrices stored across multiple processors.
  0: (a, b, c)  (j, k, l)
  1: (d, e, f)  (m, n, o)
  2: (g, h, i)  (p, q, r)
17
Parallel Computing
  • Example 3 Matrix-Matrix Product

Each column is distributed for processing in turn.
  0: 1) (a, b, c)·(j, m, p) = α   2) (a, b, c)·(k, n, q) = β   3) (a, b, c)·(l, o, r) = γ
  1: 1) (d, e, f)·(j, m, p) = δ   2) (d, e, f)·(k, n, q) = ε   3) (d, e, f)·(l, o, r) = ζ
  2: 1) (g, h, i)·(j, m, p) = η   2) (g, h, i)·(k, n, q) = θ   3) (g, h, i)·(l, o, r) = ι
18
Parallel Computing
  • Example 3 Matrix-Matrix Product

The result is a matrix with the same parallel row structure as the first matrix and the same column structure as the second.
  0: (α, β, γ)
  1: (δ, ε, ζ)
  2: (η, θ, ι)
19
Parallel Computing
  • Example 3 Matrix-Matrix Product

The original entries could also have been sub-matrices, as long as they were compatible.
  0: (α, β, γ)
  1: (δ, ε, ζ)
  2: (η, θ, ι)
20
Parallel Computing
  • Example 4 Block Diagonal Product

Suppose the second matrix is block diagonal.
  0: (A, B, C)  (J, 0, 0)
  1: (D, E, F)  (0, K, 0)
  2: (G, H, I)  (0, 0, L)
21
Parallel Computing
  • Example 4 Block Diagonal Product

Much less information needs to be passed between the processors.
  0: 1) AJ = α   2) BK = β   3) CL = γ
  1: 1) DJ = δ   2) EK = ε   3) FL = ζ
  2: 1) GJ = η   2) HK = θ   3) IL = ι
22
Parallel Computing
  • When is it worth it to parallelize?

23
Parallel Computing
  • When is it worth it to parallelize?
  • There is a time cost associated with passing
    messages

24
Parallel Computing
  • When is it worth it to parallelize?
  • There is a time cost associated with passing
    messages
  • The amount of message passing is dependent on the
    problem and the program (algorithm)

25
Parallel Computing
  • When is it worth it to parallelize?
  • Therefore, the benefits depend more on the
    structure of the problem and the program than on
    the size/speed of the parallel network
    (diminishing returns).

26
Parallel Networks
  • How do I use multiple processors?

27
Parallel Networks
  • How do I use multiple processors?
  • This depends on the network, but
  • Most networks use some variation of PBS, a job
    scheduler, and mpirun or mpiexec.

28
Parallel Networks
  • How do I use multiple processors?
  • This depends on the network, but
  • Most networks use some variation of PBS, a job
    scheduler, and mpirun or mpiexec.
  • A parallel program needs to be submitted as a
    batch job.

29
Parallel Networks
  • Suppose I have a program myprog, which gets data
    from data.dat, and which I call in the following
    fashion when using only one processor:
  • ./myprog -f data.dat
  • I would write a file myprog.pbs that looks like
    the following:

30
Parallel Networks
  • #PBS -q compute (name of the processing queue;
    not necessary on all networks)
  • #PBS -N myprog (the name of the job)
  • #PBS -l nodes=2:ppn=1,walltime=00:10:00 (number
    of nodes and number of processes per node,
    maximum time to allow the program to run)
  • #PBS -o /home/me/mydir/myprog.out (where the
    output of the program should be written)
  • #PBS -e /home/me/mydir/myprog.err (where the
    error stream should be written)
  • These are the headers that tell the job scheduler
    how to handle your job.

31
Parallel Networks
  • Although what follows depends on the MPI software
    that the network runs, it should look something
    like this:
  • cd $PBS_O_WORKDIR (makes the processors run the
    program in the directory where myprog.pbs is
    saved)
  • mpirun -machinefile $PBS_NODEFILE -np 2 myprog
    -f mydata.dat (tells the MPI software which
    processors to use and how many processes to start;
    notice that command line arguments follow as
    usual)
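  • Putting the two previous slides together, a complete
    myprog.pbs might look like the sketch below (exact
    queue names, paths, and mpirun flags vary with the
    local PBS and MPI installation):

      #!/bin/bash
      #PBS -q compute
      #PBS -N myprog
      #PBS -l nodes=2:ppn=1,walltime=00:10:00
      #PBS -o /home/me/mydir/myprog.out
      #PBS -e /home/me/mydir/myprog.err

      # Run from the directory where the job was submitted.
      cd $PBS_O_WORKDIR
      # Start 2 MPI processes on the nodes PBS assigned to this job.
      mpirun -machinefile $PBS_NODEFILE -np 2 myprog -f mydata.dat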

32
Parallel Networks
  • Once the .pbs file is written, it can be
    submitted to the job scheduler with qsub
  • qsub myprog.pbs

33
Parallel Networks
  • Once the .pbs file is written, it can be
    submitted to the job scheduler with qsub
  • qsub myprog.pbs
  • You can check to see if your job is running with
    the command qstat.

34
Parallel Networks
  • Some systems (but not all) will allow you to
    simulate running your program in parallel on one
    processor, which is useful for debugging:
  • mpirun -np 3 myprog -f mydata.dat

35
Parallel Networks
  • What parallel systems are available?

36
Parallel Networks
  • What parallel systems are available?
  • RTC: Rice Terascale Cluster, 244 processors.

37
Parallel Networks
  • What parallel systems are available?
  • RTC: Rice Terascale Cluster, 244 processors.
  • ADA: Cray XD1, 632 processors.

38
Parallel Networks
  • What parallel systems are available?
  • RTC: Rice Terascale Cluster, 244 processors.
  • ADA: Cray XD1, 632 processors.
  • caamster: CAAM department exclusive, 8(?)
    processors.

39
PETSc
  • What do I use PETSc for?

40
PETSc
  • What do I use PETSc for?
  • File I/O with minimal understanding of MPI

41
PETSc
  • What do I use PETSc for?
  • File I/O with minimal understanding of MPI
  • Vector and matrix based data management (in
    particular sparse)

42
PETSc
  • What do I use PETSc for?
  • File I/O with minimal understanding of MPI
  • Vector and matrix based data management (in
    particular sparse)
  • Linear algebra routines familiar from the famous
    serial packages

43
PETSc
  • At the moment, ada and caamster (and harvey) have
    PETSc installed

44
PETSc
  • At the moment, ada and caamster (and harvey) have
    PETSc installed
  • You can download and install PETSc on your own
    machine (requires Cygwin for Windows) for
    educational and debugging purposes

45
PETSc
  • PETSc builds on existing software (BLAS and
    LAPACK); which implementations to use can be
    specified at configuration

46
PETSc
  • PETSc builds on existing software (BLAS and
    LAPACK); which implementations to use can be
    specified at configuration
  • Has a (slower) debugging configuration and a
    (faster, tacit) optimized configuration

47
PETSc
  • Installation comes with documentation, examples,
    and manual pages.

48
PETSc
  • Installation comes with documentation, examples,
    and manual pages.
  • The biggest part of learning how to use PETSc is
    learning how to use the manual pages.

49
PETSc
  • It is extremely useful to have an environment
    variable PETSC_DIR in your shell of choice, which
    gives the path to the installation of PETSc, e.g.
  • PETSC_DIR=/usr/local/src/petsc-2.3.1-p13/
  • export PETSC_DIR

50
PETSc
  • Makefile

51
PETSc
  • Makefile
  • You can pretty much copy/paste/modify the
    makefiles in the examples, but here is the basic
    setup

52
PETSc
  • Makefile
  • (…) (Other definitions for CFLAGS, etc.)
  • LOCDIR = /mydir
  • include ${PETSC_DIR}/bmake/common/base
  • (This is why it is useful to have this variable
    saved)
  • myprog: myprog.o chkopts
  •   -${CLINKER} -o myprog myprog.o ${PETSC_LIB}
  •   ${RM} myprog.o
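  • Assembled as an actual file, the makefile above might
    look like the following sketch (paths follow the
    PETSc 2.3.x layout named on these slides and may
    differ in other versions; recipe lines must begin
    with a tab):

      # Sketch of a minimal PETSc makefile (2.3.x-era layout).
      CFLAGS   =
      FFLAGS   =
      LOCDIR   = /mydir

      include ${PETSC_DIR}/bmake/common/base

      myprog: myprog.o chkopts
      	-${CLINKER} -o myprog myprog.o ${PETSC_LIB}
      	${RM} myprog.o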

53
PETSc
  • Headers

54
PETSc
  • Headers
  • #include "petsc.h" in all files, unless the
    routines that you use need more specific headers.

55
PETSc
  • Headers
  • #include "petsc.h" in all files, unless the
    routines that you use need more specific headers.
  • How do you know? Consult the manual pages!

56
PETSc
  • Data Types

57
PETSc
  • Data Types
  • PETSc has a slew of its own data types: PetscInt,
    PetscReal, PetscScalar, etc.

58
PETSc
  • Data Types
  • PETSc has a slew of its own data types: PetscInt,
    PetscReal, PetscScalar, etc.
  • Usually aliases of normal data types: PetscInt =
    int, PetscReal = double

59
PETSc
  • Data Types
  • PETSc has a slew of its own data types: PetscInt,
    PetscReal, PetscScalar, etc.
  • Usually aliases of normal data types: PetscInt =
    int, PetscReal = double
  • Safer to use for compatibility

60
PETSc
  • Usage in C/C++

61
PETSc
  • Usage in C/C++
  • The top-level program should begin:
  • static char help[] = "Your message here.";
  • int main(int argc, char **argv)
  • (… declarations …)
  • PetscInitialize(&argc,&argv,PETSC_NULL,help);

62
PETSc
  • Usage in C/C++
  • The top-level program should end:
  • (…)
  • PetscFinalize();
  • return 0;

63
PETSc
  • Usage in C/C++
  • When first programming, include the following
    variable:
  • PetscErrorCode ierr;
  • Where you'd call a PETSc routine,
  • Routine(arg);
  • write instead
  • ierr = Routine(arg); CHKERRQ(ierr);
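  • Putting slides 61-63 together, a minimal skeleton
    might look like the sketch below (the help text is a
    placeholder, and "Routine" stands for whatever PETSc
    calls the program actually makes):

      #include "petsc.h"

      static char help[] = "Your message here.\n";

      int main(int argc, char **argv)
      {
        PetscErrorCode ierr;

        ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help); CHKERRQ(ierr);
        /* ... declarations, then PETSc calls, each written as */
        /* ierr = Routine(arg); CHKERRQ(ierr);                 */
        ierr = PetscFinalize(); CHKERRQ(ierr);
        return 0;
      }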

64
PETSc
  • Usage in C/C
  • When you try to run your program, you will be
    informed of any problems with incompatible data
    types/dimensions/etc.

65
PETSc
  • Data
  • Any data type larger than a scalar has a Create
    and a Destroy routine.

66
PETSc
  • Data
  • Any data type larger than a scalar has a Create
    and a Destroy routine.
  • If you run ./myprog -log_summary, you get the
    number created and destroyed for each data type,
    to find memory leaks.

67
PETSc
  • Example Vec

68
PETSc
  • Example Vec
  • Two types: global and local

69
PETSc
  • Example Vec
  • Two types: global and local
  • Dependent on function: do other processors need
    to see this data?

70
PETSc
  • Example Vec
  • Two types: global and local
  • Dependent on function: do other processors need
    to see this data?
  • Basic usage:
  • Vec X;
  • VecCreate(PETSC_COMM_WORLD/PETSC_COMM_SELF,&X);
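  • With the generic VecCreate, the vector's size must
    still be set before the vector can be used; a sketch
    using VecSetSizes and VecSetFromOptions (routines not
    shown on these slides, and N is an assumed global
    length declared elsewhere):

      Vec            X;
      PetscErrorCode ierr;

      ierr = VecCreate(PETSC_COMM_WORLD, &X); CHKERRQ(ierr);
      ierr = VecSetSizes(X, PETSC_DECIDE, N); CHKERRQ(ierr);  /* N = global length (assumed)    */
      ierr = VecSetFromOptions(X); CHKERRQ(ierr);             /* pick the type, e.g. seq or MPI */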

71
PETSc
  • Example Vec
  • Advanced usage

72
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)

73
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)

74
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)

75
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)

76
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)

77
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)
  • VecCreateMPIWithArray(PETSC_COMM_WORLD,n,N,vals,&X)

78
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)
  • VecCreateMPIWithArray(PETSC_COMM_WORLD,n,N,vals,&X)
  • VecLoad(instream,VECMPI,&X)

79
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)
  • VecCreateMPIWithArray(PETSC_COMM_WORLD,n,N,vals,&X)
  • VecLoad(instream,VECMPI,&X)
  • VecDuplicate(Y,&X)

80
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)
  • VecCreateMPIWithArray(PETSC_COMM_WORLD,n,N,vals,&X)
  • VecLoad(instream,VECMPI,&X)
  • VecDuplicate(Y,&X)
  • MatGetVecs(M,&X,PETSC_NULL)

81
PETSc
  • Example Vec
  • Advanced usage
  • VecCreateSeq(PETSC_COMM_SELF,n,&X)
  • VecCreateSeqWithArray(PETSC_COMM_SELF,n,vals,&X)
  • VecLoad(instream,VECSEQ,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,n,PETSC_DETERMINE,&X)
  • VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,N,&X)
  • VecCreateMPIWithArray(PETSC_COMM_WORLD,n,N,vals,&X)
  • VecLoad(instream,VECMPI,&X)
  • VecDuplicate(Y,&X)
  • MatGetVecs(M,&X,PETSC_NULL)
  • MatGetVecs(M,PETSC_NULL,&X)

82
PETSc
  • Example Vec
  • If not created with array or loaded from file,
    values still needed

83
PETSc
  • Example Vec
  • If not created with array or loaded from file,
    values still needed
  • To copy the values of another Vec, with the same
    parallel structure, use VecCopy(Y,X).

84
PETSc
  • Example Vec
  • If not created with array or loaded from file,
    values still needed
  • To copy the values of another Vec, with the same
    parallel structure, use VecCopy(Y,X).
  • To set all values to a single scalar value, use
    VecSet(X,alpha).

85
PETSc
  • Example Vec
  • There are routines for more complicated ways to
    set values

86
PETSc
  • Example Vec
  • There are other routines for more complicated
    ways to set values
  • PETSc guards the block of data where the actual
    values are stored very closely

87
PETSc
  • Example Vec
  • There are other routines for more complicated
    ways to set values
  • PETSc guards the block of data where the actual
    values are stored very closely
  • An assembly routine must be called after these
    other routines

88
PETSc
  • Example Vec
  • Other routines

89
PETSc
  • Example Vec
  • Other routines
  • VecSetValue

90
PETSc
  • Example Vec
  • Other routines
  • VecSetValue
  • VecSetValueLocal (different indexing used)

91
PETSc
  • Example Vec
  • Other routines
  • VecSetValue
  • VecSetValueLocal (different indexing used)
  • VecSetValues

92
PETSc
  • Example Vec
  • Other routines
  • VecSetValue
  • VecSetValueLocal (different indexing used)
  • VecSetValues
  • VecSetValuesLocal

93
PETSc
  • Example Vec
  • Other routines
  • VecSetValue
  • VecSetValueLocal (different indexing used)
  • VecSetValues
  • VecSetValuesLocal
  • VecSetValuesBlocked

94
PETSc
  • Example Vec
  • Other routines
  • VecSetValue
  • VecSetValueLocal (different indexing used)
  • VecSetValues
  • VecSetValuesLocal
  • VecSetValuesBlocked
  • VecSetValuesBlockedLocal
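  • A sketch of the set-then-assemble pattern these
    routines follow, using VecSetValues and the assembly
    routines mentioned above (the indices and values are
    made up for illustration):

      PetscInt    ix[2]   = {0, 3};      /* global indices to set (made up) */
      PetscScalar vals[2] = {1.0, 2.0};  /* values for those indices        */

      ierr = VecSetValues(X, 2, ix, vals, INSERT_VALUES); CHKERRQ(ierr);
      /* The assembly routines must be called before X is used. */
      ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
      ierr = VecAssemblyEnd(X); CHKERRQ(ierr);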

95
PETSc
  • Example Vec
  • Once a vector is assembled, there are routines
    for (almost) every function we could want from a
    vector: AXPY, dot product, absolute value,
    pointwise multiplication, etc.

96
PETSc
  • Example Vec
  • Once a vector is assembled, there are routines
    for (almost) every function we could want from a
    vector: AXPY, dot product, absolute value,
    pointwise multiplication, etc.
  • Call VecDestroy(X) to free its array when it
    isn't needed anymore.
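  • For example (a sketch using the PETSc 2.3-era
    calling sequences that these slides follow; newer
    versions pass &X to VecDestroy):

      PetscScalar dot;

      ierr = VecAXPY(Y, 2.0, X); CHKERRQ(ierr);  /* Y <- Y + 2.0*X             */
      ierr = VecDot(X, Y, &dot); CHKERRQ(ierr);  /* dot <- X . Y               */
      ierr = VecDestroy(X); CHKERRQ(ierr);       /* free X when no longer used */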

97
PETSc
  • Example Mat

98
PETSc
  • Example Mat
  • Like Vec, a Mat can be global or local (MPI/Seq)

99
PETSc
  • Example Mat
  • Like Vec, a Mat can be global or local (MPI/Seq)
  • A Mat can take on a large number of data
    structures to optimize * (multiplication) and \
    (solving), even though the same routine is used
    on all structures.

100
PETSc
  • Example Mat
  • Row compressed
  • Block row compressed
  • Symmetric block row compressed
  • Block diagonal
  • And even dense
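  • A sketch of creating a small matrix in the row
    compressed (AIJ) format listed above and applying it
    to a vector; the sizes and preallocation figure are
    made-up illustrations:

      Mat            A;
      PetscErrorCode ierr;
      PetscInt       i;
      PetscScalar    two = 2.0;

      /* 4x4 sequential AIJ (row compressed) matrix, ~3 nonzeros per row */
      ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 4, 4, 3, PETSC_NULL, &A); CHKERRQ(ierr);
      for (i = 0; i < 4; i++) {
        ierr = MatSetValues(A, 1, &i, 1, &i, &two, INSERT_VALUES); CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
      ierr = MatMult(A, X, Y); CHKERRQ(ierr);  /* Y <- A*X, with X and Y as before */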

101
PETSc
  • File I/O

102
PETSc
  • File I/O
  • The equivalent to a stream is a viewer.

103
PETSc
  • File I/O
  • PETSc has equivalent routines to printf, but you
    must decide if you want every node to print or
    just the control node

104
PETSc
  • File I/O
  • PETSc has equivalent routines to printf, but you
    must decide if you want every node to print or
    just the control node
  • To ensure clarity when multiple nodes print, use
    PetscSynchronizedPrintf followed by
    PetscSynchronizedFlush.
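  • A sketch of both styles (the one-argument form of
    PetscSynchronizedFlush matches PETSc versions of this
    era; newer versions also take an output-file
    argument):

      PetscMPIInt rank;
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

      /* Only the control node (rank 0) prints this. */
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Starting...\n"); CHKERRQ(ierr);

      /* Every node prints, in rank order, once the flush is called. */
      ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Node %d checking in\n", rank); CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD); CHKERRQ(ierr);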

105
PETSc
  • File I/O
  • The equivalent to a stream is a viewer, but a
    viewer organizes data across multiple processors.

106
PETSc
  • File I/O
  • The equivalent to a stream is a viewer, but a
    viewer organizes data across multiple processors.
  • A viewer combines an output location
    (file/stdout/stderr) with a format.

107
PETSc
  • File I/O
  • The equivalent to a stream is a viewer, but a
    viewer organizes data across multiple processors.
  • A viewer combines an output location
    (file/stdout/stderr) with a format.
  • Most data types have a View routine such as
    MatView(M,viewer)
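  • For instance, writing a matrix to a binary file
    (a sketch; the file name is made up, and the viewer
    calls follow the 2.3-era calling sequences):

      PetscViewer viewer;

      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mymatrix.dat",
                                   FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
      ierr = MatView(M, viewer); CHKERRQ(ierr);
      ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);

      /* For quick inspection, a predefined ASCII viewer prints to stdout. */
      ierr = MatView(M, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);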

108
PETSc
  • File I/O
  • On a batch server, ASCII I/O can be horrendously
    slow.

109
PETSc
  • File I/O
  • On a batch server, ASCII I/O can be horrendously
    slow.
  • PETSc only reads data into a parallel format if
    it is stored in binary form.

110
PETSc
  • File I/O
  • On a batch server, ASCII I/O can be horrendously
    slow.
  • PETSc only reads data into a parallel format if
    it is stored in binary form.
  • Lots of output data is likely; binary is more
    compressed than ASCII.

111
PETSc
  • I have ASCII input data. Solution?

112
PETSc
  • I have ASCII input data. Solution?
  • Write a wrapper program (sketched below)
  • Runs on one processor
  • Creates the data to be used in parallel, and
    views it to a binary input file
  • In parallel, it will be automatically distributed
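  • A sketch of such a wrapper, assuming real scalars and
    made-up file names and length; run on one processor,
    it reads plain-text values and views them to a binary
    file that a parallel run can later load:

      #include <stdio.h>
      #include "petsc.h"
      #include "petscvec.h"

      static char help[] = "Convert ASCII vector data to PETSc binary.\n";

      int main(int argc, char **argv)
      {
        PetscErrorCode ierr;
        PetscInt       i, n = 100;      /* assumed length of the data       */
        PetscScalar    val;
        Vec            X;
        PetscViewer    viewer;
        FILE           *fp = fopen("data.txt", "r");  /* hypothetical input */

        ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help); CHKERRQ(ierr);
        ierr = VecCreateSeq(PETSC_COMM_SELF, n, &X); CHKERRQ(ierr);
        for (i = 0; i < n; i++) {
          fscanf(fp, "%lf", &val);                    /* assumes real PetscScalar */
          ierr = VecSetValues(X, 1, &i, &val, INSERT_VALUES); CHKERRQ(ierr);
        }
        fclose(fp);
        ierr = VecAssemblyBegin(X); CHKERRQ(ierr);
        ierr = VecAssemblyEnd(X); CHKERRQ(ierr);

        /* View the vector to a binary file for the parallel run to load later. */
        ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF, "data.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
        ierr = VecView(X, viewer); CHKERRQ(ierr);
        ierr = PetscViewerDestroy(viewer); CHKERRQ(ierr);
        ierr = VecDestroy(X); CHKERRQ(ierr);
        ierr = PetscFinalize(); CHKERRQ(ierr);
        return 0;
      }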

113
PETSc
  • Next Time: Issues for Large Dynamical Systems
  • Time Stepping
  • Updating algebraically
  • Managing lots of similar equations
    (Scattering/Gathering)