1
ECE 1747H: Parallel Programming
  • Lecture 1-2: Overview

2
ECE 1747H
  • Meeting time: Mon 4-6 PM
  • Instructor: Cristiana Amza,
  • http://www.eecg.toronto.edu/amza
  • amza@eecg.toronto.edu, office: Pratt 484E

3
Material
  • Course notes
  • Web material (e.g., published papers)
  • No required textbook, some recommended

4
Prerequisites
  • Programming in C or C++
  • Data structures
  • Basics of machine architecture
  • Basics of network programming
  • Please send e-mail to ecehelp@ece.toronto.edu to
    get an eecg account!! (name, student ID, class,
    instructor)
  • E-mail madalin@cs.toronto.edu to get a cluster
    account (this is on our research cluster, for the
    purpose of the homework).

5
Other than that
  • No written homework, no exams
  • 10% for each small programming assignment
    (expect 1)
  • 10% for class participation
  • The rest comes from the major course project

6
Programming Project
  • Parallelizing a sequential program, or improving
    the performance or the functionality of a
    parallel program
  • Project proposal and final report
  • In-class project proposal and final report
    presentation
  • A sample project presentation can be posted

7
Parallelism (1 of 2)
  • Ability to execute different parts of a single
    program concurrently on different machines
  • Goal: shorter running time
  • Grain of parallelism: how big are the parts?
  • Can be instruction, statement, procedure, ...
  • Will mainly focus on relatively coarse grain

8
Parallelism (2 of 2)
  • Coarse-grain parallelism is mainly applicable to
    long-running scientific programs
  • Examples: weather prediction, prime number
    factorization, simulations, ...

9
Lecture material (1 of 4)
  • Parallelism
  • What is parallelism?
  • What can be parallelized?
  • Inhibitors of parallelism: dependences

10
Lecture material (2 of 4)
  • Standard models of parallelism
  • shared memory (Pthreads)
  • message passing (MPI)
  • shared memory data parallelism (OpenMP)
  • Classes of applications
  • scientific
  • servers

11
Lecture material (3 of 4)
  • Transaction processing
  • classic programming model for databases
  • now being proposed for scientific programs

12
Lecture material (4 of 4)
  • Performance of parallel/distributed programs
  • architecture-independent optimization
  • architecture-dependent optimization

13
Course Organization
  • First 2-3 weeks of semester
  • lectures on parallelism, patterns, models
  • small programming assignment done individually or
    in teams of up to 3
  • Rest of the semester
  • major programming project, done individually or
    in small groups
  • Research paper discussions

14
Parallel vs. Distributed Programming
  • Parallel programming has matured:
  • A few standard programming models
  • A few common machine architectures
  • Portability between models and architectures

15
Bottom Line
  • The programmer can now focus on the program and
    use a suitable programming model
  • Reasonable hope of portability
  • Problem: much performance optimization is still
    platform-dependent
  • Performance portability is a problem

16
ECE 1747H: Parallel Programming
  • Lecture 1-2: Parallelism, Dependences

17
Parallelism
  • Ability to execute different parts of a program
    concurrently on different machines
  • Goal: shorten execution time

18
Measures of Performance
  • To computer scientists: speedup, execution time.
  • To applications people: size of problem, accuracy
    of solution, etc.

19
Speedup of Algorithm
  • Speedup of algorithm = sequential execution time
    / execution time on p processors (with the same
    data set).
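For example (numbers assumed for illustration): if the sequential program runs in 100 seconds and the parallel version runs in 30 seconds on 4 processors, the speedup is 100 / 30, about 3.3.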

[Figure: speedup as a function of the number of processors p]
20
Speedup on Problem
  • Speedup on problem = sequential execution time of
    best known sequential algorithm / execution time
    on p processors.
  • A more honest measure of performance.
  • Avoids picking an easily parallelizable algorithm
    with poor sequential execution time.

21
What Speedups Can You Get?
  • Linear speedup
  • Confusing term: implicitly means a 1-to-1 speedup
    per processor.
  • (almost always) as good as you can do.
  • Sub-linear speedup: more normal, due to overhead
    of startup, synchronization, communication, etc.

22
Speedup
[Figure: speedup vs. number of processors p, with a linear speedup line and a lower actual speedup curve]
23
Scalability
  • No really precise definition.
  • Roughly speaking, a program is said to scale to a
    certain number of processors p, if going from p-1
    to p processors results in some acceptable
    improvement in speedup (for instance, an increase
    of 0.5).

24
Super-linear Speedup?
  • Due to cache/memory effects:
  • Subparts fit into cache/memory of each node.
  • Whole problem does not fit in cache/memory of a
    single node.
  • Nondeterminism in search problems.
  • One thread finds a near-optimal solution very
    quickly => leads to drastic pruning of the search
    space.

25
Cardinal Performance Rule
  • Don't leave (too) much of your code sequential!

26
Amdahl's Law
  • If 1/s of the program is sequential, then you can
    never get a speedup better than s.
  • (Normalized) sequential execution time = 1/s +
    (1 - 1/s) = 1
  • Best parallel execution time on p processors =
    1/s + (1 - 1/s)/p
  • When p goes to infinity, parallel execution time
    = 1/s
  • Speedup = s.
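A worked example (numbers assumed for illustration): if 10% of the program is sequential, then s = 10. On p = 10 processors the normalized execution time is 0.1 + 0.9/10 = 0.19, giving a speedup of 1/0.19, about 5.3. No number of processors can raise the speedup above s = 10.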

27
Why keep something sequential?
  • Some parts of the program are not parallelizable
    (because of dependences)
  • Some parts may be parallelizable, but the
    overhead dwarfs the increased speedup.

28
When can two statements execute in parallel?
  • On one processor:
  • statement1; statement2;
  • On two processors:
  • processor1 runs statement1 while processor2 runs
    statement2

29
Fundamental Assumption
  • Processors execute independently: no control over
    order of execution between processors

30
When can 2 statements execute in parallel?
  • Possibility 1:
  • processor1 runs statement1 first, then processor2
    runs statement2
  • Possibility 2:
  • processor2 runs statement2 first, then processor1
    runs statement1
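A minimal pthreads sketch of this nondeterminism (the thread functions and printed strings are illustrative, not from the slides); across runs, either output order can appear:

    #include <pthread.h>
    #include <stdio.h>

    /* stand-ins for statement1 and statement2 */
    void *statement1(void *arg) { puts("statement1"); return NULL; }
    void *statement2(void *arg) { puts("statement2"); return NULL; }

    int main(void) {
        pthread_t t1, t2;
        /* nothing constrains which thread runs first */
        pthread_create(&t1, NULL, statement1, NULL);
        pthread_create(&t2, NULL, statement2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }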

31
When can 2 statements execute in parallel?
  • Their order of execution must not matter!
  • In other words,
  • statement1; statement2
  • must be equivalent to
  • statement2; statement1

32
Example 1
  • a = 1;
  • b = a;
  • Statements cannot be executed in parallel
  • Program modifications may make it possible (see
    the sketch below).
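One such modification, sketched here (not from the slides): substituting the known constant removes the true dependence.

    /* original: b = a reads the value written by a = 1 */
    a = 1;
    b = a;

    /* transformed: no dependence, statements may run in parallel */
    a = 1;
    b = 1;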

33
Example 2
  • a = f(x);
  • b = a;
  • May not be wise to change the program (sequential
    execution would take longer).

34
Example 3
  • a = 1;
  • a = 2;
  • Statements cannot be executed in parallel.

35
True dependence
  • Statements S1, S2
  • S2 has a true dependence on S1
  • iff
  • S2 reads a value written by S1

36
Anti-dependence
  • Statements S1, S2.
  • S2 has an anti-dependence on S1
  • iff
  • S2 writes a value read by S1.

37
Output Dependence
  • Statements S1, S2.
  • S2 has an output dependence on S1
  • iff
  • S2 writes a variable written by S1.
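Minimal illustrations of the three kinds of dependences just defined (variable names assumed):

    a = 1;    /* S1 */
    b = a;    /* S2: true dependence, S2 reads what S1 wrote    */

    c = b;    /* S1 */
    b = 2;    /* S2: anti-dependence, S2 writes what S1 read    */

    d = 3;    /* S1 */
    d = 4;    /* S2: output dependence, S2 writes what S1 wrote */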

38
When can 2 statements execute in parallel?
  • S1 and S2 can execute in parallel
  • iff
  • there are no dependences between S1 and S2
  • true dependences
  • anti-dependences
  • output dependences
  • Some dependences can be removed (e.g., by
    renaming, sketched below).
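Anti- and output dependences are on names rather than values, so renaming often removes them; a sketch (variable names assumed):

    /* before: anti-dependence, S2 writes the b that S1 reads */
    c = b;
    b = 2;

    /* after: S2 writes a fresh name; later uses read b2 */
    c = b;
    b2 = 2;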

39
Example 4
  • Most parallelism occurs in loops.
  • for (i = 0; i < 100; i++)
  •   a[i] = i;
  • No dependences.
  • Iterations can be executed in parallel (see the
    sketch below).
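A minimal sketch of one way to do so, using OpenMP (one of the models covered later); the array declaration and compile flag (-fopenmp for gcc) are assumptions, not from the slide:

    int a[100];

    void init(void) {
        /* each iteration writes a different a[i], so OpenMP can
           divide the iterations among threads safely */
        #pragma omp parallel for
        for (int i = 0; i < 100; i++)
            a[i] = i;
    }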

40
Example 5
  • for (i = 0; i < 100; i++) {
  •   a[i] = i;
  •   b[i] = 2*i; }
  • Iterations and statements can be executed in
    parallel.

41
Example 6
  • for (i = 0; i < 100; i++) a[i] = i;
  • for (i = 0; i < 100; i++) b[i] = 2*i;
  • Iterations and loops can be executed in parallel.

42
Example 7
  • for (i = 0; i < 100; i++)
  •   a[i] = a[i] + 100;
  • There is a dependence on itself!
  • Loop is still parallelizable.

43
Example 8
  • for (i = 0; i < 100; i++)
  •   a[i] = f(a[i-1]);
  • Dependence between a[i] and a[i-1].
  • Loop iterations are not parallelizable.

44
Loop-carried dependence
  • A loop carried dependence is a dependence that is
    present only if the statements are part of the
    execution of a loop.
  • Otherwise, we call it a loop-independent
    dependence.
  • Loop-carried dependences prevent loop iteration
    parallelization.

45
Example 9
  • for (i = 0; i < 100; i++)
  •   for (j = 0; j < 100; j++)
  •     a[i][j] = f(a[i][j-1]);
  • Loop-independent dependence on i.
  • Loop-carried dependence on j.
  • Outer loop can be parallelized, inner loop cannot
    (see the sketch below).
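A sketch of the parallelization, assuming OpenMP, a stand-in for f, and j starting at 1 so that a[i][j-1] stays in bounds:

    double a[100][100];
    double f(double x) { return 0.5 * x; }   /* stand-in for f */

    void compute(void) {
        /* the i loop carries no dependence, so it is parallelized;
           the j loop stays sequential (loop-carried dependence) */
        #pragma omp parallel for
        for (int i = 0; i < 100; i++)
            for (int j = 1; j < 100; j++)
                a[i][j] = f(a[i][j-1]);
    }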

46
Example 10
  • for (j = 0; j < 100; j++)
  •   for (i = 0; i < 100; i++)
  •     a[i][j] = f(a[i][j-1]);
  • Inner loop can be parallelized, outer loop
    cannot.
  • Less desirable situation.
  • Loop interchange is sometimes possible (sketched
    below).
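A sketch of the interchange: moving the i loop outermost recovers the form of Example 9, whose outer loop can then be parallelized as above.

    /* after interchange: i is outermost and carries no
       dependence; j still carries the dependence */
    for (int i = 0; i < 100; i++)
        for (int j = 1; j < 100; j++)
            a[i][j] = f(a[i][j-1]);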

47
Level of loop-carried dependence
  • Is the nesting depth of the loop that carries the
    dependence.
  • Indicates which loops can be parallelized.

48
Be careful Example 11
  • printf(a);
  • printf(b);
  • Statements have a hidden output dependence due to
    the output stream.

49
Be careful Example 12
  • a = f(x);
  • b = g(x);
  • Statements could have a hidden dependence if f
    and g update the same variable.
  • Also depends on what f and g can do to x.

50
Be careful Example 13
  • for (i = 0; i < 100; i++)
  •   a[i+10] = f(a[i]);
  • Dependence between a[10], a[20], ...
  • Dependence between a[11], a[21], ...
  • Some parallel execution is possible (see the
    sketch below).
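The iterations fall into 10 independent chains (indices equal modulo 10), so one sketch of a partial parallelization, assuming OpenMP and a stand-in for f, runs the chains concurrently while keeping each chain in order:

    double a[110];                        /* a[i+10] reaches a[109] */
    double f(double x) { return x + 1; }  /* stand-in for f */

    void compute(void) {
        /* chain c: a[c] -> a[c+10] -> ...; chains are independent
           of each other, so they may run on different processors */
        #pragma omp parallel for
        for (int c = 0; c < 10; c++)
            for (int i = c; i < 100; i += 10)
                a[i+10] = f(a[i]);
    }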

51
Be careful Example 14
  • for (i = 1; i < 100; i++) {
  •   a[i] = ...;
  •   ... = a[i-1]; }
  • Dependence between a[i] and a[i-1]
  • Complete parallel execution impossible
  • Pipelined parallel execution possible

52
Be careful Example 15
  • for (i = 0; i < 100; i++)
  •   a[i] = f(a[indexa[i]]);
  • Cannot tell for sure.
  • Parallelization depends on user knowledge of
    values in indexa.
  • User can tell, compiler cannot.

53
Optimizations Example 16
  • for (i = 0; i < 100000; i++)
  •   a[i % 1000] = a[i % 1000] + 1;

Cannot be parallelized as is. May be parallelized
by applying certain code transformations, as
sketched below.
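A sketch of one such transformation, assuming the body reads a[i % 1000] = a[i % 1000] + 1 as reconstructed above: each of the 1000 elements is incremented 100000 / 1000 = 100 times, so the conflicting updates collapse into one independent update per element (shown with OpenMP):

    int a[1000];

    void bump(void) {
        /* no two iterations touch the same element now */
        #pragma omp parallel for
        for (int j = 0; j < 1000; j++)
            a[j] = a[j] + 100;
    }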
54
An aside
  • Parallelizing compilers analyze program
    dependences to decide parallelization.
  • In parallelization by hand, user does the same
    analysis.
  • Compiler: more convenient and more correct
  • User: more powerful, can analyze more patterns.

55
To remember
  • Statement order must not matter.
  • Statements must not have dependences.
  • Some dependences can be removed.
  • Some dependences may not be obvious.