Parallel distributed computing techniques

- Instructor: Phạm Trần Vũ
- Students: Lê Trọng Tín, Mai Văn Ninh, Phùng Quang Chánh, Nguyễn Đức Cảnh, Đặng Trung Tín

Motivation of Parallel Computing Techniques

- Demand for computational speed
  - There is a continual demand for greater computational speed from a computer system than is currently possible.
  - Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems.
  - Computations must be completed within a reasonable time period.

Message-Passing Computing

- Basics of message-passing programming using user-level message-passing libraries
- Two primary mechanisms are needed:
  - A method of creating separate processes for execution on different computers
  - A method of sending and receiving messages

Message-Passing Computing

- Static process creation (the basic MPI way)
- [Figure: each source file is compiled to suit its processor, producing the executables for Processor 0 through Processor n-1.]

Message-Passing Computing

- Dynamic process creation (the PVM way)
- [Figure: timeline in which the process on Processor 1 calls spawn(), which starts execution of process 2 on Processor 2.]

Message-Passing Computing

Method of sending and receiving messages?

Pipelined Computation

- The problem is divided into a series of tasks that have to be completed one after the other (the basis of sequential programming).
- Each task is executed by a separate process or processor.

Pipelined Computation

- Pipelining can be used to good effect in three cases:
  - (1) If more than one instance of the complete problem is to be executed
  - (2) If a series of data items must be processed, each requiring multiple operations
  - (3) If information to start the next process can be passed forward before the process has completed all its internal operations

Pipelined Computation

- Execution time = m + p - 1 cycles for a p-stage pipeline and m instances
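The cycle count above can be checked with a small simulation. This is only a sketch under one assumption not stated on the slide: every stage takes exactly one cycle and a new instance enters on every cycle. The function names are illustrative.

```python
def pipeline_cycles(p, m):
    """Closed form from the slide: m instances through a p-stage pipeline."""
    return m + p - 1


def simulate_pipeline(p, m):
    """Count cycles by stepping the pipeline until all m instances have left it."""
    stages = [None] * p   # stages[k] is the instance currently in stage k
    next_instance = 0
    finished = 0
    cycles = 0
    while finished < m:
        # A new instance enters stage 0 if any remain; everything else moves on.
        entering = next_instance if next_instance < m else None
        if entering is not None:
            next_instance += 1
        stages = [entering] + stages[:-1]
        cycles += 1
        # Whatever occupies the last stage completes during this cycle.
        if stages[-1] is not None:
            finished += 1
    return cycles
```

For m >= 1 the simulated count agrees with the closed form, which shows where the extra p - 1 cycles come from: they are the cycles needed to fill the pipeline.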

Ideal Parallel Computation

- A computation that can obviously be divided into a number of completely independent parts
- Each part can be executed by a separate processor
- Each process can do its tasks without any interaction with other processes

Ideal Parallel Computation

- Practical embarrassingly parallel computation with static process creation and the master-slave approach

Ideal Parallel Computation

- Practical embarrassingly parallel computation with dynamic process creation and the master-slave approach

Embarrassingly parallel examples

- Geometrical Transformations of Images
- Mandelbrot set
- Monte Carlo Method

Geometrical Transformations of Images

- Operate on the coordinates of each pixel to move the position of the pixel without affecting its value
- The transformation on each pixel is totally independent of the other pixels
- Some geometrical operations:
  - Shifting
  - Scaling
  - Rotation
  - Clipping

Geometrical Transformations of Images

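- Partitioning into regions for individual processes
- Square region for each process, or row region for each process
- [Figure: a 640 x 480 image partitioned either into 80 x 80 square regions or into row regions 640 pixels wide and 10 rows high.]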
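As a rough illustration of row partitioning, the sketch below shifts an image by (dx, dy), with each row region handled by a separate call. It assumes the image is a list of rows of pixel values; the function names and the sequential loop standing in for the processes are illustrative, not taken from the slides.

```python
def shift_rows(image, row_start, row_end, dx, dy, out):
    """Work of one process: shift the pixels in rows [row_start, row_end)."""
    height, width = len(image), len(image[0])
    for y in range(row_start, row_end):
        for x in range(width):
            nx, ny = x + dx, y + dy
            # Pixels moved outside the image are dropped.
            if 0 <= nx < width and 0 <= ny < height:
                out[ny][nx] = image[y][x]


def shift_image(image, dx, dy, n_parts):
    """Partition the image into n_parts row regions and shift each one."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    rows_per_part = (height + n_parts - 1) // n_parts
    # Each loop iteration is independent, so it could run in its own process.
    for part in range(n_parts):
        start = part * rows_per_part
        end = min(start + rows_per_part, height)
        shift_rows(image, start, end, dx, dy, out)
    return out
```

Because no pixel's new position depends on any other pixel, the regions never need to communicate, which is what makes the problem embarrassingly parallel.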

Mandelbrot Set

- Set of points in a complex plane that are quasi-stable when computed by iterating the function $z_{k+1} = z_k^2 + c$
- where $z_{k+1}$ is the $(k+1)$th iteration of the complex number $z = a + bi$ and $c$ is a complex number giving the position of the point in the complex plane. The initial value for $z$ is zero.
- Iterations continue until the magnitude of $z$ is greater than 2 or the number of iterations reaches an arbitrary limit. The magnitude of $z$ is the length of the vector, given by $z_{\text{length}} = \sqrt{a^2 + b^2}$

Mandelbrot Set

- Scaling the display coordinates (x, y) to the complex plane:
  - c.real = real_min + x * (real_max - real_min)/disp_width
  - c.imag = imag_min + y * (imag_max - imag_min)/disp_height
- Static Task Assignment
  - Simply divide the region into a fixed number of parts, each computed by a separate processor
  - Not very successful because different regions require different numbers of iterations and time
- Dynamic Task Assignment
  - Have processors request regions after computing previous regions
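A minimal sketch of the per-pixel computation and the coordinate scaling described here. The names `cal_pixel`, `pixel_to_complex` and `compute_row`, the default iteration limit of 256, and the default display bounds are assumptions chosen for illustration.

```python
def cal_pixel(c, max_iter=256):
    """Iterate z = z*z + c from z = 0; return the number of iterations done."""
    z = 0j
    count = 0
    # Stop once the magnitude of z exceeds 2 or the limit is reached.
    while abs(z) <= 2.0 and count < max_iter:
        z = z * z + c
        count += 1
    return count


def pixel_to_complex(x, y, real_min, real_max, imag_min, imag_max,
                     disp_width, disp_height):
    """Scale display coordinates (x, y) to a point c in the complex plane."""
    real = real_min + x * (real_max - real_min) / disp_width
    imag = imag_min + y * (imag_max - imag_min) / disp_height
    return complex(real, imag)


def compute_row(y, disp_width, disp_height,
                bounds=(-2.0, 2.0, -2.0, 2.0), max_iter=256):
    """One unit of work: the iteration counts for a single display row."""
    real_min, real_max, imag_min, imag_max = bounds
    return [
        cal_pixel(pixel_to_complex(x, y, real_min, real_max, imag_min,
                                   imag_max, disp_width, disp_height), max_iter)
        for x in range(disp_width)
    ]
```

With dynamic task assignment, a row such as the one `compute_row` produces is a natural unit for a processor to request, since rows near the set take far more iterations than rows far from it.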

Monte Carlo Method

- Another embarrassingly parallel computation
- Monte Carlo methods make use of random selections
- Example: to calculate π
- A circle of unit radius is formed within a square, so that the square has sides 2 x 2. The ratio of the area of the circle to the area of the square is given by $\frac{\pi (1)^2}{2 \times 2} = \frac{\pi}{4}$

Monte Carlo Method

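- One quadrant of the construction can be described by the integral $\int_0^1 \sqrt{1 - x^2}\,dx = \frac{\pi}{4}$
- Random pairs of numbers $(x_r, y_r)$ are generated, each between 0 and 1. A pair is counted as in the circle if $y_r \le \sqrt{1 - x_r^2}$, that is, $y_r^2 + x_r^2 \le 1$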
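The quadrant test can be sketched as follows. The split into independent parts, each with its own seed, is an assumption made to show why the computation is embarrassingly parallel; the function names are illustrative.

```python
import random


def count_inside(n_samples, seed):
    """Work of one process: count random points that fall inside the quadrant."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        xr, yr = rng.random(), rng.random()
        if xr * xr + yr * yr <= 1.0:
            inside += 1
    return inside


def estimate_pi(n_parts, samples_per_part):
    """Combine the independent partial counts into an estimate of pi."""
    # Each part uses a different seed (an assumption) so the streams differ.
    total_inside = sum(count_inside(samples_per_part, seed)
                       for seed in range(n_parts))
    return 4.0 * total_inside / (n_parts * samples_per_part)
```

The fraction of points inside the quadrant approximates π/4, so multiplying by 4 gives the estimate; the accuracy improves only slowly with the number of samples.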

Monte Carlo Method

- Alternative method to compute an integral
- Use random values of x to compute f(x) and sum the values of f(x): $\text{area} = \int_{x_1}^{x_2} f(x)\,dx = \lim_{N \to \infty} \frac{1}{N} \sum_{r=1}^{N} f(x_r)\,(x_2 - x_1)$
- where $x_r$ are randomly generated values of x between $x_1$ and $x_2$
- The Monte Carlo method is very useful if the function cannot be integrated numerically (perhaps because it has a large number of variables)

Monte Carlo Method

- Example: computing an integral (the integrand and the sequential code were images on the original slide)
- Sequential code
- Routine randv(x1, x2) returns a pseudorandom number between x1 and x2
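Since the slide's own sequential code did not survive, here is a hedged sketch of what such a routine could look like. The integrand `f` is supplied by the caller, and the extra `rng` argument to `randv` is an assumption added to make the example reproducible.

```python
import random


def randv(x1, x2, rng):
    """Return a pseudorandom number between x1 and x2."""
    return x1 + (x2 - x1) * rng.random()


def mc_integrate(f, x1, x2, n, seed=0):
    """Estimate the integral of f over [x1, x2] from n random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += f(randv(x1, x2, rng))
    # Mean value of f times the width of the interval.
    return total / n * (x2 - x1)
```

For example, `mc_integrate(lambda x: x, 0.0, 2.0, 100000)` should come out close to 2, the exact value of that integral.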

Monte Carlo Method

- Parallel Monte Carlo integration
- [Figure: the slaves return partial sums to the master, and obtain random numbers on request from a separate random-number process.]

Partitioning and Divide-and-Conquer Strategies

Partitioning

- Partitioning simply divides the problem into parts.
- It is the basis of all parallel programming.
- Partitioning can be applied to the program data (data partitioning or domain decomposition) and to the functions of a program (functional decomposition).
- It is much less common to find concurrent functions in a problem, but data partitioning is a main strategy for parallel programming.

Partitioning (cont)

A sequence of numbers, x0, ..., xn-1, is to be added.

n = number of items, p = number of processors

[Figure: partitioning a sequence of numbers into parts and adding them.]
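A minimal sketch of this partitioned addition, assuming the n numbers are split into p contiguous parts of nearly equal size. The comprehension stands in for the p processors, and the function names are illustrative.

```python
def partition(numbers, p):
    """Split the sequence into p contiguous parts of at most ceil(n/p) items."""
    n = len(numbers)
    size = (n + p - 1) // p
    return [numbers[k * size:(k + 1) * size] for k in range(p)]


def partitioned_sum(numbers, p):
    """Each part is summed independently, then the partial sums are added."""
    partial_sums = [sum(part) for part in partition(numbers, p)]
    return sum(partial_sums)
```

Only the final addition of the p partial sums requires communication between the parts.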

Divide and Conquer

- Characterized by dividing a problem into subproblems of the same form as the larger problem. Further divisions into still smaller subproblems are usually done by recursion.
- Recursive divide and conquer is amenable to parallelization because separate processes can be used for the divided parts. Also, the data is usually naturally localized.

Divide and Conquer (cont)

- A sequential recursive definition for adding a list of numbers is

    int add(int *s)                       /* add list of numbers, s */
    {
        /* n1 and n2 are the numbers left in s; a missing one counts as 0 */
        if (number(s) <= 2) return (n1 + n2);
        else {
            Divide(s, s1, s2);            /* divide s into two parts, s1 and s2 */
            part_sum1 = add(s1);          /* recursive calls to add sublists */
            part_sum2 = add(s2);
            return (part_sum1 + part_sum2);
        }
    }
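A runnable version of the same recursion might look like the sketch below. Splitting at the midpoint is an assumption; the definition above only requires that the list be divided into two parts.

```python
def add(s):
    """Recursively add a list of numbers by divide and conquer."""
    # Base case: two or fewer numbers are added directly (an empty list gives 0).
    if len(s) <= 2:
        return sum(s)
    mid = len(s) // 2
    s1, s2 = s[:mid], s[mid:]   # divide s into two parts
    part_sum1 = add(s1)         # the two recursive calls are independent,
    part_sum2 = add(s2)         # so each could run in a separate process
    return part_sum1 + part_sum2
```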

Divide and Conquer (cont)

[Figure: tree construction, from the initial problem through repeated division of the problem down to the final tasks.]

www.cse.hcmut.edu.vn

Divide and Conquer (cont)

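[Figure: dividing the original list x0 ... xn-1 among processes. P0 holds the initial problem; after the first division P0 and P4 hold parts; after the second, P0, P2, P4 and P6; the final tasks are performed by P0 through P7.]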

Partitioning/Divide and Conquer Examples

- Many possibilities.
- Operations on sequences of numbers, such as simply adding them together
- Several sorting algorithms can often be partitioned or constructed in a recursive fashion
- Numerical integration
- N-body problem

Bucket sort

- One bucket is assigned to hold the numbers that fall within each region.
- The numbers in each bucket are sorted using a sequential sorting algorithm.
- Sequential sorting time complexity: O(n log(n/m)), where n = number of items and m = number of buckets.
- Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
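A short sketch of sequential bucket sort under the slide's assumption that the numbers are integers in the interval 0 to a - 1. Using Python's built-in `sorted` as the per-bucket sequential sort is an illustrative choice.

```python
def bucket_sort(numbers, m, a):
    """Sort integers in the range 0 .. a-1 using m buckets."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        # Bucket k holds the numbers in the k-th of m equal regions of [0, a).
        buckets[x * m // a].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))   # sequential sort within each bucket
    return result
```

Concatenating the sorted buckets in order gives the sorted list because every number in bucket k is smaller than every number in bucket k + 1.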

Parallel version of bucket sort

- Simple approach
- Assign one processor for each bucket.

Further Parallelization

- Partition the sequence into p regions, one region for each of the p processors.
- Each processor maintains p small buckets and separates the numbers in its region into its own small buckets.
- The small buckets are then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i).
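The three phases can be simulated sequentially as below. This is only a sketch: the nested lists stand in for the p processors and their messages, the numbers are assumed to be integers in 0 to a - 1, and the function name is illustrative.

```python
def parallel_bucket_sort(numbers, p, a):
    """Simulate the small-bucket scheme with p processors on integers in 0..a-1."""
    n = len(numbers)
    size = (n + p - 1) // p
    regions = [numbers[k * size:(k + 1) * size] for k in range(p)]

    # Phase 1: each processor separates its region into its own p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for proc, region in enumerate(regions):
        for x in region:
            small[proc][x * p // a].append(x)

    # Phase 2: small bucket i of every processor is sent to processor i.
    final = [[x for proc in range(p) for x in small[proc][i]] for i in range(p)]

    # Phase 3: each processor sorts its final bucket; results are concatenated.
    return [x for bucket in final for x in sorted(bucket)]
```

Phase 2 reads `small[proc][i]` for a fixed i across all processors, which is the exchange in which every processor sends one small bucket to every other processor.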

Another parallel version of bucket sort

- Introduces a new message-passing operation: all-to-all broadcast.

all-to-all broadcast routine

- Sends data from each process to every other process

all-to-all broadcast routine (cont)

- The all-to-all routine actually transfers the rows of an array to columns
- It transposes a matrix.

Synchronous Computations

- Synchronous
  - Barrier
  - Barrier Implementation
    - Centralized Counter implementation
    - Tree Barrier Implementation
    - Butterfly Barrier
- Synchronized Computations
  - Fully synchronous
    - Data Parallel Computations
    - Synchronous Iteration (Synchronous Parallelism)
  - Locally synchronous
    - Heat Distribution Problem
      - Sequential Code
      - Parallel Code

Barrier

- A basic mechanism for synchronizing processes, inserted at the point in each process where it must wait.
- All processes can continue from this point when all the processes have reached it
- [Figure: processes reaching the barrier at different times.]

Barrier Implementation

- Centralized Counter implementation (linear barrier)
- Tree Barrier Implementation
- Butterfly Barrier
- Local Synchronization
- Deadlock

Centralized Counter implementation

- Has two phases:
  - Arrival phase (trapping)
  - Departure phase (release)
- A process enters the arrival phase and does not leave this phase until all processes have arrived in this phase
- Then the processes move to the departure phase and are released
- Example code
- Master:

    for (i = 0; i < n; i++)    /* count slaves as they reach barrier */
        recv(Pany);
    for (i = 0; i < n; i++)    /* release slaves */
        send(Pi);

- Slave processes:

    send(Pmaster);
    recv(Pmaster);
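The two phases can be demonstrated with threads. Note the substitution: this sketch uses a shared counter protected by a condition variable rather than the message passing of the example code, and the class and function names are illustrative.

```python
import threading


class CounterBarrier:
    """Centralized counter barrier for n threads (shared-memory variant)."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.generation = 0          # lets the barrier be reused safely
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            generation = self.generation
            self.count += 1          # arrival phase: count this thread in
            if self.count == self.n:
                self.count = 0       # last arrival starts the departure phase
                self.generation += 1
                self.cond.notify_all()
            else:
                while generation == self.generation:
                    self.cond.wait()


def demo(n=4):
    """Run n threads through the barrier and log arrivals and departures."""
    barrier = CounterBarrier(n)
    log, lock = [], threading.Lock()

    def worker(i):
        with lock:
            log.append(("arrive", i))
        barrier.wait()
        with lock:
            log.append(("leave", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the log returned by `demo`, every "arrive" entry precedes every "leave" entry, because no thread is released until the counter reaches n.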

Tree Barrier Implementation

- Suppose 8 processes, P0, P1, P2, P3, P4, P5, P6, P7
- First stage
  - P1 sends message to P0 (when P1 reaches its barrier)
  - P3 sends message to P2 (when P3 reaches its barrier)
  - P5 sends message to P4 (when P5 reaches its barrier)
  - P7 sends message to P6 (when P7 reaches its barrier)
- Second stage
  - P2 sends message to P0 (P2 and P3 have reached their barrier)
  - P6 sends message to P4 (P6 and P7 have reached their barrier)
- Third stage
  - P4 sends message to P0 (P4, P5, P6 and P7 have reached their barrier)
  - P0 terminates the arrival phase (when P0 reaches its barrier and has received the message from P4)
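The arrival-phase schedule for the 8-process case generalizes as in the sketch below, which assumes the number of processes is a power of two. The function name is illustrative.

```python
def tree_barrier_arrival(n):
    """Return, per stage, the (sender, receiver) pairs of the arrival phase."""
    stages = []
    step = 1
    while step < n:
        # At this stage every process whose index is an odd multiple of step
        # reports to the process step positions below it.
        stages.append([(src, src - step) for src in range(step, n, 2 * step)])
        step *= 2
    return stages
```

For n = 8 this yields three stages, ending with P4 sending to P0, so the arrival phase takes log2(n) stages instead of the n sequential receives of the centralized counter.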

Tree Barrier Implementation

- Release with a reverse tree construction.

Tree barrier

Butterfly Barrier

- This would be used if data were exchanged between the processes

Local Synchronization

- Suppose a process Pi needs to be synchronized and to exchange data with process Pi-1 and process Pi+1
- This is not a perfect three-process barrier because process Pi-1 will only synchronize with Pi and continue as soon as Pi allows. Similarly, process Pi+1 only synchronizes with Pi.

Synchronized Computations

- Fully synchronous
  - In fully synchronous computations, all processes involved in the computation must be synchronized.
  - Data Parallel Computations
  - Synchronous Iteration (Synchronous Parallelism)
- Locally synchronous
  - In locally synchronous computations, processes only need to synchronize with a set of logically nearby processes, not all processes involved in the computation
  - Heat Distribution Problem
    - Sequential Code
    - Parallel Code

Data Parallel Computations

- The same operation is performed on different data elements simultaneously (SIMD)
- Data parallel programming is very convenient for two reasons:
  - The first is its ease of programming (essentially only one program)
  - The second is that it can scale easily to larger problem sizes

Synchronous Iteration

- Each iteration is composed of several processes that start together at the beginning of the iteration. The next iteration cannot begin until all processes have finished the previous iteration. Using forall:

    for (j = 0; j < n; j++)            /* for each synchronous iteration */
        forall (i = 0; i < N; i++) {   /* N processes, each using */
            body(i);                   /* a specific value of i */
        }

Synchronous Iteration

- Solving a General System of Linear Equations by Iteration
- Suppose the equations are of a general form with n equations and n unknowns:

$$
\begin{aligned}
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} &= b_{n-1} \\
&\;\;\vdots \\
a_{2,0}x_0 + a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n-1}x_{n-1} &= b_2 \\
a_{1,0}x_0 + a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n-1}x_{n-1} &= b_1 \\
a_{0,0}x_0 + a_{0,1}x_1 + a_{0,2}x_2 + \cdots + a_{0,n-1}x_{n-1} &= b_0
\end{aligned}
$$

- where the unknowns are $x_0, x_1, x_2, \ldots, x_{n-1}$ ($0 \le i < n$).

Synchronous Iteration

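- By rearranging the ith equation
  $a_{i,0}x_0 + a_{i,1}x_1 + a_{i,2}x_2 + \cdots + a_{i,n-1}x_{n-1} = b_i$
- to
  $x_i = \frac{1}{a_{i,i}}\left[b_i - (a_{i,0}x_0 + a_{i,1}x_1 + a_{i,2}x_2 + \cdots + a_{i,i-1}x_{i-1} + a_{i,i+1}x_{i+1} + \cdots + a_{i,n-1}x_{n-1})\right]$
- or
  $x_i = \frac{1}{a_{i,i}}\left[b_i - \sum_{j \ne i} a_{i,j}x_j\right]$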
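The rearranged equation leads directly to an iteration in which every unknown is recomputed from the previous iteration's values. The sketch below assumes a zero initial guess and a fixed number of iterations, and it is only expected to converge for suitable systems such as diagonally dominant ones.

```python
def jacobi(a, b, iterations):
    """Iterate x_i = (b_i - sum_{j != i} a[i][j] * x_j) / a[i][i]."""
    n = len(b)
    x = [0.0] * n                     # initial guess (an assumption)
    for _ in range(iterations):
        new_x = [0.0] * n
        # Each i is independent within an iteration, so each could be a
        # separate process; all must finish before the next iteration starts.
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_x[i] = (b[i] - s) / a[i][i]
        x = new_x
    return x
```

Replacing `x` only after the inner loop completes plays the role of the barrier between synchronous iterations.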

Heat Distribution Problem

- An area has known temperatures along each of its edges. Find the temperature distribution within. Divide the area into a fine mesh of points, $h_{i,j}$. The temperature at an inside point is taken to be the average of the temperatures of the four neighboring points.
- The temperature of each point is found by iterating the equation $h_{i,j} = \frac{h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}}{4}$ ($0 < i < n$, $0 < j < n$)

Sequential Code

- Using a fixed number of iterations

    for (iteration = 0; iteration < limit; iteration++) {
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
        for (i = 1; i < n; i++)          /* update points */
            for (j = 1; j < n; j++)
                h[i][j] = g[i][j];
    }

Parallel Code

- With a fixed number of iterations, process Pi,j (except for the boundary points):

    for (iteration = 0; iteration < limit; iteration++) {
        g = 0.25 * (w + x + y + z);
        send(g, Pi-1,j);       /* non-blocking sends */
        send(g, Pi+1,j);
        send(g, Pi,j-1);
        send(g, Pi,j+1);
        recv(w, Pi-1,j);       /* synchronous receives */
        recv(x, Pi+1,j);
        recv(y, Pi,j-1);
        recv(z, Pi,j+1);
    }

- The sends and receives with the four neighbors act as a local barrier.

Load Balancing and Termination Detection

Static Load Balancing

- Round robin algorithm: passes out tasks in sequential order of processes, coming back to the first when all processes have been given a task
- Randomized algorithms: select processes at random to take tasks
- Recursive bisection: recursively divides the problem into subproblems of equal computational effort while minimizing message passing
- Simulated annealing: an optimization technique
- Genetic algorithm: another optimization technique

Static Load Balancing

- Several fundamental flaws with static load balancing, even if a mathematical solution exists:
  - It is very difficult to estimate accurately the execution times of various parts of a program without actually executing the parts.
  - Communication delays vary under different circumstances.
  - Some problems have an indeterminate number of steps to reach their solution.

Dynamic Load Balancing

Centralized dynamic load balancing

- Tasks are handed out from a centralized location. Master-slave structure.
- The master process(or) holds the collection of tasks to be performed.
- Tasks are sent to the slave processes. When a slave process completes one task, it requests another task from the master process.
- (Terms used: work pool, replicated worker, processor farm.)

Termination

- Computation terminates when:
  - The task queue is empty, and
  - Every process has made a request for another task without any new tasks being generated
- It is not sufficient to terminate when the task queue is empty if one or more processes are still running, since a running process may provide new tasks for the task queue.
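The termination rule can be illustrated with a single-threaded simulation of a centralized work pool. This is a sketch under stated assumptions: `process_task` is a hypothetical callback that returns any new tasks a task generates, workers finish one at a time, and at least one worker exists.

```python
from collections import deque


def run_work_pool(initial_tasks, process_task, n_workers):
    """Simulate a master handing tasks to workers until termination."""
    queue = deque(initial_tasks)
    idle = set(range(n_workers))   # workers that have requested a task
    busy = {}                      # worker id -> task it is executing
    done = []
    while True:
        # Master: hand queued tasks to the workers that have requested one.
        while queue and idle:
            busy[idle.pop()] = queue.popleft()
        # Terminate only when the queue is empty AND every worker is idle.
        if not busy:
            break
        # One busy worker finishes; its task may generate new tasks.
        worker, task = busy.popitem()
        queue.extend(process_task(task))
        done.append(task)
        idle.add(worker)
    return done
```

Checking `not busy` rather than `not queue` is the point of the rule: the queue can be momentarily empty while a running worker is about to add new tasks to it.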

Decentralized dynamic load balancing

Fully Distributed Work Pool

- Processes execute tasks obtained from each other
- Tasks could be transferred by a:
  - Receiver-initiated method
  - Sender-initiated method

Process Selection

- Algorithms for selecting a process:
  - Round robin algorithm: process Pi requests tasks from process Px, where x is given by a counter that is incremented after each request, using modulo n arithmetic (n processes), excluding x = i.
  - Random polling algorithm: process Pi requests tasks from process Px, where x is a number selected randomly between 0 and n - 1 (excluding i).
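Both selection rules fit in a few lines. In this sketch the function names, the way the counter is passed in and returned, and the optional `rng` argument are assumptions for illustration; n is assumed to be at least 2.

```python
import random


def round_robin_target(i, counter, n):
    """Return (x, next_counter): the process Pi asks next, skipping x == i."""
    x = counter % n
    if x == i:
        x = (x + 1) % n            # exclude the requesting process itself
    return x, (x + 1) % n          # counter incremented using modulo n


def random_poll_target(i, n, rng=random):
    """Return a random process index in 0 .. n-1 other than i."""
    x = rng.randrange(n - 1)       # pick among the n - 1 other processes
    return x if x < i else x + 1   # shift past i so that i is never chosen
```

Drawing from n - 1 values and shifting past i gives every other process the same probability without retrying.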

Distributed Termination Detection Algorithms

- Termination conditions at time t:
  - Application-specific local termination conditions exist throughout the collection of processes at time t.
  - There are no messages in transit between processes at time t.
- The second condition is necessary because a message in transit might restart a terminated process. It is more difficult to recognize, because the time that it takes for messages to travel between processes will not be known in advance.

Using Acknowledgment Messages

- Each process is in one of two states:
  - Inactive: without any task to perform
  - Active
- The process that sent the task which made a process enter the active state becomes its parent.

Using Acknowledgment Messages

- When a process receives a task, it immediately sends an acknowledgment message, except if the process it receives the task from is its parent process. It only sends an acknowledgment message to its parent when it is ready to become inactive, i.e. when
  - Its local termination condition exists (all tasks are completed), and
  - It has transmitted all its acknowledgments for tasks it has received, and
  - It has received all its acknowledgments for tasks it has sent out.
- A process must become inactive before its parent process. When the first process becomes idle, the computation can terminate.

Load balancing/termination detection Example

Example: finding the shortest distance between two points on a graph.

References: Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Second Edition, Prentice Hall, 2005.

Q&A

Thank you!
