Experiencing Cluster Computing - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Experiencing Cluster Computing
  • Class 1

2
Introduction to Parallelism
3
Outline
  • Why Parallelism
  • Types of Parallelism
  • Drawbacks
  • Concepts
  • Starting Parallelization
  • Simple Example

4
Why Parallelism
5
Why Parallelism: Passively
  • Suppose you are using the most efficient algorithm with an optimal
    implementation, and the program still takes too long or does not even
    fit on your machine.
  • Parallelization is then the last resort.

6
Why Parallelism: Proactively
  • Faster
    • Finish the work earlier: the same work in a shorter time
    • Do more work: more work in the same time
  • Most importantly, you want to predict the result before the event
    occurs

7
Examples
  • Many scientific and engineering problems require enormous
    computational power. The following are a few such fields:
  • Quantum chemistry, statistical mechanics, and
    relativistic physics
  • Cosmology and astrophysics
  • Computational fluid dynamics and turbulence
  • Material design and superconductivity
  • Biology, pharmacology, genome sequencing, genetic
    engineering, protein folding, enzyme activity,
    and cell modeling
  • Medicine, and modeling of human organs and bones
  • Global weather and environmental modeling
  • Machine Vision

8
Parallelism
  • The upper bound for the computing power that can be obtained from a
    single processor is limited by the fastest processor available at any
    given time.
  • The upper bound for the computing power available
    can be dramatically increased by integrating a
    set of processors together.
  • Synchronization and exchange of partial results
    among processors are therefore unavoidable.

9
Computer Architecture
4 categories (Flynn's taxonomy):
  • SISD: Single Instruction, Single Data
  • SIMD: Single Instruction, Multiple Data
  • MISD: Multiple Instruction, Single Data
  • MIMD: Multiple Instruction, Multiple Data
10
Computer Architecture
11
Multiprocessing vs. Clustering
Parallel Computer Architecture:
  • Shared Memory: Symmetric multiprocessors (SMP)
  • Distributed Memory: Cluster
12
Types of Parallelism
13
Parallel Programming Paradigm
  • Multithreading
    • OpenMP (shared memory only)
  • Message Passing (shared memory and distributed memory)
    • MPI (Message Passing Interface)
    • PVM (Parallel Virtual Machine)
14
Threads
  • In computer programming, a thread is placeholder
    information associated with a single use of a
    program that can handle multiple concurrent
    users.
  • From the program's point-of-view, a thread is the
    information needed to serve one individual user
    or a particular service request.
  • If multiple users are using the program or
    concurrent requests from other programs occur, a
    thread is created and maintained for each of
    them.
  • The thread allows a program to know which user is
    being served as the program alternately gets
    re-entered on behalf of different users.

15
Threads
  • Programmer's view
  • Single CPU
  • Single block of memory
  • Several threads of action
  • Parallelization
  • Done by the compiler

Fork-Join Model
16
Shared Memory
  • Programmer's view
  • Several CPUs
  • Single block of memory
  • Several threads of action
  • Parallelization
  • Done by the compiler
  • Example
  • OpenMP

[Figure: a process with parts P1, P2, P3 executed single-threaded (one part
after another) versus multi-threaded (parts running concurrently over time),
with data exchange via shared memory]
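The fork-join pattern in the figure can be illustrated with a minimal OpenMP
sketch (not taken from the slides; the number of threads is decided by the
runtime):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* The master thread forks a team of threads here ... */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }   /* ... and the threads join again at this implicit barrier. */
        return 0;
    }

Compiled with an OpenMP-enabled compiler (e.g. gcc -fopenmp), every thread of
the team executes the block between the fork and the join.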
17
Multithreaded Parallelization
18
Distributed Memory
  • Programmer's view
  • Several CPUs
  • Several blocks of memory
  • Several threads of action
  • Parallelization
  • Done by hand
  • Example
  • MPI (see the sketch below)

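As a minimal illustration of this view, each MPI process below learns its own
rank and the total process count while working on its own private memory (a
hedged sketch, not from the slides):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, np;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process        */
        MPI_Comm_size(MPI_COMM_WORLD, &np);     /* total number of processes */
        printf("Process %d of %d has its own block of memory\n", rank, np);
        MPI_Finalize();
        return 0;
    }

Any data exchange between these processes has to be programmed by hand with
explicit messages.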
19
Drawbacks
20
Drawbacks of Parallelism
  • Traps
    • Deadlocks
    • Process Synchronization
  • Programming Effort
    • Few tools support automated parallelization and debugging
  • Task Distribution (Load balancing)

21
Deadlock
  • The earliest computer operating systems ran only
    one program at a time.
  • All of the resources of the system were available
    to this one program.
  • Later, operating systems ran multiple programs at
    once, interleaving them.
  • Programs were required to specify in advance what
    resources they needed so that they could avoid
    conflicts with other programs running at the same
    time.
  • Eventually some operating systems offered dynamic allocation of
    resources: programs could request further allocations of resources
    after they had begun running. This led to the problem of deadlock.

22
Deadlock
  • Parallel tasks require resources to accomplish their work. If the
    resources are not available, the work cannot be finished. Each resource
    can be locked (controlled) by exactly one task at any given point in
    time.
  • Consider the following situation (sketched in code below):
  • Two tasks need the same two resources.
  • Each task manages to gain control over just one resource, but not the
    other.
  • Neither task releases the resource that it already holds.
  • This is called a deadlock, and the program will not terminate.

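A minimal POSIX-threads sketch of this situation (the resource and task names
are invented for illustration); because the two tasks take the locks in
opposite order, the program can hang forever:

    #include <pthread.h>

    pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;  /* resource 1 */
    pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;  /* resource 2 */

    void *task_a(void *arg)
    {
        pthread_mutex_lock(&r1);    /* A holds r1 ...                 */
        pthread_mutex_lock(&r2);    /* ... and waits for r2           */
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return arg;
    }

    void *task_b(void *arg)
    {
        pthread_mutex_lock(&r2);    /* B holds r2 ...                 */
        pthread_mutex_lock(&r1);    /* ... and waits for r1: deadlock */
        pthread_mutex_unlock(&r1);
        pthread_mutex_unlock(&r2);
        return arg;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, task_a, NULL);
        pthread_create(&b, NULL, task_b, NULL);
        pthread_join(a, NULL);      /* may never return */
        pthread_join(b, NULL);
        return 0;
    }

Taking the locks in the same order in every task removes the cycle and hence
the deadlock.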
23
Deadlock
[Figure: two tasks, each holding one resource and waiting for the other]
24
Dining Philosophers
  • Each philosopher either thinks or eats.
  • In order to eat, he requires two forks.
  • Each philosopher tries to pick up the right fork
    first.
  • If successful, he waits for the left fork to become available.
  • → Deadlock (see the sketch below)

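A possible C sketch of the table (the fork bookkeeping and the value of N are
assumptions for illustration). If every philosopher locked the right fork
first, the circular wait from the slide could occur; ordering the forks by
number breaks the symmetry:

    #include <pthread.h>
    #include <stdio.h>

    #define N 5                          /* philosophers and forks (assumed) */
    static pthread_mutex_t fork_mtx[N];

    static void *philosopher(void *arg)
    {
        int id    = (int)(long)arg;
        int right = id;                  /* fork on the right */
        int left  = (id + 1) % N;        /* fork on the left  */

        /* Lock the lower-numbered fork first to avoid the deadlock. */
        int first  = right < left ? right : left;
        int second = right < left ? left  : right;

        pthread_mutex_lock(&fork_mtx[first]);
        pthread_mutex_lock(&fork_mtx[second]);
        printf("Philosopher %d eats\n", id);
        pthread_mutex_unlock(&fork_mtx[second]);
        pthread_mutex_unlock(&fork_mtx[first]);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[N];
        for (int i = 0; i < N; i++) pthread_mutex_init(&fork_mtx[i], NULL);
        for (int i = 0; i < N; i++)
            pthread_create(&t[i], NULL, philosopher, (void *)(long)i);
        for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
        return 0;
    }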
25
Dining Philosophers Demo
  • Problem
  • http://www.sci.hkbu.edu.hk/tdgc/tutorial/ExpClusterComp/deadlock/Diners.htm
  • Solution
  • http://www.sci.hkbu.edu.hk/tdgc/tutorial/ExpClusterComp/deadlock/FixedDiners.htm

26
Concepts
27
Speedup
  • Given a fixed problem size.
  • T_S = sequential wall-clock execution time (in seconds)
  • T_N = parallel wall-clock execution time using N processors (in seconds)
  • Speedup(N) = T_S / T_N
  • Ideally, speedup = N → linear speedup

28
Speedup
  • Absolute Speedup
    = sequential time on 1 processor / parallel time on N processors
  • Relative Speedup
    = parallel time on 1 processor / parallel time on N processors
  • They differ because the parallel code on 1 processor carries unnecessary
    MPI overhead; it may be slower than the sequential code on 1 processor.

29
Parallel Efficiency
  • Efficiency is a measure of processor utilization in a parallel program,
    relative to the serial program.
  • Parallel efficiency E(N) = Speedup(N) / N (speedup per processor)
  • Ideally, E(N) = 1 (see the worked example below).

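As a worked example with hypothetical timings: if T_S = 100 s and the parallel
run on N = 4 processors takes T_4 = 40 s, then speedup = 100/40 = 2.5 and
E = 2.5/4 = 0.625, i.e. 62.5% efficiency.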
30
Amdahl's Law
  • It states that the potential program speedup is defined by the fraction
    of code (f) that can be parallelized:
        Speedup = 1 / (1 - f)
  • If none of the code can be parallelized, f = 0 and the speedup = 1 (no
    speedup). If all of the code is parallelized, f = 1 and the speedup is
    infinite (in theory).

31
Amdahl's Law
Introducing the number of processors performing the parallel fraction of
work, the relationship can be modeled by the equation
    Speedup = 1 / (S + P/N)
where P = parallel fraction, S = serial fraction (P + S = 1), and
N = number of processors.

32
Amdahl's Law
  • When N → ∞, Speedup → 1/S
  • Interpretation
  • No matter how many processors are used, the upper bound for the speedup
    is determined by the sequential section.

33
Amdahl's Law Example
  • If the sequential section of a program amounts to 5% of the run time,
    then S = 0.05 and hence the maximum speedup = 1/0.05 = 20 (see the
    sketch below).

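A small C sketch (not part of the slides) that evaluates this bound for
S = 0.05 and a growing number of processors; the values approach, but never
exceed, 1/S = 20:

    #include <stdio.h>

    /* Upper bound on speedup by Amdahl's law:
       S = serial fraction, N = number of processors. */
    static double amdahl_speedup(double S, int N)
    {
        return 1.0 / (S + (1.0 - S) / N);
    }

    int main(void)
    {
        const double S = 0.05;                    /* 5% serial section */
        const int n[] = { 1, 10, 100, 1000 };
        for (int i = 0; i < 4; i++)
            printf("N = %4d  speedup <= %.2f\n", n[i], amdahl_speedup(S, n[i]));
        return 0;
    }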
34
Behind Amdahl's Law
  1. How much faster can a given problem be solved?
  2. Which problem size can be solved on a parallel
    machine in the same time as on a sequential one?
    (Scalability)

35
Starting Parallelization
36
Parallelization Option 1
  • Starting from an existing, sequential program
  • Easy on shared-memory architectures (OpenMP)
  • Potentially adequate for a small number of processes (moderate speed-up)
  • Does not scale to a large number of processes
  • Restricted to trivially parallel problems on distributed-memory machines

37
Parallelization Option 2
  • Starting from scratch
  • Not popular, but often inevitable
  • Needs new program design
  • Increased complexity (data distribution)
  • Widely applicable
  • Often the best choice for large scale problems

38
Goals for Parallelization
  • Avoid or reduce
  • synchronization
  • communication
  • Try to maximize computationally intensive sections.

39
Simple Example
40
Summation
  • Given an N-dimensional vector of type integer.
  • // Initialization //
  • for (int i = 0; i < len; i++)
  •     vec[i] = i*i;
  • // Sum Calculation //
  • for (int i = 0; i < len; i++)
  •     sum += vec[i];

41
Parallel Algorithm
  1. Divide the vector into parts
  2. Each CPU initializes its own part
  3. Use a global reduction to calculate the sum of the vector

42
OpenMP
  • Compiler directives (#pragma omp) are inserted to tell the compiler to
    perform parallelization.
  • The compiler is then responsible for automatically parallelizing certain
    types of loops (a complete, compilable version follows this slide).
  • #pragma omp parallel for
  • for (int i = 0; i < len; i++)
  •     vec[i] = i*i;
  • #pragma omp parallel for reduction(+:sum)
  • for (int i = 0; i < len; i++)
  •     sum += vec[i];

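For reference, a complete, compilable version of the two loops above; the
vector length, the long accumulator, and the final printf are assumptions
added to make the sketch self-contained (compile with an OpenMP flag such as
gcc -fopenmp):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int len = 1000;                 /* assumed vector length */
        int *vec = malloc(len * sizeof(int));
        long sum = 0;

        #pragma omp parallel for              /* initialization in parallel */
        for (int i = 0; i < len; i++)
            vec[i] = i * i;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < len; i++)         /* each thread sums its share,  */
            sum += vec[i];                    /* partial sums are then merged */

        printf("sum = %ld\n", sum);
        free(vec);
        return 0;
    }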
43
MPI
[Figure: the vector vec divided among the processes]
  • // in each process, do the initialization
  • for (int i = rank; i < len; i += np)
  •     vec[i] = i*i;
  • // calculate the local sum
  • for (int i = rank; i < len; i += np)
  •     localsum += vec[i];
  • // perform global reduction
  • MPI_Reduce(&localsum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  • (A complete, compilable version follows this slide.)

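Likewise, a complete MPI version under the same assumptions (the vector length
and the final printf are added for illustration); each process initializes and
sums the elements rank, rank+np, rank+2*np, ..., and MPI_Reduce combines the
partial sums on process 0:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        const int len = 1000;                 /* assumed vector length */
        int rank, np, localsum = 0, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        int *vec = malloc(len * sizeof(int));

        for (int i = rank; i < len; i += np)  /* initialize own elements */
            vec[i] = i * i;

        for (int i = rank; i < len; i += np)  /* local partial sum */
            localsum += vec[i];

        MPI_Reduce(&localsum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %d\n", sum);

        free(vec);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and run with, e.g., mpirun -np 4 ./sum.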
44
END