1
Parallel Computing

CS 147: Computer Architecture
Instructor: Professor Sin-Min Lee
Spring 2011
By: Alice Cotti
2
Background
• Amdahl's law and Gustafson's law
• Dependencies
• Race conditions, mutual exclusion,
synchronization, and parallel slowdown
• Fine-grained, coarse-grained, and embarrassing
parallelism

3
Amdahl's Law
• The speed-up of a program from parallelization is
limited by how much of the program can be
parallelized.
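In formula form: if a fraction P of a program's running time can be parallelized across N processors, the overall speed-up is S = 1 / ((1 - P) + P / N). A minimal Python sketch (the 95% parallel fraction below is an illustrative value, not from the slides):

def amdahl_speedup(p, n):
    # Speed-up when a fraction p of the work runs in
    # parallel on n processors (Amdahl's law).
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the program parallelized, the serial 5%
# caps the speed-up at 1 / 0.05 = 20x, no matter how large n gets.
for n in (2, 8, 64, 4096):
    print(n, round(amdahl_speedup(0.95, n), 2))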

4
Dependencies
• Consider the following function, which demonstrates a dependency:
1: function Dep(a, b)
2:     c := a * b
3:     d := 2 * c
4: end function
• Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 uses a result from operation 2. It violates condition 1 of Bernstein's conditions (the input of one segment must not overlap the output of the other), and thus introduces a flow dependency.

5
Dependencies
• Consider the following function:
1: function NoDep(a, b)
2:     c := a * b
3:     d := 2 * b
4:     e := a * b
5: end function
• In this example, there are no dependencies
between the instructions, so they can all be run
in parallel.

6
Race condition
• A flaw whereby the output or result of a process is unexpectedly and critically dependent on the sequence or timing of other events.
• Can occur in electronic systems (especially logic circuits) and in software (especially multithreaded or distributed programs).

Race condition in a logic circuit. Here, Δt1 and Δt2 represent the propagation delays of the logic elements. When the input value (A) changes, the circuit outputs a short spike of duration Δt1.
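The same flaw appears in multithreaded software. A minimal Python sketch, assuming CPython's threading module: the unsynchronized counter += 1 is a read-modify-write that threads can interleave (the lost updates are intermittent and depend on the interpreter), while the Lock restores mutual exclusion:

import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1            # read-modify-write: not atomic

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:              # mutual exclusion: one thread at a time
            counter += 1

threads = [threading.Thread(target=unsafe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # often less than 400000 when the unsafe version races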
7
Fine-grained, coarse-grained, and embarrassing
parallelism
• Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.
• Fine-grained parallelism: subtasks communicate many times per second.
• Coarse-grained parallelism: they do not communicate many times per second.
• Embarrassingly parallel: subtasks rarely or never have to communicate. Embarrassingly parallel applications are the easiest to parallelize (a sketch follows below).
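As a sketch of the embarrassingly parallel case: the per-item tasks below (a made-up squaring function) share no state and never communicate, so a process pool can farm them out directly:

from multiprocessing import Pool

def work(x):
    # Independent task: no shared state, no communication.
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(work, range(10))  # each item handled in isolation
    print(results)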

8
Types of parallelism
• Data parallelism
• Bit-level parallelism
• Instruction-level parallelism

A five-stage pipelined superscalar processor, capable of issuing two instructions per cycle. It can have two instructions in each stage of the pipeline, for a total of up to ten instructions being executed simultaneously.
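As a sketch of the bit-level parallelism listed above: a processor whose word size matches its operands finishes an addition in one instruction, while a narrower machine must chain two operations through a carry. A Python illustration of the narrow case (the 32-bit word size is just for illustration):

def add_64_via_32(a, b):
    # Emulate a 64-bit add on a machine with 32-bit words:
    # add the low halves, then propagate the carry into the high halves.
    mask = 0xFFFFFFFF
    lo = (a & mask) + (b & mask)
    carry = lo >> 32
    hi = ((a >> 32) + (b >> 32) + carry) & mask
    return (hi << 32) | (lo & mask)

assert add_64_via_32(2**40 + 5, 2**33 + 7) == (2**40 + 5) + (2**33 + 7)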
9
Hardware
• Memory and communication
• Classes of parallel computers
• Multicore computing
• Symmetric multiprocessing
• Distributed computing

10
Multicore Computing
• Pros
• Scales beyond dual core: more cores can execute more threads at once.
• Cores need not contend for the same external bus and its bandwidth, and can therefore be even faster.
• Cons
• Heat dissipation problems
• More expensive

11
Software
• Parallel programming languages
• Automatic parallelization
• Application checkpointing

12
Parallel programming languages
• Concurrent programming languages, libraries,
APIs, and parallel programming models (such as
Algorithmic Skeletons) have been created for
programming parallel computers.
• Shared memory
• Distributed memory
• Shared distributed memory

13
Automatic parallelization
• Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, it has had only limited success.
• Mainstream parallel programming languages remain
either explicitly parallel or (at best) partially
implicit, in which a programmer gives the
compiler directives for parallelization.
• A few fully implicit parallel programming languages exist: SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.

14
Application checkpointing
• The larger and more complex a computer is, the
more that can go wrong and the shorter the mean
time between failures.
• Application checkpointing is a technique whereby
the computer system takes a "snapshot" of the
application. This information can be used to
restore the program if the computer should fail.
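A minimal sketch of the idea in Python, using pickle to snapshot application state; the file name, state layout, and snapshot interval are illustrative choices, not a standard API:

import os
import pickle

CHECKPOINT = "state.pkl"            # hypothetical snapshot file

def save_checkpoint(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)       # take a "snapshot" of the application

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)   # resume from the last snapshot
    return {"step": 0, "total": 0}  # fresh start

state = load_checkpoint()
for step in range(state["step"], 1000):
    state["total"] += step
    state["step"] = step + 1
    if step % 100 == 0:             # snapshot periodically
        save_checkpoint(state)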

15
Algorithmic methods
• Parallel computing is used in a wide range of
fields, from bioinformatics to economics. Common
types of problems found in parallel computing
applications are:
• Dense linear algebra
• Sparse linear algebra
• Dynamic programming
• Finite-state machine simulation

16
Programming
• The parallel architectures of supercomputers
often dictate the use of special programming
techniques to exploit their speed.
• The base language of supercomputer code is, in
general, Fortran or C, using special libraries to
share data between nodes.
• The new massively parallel GPGPUs have hundreds
of processor cores and are programmed using
programming models such as CUDA and OpenCL.
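The de facto standard library for sharing data between nodes is MPI; a minimal sketch using its Python bindings (mpi4py), launched with something like mpiexec -n 2 python script.py:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                # this process's id within the job

if rank == 0:
    data = {"payload": [1, 2, 3]}
    comm.send(data, dest=1, tag=0)    # ship data to the other node
elif rank == 1:
    data = comm.recv(source=0, tag=0)
    print("received:", data)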

17
Classes of parallel computers
• Parallel computers can be roughly classified
according to the level at which the hardware
supports parallelism.
• Multicore computing
• Symmetric multiprocessing
• Distributed computing
• Specialized parallel computers

18
Multicore computing
• Includes multiple execution units ("cores") on
the same chip.
• Can issue multiple instructions per cycle from
multiple instruction streams. Each core in a
multicore processor can potentially be
superscalar.
• Simultaneous multithreading has only one execution unit, but when that unit is idling (such as during a cache miss), it processes a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a multithreaded design.

19
Symmetric multiprocessing
• A computer system with multiple identical
processors that share memory and connect via a
bus.
• Bus contention prevents bus architectures from
scaling. As a result, SMPs generally do not
comprise more than 32 processors.
• Because of the small size of the processors and the significant reduction in bus-bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective.

20
Distributed computing
• A distributed memory computer system in which the
processing elements are connected by a network.
• Highly scalable.

(a), (b): a distributed system. (c): a parallel system.
21
Specialized parallel computers
• Within parallel computing, there are specialized
parallel devices that tend to be applicable to
only a few classes of parallel problems.
• Reconfigurable computing
• General-purpose computing on graphics processing
units
• Application-specific integrated circuits
• Vector processors

22
Questions?
23
References
• Wikipedia.org