Parallel Computing - PowerPoint PPT Presentation


Parallel Computing

CS 147: Computer Architecture
Instructor: Professor Sin-Min Lee
Spring 2011
By: Alice Cotti
Background
  • Amdahl's law and Gustafson's law
  • Dependencies
  • Race conditions, mutual exclusion,
    synchronization, and parallel slowdown
  • Fine-grained, coarse-grained, and embarrassing
    parallelism

Amdahl's Law
  • The speed-up of a program from parallelization is
    limited by how much of the program can be
    parallelized.
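Amdahl's law can be written as S = 1 / ((1 − p) + p/n), where p is the parallelizable fraction of the program and n the number of processors. A minimal Python sketch (the function name is illustrative):

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the program is
    parallelizable and runs on n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with a huge number of processors, a 95%-parallel
# program cannot exceed a 20x speedup: the serial 5% dominates.
print(round(amdahl_speedup(0.95, 8), 2))          # → 5.93
print(round(amdahl_speedup(0.95, 1_000_000), 2))  # → 20.0
```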

  • Consider the following function, which
    demonstrates several kinds of dependencies:

    1: function Dep(a, b)
    2:    c := a * b
    3:    d := 2 * c
    4: end function

  • Operation 3 in Dep(a, b) cannot be executed
    before (or even in parallel with) operation 2,
    because operation 3 uses a result from operation
    2. It violates condition 1, and thus introduces a
    flow dependency.

  • Consider the following function:

    1: function NoDep(a, b)
    2:    c := a * b
    3:    d := 2 * b
    4:    e := a + b
    5: end function
  • In this example, there are no dependencies
    between the instructions, so they can all be run
    in parallel.
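The two pseudocode functions above can be transcribed directly into Python (the names dep and no_dep are illustrative):

```python
def dep(a, b):
    c = a * b   # operation 2
    d = 2 * c   # operation 3: flow dependency -- must wait for c
    return d

def no_dep(a, b):
    # No assignment reads another's result, so a parallelizing
    # scheduler could evaluate c, d, and e in any order, or all at once.
    c = a * b
    d = 2 * b
    e = a + b
    return c, d, e

print(dep(3, 4))     # → 24
print(no_dep(3, 4))  # → (12, 8, 7)
```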

Race condition
  • A flaw whereby the output or result of the
    process is unexpectedly and critically dependent
    on the sequence or timing of other events.
  • Can occur in electronic systems, logic circuits,
    and multithreaded software.

Race condition in a logic circuit. Here, Δt1 and
Δt2 represent the propagation delays of the
logic elements. When the input value (A)
changes, the circuit outputs a short spike of
duration Δt1.
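The multithreaded case can be sketched in Python. A shared counter is incremented under a lock, which provides the mutual exclusion mentioned in the outline; without the lock, `counter += 1` is a read-modify-write that two threads can interleave, losing updates (the software analogue of the circuit glitch above). Names here are illustrative:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    """Increment the shared counter n times under a lock."""
    global counter
    for _ in range(n):
        with lock:       # mutual exclusion: one thread at a time
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 400000, deterministic because of the lock
```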
Fine-grained, coarse-grained, and embarrassing
  • Applications are often classified according to
    how often their subtasks need to synchronize or
    communicate with each other.
  • Fine-grained parallelism: subtasks must
    communicate many times per second.
  • Coarse-grained parallelism: subtasks do not
    communicate many times per second.
  • Embarrassingly parallel: subtasks rarely or never
    have to communicate. Embarrassingly parallel
    applications are the easiest to parallelize.
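An embarrassingly parallel workload can be sketched in Python: each item is checked independently, so the workers never need to communicate. The primality test and pool size are illustrative choices, not from the presentation:

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(n):
    """Per-item work: checking one number never requires
    another worker's result, which is what makes the
    workload embarrassingly parallel."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

numbers = range(2, 50)
with ThreadPoolExecutor(max_workers=4) as pool:
    primes = [n for n, p in zip(numbers, pool.map(is_prime, numbers)) if p]
print(primes)  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```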

Types of parallelism
  • Data parallelism
  • Task parallelism
  • Bit-level parallelism
  • Instruction-level parallelism

A five-stage pipelined superscalar processor,
capable of issuing two instructions per cycle.
It can have two instructions in each stage of
the pipeline, for a total of up to 10
instructions being executed simultaneously.
  • Memory and communication
  • Classes of parallel computers
  • Multicore computing
  • Symmetric multiprocessing
  • Distributed computing

Multicore Computing
  • PROS
  • more cores than a dual-core design
  • cores need not all contend for the same bus and
    memory bandwidth
  • can therefore be even faster
  • CONS
  • heat dissipation problems
  • more expensive

  • Parallel programming languages
  • Automatic parallelization
  • Application checkpointing

Parallel programming languages
  • Concurrent programming languages, libraries,
    APIs, and parallel programming models (such as
    Algorithmic Skeletons) have been created for
    programming parallel computers.
  • Shared memory
  • Distributed memory
  • Distributed shared memory

Automatic parallelization
  • Automatic parallelization of a sequential program
    by a compiler is the holy grail of parallel
    computing. Despite decades of work by compiler
    researchers, it has had only limited success.
  • Mainstream parallel programming languages remain
    either explicitly parallel or (at best) partially
    implicit, in which a programmer gives the
    compiler directives for parallelization.
  • A few fully implicit parallel programming
    languages exist: SISAL, Parallel Haskell, and
    (for FPGAs) Mitrion-C.

Application checkpointing
  • The larger and more complex a computer is, the
    more that can go wrong and the shorter the mean
    time between failures.
  • Application checkpointing is a technique whereby
    the computer system takes a "snapshot" of the
    application. This information can be used to
    restore the program if the computer should fail.
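The checkpoint/restore idea can be sketched in miniature in Python, using pickle as the snapshot format; the computation, function names, and snapshot interval are all illustrative:

```python
import os
import pickle
import tempfile

def run(total, state=None, checkpoint_path=None):
    """Sum the integers 1..total, periodically saving a
    snapshot of (next index, running sum) so the computation
    can resume after a failure instead of restarting."""
    i, acc = state if state is not None else (1, 0)
    while i <= total:
        acc += i
        i += 1
        if checkpoint_path and i % 1000 == 0:
            with open(checkpoint_path, "wb") as f:
                pickle.dump((i, acc), f)   # the "snapshot"
    return acc

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
full = run(2500, checkpoint_path=path)  # pretend this run later crashed
with open(path, "rb") as f:
    snapshot = pickle.load(f)           # restore the last snapshot...
resumed = run(2500, state=snapshot)     # ...and resume from it
print(full == resumed == 2500 * 2501 // 2)  # → True
```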

Algorithmic methods
  • Parallel computing is used in a wide range of
    fields, from bioinformatics to economics. Common
    types of problems found in parallel computing
    applications are
  • Dense linear algebra
  • Sparse linear algebra
  • Dynamic programming
  • Finite-state machine simulation

  • The parallel architectures of supercomputers
    often dictate the use of special programming
    techniques to exploit their speed.
  • The base language of supercomputer code is, in
    general, Fortran or C, using special libraries to
    share data between nodes.
  • The new massively parallel GPGPUs have hundreds
    of processor cores and are programmed using
    programming models such as CUDA and OpenCL.

Classes of parallel computers
  • Parallel computers can be roughly classified
    according to the level at which the hardware
    supports parallelism.
  • Multicore computing
  • Symmetric multiprocessing
  • Distributed computing
  • Specialized parallel computers

Multicore computing
  • A multicore processor includes multiple execution
    units ("cores") on the same chip.
  • It can issue multiple instructions per cycle from
    multiple instruction streams. Each core in a
    multicore processor can potentially be superscalar
    as well.
  • Simultaneous multithreading has only one
    execution unit, but when that unit is idling
    (such as during a cache miss), it can process a
    second thread. IBM's Cell microprocessor,
    designed for use in the Sony PlayStation 3, is a
    prominent multicore processor.

Symmetric multiprocessing
  • A computer system with multiple identical
    processors that share memory and connect via a
    bus.
  • Bus contention prevents bus architectures from
    scaling. As a result, SMPs generally do not
    comprise more than 32 processors.
  • Because of the small size of the processors and
    the significant reduction in bus bandwidth
    requirements achieved by large caches, such
    symmetric multiprocessors are extremely
    cost-effective.

Distributed computing
  • A distributed memory computer system in which the
    processing elements are connected by a network.
  • Highly scalable.

(a), (b): a distributed system. (c): a parallel
system.
Specialized parallel computers
  • Within parallel computing, there are specialized
    parallel devices that tend to be applicable to
    only a few classes of parallel problems.
  • Reconfigurable computing
  • General-purpose computing on graphics processing
    units (GPGPU)
  • Application-specific integrated circuits
  • Vector processors
