Transcript and Presenter's Notes

Title: Tuesday, September 04, 2006


1
Tuesday, September 04, 2006
  • I hear and I forget,
  • I see and I remember,
  • I do and I understand.
  • -Chinese Proverb

2
Today
  • Course Overview.
  • Why Parallel Computing?
  • Evolution of Parallel Systems.

3
CS 524 High Performance Computing
  • Course URL
  • http://suraj.lums.edu.pk/~cs524a06
  • Folder on indus
  • \\indus\Common\cs524a06
  • Website (check regularly): course announcements,
    office hours, slides, resources, policies
  • Course Outline

4
  • Several programming exercises will be given
    throughout the course. Assignments will include
    popular programming models for shared memory and
    message passing such as OpenMP and MPI.
  • The development environment will be C/C++ on
    UNIX (a minimal OpenMP sketch follows below).
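
A minimal sketch of the kind of shared-memory exercise described above, assuming a UNIX system with an OpenMP-capable C compiler (gcc -fopenmp); the problem size and the sum computed here are illustrative only:

  /* omp_sum.c - parallel reduction with OpenMP (illustrative sketch).
     Build: gcc -fopenmp omp_sum.c -o omp_sum */
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      const int n = 1000000;          /* illustrative problem size */
      double sum = 0.0;

      /* Iterations are divided among threads; the reduction clause
         combines each thread's partial sum without a data race. */
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < n; i++)
          sum += 1.0 / (i + 1);

      printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
      return 0;
  }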

5
Pre-requisites
  • Computer Organization & Assembly Language (CS
    223)
  • Data Structures & Algorithms (CS 213)
  • Senior level standing.
  • Operating Systems?

6
  • Five-minute rule.

7
Hunger For More Power!
8
Hunger For More Power!
  • Endless quest for more and more computing power.
  • However much computing power there is, it is
    never enough.

9
Why this need for greater computational power?
  • Science, engineering, business, entertainment,
    etc. all provide the impetus.
  • Scientists observe, theorize, test through
    experimentation.
  • Engineers design, test prototypes, build.

10
HPC offers a new way to do science
  • Computation is used to approximate physical
    systems. Advantages include
  • Varying simulation parameters to study
    emergent trends
  • Replaying a particular simulation event
  • Studying systems where no exact theories exist

11
Why Turn to Simulation?
  • When the problem is too...
  • Complex
  • Large
  • Expensive
  • Dangerous

12
Why this need for greater computational power?
  • Less expensive to carry out computer simulations.
  • Able to simulate phenomena that cannot be
    studied by experimentation, e.g. the evolution of
    the universe.

13
Why this need for greater computational power?
  • Problems such as
  • Weather prediction,
  • Aeronautics (airflow analysis, structural
    mechanics, engine efficiency, etc.),
  • Simulating the world economy,
  • Pharmaceuticals (molecular modeling),
  • Understanding drug-receptor interactions in the
    brain,
  • Automotive crash simulation
  • are all computationally intensive.
  • The more knowledge we acquire, the more complex
    our questions become.

14
Why this need for greater computational power?
  • In 1995, the first full-length computer-animated
    motion picture, Toy Story, was produced on a
    parallel system composed of hundreds of Sun
    workstations.
  • Decreased cost
  • Decreased time (several months on several hundred
    processors)

15
Why this need for greater computational power?
  • Commercial computing has also come to rely on
    parallel architectures.
  • Computer system speed and capacity scale with the
    size of the business.
  • OLTP (online transaction processing) benchmarks
    represent the relation between performance and
    scale of business.
  • They rate the performance of a system in terms of
    its throughput in transactions per minute.

16
Why this need for greater computational power?
  • Vendors supplying database hardware or software
    offer multiprocessor systems that provide
    performance substantially greater than
    uniprocessor products.

17
  • One solution in the past: make the clock run
    faster.
  • The advance of VLSI technology allowed clock
    rates to increase and larger numbers of components
    to fit on a chip.
  • However, there are limits
  • Electrical signals cannot propagate faster than
    the speed of light: 30 cm/ns in vacuum and
    20 cm/ns in copper wire or optical fiber.

18
  • Electrical signals cannot propagate faster than
    the speed of light: 30 cm/ns in vacuum and
    20 cm/ns in copper wire or optical fiber.
  • 10 GHz clock: signal path length 2 cm in total
  • 100 GHz clock: 2 mm
  • A 1 THz (1000 GHz) computer would have to be
    smaller than 100 microns if the signal is to travel
    from one end to the other and back within a single
    clock cycle. (A quick check of these numbers
    appears below.)
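
A back-of-the-envelope check of these figures (a sketch; the 20 cm/ns copper figure comes from the slide above):

  /* signal_reach.c - distance a signal travelling at 20 cm/ns
     covers within one clock cycle, at several clock rates. */
  #include <stdio.h>

  int main(void)
  {
      const double v_cm_per_ns = 20.0;    /* in copper, per the slide */
      const double freq_ghz[] = { 10.0, 100.0, 1000.0 };

      for (int i = 0; i < 3; i++) {
          double period_ns = 1.0 / freq_ghz[i];       /* clock period */
          double reach_cm  = v_cm_per_ns * period_ns; /* path per cycle */
          printf("%6.0f GHz: period %.4f ns, max path %.4f cm\n",
                 freq_ghz[i], period_ns, reach_cm);
      }
      return 0;   /* prints 2 cm, 0.2 cm (2 mm), 0.02 cm (200 um) */
  }

The 100-micron figure at 1 THz follows because the 0.02 cm (200 micron) budget must cover the round trip.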

19
  • Another fundamental problem: heat dissipation.
  • The faster a computer runs, the more heat it
    generates.
  • On high-end Pentium systems, the CPU cooling
    system is bigger than the CPU itself.

20
Evolution of Parallel Architecture
  • A new dimension added to the design space: the
    number of processors.
  • Driven by demand for performance at acceptable
    cost.

21
Evolution of Parallel Architecture
  • Advances in hardware capability enable new
    application functionality, which places a greater
    demand on the architecture.
  • This cycle drives the ongoing design, engineering
    and manufacturing effort.

22
Evolution of Parallel Architecture
  • Microprocessor performance has been improving at
    a rate of about 50% per year.
  • A parallel machine of a hundred processors can be
    viewed as providing applications with the computing
    power that a single processor will offer in 10
    years' time.
  • 1000 processors: a 20-year horizon.
  • The advantages of using small, inexpensive,
    mass-produced processors as building blocks for
    computer systems are clear.

23
Technology trends
  • With technological advances, transistors, gates,
    etc. have been getting smaller and faster.
  • More can fit in the same area.
  • Processors are getting faster by making more
    effective use of an ever larger volume of computing
    resources.
  • Possibilities
  • Place more of the computer system on the chip,
    including memory and I/O (a building block for
    parallel architectures: system-on-a-chip).
  • Or place multiple processors on the chip (parallel
    architecture in the single-chip regime).

24
Microprocessor Design Trends
  • Technology determines what is possible.
  • Architecture translates the potential of
    technology into performance.
  • Parallelism is fundamental to conventional
    computer architecture.
  • Current architectural trends are leading to
    multiprocessor designs.

25
Bit-level Parallelism
  • From 1970 to 1986, advances were in bit-level
    parallelism
  • 4-bit, 8-bit, 16-bit, and so on
  • Doubling the data path width reduces the number of
    cycles required to perform an operation (see the
    sketch below).
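
To see why a wider data path saves cycles, here is an illustrative sketch (not from the slides): a 32-bit addition done with 16-bit operations needs two add steps plus carry propagation, where a 32-bit ALU needs one.

  /* add32_on_16.c - emulate one 32-bit add with 16-bit operations,
     showing why doubling the data path width halves the step count. */
  #include <stdio.h>
  #include <stdint.h>

  static uint32_t add32_via_16(uint32_t a, uint32_t b)
  {
      uint16_t a_lo = a & 0xFFFF, a_hi = a >> 16;
      uint16_t b_lo = b & 0xFFFF, b_hi = b >> 16;

      uint32_t lo    = (uint32_t)a_lo + b_lo;          /* step 1: low halves  */
      uint32_t carry = lo >> 16;                       /* carry out of step 1 */
      uint32_t hi    = (uint32_t)a_hi + b_hi + carry;  /* step 2: high halves */

      return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF);
  }

  int main(void)
  {
      uint32_t a = 0x0001FFFF, b = 0x00000001;
      printf("%#010x (expect %#010x)\n",
             (unsigned)add32_via_16(a, b), (unsigned)(a + b));
      return 0;
  }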

26
Instruction-level Parallelism
  • Mid-1980s to mid-1990s
  • Performing portions of several machine
    instructions concurrently.
  • Pipelining (also a kind of parallelism)
  • Fetching multiple instructions at a time and
    issuing them in parallel to distinct functional
    units (superscalar)

27
Instruction-level Parallelism
  • However
  • Instruction-level parallelism is worthwhile only
    if the processor can be supplied with instructions
    and data fast enough.
  • The gap between processor cycle time and memory
    cycle time has grown wider.
  • To satisfy increasing bandwidth requirements,
    larger and larger caches are placed on the chip
    with the processor.
  • Limits: cache misses and control transfers
    (branches).
28
  • In the mid-1970s, the introduction of vector
    processors marked the beginning of modern
    supercomputing
  • They perform operations on sequences of data
    elements rather than on individual scalar data
    (see the sketch below)
  • They offered an advantage of at least one order of
    magnitude over conventional systems of that time.
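
The canonical example of such a sequence operation is SAXPY, y = a*x + y over whole arrays (a sketch; on vector hardware the loop body maps to vector instructions, and modern compilers auto-vectorize loops of this shape with SIMD):

  /* saxpy.c - the element-wise sequence operation that vector
     processors execute as one operation over a whole sequence. */
  #include <stdio.h>

  #define N 8

  static void saxpy(int n, float a, const float *x, float *y)
  {
      for (int i = 0; i < n; i++)   /* one logical operation on a sequence */
          y[i] = a * x[i] + y[i];
  }

  int main(void)
  {
      float x[N], y[N];
      for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }
      saxpy(N, 2.0f, x, y);
      for (int i = 0; i < N; i++)
          printf("%.1f ", y[i]);    /* 1.0 3.0 5.0 ... 15.0 */
      printf("\n");
      return 0;
  }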

29
  • In the late 1980s, a new generation of systems
    came on the market: microprocessor-based
    supercomputers that initially provided about 100
    processors, increasing to roughly 1000 by 1990.
  • These aggregations of processors are known as
    massively parallel processors (MPPs).

30
  • Factors behind the emergence of MPPs
  • Increase in performance of standard
    microprocessors
  • Cost advantage
  • Use of off-the-shelf microprocessors instead
    of custom processors
  • Fostered by government programs for scalable
    parallel computing using distributed memory

31
  • MPPs claimed to equal or surpass the performance
    of vector multiprocessors.
  • Top500
  • Lists the sites that have the 500 most powerful
    installed computer systems.
  • LINPACK benchmark
  • The most widely used metric of performance on
    numerical applications
  • A collection of Fortran subroutines that analyze
    and solve linear equations and linear least-squares
    problems

32
  • Top500 (updated twice a year since June 1993)
  • In the first Top500 list there were already 156
    MPP and SIMD systems (around one third)

33
Some memory-related issues
  • Time to access memory has not kept pace with CPU
    clock speeds.
  • SRAM
  • Each bit is stored in a latch made up of
    transistors
  • Faster than DRAM, but less dense and requires
    more power
  • DRAM
  • Each bit is stored as a charge on a capacitor
  • A 1 GHz CPU can execute 60 instructions in the
    time a typical 60 ns DRAM takes to return a single
    byte (60 ns at one cycle per ns is 60 cycles).

34
Some memory-related issues
  • Memory hierarchy
  • Cache memories
  • Temporal locality
  • Cache lines (64, 128, or 256 bytes); the sketch
    below shows why line-sized fetches reward
    sequential access.
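
A sketch of why cache lines reward sequential access (the matrix size is arbitrary): a row-order walk touches every byte of each fetched line, while a column-order walk of the same data jumps a full row ahead on each access and wastes most of each line.

  /* locality.c - row-major vs column-major traversal of one matrix.
     Row order reads memory sequentially, so each 64-256 byte cache
     line is fetched once and fully used; column order strides N
     doubles per access, touching a new line almost every time. */
  #include <stdio.h>

  #define N 1024
  static double m[N][N];

  int main(void)
  {
      double sum = 0.0;

      for (int i = 0; i < N; i++)       /* cache-friendly: row by row */
          for (int j = 0; j < N; j++)
              sum += m[i][j];

      for (int j = 0; j < N; j++)       /* cache-hostile: column by column */
          for (int i = 0; i < N; i++)
              sum += m[i][j];

      printf("%f\n", sum);  /* use the result so the loops are kept */
      return 0;
  }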

35
Parallel Architectures: Memory Parallelism
  • One way to increase performance is to replicate
    computers.
  • The major choice is between shared memory and
    distributed memory (a minimal message-passing
    sketch follows below).
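
For the distributed-memory side, a minimal message-passing sketch (assuming an MPI installation such as MPICH or Open MPI; compile with mpicc and launch with mpirun):

  /* mpi_ping.c - every non-root process sends its rank to process 0.
     Build: mpicc mpi_ping.c -o mpi_ping
     Run:   mpirun -np 4 ./mpi_ping */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id    */
      MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count  */

      if (rank == 0) {
          for (int src = 1; src < size; src++) {
              int msg;
              MPI_Recv(&msg, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              printf("root received rank %d\n", msg);
          }
      } else {
          MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }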

36
Memory Parallelism
  • In the mid-1980s, when the 32-bit microprocessor
    was first introduced, computers containing multiple
    microprocessors sharing a common memory became
    prevalent.
  • In most of these designs, all processors plug into
    a common bus.
  • However, only a small number of processors can be
    supported by a bus.

37
UMA bus-based SMP architecture
  • If the bus is busy when a CPU wants to read or
    write memory, the CPU waits for the bus to become
    idle.
  • Bus contention is manageable for a small number of
    processors only.
  • Beyond that, the system is limited by the
    bandwidth of the bus, and most of the CPUs will be
    idle most of the time.

38
UMA bus-based SMP architecture
  • One way to alleviate this problem is to add a
    cache to each CPU.
  • If most reads can be satisfied from the cache,
    there is less bus traffic and the system can
    support more CPUs.
  • A single bus limits a UMA multiprocessor to about
    16-32 CPUs.

39
SMP
  • SMP (symmetric multiprocessor)
  • A shared-memory multiprocessor where the cost of
    accessing a memory location is the same for all
    processors.