Lecture 1: Introduction to High Performance Computing

Transcript and Presenter's Notes
1
Lecture 1: Introduction to High Performance Computing
2
Grand Challenge Problem
  • A grand challenge problem is one that cannot be
    solved in a reasonable amount of time with
    today's computers.

3
Weather Forecasting
  • Cells of size 1 mile x 1 mile x 1 mile
  • → Whole global atmosphere: about 5 x 10^8 cells
  • If each calculation requires 200 Flops
  • → 10^11 Flops in one time step
  • To forecast the weather over 7 days using
    1-minute intervals, with a computer operating at
    100 Mflop/s (10^8 Flop/s)
  • → would take about 10^7 seconds, or over 100 days
  • To perform the calculation in 10 minutes would
    require a computer operating at 1.7 Tflop/s
    (1.7 x 10^12 Flop/s)
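A small C sketch of the arithmetic above; all inputs (cell count, flops
per cell, interval length, machine speeds) are the slide's assumptions:

    /* Back-of-the-envelope cost of the weather-forecasting example. */
    #include <stdio.h>

    int main(void) {
        double flops_per_step = 5e8 * 200.0;     /* 5e8 cells x 200 flops = 1e11 */
        double steps = 7.0 * 24.0 * 60.0;        /* 7-day forecast, 1-minute steps */
        double total = flops_per_step * steps;   /* ~1e15 flops for the whole run */

        double slow = 1e8;                       /* 100 Mflop/s machine */
        printf("serial run time: %.1e s (about %.0f days)\n",
               total / slow, total / slow / 86400.0);

        double deadline = 600.0;                 /* finish in 10 minutes */
        printf("rate needed: %.1e flop/s (about 1.7 Tflop/s)\n",
               total / deadline);
        return 0;
    }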

4
Some Grand Challenge Applications
  • Science
    • Global climate modeling
    • Astrophysical modeling
    • Biology: genomics, protein folding, drug design
    • Computational Chemistry
    • Computational Material Sciences and Nanosciences
  • Engineering
    • Crash simulation
    • Semiconductor design
    • Earthquake and structural modeling
    • Computational fluid dynamics (airplane design)
    • Combustion (engine design)
  • Business
    • Financial and economic modeling
    • Transaction processing, web services and search
      engines
  • Defense
    • Nuclear weapons -- test by simulations
    • Cryptography

5
Units of High Performance Computing
  • Speed
  • 1 Mflop/s = 1 Megaflop/s = 10^6 Flop/second
  • 1 Gflop/s = 1 Gigaflop/s = 10^9 Flop/second
  • 1 Tflop/s = 1 Teraflop/s = 10^12 Flop/second
  • 1 Pflop/s = 1 Petaflop/s = 10^15 Flop/second
  • Capacity
  • 1 MB = 1 Megabyte = 10^6 Bytes
  • 1 GB = 1 Gigabyte = 10^9 Bytes
  • 1 TB = 1 Terabyte = 10^12 Bytes
  • 1 PB = 1 Petabyte = 10^15 Bytes

6
Moore's Law
  • Gordon Moore (co-founder of Intel) predicted in
    1965 that the transistor density of semiconductor
    chips would double roughly every 18 months.

7
Moore's Law also holds for performance and capacity:

                                       1945 (ENIAC)    2002 (Laptop)
  Number of vacuum tubes/transistors   18 000          6 000 000 000
  Weight (kg)                          27 200          0.9
  Size (m^3)                           68              0.0028
  Power (watts)                        20 000          60
  Cost ($)                             4 630 000       1 000
  Memory (bytes)                       200             1 073 741 824
  Performance (Flop/s)                 800             5 000 000 000
8
Peak Performance
  • A contemporary RISC processor delivers about 10%
    of its peak performance
  • Two primary reasons behind this low efficiency:
  • IPC inefficiency
  • Memory inefficiency

9
Instructions per cycle (IPC) inefficiency
  • Today the theoretical IPC is 4-6
  • Detailed analysis for a spectrum of applications
    indicates that the average achieved IPC is only
    1.2-1.4
  • i.e., roughly 75% of the potential performance is
    not used

10
Reasons for IPC inefficiency
  • Latency
  • Waiting for access to memory or other parts of
    the system
  • Overhead
  • Extra work that has to be done to manage program
    concurrency and parallel resources, rather than the
    real work you want to perform
  • Starvation
  • Not enough work to do due to insufficient
    parallelism or poor load balancing among
    distributed resources
  • Contention
  • Delays due to fighting over what task gets to use
    a shared resource next. Network bandwidth is a
    major constraint

11
Memory Hierarchy
12
Processor-Memory Problem
  • Processors issue instructions roughly every
    nanosecond
  • DRAM can be accessed roughly every 100
    nanoseconds
  • The gap is growing
  • processors are getting faster by about 60% per year
  • DRAM is getting faster by only about 7% per year
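A minimal sketch of how that gap compounds, using the slide's growth
rates (processor +60% per year, DRAM +7% per year) as assumptions:

    /* Relative processor vs. DRAM speed after n years, starting equal. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        for (int year = 0; year <= 10; year += 2) {
            double gap = pow(1.60, year) / pow(1.07, year);  /* (1.60/1.07)^year */
            printf("after %2d years: processor/DRAM speed ratio ~ %5.1fx\n",
                   year, gap);
        }
        return 0;
    }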

13
Processor-Memory Problem
14
How fast can a serial computer be?
  • Consider a 1 Tflop/s sequential machine:
  • data must travel some distance, r, to get from
    memory to the CPU
  • to get 1 data element per cycle, data must travel
    10^12 times per second at the speed of light,
    c = 3 x 10^8 m/s
  • so r < c / 10^12 = 0.3 mm
  • To fit 1 TB of storage in that 0.3 mm x 0.3 mm area:
  • each word would occupy a square about 3 Angstroms
    on a side, the size of a small atom
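A sketch of that arithmetic, reading "1 TB" as 10^12 words (an assumption
implicit in the slide's figures):

    /* The speed-of-light argument for a 1 Tflop/s serial machine. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double c = 3e8;                    /* speed of light, m/s */
        double rate = 1e12;                /* one memory access per cycle at 1 THz */
        double r = c / rate;               /* farthest memory can sit from the CPU */
        printf("r = %.1e m = %.1f mm\n", r, r * 1e3);

        double area = r * r;               /* 0.3 mm x 0.3 mm, in m^2 */
        double words = 1e12;               /* 1 TB of storage read as 1e12 words */
        double side = sqrt(area / words);  /* side of the square holding one word */
        printf("each word: %.1f x %.1f Angstroms\n",
               side * 1e10, side * 1e10);  /* 1 Angstrom = 1e-10 m */
        return 0;
    }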

15
  • So, we need Parallel Computing!

16
High Performance Computers
  • In the 1980s:
  • 1 x 10^6 Floating Point Ops/sec (Mflop/s)
  • Scalar based
  • In the 1990s:
  • 1 x 10^9 Floating Point Ops/sec (Gflop/s)
  • Vector, shared-memory computing
  • Today:
  • 1 x 10^12 Floating Point Ops/sec (Tflop/s)
  • Highly parallel, distributed processing, message
    passing

17
What is a Supercomputer?
  • A supercomputer is a hardware and software
    system that provides close to the maximum
    performance that can currently be achieved

18
Top500 Computers
  • Over the last 10 years the performance range of the
    Top500 has grown faster than Moore's law
  • 1993:
  • #1: 59.7 GFlop/s
  • #500: 422 MFlop/s
  • 2004:
  • #1: 70 TFlop/s
  • #500: 850 GFlop/s
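A quick check of that claim using the #1 entries above; the 18-month
doubling period is the usual reading of Moore's law:

    /* Top500 #1 growth (1993-2004) vs. Moore's-law doubling every 18 months. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double years = 2004 - 1993;               /* 11 years */
        double top1  = 70e12 / 59.7e9;            /* 70 TFlop/s vs. 59.7 GFlop/s */
        double moore = pow(2.0, years / 1.5);     /* doubling every 18 months */
        printf("Top500 #1 grew by    ~%4.0fx\n", top1);   /* ~1170x */
        printf("Moore's law predicts ~%4.0fx\n", moore);  /* ~160x */
        return 0;
    }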

19
Top500 List, June 2005

  #  Manufacturer  Computer    Installation Site                Country  Year  Rmax (Tflop/s)  Processors
  1  IBM           BlueGene/L  LLNL                             USA      2005  136.8           65536
  2  IBM           BlueGene/L  IBM Watson Research Center       USA      2005  91.3            40960
  3  SGI           Altix       NASA                             USA      2004  51.9            10160
  4  NEC           Vector      Earth Simulator Center           Japan    2002  35.9            5120
  5  IBM           Cluster     Barcelona Supercomputing Center  Spain    2005  27.9            4800
20
Performance Development
21
Increasing CPU Performance
  • Manycore Chip
  • Composed of hybrid cores
  • Some general purpose
  • Some graphics
  • Some floating point

22
What is Next?
  • Board composed of multiple manycore chips sharing
    memory
  • Rack composed of multiple boards
  • A room full of these racks
  • → Millions of cores
  • → Exascale systems (10^18 Flop/s)

23
Moore's Law Reinterpreted
  • Number of cores per chip doubles every 2 years,
    while clock speed decreases (not increases)
  • Need to deal with systems with millions of
    concurrent threads
  • Number of threads of execution doubles every 2
    years

24
Performance Projection
25
Directions
  • Move toward shared memory
  • SMPs and Distributed Shared Memory
  • Shared address space with deep memory hierarchy
  • Clustering of shared memory machines for
    scalability
  • Efficiency of message passing and data parallel
    programming
  • MPI and HPF
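For reference, a minimal message-passing sketch using MPI; this example is
illustrative rather than from the lecture, assumes an MPI installation, and
would be built with mpicc and run with, e.g., mpirun -np 2:

    /* Rank 0 sends one integer to rank 1 over MPI_COMM_WORLD. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;
            /* blocking send to rank 1, message tag 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int value;
            /* blocking receive from rank 0, message tag 0 */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }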

26
Future of HPC
  • Yesterday's HPC is today's mainframe is
    tomorrow's workstation