1
Computational Grids
2
Computational Problems
  • Problems that have lots of computations and
    usually lots of data.

3
Demand for Computational Speed
  • Continual demand for greater computational speed
    from a computer system than is currently possible
  • Areas requiring great computational speed include
    numerical modeling and simulation of scientific
    and engineering problems.
  • Computations must be completed within a
    reasonable time period.

4
Grand Challenge Problems
  • One that cannot be solved in a reasonable amount
    of time with today's computers. Obviously, an
    execution time of 10 years is always unreasonable.
  • Examples
  • Modeling large DNA structures
  • Global weather forecasting
  • Modeling motion of astronomical bodies.

5
Weather Forecasting
  • Atmosphere modeled by dividing it into
    3-dimensional cells.
  • Calculations of each cell repeated many times to
    model passage of time.

6
Global Weather Forecasting Example
  • Suppose the global atmosphere is divided into cells of
    size 1 mile x 1 mile x 1 mile to a height of 10
    miles - about 5 x 10^8 cells.
  • Suppose each calculation requires 200 floating
    point operations. In one time step, about 10^11
    floating point operations are necessary.
  • To forecast the weather over a 7-day period using
    1-minute intervals (about 10^4 time steps), a computer
    operating at 1 Gflops (10^9 floating point operations/s)
    takes about 10^6 seconds, or over 10 days (a rough
    check of the arithmetic follows below).
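A rough check of this arithmetic, using only the figures quoted above:

    operations per time step ≈ 5 x 10^8 cells x 200 flops ≈ 10^11 flops
    time steps in 7 days at 1-minute intervals = 7 x 24 x 60 ≈ 10^4
    total work ≈ 10^11 x 10^4 ≈ 10^15 flops
    time at 10^9 flops/s ≈ 10^15 / 10^9 = 10^6 s ≈ 11.6 days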

7
Modeling Motion of Astronomical Bodies
  • Each body attracted to each other body by
    gravitational forces. Movement of each body
    predicted by calculating total force on each
    body.
  • With N bodies, N - 1 forces to calculate for each
    body, or N^2 calculations.
  • (N log2 N for an efficient approximate
    algorithm.)
  • After determining new positions of bodies,
    calculations repeated.

8
  • A galaxy might have, say, 10^11 stars.
  • Even if each calculation done in 1 ms (extremely
    optimistic figure), it takes almost a year for
    one iteration using the N log2 N algorithm.
  • 100 years for 100 iterations. Typically require
    millions of iterations.

9
  • Astrophysical N-body simulation by Scott Linssen
    (undergraduate UNC-Charlotte student).

10
High Performance Computing (HPC)
  • Traditionally achieved by using multiple
    computers together - parallel computing.
  • Simple idea! -- Using multiple computers (or
    processors) simultaneously should solve the
    problem faster than a single computer can.

11
Using multiple computers or processors
  • Key concept - dividing problem into parts that
    can be computed simultaneously.
  • Parallel programming - programming a computing
    platform consisting of more than one processor or
    computer.
  • Concept very old (50 years).

12
High Performance Computing
  • Long History
  • Multiprocessor systems of various types (1950s
    onwards)
  • Supercomputers (1960s-80s)
  • Cluster computing (1990s)
  • Grid computing (2000s) ??

Maybe, but let's first look at how to achieve HPC.
13
Speedup Factor
  • Speedup factor S(p) = ts / tp, where
  • ts is execution time on a single processor, and
  • tp is execution time on a multiprocessor with p
    processors.
  • S(p) gives the increase in speed obtained by using
    the multiprocessor.
  • Note: ts is the time of the best sequential
    algorithm for a single processor; the parallel
    algorithm is usually different.

14
Maximum Speedup
  • Maximum speedup is usually p with p processors
    (linear speedup).
  • Possible to get superlinear speedup (greater than
    p), but usually there is a specific reason, such as
  • Extra memory in the multiprocessor system
  • A non-deterministic algorithm

15
Maximum Speedup: Amdahl's Law
16
  • Speedup factor is given by S(p) = p / (1 + (p - 1)f),
    where f is the fraction of the computation that must
    be performed serially.
  • This equation is known as Amdahl's law.

17
Speedup against number of processors
  • Even with an infinite number of processors, maximum
    speedup is limited to 1/f.
  • Example: with only 5% of the computation being
    serial, maximum speedup is 20, irrespective of the
    number of processors (see the worked figures below).
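A quick check with the formula above, taking f = 0.05 (5% serial):

    S(16)  = 16  / (1 + 15 x 0.05)  = 16 / 1.75   ≈ 9.1
    S(256) = 256 / (1 + 255 x 0.05) = 256 / 13.75 ≈ 18.6
    S(p)  -> 1 / f = 20 as p -> infinity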

18
Superlinear Speedup Example: Searching
  • (a) Searching each sub-space sequentially

19
  • (b) Searching each sub-space in parallel

20
  • Question
  • What is the speed-up now?

21
  • Worst case for sequential search is when the solution
    is found in the last sub-space searched. Then the
    parallel version offers the greatest benefit, i.e. as
    sketched below.
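The expression on the original slide is not in the transcript; a sketch of the usual argument, assuming the search space is divided into p sub-spaces, ts is the time to search the whole space sequentially, and Δt is the short time spent in the sub-space that actually holds the solution:

    sequential time ≈ (p - 1) x (ts / p) + Δt
    parallel time   ≈ Δt
    S(p) ≈ ((p - 1) x ts / p + Δt) / Δt, which grows without bound as Δt -> 0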

22
  • Least advantage for the parallel version is when the
    solution is found in the first sub-space searched by
    the sequential version, i.e. the speed-up is
    approximately 1.
  • Actual speed-up depends upon which subspace holds the
    solution but could be extremely large.

23
Types of Parallel Computers
  • Two principal types
  • 1. Single computer containing multiple processors
    - main memory is shared, hence called Shared
    memory multiprocessor
  • 2. Multiple computer system

24
Conventional Computer
  • Consists of a processor executing a program
    stored in a (main) memory
  • Each main memory location located by its address
    within a single memory space.

25
Shared Memory Multiprocessor
  • Extend single processor model - multiple
    processors connected to multiple memory modules
  • Each processor can access any memory module

26
  • Examples
  • Dual Pentiums
  • Quad Pentiums

27
Programming Shared Memory Multiprocessors
  • Threads - programmer decomposes the program into
    parallel sequences (threads), each able to access
    variables declared outside the threads.
    Example: Pthreads
  • Use a sequential programming language with
    preprocessor compiler directives, constructs, or
    syntax to declare shared variables and specify
    parallelism. Examples: OpenMP (an industry
    standard), UPC (Unified Parallel C) -- both need
    compiler support. (A minimal OpenMP sketch follows
    below.)
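For illustration only (not from the slides), a minimal OpenMP sketch of the directive-based approach: a compiler directive marks the loop as parallel, and a reduction clause combines the shared result.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int i, n = 1000000;
        double sum = 0.0;

        /* The pragma asks the compiler to run loop iterations on  */
        /* multiple threads; reduction(+:sum) gives each thread a  */
        /* private partial sum and adds them together at the end.  */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += 1.0 / (i + 1);

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

Compiled with an OpenMP-aware compiler, e.g. gcc -fopenmp.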

28
  • Parallel programming language with syntax to
    express parallelism. Compiler creates executable
    code -- no longer common.
  • Use a parallelizing compiler to convert regular
    sequential language programs into parallel
    executable code -- also no longer common.

29
Multiple Computers: Message-passing multicomputer
  • Complete computers connected through an
    interconnection network

30
Networked Computers as a Computing Platform
  • Became a very attractive alternative to expensive
    supercomputers and parallel computer systems for
    high-performance computing in 1990s.
  • Several early projects. Notable examples:
  • Berkeley NOW (network of workstations)
    project.
  • NASA Beowulf project.

31
Key Hardware Advantages
  • Very high performance workstations and PCs
    readily available at low cost.
  • Latest processors can easily be incorporated into
    the system as they become available.

32
Programming Clusters
  • Usually based upon explicit message-passing.
  • Common approach -- a set of user-level libraries
    for message passing. Examples (a minimal MPI
    program is sketched below):
  • Parallel Virtual Machine (PVM) - late 1980s.
    Became very popular in mid 1990s.
  • Message-Passing Interface (MPI) - standard
    defined in the 1990s and now dominant.
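As a concrete illustration (a sketch, not taken from the slides), the smallest possible MPI program in C: every process finds its rank and the total number of processes.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process        */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }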

33
Beowulf Clusters
  • Name given to a group of interconnected
    commodity computers designed to achieve high
    performance with low cost.
  • Typically using commodity interconnects
    (high-speed Ethernet).
  • Typically Linux OS.
  • The name comes from the NASA Goddard Space
    Flight Center cluster project.

34
Cluster Interconnects
  • Originally fast Ethernet on low cost clusters
  • Gigabit Ethernet - easy upgrade path
  • More Specialized/Higher Performance
  • Myrinet - 2.4 Gbits/sec - disadvantage: single
    vendor
  • InfiniBand - may be important, as InfiniBand
    interfaces may be integrated on next-generation
    PCs

35
Dedicated cluster with a master node
36
WCU Department of Mathematics and CS leo I
cluster (now dismantled).
Being replaced with Pentium IVs and Gigabit
Ethernet.
37
Message-Passing Programming using User-level
Message Passing Libraries
  • Two primary mechanisms needed:
  • 1. A method of creating separate processes for
    execution on different computers (see the launch
    example below)
  • 2. A method of sending and receiving messages
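With MPI (an illustrative assumption; the slides do not prescribe a particular library), mechanism 1 is usually handled outside the program: a launcher starts the same executable as a set of processes on the chosen computers, for example:

    mpirun -np 8 ./my_program    # start 8 processes running my_program (hypothetical
                                 # program name; the exact launch command depends on
                                 # the MPI implementation)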

38
Multiple Program, Multiple Data Model (MPMD)
39
Single Program, Multiple Data Model (SPMD)
  • Different processes merged into one program.
  • Control statements select different parts for
    each processor to execute (as sketched below).
  • All executables started together - static process
    creation
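A hedged SPMD sketch in C with MPI (illustrative; master() and worker() are hypothetical routines, not from the slides): the same program runs on every processor, and control statements based on the process rank choose which part executes.

    #include <stdio.h>
    #include <mpi.h>

    static void master(void)     { printf("master: coordinating the work\n"); }
    static void worker(int rank) { printf("worker %d: computing a share\n", rank); }

    int main(int argc, char *argv[]) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* One executable everywhere; the rank decides the role. */
        if (rank == 0)
            master();
        else
            worker(rank);

        MPI_Finalize();
        return 0;
    }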

40
Single Program, Multiple Data Model (SPMD)
41
Multiple Program, Multiple Data Model (MPMD)
  • Separate programs for each processor.
  • One processor executes master process.
  • Other processes started from within master
    process - dynamic process creation.

42
Multiple Program, Multiple Data Model (MPMD)
43
Point-to-point send and receive routines
Passing a message between processes using send()
and recv() library calls
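A minimal MPI rendering of this pattern (a sketch; the generic send()/recv() calls of the slides correspond roughly to MPI_Send()/MPI_Recv()):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, x = 42;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Process 0 sends one int to process 1 with message tag 0. */
            MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Process 1 receives it into its own copy of x. */
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("process 1 received %d\n", x);
        }

        MPI_Finalize();
        return 0;
    }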
44
Synchronous Message Passing
  • Routines that return when message transfer
    completed.
  • Synchronous send routine
  • Waits until complete message can be accepted by
    the receiving process before sending the message.
  • Synchronous receive routine
  • Waits until the message it is expecting arrives.

45
Synchronous send() and recv() using 3-way protocol
46
  • Synchronous routines intrinsically perform two
    actions
  • They transfer data and
  • They synchronize processes.

47
Asynchronous Message Passing
  • Do not wait for actions to complete before
    returning.
  • More than one version depending upon semantics
    for returning.
  • Usually require local storage for messages.
  • They do not synchronize processes and allow
    processes to move forward sooner. Must be used
    with care.

48
MPI Definitions of Blocking and Non-Blocking
  • Blocking - routines return after their local actions
    complete, though the message transfer may not have
    been completed.
  • Non-blocking - routines return immediately.
  • It is assumed that the data storage used for the
    transfer is not modified by subsequent statements
    before the transfer completes, and it is left to the
    programmer to ensure this (see the sketch below).
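An illustrative sketch (not from the slides) of MPI's non-blocking routines: MPI_Isend()/MPI_Irecv() return immediately, and MPI_Wait() is called before the buffers are reused or read.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, x = 7, y = 0;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); /* returns at once */
            /* ... other computation that must not modify x ... */
            MPI_Wait(&req, &status);   /* after this, x may safely be reused */
        } else if (rank == 1) {
            MPI_Irecv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req); /* returns at once */
            MPI_Wait(&req, &status);   /* y is valid only after the wait     */
            printf("received %d\n", y);
        }

        MPI_Finalize();
        return 0;
    }

MPI also provides MPI_Ssend(), a synchronous send that does not complete until the matching receive has started.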

49
How message-passing routines return before
transfer completed
Message buffer needed between source and
destination to hold message
50
Asynchronous (blocking) routines changing to
synchronous routines
  • Buffers are only of finite length, and a point could
    be reached when a send routine is held up because all
    available buffer space is exhausted.
  • Then the send routine will wait until storage
    becomes available again - i.e. the routine then
    behaves as a synchronous routine.

51
Message Tag
  • Used to differentiate between different types of
    messages being sent.
  • Message tag is carried within message.

52
Message Tag Example
To send data x with message tag 5 from source
process 1 to destination process 2, and assign it
to y (the calls shown on the original slide are
sketched below in MPI form):
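The exact calls on the slide are not in the transcript; an MPI rendering (an assumption about the form, with illustrative declarations) would be:

    int x = 123, y;      /* x lives in process 1, y in process 2 (illustrative) */
    MPI_Status status;

    /* In source process 1: send x to destination process 2 with message tag 5. */
    MPI_Send(&x, 1, MPI_INT, 2, 5, MPI_COMM_WORLD);

    /* In destination process 2: receive a tag-5 message from process 1 into y. */
    MPI_Recv(&y, 1, MPI_INT, 1, 5, MPI_COMM_WORLD, &status);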
53
Wild Card
  • If message tag matching is not required, a wild card
    message tag is used.
  • Then recv() will match with any send() (in MPI, via
    the wild cards shown below).
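In MPI the wild cards are MPI_ANY_TAG and, for the source, MPI_ANY_SOURCE; a one-line illustration (reusing y and status from the earlier sketch):

    /* Accept a message with any tag from any source process. */
    MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);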

54
Collective Message Passing Routines
  • Have routines that send message(s) to a group of
    processes or receive message(s) from a group of
    processes
  • Higher efficiency than separate point-to-point
    routines although not absolutely necessary.

55
Broadcast
Sending same message to all processes concerned
with problem.
56
Scatter
Sending each element of an array in root process
to a separate process. Contents of ith location
of array sent to ith process.
57
Gather
Having one process collect individual values from
set of processes.
58
Reduce
Gather operation combined with an arithmetic/logical
operation. Example: values gathered and added
together. (Typical MPI calls for these collective
operations are sketched below.)
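For reference, the MPI routines that correspond to broadcast, scatter, gather, and reduce (a hedged sketch; the buffers and counts are illustrative, not from the slides):

    /* Broadcast: root process 0 sends the n ints in buf to every process.          */
    MPI_Bcast(buf, n, MPI_INT, 0, MPI_COMM_WORLD);

    /* Scatter: root sends the ith element of sendbuf to the ith process (recvval). */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Gather: root collects one value from every process into recvbuf.             */
    MPI_Gather(&myval, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduce: values from all processes are combined (here added) at the root.     */
    MPI_Reduce(&myval, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);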
59
Grid Computing
  • A grid is a form of multiple computer system.
  • For solving computational problems, it could be
    viewed as the next step after cluster computing,
    and the same programming techniques used.

Why is this not necessarily true?
60
  • Communication is VERY expensive - sending data across
    the network costs millions of cycles
  • Links are unreliable
  • Bandwidth is shared with other users

61
Computational Strategies
  • As a computing platform, a grid favors situations
    with absolute minimum communication between
    computers.
  • Next class will look at these strategies and
    details of MPI programming.