Introduction to Parallel Processing

Transcript and Presenter's Notes
1
Parallel Processing - Dr. Guy Tel-Zur
  • Lecture no. 1
  • Monday, 22/10/2001

2
Introduction to Parallel Processing
  • Course Number: 36113621
  • Lecturer: Dr. Guy Tel-Zur
  • http://www.bgu.ac.il/tel-zur/pp.html

3
[Administrative letter in Hebrew, dated 21.10.01, concerning the course Parallel Processing (36113621); text not recoverable from this transcript.]
4
Class Hours: 8:00-11:00
  • Building 28, Room 301

5
Lecturer: Dr. Guy Tel-Zur
  • Email: tel-zur@ee.bgu.ac.il

6
Teaching Assistant: N. Panov
  • Email: npanov@ee.bgu.ac.il

7
Office Hours and Contact
  • Guy Tel-Zur: 11:00-12:00
  • Teaching assistant: 14:00-18:00, Room 318
  • Email: pp@ee.bgu.ac.il
  • Newsgroup: pp_news@ee.bgu.ac.il

8
Course Objectives
  • The goal of this course is to provide an in-depth
    understanding of modern parallel processing. The
    course will cover both theoretical and practical
    aspects of parallel processing.

9
Task 1
  • Please send an email containing the following
    data:
  • Your first and last name
  • Your Email at BGU
  • Phone Number
  • Year
  • Course of Study
  • to pp_at_ee.bgu.ac.il
  • PLEASE WRITE EMAILS ONLY IN ENGLISH

10
Course Contents
  • [Five topic bullets in Hebrew; not recoverable from this transcript]

11
Semester A Timetable
  • [Five milestones whose Hebrew labels are not recoverable: 21.10.01, 16.12.01, 25.1.02, 18.2.02, 24.2.02]

12
[Hebrew slide, 1/3; text not recoverable]
13
[Hebrew slide, 2/3]
14
[Hebrew slide, 3/3]
15
Exercises and Grading 1/2
  • Exercise no. 1 [details in Hebrew not recoverable; mentions lecture no. 4].
  • Exercise no. 2: assigned in lecture no. 4, due in lecture no. 6; worth 15 points of the grade.

16
Exercises and Grading 2/2
  • [Item in Hebrew mentioning 5 points]
  • Project assigned in lecture 7; worth 20 points of the grade.
  • Exercise no. 3: assigned in lecture 8, due in lecture 10; worth 15 points of the grade.
  • Final assignment, due by 18.2.02; worth 50 points of the grade.

17
Working Methods
  • Programming will be in C or FORTRAN
  • [Two further guidelines in Hebrew; not recoverable]

18
Guidelines
  • [Five submission guidelines in Hebrew, one mentioning WORD documents; not recoverable]

19
Guidelines - Continued
  • [Additional rules in Hebrew; not recoverable]
20
References
  • [Five remarks in Hebrew about the course bibliography; not recoverable. The textbooks are listed on the following slides.]

21
Parallel Computer Architecture
David E. Culler et al
22
Introduction to Parallel Computing
Vipin Kumar et al
23
Using MPI
William Gropp et al
24
Parallel Programming With MPI
Peter Pacheco
25
Parallel Programming
Barry Wilkinson and Michael Allen
26
[Hebrew slide about the course lecture material; text not recoverable]

27
Definitions
28
One Field, Several Names
  • Parallel Computing
  • Parallel Processing
  • Cluster Computing
  • Beowulf Clusters
  • HPC - High Performance Computing

29
Oxford Dictionary of Science
  • A technique that allows more than one process
    (stream of activity) to be running at any given
    moment in a computer system, hence processes can
    be executed in parallel. This means that two or
    more processors are active among a group of
    processes at any instant.

30
What is the connection between a parallel computer and a supercomputer?
31
A Supercomputer
  • An extremely high power computer that has a large
    amount of main memory and very fast processors.
    Often the processors run in parallel.

http://www.netlib.org/benchmark/top500/top500.list.html
32
Why Study Parallel Architecture?
  • Parallelism:
  • Provides an alternative to a faster clock for
    performance
  • Applies at all levels of system design (H/W and
    S/W integration)
  • Is a fascinating topic
  • Is increasingly central in information
    processing, science and engineering

33
The Demand for Computational Speed
  • There is a continual demand for greater
    computational speed from computer systems than is
    currently possible. Areas requiring great
    computational speed include numerical modeling and
    simulation of scientific and engineering problems.
    Computations must be completed within a
    reasonable time period.

34
Large Memory Requirements
  • Use parallel computing to execute larger
    problems, which require more memory than is
    available on a single computer.

35
Grand Challenge Problems
  • A grand challenge problem is one that cannot be
    solved in a reasonable amount of time with
    today's computers. Obviously, an execution time of
    10 years is always unreasonable. Examples:
    modeling large DNA structures, global weather
    forecasting, modeling the motion of astronomical
    bodies.

36
Scientific Computing Demand
37
Exercise
  • A galaxy contains about 10^11 stars. How long
    would it take to compute 100 iterations of an
    O(N^2) N-body algorithm on a supercomputer
    sustaining 1 GFLOPS?

38
Solution
  • For 10^11 bodies there are about 10^22
    interactions per iteration.
  • In total, 100 iterations require about 10^24
    calculations.
  • At 10^9 FLOPS this takes 10^15 seconds, i.e. tens
    of millions of years!

39
Solution - Continued
  • Even with a better, O(N log N), algorithm:

An efficient algorithm alone is not enough; parallel
processing is still required!
40
Technology Trends
41
Clock Frequency Growth Rate
42
The clock frequency cannot keep growing forever!
  • [Bullet list in Hebrew of physical and engineering limits; not recoverable]

43
Parallel Architecture Considerations
  • Resource Allocation:
  • how large a collection?
  • how powerful are the elements?
  • how much memory?
  • Data Access, Communication and Synchronization:
  • how do the elements cooperate and communicate?
  • how are data transmitted between processors?
  • what are the abstractions and primitives for
    cooperation?
  • Performance and Scalability:
  • how does it all translate into performance?
  • how does it scale?

44
Conventional Computer
45
Shared Memory System
46
Message-Passing Multi-computer
47
Discussion
  • [Four discussion points in Hebrew comparing shared memory with message passing (synchronous or asynchronous); not recoverable]

48
Distributed Shared Memory
49
Flynn (1966) Taxonomy
  • SISD - a single instruction stream, single data
    stream computer.
  • SIMD - a single instruction stream, multiple data
    stream computer.
  • MIMD - a multiple instruction stream, multiple
    data stream computer.

50
Multiple Program Multiple Data (MPMD)
51
Single Program Multiple Data (SPMD)
  • A single source program
  • Each processor executes its own copy of this
    program
  • Independently and not in synchronism (a minimal
    MPI sketch follows below)
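
As an illustration (this sketch is mine, not from the original slides): in C with MPI, every process runs the same source and branches on its rank.

/* Minimal SPMD sketch: the same program runs on every processor;
   each copy learns its rank and acts accordingly.
   Compile with mpicc; run with e.g. mpirun -np 4 ./spmd */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us? */

    if (rank == 0)
        printf("I am the master of %d processes\n", size);
    else
        printf("I am worker %d\n", rank);

    MPI_Finalize();
    return 0;
}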

52
Message-Passing Multi-computers
53
The interconnection network strongly affects the performance of a parallel machine!
  • Criteria for comparing network topologies follow on the next slides

54
Network Criteria 1/6
  • Bandwidth
  • Network Latency
  • Communication Latency (H/W + S/W)
  • Message Latency (see next slide)

55
Network Criteria 2/6
  • Bandwidth is the inverse of the slope of the line:
  • time = latency + (1/rate) × size_of_message
  • Latency is sometimes described as the time to send
    a message of zero bytes. This is true only for
    the simple model, so the number quoted is sometimes
    misleading.
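
A worked example with illustrative numbers (mine, not from the slides): with a latency of 50 microseconds and a bandwidth of 12.5 MB/s (100 Mbps), a 1 MB message takes about 0.00005 + 1,000,000/12,500,000 = 0.08 s, dominated by bandwidth, while a 1-byte message still costs essentially the full 50-microsecond latency.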

56
Network Criteria 3/6
  • Bisection Width: the number of links that must be
    cut in order to divide the network into two equal
    parts

For a ring: 2
57
Network Criteria 4/6
  • Diameter: the maximum distance between any two nodes

For a ring: P/2
58
Network Criteria 5/6
  • Connectivity: the multiplicity of paths between any
    two nodes

For a ring: 2
59
Network Criteria 6/6
  • Cost: the number of links

For a ring: P
60
Which network connecting P processors has the best properties?
Fully Connected
61
Answer
Diameter = 1; Bisection = p^2/4; Connectivity = p-1;
Cost = p(p-1)/2
62
Computing the Bisection Width - Details
  • Total number of links: p(p-1)/2
  • Internal links in each half: (p/2)(p/2-1)/2
  • Internal links in both halves: (p/2)(p/2-1)
  • Number of links being cut (see the check below):
  • p(p-1)/2 - (p/2)(p/2-1) = p^2/4
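
A short brute-force check of this formula (mine, not from the slides):

/* Counts the links crossing an equal split of a fully connected
   network and compares with p^2/4. Every pair (i, j) with i in one
   half and j in the other is one cut link. */
#include <stdio.h>

int main(void) {
    for (int p = 2; p <= 12; p += 2) {
        int cut = 0;
        for (int i = 0; i < p / 2; i++)        /* first half  */
            for (int j = p / 2; j < p; j++)    /* second half */
                cut++;
        printf("p = %2d: cut = %3d, p^2/4 = %3d\n", p, cut, p * p / 4);
    }
    return 0;
}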

63
2D Mesh
64
Example: Intel Paragon
65
A Binary Tree 1/2
66
A Binary Tree 2/2
Fat tree: Thinking Machines CM-5, 1993
67
3D Hypercube Network
68
4D Hypercube Network
69
Embedding 1/2
70
Embedding 2/2
71
Deadlock
72
Ethernet
73
Ethernet Frame Format
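The slide's frame diagram is not preserved in this transcript; for reference, the standard Ethernet frame layout is: Preamble (7 bytes), Start-of-Frame Delimiter (1), Destination MAC (6), Source MAC (6), Type/Length (2), Data (46-1500), Frame Check Sequence (4).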
74
Point-to-Point Communication
75
Performance
  • Computation/Communication ratio
  • Speedup Factor
  • Overhead
  • Efficiency
  • Cost
  • Scalability
  • Gustafson's Law

76
Computation/Communication Ratio
77
Speedup Factor
The speedup factor is S(n) = t_s / t_p, the ratio of the
execution time on one processor to the execution time on
n processors. The maximum speedup is n (linear speedup).
78
Speedup and Comp/Comm Ratio
79
Overhead
  • Things that limit the speedup:
  • Serial parts of the computation
  • Some processors compute while others are idle
  • Communication time for sending messages
  • Extra computation in the parallel version not
    appearing in the serial version

80
Amdahl's Law (1967)
81
Amdahl's Law - Continued
If a fraction f of the computation is serial, the speedup
on n processors is S(n) = 1 / (f + (1-f)/n), which is at
most 1/f. With only 5% of the computation being serial,
the maximum speedup is 20, no matter how many processors
are used. (The short program below tabulates this case.)
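A small sketch (mine, not from the slides) that tabulates the Amdahl speedup for a 5% serial fraction:

/* Amdahl's Law: S(n) = 1 / (f + (1 - f) / n).
   For f = 0.05, S(n) approaches 20 as n grows. */
#include <stdio.h>

int main(void) {
    const double f = 0.05;                  /* serial fraction */
    for (int n = 1; n <= 4096; n *= 4) {
        double s = 1.0 / (f + (1.0 - f) / n);
        printf("n = %4d  S(n) = %6.2f\n", n, s);
    }
    return 0;
}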
82
Speedup
83
Efficiency
E = S(n)/n is the fraction of time that the processors
are being used. If E = 100% then S(n) = n.
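For example (illustrative numbers, not from the slides): if a job takes 100 s on one processor and 25 s on 5 processors, then S(5) = 4 and E = 4/5 = 80%.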
84
Cost
Cost = (parallel execution time) × (number of
processors). An algorithm is cost-optimal when this cost
is proportional to the single-processor cost (i.e., to
the sequential execution time).
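Continuing the example above: cost = 5 × 25 s = 125 s, within a small constant factor of the 100 s sequential time, so that algorithm is cost-optimal.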
85
Scalability
  • An imprecise term
  • Reflects H/W and S/W scalability
  • How do we get increased performance when the H/W
    is increased?
  • What H/W is needed when the problem size (e.g. the
    number of cells) is increased?
  • Problem dependent!

86
Gustafson's Law (1988) 1/3
Gives an argument against the pessimistic conclusion of
Amdahl's Law. Rather than assuming that the problem size
is fixed, we should assume that the parallel execution
time is fixed. Define a scaled speedup for the case of
increasing the number of processors as well as the
problem size:
87
Gustafson's Law 2/3
S(scaled) = s + (1 - s) × n, where s is the serial
fraction and n the number of processors.
88
Gustafson's Law 3/3
An example: assume we have n = 20 and a serial fraction
s = 0.05. Then S(scaled) = 0.05 + 0.95 × 20 = 19.05,
while the speedup according to Amdahl's Law is
S = 20 / (0.05 × (20-1) + 1) = 10.26. (The short program
below checks both numbers.)
89
Exercise
  • A parallel computer contains 10 processors, each
    with a speed of 200 MFLOPS. What is the performance
    of the machine, in MFLOPS, when 10 percent of the
    code is serial and 90 percent is parallelized?

90
Solution
  • If all of the code were parallel, the speed would be
  • 10 × 200 = 2000 MFLOPS
  • But 10 percent runs serially (at 200 MFLOPS) while
    90 percent runs on 10 processors (at 2000 MFLOPS), so
  • rate = 1 / (0.1/200 + 0.9/2000), i.e. about 1053
    MFLOPS

91
Domain Decomposition
  • Divide the problem domain into subdomains, one per
    processor
  • Each processor works on its own subdomain,
    exchanging boundary data with its neighbors
  • Load Balance
  • Granularity
  • (A minimal partitioning sketch follows below)
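
As an illustration (mine, not from the slides), a 1D block decomposition: N cells are split as evenly as possible among P processors, with the first N % P processors receiving one extra cell.

/* 1D block domain decomposition sketch. */
#include <stdio.h>

int main(void) {
    const int N = 100, P = 8;               /* example sizes */
    const int base = N / P, rem = N % P;
    for (int p = 0; p < P; p++) {
        int size  = base + (p < rem ? 1 : 0);
        int start = p * base + (p < rem ? p : rem);
        printf("processor %d: cells %d..%d (%d cells)\n",
               p, start, start + size - 1, size);
    }
    return 0;
}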

92
Load Balance 1/2
  • All processors must be kept busy!
  • The parallel cluster may not be homogeneous
  • (CPUs, memory, users/jobs, network)

93
Load Balance 2/2
  • Static versus Dynamic techniques
  • Static:
  • Algorithmic assignment based on the input; won't
    change
  • Low runtime overhead
  • Computation must be predictable
  • Preferable when applicable (except in
    multiprogrammed/heterogeneous environments)
  • Dynamic:
  • Adapt at runtime to balance the load
  • Can increase communication and reduce locality
  • Can increase task management overheads

94
Determining Task Granularity
  • Task granularity: the amount of work associated
    with a task
  • General rule:
  • Coarse-grained => often less load balance
  • Fine-grained => more overhead; often more
    communication and contention

95
Algorithms: Adding 8 Numbers
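The slide's figure (a binary summation tree) is not preserved; below is a sketch of the same idea (mine, not from the slides) in C with MPI: 8 processes combine partial sums pairwise in log2(8) = 3 steps, leaving the total on rank 0.

/* Tree-based parallel summation: run with 8 processes
   (mpirun -np 8 ./treesum). Each rank holds one number;
   at each step, half of the remaining ranks send their
   partial sum to a partner lower down. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sum = rank + 1.0;                /* example data: 1..8 */
    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) == step) {
            MPI_Send(&sum, 1, MPI_DOUBLE, rank - step, 0, MPI_COMM_WORLD);
        } else if (rank % (2 * step) == 0 && rank + step < size) {
            double other;
            MPI_Recv(&other, 1, MPI_DOUBLE, rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += other;
        }
    }
    if (rank == 0)
        printf("total = %g\n", sum);        /* 36 for the numbers 1..8 */
    MPI_Finalize();
    return 0;
}

In practice the single library call MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD) performs the same reduction.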
96
Summary Terms Defined 1
  • Flynn Taxonomy
  • Message Passing
  • Shared Memory
  • Bandwidth
  • Latency
  • Bisection Width
  • Diameter
  • Connectivity
  • Cost
  • Meshes, Trees, Hypercubes
  • Deadlock

97
Summary Terms Defined - 2
  • Embedding
  • Process
  • Amdahl's Law
  • Speedup Factor
  • Efficiency
  • Cost
  • Scalability
  • Gustafson's Law
  • Load Balance

98
Next Week's Class
  • [Announcement in Hebrew about next week's class arrangements; not recoverable]
  • [Reminder in Hebrew: students who have not yet sent their details by Email (see Task 1) must do so!!!]

99
Task 2
  • Go to http://www.lam-mpi.org/tutorials/
  • Download and print the file
  • "MPI quick reference sheet"
  • Linux Tutorial:
  • Go to http://www.ctssn.com/ and learn at least
    lessons 1, 2 and 3.

100
Cluster Computing
  • COTS: Commodity Off-The-Shelf components
  • Free O/S, e.g. Linux
  • LOBOS: Lots Of Boxes On the Shelf
  • PCs connected by a fast network

101
What does a supercomputer look like?
  • Cray J932
  • 16 processors
  • 200 MFLOPS per CPU
  • 3.2 GFLOPS total

102
The Dwarves 1/5
  • 12 PCs of several types
  • Red Hat Linux 6.0-6.2
  • Fast Ethernet, 100 Mbps
  • Myrinet network
  • 1.28+1.28 Gbps, SAN

103
The Dwarves 2/5
There are 12 computers with the Linux operating system,
named dwarf1-12 (or dwarf1m-12m on the Myrinet network):
dwarf1m, dwarf3m-dwarf7m: Pentium II, 300 MHz;
dwarf9m-dwarf12m: Pentium III, 450 MHz (dual CPU);
dwarf2m, dwarf8m: Pentium III, 733 MHz (dual CPU).
104
The Dwarves 3/5
  • 6 PII processors at 300 MHz
  • 8 PIII processors at 450 MHz
  • 4 PIII processors at 733 MHz
  • Total: 18 processors, about 8 GFLOPS

105
The Dwarves 4/5
  • dwarf1..dwarf12: node names for the Fast
    Ethernet link
  • dwarf1m..dwarf12m: node names for the Myrinet
    network

106
The Dwarves 5/5
  • GNU FORTRAN / C Compilers
  • PVM / MPI

107
Cluster Computing - 1
108
Cluster Computing - 2
109
Cluster Computing - 3
110
Cluster Computing - 4
111
Linux
http://www.ee.bgu.ac.il/tel-zur/linux.html
112
Linux
Number of hits in Google: Linux: 38,600,000;
Microsoft: 21,500,000; Bible: 7,590,000