Introduction to Parallel Processing

Transcript and Presenter's Notes
1
Parallel Processing - Dr. Guy Tel-Zur
  • Lecture no. 1
  • Monday, 22/10/2001

2
Introduction to Parallel Processing
  • Course Number: 36113621
  • Lecturer: Dr. Guy Tel-Zur
  • http://www.bgu.ac.il/tel-zur/pp.html

3
[Administrative letter in Hebrew, dated 21.10.01, concerning the course Parallel Processing (36113621); text not recoverable from this transcript.]
4
Class Hours: 8:00-11:00
  • Building 28, Room 301

5
Lecturer: Dr. Guy Tel-Zur
  • Email: tel-zur@ee.bgu.ac.il

6
Teaching Assistant: N. Panov
  • Email: npanov@ee.bgu.ac.il

7
Office Hours and Contact
  • Guy Tel-Zur: 11:00-12:00
  • Teaching assistant: 14:00-18:00, Room 318
  • Email: pp@ee.bgu.ac.il
  • Newsgroup: pp_news@ee.bgu.ac.il

8
Course Objectives
  • The goal of this course is to provide an in-depth
    understanding of modern parallel processing. The
    course will cover both theoretical and practical
    aspects of parallel processing.

9
Task 1
  • Please send an email containing the following
    data:
  • Your first and last name
  • Your Email at BGU
  • Phone Number
  • Year
  • Course of Study
  • to pp_at_ee.bgu.ac.il
  • PLEASE WRITE EMAILS ONLY IN ENGLISH

10
Course Contents
  • [Five topic bullets in Hebrew; not recoverable from this transcript]

11
Semester A Timetable
  • [Five milestones whose Hebrew labels are not recoverable: 21.10.01, 16.12.01, 25.1.02, 18.2.02, 24.2.02]

12
[Hebrew slide, 1/3; text not recoverable]
13
[Hebrew slide, 2/3]
14
[Hebrew slide, 3/3]
15
Exercises and Grading 1/2
  • Exercise no. 1 [details in Hebrew not recoverable; mentions lecture no. 4].
  • Exercise no. 2: assigned in lecture no. 4, due in lecture no. 6; worth 15 points of the grade.

16
Exercises and Grading 2/2
  • [Item in Hebrew mentioning 5 points]
  • Project assigned in lecture 7; worth 20 points of the grade.
  • Exercise no. 3: assigned in lecture 8, due in lecture 10; worth 15 points of the grade.
  • Final assignment, due by 18.2.02; worth 50 points of the grade.

17
Working Methods
  • Programming will be in C or FORTRAN
  • [Two further guidelines in Hebrew; not recoverable]

18
Guidelines
  • [Five submission guidelines in Hebrew, one mentioning WORD documents; not recoverable]

19
Guidelines - Continued
  • [Additional rules in Hebrew; not recoverable]
20
References
  • [Five remarks in Hebrew about the course bibliography; not recoverable. The textbooks are listed on the following slides.]

21
Parallel Computer Architecture
David E. Culler et al
22
Introduction to Parallel Computing
Vipin Kumar et al
23
Using MPI
William Gropp et al
24
Parallel Programming With MPI
Peter Pacheco
25
Parallel Programming
Barry Wilkinson and Michael Allen
26
[Hebrew slide about the course lecture material; text not recoverable]

27
Definitions
28
One Field, Several Names
  • Parallel Computing
  • Parallel Processing
  • Cluster Computing
  • Beowulf Clusters
  • HPC - High Performance Computing

29
Oxford Dictionary of Science
  • A technique that allows more than one process
    (stream of activity) to be running at any given
    moment in a computer system, hence processes can
    be executed in parallel. This means that two or
    more processors are active among a group of
    processes at any instant.

30
What is the connection between a parallel computer and a supercomputer?
31
A Supercomputer
  • An extremely high power computer that has a large
    amount of main memory and very fast processors.
    Often the processors run in parallel.

http://www.netlib.org/benchmark/top500/top500.list.html
32
Why Study Parallel Architecture?
  • Parallelism:
  • Provides an alternative to a faster clock for
    performance
  • Applies at all levels of system design (H/W and
    S/W integration)
  • Is a fascinating topic
  • Is increasingly central in information
    processing, science and engineering

33
The Demand for Computational Speed
  • There is a continual demand for greater
    computational speed from computer systems than is
    currently possible. Areas requiring great
    computational speed include numerical modeling and
    simulation of scientific and engineering problems.
    Computations must be completed within a
    reasonable time period.

34
Large Memory Requirements
  • Use parallel computing to execute larger
    problems, which require more memory than is
    available on a single computer.

35
Grand Challenge Problems
  • A grand challenge problem is one that cannot be
    solved in a reasonable amount of time with
    today's computers. Obviously, an execution time of
    10 years is always unreasonable. Examples:
    modeling large DNA structures, global weather
    forecasting, modeling the motion of astronomical
    bodies.

36
Scientific Computing Demand
37
Exercise
  • A galaxy contains about 10^11 stars. How long
    would it take to compute 100 iterations of an
    O(N^2) N-body algorithm on a supercomputer
    sustaining 1 GFLOPS?

38
Solution
  • For 10^11 bodies there are about 10^22
    interactions per iteration.
  • In total, 100 iterations require about 10^24
    calculations.
  • At 10^9 FLOPS this takes 10^15 seconds, i.e. tens
    of millions of years!

39
Solution - Continued
  • Even with a better, O(N log N), algorithm:

An efficient algorithm alone is not enough; parallel
processing is still required!
40
Technology Trends
41
Clock Frequency Growth Rate
42
The clock frequency cannot keep growing forever!
  • [Bullet list in Hebrew of physical and engineering limits; not recoverable]

43
Parallel Architecture Considerations
  • Resource Allocation:
  • how large a collection?
  • how powerful are the elements?
  • how much memory?
  • Data Access, Communication and Synchronization:
  • how do the elements cooperate and communicate?
  • how are data transmitted between processors?
  • what are the abstractions and primitives for
    cooperation?
  • Performance and Scalability:
  • how does it all translate into performance?
  • how does it scale?

44
Conventional Computer
45
Shared Memory System
46
Message-Passing Multi-computer
47
Discussion
  • [Four discussion points in Hebrew comparing shared memory with message passing (synchronous or asynchronous); not recoverable]

48
Distributed Shared Memory
49
Flynn (1966) Taxonomy
  • SISD - a single instruction stream, single data
    stream computer.
  • SIMD - a single instruction stream, multiple data
    stream computer.
  • MIMD - a multiple instruction stream, multiple
    data stream computer.

50
Multiple Program Multiple Data (MPMD)
51
Single Program Multiple Data (SPMD)
  • A single source program
  • Each processor executes its own copy of this
    program
  • Independently and not in synchronism (a minimal
    MPI sketch follows below)
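
As an illustration (this sketch is mine, not from the original slides): in C with MPI, every process runs the same source and branches on its rank.

/* Minimal SPMD sketch: the same program runs on every processor;
   each copy learns its rank and acts accordingly.
   Compile with mpicc; run with e.g. mpirun -np 4 ./spmd */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us? */

    if (rank == 0)
        printf("I am the master of %d processes\n", size);
    else
        printf("I am worker %d\n", rank);

    MPI_Finalize();
    return 0;
}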

52
Message-Passing Multi-computers
53
The interconnection network strongly affects the performance of a parallel machine!
  • Criteria for comparing network topologies follow on the next slides

54
Network Criteria 1/6
  • Bandwidth
  • Network Latency
  • Communication Latency (H/W + S/W)
  • Message Latency (see next slide)

55
Network Criteria 2/6
  • Bandwidth is the inverse of the slope of the line:
  • time = latency + (1/rate) × size_of_message
  • Latency is sometimes described as the time to send
    a message of zero bytes. This is true only for
    the simple model, so the number quoted is sometimes
    misleading.
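
A worked example with illustrative numbers (mine, not from the slides): with a latency of 50 microseconds and a bandwidth of 12.5 MB/s (100 Mbps), a 1 MB message takes about 0.00005 + 1,000,000/12,500,000 = 0.08 s, dominated by bandwidth, while a 1-byte message still costs essentially the full 50-microsecond latency.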

56
Network Criteria 3/6
  • Bisection Width: the number of links that must be
    cut in order to divide the network into two equal
    parts

For a ring: 2
57
Network Criteria 4/6
  • Diameter: the maximum distance between any two nodes

For a ring: P/2
58
Network Criteria 5/6
  • Connectivity: the multiplicity of paths between any
    two nodes

For a ring: 2
59
Network Criteria 6/6
  • Cost: the number of links

For a ring: P
60
Which network connecting P processors has the best properties?
Fully Connected
61
Answer
Diameter = 1; Bisection = p^2/4; Connectivity = p-1;
Cost = p(p-1)/2
62
Computing the Bisection Width - Details
  • Total number of links: p(p-1)/2
  • Internal links in each half: (p/2)(p/2-1)/2
  • Internal links in both halves: (p/2)(p/2-1)
  • Number of links being cut (see the check below):
  • p(p-1)/2 - (p/2)(p/2-1) = p^2/4
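
A short brute-force check of this formula (mine, not from the slides):

/* Counts the links crossing an equal split of a fully connected
   network and compares with p^2/4. Every pair (i, j) with i in one
   half and j in the other is one cut link. */
#include <stdio.h>

int main(void) {
    for (int p = 2; p <= 12; p += 2) {
        int cut = 0;
        for (int i = 0; i < p / 2; i++)        /* first half  */
            for (int j = p / 2; j < p; j++)    /* second half */
                cut++;
        printf("p = %2d: cut = %3d, p^2/4 = %3d\n", p, cut, p * p / 4);
    }
    return 0;
}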

63
2D Mesh
64
Example: Intel Paragon
65
A Binary Tree 1/2
66
A Binary Tree 2/2
Fat tree: Thinking Machines CM-5, 1993
67
3D Hypercube Network
68
4D Hypercube Network
69
Embedding 1/2
70
Embedding 2/2
71
Deadlock
72
Ethernet
73
Ethernet Frame Format
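The slide's frame diagram is not preserved in this transcript; for reference, the standard Ethernet frame layout is: Preamble (7 bytes), Start-of-Frame Delimiter (1), Destination MAC (6), Source MAC (6), Type/Length (2), Data (46-1500), Frame Check Sequence (4).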
74
Point-to-Point Communication
75
Performance
  • Computation/Communication ratio
  • Speedup Factor
  • Overhead
  • Efficiency
  • Cost
  • Scalability
  • Gustafson's Law

76
Computation/Communication Ratio
77
Speedup Factor
The speedup factor is S(n) = t_s / t_p, the ratio of the
execution time on one processor to the execution time on
n processors. The maximum speedup is n (linear speedup).
78
Speedup and Comp/Comm Ratio
79
Overhead
  • Things that limit the speedup:
  • Serial parts of the computation
  • Some processors compute while others are idle
  • Communication time for sending messages
  • Extra computation in the parallel version not
    appearing in the serial version

80
Amdahl's Law (1967)
81
Amdahl's Law - Continued
If a fraction f of the computation is serial, the speedup
on n processors is S(n) = 1 / (f + (1-f)/n), which is at
most 1/f. With only 5% of the computation being serial,
the maximum speedup is 20, no matter how many processors
are used. (The short program below tabulates this case.)
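A small sketch (mine, not from the slides) that tabulates the Amdahl speedup for a 5% serial fraction:

/* Amdahl's Law: S(n) = 1 / (f + (1 - f) / n).
   For f = 0.05, S(n) approaches 20 as n grows. */
#include <stdio.h>

int main(void) {
    const double f = 0.05;                  /* serial fraction */
    for (int n = 1; n <= 4096; n *= 4) {
        double s = 1.0 / (f + (1.0 - f) / n);
        printf("n = %4d  S(n) = %6.2f\n", n, s);
    }
    return 0;
}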
82
Speedup
83
Efficiency
E = S(n)/n is the fraction of time that the processors
are being used. If E = 100% then S(n) = n.
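For example (illustrative numbers, not from the slides): if a job takes 100 s on one processor and 25 s on 5 processors, then S(5) = 4 and E = 4/5 = 80%.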
84
Cost
Cost = (parallel execution time) × (number of
processors). An algorithm is cost-optimal when this cost
is proportional to the single-processor cost (i.e., to
the sequential execution time).
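Continuing the example above: cost = 5 × 25 s = 125 s, within a small constant factor of the 100 s sequential time, so that algorithm is cost-optimal.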
85
Scalability
  • An imprecise term
  • Reflects H/W and S/W scalability
  • How do we get increased performance when the H/W
    is increased?
  • What H/W is needed when the problem size (e.g. the
    number of cells) is increased?
  • Problem dependent!

86
Gustafson's Law (1988) 1/3
Gives an argument against the pessimistic conclusion of
Amdahl's Law. Rather than assuming that the problem size
is fixed, we should assume that the parallel execution
time is fixed. Define a scaled speedup for the case of
increasing the number of processors as well as the
problem size:
87
Gustafson's Law 2/3
S(scaled) = s + (1 - s) × n, where s is the serial
fraction and n the number of processors.
88
Gustafson's Law 3/3
An example: assume we have n = 20 and a serial fraction
s = 0.05. Then S(scaled) = 0.05 + 0.95 × 20 = 19.05,
while the speedup according to Amdahl's Law is
S = 20 / (0.05 × (20-1) + 1) = 10.26. (The short program
below checks both numbers.)
89
Exercise
  • A parallel computer contains 10 processors, each
    with a speed of 200 MFLOPS. What is the performance
    of the machine, in MFLOPS, when 10 percent of the
    code is serial and 90 percent is parallelized?

90
Solution
  • If all of the code were parallel, the speed would be
  • 10 × 200 = 2000 MFLOPS
  • But 10 percent runs serially (at 200 MFLOPS) while
    90 percent runs on 10 processors (at 2000 MFLOPS), so
  • rate = 1 / (0.1/200 + 0.9/2000), i.e. about 1053
    MFLOPS

91
Domain Decomposition
  • Divide the problem domain into subdomains, one per
    processor
  • Each processor works on its own subdomain,
    exchanging boundary data with its neighbors
  • Load Balance
  • Granularity
  • (A minimal partitioning sketch follows below)
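
As an illustration (mine, not from the slides), a 1D block decomposition: N cells are split as evenly as possible among P processors, with the first N % P processors receiving one extra cell.

/* 1D block domain decomposition sketch. */
#include <stdio.h>

int main(void) {
    const int N = 100, P = 8;               /* example sizes */
    const int base = N / P, rem = N % P;
    for (int p = 0; p < P; p++) {
        int size  = base + (p < rem ? 1 : 0);
        int start = p * base + (p < rem ? p : rem);
        printf("processor %d: cells %d..%d (%d cells)\n",
               p, start, start + size - 1, size);
    }
    return 0;
}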

92
Load Balance 1/2
  • All processors must be kept busy!
  • The parallel cluster may not be homogeneous
  • (CPUs, memory, users/jobs, network)

93
Load Balance 2/2
  • Static versus Dynamic techniques
  • Static:
  • Algorithmic assignment based on the input; won't
    change
  • Low runtime overhead
  • Computation must be predictable
  • Preferable when applicable (except in
    multiprogrammed/heterogeneous environments)
  • Dynamic:
  • Adapt at runtime to balance the load
  • Can increase communication and reduce locality
  • Can increase task management overheads

94
Determining Task Granularity
  • Task granularity: the amount of work associated
    with a task
  • General rule:
  • Coarse-grained => often less load balance
  • Fine-grained => more overhead; often more
    communication and contention

95
Algorithms: Adding 8 Numbers
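The slide's figure (a binary summation tree) is not preserved; below is a sketch of the same idea (mine, not from the slides) in C with MPI: 8 processes combine partial sums pairwise in log2(8) = 3 steps, leaving the total on rank 0.

/* Tree-based parallel summation: run with 8 processes
   (mpirun -np 8 ./treesum). Each rank holds one number;
   at each step, half of the remaining ranks send their
   partial sum to a partner lower down. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sum = rank + 1.0;                /* example data: 1..8 */
    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) == step) {
            MPI_Send(&sum, 1, MPI_DOUBLE, rank - step, 0, MPI_COMM_WORLD);
        } else if (rank % (2 * step) == 0 && rank + step < size) {
            double other;
            MPI_Recv(&other, 1, MPI_DOUBLE, rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += other;
        }
    }
    if (rank == 0)
        printf("total = %g\n", sum);        /* 36 for the numbers 1..8 */
    MPI_Finalize();
    return 0;
}

In practice the single library call MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD) performs the same reduction.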
96
Summary Terms Defined 1
  • Flynn Taxonomy
  • Message Passing
  • Shared Memory
  • Bandwidth
  • Latency
  • Bisection Width
  • Diameter
  • Connectivity
  • Cost
  • Meshes, Trees, Hypercubes
  • Deadlock

97
Summary Terms Defined - 2
  • Embedding
  • Process
  • Amdahl's Law
  • Speedup Factor
  • Efficiency
  • Cost
  • Scalability
  • Gustafson's Law
  • Load Balance

98
Next Week's Class
  • [Announcement in Hebrew about next week's class arrangements; not recoverable]
  • [Reminder in Hebrew: students who have not yet sent their details by Email (see Task 1) must do so!!!]

99
Task 2
  • Go to http://www.lam-mpi.org/tutorials/
  • Download and print the file
  • "MPI quick reference sheet"
  • Linux Tutorial:
  • Go to http://www.ctssn.com/ and learn at least
    lessons 1, 2 and 3.

100
Cluster Computing
  • COTS: Commodity Off-The-Shelf components
  • Free O/S, e.g. Linux
  • LOBOS: Lots Of Boxes On the Shelf
  • PCs connected by a fast network

101
What does a supercomputer look like?
  • Cray J932
  • 16 processors
  • 200 MFLOPS per CPU
  • 3.2 GFLOPS total

102
The Dwarves 1/5
  • 12 PCs of several types
  • Red Hat Linux 6.0-6.2
  • Fast Ethernet, 100 Mbps
  • Myrinet network
  • 1.28+1.28 Gbps, SAN

103
The Dwarves 2/5
There are 12 computers with the Linux operating system,
named dwarf1-12 (or dwarf1m-12m on the Myrinet network):
dwarf1m, dwarf3m-dwarf7m: Pentium II, 300 MHz;
dwarf9m-dwarf12m: Pentium III, 450 MHz (dual CPU);
dwarf2m, dwarf8m: Pentium III, 733 MHz (dual CPU).
104
The Dwarves 3/5
  • 6 PII processors at 300 MHz
  • 8 PIII processors at 450 MHz
  • 4 PIII processors at 733 MHz
  • Total: 18 processors, about 8 GFLOPS

105
The Dwarves 4/5
  • dwarf1..dwarf12: node names for the Fast
    Ethernet link
  • dwarf1m..dwarf12m: node names for the Myrinet
    network

106
The Dwarves 5/5
  • GNU FORTRAN / C Compilers
  • PVM / MPI

107
Cluster Computing - 1
108
Cluster Computing - 2
109
Cluster Computing - 3
110
Cluster Computing - 4
111
Linux
http://www.ee.bgu.ac.il/tel-zur/linux.html
112
Linux
Number of hits in Google: Linux: 38,600,000;
Microsoft: 21,500,000; Bible: 7,590,000