1
ECE 669 Parallel Computer Architecture
Lecture 20: Evaluation and Message Passing
2
Performance Evaluation
  • Why?
    • Evaluate tradeoffs
    • Estimate machine performance
    • Measure application behavior
  • How?
    • Ask an expert --- hire a consultant?
    • Measure existing machines --- (what?!)
    • Build simulators
    • Analytical models
    • Hybrids --- combinations of measurement, simulation, and analytical models

3
In the lab
[Diagram: parallel traces feeding a network model]
4
Example
  • 1. Trace simulation, awk filter
  • 2. Compute network parameters m, B
  • 3. Compute processor utilization
    • given m, B and kd, n, N
    • derive U (a hedged sketch follows the table below)

Event          Prob      out-msg   out-siz (flits)   in-msg   in-siz (flits)
read miss in   1.2507    yes       4                 yes      * (see write-mode formula)
other msgs     2.8483    yes       4                 yes      4
. . .
in-siz = 4 (block size / flit size)
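To make step 3 concrete, here is a minimal sketch of deriving U under a generic blocking-processor model, U = 1 / (1 + m * T). The lecture's actual formula, the write-mode size formula referenced in the table, and the numeric parameters below are assumptions for illustration; T (round-trip latency) is taken as an input rather than derived from kd, n, N.

```c
/* Hedged sketch: utilization of a processor that blocks on each message.
 * m = messages per processor cycle, T = round-trip latency in cycles.
 * Both example values are assumptions, not numbers from the slides. */
#include <stdio.h>

static double utilization(double m, double T) {
    /* Each cycle a message is issued with probability m; each message
     * stalls the processor for T cycles, so stall cycles per useful
     * cycle = m * T. */
    return 1.0 / (1.0 + m * T);
}

int main(void) {
    double m = 0.0125;   /* message rate per cycle (assumed) */
    double T = 40.0;     /* round-trip latency in cycles (assumed) */
    printf("U = %.3f\n", utilization(m, T));
    return 0;
}
```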
5
Evaluation
  • Barriers implemented using distributed trees
  • Read-only sharing marked
[Plots: processor utilization U (0.0 to 1.0) vs. number of LimitLESS directory pointers, for the Weather and Speech applications]
6
Evaluation
[Diagram: processors and memories connected through an interconnection network]
7
Full system simulation (coupled)
[Diagram: the compiled application drives a processor simulator (Processor); memory and synchronization requests go to a cache and memory systems simulator (Memory), which returns waits and traps; network requests go to an interconnection network simulator (Interconnection network), which returns acknowledgements and responses]
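One way to picture the coupling is a lock-step main loop in which the three simulators advance together every cycle and the processor stalls whenever the memory system is still servicing its request. This is a minimal sketch with stub components, not the course's simulator; all function names and latencies are assumptions.

```c
/* Hedged sketch of a coupled full-system simulation loop. */
#include <stdio.h>
#include <stdbool.h>

/* Stub components; a real simulator models each in detail. */
static bool proc_step(long cycle, bool stalled) {
    /* Issue a memory request every 7th cycle when not stalled (assumed). */
    return !stalled && (cycle % 7 == 0);
}
static int mem_step(bool new_req) {
    static int wait;                 /* cycles left on the current request */
    if (new_req) wait = 20;          /* assumed memory/network latency      */
    return wait > 0 ? wait-- : 0;
}
static void net_step(void) { /* advance network state one cycle */ }

int main(void) {
    long cycles = 100, busy = 0;
    bool stalled = false;
    for (long c = 0; c < cycles; c++) {
        bool req = proc_step(c, stalled);   /* processor issues requests    */
        stalled  = mem_step(req) > 0;       /* memory/cache may stall it    */
        net_step();                         /* network advances in lock-step */
        if (!stalled) busy++;
    }
    printf("processor utilization = %.2f\n", (double)busy / cycles);
    return 0;
}
```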
8
Trace-driven simulation (coupled)
E.g., many parallel traces already exist from M.I.T.
[Diagram: a sequential address trace per processor feeds a trace scheduler, which issues the address trace to the cache and memory systems simulator (Memory) when its port is ready; network requests go to a network simulator (Interconnection network), which returns acknowledgements and responses]
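A minimal sketch of the trace-driven idea follows, assuming a one-address-per-line hex trace on stdin and a small direct-mapped cache (both assumptions): each miss is the point where the coupled simulator would hand a request to the network simulator.

```c
/* Hedged sketch: replay an address trace through a direct-mapped cache
 * and count misses (the would-be network requests). */
#include <stdio.h>

#define LINES      1024   /* number of cache lines (assumed)  */
#define BLOCK_BITS 4      /* 16-byte blocks (assumed)         */

int main(void) {
    unsigned long long tags[LINES];
    int valid[LINES] = {0};
    long refs = 0, misses = 0;
    unsigned long long addr;

    while (scanf("%llx", &addr) == 1) {
        unsigned long long block = addr >> BLOCK_BITS;
        unsigned long long idx   = block % LINES;
        unsigned long long tag   = block / LINES;
        refs++;
        if (!valid[idx] || tags[idx] != tag) {
            misses++;             /* would issue a network request here */
            valid[idx] = 1;
            tags[idx]  = tag;
        }
    }
    printf("refs=%ld misses=%ld miss rate=%.4f\n",
           refs, misses, refs ? (double)misses / refs : 0.0);
    return 0;
}
```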
9
Hybrid Network Model

[Diagram: the compiled application runs on a processor simulator (Processor), producing a parallel address trace; the cache and memory system simulator (Memory) sends requests to a network model (Interconnection network) and receives responses and latencies averaged over a time window; waits are fed back to the processor]
10
Many hybrids possible
Traces + network model:
[Diagram: parallel address traces (Processor) drive the cache and memory system simulator (Memory); event counts are reduced to a request rate and message size that feed an analytical network model (Interconnection network)]
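As a sketch of what the analytical box might compute, the function below turns a per-processor request rate and message size into an estimated latency for a k-ary d-cube. The M/D/1-style contention term and the example parameters are assumptions for illustration, not the model used in the course.

```c
/* Hedged sketch of an analytical network model fed by the hybrid. */
#include <stdio.h>

static double net_latency(double rate,   /* messages per processor per cycle */
                          double B,      /* message size in flits             */
                          int k, int d)  /* radix and dimension               */
{
    double hops = d * (k - 1) / 2.0;               /* average distance        */
    double rho  = rate * B * hops / d;             /* approx. channel load    */
    if (rho >= 1.0) return -1.0;                   /* saturated: model invalid */
    double contention = (rho / (1.0 - rho)) * (B / 2.0);  /* queueing delay   */
    return hops + B + contention;                  /* hops + serialization + wait */
}

int main(void) {
    double T = net_latency(0.01, 4.0, 8, 2);       /* assumed example parameters */
    printf("estimated one-way latency = %.2f cycles\n", T);
    return 0;
}
```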
11
Hybrid - Network model
[Diagram: the compiled application is directly executed (round-robin per process, switching on each memory request); memory requests go to the cache and memory system simulator (Memory), which returns responses and latencies; network requests, averaged over a time window, feed a network model (Interconnection network)]
12
Trace-driven (decoupled)

[Diagram: an address trace drives the cache and memory system simulator (Memory) directly -- there is no trace scheduler and no feedback from responses; network requests feed a network simulator (Interconnection network)]
Synchronization constraints may be violated, so the results can be garbage!
13
Message passing
  • Bulk transfers
  • Complex synchronization semantics
    • more complex protocols
    • more complex actions
  • Synchronous
    • Send completes after the matching recv has executed and the source data has been sent
    • Receive completes after the data transfer from the matching send is complete
  • Asynchronous
    • Send completes as soon as the send buffer may be reused (a hedged MPI sketch contrasting the two follows below)
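The sketch below contrasts the two completion rules in MPI; MPI itself is an assumption here, since the slides name no particular library. MPI_Ssend does not return until the matching receive has started, while MPI_Isend returns immediately and the buffer is safe to reuse only after MPI_Wait reports completion.

```c
/* Hedged MPI sketch: synchronous vs. asynchronous send completion.
 * Compile with mpicc; run with two ranks, e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, data = 42, recv_buf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Synchronous: returns only after the matching receive has started. */
        MPI_Ssend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Asynchronous: returns immediately; the buffer may be reused only
         * after MPI_Wait says the send has completed. */
        MPI_Request req;
        MPI_Isend(&data, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(&recv_buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&recv_buf, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", recv_buf);
    }
    MPI_Finalize();
    return 0;
}
```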

14
Synchronous Message Passing
Processor Action?
  • Constrained programming model.
  • Deterministic! What happens when threads are added?
  • Destination contention very limited.
  • User/System boundary?

15
Asynchronous Message Passing: Optimistic
  • More powerful programming model
  • Wildcard receive → non-deterministic
  • Storage required within msg layer?
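A small MPI sketch of the wildcard-receive point (again assuming MPI): rank 0 matches whichever sender's message arrives first, so the observed order, and thus the program's behavior, is non-deterministic.

```c
/* Hedged MPI sketch: wildcard receive with MPI_ANY_SOURCE. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (int i = 1; i < size; i++) {
            MPI_Status st;
            /* Wildcard: accept whichever sender's message arrives first. */
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            printf("got %d from rank %d\n", value, st.MPI_SOURCE);
        }
    } else {
        value = rank * 10;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```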

16
Asynchronous Message Passing: Conservative
[Protocol timeline between Source and Destination, time flowing downward:
  Source: Send P, local VA, len
    (1) Initiate send
    (2) Address translation on P
    (3) Local/remote check
    (4) Send-ready request -- send-rdy req sent to the destination
  Destination:
    (5) Remote check for posted receive (assume fail); record send-ready; return and compute
  Destination: Recv P, local VA, len
    Tag check
    (6) Receive-ready request -- recv-rdy req sent back to the source
  Source:
    (7) Bulk data reply: source VA to dest VA or ID -- data-xfer reply]
  • Where is the buffering?
  • Contention control? Receiver initiated protocol?
  • Short message optimizations
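A minimal, single-process sketch of the three-phase handshake above, with the send-ready, receive-ready, and data-transfer messages collapsed into function calls; all names are illustrative rather than a real messaging API. The point it shows is that bulk data moves only after the destination confirms a matching, posted receive, so no buffering is needed inside the message layer.

```c
/* Hedged sketch of the conservative (rendezvous) protocol. */
#include <stdio.h>
#include <string.h>

#define MSG_LEN 16

/* Destination-side state: has a receive been posted, and where? */
static struct { int posted; int tag; char *buf; } dest;
static char dest_storage[MSG_LEN];

/* (5)/(6): destination handles a send-ready request; it answers with a
 * receive-ready only if a matching receive has already been posted. */
static int handle_send_ready(int tag) {
    return dest.posted && dest.tag == tag;   /* tag check */
}

/* (7): source transfers the bulk data once recv-ready arrives. */
static void data_xfer(const char *src_va, int len) {
    memcpy(dest.buf, src_va, len);
}

int main(void) {
    char src_va[MSG_LEN] = "hello";

    /* Destination posts the receive first, so the handshake succeeds. */
    dest.posted = 1; dest.tag = 7; dest.buf = dest_storage;

    /* Source: (1)-(4) initiate send and issue the send-ready request. */
    if (handle_send_ready(7)) {
        data_xfer(src_va, MSG_LEN);          /* only now does data move */
        printf("destination received: %s\n", dest_storage);
    } else {
        printf("no matching receive yet: record send-ready, retry later\n");
    }
    return 0;
}
```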

17
Key Features of Message Passing Abstraction
  • Source knows the send data address, destination knows the receive data address
    • after the handshake they both know both
  • Arbitrary storage "outside the local address spaces"
    • may post many sends before any receives
    • non-blocking asynchronous sends reduce the requirement to an arbitrary number of descriptors
    • fine print says these are limited too
  • Fundamentally a 3-phase transaction
    • includes a request / response
    • can use optimistic 1-phase in limited "safe" cases
    • credit scheme

18
Active Messages
  • User-level analog of a network transaction
    • transfer a data packet and invoke a handler to extract it from the network and integrate it with the ongoing computation (see the sketch below)
  • Request/Reply
  • Event notification: interrupts, polling, events?
  • May also perform memory-to-memory transfer
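A minimal sketch of the active-message idea: the packet carries a handler pointer and a small payload, and the handler runs at the receiver just long enough to pull the data out of the network and fold it into the ongoing computation. The am_send/am_poll names and the one-slot "network" are illustrative assumptions, not a real active-message library.

```c
/* Hedged sketch of an active message: packet = handler + payload. */
#include <stdio.h>

typedef void (*am_handler_t)(int arg);

/* A one-slot "network" standing in for the NI's input queue. */
static struct { am_handler_t handler; int arg; int full; } net;

static int sum = 0;                    /* the ongoing computation's state  */

static void add_handler(int arg) {     /* handler: integrate the payload   */
    sum += arg;
}

static void am_send(am_handler_t h, int arg) {   /* inject a packet        */
    net.handler = h; net.arg = arg; net.full = 1;
}

static void am_poll(void) {            /* drain the network, run handlers  */
    if (net.full) {
        net.full = 0;
        net.handler(net.arg);
    }
}

int main(void) {
    am_send(add_handler, 5);           /* "remote" node sends an increment */
    am_poll();                         /* receiver polls and runs handler  */
    printf("sum = %d\n", sum);
    return 0;
}
```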

19
Common Challenges
  • Input buffer overflow
    • N-to-1 queue over-commitment → must slow sources
    • reserve space per source (credit) -- see the sketch after this list
      • when is it available for reuse? Ack or higher level
    • refuse input when full
      • backpressure in a reliable network
      • tree saturation
      • deadlock free?
      • what happens to traffic not bound for the congested destination?
    • reserve an ack back channel
    • drop packets
    • utilize higher-level semantics of the programming model
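A minimal sketch of the per-source credit scheme mentioned above (the names and the credit count are assumptions): the sender spends one credit per message and stalls when it has none, and each acknowledgement returns a credit once the destination's reserved buffer slot is free again.

```c
/* Hedged sketch of credit-based flow control from the sender's side. */
#include <stdio.h>

#define CREDITS_PER_SOURCE 4

static int credits = CREDITS_PER_SOURCE;   /* sender's view of dest slots   */

static int try_send(int msg) {
    if (credits == 0) {
        printf("no credits: sender must stall (backpressure)\n");
        return 0;
    }
    credits--;                              /* spend a credit per message    */
    printf("sent %d (%d credits left)\n", msg, credits);
    return 1;
}

static void on_ack(void) {                  /* receiver freed a buffer slot  */
    credits++;
}

int main(void) {
    for (int i = 0; i < 6; i++)
        if (!try_send(i)) { on_ack(); try_send(i); }  /* wait for a credit  */
    return 0;
}
```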

20
Summary
  • Evaluation is important for understanding the intermediate messages in a cache protocol
  • Message sizes may vary based on their function
  • Two main types of message-passing protocols
    • synchronous and asynchronous
  • Active messages involve remote operations
  • Message-passing techniques depend on network reliability