Reducing NoC Energy Consumption Through Compiler-Directed Channel Voltage Scaling

Title: Reducing NoC Energy Consumption Through Compiler-Directed Channel Voltage Scaling


1
Reducing NoC Energy Consumption Through
Compiler-Directed Channel Voltage Scaling
  • Guangyu Chen, Feihui Li, Mahmut Kandemir, Mary
    Jane Irwin
  • Microsystems Design Lab, Department of CSE
  • The Pennsylvania State University
  • mdl_at_cse.psu.edu

2
Why NoCs?
  • Scalability
  • Support for large number of processing units
  • Flexibility
  • Topology and routing policy can be configured
    according to the needs of a particular
    application
  • Point-to-point, broadcasting (one-to-multiple),
    gathering (multiple-to-one)
  • Performance
  • Low latency, high bandwidth
  • Reliability
  • Multiple routes between a source/target pair
  • Signal strengthening in routers

3
Mesh-Based NoC Abstraction
[Figure: mesh of routers linked by communication channels; each router is attached to a CPU and a memory]
4
Related Work
  • Communication channels can account for a
    significant portion of the chip energy
    consumption (between 20% and 45%)
  • Prior efforts
  • Simunic and Boyd: NoC power modeling (DATE'02)
  • Benini and De Micheli: Design methodology for
    energy-efficient reliable SoC networks (ISSS'01)
  • Shang et al.: Hardware-directed DVS for
    communication links (HPCA'03)
  • Kim et al.: Communication link shutdown
    (ISLPED'03)
  • Soteriou and Peh: Design space exploration for
    link turn on/off (ICCD'04)
  • Soteriou et al.: Software-directed power-aware
    interconnection networks (CASES'05)
  • Li et al.: Software-directed DVS for communication
    links (CASES'05)
  • Li et al.: Compiler-directed link turnoff and
    routing (ICCAD'05, EMSOFT'05, POPL'06)
  • Our goal is to save network energy through
    voltage/frequency scaling

5
Motivational Example (1)
Node 1:
  for i = 0 to N
    send(2, A[i][0..1023])
    receive(2, buffer)
Node 2:
  for i = 0 to N
    send(1, A[i][0..255])
    receive(1, buffer)
[Figure: timeline of iterations i = 0 through i = 4 on the two nodes]
6
Motivational Example (2)
Node 1:
  for i = 0 to N
    send(2, A[i][0..255])
    short computation
    receive(2, buffer)
Node 2:
  for i = 0 to N
    send(1, A[i][0..255])
    long computation
    receive(1, buffer)
[Figure: timeline of iterations i = 0 through i = 4 on Node 1 and Node 2]
7
Overview of Our Approach
[Flow diagram: Input Parallel Code -> Building IPCG -> IPCG -> Critical Path Analysis -> Scaling Factor for Each Connection -> Code Modification -> Output Parallel Code]
8
Assumptions
  • Array-based embedded applications
  • Message-passing based parallel program
  • For each send(p, m) instruction, the destination
    node p, and the size of message m can be
    statically determined at compilation time
  • For each receive(p, m) instruction, the source
    node p can be determined at compilation time
  • A send instruction is blocked if the previous
    message sent by the same node has not been
    delivered to the destination node
  • A receive instruction is blocked if the message
    is not ready in the buffer of the receiver node
  • Code is parallelized and process-to-node mapping
    is performed
  • Network is exposed to the compiler

9
Inter-Process Communication Graph (IPCG)
  • IPCG G(P) captures the communication behavior of
    application P
  • G(P) = (V(P), E(P), α, β)
  • V(P): the set of vertices
  • E(P): the set of edges
  • α, β: the weights for edges, capturing
    minimum/maximum execution latencies

10
Vertices of IPCG
  • V(P) = X(P) ∪ B(P) ∪ S(P) ∪ D(P) ∪ R(P)
  • x ∈ X(P): the entry point of a loop in program P
  • b ∈ B(P): the back jump of a loop in program P
  • s ∈ S(P): the point in P at which a message is
    sent
  • d ∈ D(P): the point in P at which a message is
    delivered
  • r ∈ R(P): the point in P at which a message is
    used

[Figure: Node 1 executes send(2,..) at point s; the message is delivered to Node 2 at point d and used by receive(1,..) at point r]
11
Edges of IPCG
  • Task edges
  • Communication edge (s, d): a message is sent at
    point s ∈ S(P) and delivered at point d ∈ D(P)
  • Computation edge (u, v): a computation task
    starts at point u and ends at point v
  • u, v ∈ X(P) ∪ S(P) ∪ R(P)
  • Control edges
  • Enforce the order at which the points of the
    given program can be reached
  • Back-jump edge
  • Other control edges
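
The vertex and edge classes above map naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class names, the `lat_min`/`lat_max` fields, and the choice to model the d1-to-r1 link as a control edge are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class VertexKind(Enum):
    ENTRY = "x"      # entry point of a loop
    BACK_JUMP = "b"  # back jump of a loop
    SEND = "s"       # point at which a message is sent
    DELIVER = "d"    # point at which a message is delivered
    RECEIVE = "r"    # point at which a message is used


class EdgeKind(Enum):
    COMMUNICATION = "communication"
    COMPUTATION = "computation"
    CONTROL = "control"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeKind
    lat_min: float = 0.0  # minimum latency of the task (hypothetical field name)
    lat_max: float = 0.0  # maximum latency of the task (hypothetical field name)


@dataclass
class IPCG:
    vertices: dict = field(default_factory=dict)  # vertex name -> VertexKind
    edges: list = field(default_factory=list)

    def add_vertex(self, name, kind):
        self.vertices[name] = kind

    def add_edge(self, src, dst, kind, lat_min=0.0, lat_max=0.0):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        # A communication edge runs from a send point to a delivery point.
        if kind is EdgeKind.COMMUNICATION and (
            self.vertices[src] is not VertexKind.SEND
            or self.vertices[dst] is not VertexKind.DELIVER
        ):
            raise ValueError("communication edge must go from s to d")
        if lat_min > lat_max:
            raise ValueError("lat_min must not exceed lat_max")
        self.edges.append(Edge(src, dst, kind, lat_min, lat_max))


# Tiny hypothetical graph: loop entry, one send, its delivery, and the receive.
g = IPCG()
for name, kind in [("x1", VertexKind.ENTRY), ("s1", VertexKind.SEND),
                   ("d1", VertexKind.DELIVER), ("r1", VertexKind.RECEIVE)]:
    g.add_vertex(name, kind)
g.add_edge("x1", "s1", EdgeKind.COMPUTATION, 10, 15)
g.add_edge("s1", "d1", EdgeKind.COMMUNICATION, 20, 25)
g.add_edge("d1", "r1", EdgeKind.CONTROL)  # control edges carry zero latency
```

Keeping the kind on both vertices and edges lets the structure reject malformed graphs early, for example a communication edge that does not start at a send point.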

12
α and β Functions
  • α(u,v) and β(u,v): the minimum and maximum times
    required to execute task (u,v)
  • For communication edge (s,d)
  • α(s,d) = (min. message size) / (max. data rate)
  • β(s,d) = (max. message size) / (max. data rate)
  • For computation edge (u,v)
  • α(u,v): the minimum time for executing the
    instructions between u and v
  • β(u,v): the maximum time for executing the
    instructions between u and v
  • For control edge (u,v)
  • α(u,v) = β(u,v) = 0
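
As a quick illustration of the communication-edge rule above (message size divided by the maximum data rate), here is a minimal sketch. The function name, the bit-based units, and the example message sizes are assumptions for illustration only.

```python
def comm_latency_bounds(min_bits, max_bits, max_rate_bps):
    """Return (lat_min, lat_max) in seconds for one communication edge.

    Both bounds divide by the maximum data rate, as on the slide; only the
    message size differs between the two.
    """
    if max_rate_bps <= 0:
        raise ValueError("data rate must be positive")
    if min_bits > max_bits:
        raise ValueError("min_bits must not exceed max_bits")
    return min_bits / max_rate_bps, max_bits / max_rate_bps


# Hypothetical message of 256 to 1024 32-bit words over a 2.5 Gbps channel.
lat_min, lat_max = comm_latency_bounds(256 * 32, 1024 * 32, 2.5e9)
print(lat_min, lat_max)
```

When the message size is statically known, the two bounds coincide and the edge has a single fixed latency.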

13
IPCG Example (1)
// Process 1
x3: for(...)
r1: receive(2,..)
    20-25 cycles
s2: send(2,..)

// Process 2
x1: for(...)
s1: send(1,..)
x2: for(...)
    10 cycles
s3: send(3,..)
    10-15 cycles
s4: send(3,..)
    80-90 cycles
r5: receive(3,..)
    20 cycles
r2: receive(1,..)

// Process 3
x4: for(...)
    10 cycles
r3: receive(2,..)
    15 cycles
r4: receive(2,..)
    40-50 cycles
s5: send(2,..)
14
IPCG Example (2)
[Figure: IPCG of the three processes p1, p2, p3. Vertices: x1-x4, s1-s5, d1-d5, r1-r5, b1-b4. Edges are labeled min/max latency in cycles (labels shown: 0/0, 10/10, 10/15, 15/15, 20/20, 20/25, 40/50, 80/90, 120/?)]
25
Parallel Loop Group
  • A set of loops that communicate with each other
  • Unit of granularity for optimization

[Figure: the example IPCG (vertices x1-x4, s1-s5, d1-d5, r1-r5, b1-b4, edges labeled min/max latency) repeated to illustrate a parallel loop group]
26
Representative Iterations
  • A set of loop iterations that represent the
    timing behavior of the entire parallel loop group

[Figure: loop iterations laid out along a time axis]
27
Critical Path Analysis
  • Determine q and Q such that [q, Q-1] is the
    set of representative loop iterations
  • Determine $t^{\alpha}_{i,j}$: the earliest time that node v_i
    at the jth iteration (j ∈ [q, Q-1]) can be
    reached, assuming each task is completed in the
    shortest time
  • Determine $t^{\beta}_{i,j}$: the earliest time that node v_i
    at the jth iteration (j ∈ [q, Q-1]) can be
    reached, assuming each task takes the longest
    time
  • Determine the scaling factor for each
    communication channel such that the overall
    performance degradation due to voltage scaling is
    within δ (a preset bound)

28
Determining $t^{\alpha}_{i,j}$ - Constraints
where
the set of intra-iteration edges:
  at each iteration j, u must be reached before v
the set of inter-iteration edges:
  u at the (j-1)th iteration must be reached
  before v at the jth iteration
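
Read against the two edge-set descriptions above, the constraints can be sketched as follows. This is a reconstruction under assumptions, not the slide's own formulas: the set names $E_{\text{intra}}$ and $E_{\text{inter}}$ are labels introduced here, and $\alpha(u,v)$ stands for the minimum latency of edge $(u,v)$.

```latex
\begin{aligned}
t_{v,j} &\ge t_{u,j} + \alpha(u,v),   && (u,v) \in E_{\text{intra}} \\
t_{v,j} &\ge t_{u,j-1} + \alpha(u,v), && (u,v) \in E_{\text{inter}}
\end{aligned}
```

Under this reading, each $t_{v,j}$ is the smallest value that satisfies every constraint on $v$, that is, the maximum over all incoming edges.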
29
Examples of Intra- and Inter-Iteration Edges
[Figure: the example IPCG (vertices x1-x4, s1-s5, d1-d5, r1-r5, b1-b4; processes p1-p3) with each edge marked as an intra-iteration edge or an inter-iteration edge]
30
Determining $t^{\alpha}_{i,j}$ - Example
[Figure: example IPCG for processes p1, p2, p3, each with vertices x, s, d, r, b (x1 through b3). Edge labels shown (min/max cycles): 20/25, 25/30, 20/20, 15/15, 10/10]
31
Determining $t^{\alpha}_{i,j}$ - Example
                       x1   s1   d1   r1   b1   x2   s2   d2   r2   b2   x3   s3   d3   r3   b3
$t^{\alpha}_{i,0}$      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
32
Determining $t^{\alpha}_{i,j}$ - Example
                       x1   s1   d1   r1   b1   x2   s2   d2   r2   b2   x3   s3   d3   r3   b3
$t^{\alpha}_{i,0}$      0    0   20   20   30    0    0   20   25   50    0    0   20   20   35
$t^{\alpha}_{i,1}$     30   20    0    0    0   20   50    0    0    0   35   20    0    0    0
33
Determining $t^{\alpha}_{i,j}$ - Example
                       x1   s1   d1   r1   b1   x2   s2   d2   r2   b2   x3   s3   d3   r3   b3
$t^{\alpha}_{i,0}$      0    0   20   20   30    0    0   20   25   50    0    0   20   20   35
$t^{\alpha}_{i,1}$     30   30   50   55   65   50   50   70   75  100   35   35   55   70   85
34
Determining $t^{\alpha}_{i,j}$ - Example
                       x1   s1   d1   r1   b1   x2   s2   d2   r2   b2   x3   s3   d3   r3   b3
$t^{\alpha}_{i,0}$      0    0   20   20   30    0    0   20   25   50    0    0   20   20   35
$t^{\alpha}_{i,1}$     30   30   50   55   65   50   50   70   75  100   35   35   55   70   85
$t^{\alpha}_{i,2}$     65   65   85  105  115  100  100  120  125  150   85   85  105  120  135
$t^{\alpha}_{i,3}$    115  115  135  155  165  150  150  170  175  200  135  135  155  170  185
$t^{\alpha}_{i,4}$    165  ...  ...  ...  ...  200  ...  ...  ...  ...  185  ...  ...  ...  ...
q = 2, Q = 4, T = 50
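
The iteration-by-iteration tables above can be produced by a longest-path relaxation over the intra- and inter-iteration edges. The sketch below is an illustration under assumptions, not the authors' code: the function name, the dictionary-based edge encoding, and the two-process graph are hypothetical, and the graph is not the one from the slides.

```python
def earliest_times(nodes, intra, inter, iterations):
    """Earliest time each node is reached in each loop iteration.

    intra, inter: dicts mapping an edge (u, v) to its minimum latency.
    Returns a list t where t[j][v] is the earliest time of v in iteration j.
    """
    t = []
    for j in range(iterations):
        cur = {v: 0 for v in nodes}
        # Inter-iteration edges: u in iteration j-1 must precede v in iteration j.
        if j > 0:
            for (u, v), w in inter.items():
                cur[v] = max(cur[v], t[j - 1][u] + w)
        # Intra-iteration edges: relax until nothing changes. These edges are
        # assumed to form a DAG, so len(nodes) passes are always enough.
        for _ in range(len(nodes)):
            changed = False
            for (u, v), w in intra.items():
                if cur[u] + w > cur[v]:
                    cur[v] = cur[u] + w
                    changed = True
            if not changed:
                break
        t.append(cur)
    return t


# Hypothetical example: process a sends to process b once per iteration.
nodes = ["xa", "sa", "da", "ba", "xb", "rb", "bb"]
intra = {
    ("xa", "sa"): 0,   # loop entry to send
    ("sa", "da"): 20,  # communication edge: 20 cycles
    ("sa", "ba"): 10,  # computation after the send
    ("xb", "rb"): 0,   # loop entry to receive
    ("da", "rb"): 0,   # receive waits for delivery
    ("rb", "bb"): 15,  # computation after the receive
}
inter = {
    ("ba", "xa"): 0,   # back jump of loop a
    ("bb", "xb"): 0,   # back jump of loop b
    ("da", "sa"): 0,   # send blocks until the previous message is delivered
}
times = earliest_times(nodes, intra, inter, 3)
for j, row in enumerate(times):
    print(j, row)
```

In this toy graph the completion time of `bb` advances by a fixed 20 cycles per iteration once the loops reach steady state, which is the kind of periodic behavior the representative iterations are meant to capture.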
35
Determining $t^{\beta}_{i,j}$ - Constraints
where
the set of intra-iteration edges
the set of inter-iteration edges
36
Determining Scaling Factor - Constraints
where
the set of intra-iteration and inter-iteration edges
the node that executes operation v
the maximum performance degradation allowed
k(n1, n2): the scaling factor for the network connection from node n1 to n2
We try to maximize k(n1, n2) for each connection
37
Determining Scaling Factor - Algorithm
  • repeat
  • select a connection C
  • scale down the data rate of C by one grade
  • determine $t_{i,j}$ using the timing constraints
  • if $t_{i,Q} \le t^{\max}_{i,Q}$ for every node $v_i$
  • make the data rate of C permanent
  • else
  • restore the data rate of C
  • until no more connections can be scaled down
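
The loop above can be sketched as a greedy search. The code below is a minimal illustration, not the authors' implementation: the function name, the round-robin selection order, the rule that a connection is frozen after its first rejected step, and the toy timing model are all assumptions.

```python
def greedy_scale(connections, grades, finish_times, bound):
    """Greedily lower each connection's data rate while the bound holds.

    grades: scaling factors ordered from full rate down, e.g. [1, 0.8, ...].
    finish_times: callable taking {connection: factor} and returning
        {timing point: completion time}.
    bound: {timing point: latest allowed completion time}.
    """
    level = {c: 0 for c in connections}  # index into grades; 0 = full rate
    frozen = set()                       # connections that cannot go lower
    while len(frozen) < len(connections):
        for c in connections:
            if c in frozen:
                continue
            if level[c] + 1 >= len(grades):
                frozen.add(c)            # already at the lowest grade
                continue
            level[c] += 1                # tentatively scale down one grade
            times = finish_times({k: grades[i] for k, i in level.items()})
            if any(times[p] > bound[p] for p in bound):
                level[c] -= 1            # restore the previous data rate
                frozen.add(c)
            # otherwise the new data rate is kept
    return {c: grades[i] for c, i in level.items()}


# Toy timing model with made-up numbers: each connection has one timing
# point whose completion time is its full-rate transfer time divided by k.
BASE = {"a": 10.0, "b": 20.0, "c": 20.0}
BOUND = {"a": 30.0, "b": 22.0, "c": 24.0}


def toy_finish_times(factors):
    return {c: BASE[c] / k for c, k in factors.items()}


result = greedy_scale(["a", "b", "c"], [1, 0.8, 0.6, 0.4, 0.2],
                      toy_finish_times, BOUND)
print(result)
```

Freezing a connection after one rejected step relies on completion times never improving when some other connection is slowed down; if that assumption does not hold for a given timing model, the rejected connection would need to be retried.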

38
Determining Scaling Factor - Example
q = 2, Q = 4, T = 100, δ = 10%
k ∈ {1, 0.8, 0.6, 0.4, 0.2}
                       x1   s1   d1   r1   b1   x2   s2   d2   r2   b2   x3   s3   d3   r3   b3
$t^{\alpha}_{i,q}$     65   65   85  105  115  100  100  120  125  150   85   85  105  120  135
$t^{\alpha}_{i,Q}$    165  ...  ...  ...  ...  200  ...  ...  ...  ...  185  ...  ...  ...  ...
$t^{\beta}_{i,Q}$     170  ...  ...  ...  ...  210  ...  ...  ...  ...  190  ...  ...  ...  ...
$t^{\max}_{i,Q}$      175  ...  ...  ...  ...  210  ...  ...  ...  ...  195  ...  ...  ...  ...
39
Determining Scaling Factor - Example
k(1,2) = 0.8, k(2,3) = 1, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 210, x3 = 190
40
Determining Scaling Factor - Example
k(1,2) = 0.8, k(2,3) = 0.8, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 210, x3 = 196.25
41
Determining Scaling Factor - Example
k(1,2) = 0.8, k(2,3) = 1, k(3,1) = 0.8
$t_{i,Q}$: x1 = 176.25, x2 = 210, x3 = 190
42
Determining Scaling Factor - Example
k(1,2) = 0.6, k(2,3) = 1, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 210, x3 = 190
43
Determining Scaling Factor - Example
k(1,2) = 0.4, k(2,3) = 1, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 210, x3 = 190
44
Determining Scaling Factor - Example
k(1,2) = 0.2, k(2,3) = 1, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 210, x3 = 190
45
Determining Scaling Factor - Example
k(1,2) = 0.2, k(2,3) = 1, k(3,1) = 1
$t_{i,Q}$: x1 = 170, x2 = 270, x3 = 190
RESULT: k(1,2) = 0.4, k(2,3) = 1, k(3,1) = 1
46
Shared Communication Channels
  • The voltage level of the channel shared by
    multiple connections is determined by the
    connection that requires the highest voltage
    level

[Figure: endpoints a, b, c and a′, b′, c′ connected over channels labeled with voltage levels v1, v2, v3]
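
The shared-channel rule amounts to taking a maximum over the connections routed through each channel. A minimal sketch follows; the function name, the route encoding, and the example routes and voltages are assumptions for illustration.

```python
def channel_levels(routes, required):
    """Pick one voltage level per channel.

    routes: {connection: list of channels it traverses}.
    required: {connection: voltage level that connection needs}.
    Each channel gets the highest level required by any connection using it.
    """
    levels = {}
    for conn, channels in routes.items():
        for ch in channels:
            levels[ch] = max(levels.get(ch, 0.0), required[conn])
    return levels


# Hypothetical routes: connections a and b both cross channel "c2".
routes = {"a": ["c1", "c2"], "b": ["c2", "c3"]}
required = {"a": 0.9, "b": 1.3}
print(channel_levels(routes, required))
```

Here channel `c2` runs at 1.3 V because connection `b` needs it, even though connection `a` alone would allow 0.9 V.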
47
Code Modification
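[Figure: node p0 connected to p1 and p2 over channels with voltage levels v1 through v6]

Before modification:
// loop executed on p0
for(...)
  ...
  send(p1, ...)
  send(p2, ...)
  ...

After modification:
send(p1, CTRL, v1, v2, v3)
send(p2, CTRL, v4, v5, v6)
for(...)
  ...
  send(p1, ...)
  send(p2, ...)
  ...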
48
Experimental Setup
Voltage (V)   Rate (bps)   Energy (pJ/bit)
0.7           200M         4.21
0.9           660M         5.25
1.1           1.33G        6.49
1.3           1.93G        8.31
1.5           2.50G        10.21

Parameter               Value
NoC topology            5 × 5 mesh
Idle channel power      8.6 pJ/cycle
Voltage switch energy   1020pJ
Voltage delay           120 cycles
Processor               1 GHz, 2-issue
Node local memory       20 KB
Packet header size      3 flits
Flit size               39 bits
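
To make the trade-off in the voltage table concrete, the sketch below computes the transfer time and transmission energy of a single message at each level. It uses only the rate and energy-per-bit columns; idle channel power, voltage switch cost, and packet headers are deliberately ignored, and the function name and message size are assumptions.

```python
# (voltage in V, data rate in bit/s, energy in pJ/bit), copied from the table.
LEVELS = [
    (0.7, 200e6, 4.21),
    (0.9, 660e6, 5.25),
    (1.1, 1.33e9, 6.49),
    (1.3, 1.93e9, 8.31),
    (1.5, 2.50e9, 10.21),
]


def transfer_cost(bits, level):
    """Return (seconds, picojoules) for sending `bits` at one voltage level."""
    _, rate_bps, pj_per_bit = level
    return bits / rate_bps, bits * pj_per_bit


# Hypothetical 1024-bit message at every level.
for level in LEVELS:
    seconds, picojoules = transfer_cost(1024, level)
    print(f"{level[0]} V: {seconds * 1e6:.3f} us, {picojoules:.1f} pJ")
```

Lower levels cost less energy per bit but take longer, which is why scaling is only worthwhile on connections that have slack in the schedule.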
49
Impact on Energy Consumption
50
Energy Consumption Breakdown
51
Accuracy of Voltage Selection
52
Conclusions and Research Directions
  • NoC presents unique opportunities for compilers
  • Expose network layout to compiler for energy
    reduction through voltage scaling and channel
    shutdown
  • We implemented a compiler-directed voltage
    scaling algorithm and compared its performance to
    a hardware scheme
  • Promising results
  • Research Directions
  • Evaluating impact of process-to-node mapping
  • Combined voltage/frequency scaling for NoC and
    CPUs
  • Metrics other than energy (e.g., temperature,
    reliability)

53
Thank you!
  • http://www.cse.psu.edu/mdl
  • mdl_at_cse.psu.edu

Funded in part by GSRC and NSF