Network-on-Chip - Presentation Transcript

1
  • Network-on-Chip
  • (2/2)
  • Ben Abdallah, Abderazek
  • The University of Aizu
  • E-mail: benab@u-aizu.ac.jp

KUST University, March 2011
2
Part 3
  • Routing
  • Routing Algorithms
  • Deterministic Routing
  • Oblivious Routing
  • Adaptive Routing

3
Routing Basics
  • Once the topology is fixed, the routing algorithm determines the path(s) from source to destination
  • It must prevent deadlock, livelock, and starvation

4
Routing Deadlock
  • Without routing restrictions, a resource cycle
    can occur
  • Leads to deadlock

5
Deadlock Definition
  • Deadlock: A packet does not reach its destination because it is blocked at some intermediate resource
  • Livelock: A packet does not reach its destination because it enters a cyclic path
  • Starvation: A packet does not reach its destination because some resource does not grant access (while it grants access to other packets)

6
Routing Algorithm Attributes
  • Number of destinations
  • Unicast, Multicast, Broadcast?
  • Adaptivity
  • Deterministic, oblivious, or adaptive?
  • Implementation (Mechanisms)
  • Source or node routing?
  • Table or circuit?

7
  • Deterministic Routing

8
Deterministic Routing
  • Always choose the same path between two nodes
  • Easy to implement and to make deadlock free
  • Does not use path diversity and is thus weak at load balancing
  • Packets arrive in order

9
Deterministic Routing - Example Destination-Tag
Routing in Butterfly Networks
  • Depends on the destination address only (not on
    source)

(Figure: destination-tag routing examples in butterfly networks)
In a radix-2 butterfly, the destination address in binary (here 5 = 101) selects the route: down, up, down.
In a radix-4 butterfly, the destination address is interpreted as quaternary digits (e.g., 23 in base 4), and each digit selects the output port of one stage.
Note: Starting from any source and using the same pattern always routes to the destination.
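The idea can be illustrated with a small Python sketch (the function name, radix, and digit order are assumptions for illustration, not taken from the slides):

    # Minimal sketch of destination-tag routing in a butterfly network.
    # The output port used at each stage is one digit of the destination
    # address, most significant digit first (assumed here for illustration).
    def destination_tag_route(dest, radix, stages):
        digits = []
        for _ in range(stages):
            digits.append(dest % radix)
            dest //= radix
        return list(reversed(digits))       # one port selector per stage

    # Radix-2, 3 stages: destination 5 = 101 -> ports 1, 0, 1 (down, up, down)
    print(destination_tag_route(5, 2, 3))   # [1, 0, 1]
    # Radix-4, 2 stages: e.g. destination 11 = 23 in base 4 -> ports 2, 3
    print(destination_tag_route(11, 4, 2))  # [2, 3]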
10
Deterministic Routing - Dimension-Order Routing
  • For n-dimensional hypercubes and meshes,
    dimension-order routing produces deadlock-free
    routing algorithms.
  • It is called XY routing in a 2-D mesh and e-cube routing in hypercubes

11
Dimension-Order Routing - XY Routing Algorithm
(Figure: XY routing examples in a 2-D mesh; D marks the destination node)
12
Dimension-Order Routing - XY Routing Algorithm
XY routing algorithm for a 2-D mesh
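A minimal sketch of the XY routing decision, assuming (x, y) node coordinates and E/W/N/S port names that are not tied to any particular router implementation:

    # Minimal sketch of XY (dimension-order) routing in a 2-D mesh:
    # resolve the X offset completely before the Y offset.
    def xy_route(cur, dst):
        cx, cy = cur
        dx, dy = dst
        if cx != dx:                      # first route in the X dimension
            return 'E' if dx > cx else 'W'
        if cy != dy:                      # then route in the Y dimension
            return 'N' if dy > cy else 'S'
        return 'LOCAL'                    # arrived: eject to the local port

    # Example: hop-by-hop route from (0, 0) to (2, 1) -> E, E, N
    node = (0, 0)
    while node != (2, 1):
        port = xy_route(node, (2, 1))
        step = {'E': (1, 0), 'W': (-1, 0), 'N': (0, 1), 'S': (0, -1)}[port]
        node = (node[0] + step[0], node[1] + step[1])
        print(port, node)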
13
Deterministic Routing - E-cube Routing Algorithm
Dimension-order routing algorithm for hypercubes
14
  • Oblivious Routing

15
Oblivious (unconscious) Routing
  • Always chooses a route without knowing the state of the network
  • Random algorithms that do not consider the network state are oblivious algorithms
  • Oblivious algorithms include deterministic routing algorithms as a subset

16
Minimal Oblivious Routing
  • Minimal oblivious routing attempts to achieve the
    load balance of randomized routing without giving
    up the locality
  • This is done by restricting routes to minimal
    paths
  • Again, routing is done in two steps:
  • route to a random intermediate node x
  • route from x to the destination

17
Minimal Oblivious Routing - (Torus)
  • Idea: For each packet, randomly determine a node x inside the minimal quadrant, such that the packet is routed from source node s to x and then to destination node d
  • Assumption: At each node, routing in either the x or y direction is allowed.
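A minimal sketch of this idea for a radix-k 2-D torus; the helper names and the use of dimension-order routing for the two sub-paths are illustrative assumptions:

    import random

    # Minimal sketch: pick a random intermediate node x inside the minimal
    # quadrant between source s and destination d, then route s -> x -> d.
    def shortest_offsets(src, dst, k):
        """Signed minimal offset in each dimension on a radix-k torus."""
        offs = []
        for s, d in zip(src, dst):
            delta = (d - s) % k
            offs.append(delta if delta <= k // 2 else delta - k)
        return offs

    def random_intermediate(src, dst, k):
        """Random node inside the minimal quadrant spanned by src and dst."""
        dx, dy = shortest_offsets(src, dst, k)
        ix = (src[0] + random.randint(0, abs(dx)) * (1 if dx >= 0 else -1)) % k
        iy = (src[1] + random.randint(0, abs(dy)) * (1 if dy >= 0 else -1)) % k
        return (ix, iy)

    # Example from the slides: s = 00, d = 21 on a 4-ary 2-cube.
    # Possible intermediate nodes are 00, 10, 20, 01, 11, 21.
    print(random_intermediate((0, 0), (2, 1), 4))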

18
Minimal Oblivious Routing - (Torus)
  • For each node x in the quadrant (00, 10, 20, 01, 11, 21), determine a minimal route via x
  • Start with x = 00
  • Three possible routes:
  • (00, 01, 11, 21) (p = 1/3)
  • (00, 10, 20, 21) (p = 1/3)
  • (00, 10, 11, 21) (p = 1/3)

19
Minimal Oblivious Routing - (Torus)
  • x = 01
  • One possible route:
  • (00, 01, 11, 21) (p = 1)

20
Minimal Oblivious Routing - (Torus)
  • x = 10
  • Two possible routes:
  • (00, 10, 20, 21) (p = 1/2)
  • (00, 10, 11, 21) (p = 1/2)

21
Minimal Oblivious Routing - (Torus)
  • x = 11
  • Two possible routes:
  • (00, 10, 11, 21) (p = 1/2)
  • (00, 01, 11, 21) (p = 1/2)

22
Minimal Oblivious Routing - (Torus)
  • x = 20
  • One possible route:
  • (00, 10, 20, 21) (p = 1)

23
Minimal Oblivious Routing - (Torus)
  • x = 21
  • Three possible routes:
  • (00, 01, 11, 21) (p = 1/3)
  • (00, 10, 20, 21) (p = 1/3)
  • (00, 10, 11, 21) (p = 1/3)

24
Minimal Oblivious Routing - (Torus)
  • Adding the probabilities on each channel
  • Example: link (00, 01)
  • P = 1/3 for x = 00
  • P = 1 for x = 01
  • P = 0 for x = 10
  • P = 1/2 for x = 11
  • P = 0 for x = 20
  • P = 1/3 for x = 21
  • P(00, 01) = (2·1/3 + 1 + 1/2) / 6 = 2.17 / 6 ≈ 0.36

25
Minimal Oblivious Routing - (Torus)
  • Results
  • Load is not very balanced
  • The path between nodes 10 and 11 is very seldom used
  • Good locality performance is achieved at the expense of worst-case performance

26
  • Adaptive Routing
  • (route influenced by traffic along the way)

27
Adaptive Routing
  • Uses network state to make routing decisions
  • Buffer occupancies often used
  • Coupled with the flow control mechanism
  • Local information readily available
  • Global information more costly to obtain
  • Network state can change rapidly
  • Use of local information can lead to non-optimal
    choices
  • Can be minimal or non-minimal

28
Adaptive Routing - Local Information Not Enough
  • In each cycle
  • Node 5 sends packet to node 6
  • Node 3 sends packet to node 7

29
Adaptive Routing - Local Information Not Enough
  • Node 3 does not know about the traffic between 5 and 6 until the input buffers between nodes 3 and 5 are completely filled with packets!

30
Adaptive Routing - Local Information Is Not Enough
  • Adaptive routing works better with smaller buffers, since small buffers fill faster and congestion is thus propagated earlier to the sensing node (stiff backpressure)

31
Adaptive Routing
  • How does the adaptive routing algorithm sense the
    state of the network?
  • It can only sense current local information
  • Global information is based on historic local
    information
  • Changes in the traffic flow in the network are
    observed much later

32
Minimal Adaptive Routing
  • Minimal adaptive routing chooses among the
    minimal routes from source s to destination d

33
Minimal Adaptive Routing
  • At each hop a routing function generates a
    productive output vector that identifies which
    output channels of the current node will move the
    packet closer to its destination
  • Network state is then used to select one of these
    channels for the next hop
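A minimal sketch of such a selection for a 2-D mesh, assuming the locally available network state is a per-output credit count (all names and the credit metric are illustrative assumptions):

    # Minimal sketch of minimal adaptive routing: compute the productive
    # outputs, then pick the one with the most free downstream buffers.
    def productive_outputs(cur, dst):
        outs = []
        if dst[0] > cur[0]: outs.append('E')
        if dst[0] < cur[0]: outs.append('W')
        if dst[1] > cur[1]: outs.append('N')
        if dst[1] < cur[1]: outs.append('S')
        return outs

    def select_output(cur, dst, credits):
        candidates = productive_outputs(cur, dst)
        return max(candidates, key=lambda port: credits[port])

    # Two productive directions (E and N); E is more congested than N.
    print(select_output((0, 0), (2, 1), {'E': 1, 'W': 4, 'N': 3, 'S': 4}))  # N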

34
Minimal Adaptive Routing
  • Good at locally balancing load
  • Poor at globally balancing load
  • Minimal adaptive routing algorithms are unable to
    avoid congestion of source-destination pairs with
    no minimal path diversity.

35
Fully Adaptive Routing
  • Fully-Adaptive Routing does not restrict packets
    to take the shortest path
  • Misrouting is allowed
  • This can help to avoid congested areas and
    improves load balance

36
Fully Adaptive Routing - Livelock
  • Fully-Adaptive Routing may result in live-lock!
  • Mechanisms must be added to prevent livelock
  • Misrouting may only be allowed a fixed number of
    times

37
Summary of Routing Algorithms
  • Deterministic routing is a simple and inexpensive
    routing algorithm, but does not utilize path
    diversity and thus is weak on load balancing
  • Oblivious algorithms often give good results, since they allow load balancing and their effects are easy to analyse
  • Adaptive algorithms, though in theory superior, suffer from the fact that global information is not available at a local node

38
Summary of Routing Algorithms
  • Latency is a paramount concern
  • Minimal routing is most common for NoC
  • Non-minimal routing can avoid congestion and deliver low latency
  • To date, NoC research favors dimension-order routing (DOR) for its simplicity and deadlock freedom
  • Only unicast routing has been covered here
  • Recent work extends on-chip routing to support multicast

39
  • Part 4
  • NoC Routing Mechanisms

40
Routing
The term routing mechanics refers to the
mechanism that is used to implement any routing
algorithm
  • Two approaches
  • Fixed routing tables at the source or at each hop
  • Algorithmic routing uses specialized hardware to
    compute the route or next hop at run-time

41
Table-based Routing
  • Two approaches
  • Source-table routing implements all-at-once
    routing by looking up the entire route at the
    source
  • Node-table routing performs incremental routing
    by looking up the hop-by-hop routing relation at
    each node along the route
  • Major advantage
  • A routing table can support any routing relation
    on any topology

42
Table-based Routing
Example routing mechanism for deterministic
source routing NoCs. The NI uses a LUT to store
the route map.
43
Source Routing
  • All routing decisions are made at the source
    terminal
  • To route a packet:
  • the table is indexed using the packet destination
  • a route or a set of routes is returned
  • one route is selected
  • the route is prepended to the packet
  • Because of its speed, simplicity, and scalability, source routing is very often used for deterministic and oblivious routing

44
Source Routing - Example
  • The example shows a routing table for a 4x2 torus
    network
  • In this example there are two alternative routes
    for each destination
  • Each node has its own routing table

4x2 torus network
Note: in this example the order of X and Y should be the opposite, i.e., 21 -> 12
Source routing table for node 00 of 4x2 torus
network
Destination   Route 0   Route 1
00            X         X
10            EX        WWWX
20            EEX       WWX
30            WX        EEEX
01            NX        SX
11            NEX       ENX
21            NEEX      WWNX
31            NWX       WNX
Example: Routing from 00 to 21. The table is indexed with 21, which returns the two routes NEEX and WWNX; the source arbitrarily selects NEEX.
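A minimal sketch of the lookup, using the route table for node 00 shown on this slide; the packet representation and the random route selection are illustrative assumptions:

    import random

    # Source routing table for node 00 of the 4x2 torus (from the slide).
    ROUTES_FROM_00 = {
        '00': ('X',    'X'),
        '10': ('EX',   'WWWX'),
        '20': ('EEX',  'WWX'),
        '30': ('WX',   'EEEX'),
        '01': ('NX',   'SX'),
        '11': ('NEX',  'ENX'),
        '21': ('NEEX', 'WWNX'),
        '31': ('NWX',  'WNX'),
    }

    def source_route(dest, payload):
        route = random.choice(ROUTES_FROM_00[dest])   # arbitrary selection
        return {'route': route, 'payload': payload}   # route travels with the packet

    print(source_route('21', 'data'))   # e.g. {'route': 'NEEX', 'payload': 'data'}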
45
Arbitrary Length Encoding of Source Routes
  • Advantage
  • It can be used for arbitrary-sized networks
  • The complexity of routing is moved from the
    network nodes to the terminal nodes
  • But routers must be able to handle arbitrary
    length routes

46
Arbitrary Length-Encoding
  • Router has
  • 16-bit phits
  • 32-bit flits
  • Route has 13 hops: NENNWNNENNWNN
  • Extra symbols:
  • P: phit continuation selector
  • F: flit continuation phit
  • The table entries in the terminals must be of arbitrary length

47
Node-Table Routing
  • Table-based routing can also be performed by
    placing the routing table in the routing nodes
    rather than in the terminals
  • Node-table routing is appropriate for adaptive
    routing algorithms, since it can use state
    information at each node

48
Node-Table Routing
  • A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing
  • Scalability is sacrificed, since different nodes need tables of varying size
  • It is difficult to give two packets arriving from different nodes different paths through the network without expanding the tables

49
Example
  • Table shows a set of routing tables
  • There are two choices from a source to a
    destination

Routing Table for Node 00
Note: Bold-font ports are misroutes
50
Example
Livelock can occur: consider a packet passing through node 00 destined for node 11. If the entry for (00 -> 11) is N, the packet goes to node 10; if the entry for (10 -> 11) is S, it goes back to node 00, so the packet oscillates 00 <-> 10 (livelock).
51
Algorithmic Routing
  • Instead of using a table, algorithms can be used
    to compute the next route
  • To be fast, the algorithms are usually kept simple and implemented in hardware

52
Algorithmic Routing - Example
  • Dimension-Order Routing
  • sx and sy indicate the preferred directions
  • sx = 0: +x; sx = 1: -x
  • sy = 0: +y; sy = 1: -y
  • Δx and Δy represent the number of remaining hops in the x and y directions
  • The productive direction vector (PDV) is used as an input for the selection of a route

(Figure: sx and sy determine the type of the routing; the PDV indicates which channels advance the packet)
53
Algorithmic Routing - Example
  • A minimal oblivious router is implemented by randomly selecting one of the active bits of the PDV as the output direction
  • A minimal adaptive router is achieved by making the selection based on the lengths of the respective output queues
  • A fully adaptive router is implemented by also allowing an unproductive direction to be picked if the productive output queues exceed a threshold
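A minimal sketch of how such a PDV could be computed and then consumed by a dimension-order selector; the bit order (+x, -x, +y, -y) is an assumption for illustration:

    # Minimal sketch of the productive direction vector (PDV) for DOR.
    def pdv(sx, sy, dx, dy):
        """PDV bits (+x, -x, +y, -y); dx, dy are remaining hop counts >= 0."""
        plus_x  = 1 if dx > 0 and sx == 0 else 0
        minus_x = 1 if dx > 0 and sx == 1 else 0
        plus_y  = 1 if dy > 0 and sy == 0 else 0
        minus_y = 1 if dy > 0 and sy == 1 else 0
        return (plus_x, minus_x, plus_y, minus_y)

    def dor_select(vector):
        """Dimension-order routing: resolve x before y."""
        for bit, port in zip(vector, ('+x', '-x', '+y', '-y')):
            if bit:
                return port
        return 'local'       # no productive direction left: deliver locally

    # 2 hops remaining in -x and 1 hop in +y -> DOR picks -x first.
    print(dor_select(pdv(sx=1, sy=0, dx=2, dy=1)))   # '-x'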

54
Summary
  • Routing Mechanics
  • Table based routing
  • Source routing
  • Node-table routing
  • Algorithmic routing

55
Exercise
  • Compression of source routes. In the source
    routes, each port selector symbol (N, S, W, E, and
    X) was encoded with three bits. Suggest an
    alternative encoding to reduce the average length
    (in bits) required to represent a source route.
    Justify your encoding in terms of typical routes
    that might occur on a torus. Also compare the
    original three bits per symbol with your encoding
    on the following routes
  • NNNNNEEX
  • WNEENWWWWWNX

56
  • Part 5
  • NoC Flow Control
  • Resources in a Network Node
  • Bufferless Flow Control
  • Buffered Flow control

57
Flow Control (FC)
FC determines how the resources of a network, such as channel bandwidth and buffer capacity, are allocated to packets traversing the network.
  • The goal is to use resources as efficiently as possible to allow a high throughput
  • An efficient FC is a prerequisite for good network performance

58
Flow Control
  • FC can be viewed as a problem of
  • Resource allocation
  • Contention resolution
  • Resources in the form of channels, buffers, and state must be allocated to each packet
  • If two packets compete for the same channel, flow control can only assign the channel to one packet, but must also deal with the other packet

59
Flow Control
  • Flow Control can be divided into
  • Bufferless flow control
  • Packets are either dropped or misrouted
  • Buffered flow control
  • Packets that cannot be routed via the desired
    channel are stored in buffers

60
Resources in a Network Node
  • Control State
  • Tracks the resources allocated to the packet in
    the node and the state of the packet
  • Buffer
  • A packet is stored in a buffer before it is sent to the next node
  • Bandwidth
  • To travel to the next node, bandwidth has to be allocated for the packet

61
Units of Resource Allocation - Packets or Flits?
  • Contradictory requirements on packets
  • Packets should be very large in order to reduce
    overhead of routing and sequencing
  • Packets should be very small to allow efficient
    and fine-grained resource allocation and minimize
    blocking latency
  • Flits try to eliminate this conflict
  • Packets can be large (low overhead)
  • Flits can be small (efficient resource allocation)

62
Units of Resource Allocation - Size Phit, Flit,
Packet
  • There are no fixed rules for the size of phits,
    flits and packets
  • Typical values
  • Phits: 1 bit to 64 bits
  • Flits: 16 bits to 512 bits
  • Packets: 128 bits to 1024 bits

63
Bufferless Flow Control
  • No buffers → less implementation cost
  • If more than one packet is to be routed to the same output, one of them has to be
  • misrouted, or
  • dropped
  • Example: two packets, A and B (each consisting of several flits), arrive at a network node.

64
Bufferless Flow Control
  • Packet B is dropped and must be resent
  • There must be a protocol that informs the sending node that the packet has been dropped
  • Example: resend if no acknowledgement has been received within a given time

65
Bufferless Flow Control
  • Packet B is misrouted
  • No further action is required here, but at the
    receiving node packets have to be sorted into
    original order

66
Circuit Switching
  • Circuit switching is a bufferless flow control technique in which several channels are reserved to form a circuit
  • A request (R) propagates from source to destination and is answered by an acknowledgement (A)
  • Then the data is sent (here, two five-flit packets (D)) and a tail flit (T) is sent to deallocate the channels

67
Circuit Switching
  • Circuit switching does not suffer from dropping or misrouting packets
  • However, there are two weaknesses:
  • High latency: T = 3·H·tr + L/b
  • Low throughput, since the channel is used a large fraction of the time for signaling rather than for delivering the payload

68
Circuit Switching Latency
T = 3·H·tr + L/b

where H is the hop count, tr the per-hop router delay, L the packet length, and b the channel bandwidth. The term 3·H·tr is the time required to set up the circuit and deliver the head flit, and L/b is the serialization latency (time of flight and contention time are neglected here).

Note: The header latency is counted 3 times because the path from source to destination must be traversed 3 times to deliver the packet: once in each direction to set up the circuit, and then again to deliver the first flit.
69
Buffered Flow Control
  • More efficient flow control can be achieved by
    adding buffers
  • With sufficient buffers packets do not need to be
    misrouted or dropped, since packets can wait for
    the outgoing channel to be ready

70
Buffered Flow Control
  • Two main approaches
  • Packet-Buffer Flow Control
  • Store-And-Forward
  • Cut-Through
  • Flit-Buffer Flow Control
  • Wormhole Flow Control
  • Virtual Channel Flow Control

71
Store-and-Forward Flow Control
  • Each node along a route waits until a packet is
    completely received (stored) and then the packet
    is forwarded to the next node
  • Two resources are needed
  • Packet-sized buffer in the switch
  • Exclusive use of the outgoing channel

72
Store-and-Forward Flow Control
  • Advantage: While waiting to acquire resources, no channels are held idle and only a single packet buffer on the current node is occupied
  • Disadvantage: Very high latency
  • T = H · (tr + L/b)

73
Cut-Through Flow Control
  • Advantages
  • Cut-through reduces the latency:
  • T = H·tr + L/b
  • Disadvantages
  • Poor utilization of buffers, since they are allocated in units of packets
  • Contention latency is increased, since packets must wait until a whole packet leaves the occupied channel
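A small sketch comparing the three zero-load latency formulas from these slides; the numeric values (hop count, router delay, packet length, bandwidth) are purely illustrative assumptions:

    # Latency comparison: circuit switching, store-and-forward, cut-through.
    H   = 4          # hops
    t_r = 10e-9      # per-hop router delay [s]
    L   = 512        # packet length [bits]
    b   = 16e9       # channel bandwidth [bits/s]

    t_circuit     = 3 * H * t_r + L / b   # path traversed 3x before the payload
    t_store_fwd   = H * (t_r + L / b)     # whole packet buffered at every hop
    t_cut_through = H * t_r + L / b       # serialization latency paid only once

    for name, t in [('circuit switching', t_circuit),
                    ('store-and-forward', t_store_fwd),
                    ('cut-through', t_cut_through)]:
        print(f'{name:18s} {t * 1e9:6.1f} ns')   # cut-through is the lowest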

74
Wormhole Flow Control
  • Wormhole FC operates like cut-through, but with channels and buffers allocated to flits rather than packets
  • When the head flit arrives at a node, it must acquire resources (a virtual channel and buffer space) before it can be forwarded to the next node
  • The tail flit behaves like a body flit, but also releases the channel

75
Wormhole (WH) Flow Control
  • Virtual channels hold the state needed to
    coordinate the handling of flits of a packet over
    a channel
  • Comparison to cut-through
  • Wormhole flow control makes far more efficient use of buffer space
  • Throughput may be less, since wormhole flow control may block a channel mid-packet

76
Example for WH Flow Control
  • Input virtual channel is in idle state (I)
  • Upper output channel is occupied, allocated to
    lower channel (L)

77
Example for WH Flow Control
  • Input channel enters waiting state (W)
  • Head flit is buffered

78
Example for WH Flow Control
  • Body flit is also buffered
  • No more flits can be buffered, thus congestion
    arises if more flits want to enter the switch

79
Example for WH Flow Control
  • Virtual channel enters active state (A)
  • Head flit is output on upper channel
  • Second body flit is accepted

80
Example for WH Flow Control
  • First body flit is output
  • Tail flit is accepted

81
Example for WH Flow Control
  • Second body flit is output

82
Example for WH Flow Control
  • Tail flit is output
  • The virtual channel is deallocated and returns to the idle state

83
Wormhole Flow Control
  • The main advantage of wormhole over cut-through is that buffers in the routers do not need to hold full packets, but only a small number of flits
  • This allows smaller and faster routers to be used
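A minimal sketch of the per-virtual-channel state machine used in this example (I = idle, W = waiting, A = active); the class and event names are illustrative assumptions:

    # Minimal sketch of one wormhole virtual channel: I -> W (output busy)
    # -> A (output granted) -> I again when the tail flit leaves.
    class VirtualChannel:
        def __init__(self):
            self.state = 'I'
            self.buffer = []

        def flit_arrives(self, flit, output_free):
            if self.state == 'I':              # head flit allocates the VC
                self.state = 'A' if output_free else 'W'
            self.buffer.append(flit)

        def output_granted(self):
            if self.state == 'W':
                self.state = 'A'

        def forward_one_flit(self):
            flit = self.buffer.pop(0)
            if flit == 'T':                    # tail flit releases the VC
                self.state = 'I'
            return flit

    vc = VirtualChannel()
    vc.flit_arrives('H', output_free=False)    # head buffered, state W
    vc.flit_arrives('B', output_free=False)    # body buffered
    vc.output_granted()                        # output becomes free -> A
    vc.flit_arrives('T', output_free=True)     # tail buffered
    while vc.buffer:
        print(vc.forward_one_flit(), vc.state) # H A, B A, T I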

84
  • Part 6
  • NoC Flow Control (continued)
  • Blocking
  • Virtual Channel-Flow Control
  • Virtual Channel Router
  • Credit-Based Flow Control
  • On/Off Flow Control
  • Flow Control Summary

85
Blocking - Cut-Through and Wormhole
(Figure: cut-through (buffer size = 1 packet) and wormhole (buffer size = 2 flits); in both cases the packet is blocked)
  • If a packet is blocked, the flits of the wormhole
    packet are stored in different routers

86
Wormhole Flow Control
  • There is only one virtual channel for each
    physical channel
  • Packet A is blocked and cannot acquire channel p
  • Though channels p and q are idle, packet A cannot use them since B owns channel p

87
Virtual Channel-Flow Control
  • In virtual-channel flow control, several virtual channels are associated with a single physical channel
  • This makes it possible to use bandwidth that would otherwise be left idle when a packet blocks the channel
  • Unlike in wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for it with other flits

88
Virtual Channel Flow Control
  • There are several virtual channels for each
    physical channel
  • Packet A can use a second virtual channel and
    thus proceed over channel p and q

89
Virtual Channel Allocation
  • Flits must be delivered in order, H, B, B, T.
  • Only the head flit carries routing information
  • Allocate VC at the packet level, i.e.,
    packet-by-packet
  • The head flit is responsible for allocating VCs along the route.
  • Body and tail flits must follow the VC path, and
    the tail flit releases the VCs.
  • The flits of a packet cannot interleave with
    those of any other packet

90
Virtual Channel Flow Control -Fair Bandwidth
Arbitration
  • VCs interleave their flits → results in a high average latency

91
Virtual Channel Flow Control -Winner-Take-All
Arbitration
  • Winner-take-all arbitration reduces the average latency with no throughput penalty

92
Virtual Channel Flow Control -Buffer Storage
  • Buffer storage is organized in two dimensions
  • Number of virtual channels
  • Number of flits that can be buffered per channel

93
Virtual Channel Flow Control - Buffer Storage
  • The virtual channel buffer should be at least as deep as needed to cover the round-trip credit latency
  • In general, it is better to add more virtual channels than to increase the buffer size

94
Virtual Channel
A = active, W = waiting, I = idle
95
Virtual Channel Router
96
Buffer Organization
Single buffer per input
Multiple fixed length queues per physical channel
97
Buffer Management
  • With buffered flow control there is a need for communication between nodes to inform each other about the availability of buffers
  • Backpressure informs upstream nodes that they
    must stop sending to a downstream node when the
    buffers of that downstream node are full

(Figure: traffic flows from the upstream node to the downstream node)
98
Credit-Based Flow Control
  • The upstream router keeps a count of the number
    of free flit buffers in each virtual channel
    downstream
  • Each time the upstream router forwards a flit, it
    decrements the counter
  • If a counter reaches zero, the downstream buffer
    is full and the upstream node cannot send a new
    flit
  • If the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream node, which increments its counter
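A minimal sketch of the credit counting described above for a single virtual channel; the class and method names are illustrative assumptions:

    # Minimal sketch of credit-based flow control on one virtual channel.
    class CreditLink:
        def __init__(self, buffers):
            self.credits = buffers            # free downstream flit buffers
            self.downstream_buffer = []

        def send_flit(self, flit):
            """Upstream side: may only send while it holds a credit."""
            if self.credits == 0:
                return False                  # stall: downstream buffers full
            self.credits -= 1
            self.downstream_buffer.append(flit)
            return True

        def forward_downstream(self):
            """Downstream side: forwarding a flit frees a buffer -> credit."""
            self.downstream_buffer.pop(0)
            self.credits += 1                 # credit returned upstream

    link = CreditLink(buffers=2)
    print(link.send_flit('H'), link.send_flit('B1'), link.send_flit('B2'))
    # True True False -> the third flit stalls until a credit comes back
    link.forward_downstream()
    print(link.send_flit('B2'))               # True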

99
Credit-Based Flow Control
100
Credit-Based Flow Control
  • The minimum time between a credit being sent at time t1 and a credit being sent for the same buffer at time t5 is the credit round-trip delay tcrt

(Figure: all buffers on the downstream node are full)
101
Credit-Based Flow Control
  • If there is only a single flit buffer, a flit
    waits for a new credit and the maximum throughput
    is limited to one flit for each tcrt
  • The bit rate would then be Lf / tcrt, where Lf is the length of a flit in bits

102
Credit-Based Flow Control
  • If there are F flit buffers on the virtual channel, F flits could be sent before waiting for the credit, which gives a throughput of F flits for each tcrt and a bit rate of F·Lf / tcrt

103
Credit-Based Flow Control
  • In order not to limit the throughput by low-level flow control, the number of flit buffers per virtual channel should be at least F ≥ tcrt · b / Lf, where b is the bandwidth of the channel
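A small worked example of this bound with purely hypothetical numbers (none of these values come from the slides):

    # Illustration of F >= tcrt * b / Lf with assumed values.
    tcrt = 40e-9     # credit round-trip delay [s]
    b    = 16e9      # channel bandwidth [bits/s]
    Lf   = 128       # flit length [bits]
    F_min = tcrt * b / Lf
    print(F_min)     # 5.0 -> at least 5 flit buffers per virtual channel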

104
Credit-Based Flow Control
  • For each flit sent downstream, a corresponding credit is sent upstream
  • Thus there is a large amount of upstream signaling, which, especially for small flits, can represent a large overhead!

105
On/Off Flow Control
  • On/off flow control tries to reduce the amount of upstream signaling
  • An off signal is sent to the upstream node if the number of free buffers falls below the threshold Foff
  • An on signal is sent to the upstream node if the number of free buffers rises above the threshold Fon
  • With carefully dimensioned buffers, on/off flow control can achieve a very low overhead in the form of upstream signaling
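A minimal sketch of the on/off signaling, with illustrative threshold values (Foff = 2 and Fon = 4 are assumptions, not taken from the slides):

    # Minimal sketch of on/off backpressure at the downstream node.
    F_OFF, F_ON = 2, 4                 # thresholds on free buffers (assumed)

    class OnOffReceiver:
        def __init__(self, buffers):
            self.free = buffers
            self.upstream_may_send = True

        def _signal(self):
            if self.upstream_may_send and self.free <= F_OFF:
                self.upstream_may_send = False    # send 'off' upstream
            elif not self.upstream_may_send and self.free >= F_ON:
                self.upstream_may_send = True     # send 'on' upstream

        def accept_flit(self):
            self.free -= 1
            self._signal()

        def drain_flit(self):
            self.free += 1
            self._signal()

    rx = OnOffReceiver(buffers=6)
    for _ in range(5):
        rx.accept_flit()
    print(rx.upstream_may_send)   # False: 'off' was sent at 2 free buffers
    for _ in range(3):
        rx.drain_flit()
    print(rx.upstream_may_send)   # True: 'on' was sent at 4 free buffers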

106
Ack/Nack Flow Control
  • In ack/nack flow control, the upstream node sends packets without knowing whether there are free buffers in the downstream node

107
Ack/Nack Flow Control
  • If there is no buffer available:
  • the downstream node sends a nack and drops the flit
  • the flit must be resent
  • flits must be reordered at the downstream node
  • If there is a buffer available:
  • the downstream node sends an ack and stores the flit in a buffer

108
Buffer Management
  • Because of its buffer and bandwidth inefficiency
    ack/nack is rarely used
  • Credit-based flow control is used in systems with
    small numbers of buffers
  • On/off flow control is used in systems that have
    large numbers of flit buffers

109
Flow Control Summary
  • Bufferless flow control
  • Dropping or misrouting packets
  • Circuit switching
  • Buffered flow control
  • Packet-buffer flow control: SAF vs. cut-through
  • Flit-buffer flow control: wormhole and virtual channel
  • Switch-to-switch (link-level) flow control
  • Credit-based, on/off, ack/nack

110
Part 7
  • Router Architecture
  • Virtual-channel Router
  • Virtual channel state fields
  • The Router Pipeline
  • Pipeline Stalls

111
Router Microarchitecture -Virtual-channel Router
  • Modern routers are pipelined and work at the flit
    level
  • Head flits proceed through buffer stages that
    perform routing and virtual channel allocation
  • All flits pass through switch allocation and
    switch traversal stages
  • Most routers use credits to allocate buffer space

112
Typical Virtual Channel Router
  • A router's functional blocks can be divided into:
  • Datapath: handles storage and movement of a packet's payload
  • Input buffers
  • Switch
  • Output buffers
  • Control: coordinates the movement of packets through the resources of the datapath
  • Route computation
  • VC allocator
  • Switch allocator

113
Typical Virtual Channel Router
  • The input unit contains a set of flit buffers
  • Maintains the state for each virtual channel
  • G: global state
  • R: route
  • O: output VC
  • P: pointers
  • C: credits

114
Virtual Channel State Fields(Input)
115
Typical Virtual Channel Router
  • During route computation the output port for the
    packet is determined
  • Then the packet requests an output virtual
    channel from the virtual-channel allocator

116
Typical Virtual Channel Router
  • Flits are forwarded via the virtual channel by
    allocating a time slot on the switch and output
    channel using the switch allocator
  • Flits are forwarded to the appropriate output
    during this time slot
  • The output unit forwards the flits to the next router in the packet's path

117
Virtual Channel State Fields(Output)
118
Packet Rate and Flit Rate
  • The control of the router operates at two
    distinct frequencies
  • Packet Rate (performed once per packet)
  • Route computation
  • Virtual-channel allocation
  • Flit Rate (performed once per flit)
  • Switch allocation
  • Pointer and credit count update

119
The Router Pipeline
  • A typical router pipeline includes the following
    stages
  • RC (Routing Computation)
  • VA (Virtual Channel Allocation)
  • SA (Switch Allocation)
  • ST (Switch Traversal)

no pipeline stalls
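A small sketch reproducing the no-stall timing of the following slides: the head flit passes RC, VA, SA, and ST in cycles 1-4, while body and tail flits only use the SA and ST stages, one cycle apart (the function name is an illustrative assumption):

    # Minimal sketch of the no-stall router pipeline schedule for one packet.
    def pipeline_schedule(num_body_flits):
        """Return {flit: {cycle: stage}} with the head flit arriving at cycle 0."""
        sched = {'head': {1: 'RC', 2: 'VA', 3: 'SA', 4: 'ST'}}
        flits = ['body%d' % (i + 1) for i in range(num_body_flits)] + ['tail']
        for i, name in enumerate(flits, start=1):
            sched[name] = {3 + i: 'SA', 4 + i: 'ST'}   # SA then ST, no RC/VA
        return sched

    for flit, stages in pipeline_schedule(num_body_flits=2).items():
        print(flit, stages)
    # head  {1: 'RC', 2: 'VA', 3: 'SA', 4: 'ST'}
    # body1 {4: 'SA', 5: 'ST'}   body2 {5: 'SA', 6: 'ST'}
    # tail  {6: 'SA', 7: 'ST'}   -> tail traverses the switch in cycle 7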
120
The Router Pipeline
  • Cycle 0
  • Head flit arrives and the packet is directed to a virtual channel of the input port (G = I)

no pipeline stalls
121
The Router Pipeline
  • Cycle 1
  • Routing computation
  • Virtual channel state changes to routing (G = R)
  • Head flit enters the RC stage
  • First body flit arrives at the router

no pipeline stalls
122
The Router Pipeline
  • Cycle 2: Virtual Channel Allocation
  • Route field (R) of the virtual channel is updated
  • Virtual channel state is set to waiting for an output virtual channel (G = V)
  • Head flit enters the VA stage
  • First body flit enters the RC stage
  • Second body flit arrives at the router

no pipeline stalls
123
The Router Pipeline
  • Cycle 2: Virtual Channel Allocation
  • The result of the routing computation is input to the virtual-channel allocator
  • If successful, the allocator assigns a single output virtual channel
  • The state of the virtual channel is set to active (G = A)

no pipeline stalls
124
The Router Pipeline
  • Cycle 3: Switch Allocation
  • All further processing is done on a flit basis
  • Head flit enters the SA stage
  • Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC

no pipeline stalls
125
The Router Pipeline
  • Cycle 3: Switch Allocation
  • If successful, the pointer field is updated
  • The credit field is decremented

no pipeline stalls
126
The Router Pipeline
  • Cycle 4: Switch Traversal
  • Head flit traverses the switch
  • Cycle 5:
  • Head flit starts traversing the channel to the next router

no pipeline stalls
127
The Router Pipeline
  • Cycle 7:
  • Tail flit traverses the switch
  • Output VC is set to idle
  • Input VC is set to idle (G = I) if the buffer is empty
  • Input VC is set to routing (G = R) if another head flit is in the buffer

no pipeline stalls
128
The Router Pipeline
  • Only the head flits enter the RC and VA stages
  • The body and tail flits are stored in the flit
    buffers until they can enter the SA stage

no pipeline stalls
129
Pipeline Stalls
  • Pipeline stalls can be divided into
  • Packet stalls
  • can occur if the virtual channel cannot advance
    to its R, V, or A state
  • Flit stalls
  • occur if a virtual channel is in the active state and a flit cannot successfully complete switch allocation due to:
  • lack of a flit
  • lack of credits
  • losing arbitration for the switch time slot

130
Example for Packet Stall
  • Virtual-channel allocation stall
  • The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel

131
Example for Packet Stall
  • Virtual-channel allocation stall

The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel
132
Example for Flit Stalls
Switch allocation stall
Second body flit fails to allocate the requested
connection in cycle 5
133
Example for Flit Stalls
Buffer empty stall
Body flit 2 is delayed by three cycles. However, since it does not have to enter the RC and VA stages, the output is only delayed by one cycle!
134
Credits
  • A buffer is allocated in the SA stage on the
    upstream (transmitting) node
  • To reuse the buffer, a credit is returned over a
    reverse channel after the same flit departs the
    SA stage of the downstream (receiving) node
  • When the credit reaches the input unit of the upstream node, the buffer is available and can be reused

135
Credits
  • The credit loop can be viewed by means of a token that
  • starts at the SA stage of the upstream node
  • travels downstream with the flit
  • reaches the SA stage of the downstream node
  • returns upstream as a credit

136
Credit Loop Latency
  • The credit loop latency tcrt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth
  • tcrt in flit times is the credit round-trip delay divided by the time to send one flit (Lf / b)

137
Credit Loop Latency
  • If the number of buffers available per virtual channel is F, the duty factor of the channel will be
  • d = min(1, F / tcrt)
  • The duty factor will be 100% as long as there are sufficient flit buffers to cover the round-trip latency
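A small worked example of the duty factor, with purely hypothetical numbers:

    # Illustration of d = min(1, F / tcrt), with tcrt expressed in flit times.
    tcrt_flits = 5                 # assumed credit round trip of 5 flit times
    for F in (3, 4, 5, 6):         # flit buffers per virtual channel
        d = min(1.0, F / tcrt_flits)
        print(F, d)                # 3 -> 0.6, 4 -> 0.8, 5 -> 1.0, 6 -> 1.0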

138
Credit Stall
Virtual Channel Router with 4 flit buffers
139
Flit and Credit Encoding
  • Flits and credits are sent over separate lines with separate widths
  • Flits and credits are transported via the same line. This can be done by:
  • including credits in flits
  • multiplexing flits and credits at the phit level
  • Option (A) is considered more efficient. For a more detailed discussion, see Section 16.6 of the Dally and Towles book.

140
Summary
  • NoC is a scalable platform for billion-transistor
    chips
  • Several driving forces behind it
  • Many open research questions
  • May change the way we structure and model VLSI
    systems

141
References
  • OASIS NoC Architecture Design in Verilog HDL,
    Technical Report,TR-062010-OASIS, Adaptive
    Systems Laboratory, the University of Aizu, June
    2010.
  • OASIS NoC Project
  • http://web-ext.u-aizu.ac.jp/benab/research/projects/oasis/

142
  • Network-on-Chip
  • Ben Abdallah, Abderazek
  • The University of Aizu
  • E-mail: benab@u-aizu.ac.jp

KUST University, March 2011