Dynamic Networks - PowerPoint PPT Presentation

Learn more at: http://www.cs.ucr.edu

1
Dynamic Networks

L.N. Bhuyan
Partly from Berkeley Notes

2
What is a Dynamic Network?

A dynamic network is a network that can connect
any input to any output by enabling or disabling
some of the switches in the network.
Examples:
- Shared bus: the bus arbiter connects a
processor to a memory.
- Crossbar: consists of many switching
elements, which can be enabled to connect many
inputs to many outputs simultaneously.
- Multistage network: consists of several
stages of switches that are enabled to set up
connections.
- The nodes in static networks (like a mesh)
also contain dynamic crossbars.

3
Crossbar Switch Design

Complexity: $O(N^2)$ for an $N \times N$ crossbar. Why?
See the next slide.

4
How do you build a crossbar?
From Control
$N^2$ switches => Cost: $O(N^2)$. Time taken by the
arbiter: $O(N^2)$.
Multiplexors are controlled by the controller.
5
Crossbar Contd.

An $N \times N$ crossbar allows all N inputs to be
connected simultaneously to N distinct outputs.
It allows all one-to-one mappings, called
permutations. Number of permutations: $N!$
When two or more inputs request the same output,
only one of them is connected and the others are
either dropped or buffered.
When processors access memories through a crossbar,
this situation is called a memory access conflict.
Given p as the probability of a request by a
processor per cycle, and assuming that a
processor's request is uniformly directed to all
N memories, the average number of connections
allowed per cycle, called the bandwidth (BW), is
$BW = N\left[1 - \left(1 - \frac{p}{N}\right)^{N}\right]$. Derive this!
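One way to approach the derivation: a given memory is requested by a particular processor with probability p/N, so it receives no request from any of the N processors with probability (1 - p/N)^N, and the expected number of busy memories is N times the complement of that. The sketch below compares this closed form with a small Monte Carlo simulation. The function names, the cycle count, and the seed are illustrative assumptions, not part of the slides.

```python
import random

def crossbar_bw(n: int, p: float) -> float:
    """Closed-form expected number of busy memories per cycle."""
    return n * (1.0 - (1.0 - p / n) ** n)

def simulate_bw(n: int, p: float, cycles: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate: count the distinct memories requested per cycle."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        requested = set()
        for _ in range(n):                       # each processor
            if rng.random() < p:                 # issues a request with probability p
                requested.add(rng.randrange(n))  # to a uniformly chosen memory
        total += len(requested)                  # one connection per requested memory
    return total / cycles
```

Calling `crossbar_bw(8, 0.5)` and `simulate_bw(8, 0.5)` should give values that agree closely, which is a quick sanity check on the formula.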

6
Input-Buffered Switch

Independent routing logic per input: FSM
Scheduler logic arbitrates each output:
priority, FIFO, random
Head-of-line blocking problem: The head packet
in a buffer cannot depart because the output is
busy with another packet. The second packet may
be destined to an output that is free, but cannot
depart because it is blocked by the first packet => One
solution is to create multiple input queues, one
per output, called Virtual Output Queuing,
adopted in most routers.
Scheduler design: How to ensure the maximum number of
simultaneous connections is a challenging
research area.

7
Problems with Input-Buffered Switch

FIFO input buffers give rise to the head-of-line
(HOL) blocking problem
Current routers employ a separate input queue for
each output, called a virtual output queue (VOQ)
How, then, should packets from the different
VOQs be scheduled for transmission?
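To make the difference concrete, here is a minimal sketch of one switch cycle under each scheme. It assumes a fixed-priority greedy scheduler (lowest input index first) and models each input's buffer as a queue of destination output numbers; both choices are simplifications for illustration, not the scheduler used in any real router.

```python
from collections import deque

def fifo_cycle(queues):
    """One cycle with FIFO input queues: only the head packet may compete."""
    taken_outputs = set()
    sent = []
    for inp, q in enumerate(queues):
        if q and q[0] not in taken_outputs:
            taken_outputs.add(q[0])
            sent.append((inp, q.popleft()))
    return sent

def voq_cycle(queues):
    """One cycle with virtual output queues: any queued destination may compete."""
    taken_outputs = set()
    sent = []
    for inp, q in enumerate(queues):
        for dest in list(q):              # scan a copy, in arrival order
            if dest not in taken_outputs:
                taken_outputs.add(dest)
                q.remove(dest)            # removes the first packet for that output
                sent.append((inp, dest))
                break
    return sent
```

With inputs holding `[0, 1]` and `[0, 2]`, the FIFO version forwards a single packet because input 1 is blocked behind its head packet, while the VOQ version forwards two.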

8
VOQ-based Input Buffered Switch
9
Scheduling in Input Buffered Switch

n independent arbitration problems?
static priority, random, round-robin
simplifications due to routing algorithm?
general case is max bipartite matching
Iterative algorithms, e.g. iSLIP in Cisco routers

10
Iterative Matching: A 3-step Procedure
Request
Grant
Accept
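The three steps can be sketched as follows. This is a simplified round-robin matcher in the spirit of iSLIP, not a faithful implementation: pointer updates between cycles are omitted, and the names `requests`, `grant_ptr`, and `accept_ptr` are assumptions made for the example.

```python
def iterative_match(requests, n, grant_ptr=None, accept_ptr=None, iterations=None):
    """requests[i] is the set of outputs for which input i has a queued cell."""
    grant_ptr = grant_ptr or [0] * n     # per-output round-robin pointer
    accept_ptr = accept_ptr or [0] * n   # per-input round-robin pointer
    match_in, match_out = {}, {}         # input -> output, output -> input
    for _ in range(iterations or n):
        # Request: unmatched inputs request every unmatched output they have cells for.
        reqs = {o: [i for i in range(n) if i not in match_in and o in requests[i]]
                for o in range(n) if o not in match_out}
        # Grant: each unmatched output grants the first requester at or after its pointer.
        grants = {}
        for o, inputs in reqs.items():
            if inputs:
                chosen = min(inputs, key=lambda i: (i - grant_ptr[o]) % n)
                grants.setdefault(chosen, []).append(o)
        if not grants:
            break                        # no further matches are possible
        # Accept: each input accepts the first granting output at or after its pointer.
        for i, outs in grants.items():
            chosen = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            match_in[i] = chosen
            match_out[chosen] = i
    return match_in
```

For `requests = [{0, 1}, {0}]` the default pointers yield only one match, while `accept_ptr=[1, 0]` yields two, which illustrates why the result depends on pointer state and is not always a maximum matching.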
15
Fair Scheduling in Crossbar (Infocom 2002)

Motivation
Current routers employ fair scheduling at the
output link, but with high link speed there are
very few packets at the output buffer. These
packets were selected by the crossbar with equal
probability from the input buffers.
Many more packets are waiting in the input
queues. Choosing packets during arbitration
depending on the reservation will ensure better
QoS among competing flows at the input buffers.

16
The iFS Algorithm
Initially, all inputs and outputs are considered
unmatched and none of the inputs have any
candidates. Then, in each iteration:
Grant stage: Each unmatched output selects a flow
with the smallest virtual time for its
head-of-line cell and marks the cell as a
candidate for the corresponding input. A grant
signal is then given to the input.
Accept stage: Each unmatched input examines its
candidate set, selects a winner according to age,
and sends an accept signal to its output. The
input and output are then considered matched.
Reset the candidate set to empty.
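A rough sketch of the two stages, under several assumptions that the slide does not spell out: each (input, output) pair is represented only by the head-of-line cell of its best flow, "according to age" is taken to mean that the oldest cell (smallest arrival time) wins, and ties are broken by index. The data layout `hol` is hypothetical.

```python
def ifs_match(hol, n, iterations=None):
    """hol maps (input, output) -> (virtual_time, arrival_time) of the HOL cell."""
    matched_in, matched_out = {}, {}
    for _ in range(iterations or n):
        candidates = {}  # input -> list of (arrival_time, output); rebuilt each iteration
        # Grant stage: each unmatched output picks the smallest virtual time.
        for o in range(n):
            if o in matched_out:
                continue
            offers = [(vt, i) for (i, out), (vt, _) in hol.items()
                      if out == o and i not in matched_in]
            if offers:
                _, i = min(offers)
                candidates.setdefault(i, []).append((hol[(i, o)][1], o))
        if not candidates:
            break
        # Accept stage: each input with candidates picks the oldest cell (assumption).
        for i, cand in candidates.items():
            _, o = min(cand)
            matched_in[i] = o
            matched_out[o] = i
    return matched_in
```

In this sketch the candidate sets are rebuilt at the start of every iteration, which plays the role of the reset step.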
17
Output/Shared Buffered Switch
Shared buffer:
RAM speed has to be N times the link speed.
An output-buffered switch has buffers at the outputs to
store packets. There is always a minimal
transmit buffer at the input. What happens if
two or more packets arrive for the same output at
the same time? In order to capture all of them, the
switch speed has to be N times the link speed
=> difficult to design.
18
Shared Buffer Switch: IBM SP Vulcan Switch

Many gigabit Ethernet switches use similar design
without the cut-through
128 8-byte chunks in central queue, LRU per
output

19
SGI SPIDER (IEEE Micro, Jan. 1997)
20
Flow Control

What do you do when push comes to shove?
Ethernet: collision detection and retry after
delay
FDDI, token ring: arbitration token
TCP/WAN: buffer, drop, adjust rate
any solution must adjust to output rate
Link-level flow control

21
Examples

Short links
Long links:
several flits on the wire

22
Multistage Interconnection Network

A network consisting of multiple stages of
crossbar switches has the following properties:
$N \times N$ network for $N = 2^n$
Consists of $\log_2 N$ stages of 2x2 switches
Has $N/2$ 2x2 switches per stage
Cost: $O(N \log N)$ instead of $O(N^2)$ for a crossbar
For $N = a^n$, a MIN can be similarly designed with
$a \times a$ switches
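The cost comparison can be checked with a few lines of arithmetic. This is a small sketch; the function names are illustrative, and "cost" is counted here as crosspoints for the crossbar and as whole switches for the MIN.

```python
def crossbar_crosspoints(n: int) -> int:
    """An N x N crossbar needs one crosspoint per input/output pair."""
    return n * n

def min_switches(n: int, a: int = 2):
    """Return (stages, switches_per_stage, total) for an N x N MIN of a x a switches."""
    if n < 1 or a < 2:
        raise ValueError("need N >= 1 and a >= 2")
    stages, size = 0, 1
    while size < n:          # count how many times a divides into N
        size *= a
        stages += 1
    if size != n:
        raise ValueError("N must be a power of a")
    per_stage = n // a
    return stages, per_stage, stages * per_stage
```

For N = 8 this gives 3 stages of 4 switches (12 switches, or 48 crosspoints at 4 per 2x2 switch) against 64 crosspoints for the crossbar; the gap widens quickly as N grows.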

23
Multistage interconnection networks
[Figure: 8x8 Omega network, with inputs and outputs labeled 0 to 7 (000 to 111)]
Omega network complexity: $O(N \log_2 N)$
24
[Figure: (a) perfect shuffle and (b) inverse perfect shuffle on the addresses 000 to 111 (0 to 7)]
Shuffle interconnection:
$S(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-2} a_{n-3} \ldots a_0 a_{n-1})$
25
Omega Network

Every stage of switches is preceded by a perfect
shuffle interconnection:
$S(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-2} a_{n-3} \ldots a_0 a_{n-1})$
An input can be connected to a straight or
exchange output in a 2x2 switch:
$E(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-1} a_{n-2} \ldots a_1 \bar{a}_0)$
To route a message/packet in an Omega network,
the destination tag, which is the binary equivalent of
the destination, is used: $(d_{n-1} d_{n-2} \ldots d_1 d_0)$. The
ith bit $d_i$ is used to control the routing at the
ith stage counted from the right, with $0 \le i \le n-1$.
If $d_i = 0$, the input is connected to the
upper output. If $d_i = 1$, it is connected to the
lower output.
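On n-bit addresses, the shuffle is a one-bit left rotation, the inverse shuffle is a one-bit right rotation, and the exchange flips the least significant bit. A minimal sketch, with function names chosen for this example:

```python
def shuffle(addr: int, n: int) -> int:
    """Perfect shuffle: rotate the n-bit address left by one bit."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb

def inverse_shuffle(addr: int, n: int) -> int:
    """Inverse perfect shuffle: rotate the n-bit address right by one bit."""
    lsb = addr & 1
    return (addr >> 1) | (lsb << (n - 1))

def exchange(addr: int) -> int:
    """Exchange: complement the least significant bit."""
    return addr ^ 1
```

For n = 3, `shuffle(0b100, 3)` gives `0b001`, and applying `inverse_shuffle` afterwards returns the original address.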

26
Self Routing

A processor generates a tag that is the binary
equivalent of the destination.
The MSB controls the leftmost stage and the LSB
controls the rightmost stage of the Omega
network. A small controller inside the 2x2
switch senses this bit and enables the connection.
If bit $c_i = 0$, the request is to the upper
output; if it is 1, the request is to the lower
output.
Routing is based on a digit (rather than a bit) if the switch size is greater than 2.
Network conflict: select by round robin.
Less bandwidth than a crossbar, but more cost
effective.
What about QoS? Future research.

27
Theorem: The Omega network is self-routing

Let the source be $(s_{n-1} s_{n-2} \ldots s_1 s_0)$ and the
destination be $(d_{n-1} d_{n-2} \ldots d_1 d_0)$. Before
Stage 1, the source is switched to the position
$(s_{n-2} s_{n-3} \ldots s_0 s_{n-1})$ by the perfect shuffle
connection. After Stage 1 it is switched to
$(s_{n-2} s_{n-3} \ldots s_0 d_{n-1})$, as per the (n-1)th bit of
the destination.
Before the 2nd stage of switches, the source is
connected to $(s_{n-3} \ldots s_0 d_{n-1} s_{n-2})$, and after the 2nd
stage it becomes $(s_{n-3} \ldots s_0 d_{n-1} d_{n-2})$.
If we continue like this for n stages, the
source position matches $(d_{n-1} d_{n-2} \ldots d_1 d_0)$, which is
the destination.
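The argument can be checked mechanically by tracing a packet's position through the stages: shuffle, then replace the low bit with the next destination bit, MSB first. The sketch below is an illustration with names of my choosing and ignores conflicts between packets.

```python
def omega_route(src: int, dst: int, n: int):
    """Return the packet's position after each of the n stages."""
    mask = (1 << n) - 1
    pos = src
    path = []
    for stage in range(n):
        pos = ((pos << 1) & mask) | (pos >> (n - 1))  # perfect shuffle before the stage
        bit = (dst >> (n - 1 - stage)) & 1            # destination bit, MSB first
        pos = (pos & ~1) | bit                        # 0 -> upper output, 1 -> lower output
        path.append(pos)
    return path
```

For example, routing from 101 to 010 in an 8x8 network visits positions 010, 101, 010, and the final position equals the destination for every source/destination pair.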

28
Example: SP

8-port switch, 40 MB/s per link, 8-bit phit,
16-bit flit, single 40 MHz clock
packet switching, cut-through, no virtual channels,
source-based routing
variable packet size <= 255 bytes, 31-byte FIFO per
input, 7 bytes per output, 16-phit links

29
Summary

Routing Algorithms restrict the set of routes
within the topology
simple mechanism selects turn at each hop
arithmetic, selection, lookup
Deadlock-free if channel dependence graph is
acyclic
limit turns to eliminate dependences
add separate channel resources to break
dependences
combination of topology, algorithm, and switch
design
Deterministic vs. adaptive routing
Switch design issues
input/output/pooled buffering, routing logic,
selection logic
Flow control
Real networks are a package of design choices

30
Protocols: HW/SW Interface

Internetworking allows computers on independent
and incompatible networks to communicate reliably
and efficiently
Enabling technologies: SW standards that allow
reliable communications without reliable networks
Hierarchy of SW layers, giving each layer
responsibility for a portion of the overall
communications task, called protocol families or
protocol suites
Transmission Control Protocol/Internet Protocol
(TCP/IP)
This protocol family is the basis of the Internet
IP makes a best effort to deliver; TCP guarantees
delivery
TCP/IP is used even when communicating locally: NFS
uses IP even though communicating across a
homogeneous LAN

31
TCP/IP packet

Application sends a message
TCP breaks it into segments of up to 64 KB and adds a 20 B header
IP adds a 20 B header and sends to the network
If Ethernet, broken into 1500 B packets with
headers and trailers
Headers and trailers have length field, destination,
window number, version, ...

Ethernet
IP Header
TCP Header
IP Data
TCP data (<= 64 KB)
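A back-of-the-envelope sketch of the header arithmetic, assuming every 1500 B packet carries its own 20 B TCP and 20 B IP header with no options, and ignoring the Ethernet header and trailer. The function name and the per-packet-header model are assumptions for illustration.

```python
import math

def packetize(message_bytes: int, mtu: int = 1500, tcp_hdr: int = 20, ip_hdr: int = 20):
    """Return (packet_count, last_payload_bytes, total_header_bytes)."""
    if message_bytes <= 0:
        raise ValueError("message must be non-empty")
    payload = mtu - tcp_hdr - ip_hdr                   # data bytes per full packet
    packets = math.ceil(message_bytes / payload)
    last = message_bytes - (packets - 1) * payload     # payload of the final packet
    overhead = packets * (tcp_hdr + ip_hdr)
    return packets, last, overhead
```

Under these assumptions a 64 KB message (65536 bytes) needs 45 packets, the last one carrying 1296 bytes, with 1800 bytes of TCP/IP headers in total.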
32
Communicating with the Server: The O/S Wall

Problems
O/S overhead to move a packet between network
and application level => protocol stack (TCP/IP)
O/S interrupt
Data copying from kernel space to user space and
vice versa
Oh, the PCI Bottleneck!

33
The Send/Receive Operation

The application writes the transmit data to the
TCP/IP sockets interface for transmission in
payload sizes ranging from 4 KB to 64 KB.
The data is copied from the User space to the
Kernel space
The OS segments the data into maximum
transmission unit (MTU)-size packets, and then
adds TCP/IP header information to each packet.
The OS copies the data onto the network interface
card (NIC) send queue.
The NIC performs the direct memory access (DMA)
transfer of each data packet from the TCP buffer
space to the NIC, and interrupts CPU activities
to indicate completion of the transfer.

34
Transmitting data across the memory bus using a
standard NIC
http://www.dell.com/downloads/global/power/1q04-her.pdf
35
Timing Measurement in UDP Communication
X. Zhang, L. Bhuyan and W. Feng, "Anatomy of UDP
and M-VIA for Cluster Communication," JPDC,
October 2005
36
I/O Acceleration Techniques

TCP Offload: Offload TCP/IP checksum and
segmentation to interface hardware or a
programmable device (e.g., TOEs). A TOE-enabled
NIC using Remote Direct Memory Access (RDMA) can
use zero-copy algorithms to place data directly
into application buffers.
O/S Bypass: User-level software techniques to
bypass the protocol stack. Zero-copy protocol
(needs a programmable device in the NIC for
direct user-level memory access and
virtual-to-physical memory mapping, e.g., VIA).
Architectural Techniques: Instruction set
optimization, multithreading, copy engines,
onloading, prefetching, etc.
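For a sense of the work that checksum offload moves off the host CPU, here is a software sketch of the 16-bit ones' complement Internet checksum described in RFC 1071, which TCP, UDP, and the IP header all use. It is written for clarity rather than speed and covers only the checksum over a byte string, not the TCP pseudo-header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back into the low 16 bits
    return ~total & 0xFFFF
```

A receiver can verify a packet by running the same function over the data with the checksum appended; a result of 0 indicates that no error was detected.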

37
Comparing standard TCP/IP and TOE-enabled TCP/IP
stacks
(http://www.dell.com/downloads/global/power/1q04-her.pdf)
38
Chelsio 10 Gb/s TOE
39
Cluster (Network) of Workstations/PCs
40
Myrinet Interface Card
41
InfiniBand Interconnection

Zero-copy mechanism. The zero-copy mechanism
enables a user-level application to perform I/O
on the InfiniBand fabric without being required
to copy data between user space and kernel space.
RDMA. RDMA facilitates transferring data from
remote memory to local memory without the
involvement of host CPUs.
Reliable transport services. The InfiniBand
architecture implements reliable transport
services so the host CPU is not involved in
protocol-processing tasks like segmentation,
reassembly, NACK/ACK, etc.
Virtual lanes. InfiniBand architecture provides
16 virtual lanes (VLs) to multiplex independent
data lanes into the same physical lane, including
a dedicated VL for management operations.
High link speeds. InfiniBand architecture defines
three link speeds, which are characterized as 1X,
4X, and 12X, yielding data rates of 2.5 Gbps, 10
Gbps, and 30 Gbps, respectively.
Reprinted from Dell Power Solutions, October
2004. By Onur Celebioglu, Ramesh Rajagopalan, and
Rizwan Ali.

42
InfiniBand system fabric
43
UDP Communication: Life of a Packet

X. Zhang, L. Bhuyan and W. Feng, "Anatomy of
UDP and M-VIA for Cluster Communication," Journal
of Parallel and Distributed Computing (JPDC),
Special issue on Design and Performance of
Networks for Super-, Cluster-, and
Grid-Computing, Vol. 65, Issue 10, October 2005,
pp. 1290-1298.
