Dynamic Networks - PowerPoint PPT Presentation

Provided by: david3085
Learn more at: http://www.cs.ucr.edu
1
Dynamic Networks
  • L.N. Bhuyan
  • Partly from Berkeley Notes

2
What Is a Dynamic Network?
  • A dynamic network is a network that can connect
    any input to any output by enabling or disabling
    some of the switches in the network
  • Examples
  • - Shared bus: the bus arbiter connects a
    processor to a memory
  • - Crossbar: consists of many switching
    elements, which can be enabled to connect many
    inputs to many outputs simultaneously
  • - Multistage network: consists of several
    stages of switches that are enabled to set up
    connections
  • - The nodes in static networks (such as a mesh)
    also contain dynamic crossbars

3
Crossbar Switch Design
  • Complexity: $O(N^2)$ for an $N \times N$ crossbar. Why?
    See the next slide.

4
How Do You Build a Crossbar?
(Figure label: From Control)
$N^2$ switches => cost $O(N^2)$. Time taken by the
arbiter: $O(N^2)$.
The multiplexers are controlled by the controller.
5
Crossbar Contd.
  • An $N \times N$ crossbar allows all N inputs to be
    connected simultaneously to all N outputs
  • It allows all one-to-one mappings, called
    permutations. Number of permutations: $N!$
  • When two or more inputs request the same output,
    only one of them is connected and the others are
    either dropped or buffered
  • When processors access memories through a crossbar,
    this situation is called a memory access conflict
  • Given p as the probability of a request by a
    processor per cycle, and assuming that a
    processor's requests are uniformly directed to all
    N memories, the average number of connections
    allowed per cycle, called the bandwidth (BW), is
  • $BW = N\left[1 - (1 - p/N)^{N}\right]$. Derive this!
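
The bandwidth expression can be checked numerically. Under the slide's assumptions (N processors, each requesting with probability p per cycle, independently and uniformly over N memories), a given memory receives no request with probability $(1 - p/N)^N$, so the expected number of busy memories is $N[1 - (1 - p/N)^N]$. The sketch below compares that closed form with a Monte Carlo estimate; the function names, cycle count, and seed are illustrative choices, not part of the slides.

```python
import random


def crossbar_bw(n, p):
    """Closed form: expected number of distinct memories requested per cycle."""
    return n * (1.0 - (1.0 - p / n) ** n)


def simulate_bw(n, p, cycles=20000, seed=1):
    """Monte Carlo estimate under the same uniform, independent request model."""
    rng = random.Random(seed)
    served = 0
    for _ in range(cycles):
        requested = set()
        for _ in range(n):
            if rng.random() < p:
                requested.add(rng.randrange(n))
        # Each requested memory accepts exactly one request per cycle.
        served += len(requested)
    return served / cycles


if __name__ == "__main__":
    for n in (2, 8, 64):
        print(n, round(crossbar_bw(n, 1.0), 3), round(simulate_bw(n, 1.0), 3))
```

For large N with p = 1 the ratio BW/N approaches $1 - 1/e$, which is why a crossbar without buffering serves only about 63% of its ports per cycle under this model.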

6
Input-Buffered Switch
  • Independent routing logic per input (FSM)
  • Scheduler logic arbitrates each output:
    priority, FIFO, random
  • Head-of-line blocking problem: the head packet
    in a buffer cannot depart because its output is
    busy with another packet. The second packet may
    be destined for an output that is free, but it cannot
    depart because it is blocked by the first packet => One
    solution is to create multiple input queues, one
    per output, called virtual output queuing (VOQ),
    adopted in most routers.
  • Scheduler design: how to ensure the maximum number of
    simultaneous connections is a challenging
    research area.

7
Problems with Input-Buffered Switch
  • FIFO input buffers give rise to the head-of-line
    (HOL) blocking problem
  • Current routers employ a separate input queue for
    each output, called a virtual output queue (VOQ)
  • How, then, should packets from the different
    VOQs be scheduled for transmission?

8
VOQ-based Input Buffered Switch
9
Scheduling in Input Buffered Switch
  • n independent arbitration problems?
  • static priority, random, round-robin
  • simplifications due to routing algorithm?
  • the general case is maximum bipartite matching;
    iterative algorithms are used, e.g. iSLIP in Cisco

10
Iterative Matching: A 3-Step Procedure
Request
Grant
Accept
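
The request, grant, accept cycle can be sketched in a few lines. This is a simplified illustration in the style of round-robin iterative matching such as iSLIP, not Cisco's implementation: the function name, the data layout (one set of requested outputs per input, standing in for non-empty VOQs), and the rule of updating pointers only for matches made in the first iteration are assumptions made for the example.

```python
def iterative_match(requests, n, grant_ptr, accept_ptr, iterations=None):
    """Match inputs to outputs by repeated request/grant/accept rounds.

    requests[i] is the set of outputs for which input i has queued cells.
    grant_ptr and accept_ptr are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    match_in, match_out = {}, {}
    for it in range(iterations or n):
        # Request: every unmatched input requests each unmatched output it needs.
        # Grant: each unmatched output grants the requester nearest its pointer.
        grants = {}
        for out in range(n):
            if out in match_out:
                continue
            reqs = [i for i in range(n)
                    if i not in match_in and out in requests[i]]
            if reqs:
                chosen = min(reqs, key=lambda i: (i - grant_ptr[out]) % n)
                grants.setdefault(chosen, []).append(out)
        if not grants:
            break
        # Accept: each input accepts the granting output nearest its pointer.
        for inp, outs in grants.items():
            out = min(outs, key=lambda o: (o - accept_ptr[inp]) % n)
            match_in[inp] = out
            match_out[out] = inp
            if it == 0:
                grant_ptr[out] = (inp + 1) % n
                accept_ptr[inp] = (out + 1) % n
    return match_in


if __name__ == "__main__":
    reqs = [{0, 1}, {0}, {0, 2}]
    g, a = [0, 0, 0], [0, 0, 0]
    print(iterative_match(reqs, 3, g, a))  # first cell time
    print(iterative_match(reqs, 3, g, a))  # pointers have moved on
```

With these requests the first call matches only two inputs although a matching of size three exists, which illustrates that iterative schemes find a maximal matching rather than a maximum one; the second call, with rotated pointers, serves the input that lost before.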
15
Fair Scheduling in Crossbar (Infocom 2002)
  • Motivation
  • Current routers employ fair scheduling at the
    output link, but with high link speed there are
    very few packets at the output buffer. These
    packets were selected by the crossbar with equal
    probability from the input buffers.
  • Many more packets are waiting in the input
    queues. Choosing packets during arbitration
    depending on the reservation will ensure better
    QoS among competing flows at the input buffers.

16
The iFS Algorithm
Initially, all inputs and outputs are considered
as unmatched and none of the inputs have any
candidates. Then in each iteration:
  • Grant stage: Each unmatched output selects a flow with the
    smallest virtual time for its head-of-line cell and marks the cell
    as a candidate for the corresponding input. Grant signal is then
    given to the input.
  • Accept stage: Each unmatched input examines its candidate set,
    selects a winner according to age and sends an accept signal to
    its output. The input and output are then considered as matched.
    Reset the candidate set to empty.
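
The grant and accept stages described above can be sketched as follows. This is only an illustration under stated assumptions, not the published iFS implementation: the flow table layout (one head-of-line cell per input/output pair with a virtual time and an age), the rule that the oldest candidate wins the accept stage, and the tie-breaking by index are all hypothetical choices.

```python
def ifs_match(flows, n, max_iters=None):
    """One scheduling decision in the style of the grant/accept stages.

    flows maps (input, output) -> (virtual_time, age) of the head-of-line
    cell of that flow. Returns a dict mapping matched inputs to outputs.
    """
    in_match, out_match = {}, {}
    for _ in range(max_iters or n):
        candidates = {}  # input -> list of (age, output); rebuilt each iteration
        # Grant stage: each unmatched output picks the smallest virtual time.
        for out in range(n):
            if out in out_match:
                continue
            eligible = [(vt, inp, age)
                        for (inp, o), (vt, age) in flows.items()
                        if o == out and inp not in in_match]
            if eligible:
                _, inp, age = min(eligible)
                candidates.setdefault(inp, []).append((age, out))
        if not candidates:
            break
        # Accept stage: each input picks its oldest candidate (assumption).
        for inp, cands in candidates.items():
            _, out = max(cands)
            in_match[inp] = out
            out_match[out] = inp
    return in_match


if __name__ == "__main__":
    flows = {(0, 0): (5, 3), (1, 0): (2, 1), (1, 1): (4, 9), (0, 1): (7, 2)}
    print(ifs_match(flows, 2))
```

In the usage example both outputs first grant to input 1 because its cells have the smallest virtual times; input 1 accepts output 1, and output 0 falls back to input 0 in the next iteration.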
17
Output/Shared Buffered Switch
  • Shared buffer: RAM speed has to be N times the link speed.
  • An output-buffered switch has buffers at the outputs to
    store packets. There is always a minimal
    transmitting buffer at the input.
  • What happens if two or more packets go to the same output at
    the same time? In order to capture them all, the
    switch speed has to be N times the link speed
    => difficult to design.
18
Shared Buffer Switch: IBM SP Vulcan Switch
  • Many gigabit Ethernet switches use similar design
    without the cut-through
  • 128 8-byte chunks in central queue, LRU per
    output

19
SGI SPIDER (IEEE Micro, Jan. 1997)
20
Flow Control
  • What do you do when push comes to shove?
  • Ethernet: collision detection and retry after
    delay
  • FDDI, token ring: arbitration token
  • TCP/WAN: buffer, drop, adjust rate
  • any solution must adjust to output rate
  • Link-level flow control

21
Examples
  • Short Links
  • long links
  • several flits on the wire

22
Multistage Interconnection Network
  • A network consisting of multiple stages of
    crossbar switches has the following properties.
  • $N \times N$ network for $N = 2^n$
  • Consists of $\log_2 N$ stages of 2x2 switches
  • Has $N/2$ 2x2 switches per stage
  • Cost $O(N \log N)$ instead of $O(N^2)$ for a crossbar
  • For $N = a^n$, a MIN can be similarly designed with
    $a \times a$ switches
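
To make the cost comparison concrete, the sketch below counts crosspoints for a crossbar and switches for a MIN built from a x a switches, using the stage and per-stage counts listed above ($\log_a N$ stages of $N/a$ switches). The function names are illustrative.

```python
def crossbar_crosspoints(n):
    """An N x N crossbar needs one crosspoint per input/output pair."""
    return n * n


def min_switches(n, a=2):
    """Number of a x a switches in a MIN with N = a**k ports."""
    stages, size = 0, 1
    while size < n:
        size *= a
        stages += 1
    if size != n:
        raise ValueError("n must be a power of a")
    return stages * (n // a)


def min_crosspoints(n, a=2):
    """Each a x a switch is itself a small crossbar with a*a crosspoints."""
    return min_switches(n, a) * a * a


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, crossbar_crosspoints(n), min_switches(n), min_crosspoints(n))
```

For 64 ports this gives 4096 crosspoints for the crossbar against 192 2x2 switches (768 crosspoints) for the MIN.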

23
Multistage interconnection networks
[Figure: Omega network with inputs and outputs 0 to 7, labelled 000 to 111]
Omega network complexity: $O(N \log_2 N)$
24
[Figure: (a) perfect shuffle and (b) inverse perfect shuffle on the addresses 0 to 7 (000 to 111)]
Shuffle interconnection: $S(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-2} a_{n-3} \ldots a_0 a_{n-1})$
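
The perfect shuffle is a one-bit left rotation of the n-bit address, and the inverse shuffle is the corresponding right rotation. A minimal sketch, with hypothetical function names:

```python
def shuffle(addr, n_bits):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb


def inverse_shuffle(addr, n_bits):
    """Inverse perfect shuffle: rotate the n-bit address right by one position."""
    lsb = addr & 1
    return (addr >> 1) | (lsb << (n_bits - 1))


if __name__ == "__main__":
    for i in range(8):
        print(format(i, "03b"), "->", format(shuffle(i, 3), "03b"))
```

For eight addresses the shuffle sends 0..7 to 0, 2, 4, 6, 1, 3, 5, 7, which is the interleaving of the two halves that gives the connection its name.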
25
Omega Network
  • Every stage of switches is preceded by a perfect
    shuffle interconnection
  • $S(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-2} a_{n-3} \ldots a_0 a_{n-1})$
  • An input can be connected to the straight or the
    exchange output in a 2x2 switch.
  • $E(a_{n-1} a_{n-2} \ldots a_1 a_0) = (a_{n-1} a_{n-2} \ldots a_1 \bar{a}_0)$
  • To route a message/packet in an Omega network,
    the destination tag, which is the binary representation of
    the destination, is used: $(d_{n-1} d_{n-2} \ldots d_1 d_0)$. The
    ith bit $d_i$ is used to control the routing at the
    ith stage counted from the right, with $0 \le i \le
    n-1$. If $d_i = 0$, the input is connected to the
    upper output. If $d_i = 1$, it is connected to the
    lower output.

26
Self Routing
  • A processor generates a tag that is the binary
    representation of the destination
  • The MSB controls the leftmost stage and the LSB
    controls the rightmost stage of the Omega
    network. A small controller inside the 2x2
    switch senses this bit and enables the connection
  • If bit $c_i = 0$, the request is to the upper
    output; if it is 1, the request is to the lower
    output.
  • Routing is based on a digit if the switch size is greater than 2
  • Network conflict: select by round robin
  • Less bandwidth than a crossbar, but more cost
    effective
  • What about QoS? Future research
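
The bandwidth gap relative to a crossbar shows up when counting how many full permutations an Omega network can pass without a network conflict. The sketch below is an illustration with hypothetical names: it routes every source by its destination tag (a perfect shuffle before each stage, then the tag bit selects the upper or lower switch output) and declares a conflict when two packets need the same switch output.

```python
from itertools import permutations


def passes(perm, n_bits):
    """True if perm (perm[src] = dst) is routed without any switch conflict."""
    mask = (1 << n_bits) - 1
    pos = list(range(len(perm)))
    for stage in range(1, n_bits + 1):
        nxt = []
        for src, p in enumerate(pos):
            p = ((p << 1) & mask) | (p >> (n_bits - 1))  # perfect shuffle
            d = (perm[src] >> (n_bits - stage)) & 1      # tag bit, MSB first
            nxt.append((p & ~1) | d)                     # 0 = upper, 1 = lower
        if len(set(nxt)) != len(nxt):                    # two packets, one output
            return False
        pos = nxt
    return True


def count_passable(n_bits):
    """Count conflict-free permutations by brute force (small networks only)."""
    n = 1 << n_bits
    return sum(passes(p, n_bits) for p in permutations(range(n)))


if __name__ == "__main__":
    print(count_passable(2), "of 24 permutations pass for N = 4")
```

For N = 4 the network passes 16 of the 24 permutations, one per setting of its four 2x2 switches, whereas a crossbar passes all N! of them.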

27
Theorem: The Omega network is self-routing
  • Let the source be $(s_{n-1} s_{n-2} \ldots s_2 s_1 s_0)$ and the
    destination be $(d_{n-1} d_{n-2} \ldots d_2 d_1 d_0)$. Before
    stage 1, the source is switched to the position
    $(s_{n-2} s_{n-3} \ldots s_1 s_0 s_{n-1})$ by the perfect shuffle
    connection. After stage 1 it is switched to
    $(s_{n-2} s_{n-3} \ldots s_1 s_0 d_{n-1})$ as per the (n-1)th bit of
    the destination.
  • Before the 2nd stage of switches, the source is
    connected to $(s_{n-3} \ldots s_0 d_{n-1} s_{n-2})$, and after the 2nd
    stage it becomes $(s_{n-3} \ldots s_0 d_{n-1} d_{n-2})$.
  • If we continue like this for n stages, the
    source position becomes $(d_{n-1} d_{n-2} \ldots d_i \ldots d_1 d_0)$, which is
    the destination.
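
The argument can be traced mechanically: each stage shifts one source bit out on the left and one destination bit in on the right. The sketch below is an illustration with a hypothetical function name; it records the position after every stage and confirms that the final position equals the destination for every source and destination pair.

```python
def omega_route(src, dst, n_bits):
    """Return the position of the packet after each of the n stages."""
    mask = (1 << n_bits) - 1
    pos, path = src, []
    for stage in range(1, n_bits + 1):
        # Perfect shuffle in front of the stage: rotate left by one bit.
        pos = ((pos << 1) & mask) | (pos >> (n_bits - 1))
        # The switch replaces the last bit with the tag bit d_(n-stage).
        d = (dst >> (n_bits - stage)) & 1
        pos = (pos & ~1) | d
        path.append(pos)
    return path


if __name__ == "__main__":
    # Source 101 to destination 010 in an 8 x 8 network.
    print([format(p, "03b") for p in omega_route(0b101, 0b010, 3)])
    assert all(omega_route(s, d, 3)[-1] == d
               for s in range(8) for d in range(8))
```

Because the route depends only on the destination bits, no global controller is needed; the same tag reaches the same output from any source.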

28
Example: SP
  • 8-port switch, 40 MB/s per link, 8-bit phit,
    16-bit flit, single 40 MHz clock
  • packet switching, cut-through, no virtual channels,
    source-based routing
  • variable packet size < 255 bytes, 31-byte FIFO per
    input, 7 bytes per output, 16 phit links

29
Summary
  • Routing Algorithms restrict the set of routes
    within the topology
  • simple mechanism selects turn at each hop
  • arithmetic, selection, lookup
  • Deadlock-free if channel dependence graph is
    acyclic
  • limit turns to eliminate dependences
  • add separate channel resources to break
    dependences
  • combination of topology, algorithm, and switch
    design
  • Deterministic vs. adaptive routing
  • Switch design issues
  • input/output/pooled buffering, routing logic,
    selection logic
  • Flow control
  • Real networks are a package of design choices
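
The acyclicity condition from the summary can be checked mechanically. A minimal sketch, assuming the channel dependence graph is given as a dictionary from each channel to the channels a packet holding it may request next; the four-channel ring and the names c0 to c3 are hypothetical.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph given as adjacency lists."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph.get(v, ()):
            state = color.get(w, WHITE)
            if state == GREY:          # back edge: w is on the current path
                return True
            if state == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


if __name__ == "__main__":
    # Unidirectional ring: every channel may wait on the next one.
    ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
    # Same ring with the c3 -> c0 dependence removed by a routing restriction.
    restricted = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
    print(has_cycle(ring), has_cycle(restricted))
```

The ring has a cyclic dependence and can therefore deadlock; removing one dependence (by limiting turns or by adding a separate channel for the wrap-around traffic) makes the graph acyclic.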

30
Protocols: HW/SW Interface
  • Internetworking allows computers on independent
    and incompatible networks to communicate reliably
    and efficiently
  • Enabling technologies: SW standards that allow
    reliable communications without reliable networks
  • Hierarchy of SW layers, giving each layer
    responsibility for a portion of the overall
    communications task, called protocol families or
    protocol suites
  • Transmission Control Protocol/Internet Protocol
    (TCP/IP)
  • This protocol family is the basis of the Internet
  • IP makes a best effort to deliver; TCP guarantees
    delivery
  • TCP/IP is used even when communicating locally: NFS
    uses IP even though it communicates across a
    homogeneous LAN

31
TCP/IP packet
  • Application sends message
  • TCP breaks it into 64KB segments, adds a 20B header
  • IP adds 20B header, sends to network
  • If Ethernet, broken into 1500B packets with
    headers, trailers
  • Header, trailers have length field, destination,
    window number, version, ...

[Figure: packet layout. Ethernet frame containing the IP header, TCP header, IP data, and TCP data (up to 64KB)]
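
The layering arithmetic on this slide can be sketched as below. The code follows the slide's simplified model only (segments of up to 64KB, one 20B TCP header and one 20B IP header per segment, the result split into 1500B Ethernet payloads); it is not a description of how a real stack chooses segment sizes, and all constant and function names are illustrative.

```python
import math

TCP_HEADER = 20           # bytes, from the slide
IP_HEADER = 20            # bytes, from the slide
MAX_SEGMENT = 64 * 1024   # bytes, from the slide
ETH_PAYLOAD = 1500        # bytes carried per Ethernet packet, from the slide


def frames_for_message(message_bytes):
    """Number of Ethernet packets needed for one application message."""
    frames = 0
    remaining = message_bytes
    while remaining > 0:
        segment = min(remaining, MAX_SEGMENT)
        ip_packet = segment + TCP_HEADER + IP_HEADER
        frames += math.ceil(ip_packet / ETH_PAYLOAD)
        remaining -= segment
    return frames


if __name__ == "__main__":
    for size in (1000, 64 * 1024, 100000):
        print(size, frames_for_message(size))
```

Under this model a single full 64KB segment already turns into 44 Ethernet packets, each of which costs the host per-packet processing.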
32
Communicating with the Server: The O/S Wall
  • Problems
  • O/S overhead to move a packet between network
    and application level => Protocol stack (TCP/IP)
  • O/S interrupt
  • Data copying from kernel space to user space and
    vice versa
  • Oh, the PCI Bottleneck!

33
The Send/Receive Operation
  • The application writes the transmit data to the
    TCP/IP sockets interface for transmission in
    payload sizes ranging from 4 KB to 64 KB.
  • The data is copied from the User space to the
    Kernel space
  • The OS segments the data into maximum
    transmission unit (MTU)-size packets, and then
    adds TCP/IP header information to each packet.
  • The OS copies the data onto the network interface
    card (NIC) send queue.
  • The NIC performs the direct memory access (DMA)
    transfer of each data packet from the TCP buffer
    space to the NIC, and interrupts CPU activities
    to indicate completion of the transfer.

34
Transmitting data across the memory bus using a
standard NIC
(http://www.dell.com/downloads/global/power/1q04-her.pdf)
35
Timing Measurement in UDP Communication
X. Zhang, L. Bhuyan and W. Feng, "Anatomy of UDP
and M-VIA for Cluster Communication," JPDC,
October 2005
36
I/O Acceleration Techniques
  • TCP offload: offload TCP/IP checksum and
    segmentation to interface hardware or a
    programmable device (e.g., TOEs). A TOE-enabled
    NIC using Remote Direct Memory Access (RDMA) can
    use zero-copy algorithms to place data directly
    into application buffers.
  • O/S bypass: user-level software techniques to
    bypass the protocol stack; zero-copy protocol
  • (Needs a programmable device in the NIC for
    direct user-level memory access: virtual-to-
    physical memory mapping. Ex.: VIA)
  • Architectural techniques: instruction set
    optimization, multithreading, copy engines,
    onloading, prefetching, etc.

37
Comparing standard TCP/IP and TOE-enabled TCP/IP
stacks
(http://www.dell.com/downloads/global/power/1q04-her.pdf)
38
Chelsio 10 Gb/s TOE
39
Cluster (Network) of Workstations/PCs
40
Myrinet Interface Card
41
InfiniBand Interconnection
  • Zero-copy mechanism. The zero-copy mechanism
    enables a user-level application to perform I/O
    on the InfiniBand fabric without being required
    to copy data between user space and kernel space.
  • RDMA. RDMA facilitates transferring data from
    remote memory to local memory without the
    involvement of host CPUs.
  • Reliable transport services. The InfiniBand
    architecture implements reliable transport
    services so the host CPU is not involved in
    protocol-processing tasks like segmentation,
    reassembly, NACK/ACK, etc.
  • Virtual lanes. InfiniBand architecture provides
    16 virtual lanes (VLs) to multiplex independent
    data lanes into the same physical lane, including
    a dedicated VL for management operations.
  • High link speeds. InfiniBand architecture defines
    three link speeds, which are characterized as 1X,
    4X, and 12X, yielding data rates of 2.5 Gbps, 10
    Gbps, and 30 Gbps, respectively.
  • Reprinted from Dell Power Solutions, October
    2004, by Onur Celebioglu, Ramesh Rajagopalan, and
    Rizwan Ali

42
InfiniBand system fabric
43
UDP Communication: Life of a Packet
  • X. Zhang, L. Bhuyan and W. Feng, "Anatomy of
    UDP and M-VIA for Cluster Communication," Journal
    of Parallel and Distributed Computing (JPDC),
    Special Issue on Design and Performance of
    Networks for Super-, Cluster-, and
    Grid-Computing, Vol. 65, Issue 10, October 2005,
    pp. 1290-1298.