Title: Hot interconnects tutorial


1
High Performance Switches and Routers: Theory
and Practice
  • Hot Interconnects 7
  • August 20, 1999
  • Stanford University

Nick McKeown, Assistant Professor of Electrical
Engineering and Computer Science
nickm_at_stanford.edu
http://www.stanford.edu/nickm
2
Tutorial Outline
  • Introduction: What is a Packet Switch?
  • Packet Lookup and Classification: Where does a
    packet go next?
  • Switching Fabrics: How does the packet get there?

3
Introduction: What is a Packet Switch?
  • Basic Architectural Components
  • Some Example Packet Switches
  • The Evolution of IP Routers

4
Basic Architectural Components
  • Control
    • Congestion Control
    • Admission Control
    • Reservation
    • Routing
  • Datapath: per-packet processing
    • Output Scheduling
    • Switching
    • Policing
5
Basic Architectural Components: Datapath
per-packet processing
  • 1. Forwarding Decision (using a Forwarding Table)
  • 2. Interconnect
  • 3. Output Scheduling
6
Where high performance packet switches are used
The Internet Core
  • Carrier Class Core Router
  • ATM Switch
  • Frame Relay Switch
7
Introduction: What is a Packet Switch?
  • Basic Architectural Components
  • Some Example Packet Switches
  • The Evolution of IP Routers

8
ATM Switch
  • Look up cell VCI/VPI in VC table.
  • Replace old VCI/VPI with new.
  • Forward cell to outgoing interface.
  • Transmit cell onto link.
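The four steps above can be sketched in a few lines. This is only an illustration: the table contents, the `(port, VPI, VCI)` key layout, and the name `forward_cell` are assumptions, not taken from the slides.

```python
# Hypothetical VC table: (input port, VPI, VCI) -> (output port, new VPI, new VCI).
vc_table = {
    (0, 1, 32): (2, 5, 77),
    (1, 1, 32): (3, 9, 40),
}

def forward_cell(in_port, vpi, vci, payload):
    """Return (out_port, cell) for a known connection, or None to drop."""
    entry = vc_table.get((in_port, vpi, vci))
    if entry is None:
        return None  # no connection has been set up for this label
    out_port, new_vpi, new_vci = entry
    # Label swap: the cell leaves carrying the new VPI/VCI.
    return out_port, (new_vpi, new_vci, payload)
```

The lookup is an exact match on a short label, which is why it can be a single memory access.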

9
Ethernet Switch
  • Look up frame DA in forwarding table.
  • If known, forward to correct port.
  • If unknown, broadcast to all ports.
  • Learn SA of incoming frame.
  • Forward frame to outgoing interface.
  • Transmit frame onto link.
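A minimal sketch of the learning behaviour listed above. The class name and port numbering are assumptions; the sketch also assumes that "broadcast to all ports" excludes the port the frame arrived on.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.table[src] = in_port  # learn SA of incoming frame
        out = self.table.get(dst)
        if dst == BROADCAST or out is None:
            # Unknown DA: flood to every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []  # destination is on the arrival segment; nothing to forward
        return [out]
```

After the first frame in each direction, traffic between two stations is no longer flooded.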

10
IP Router
  • Look up packet DA in forwarding table.
  • If known, forward to correct port.
  • If unknown, drop packet.
  • Decrement TTL, update header checksum.
  • Forward packet to outgoing interface.
  • Transmit packet onto link.
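The TTL and checksum step can be sketched as follows, assuming a raw IPv4 header held in a `bytes` object. The function names are illustrative, and the checksum is recomputed in full rather than updated incrementally as a fast router would.

```python
import struct

def checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an even-length header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def decrement_ttl(header: bytes):
    """Return the header with TTL-1 and a fresh checksum, or None if TTL expires."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        return None              # TTL would reach zero: drop (ICMP handling omitted)
    hdr[8] -= 1                  # byte 8 of the IPv4 header is the TTL
    hdr[10:12] = b"\x00\x00"     # bytes 10-11 hold the checksum; zero before summing
    struct.pack_into("!H", hdr, 10, checksum(bytes(hdr)))
    return bytes(hdr)
```

A header with a valid checksum sums to zero when `checksum` is run over it, which gives a quick sanity check on the result.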

11
Introduction: What is a Packet Switch?
  • Basic Architectural Components
  • Some Example Packet Switches
  • The Evolution of IP Routers

12
First-Generation IP Routers
Shared Backplane
Buffer Memory
CPU
13
Second-Generation IP Routers
Buffer Memory
CPU
14
Third-Generation Switches/Routers
[Figure: line cards, each with local buffer memory and a MAC, and a CPU card, connected by a switched backplane]
15
Fourth-Generation Switches/Routers: Clustering and
Multistage
[Figure: a multistage cluster of elements numbered 1 to 32]
16
Packet Switches: References
  • J. Giacopelli, M. Littlewood, W.D. Sincoskie,
    "Sunshine: A high performance self-routing
    broadband packet switch architecture", ISS '90.
  • J. S. Turner, "Design of a Broadcast packet
    switching network", IEEE Trans Comm, June 1988,
    pp. 734-743.
  • C. Partridge et al., "A Fifty Gigabit per second
    IP Router", IEEE Trans Networking, 1998.
  • N. McKeown, M. Izzard, A. Mekkittikul, W.
    Ellersick, M. Horowitz, "The Tiny Tera: A Packet
    Switch Core", IEEE Micro Magazine, Jan-Feb 1997.

17
Tutorial Outline
  • Introduction: What is a Packet Switch?
  • Packet Lookup and Classification: Where does a
    packet go next?
  • Switching Fabrics: How does the packet get there?

18
Basic Architectural Components: Datapath
per-packet processing
  • 1. Forwarding Decision (using a Forwarding Table)
  • 2. Interconnect
  • 3. Output Scheduling
19
Forwarding Decisions
  • ATM and MPLS switches
  • Direct Lookup
  • Bridges and Ethernet switches
  • Associative Lookup
  • Hashing
  • Trees and tries
  • IP Routers
  • CIDR
  • Patricia trees/tries
  • Other methods
  • Caching
  • Packet Classification

20
ATM and MPLS Switches: Direct Lookup
[Figure: the VCI is used as the address into a memory; the data read out is the (Port, VCI) pair]
21
Forwarding Decisions
  • ATM and MPLS switches
  • Direct Lookup
  • Bridges and Ethernet switches
  • Associative Lookup
  • Hashing
  • Trees and tries
  • IP Routers
  • CIDR
  • Patricia trees/tries
  • Other methods
  • Caching
  • Packet Classification

22
Bridges and Ethernet Switches: Associative Lookups
[Figure: an associative memory (CAM) is searched with the 48-bit network address as search data and returns the associated data]
23
Bridges and Ethernet Switches: Hashing
[Figure: a hashing function reduces the 48-bit search data to a 16-bit memory address; the memory returns the data]
24
Lookups Using Hashing: An example
[Figure: a CRC-16 hashing function maps the 48-bit search data to a 16-bit memory address; entries that hash to the same address are held in linked lists]
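A rough sketch of the hashed lookup described in this example. Python's `binascii.crc_hqx` (the CCITT 16-bit CRC) stands in for the slide's CRC-16, a dict of lists stands in for the memory and its linked lists, and the class and method names are made up for illustration.

```python
import binascii

NUM_BUCKETS = 1 << 16  # a 16-bit hash gives 65536 buckets

class HashedTable:
    def __init__(self):
        # bucket index -> chain of (address, data); sparse stand-in for a memory array
        self.buckets = {}

    @staticmethod
    def _hash(addr: bytes) -> int:
        return binascii.crc_hqx(addr, 0) % NUM_BUCKETS

    def insert(self, addr: bytes, data):
        chain = self.buckets.setdefault(self._hash(addr), [])
        for i, (a, _) in enumerate(chain):
            if a == addr:
                chain[i] = (addr, data)  # overwrite an existing entry
                return
        chain.append((addr, data))

    def lookup(self, addr: bytes):
        # Walk the chain; its length is what makes lookup time non-deterministic.
        for a, data in self.buckets.get(self._hash(addr), []):
            if a == addr:
                return data
        return None
```

The expected chain length stays short while the number of addresses is small compared with the number of buckets.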
25
Lookups Using Hashing: Performance of simple
example
26
Lookups Using Hashing
  • Advantages
  • Simple
  • Expected lookup time can be small
  • Disadvantages
  • Non-deterministic lookup time
  • Inefficient use of memory

27
Trees and Tries
Binary Search Tree
[Figure: binary search tree; each node branches on < or >]
28
Trees and Tries: Multiway tries
16-ary Search Trie
[Figure: each node holds entries from (0000, ptr) to (1111, ptr), with 0 marking an empty pointer; example keys 000011110000 and 111111111111]
29
Trees and Tries: Multiway tries
Table produced from 2^15 randomly generated 48-bit
addresses
30
Forwarding Decisions
  • ATM and MPLS switches
  • Direct Lookup
  • Bridges and Ethernet switches
  • Associative Lookup
  • Hashing
  • Trees and tries
  • IP Routers
  • CIDR
  • Patricia trees/tries
  • Other methods
  • Caching
  • Packet Classification

31
IP Routers: Class-based addresses
[Figure: the IP address space divided into Class A, Class B, Class C and D]
32
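IP Routers: CIDR
[Figure: the address space from 0 to 2^32-1 shown twice: class-based, divided into A, B, C, D; and classless, with prefixes such as 65/24 and 128.9/16; the address 128.9.16.14 lies within 128.9/16]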
33
IP Routers: CIDR
[Figure: the address space from 0 to 2^32-1 with the prefix 128.9/16 and the address 128.9.16.14 inside it]
34
IP Routers: Metrics for Lookups
  • Lookup time
  • Storage space
  • Update time
  • Preprocessing time

35
IP Router Lookup
  • IPv4 unicast destination address based lookup

36
Need more than IPv4 unicast lookups
  • Multicast
    • PIM-SM
      • Longest Prefix Matching on the source and group
        address
      • Try (S,G) followed by (*,G) followed by (*,*,RP)
      • Check Incoming Interface
    • DVMRP
      • Incoming Interface Check followed by (S,G)
        lookup
  • IPv6
    • 128-bit destination address field
    • Exact address architecture not yet known

37
Lookup Performance Required
  • Gigabit Ethernet (84B packets): 1.49 Mpps
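The 1.49 Mpps figure follows directly from the line rate and the packet size. A short check, assuming 84 bytes is the full on-the-wire cost of a minimum Ethernet frame (64-byte frame plus preamble and inter-frame gap); the function name is illustrative:

```python
def packets_per_second(line_rate_bps, bytes_per_packet):
    """Packets per second on a fully loaded link with fixed-size packets."""
    return line_rate_bps / (bytes_per_packet * 8)

gige_pps = packets_per_second(1e9, 84)
print(f"{gige_pps / 1e6:.2f} Mpps")  # prints "1.49 Mpps"
```

The same function gives the lookup rate needed for any other line rate and packet size.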

38
Size of the Routing Table
  • Source: http://www.telstra.net/ops/bgptable.html

39
Method 1: Ternary CAMs
Associative Memory
Value       Mask               Next Hop
10.0.0.0    255.0.0.0          R1
10.1.0.0    255.255.0.0        R2
10.1.1.0    255.255.255.0      R3
10.1.3.0    255.255.255.0      R4
10.1.3.1    255.255.255.255    R4
Priority Encoder
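A software model of the ternary CAM and priority encoder, using the value/mask/next-hop entries from the slide. Storing the entries longest mask first is an assumption about how the priority encoder is made to return the longest match; the helper names are illustrative.

```python
import ipaddress

def ip(s):
    return int(ipaddress.IPv4Address(s))

# (value, mask, next hop), stored longest mask first so the first hit is the longest match.
tcam = [
    (ip("10.1.3.1"), ip("255.255.255.255"), "R4"),
    (ip("10.1.3.0"), ip("255.255.255.0"), "R4"),
    (ip("10.1.1.0"), ip("255.255.255.0"), "R3"),
    (ip("10.1.0.0"), ip("255.255.0.0"), "R2"),
    (ip("10.0.0.0"), ip("255.0.0.0"), "R1"),
]

def tcam_lookup(addr):
    key = ip(addr)
    # Hardware compares every row in parallel; this comprehension models that serially.
    hits = [i for i, (value, mask, _) in enumerate(tcam) if (key & mask) == value]
    # Priority encoder: select the lowest-numbered matching row.
    return tcam[min(hits)][2] if hits else None
```

For example, `tcam_lookup("10.1.9.9")` matches only the /16 and /8 rows and returns the /16 next hop.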
40
Method 2: Binary Tries
Example Prefixes
a) 00001
b) 00010
c) 00011
d) 001
e) 0101
f) 011
g) 100
h) 1010
i) 1100
j) 11110000
[Figure: binary trie with branches labeled 0 and 1 and the prefixes a to j marked at their nodes]
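A small sketch of a binary trie built from the example prefixes a) to j), with a longest-prefix-match walk. The `Node` layout and function names are assumptions for illustration; addresses are given as bit strings to match the slide.

```python
class Node:
    __slots__ = ("child", "label")

    def __init__(self):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.label = None          # set if a prefix ends at this node

PREFIXES = {"a": "00001", "b": "00010", "c": "00011", "d": "001", "e": "0101",
            "f": "011", "g": "100", "h": "1010", "i": "1100", "j": "11110000"}

def build(prefixes):
    root = Node()
    for label, bits in prefixes.items():
        node = root
        for b in bits:
            i = int(b)
            if node.child[i] is None:
                node.child[i] = Node()
            node = node.child[i]
        node.label = label
    return root

def longest_match(root, addr_bits):
    """Walk one bit at a time, remembering the last prefix passed on the way."""
    node, best = root, None
    for b in addr_bits:
        node = node.child[int(b)]
        if node is None:
            break
        if node.label is not None:
            best = node.label
    return best
```

Each address bit costs one memory access, so a 32-bit address can need up to 32 accesses, which motivates the multi-bit variants that follow.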
41
Four-way tries
Reduced number of memory accesses
But greater wasted space...
42
Method 3: Patricia Tree
Example Prefixes
a) 00001
b) 00010
c) 00011
d) 001
e) 0101
f) 011
g) 100
h) 1010
i) 1100
j) 11110000
[Figure: Patricia tree over the prefixes a to j, with branches labeled 0 and 1 and a node marked Skip=5]
Advantages
  • General solution
  • Extensible to wider fields
Disadvantages
  • Many memory accesses
  • May need backtracking
  • Pointers take a lot of space

(Total storage for 40K entries is 2MB)
43
Method 4: Level Compressed Tries
[Figure: level-compressed trie holding the prefixes a to j]
Expected depth of a trie: log n (1log(log n)) [expression incomplete in source]
For Bernoulli-type distributions, expected depth:
O(log log n)
Achieves approx 0.5Mpps on a Pentium with
a 40k routing table, occupying less than 0.8MB
Advantages
  • May be useful for IPv6
  • Nice theoretical idea
Disadvantages
  • No practical performance gain
  • Handling updates is complex


44
Method 5: Compacting Forwarding Tables
  • Optimize the data structure to store 40,000
    routing table entries in about 150-160kBytes.
  • Rely on the compacted data structure to be
    residing in the primary or secondary cache
    of a fast processor.
  • Achieves approx 2Mpps.

Advantages
  • Good software solution for low speeds and small
    routing tables.
Disadvantages
  • Only 60% actually cached
  • Scalability to larger tables
  • Handling updates is complex

45
Method 6: A Hash-based Scheme
Store a hash table for each prefix length
Example Prefixes
10.0.0.0/8
10.1.0.0/16
10.1.1.0/24
10.1.2.0/24
10.2.3.0/24
Example Addrs
10.1.1.4
10.4.4.3
10.2.3.9
10.2.4.8
Length    Hash
8         10
12
16        10.1, 10.2
24        10.1.1, 10.1.2, 10.2.3
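A simplified sketch of one hash table per prefix length, using the example prefixes and addresses from the slide. It probes the lengths linearly from longest to shortest, so it needs no markers (the slide's length-16 table also lists 10.2, which is not one of the example prefixes). The variable and function names are illustrative.

```python
import ipaddress

PREFIXES = ["10.0.0.0/8", "10.1.0.0/16", "10.1.1.0/24", "10.1.2.0/24", "10.2.3.0/24"]

# One hash table (a dict) per prefix length, keyed by the leading bits of the prefix.
tables = {}
for p in PREFIXES:
    net = ipaddress.ip_network(p)
    key = int(net.network_address) >> (32 - net.prefixlen)
    tables.setdefault(net.prefixlen, {})[key] = p

def lookup(addr):
    value = int(ipaddress.IPv4Address(addr))
    for length in sorted(tables, reverse=True):  # longest prefix length first
        hit = tables[length].get(value >> (32 - length))
        if hit is not None:
            return hit
    return None
```

With this table, 10.2.4.8 misses at lengths 24 and 16 and falls back to 10.0.0.0/8.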
46
A Hash-Based Scheme (contd.)
  • Binary search of the prefix lengths:
    O(log2 N) hashes
  • Need to provide intermediate markers
  • But then we need precomputation per marker
  • Asymmetric binary search
  • Performance is about 2.2Mpps in the worst case
    for 33K table.

Advantages
  • Good software solution for low speeds and small
    routing tables.
Disadvantages
  • Need multiple hashes
  • Scalability to larger tables
  • Handling updates is complex


47
Method 8: Routing Lookups in Hardware
[Figure: histogram of the number of prefixes against prefix length]
Most prefixes are 24-bits or shorter
48
Routing Lookups in Hardware
Prefixes up to 24-bits: 2^24 = 16M entries
[Figure: the first 24 bits of 142.19.6.14 (142.19.6) index the table; the last 8 bits are 14]
49
Routing Lookups in Hardware
Prefixes up to 24-bits
[Figure: the first 24 bits of 128.3.72.44 (128.3.72) index the table, which returns the Next Hop; the last 8 bits are 44]
50
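Routing Lookups in Hardware (Contd.)
Prefixes up to N bits: 2^N entries
Prefixes longer than N+M bits
[Figure: address bits 0 to N index the first table; entries i and j point on to further tables indexed by the next M bits (N to N+M), which return the Next Hop]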
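A sparse software model of the two-table lookup, assuming a first table indexed by the top 24 bits and 256-entry second-level blocks for longer prefixes. Dicts and lists stand in for the DRAM arrays; the names `tbl24`, `tbllong`, `add_route` and the next-hop labels are made up for this sketch.

```python
import ipaddress

tbl24 = {}    # top 24 address bits -> ("hop", next_hop) or ("ptr", block index)
tbllong = []  # 256-entry blocks for prefixes longer than 24 bits

def add_route(prefix, next_hop):
    """Add a route. Routes must be added in order of increasing prefix length."""
    net = ipaddress.ip_network(prefix)
    base, plen = int(net.network_address), net.prefixlen
    if plen <= 24:
        first = base >> 8
        # Prefix expansion: fill every 24-bit entry the prefix covers.
        for i in range(first, first + (1 << (24 - plen))):
            tbl24[i] = ("hop", next_hop)
    else:
        kind, val = tbl24.get(base >> 8, ("hop", None))
        if kind == "hop":
            # Allocate a block, seeded with the shorter match found so far.
            tbllong.append([val] * 256)
            val = len(tbllong) - 1
            tbl24[base >> 8] = ("ptr", val)
        low = base & 0xFF
        for i in range(low, low + (1 << (32 - plen))):
            tbllong[val][i] = next_hop

def lookup(addr):
    key = int(ipaddress.IPv4Address(addr))
    kind, val = tbl24.get(key >> 8, ("hop", None))  # first memory access
    if kind == "hop":
        return val
    return tbllong[val][key & 0xFF]                 # second memory access
```

Every lookup costs at most two memory accesses, at the price of the large, mostly redundant first table.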
51
Routing Updates
[Figure: nested prefixes 10.0.0.0 (Depth 1), 10.4.0.0 (Depth 2) and 10.4.24.0 (Depth 3)]
  • Advantages
    • 20 Mpps with 50ns DRAM
    • Easy to implement in hardware
  • Disadvantages
    • Large memory required
    • Depends on prefix length distribution

52
IP Router Lookups: References
  • A. Brodnik, S. Carlsson, M. Degermark, S. Pink,
    "Small Forwarding Tables for Fast Routing
    Lookups", Sigcomm 1997, pp. 3-14.
  • B. Lampson, V. Srinivasan, G. Varghese, "IP
    lookups using multiway and multicolumn search",
    Infocom 1998, pp. 1248-56, vol. 3.
  • M. Waldvogel, G. Varghese, J. Turner, B.
    Plattner, "Scalable high speed IP routing
    lookups", Sigcomm 1997, pp. 25-36.
  • P. Gupta, S. Lin, N. McKeown, "Routing lookups in
    hardware at memory access speeds", Infocom 1998,
    pp. 1241-1248, vol. 3.
  • S. Nilsson, G. Karlsson, "Fast address lookup for
    Internet routers", IFIP Intl Conf on Broadband
    Communications, Stuttgart, Germany, April 1-3,
    1998.
  • V. Srinivasan, G. Varghese, "Fast IP lookups using
    controlled prefix expansion", Sigmetrics, June
    1998.

53
Caching Addresses
Slow Path
Buffer Memory
CPU
Fast Path
54
Caching Addresses
55
Forwarding Decisions
  • ATM and MPLS switches
  • Direct Lookup
  • Bridges and Ethernet switches
  • Associative Lookup
  • Hashing
  • Trees and tries
  • IP Routers
  • CIDR
  • Patricia trees/tries
  • Other methods
  • Caching
  • Packet Classification

56
Providing Value-Added Services: Some examples
  • Differentiated services
    • Regard traffic from AS33 as platinum-grade
  • Access Control Lists
    • Deny udp host 194.72.72.33 194.72.6.64 0.0.0.15
      eq snmp
  • Committed Access Rate
    • Rate limit WWW traffic from subinterface739 to
      10Mbps
  • Policy-based Routing
    • Route all voice traffic through the ATM network
  • Peering Arrangements
    • Restrict the total amount of traffic of
      precedence 7 from MAC address N to 20 Mbps
      between 10am and 5pm
  • Accounting and Billing
    • Generate hourly reports of traffic from MAC
      address M

57
Flow Classification
58
A Packet Classifier
Given a classifier, find the action associated
with the highest priority rule (here, the lowest
numbered rule) matching an incoming packet.
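The definition above can be made concrete with a linear search, the simplest (and slowest) classifier. The rules and actions here are hypothetical; the two patterns are borrowed from the geometric example in this tutorial, reading `64/24` as `64.0.0.0/24` and the wildcard as `0.0.0.0/0`.

```python
import ipaddress

# Hypothetical two-field classifier; rules are listed in priority order (rule 1 first).
RULES = [
    ("128.16.46.23/32", "0.0.0.0/0", "deny"),
    ("144.24.0.0/16", "64.0.0.0/24", "rate-limit"),
    ("0.0.0.0/0", "0.0.0.0/0", "permit"),
]
COMPILED = [(ipaddress.ip_network(s), ipaddress.ip_network(d), a) for s, d, a in RULES]

def classify(field1, field2):
    """Return (rule number, action) of the lowest-numbered matching rule."""
    f1, f2 = ipaddress.ip_address(field1), ipaddress.ip_address(field2)
    for number, (net1, net2, action) in enumerate(COMPILED, start=1):
        if f1 in net1 and f2 in net2:
            return number, action
    return None
```

The cost grows linearly with the number of rules, which is what the schemes on the following slides try to avoid.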
59
Geometric Interpretation in 2D
[Figure: rules R1 to R7 drawn as regions in the plane spanned by Field 1 and Field 2]
e.g. (144.24/16, 64/24)
e.g. (128.16.46.23, *)
60
Proposed Schemes
61
Proposed Schemes (Contd.)
62
Proposed Schemes (Contd.)
63
Packet Classification: References
  • T.V. Lakshman, D. Stiliadis, "High speed policy
    based packet forwarding using efficient
    multi-dimensional range matching", Sigcomm 1998,
    pp. 191-202.
  • V. Srinivasan, S. Suri, G. Varghese and M.
    Waldvogel, "Fast and scalable layer 4 switching",
    Sigcomm 1998, pp. 203-214.
  • V. Srinivasan, G. Varghese, S. Suri, "Fast packet
    classification using tuple space search", to be
    presented at Sigcomm 1999.
  • P. Gupta, N. McKeown, "Packet classification
    using intelligent hierarchical cuttings", Hot
    Interconnects VII, 1999.
  • P. Gupta, N. McKeown, "Packet classification on
    multiple fields", Sigcomm 1999.

64
Tutorial Outline
  • Introduction: What is a Packet Switch?
  • Packet Lookup and Classification: Where does a
    packet go next?
  • Switching Fabrics: How does the packet get there?

65
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

66
Basic Architectural Components: Datapath
per-packet processing
  • 1. Forwarding Decision (using a Forwarding Table)
  • 2. Interconnect
  • 3. Output Scheduling
67
Interconnects: Two basic techniques
  • Input Queueing: usually a non-blocking switch
    fabric (e.g. crossbar)
  • Output Queueing: usually a fast bus
68
Interconnects: Output Queueing
[Figure: two organizations for ports 1 to N: Individual Output Queues and Centralized Shared Memory]
69
Output Queueing: The ideal
70
Output Queueing: How fast can we make centralized
shared memory?
[Figure: ports 1 to N attached to a shared memory built from 5ns SRAM over a 200 byte bus]
  • 5ns per memory operation
  • Two memory operations per packet
  • Therefore, up to 160Gb/s
  • In practice, closer to 80Gb/s
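The 160Gb/s bound follows from the bus width and the memory cycle time. A quick check, assuming each memory operation moves one full 200-byte bus word and that the two operations per packet are one write and one read:

```python
bus_width_bits = 200 * 8   # 200-byte-wide memory bus
access_time_s = 5e-9       # 5 ns per memory operation
ops_per_packet = 2         # assumed: one write on arrival, one read on departure

throughput_bps = bus_width_bits / (ops_per_packet * access_time_s)
print(f"{throughput_bps / 1e9:.0f} Gb/s")  # prints "160 Gb/s"
```

Packets that do not fill a whole bus word waste part of each access, which is one reason the practical figure is lower.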
71
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

72
Interconnects: Input Queueing with Crossbar
[Figure: a crossbar between Data In and Data Out; a Scheduler sets the crossbar configuration]
73
Input Queueing: Head of Line Blocking
[Figure: delay against load; with FIFO input queues the delay grows without bound at 58.6% load, short of 100%]
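A small simulation sketch of head-of-line blocking, assuming saturated FIFO inputs, uniformly random destinations, and random arbitration at each output. The function name, parameters and seed are illustrative. For finite N the measured throughput sits slightly above the large-N limit of 2 - sqrt(2), about 58.6%.

```python
import random

def hol_throughput(n_ports, slots, seed=0):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Every input always has a head-of-line cell with a uniformly chosen destination.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)           # each output serves one contender
            hol[winner] = rng.randrange(n_ports)  # the next cell in that FIFO moves up
            served += 1
        # Losing inputs keep their head-of-line cell and block the cells behind it.
    return served / (n_ports * slots)
```

Calling `hol_throughput(16, 2000)` gives a value of roughly 0.6 rather than 1.0.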
74
Head of Line Blocking
77
Input Queueing: Virtual output queues
78
Input Queues: Virtual Output Queues
[Figure: delay against load; with virtual output queues the load can approach 100%]
79
Input Queueing
Scheduler
80
Input Queueing: Scheduling
81
Input Queueing: Scheduling
[Figure: request graph between inputs and outputs, with a weight on each edge]
Question: Maximum weight or maximum size?
82
Input Queueing: Scheduling
  • Maximum Size
    • Maximizes instantaneous throughput
    • Does it maximize long-term throughput?
  • Maximum Weight
    • Can clear most backlogged queues
    • But does it sacrifice long-term throughput?

83
Input Queueing: Scheduling
84
Input Queueing: Longest Queue First or Oldest Cell
First
[Figure: the weight of each queue is its Queue Length (Longest Queue First) or its Waiting Time (Oldest Cell First); label: 100%]
85
Input Queueing: Why is serving long/old queues
better than serving maximum number of queues?
  • When traffic is uniformly distributed, servicing
    the maximum number of queues leads to 100%
    throughput.
  • When traffic is non-uniform, some queues become
    longer than others.
  • A good algorithm keeps the queue lengths
    matched, and services a large number of queues.

86
Input Queueing: Practical Algorithms
  • Maximal Size Algorithms
    • Wave Front Arbiter (WFA)
    • Parallel Iterative Matching (PIM)
    • iSLIP
  • Maximal Weight Algorithms
    • Fair Access Round Robin (FARR)
    • Longest Port First (LPF)

87
Wave Front Arbiter
[Figure: a 4x4 grid of Requests between inputs 1 to 4 and outputs 1 to 4, and the resulting Match]
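A sketch of the wave front idea, assuming the arbiter sweeps the request matrix one anti-diagonal at a time and that a request is granted when its row and column are both still free. The function name and the request matrix used to exercise it are illustrative, not the slide's example.

```python
def wave_front_arbiter(requests):
    """requests[i][j] is truthy if input i has a cell for output j."""
    n = len(requests)
    row_free = [True] * n
    col_free = [True] * n
    match = []
    for d in range(2 * n - 1):  # one wave per anti-diagonal: 2N-1 steps
        for i in range(max(0, d - n + 1), min(n, d + 1)):
            j = d - i
            # Cells on one diagonal share no row or column, so hardware
            # can decide all of them in the same step.
            if requests[i][j] and row_free[i] and col_free[j]:
                row_free[i] = col_free[j] = False
                match.append((i, j))
    return match
```

The result is a maximal match: no remaining request has both its input and its output still free.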
88
Wave Front Arbiter
Requests
Match
89
Wave Front Arbiter: Implementation
Combinational Logic Blocks
90
Wave Front Arbiter: Wrapped WFA (WWFA)
N steps instead of 2N-1
[Figure: Requests and the resulting Match]
91
Input Queueing: Practical Algorithms
  • Maximal Size Algorithms
    • Wave Front Arbiter (WFA)
    • Parallel Iterative Matching (PIM)
    • iSLIP
  • Maximal Weight Algorithms
    • Fair Access Round Robin (FARR)
    • Longest Port First (LPF)

92
Parallel Iterative Matching
[Figure: one iteration: Requests, then Random Selection of a grant at each output, then Random Selection of an accept at each input]
Parallel Iterative Matching: Maximal is not Maximum
Requests
94
Parallel Iterative Matching: Analytical Results
Number of iterations to converge
95
Parallel Iterative Matching
96
Parallel Iterative Matching
97
Parallel Iterative Matching
98
Input Queueing: Practical Algorithms
  • Maximal Size Algorithms
    • Wave Front Arbiter (WFA)
    • Parallel Iterative Matching (PIM)
    • iSLIP
  • Maximal Weight Algorithms
    • Fair Access Round Robin (FARR)
    • Longest Port First (LPF)

99
iSLIP
[Figure: one iteration: Requests, then Round-Robin Selection of a grant at each output, then Round-Robin Selection of an accept at each input]
100
iSLIP: Properties
  • Random under low load
  • TDM under high load
  • Lowest priority to MRU
  • 1 iteration: fair to outputs
  • Converges in at most N iterations. On average <
    log2 N
  • Implementation: N priority encoders
  • Up to 100% throughput for uniform traffic
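A sketch of iSLIP with round-robin grant and accept pointers, assuming the pointers advance to one past the matched port and only for matches made in the first iteration. The function signature and variable names are illustrative.

```python
def islip(requests, grant_ptr, accept_ptr, iterations=1):
    """Run one cell time of iSLIP; grant_ptr and accept_ptr are updated in place."""
    n = len(requests)
    in_match, out_match = {}, {}
    for it in range(iterations):
        # Grant: each unmatched output grants the first requesting,
        # unmatched input at or after its round-robin pointer.
        grants = {}
        for j in range(n):
            if j in out_match:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if requests[i][j] and i not in in_match:
                    grants.setdefault(i, []).append(j)
                    break
        # Accept: each input accepts the first granting output at or after its pointer.
        for i, outs in grants.items():
            j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            in_match[i], out_match[j] = j, i
            if it == 0:  # pointers move only for first-iteration matches
                grant_ptr[j] = (i + 1) % n      # matched input becomes lowest priority
                accept_ptr[i] = (j + 1) % n
    return in_match
```

With a fully loaded 2x2 switch and all pointers at 0, the first cell time matches one pair, and from then on the pointers are desynchronized so both inputs are served every cell time, which is the TDM-like behaviour under high load.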

101
iSLIP
102
iSLIP
103
iSLIP: Implementation
[Figure: N Grant arbiters (1 to N) followed by N Accept arbiters (1 to N); each is a Programmable Priority Encoder with N request inputs, log2 N bits of State and a log2 N bit Decision]
104
Input Queueing: References
  • M. Karol et al., "Input vs Output Queueing on a
    Space-Division Packet Switch", IEEE Trans Comm.,
    Dec 1987, pp. 1347-1356.
  • Y. Tamir, "Symmetric Crossbar arbiters for VLSI
    communication switches", IEEE Trans Parallel and
    Dist Sys., Jan 1993, pp. 13-27.
  • T. Anderson et al., "High-Speed Switch Scheduling
    for Local Area Networks", ACM Trans Comp Sys.,
    Nov 1993, pp. 319-352.
  • N. McKeown, "The iSLIP scheduling algorithm for
    Input-Queued Switches", IEEE Trans Networking,
    April 1999, pp. 188-201.
  • C. Lund et al., "Fair prioritized scheduling in an
    input-buffered switch", Proc. of IFIP-IEEE Conf.,
    April 1996, pp. 358-69.
  • A. Mekkittikul et al., "A Practical Scheduling
    Algorithm to Achieve 100% Throughput in
    Input-Queued Switches", IEEE Infocom '98, April
    1998.

105
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

106
Input Queueing: Speedup
  • Input queued switches cannot easily control
    delay
  • But output queued switches can.
  • How can we emulate the behavior of an output
    queued switch?

107
Output Queueing: The ideal
108
Using Speedup
109
Using Speedup
[Figure: an Output Queued Switch with ports 1 to N]
110
Using Speedup
Theorem: For a switch with combined input and
output queueing to exactly mimic an output queued
switch, for all types of traffic, a speedup of
2-1/N is necessary and sufficient.
111
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

112
Multicast Traffic
113
Multicast Traffic
  • Virtual output (fanout) queues are not practical
    for multicast.
  • Fanout splitting leads to a large increase in
    throughput.
  • Scheduling is simpler than for unicast.

114
Multicast Traffic: Fanout splitting
115
Multicast Traffic: Scheduling
[Figure: Requests, Grant and the resulting Match between inputs 1 to 4 and outputs 1 to 4]
116
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

117
Other Non-Blocking Fabrics: Clos Network
118
Other Non-Blocking Fabrics: Clos Network
119
Other Non-Blocking Fabrics: Self-Routing Networks
[Figure: a multistage self-routing network with inputs and outputs labeled 000 to 111]
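A sketch of self-routing, assuming an omega network (a perfect shuffle followed by 2x2 elements at every stage), which is one common network of this kind; the slide's figure may show a different but equivalent wiring. Each stage looks at one bit of the destination address, most significant first. The function name is illustrative.

```python
def self_route(src, dst, n_bits):
    """Return the port occupied after each stage of an n_bits-stage omega network."""
    size = 1 << n_bits
    pos, path = src, []
    for stage in range(n_bits):
        bit = (dst >> (n_bits - 1 - stage)) & 1  # destination bits, most significant first
        # Perfect shuffle between stages: rotate the port number left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & (size - 1)
        # 2x2 element: bit 0 selects the upper output, bit 1 the lower output.
        pos = (pos & ~1) | bit
        path.append(pos)
    return path
```

No central controller is needed: after log2 N stages the cell is at the port named by its destination bits, whatever input it entered on.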
120
Other Non-Blocking Fabrics: Self-Routing Networks
The Non-blocking Batcher Banyan Network
[Figure: a Batcher Sorter followed by a Self-Routing Network; cells are sorted by destination address and then routed to outputs 000 to 111]
  • Fabric can be used as scheduler.
  • Batcher-Banyan network is blocking for multicast.

121
Switching Fabrics
  • Output and Input Queueing
  • Output Queueing
  • Input Queueing
  • Scheduling algorithms
  • Combining input and output queues
  • Multicast traffic
  • Other non-blocking fabrics
  • Multistage Switches

122
Multistage switches: Self-Routing
[Figure: a multistage self-routing network with inputs and outputs labeled 000 to 111]
Stage-by-stage flow-control
123
Multistage switches: Self-Routing
Buffered multistage switch
Multicast copy network
124
Tutorial Outline
  • Introduction: What is a Packet Switch?
  • Packet Lookup and Classification: Where does a
    packet go next?
  • Switching Fabrics: How does the packet get there?

125
Basic Architectural Components
  • Control
    • Congestion Control
    • Admission Control
    • Reservation
    • Routing
  • Datapath: per-packet processing
    • Output Scheduling
    • Switching
    • Policing
126
Basic Architectural Components: Datapath
per-packet processing
  • 1. Forwarding Decision (using a Forwarding Table)
  • 2. Interconnect
  • 3. Output Scheduling