High Performance Switches and Internet Routers: Architecture and Scheduling

1
High Performance Switches and Internet Routers
Architecture and Scheduling
TU Delft, June 18, 2004
Lotfi Mhamdi, Computer Engineering Lab., HKUST, Hong Kong
http://www.cs.ust.hk/lotfi
2
Outline
  • The Need for Fast Routers
  • Router Architecture
  • Scheduling in Input Queued (IQ) Switches
  • The Buffered Crossbar Switching Architecture (BCS)
  • Scheduling in BCS
  • Output Queueing (OQ) Switch Emulation
  • Concluding Remarks

3
Where high-performance packet switches are used
  • Core routers
  • ATM switches
  • Frame Relay switches
(Figure: the Internet core)
4
Recent trends
6
Basic Architectural Components
Datapath: per-packet processing
(Figure: ingress ports, the interconnect (stage 2), egress ports)
7
Interconnects: Two Basic Techniques
  • Input queueing: usually a non-blocking switch fabric (e.g., a crossbar)
  • Output queueing: usually a fast bus or shared memory
8
Output Queueing: The Ideal
9
Input Queueing: Head-of-Line Blocking
10
Head-of-Line Blocking
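The cost of head-of-line blocking can be estimated with a short simulation. The sketch below is illustrative only: `fifo_saturation_throughput` is a hypothetical helper, and it assumes saturated inputs, uniformly random destinations, and random selection among the inputs contending for each output. Under these assumptions the estimate for large N should land near the classical 2 − √2 ≈ 58.6% saturation throughput of FIFO input queueing.

```python
import random

def fifo_saturation_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Monte Carlo estimate of an n x n FIFO input-queued switch's throughput
    under saturation (every input always has a backlog of cells)."""
    rng = random.Random(seed)
    # Destination of the head-of-line (HoL) cell at each input, uniform at random.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output their HoL cell is waiting for.
        contenders: dict[int, list[int]] = {}
        for i, out in enumerate(hol):
            contenders.setdefault(out, []).append(i)
        # Each requested output serves exactly one contender, chosen at random;
        # the losers stay blocked, together with every cell queued behind them.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            hol[winner] = rng.randrange(n_ports)  # the next cell moves up to HoL
    return delivered / (n_ports * n_slots)

print(f"{fifo_saturation_throughput(32, 5000):.3f}")
```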
13
Input Queueing: Virtual Output Queues
14
IQ Switch with VOQs
It can be quite complex!
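A minimal sketch of the per-input state behind VOQs, assuming one FIFO per output at every input; the class and method names are hypothetical. The input can offer the scheduler every output with a non-empty VOQ, so a cell is never stuck behind a cell for a different output. The price is N queues per input (N² in total) whose occupancy the scheduler has to consider in every time slot.

```python
from collections import deque

class VOQInput:
    """One input port of a switch with a separate FIFO (VOQ) per output."""

    def __init__(self, n_outputs: int):
        self.voqs = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell_id: int, output: int) -> None:
        self.voqs[output].append(cell_id)

    def requests(self) -> list[int]:
        """Outputs this input can request: those with a non-empty VOQ."""
        return [j for j, q in enumerate(self.voqs) if q]

    def dequeue(self, output: int) -> int:
        """Send the HoL cell of VOQ(output) once the scheduler grants it."""
        return self.voqs[output].popleft()

inp = VOQInput(4)
inp.enqueue(101, output=2)
inp.enqueue(102, output=2)
inp.enqueue(103, output=0)
print(inp.requests())  # [0, 2]: cell 103 is not stuck behind cells 101 and 102
```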
16
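Matching (Scheduling)
Service matrix $S(n) = [S_{i,j}(n)]$, where $S_{i,j}(n) = 1$ if input i sends to output j in time slot n, and $S_{i,j}(n) = 0$ otherwise. Our objective is to maximize the size (or weight) of $S(n)$ subject to
$$\sum_{i} S_{i,j}(n) \le 1 \quad \forall j, \qquad \sum_{j} S_{i,j}(n) \le 1 \quad \forall i$$
i.e., each output receives from at most one input and each input sends to at most one output.
Maximum weight or maximum size?
(Figure: request graph and the resulting bipartite match)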
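The two constraints can be checked directly on a 0/1 service matrix. In this sketch `is_matching`, `size` and `weight` are hypothetical helpers, and the weight matrix `W` stands for whatever the scheduler weighs requests by (VOQ lengths are used in the example).

```python
def is_matching(S: list[list[int]]) -> bool:
    """True if S satisfies both constraints: every input i sends to at most
    one output (sum over j <= 1) and every output j receives from at most
    one input (sum over i <= 1)."""
    n = len(S)
    inputs_ok = all(sum(S[i][j] for j in range(n)) <= 1 for i in range(n))
    outputs_ok = all(sum(S[i][j] for i in range(n)) <= 1 for j in range(n))
    return inputs_ok and outputs_ok

def size(S: list[list[int]]) -> int:
    """Number of input-output pairs served in this time slot."""
    return sum(map(sum, S))

def weight(S: list[list[int]], W: list[list[int]]) -> int:
    """Total weight of the served pairs, e.g. W[i][j] = length of VOQ(i, j)."""
    n = len(S)
    return sum(S[i][j] * W[i][j] for i in range(n) for j in range(n))

S_ok = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
S_bad = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]   # input 0 sends to two outputs
voq_len = [[0, 4, 0], [2, 0, 0], [0, 0, 1]]
print(is_matching(S_ok), is_matching(S_bad))  # True False
print(size(S_ok), weight(S_ok, voq_len))      # 3 7
```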
17
Maximum Size/Weight Matching
Maximum size matching
  • Maximizes instantaneous throughput
  • Complexity O(N^2.5)
  • Hard to implement in hardware
  • Slow
Maximum weight matching
  • Weight = queue length (LQF) or waiting time (OCF): 100% throughput
  • Stable under any admissible input traffic
  • Complexity O(N^3 log N)
  • Hard to implement in hardware and very slow
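To see why the choice of weight matters, the following brute-force sketch finds a maximum weight matching by trying every input-to-output permutation; with all request weights set to 1 it returns a maximum size matching. It is exponential in N and meant only for tiny examples: it is not one of the O(N^2.5) or O(N^3 log N) algorithms referred to above, and `best_matching` is a hypothetical name.

```python
from itertools import permutations

def best_matching(W: list[list[int]]) -> tuple[int, list[tuple[int, int]]]:
    """Exhaustive maximum weight matching for a small N x N request matrix.

    W[i][j] > 0 is the weight of input i's request for output j (0 = no request).
    Every matching is contained in some input-to-output permutation, so it is
    enough to score each permutation, keeping only its requested pairs.
    """
    n = len(W)
    best_total, best_pairs = 0, []
    for perm in permutations(range(n)):
        pairs = [(i, perm[i]) for i in range(n) if W[i][perm[i]] > 0]
        total = sum(W[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs

queue_len = [[9, 1], [1, 0]]  # LQF-style weights: VOQ lengths
unit = [[1, 1], [1, 0]]       # same requests, every weight 1
print(best_matching(queue_len))  # (9, [(0, 0)])
print(best_matching(unit))       # (2, [(0, 1), (1, 0)])
```

In this 2×2 example the maximum size matching serves two cells but ignores the backlog of 9 cells in VOQ(0,0), which the queue-length weight serves first.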

18
Parallel Iterative Matching (PIM)
(Figure: random selection in the grant and accept steps)
  • 100% throughput under uniform traffic (converges in O(log N) iterations)
  • 63% throughput with one iteration (pointer synchronization)
  • Quite complex in hardware (random encoders)
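A minimal sketch of PIM's grant and accept steps with random selection, assuming the request step is captured by `requests[i]`, the set of outputs for which input i has a non-empty VOQ. The function name and data layout are assumptions made for illustration, not a hardware design.

```python
import random

def pim(requests: list[set[int]], n_outputs: int, iterations: int,
        rng: random.Random) -> dict[int, int]:
    """Parallel Iterative Matching (sketch). Returns {input: output}."""
    match: dict[int, int] = {}
    for _ in range(iterations):
        matched_outputs = set(match.values())
        # Grant: every unmatched output requested by unmatched inputs
        # grants one of them uniformly at random.
        grants: dict[int, list[int]] = {}
        for j in range(n_outputs):
            if j in matched_outputs:
                continue
            requesters = [i for i, outs in enumerate(requests)
                          if i not in match and j in outs]
            if requesters:
                grants.setdefault(rng.choice(requesters), []).append(j)
        if not grants:
            break  # no unmatched input requests an unmatched output
        # Accept: every input that received grants accepts one at random.
        for i, granted in grants.items():
            match[i] = rng.choice(granted)
    return match

rng = random.Random(7)
full = [set(range(4)) for _ in range(4)]  # saturated 4x4 switch
print(len(pim(full, 4, iterations=1, rng=rng)))  # may be fewer than 4
print(len(pim(full, 4, iterations=4, rng=rng)))  # 4: a complete match
```

Because every output grants at most one input, the accepted outputs are always distinct, and each iteration adds at least one pair while some unmatched input still requests an unmatched output; repeating the iterations is what lets PIM converge to a maximal match.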

19
iSLIP
(Figure: round-robin selection)
  • Easy to implement in hardware
  • Converges in log N iterations
  • 100% throughput under uniform traffic
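The sketch below models a single iteration of iSLIP (often called 1-SLIP), assuming saturated traffic on a 4×4 switch and round-robin pointers that all start at 0; `islip_iteration` is a hypothetical helper. The key rule is that a pointer moves one position past its partner only when the grant is accepted (in multi-iteration iSLIP, only in the first iteration). The match size grows from 1 to 4 within four slots as the output pointers desynchronize, which is the pointer-synchronization effect discussed in the following slides.

```python
def islip_iteration(requests: list[set[int]], grant_ptr: list[int],
                    accept_ptr: list[int]) -> dict[int, int]:
    """One iteration of iSLIP for an N x N switch (sketch).
    requests[i]: outputs for which input i has cells.
    grant_ptr[j], accept_ptr[i]: round-robin pointers, updated in place."""
    n = len(grant_ptr)
    # Grant: each output grants the requesting input that comes next in
    # round-robin order, starting from its grant pointer.
    grants: dict[int, list[int]] = {}
    for j in range(n):
        requesters = [i for i in range(n) if j in requests[i]]
        if requesters:
            i = min(requesters, key=lambda r: (r - grant_ptr[j]) % n)
            grants.setdefault(i, []).append(j)
    # Accept: each input accepts the granting output next in round-robin order.
    match: dict[int, int] = {}
    for i, granted in grants.items():
        j = min(granted, key=lambda g: (g - accept_ptr[i]) % n)
        match[i] = j
        # Pointers move one past the matched partner, and only on acceptance.
        accept_ptr[i] = (j + 1) % n
        grant_ptr[j] = (i + 1) % n
    return match

n = 4
grant_ptr, accept_ptr = [0] * n, [0] * n
full = [set(range(n)) for _ in range(n)]  # every VOQ stays non-empty
sizes = [len(islip_iteration(full, grant_ptr, accept_ptr)) for _ in range(6)]
print(sizes)  # [1, 2, 3, 4, 4, 4]
```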

20
Performance: 16×16 Switch, Uniform Traffic
(Figure: performance curves for FIFO, PIM, RRM and iSLIP)
21
Pointer Synchronization
22
Performance: 3×3 Switch, Non-uniform Traffic
23
So Far
  • Maximum size/weight matching algorithms are impractical to implement in hardware.
  • Iterative matching algorithms (PIM, iSLIP, ...) are practical but unstable under non-uniform traffic.
  • Their centralized design is a bottleneck.
  • Is it possible to design distributed scheduling algorithms?
  • If yes, how?

25
Buffered Crossbar Switch Architecture
26
I/O Contention Resolution
(Figure: four-port contention-resolution example, ports labelled 1 to 4)
28
Scheduling Process
Scheduling is divided into three steps:
  • Input scheduling: every input i independently (in parallel) selects the HoL cell of an eligible VOQi,j and sends it to the corresponding internal buffer XPi,j.
  • Output scheduling: every output j independently (in parallel) selects a cell from among all non-empty XPi,j and delivers it to the output port.
  • Delivery notification (flow control): for each delivered cell, inform the corresponding input of the internal buffer status.

VOQi,j is eligible if VOQi,j is non-empty and XPi,j is empty.
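One time slot of the three-step process can be sketched as follows, assuming round-robin selection at both inputs and outputs (the RR-RR scheme), crosspoint buffers that hold a single cell, and flow-control information that reaches the input immediately. All names are hypothetical. Each input loop touches only its own row of crosspoints and each output loop only its own column, so in hardware the N input schedulers and N output schedulers could run independently.

```python
from collections import deque

def bcs_time_slot(voq, xp, in_ptr, out_ptr):
    """One slot of RR-RR scheduling in an N x N buffered crossbar (sketch).

    voq[i][j]: deque of cells at input i for output j.
    xp[i][j]: single-cell crosspoint buffer XPi,j (None when empty).
    Returns the list of (output, cell) pairs delivered in this slot.
    """
    n = len(voq)
    # 1. Input scheduling: each input picks, round robin, one eligible VOQ
    #    (non-empty VOQi,j with empty XPi,j) and moves its HoL cell to XPi,j.
    for i in range(n):
        for k in range(n):
            j = (in_ptr[i] + k) % n
            if voq[i][j] and xp[i][j] is None:
                xp[i][j] = voq[i][j].popleft()
                in_ptr[i] = (j + 1) % n
                break
    # 2. Output scheduling: each output picks, round robin, one non-empty XPi,j.
    delivered = []
    for j in range(n):
        for k in range(n):
            i = (out_ptr[j] + k) % n
            if xp[i][j] is not None:
                delivered.append((j, xp[i][j]))
                # 3. Flow control (assumed instantaneous): freeing XPi,j makes
                #    VOQi,j eligible again from the next slot on.
                xp[i][j] = None
                out_ptr[j] = (i + 1) % n
                break
    return delivered

# 2x2 example: input 0 holds cells for outputs 0 and 1, input 1 for output 0.
voq = [[deque(["a0"]), deque(["a1"])], [deque(["b0"]), deque()]]
xp = [[None, None], [None, None]]
in_ptr, out_ptr = [0, 0], [0, 0]
for slot in range(3):
    print(slot, bcs_time_slot(voq, xp, in_ptr, out_ptr))
# 0 [(0, 'a0')]
# 1 [(0, 'b0'), (1, 'a1')]
# 2 []
```

Output 0 has cells from both inputs, yet no central matching is needed: both cells wait in their crosspoint buffers and output 0 delivers them in consecutive slots.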
29
Existing Algorithms
  • Round Robin (RR-RR): round-robin scheduling at both the inputs and the outputs.
  • Oldest Cell First (OCF-OCF): each input selects its oldest HoL cell, and each output selects the oldest cell among its crosspoint buffers.
  • Longest Queue First - Round Robin (LQF-RR): each input selects the HoL cell of its longest VOQ; the outputs use round robin.
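The three input-side policies differ only in how they choose among the eligible VOQs. A sketch, assuming the candidates are the indices of eligible VOQs (non-empty VOQi,j with an empty XPi,j) and that ties go to the first candidate listed; the helper names and example values are hypothetical.

```python
def select_rr(candidates: list[int], ptr: int, n: int) -> int:
    """Round robin: the first candidate at or after the pointer (mod n)."""
    return min(candidates, key=lambda k: (k - ptr) % n)

def select_lqf(candidates: list[int], voq_len: list[int]) -> int:
    """Longest Queue First: the candidate VOQ holding the most cells."""
    return max(candidates, key=lambda j: voq_len[j])

def select_ocf(candidates: list[int], hol_arrival: dict[int, int]) -> int:
    """Oldest Cell First: the candidate whose HoL cell arrived earliest."""
    return min(candidates, key=lambda k: hol_arrival[k])

# Input i of a 4x4 switch: VOQs 0, 2 and 3 are eligible.
eligible = [0, 2, 3]
voq_len = [5, 0, 2, 7]
hol_arrival = {0: 3, 2: 1, 3: 6}  # arrival slot of each eligible HoL cell
print(select_rr(eligible, ptr=1, n=4))    # 2
print(select_lqf(eligible, voq_len))      # 3
print(select_ocf(eligible, hol_arrival))  # 2
```

The same helpers fit the output side: at output j the candidates are the inputs i whose XPi,j is non-empty, selected round robin for RR-RR and LQF-RR, or by the crosspoint cells' arrival times for OCF-OCF.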

30
Internal-Buffer-Based Scheduling
(Figure: XPi,j internal buffer)
31
SBF-LBF Performance
32
Performance (VOQ Occupancies)
34
OQ Emulation: The Speedup Problem
  • Output queueing: best delay and throughput performance, but requires a high fabric speedup (S = N).
  • Input queueing: a speedup of one is sufficient, but delay is unpredictable due to input contention.
(Figure: memory speeds for a 32×32 ATM switch)
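The memory-speed problem is simple arithmetic. Assuming one cell is written or read per memory access, an OQ buffer may have to absorb a cell from each of the N inputs and send one cell in the same cell time, i.e. (N + 1) times the line rate, while an IQ buffer with speedup one handles one write and one read, i.e. 2 times the line rate. The 2.5 Gb/s port rate and the helper name below are hypothetical, not taken from the slide.

```python
ATM_CELL_BITS = 53 * 8  # a 53-byte ATM cell

def buffer_memory_bandwidth(n_ports: int, line_rate_gbps: float) -> dict[str, float]:
    """Memory bandwidth (Gb/s) each packet buffer must sustain.
    OQ: up to N writes (one from every input) plus 1 read per cell time.
    IQ (speedup 1): 1 write plus 1 read per cell time."""
    return {"OQ": (n_ports + 1) * line_rate_gbps,
            "IQ": 2 * line_rate_gbps}

bw = buffer_memory_bandwidth(32, 2.5)  # hypothetical 2.5 Gb/s ports
for arch, gbps in bw.items():
    # Time available per cell access: bits / (Gb/s) = nanoseconds.
    ns_per_cell = ATM_CELL_BITS / gbps
    print(f"{arch}: {gbps:.1f} Gb/s, {ns_per_cell:.2f} ns per cell access")
```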
35
The Ideal Solution
Find a compromise, 1 < speedup ≪ N,
  • to get the performance of an OQ switch
  • at close to the cost of an IQ switch.

Question: can we find
  • a simple and good algorithm
  • that exactly mimics output queueing
  • regardless of switch size and traffic pattern?
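"Exactly mimics output queueing" can be stated operationally: feed the same arrivals to a shadow OQ switch and require every cell to leave the switch under test in the same slot. The sketch computes the shadow departure times, assuming one FIFO queue per output, at most one departure per output per slot, and that a cell may leave in its arrival slot if its output is idle; the function names are hypothetical.

```python
def shadow_oq_departures(arrivals: list[tuple[int, int]], n_outputs: int) -> list[int]:
    """Departure slot of every cell in a shadow FIFO output-queued switch.

    arrivals: (arrival_slot, output) per cell, in arrival order; cells that
    arrive in the same slot for the same output leave in list order.
    """
    next_free = [0] * n_outputs  # first slot at which each output is idle
    departures = []
    for slot, out in arrivals:
        leave = max(slot, next_free[out])
        departures.append(leave)
        next_free[out] = leave + 1
    return departures

def exactly_emulates(observed: list[int], arrivals: list[tuple[int, int]],
                     n_outputs: int) -> bool:
    """True if every cell left in the same slot as in the shadow OQ switch."""
    return observed == shadow_oq_departures(arrivals, n_outputs)

arrivals = [(0, 0), (0, 0), (0, 1), (1, 0), (3, 0)]
print(shadow_oq_departures(arrivals, 2))  # [0, 1, 0, 2, 3]
```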

36
Proposed Algorithms
  • IQ bufferless crossbar switch
    • A speedup of 4 was shown to be sufficient (MUCF algorithm).
    • A speedup of just two was also shown to be sufficient (GBVOQ algorithm).
    • The bad news: both schemes are impractical (high complexity due to centralized scheduling).
  • Buffered crossbar switch
    • A speedup of just two was proven to be enough for the exact emulation of an OQ switch (MCAF_LTF algorithm).
    • MCAF_LTF is practical and simple to implement in hardware.

38
Concluding Remarks
  • The IQ crossbar switching architecture is becoming less attractive due to its scalability and scheduling-complexity challenges.
  • The BCS switching architecture shows good potential for overcoming the problems of IQ switching.

Open Questions
  • Is 100% throughput achievable with a speedup < 2 for buffered crossbars using parallel scheduling?
  • Will storing multiple cells per crosspoint further simplify scheduling?