Title:

Scheduling algorithms for input-queued IP routers

Slides: 147
Provided by: elt78
Learn more at: http://www.cs.elte.hu
1
Scheduling algorithms for input-queued IP routers


  • Emilio Leonardi
  • in collaboration with P. Giaccone, M. Ajmone
    Marsan, A. Bianco, M. Mellia, F. Neri
  • Dipartimento di Elettronica
  • Telecommunication Network Group
  • http://www.tlc-networks.polito.it
  • Politecnico di Torino (Italy)

Budapest, March 2006
2
Outline
  • IP routers
  • OQ routers
  • IQ routers
  • Scheduling
  • Optimal algorithms
  • Heuristic algorithms
  • Packet-mode algorithms
  • Networks of routers
  • CIOQ routers
  • Multicast traffic
  • Conclusions

3
Note
  • The slides marked RWP are reproduced with
    permission of Prof. Nick McKeown from the
    Electrical Engineering and Computer Science Dept.
    of Stanford University (CA, USA)

5
The Internet is a mesh of routers
[Figure: mesh of core routers, access routers and enterprise routers]
6
The Internet is a mesh of routers
  • Access router
  • high number of ports at low speed (kbps/Mbps)
  • several access protocols (modem, ADSL, cable)
  • Enterprise router
  • medium number of ports at high speed (Mbps)
  • several services (IP classification, filtering)
  • Core router
  • moderate number of ports at very high speed
    (Mbps/Gbps)
  • very high throughput

7
Basic functions
  • Routing
  • computation of the output port of an incoming
    packet
  • uses the routing tables computed by the routing
    protocols
  • can be a complex procedure
  • very large routing tables
  • dynamic variation of routes in the Internet

8
Basic functions
  • Switching
  • transfer of packets from input ports to output
    ports
  • solution of the contentions for output ports
  • queueing
  • where to store
  • scheduling
  • what to transfer

9
Faster and faster
  • Need for high performance routers
  • to accommodate the bandwidth demands for new
    users and new services
  • to support QoS
  • to reduce costs

10
Packet processing and link speed
  • Increase of electronic packet processing power
    cannot accommodate the increase in link speed

[Figure: packet processing power versus link speed, 1985 to 2000, log scale. Packet processing power follows Moore's law (2x / 18 months); fiber capacity (Gbit/s) grows 2x / 7 months across the TDM and DWDM eras. Source: SPEC95Int; David Miller, Stanford.]
RWP
11
Memory access time
RWP
12
Moore's law
  • It's hard to keep up with Moore's law
  • the bottleneck is memory speed
  • Moore's law is too slow
  • routers need to improve faster than Moore's law

RWP
13
Router capacity exceeds Moore's law
  • Growth in capacity of commercial routers
  • 1992: 2 Gb/s
  • 1995: 10 Gb/s
  • 1998: 40 Gb/s
  • 2001: 160 Gb/s
  • 2003: 640 Gb/s
  • Average growth rate: 2.2x / 18 months

RWP
14
Single packet processing
  • The time to process one packet is becoming
    shorter and shorter
  • worst case: 40-byte packets (ACKs) travelling
    over the Internet
  • 3.2 µs at 100 Mbps
  • 320 ns at 1 Gbps
  • 32 ns at 10 Gbps
  • 3.2 ns at 100 Gbps
  • 320 ps at 1 Tbps
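
The per-packet time budget listed above is the packet size in bits divided by the link rate. A minimal sketch of that arithmetic follows; it assumes decimal rates (1 Gbps = 10^9 bit/s) and ignores framing overhead, and the function and constant names are illustrative, not taken from the slides.

```python
def packet_time_seconds(packet_bytes: int, link_bps: float) -> float:
    """Time on the wire for one packet: size in bits divided by the link rate."""
    return packet_bytes * 8 / link_bps


# Link rates quoted on the slide, in bit/s (decimal prefixes assumed).
RATES_BPS = {
    "100 Mbps": 100e6,
    "1 Gbps": 1e9,
    "10 Gbps": 10e9,
    "100 Gbps": 100e9,
    "1 Tbps": 1e12,
}

for label, rate in RATES_BPS.items():
    # 40 bytes is the worst case (a bare ACK) used on the slide.
    t = packet_time_seconds(40, rate)
    print(f"{label}: {t * 1e9:g} ns per 40-byte packet")
```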

15
Hardware architecture
[Figure: physical structure and logical structure]
16
Hardware architecture
  • Main elements
  • line cards
  • support input/output transmissions
  • store packets
  • adapt packets to the internal format of the
    switching fabric
  • support data link protocols
  • classify packets
  • schedule packets
  • support security
  • switching fabric
  • transfers packets from input ports to output ports

17
Hardware architecture
  • Main elements
  • control processor/network processor
  • runs routing protocols
  • computes routing tables
  • manages the overall system
  • forwarding engines
  • compute the packet destination (lookup)
  • inspect packet headers
  • rewrite packet headers

18
Interconnections among main elements - I
19
Interconnections among main elements - II
20
Cell-based routers
[Figure: cell switch (fabric) with ports 1 to N; packets outside the fabric, cells inside]
  • ISM: Input-Segmentation Module
  • ORM: Output-Reassembly Module
  • packet: variable-size data unit
  • cell: fixed-size data unit

21
Switching fabric
  • Our assumptions
  • bufferless
  • to reduce internal hardware complexity
  • non-blocking
  • it is always possible to transfer in parallel
    from input to output ports any non-conflicting
    set of cells

22
Switching fabric
  • Examples
  • crossbar
  • rearrangeable Clos network
  • Benes network
  • Batcher-Banyan network (self-routing)
  • Switching constraints
  • at most one cell for each input and for each
    output can be transferred

23
Switching fabric
  • We do not discuss switching fabrics with internal
    buffers
  • e.g. crossbars with buffer at each crosspoint

24
Generic switching architecture
[Figure: generic switch with input queues and output queues]
25
Speedup
  • The speedup determines the switch performance
  • S_in: reading speed from input queues
  • S_out: writing speed to output queues
  • maximum speedup factor
  • S = max(S_in, S_out)

26
Performance comparison
  • The performance of different switching systems
    can be studied
  • with analytical models
  • introducing simplifying assumptions, but
    obtaining general results
  • with simulation models
  • obtaining more detailed results

27
Traffic description
  • A_ij(n) = 1 if a packet arrives at time n at input
    i, with destination reachable through output j
  • λ_ij = E[A_ij(n)]
  • An arrival process is admissible if
  • Σ_i λ_ij ≤ 1
  • Σ_j λ_ij ≤ 1
  • that is, no input and no output are overloaded
    on average
  • note that OQ switches exhibit finite delays only
    for admissible traffic
  • traffic matrix Λ = [λ_ij]
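
The admissibility condition can be checked mechanically from the traffic matrix. The sketch below assumes `rates[i][j]` holds the mean arrival rate from input i to output j; the function name and the small floating-point tolerance are illustrative choices, not part of the slides.

```python
def is_admissible(rates, tol=1e-12):
    """True if no input (row) and no output (column) is loaded above 1."""
    n = len(rates)
    # Row sum: total load offered by input i.
    inputs_ok = all(sum(rates[i]) <= 1.0 + tol for i in range(n))
    # Column sum: total load destined to output j.
    outputs_ok = all(
        sum(rates[i][j] for i in range(n)) <= 1.0 + tol for j in range(n)
    )
    return inputs_ok and outputs_ok


# Uniform traffic at full load on a 4x4 switch: every entry is 1/4.
uniform = [[0.25] * 4 for _ in range(4)]
# Output 0 receives 0.6 + 0.6 = 1.2 on average, so it is overloaded.
overloaded = [[0.6, 0.3], [0.6, 0.1]]
print(is_admissible(uniform), is_admissible(overloaded))
```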

28
Traffic scenarios
  • Uniform traffic
  • Bernoulli i.i.d. arrivals
  • usual testbed in the literature
  • easy to schedule
  • Diagonal traffic
  • Bernoulli i.i.d. arrivals
  • critical to schedule, since only two matchings
    are good

29
Traffic scenarios
  • LogDiagonal traffic
  • Bernoulli i.i.d. arrivals
  • more critical than uniform, less than diagonal
    traffic

31
Output Queued (OQ) switches
  • S_in = 1, S_out = N
  • used for low bandwidth routers
  • no coordination among ports
  • work-conserving
  • best average delays
  • complete control of delays
  • support of QoS scheduling

32
Output Queued (OQ) switch
33
OQ performance
[Figure: OQ performance under uniform traffic]
Note: OQ is optimal from the point of view of
average delay and throughput
35
Simple Input Queued (IQ) switches
  • S_in = 1, S_out = 1
  • 1 FIFO queue for each input port
  • throughput limitations
  • due to head of the line (HOL) blocking
  • scheduling
  • to solve contentions for the same output

36
Head of the Line (HOL) Blocking
RWP
37
Simple IQ switch performance
[Figure: performance under uniform traffic, simple IQ versus OQ]
38
Improving simple IQ switches
  • Window/bypass schedulers
  • the first w cells of each queue contend for
    outputs
  • HOL blocking is reduced, not eliminated
  • w = 1 means FIFO at each input
  • higher complexity
  • the scheduler deals with wN cells
  • non-FIFO queues

39
Improving IQ switches
  • Virtual output queueing (VOQ)
  • one queue for each input/output pair
  • N queues at each input
  • N^2 queues in the whole switch
  • eliminates HOL blocking
  • used in high-bandwidth routers
  • scheduling implemented in hardware at very high
    speed

40
IQ switches with VOQ
Note: from now on, we always assume VOQ at the
switch inputs
42
Scheduling in IQ switches
  • Scheduling can be modeled as a matching problem
    in a bipartite graph
  • the edge from node i to node j refers to packets
    at input i and directed to output j
  • the weight of the edge can be
  • binary (not empty/empty queue)
  • queue length
  • HOL cell waiting time, or cell age
  • some other metric indicating the priority of the
    HOL cell to be served

43
Scheduling in IQ switches
[Figure: the scheduler turns the request graph between inputs and outputs into a matching (or permutation)]
44
Scheduling in IQ switches
  • Request Matrix
      3 5 0 0
      2 0 0 4
      4 5 0 0
      0 0 8 2
  • Permutation
      0 1 0 0
      0 0 0 1
      1 0 0 0
      0 0 1 0
45
Implementing schedulers
  • Scheduling is a complex task
  • a scheduling algorithm can be implemented in
    hardware if
  • it shows good performance for a wide range of
    traffic patterns
  • it can be efficiently parallelized
  • it can be efficiently pipelined
  • it requires few iterations (or clock cycles)
  • it requires limited control information

46
Scheduling uniform traffic
  • A number of algorithms give 100% throughput when
    traffic is uniform
  • For example
  • TDM and a few variants
  • iSLIP (see later)

[Figure: example of TDM for a 4x4 switch]
RWP
47
Birkhoff - von Neumann theorem
  • Any doubly stochastic matrix Λ can be
  • expressed as a convex combination of permutation
    matrices π_n
  • Λ = Σ_n a_n π_n
  • with
  • a_n ≥ 0
  • Σ_n a_n = 1
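
One way to make the theorem concrete is a greedy decomposition: repeatedly find a permutation inside the positive entries, subtract its smallest entry along that permutation, and record the pair. The sketch below is an illustrative construction, not necessarily the one intended by the slides. It assumes exact rational entries (Python `Fraction`) and uses a simple augmenting-path search to find each permutation; all function names are hypothetical.

```python
from fractions import Fraction


def perfect_matching(support):
    """Augmenting-path search. support[i] lists the columns row i may use.
    Returns perm with perm[i] = column of row i, or None if none exists."""
    n = len(support)
    col_owner = [-1] * n

    def try_row(i, seen):
        for j in support[i]:
            if j not in seen:
                seen.add(j)
                # Take a free column, or re-route the row that holds it.
                if col_owner[j] == -1 or try_row(col_owner[j], seen):
                    col_owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    perm = [0] * n
    for j, i in enumerate(col_owner):
        perm[i] = j
    return perm


def birkhoff_decompose(matrix):
    """Return [(a_n, perm_n)] with sum of a_n * permutation = matrix."""
    n = len(matrix)
    rest = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in rest for x in row):
        support = [[j for j in range(n) if rest[i][j] > 0] for i in range(n)]
        perm = perfect_matching(support)
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        a = min(rest[i][perm[i]] for i in range(n))
        for i in range(n):
            rest[i][perm[i]] -= a  # at least one entry drops to zero
        terms.append((a, perm))
    return terms


# Hypothetical 3x3 doubly stochastic matrix (rows and columns sum to 1).
example = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
    [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
]
terms = birkhoff_decompose(example)
for a, perm in terms:
    print(a, perm)
```

Each recorded coefficient is the fraction of time for which a TDM frame would use the corresponding matching.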

48
Scheduling non-uniform traffic
  • thanks to the Birkhoff - von Neumann theorem
  • If the traffic is known and admissible, 100%
    throughput can be achieved by a TDM using
  • for a fraction of time a_1 matching M_1 (π_1)
  • for a fraction of time a_2 matching M_2 (π_2)
  • for a fraction of time a_k matching M_k (π_k)

50
Maximum Size Matching
  • Maximum Size Matching (MSM)
  • among all the possible matchings, selects the one
    with the highest number of edges
  • MSM is generally not unique
  • the best MSM algorithm requires O(N^2.5)
    iterations, and cannot be implemented
    efficiently, since it is based on a flow
    augmentation path algorithm

51
Instability of MSM
  • Assume
  • P(arrival at Q12) = λ
  • P(arrival at Q11) = P(arrival at Q22) = 1 - λ - δ
  • initially, Q12 = B > 0 and Q11 = Q22 = 0
  • in case of parity, serve Q11 and/or Q22 instead of
    Q12
  • Observe
  • Q12 is served only when A11 = 0 and A22 = 0, i.e.
    with probability
  • P(serve Q12) = P(no arrivals at both Q11 and Q22)
    = [1 - (1 - λ - δ)]^2 = (λ + δ)^2
  • P(serve Q12) < P(arrival at Q12) if δ is small
    enough
  • Example: λ = 0.5, δ = 0.1, P(serve Q12) = 0.36

Note: this proof is due to I. Keslassy, Stanford
Univ.
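
The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes the reading used above, with `lam` as the arrival probability at Q12 and `delta` as the slack that keeps the traffic admissible; the names are illustrative. The threshold follows from solving (lam + delta)^2 < lam for delta.

```python
import math


def msm_service_probability(lam: float, delta: float) -> float:
    """P(serve Q12): no arrival at Q11 and no arrival at Q22 in the slot."""
    p_arrival = 1.0 - lam - delta  # arrival probability at Q11 and at Q22
    return (1.0 - p_arrival) ** 2  # equals (lam + delta) ** 2


def delta_threshold(lam: float) -> float:
    """Largest delta for which (lam + delta)^2 < lam still holds."""
    return math.sqrt(lam) - lam


lam, delta = 0.5, 0.1
p_serve = msm_service_probability(lam, delta)
# Q12 is served less often than it receives cells, so it grows without bound.
print(p_serve, p_serve < lam, delta_threshold(lam))
```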
52
Maximum Size Matching
  • MSM maximizes the instantaneous throughput
  • MSM may not yield 100% throughput
  • short term decisions can be inefficient in the
    long term
  • non-binary edge weights allow MWM to maximize
    the long-term throughput

53
Maximum Weight Matching
  • Maximum Weight Matching (MWM)
  • among all the possible N! matchings, selects the
    one with the highest weight (sum of the edge
    metrics)
  • MWM is generally not unique
  • MWM is too complex to be implemented in hardware
    at high speed
  • the best MWM algorithm requires O(N^3) iterations,
    and cannot be implemented efficiently, since it
    is based on a flow augmentation path algorithm
  • cannot be parallelized and pipelined efficiently
  • MWM has never been implemented in a commercial
    chipset
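
For small N, the definition of MWM can be applied literally by enumerating all N! permutations. The sketch below does this for the 4x4 request matrix shown earlier in the deck, using queue lengths as weights. It is a brute-force illustration of the definition only, not the O(N^3) algorithm mentioned above, and the function name is hypothetical.

```python
from itertools import permutations


def mwm_brute_force(weights):
    """Try every permutation; return (perm, weight) with perm[i] = output of input i."""
    n = len(weights)
    best_perm, best_weight = None, float("-inf")
    for perm in permutations(range(n)):
        w = sum(weights[i][perm[i]] for i in range(n))
        if w > best_weight:
            best_perm, best_weight = perm, w
    return best_perm, best_weight


# Request matrix from the "Scheduling in IQ switches" slide (queue lengths).
REQUEST = [
    [3, 5, 0, 0],
    [2, 0, 0, 4],
    [4, 5, 0, 0],
    [0, 0, 8, 2],
]
perm, weight = mwm_brute_force(REQUEST)
print(perm, weight)
```

On this matrix the search selects input 0 to output 1, input 1 to output 3, input 2 to output 0 and input 3 to output 2, which is the permutation shown on that slide.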

54
Maximum Weight Matching
  • In case of unknown traffic, MWM is the optimal
    solution of the scheduling problem when the
    weight is either the queue length or the cell age
  • achieves 100% throughput under any traffic
  • also under non-Bernoulli arrival processes,
    satisfying the law of large numbers
  • achieves low average delays, very close to those
    of OQ switches
  • possible starvation for lightly loaded packet
    flows

55
Maximum Weight Matching
  • MWM is the optimal solution of the scheduling
    problem when the traffic is unknown, when the
    weight is either the queue length or the cell age
  • achieves 100% throughput under any traffic
  • also under non-Bernoulli arrival processes,
    satisfying the law of large numbers
  • achieves low average delays, very close to those
    of OQ switches
  • possible starvation for lightly loaded packet
    flows

56
MWM with pipeline and latency
  • Let T and P be fixed
  • D_t denotes the matching used at time t
  • The following variations of MWM also achieve
    100% throughput
  • D_t = MWM(t-P): MWM with pipeline
    degree P
  • D_t = MWM(ceil(t/T)T): MWM with latency T
  • combinations of both
  • thus, it seems easy to achieve 100% throughput!

57
MWM with pipeline and latency
  • But
  • What about throughput?
  • 100% throughput
  • but needs the computation of a MWM
  • What about delays?
  • delays can be really bad!
58
General consideration
  • When scheduling in IQ switches, it is very
    difficult to achieve simultaneously
  • high throughput
  • low delay
  • limited implementation complexity

59
Uniform traffic
  • MWM and MSM behave almost identically

[Figure: mean delay (log scale) versus normalized load under uniform traffic, for MWM and MSM]
60
LogDiagonal traffic
  • MSM is somewhat inferior to MWM

[Figure: mean delay (log scale) versus normalized load under LogDiagonal traffic, for MWM and MSM]
61
Diagonal traffic
  • MSM yields much longer delays than MWM at
    medium/high loads

[Figure: mean delay (log scale) versus normalized load under diagonal traffic, for MWM and MSM]
63
Approximations of MSM and MWM
  • Motivation
  • strong interest in scheduling algorithms with
  • very low complexity
  • high performance
  • Usually
  • implementable schedulers (low complexity)
  • → low throughput, long delays
  • theoretical schedulers (high complexity)
  • → high throughput, short delays

64
Some implementable algorithms
  • Approximate MSM
  • WFA, iSLIP, 2DRR, RC, FIRM and many others
  • Approximate MWM with w_ij = X_ij (queue length)
  • iLQF, RPA, learning algorithms
  • Approximate MWM with w_ij = cell age
  • iOCF
  • Approximate MWM with w_ij = Σ_i X_ij + Σ_j X_ij
  • iLPF, MUCS

65
APPROXIMATIONS OF MAXIMUM SIZE MATCHING
66
Wave Front Arbiter
[Figure: requests and resulting match for a 4x4 switch, inputs 1 to 4 and outputs 1 to 4]
RWP
67
Wave Front Arbiter
[Figure: requests and resulting match; the wave front takes 2N-1 steps]
RWP
68
Wrapped Wave Front Arbiter
[Figure: requests and resulting match; the wrapped wave front takes N steps instead of 2N-1]
RWP
69
iSLIP
  • iSLIP means iterative SLIP
  • iterates among the following 3 phases
  • Request
  • Grant
  • Accept

70
iSLIP
  • 3 phases
  • Request (from inputs to outputs)
  • each unmatched input sends a request to every
    output for which it has a cell
  • Grant (from outputs to inputs)
  • if an unmatched output receives requests, it
    sends a grant to one of the inputs
  • contentions solved by a round-robin mechanism
  • Accept (from inputs to outputs)
  • if an unmatched input receives grants, it selects
    a single output and it becomes matched to it
  • contentions solved by a round-robin mechanism
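
The three phases can be sketched in a few lines. The slide only says that contentions are solved by round robin. The pointer update rule used below (move a pointer one position past the chosen port, and only when the grant is accepted in the first iteration) is taken from the published iSLIP description rather than from this deck, so treat it as an assumption. Names such as `grant_ptr` and `accept_ptr` are illustrative.

```python
def islip(requests, grant_ptr, accept_ptr, iterations=1):
    """One time slot of iSLIP.
    requests[i][j] is True when the VOQ from input i to output j is non-empty.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns match_in, where match_in[i] is the output matched to input i or None."""
    n = len(requests)
    match_in = [None] * n
    match_out = [None] * n
    for it in range(iterations):
        # Request + Grant: each unmatched output scans the unmatched inputs
        # that requested it, starting from its pointer, and grants the first.
        grants = {}  # input -> outputs that granted it
        for j in range(n):
            if match_out[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if match_in[i] is None and requests[i][j]:
                    grants.setdefault(i, []).append(j)
                    break
        # Accept: each input scans its grants starting from its pointer.
        for i, outs in grants.items():
            for k in range(n):
                j = (accept_ptr[i] + k) % n
                if j in outs:
                    match_in[i], match_out[j] = j, i
                    if it == 0:  # pointers move only on first-iteration accepts
                        grant_ptr[j] = (i + 1) % n
                        accept_ptr[i] = (j + 1) % n
                    break
    return match_in


# 2x2 switch with all four VOQs backlogged, pointers initially at 0.
full = [[True, True], [True, True]]
g, a = [0, 0], [0, 0]
slot1 = islip(full, g, a)  # both outputs grant input 0, only one is accepted
slot2 = islip(full, g, a)  # pointers have desynchronized
slot3 = islip(full, g, a)
print(slot1, slot2, slot3)
```

In this small example the pointers desynchronize after the first slot, and the scheduler then alternates between the two full matchings, which is the TDM-like behaviour under uniform load described on the next slide.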

71
iSLIP
  • The round robin mechanism in iSLIP is designed so
    that, under uniform traffic, iSLIP emulates a
    dynamic TDM scheduler synchronized on the arrival
    pattern

72
iSLIP
  • iSLIP is maximal
  • often, with log N iterations
  • always, with N iterations
  • iSLIP was implemented on one chip in the Cisco
    12000 router
  • http://www.cisco.com/warp/public/cc/pd/rt/12000/tech/fasts_wp.pdf

73
iSLIP
iSLIP demo
from http://tiny-tera.stanford.edu/tiny-tera/demos/index.html
74
APPROXIMATIONS OF MAXIMUM WEIGHT MATCHING
75
iLQF
  • iLQF means iterative Longest Queue First
  • iterates among the following 3 phases
  • Request
  • Grant
  • Accept

76
iLQF
  • 3 phases
  • Request (from inputs to outputs)
  • each unmatched input sends all its queue lengths
    as requests to corresponding outputs
  • Grant (from outputs to inputs)
  • if an unmatched output receives requests, it
    sends a grant to the input corresponding to the
    longest queue
  • contentions solved by random choice
  • Accept (from inputs to outputs)
  • if an unmatched input receives grants, it selects
    the output with the longest queue
  • contentions solved by random choice
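
The phases above differ from iSLIP only in how grants and accepts are chosen: by longest queue, with random tie-breaking. The following is a minimal sketch under that reading. The function name, the seeded `random.Random` default and the 4x4 queue-length matrix are all illustrative assumptions.

```python
import random


def ilqf(queue_len, iterations=1, rng=None):
    """One time slot of iLQF. queue_len[i][j] is the length of VOQ (i, j).
    Returns match_in, where match_in[i] is the output matched to input i or None."""
    rng = rng or random.Random(0)
    n = len(queue_len)
    match_in = [None] * n
    match_out = [None] * n

    def pick_longest(candidates, length_of):
        longest = max(length_of(c) for c in candidates)
        # Ties are broken by random choice, as stated on the slide.
        return rng.choice([c for c in candidates if length_of(c) == longest])

    for _ in range(iterations):
        grants = {}  # input -> outputs that granted it
        for j in range(n):
            if match_out[j] is not None:
                continue
            # Request: unmatched inputs with a non-empty queue for output j.
            requesters = [
                i for i in range(n)
                if match_in[i] is None and queue_len[i][j] > 0
            ]
            if requesters:
                # Grant: the input with the longest queue for this output.
                i = pick_longest(requesters, lambda r: queue_len[r][j])
                grants.setdefault(i, []).append(j)
        for i, outs in grants.items():
            # Accept: the granting output with the longest queue at input i.
            j = pick_longest(outs, lambda o: queue_len[i][o])
            match_in[i], match_out[j] = j, i
    return match_in


# Hypothetical queue lengths, chosen so that no ties occur.
QUEUES = [
    [3, 5, 0, 0],
    [2, 0, 0, 4],
    [4, 6, 0, 0],
    [0, 0, 8, 2],
]
print(ilqf(QUEUES, iterations=1), ilqf(QUEUES, iterations=2))
```

With one iteration input 0 stays unmatched, because the input it lost to accepted a different output. The second iteration fills that gap, which illustrates why several iterations are needed to reach a maximal matching.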

77
iLQF
  • iLQF is maximal
  • often, with log N iterations
  • always, with N iterations
  • iLQF is robust to non-uniform traffic

78
iLQF
iLQF demo
from http://tiny-tera.stanford.edu/tiny-tera/demos/index.html
79
RPA
  • RPA means Reservation with Preemption and
    Acknowledgment
  • Two phases
  • Reservation (possibly preemptive)
  • Acknowledgement
  • Sequential accesses to a reservation vector
  • Urg_j (if set) is the urgency of the transfer
    from input In_j to output j

Vector Res
80
RPA
Input 1
Input 2
  • Vector Res is sequentially accessed by all inputs

Res
Input 4
Input 3
81
RPA
  • Initially, at each round, Urg_j = 0 for all j
  • Reservation phase
  • when input i accesses Res
  • it computes W_j = X_ij - Urg_j for all j
  • finds j* such that W_j* = max_j W_j
  • if W_j* > 0,
  • → reserve output j* and set Urg_j* = X_ij*, possibly
    overwriting the previous reservation
  • otherwise,
  • → leave the current reservation

82
RPA
  • Acknowledgement phase
  • if input i still finds its reservation at output
    j,
  • → books output j
  • otherwise,
  • → chooses an unreserved output j and books output
    j

83
Uniform traffic
  • comparison between MWM, iSLIP, iLQF, and RPA

[Figure: mean delay (log scale) versus normalized load under uniform traffic, for MWM, iSLIP, iLQF and RPA]
84
LogDiagonal traffic
  • iSLIP saturates close to 84% throughput

[Figure: mean delay (log scale) versus normalized load under LogDiagonal traffic, for MWM, iSLIP, iLQF and RPA]
85
Diagonal traffic
  • RPA achieves 98% throughput, iLQF 87%, iSLIP 83%

[Figure: mean delay (log scale) versus normalized load under diagonal traffic, for MWM, iSLIP, iLQF and RPA]
86
LEARNING ALGORITHMS
87
Learning algorithms
  • Goal
  • find a good compromise among
  • throughput, delay and complexity

88
Learning algorithms
  • Key observation
  • the matchings generated by MWM show limited
    changes from one time to another
  • remembering the matching from the past simplifies
    the computation of the new matching
  • the search implemented by MWM can be enhanced
  • with a randomized approach
  • by observing arrivals
  • by searching in parallel
  • based on an extension of randomized scheduling
    algorithms

89
Simple Randomized Schemes
  • Choose a matching at random and use it as the
    schedule
  • doesn't yield 100% throughput
  • Choose 2 matchings at random and use the heavier
    one as the schedule
  • Choose N matchings at random and use the
    heaviest one as the schedule
  • → None of these can give 100% throughput!

90
Simple randomized algorithms
32x32
91
Bounds on Maximum Throughput
92
Tassiulas scheme
  • Consider the following policy
  • R_t: matching picked at random (uniformly) among
    all the possible N! matchings
  • D_t = arg max {W(D_{t-1}), W(R_t)}
  • Complexity is very low
  • O(1) iterations
  • easy to pipeline
  • Yields 100% throughput!
  • note: the boost in throughput is due to memory of
    the past matching D_{t-1}
  • However, delays are very large

93
Tassiulas' scheme
32x32
94
Learning approach
  • Properties of COMP1
  • W(D_t) ≥ W(D_{t-1})
  • W(D_t) ≥ W(M_t)
  • Examples
  • COMP1 is the MAX among D_{t-1} and M_t
  • COMP1 is the MERGE among D_{t-1} and M_t

95
MERGE procedure
[Figure: MERGE example producing the matching M, with W(M) = 13]
Emulating MWM is O(N)
96
The learning approach
  • Properties of M_t
  • informally, M_t should be a good sample in the
    space of all possible matchings
  • Examples
  • M_t is a matching picked uniformly at random
  • M_t is a matching picked non-uniformly at random,
    with a high probability of being heavy
  • M_t is derived from the arrival vector A_t
  • M_t is a good neighbor of D_{t-1}

97
Theoretical properties
  • Stability
  • 100% throughput under any admissible Bernoulli
    traffic pattern
  • Delay
  • the larger the weight of M_t, the shorter the
    queues, and hence the shorter the delays

98
Example of practical implementation
  • Exploiting parallel search

[Figure: parallel search over D_{t-1}, its neighbors (up to the K-th) and the arrival vector A_t; MAX blocks select M_t and then D_t]
  • This scheme is called APSARA
99
What is a neighbor of a matching?
  • Example 3 x 3 switch

[Figure: a matching D_{t-1} and its 3 neighbors N1, N2, N3]
  • Each neighbor
  • differs from D_{t-1} in ONLY TWO edges
  • can be generated very easily in hardware

100
Max-APSARA
  • APSARA, as described before, is not maximal
  • Max-APSARA is a modified version of APSARA where
    a maximal size matching algorithm runs on the
    remaining unmatched inputs/outputs
  • e.g., if k inputs/outputs are unmatched,
  • run iSLIP with k iterations
  • select k random edges among the unmatched
    inputs/outputs

101
APSARA performance
103
Routers and switches
  • IP routers deal with variable-size packets
  • Hardware switching fabrics often deal with
    fixed-size cells
  • Question
  • how to integrate a hardware switching fabric
    within an IP router?

104
Router based on an IQ cell switch: cell-mode
105
Cell-mode scheduling
  • Scheduling algorithms work at cell level
  • pros
  • 100% throughput achievable
  • cons
  • interleaving of packets at the outputs of the
    switching fabric

106
Router based on an IQ cell switch: packet-mode
[Figure: IQ cell switch with ports 1 to N and ORMs after the switching fabric]
NO packet interleaving if packet-mode
107
Router based on an IQ cell switch: packet-mode
[Figure: same IQ cell switch as on the previous slide]
NO packet interleaving if packet-mode: ORMs can be removed
108
Packet-mode scheduling
  • Rule: packets are transferred as trains of cells
  • when an input starts transferring the first cell
    of a packet comprising k cells, it continues to
    transfer in the following k-1 time slots
  • Pros
  • no interleaving of packets at the outputs
  • easy extension of traditional schedulers
  • Cons
  • starvation due to long packets
  • inherent in packet systems without preemption
  • negligible for high speed rates

109
Packet-mode scheduling
  • Questions
  • can packet mode provide high throughput? YES!
  • what about delays? It depends
110
Packet-mode properties
  • Main theoretical results
  • MWM in packet-mode yields 100% throughput
  • Packet mode can provide shorter delays than cell
    mode, depending on the packet length distribution

111
Simulation scenario
  • Router with ISMs and ORMs
  • Uniform packet traffic
  • uniform packet load
  • uniform (1,192) packet size distribution
  • Spotted packet traffic
  • non uniform packet load
  • bimodal (3,100) packet size distribution

112
Uniform packet traffic
  • Packet mode and cell mode reach the same
    throughput

[Figure: mean packet delay (log scale) versus normalized load under uniform packet traffic, for MWM, MSM, iSLIP and iLQF; one panel for cell-mode, one for packet-mode]
113
Spotted packet traffic
  • Packet mode reaches higher throughput than cell
    mode

[Figure: mean packet delay (log scale) versus normalized load under spotted packet traffic, for MWM, MSM, iSLIP and iLQF; one panel for cell-mode, one for packet-mode]
114
Effect of packet size distribution
  • iSLIP: delay_CM / delay_PM for different packet size
    distributions

[Figure: packet mode gain for iSLIP versus normalized load, for uniform, exponential, trimodal and bimodal packet size distributions; values above 1 mean packet mode is better, values below 1 mean cell mode is better]
115
Packet mode features
  • Packet mode scheduling
  • is a feasible modification of schedulers
  • improves throughput
  • but it can generate some unfairness between long
    and short packets
  • inherent to all variable-packet networks without
    preemption
  • may give better packet delays than cell mode
  • depends on the packet size distribution

117
Network of IQ routers
  • Question
  • given a network of IQ switches and an admissible
    input traffic, is the network always stable?

118
Networks of IQ routers
  • Consider the acyclic network of IQ routers in the
    following slide
  • derived from well established results from
    adversarial queueing theory
  • a very specific scenario, but one comprising only
    a few switches
  • this situation may not be common, but cannot be
    excluded in real networks

119
Pathological network of IQ switches
Network with 8 switches and 4 flows
120
Instability of MWM
  • If MWM is adopted at each IQ router, and the
    traffic is admissible, the system can be unstable
    under Bernoulli i.i.d. arrivals

121
Instability of MWM
  • MWM is too greedy, in the sense that it can
    create traffic bursts that are amplified by each
    scheduler
  • A server can be idling when large bursts
    (directed to it) are blocked because of the
    contentions upstream
  • the problem arises when a packet flow is subject
    to priority changes along its path through the
    network
  • it is dangerous to increase priority along the
    path

122
Stability in networks of routers
  • Global policies
  • Oldest in the network and many others
  • problem: requires global information about the
    network, and perfectly synchronized clocks at the
    ingress of the network
  • Local policies
  • until now, no really satisfying policy is known
    (work in progress)

123
Stability in networks of routers
  • Semi-local policies
  • MWM with local information about the router
    neighbors can achieve 100% throughput under
    i.i.d. Bernoulli arrivals
  • Virtual network queue
  • the weights used by MWM are
  • w_ij = max{0, X_ij - H(X_ij)}
  • where H(X_ij) is the size of the queue upstream
    which is sending packets to X_ij

125
CIOQ routers
VOQ
126
CIOQ routers
  • Question
  • if a low speedup S is allowed (and queues are
    available at both inputs and outputs), is it
    possible to design simple scheduling algorithms,
    capable of achieving high throughput and low
    delay?

YES!
127
CIOQ routers with S = 2
  • If S = 2
  • it is easy to obtain 100% throughput
  • all maximal matchings work
  • based on stable marriage algorithms
  • it is less easy to obtain work conservation
  • output never idling whenever a packet is present
    destined to it
  • same average delays as OQ
  • very good delay performance
  • e.g. LOOFA
  • it is difficult to perfectly emulate OQ

128
LOOFA
  • The occupancy Cj
  • is the number of cells currently residing at the
    j-th output queue
  • at each time slot, it is decremented by one
    because of departures
  • Basic idea of LOOFA
  • give priority to output channels with low
    occupancy, thereby attempting to maintain
    work-conservation for all outputs

129
LOOFA
  • If S = 2, during each of the two phases
  • each unmatched input selects a non-empty VOQ
    directed to the unmatched output with the lowest
    occupancy, and sends a request to that output
  • each unmatched output selects one of the
    requests, and sends a grant to that input
  • repeat until the matching is maximal
  • the selection at the outputs can be round robin,
    random, ...

130
CIOQ routers with S = 2
  • If S = 2
  • it is difficult (but possible) to perfectly
    emulate an OQ router in terms of packet
    departures
  • it is impossible to distinguish, by observing
    arrivals and departures, if the switching
    architecture is CIOQ or OQ
  • delays are perfectly controlled
  • easy to implement scheduling algorithms born for
    OQ (e.g. WFQ)

131
CIOQ routers
  • CIOQ are very promising architectures
  • many degrees of freedom in design
  • how to balance input/output buffers
  • how the buffers interact
  • e.g., by backpressure mechanisms
  • Several currently designed architectures are
    supposed to be CIOQ
  • The speedup S is becoming closer and closer to 1
    in practical implementations of new switching
    architectures (CIOQ → IQ)

133
Multicast traffic
  • Misleading (but common) idea
  • observe
  • OQ can achieve 100% throughput under any
    admissible unicast and multicast traffic
  • OQ can be perfectly emulated by CIOQ with S = 2
  • then, with S = 2 it is possible to achieve 100%
    throughput for multicast traffic

134
Multicast traffic
  • Question
  • what is the minimum speedup required to achieve
    100% throughput?

unknown!
135
Multicast traffic
  • Possible implementations
  • copy network before the switching fabric
  • a multicast cell with f destinations is treated
    as f cells
  • possible bandwidth inefficiency
  • dedicated queue
  • multicast packets are treated in some specific way

136
Multicast traffic optimal queueing
  • MC-VOQ queueing
  • best throughput performance
  • avoids HOL blocking
  • 2^N - 1 queues for each input, one for each fanout
    set
  • re-enqueuing process → out-of-sequence problem
  • no re-enqueuing → some throughput degradation
137
Multicast traffic optimal scheduling
  • The optimal scheduling for multicast traffic can
    be defined similarly to unicast traffic
  • it is a sort of max flow algorithm on all N(2^N - 1)
    queues
  • Many heuristics can be envisaged to approximate
    it

138
Summary
  • 3 main ingredients for IQ scheduling algorithms
  • Weight computation
  • Matching computation
  • Contention resolution

139
Summary
  • Weight computation
  • obtains the priority of each input queue
  • the metric can be related to queue length,
    waiting time of the cell at the HOL,
  • Contention resolution
  • whenever the selection is among situations with
    equal weights
  • can be round robin, or random

140
Summary
  • Matching computation
  • computes the matching, trying to maximize its
    total weight
  • can be based on
  • an iterative search, like in iSLIP, iOCF, iLQF
  • a matrix greedy approach, like in MUCS, WFA
  • a reservation vector, like in RPA
  • a learning approach, like in APSARA
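
The sketch below contrasts a generic greedy heuristic with an exhaustive maximum weight matching on a small weight matrix. It is not an implementation of iSLIP, MUCS, WFA, RPA or APSARA; the function names and the index-based tie-breaking are assumptions chosen to keep the example short:

```python
from itertools import permutations


def greedy_matching(w):
    """Repeatedly take the heaviest remaining (input, output) pair."""
    n = len(w)
    edges = sorted(
        ((w[i][j], i, j) for i in range(n) for j in range(n) if w[i][j] > 0),
        key=lambda e: (-e[0], e[1], e[2]),  # ties broken by lowest indices
    )
    used_in, used_out, match = set(), set(), {}
    for _, i, j in edges:
        if i not in used_in and j not in used_out:
            match[i] = j
            used_in.add(i)
            used_out.add(j)
    return match


def max_weight_matching(w):
    """Exhaustive search over all permutations; only viable for tiny N."""
    n = len(w)
    best = max(permutations(range(n)),
               key=lambda p: sum(w[i][p[i]] for i in range(n)))
    return {i: j for i, j in enumerate(best) if w[i][j] > 0}


def total_weight(w, match):
    """Sum of the weights of the matched (input, output) pairs."""
    return sum(w[i][j] for i, j in match.items())


w = [[5, 4, 0],
     [4, 0, 0],
     [0, 0, 1]]

print(total_weight(w, greedy_matching(w)))      # 6
print(total_weight(w, max_weight_matching(w)))  # 9
```

In this example the greedy choice of the heaviest edge (input 0 to output 0) blocks the two edges of weight 4, so the heuristic reaches a total weight of 6 against the optimum of 9. A real scheduler would replace the index-based tie-breaking with round robin or random selection.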

141
Summary
  • Good IQ scheduling algorithms exist
  • 100% throughput
  • short delay
  • limited complexity
  • Performance differences are significant only
    close to saturation

142
Summary
  • Open questions concerning IQ schedulers
  • QoS guarantees
  • stability of networks of switches
  • multicast traffic

143
References
  • Router functions and architectures
  • Keshav S., Sharma R., "Issues and trends in
    router design", IEEE Communications Magazine,
    vol.36, n.5, May 1998, pp.144-151
  • Bux W., Denzel W.E., Engbersen T., Herkersdorf
    A., Luijten R.P., "Technologies and building
    blocks for fast packet forwarding", IEEE
    Communications Magazine, Jan.2001, pp.70-77
  • Newman P., Minshall G., Lyon T., Huston L., "IP
    switching and gigabit routers", IEEE
    Communications Magazine, Jan.1997, pp.64-69
  • Wolf T., Turner J.S., "Design issues for
    high-performance active routers", IEEE Journal
    on Selected Areas in Communications, vol.19, n.3,
    Mar.2001, pp.404-409
  • Scheduling in IQ switches
  • Karol M., Hluchyj M., Morgan S., "Input versus
    output queueing on a space division switch",
    IEEE Transactions on Communications, vol.35,
    n.12, Dec.1987
  • McKeown N., Anantharam V., Walrand J., "Achieving
    100% throughput in an input-queued switch", IEEE
    INFOCOM'96, vol.1, San Francisco, CA, Mar.1996,
    pp.296-302
  • McKeown N., "iSLIP: a scheduling algorithm for
    input-queued switches", IEEE Transactions on
    Networking, vol.7, n.2, Apr.1999, pp.188-201
  • McKeown N., Mekkittikul A., "A practical
    scheduling algorithm to achieve 100% throughput
    in input-queued switches", IEEE INFOCOM'98,
    vol.2, 1998, pp.792-9, New York, NY
  • Tamir Y., Chi H.-C., "Symmetric crossbar
    arbiters for VLSI communication switches", IEEE
    Transactions on Parallel and Distributed Systems,
    vol.4, n.1, Jan.1993, pp.13-27

144
References
  • Scheduling in IQ switches
  • Anderson T., Owicki S., Saxe J., Thacker
    C., "High speed switch scheduling for local area
    networks", ACM Transactions on Computer Systems,
    vol.11, n.4, Nov.1993
  • LaMaire R.O., Serpanos D.N., "Two dimensional
    round-robin schedulers for packet switches with
    multiple input queues", IEEE/ACM Transactions on
    Networking, vol.2, n.5, Oct.1994, pp.471-482
  • Chen H., Lambert J., Pitsillides A., "RC-BB
    switch. A high performance switching network for
    B-ISDN", IEEE GLOBECOM'95, 1995
  • Duan H., Lockwood J.W., Kang S.M., Will J.D., "A
    high performance OC12/OC48 queue design prototype
    for input buffered ATM switches", IEEE
    INFOCOM'97, vol.1, 1997, pp.20-8, Los Alamitos,
    CA
  • Partridge C., et al., "A 50-Gb/s IP router",
    IEEE Transactions on Networking, vol.6, n.3, June
    1998, pp.237-248
  • Ajmone Marsan M., Bianco A., Leonardi E., Milia
    L., "RPA: a flexible scheduling algorithm for
    input buffered switches", IEEE Transactions on
    Communications, vol.47, n.12, Dec.1999,
    pp.1921-1933
  • Ajmone Marsan M., Bianco A., Filippi E., Giaccone
    P., Leonardi E., Neri F., "On the behavior of
    input queueing switch architectures", European
    Transactions on Telecommunications, vol.10, n.2,
    Mar.1999, pp.111-124
  • Christensen K.J., "Design and evaluation of a
    parallel-polled virtual output queued switch",
    IEEE ICC 2001, vol.1, pp.112-116, 2001
  • Serpanos D.N., Antoniadis P.I., "FIRM: a class
    of distributed scheduling algorithms for
    high-speed ATM switches with multiple input
    queues", IEEE INFOCOM 2000, vol.2, pp.548-555,
    2000
  • Jiang Y., Hamdi M., "A 2-stage matching
    scheduler for a VOQ packet switch architecture",
    IEEE ICC 2002, vol.4, pp.2105-2110, 2002
  • Tassiulas L., "Linear complexity algorithms for
    maximum throughput in radio networks and input
    queued switches", IEEE INFOCOM'98, vol.2, New
    York, NY, 1998, pp.533-539
  • Giaccone P., Prabhakar B., Shah D., "Towards
    simple, high-performance schedulers for
    high-aggregate bandwidth switches", IEEE
    INFOCOM'02, New York, Jun.2002

145
References
  • Packet scheduling in IQ switches
  • Ajmone Marsan M., Bianco A., Giaccone P.,
    Leonardi E., Neri F., "Packet scheduling in
    input-queued cell-based switches", IEEE
    INFOCOM'01, Anchorage, Alaska, Apr.2001 (extended
    version to appear in IEEE Trans. on Networking,
    about Oct.2002)
  • Moon S.H., Sung D.K., "High-performance
    variable-length packet scheduling algorithm for
    IP traffic", IEEE GLOBECOM'01, Dec.2001
  • Scheduling multicast traffic in IQ switches
  • Hayes J.F., Breault R., Mehmet-Ali M.K.,
    "Performance analysis of a multicast switch",
    IEEE Transactions on Communications, vol.39, n.4,
    Apr.1991, pp.581-587
  • Kim C.K., Lee T.T., "Call scheduling algorithm
    in multicast switching systems", IEEE
    Transactions on Communications, vol.40, n.3,
    Mar.1992, pp.625-635
  • McKeown N., Prabhakar B., "Scheduling multicast
    cells in an input-queued switch", IEEE INFOCOM'96,
    vol.1, San Francisco, CA, Mar.1996, pp.261-278
  • Prabhakar B., McKeown N., Ahuja R., "Multicast
    scheduling for input-queued switches", IEEE
    Journal on Selected Areas in Communications,
    vol.15, n.5, Jun.1997, pp.855-866
  • Chen W., Chang Y., Hwang W., "A high performance
    cell scheduling algorithm in broadband multicast
    switching systems", IEEE GLOBECOM'97, vol.1, New
    York, NY, 1997, pp.170-174
  • Guo M., Chang R., "Multicast ATM switches:
    survey and performance evaluation", Computer
    Communication Review, vol.28, n.2, Apr.1998,
    pp.98-131
  • Andrews M., Khanna S., Kumaran K., "Integrated
    scheduling of unicast and multicast traffic in an
    input-queued switch", IEEE INFOCOM'99, vol.3,
    New York, NY, 1999, pp.1144-1151
  • Liu Z., Righter R., "Scheduling multicast
    input-queued switches", Journal of Scheduling,
    John Wiley & Sons, May 1999

146
References
  • Scheduling multicast traffic in IQ switches
  • Nong G., Hamdi M., "On the provision of
    integrated QoS guarantees of unicast and
    multicast traffic in input-queued switches",
    IEEE GLOBECOM'99, vol.3, 1999
  • Ajmone Marsan M., Bianco A., Giaccone P.,
    Leonardi E., Neri F., "On the throughput of
    input-queued cell-based switches with multicast
    traffic", IEEE INFOCOM'01, Anchorage, Alaska,
    Apr.2001
  • Nong G., Hamdi M., "Providing QoS guarantees for
    unicast/multicast traffic with
    fixed/variable-length packets in multiple
    input-queued switches", IEEE Symposium on
    Computers and Communications, pp.166-171, 2001
  • Smiljanic A., "Flexible bandwidth allocation in
    high-capacity packet switches", IEEE/ACM
    Transactions on Networking, vol.10, n.2,
    pp.287-293, Apr.2002
  • QoS support in IQ switches
  • Tabatabaee V., Georgiadis L., Tassiulas L.,
    "QoS provisioning and tracking fluid policies in
    input queueing switches", IEEE INFOCOM'00, New
    York, Mar.2000
  • Chang C.S., Lee D.S., Jou Y.S., "Load balanced
    Birkhoff-von Neumann switches", 2001 IEEE
    Workshop on High Performance Switching and
    Routing, 2001, pp.276-280
  • Hung A., Kesidis G., McKeown N., "ATM
    input-buffered switches with guaranteed-rate
    property", IEEE ISCC'98, July 1998, pp.331-335,
    Athens, Greece
  • Advanced architectures derived from pure IQ
  • Iyer S., McKeown N., "Making parallel packet
    switches practical", IEEE INFOCOM'01, Alaska,
    Mar.2001
  • Chang C.S., Lee D.S., Jou Y.S., "Load balanced
    Birkhoff-von Neumann switches", 2001 IEEE
    Workshop on High Performance Switching and
    Routing, 2001, pp.276-280
  • Sivaram R., Stunkel C.B., Panda D.K., "HIPIQS: a
    high-performance switch architecture using input
    queuing", IEEE Transactions on Parallel and
    Distributed Systems, vol.13, n.3, pp.275-289,
    Mar.2002