Can we make these scheduling algorithms simpler? Using a Simpler Architecture (PowerPoint presentation transcript)
1
Can we make these scheduling algorithms
simpler? Using a Simpler Architecture
2
Buffered Crossbar Switches
  • A buffered crossbar switch is a switch with a
    buffered fabric (memory inside the crossbar).
  • A pure buffered crossbar switch architecture has
    buffering only inside the fabric and none
    anywhere else.
  • Due to the HoL blocking problem, VOQs are used on
    the input side.

3
Buffered Crossbar Architecture
[Diagram: input cards 1, 2, ..., N with VOQs connected through the buffered crossbar fabric to output cards 1, 2, ..., N]
4
Scheduling Process
  • Scheduling is divided into three steps:
  • Input scheduling: each input selects in a certain
    way one cell from the HoL of an eligible queue
    and sends it to the corresponding internal
    buffer.
  • Output scheduling: each output selects in a
    certain way one of the internally buffered cells in
    the crossbar to be delivered to the output port.
  • Delivery notification: for each delivered cell,
    inform the corresponding input of the internal
    buffer status.

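The three steps above can be sketched as a toy one-slot simulation (the 2x2 setup, names, and buffer depth are all illustrative, not from the slides):

```python
# Minimal sketch of one buffered-crossbar time slot.
N = 2
voq = [[["a"], []], [[], ["b"]]]   # voq[i][j]: cells at input i for output j
xbuf = [[[] for _ in range(N)] for _ in range(N)]  # internal crossbar buffers
CAP = 1                            # internal buffer depth

def time_slot():
    # Step 1: input scheduling - each input sends one HoL cell
    # from an eligible (non-empty) VOQ to a non-full internal buffer.
    for i in range(N):
        for j in range(N):
            if voq[i][j] and len(xbuf[i][j]) < CAP:
                xbuf[i][j].append(voq[i][j].pop(0))
                break
    # Step 2: output scheduling - each output drains one buffered cell.
    delivered = []
    for j in range(N):
        for i in range(N):
            if xbuf[i][j]:
                delivered.append(xbuf[i][j].pop(0))
                # Step 3 (delivery notification) would credit input i here.
                break
    return delivered
```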
5
Advantages
  • Total independence between input and output
    arbiters (distributed design): 1/N complexity as
    compared to centralized schedulers.
  • Switch performance is much better (because
    there is much less output contention): a
    combination of IQ and OQ switches.
  • Disadvantage: the crossbar is more complicated.

6
I/O Contention Resolution
[Diagram: inputs 1-4 and outputs 1-4 resolving contention through the buffered crossbar]
7
I/O Contention Resolution
[Diagram: inputs 1-4 and outputs 1-4 resolving contention through the buffered crossbar]
8
The Round Robin Algorithm
  • InRr-OutRr
  • Input scheduling: InRr (round-robin)
  • - Each input selects the next eligible VOQ,
    based on its highest-priority pointer, and sends
    its HoL packet to the internal buffer.
  • Output scheduling: OutRr (round-robin)
  • - Each output selects the next non-empty internal
    buffer, based on its highest-priority pointer,
    and sends its cell to the output link.

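A round-robin arbiter with a highest-priority pointer, as used by both InRr and OutRr, can be sketched as follows (hypothetical helper, simplified to a single request vector):

```python
def rr_select(requests, ptr):
    """Grant the first requesting index at or after ptr (wrapping).
    Returns (choice, new_ptr), with the pointer advanced one past the
    grant, or (None, ptr) if nothing is requesting."""
    n = len(requests)
    for k in range(n):
        i = (ptr + k) % n
        if requests[i]:
            return i, (i + 1) % n
    return None, ptr
```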
9
Input Scheduling (InRr.)
[Diagram: each input's round-robin pointer selecting among its VOQs 1-4]
10
Output Scheduling (OutRr.)
[Diagram: outputs 1-4 selecting among internal buffers; each output scans buffers from inputs in the order 4, 1, 3, 2]
11
Output Pointer Update and Delivery Notification
[Diagram: outputs update their pointers and notify the inputs of freed internal buffers]
12
Performance study
Delay/throughput under Bernoulli uniform and
bursty uniform arrivals; stability performance
13
Bernoulli Uniform Arrivals
14
Bursty Uniform Arrivals
15
Scheduling Process
  • Because the arbitration is simple:
  • We can afford algorithms based on weights,
    for example LQF and OCF.
  • We can afford algorithms that provide QoS.

16
Buffered Crossbar Solution Scheduler
  • The algorithm MVF-RR is composed of two parts:
  • Input scheduler: MVF (most vacancies first)
  • Each input selects the column of internal
    buffers (destined to the same output) with the
    most vacancies (non-full buffers).
  • Output scheduler: round-robin
  • Each output chooses the internal buffer that
    appears next on its static round-robin schedule,
    starting from the highest-priority one, and updates
    the pointer to one location beyond the chosen one.

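The MVF input-scheduler rule can be sketched as a small selection function (illustrative; breaking ties by lowest column index is an assumption):

```python
def mvf_select(vacancies, eligible):
    """Most Vacancies First input rule (sketch).
    vacancies[j]: number of non-full internal buffers in column j
    (i.e., destined to output j); eligible[j]: this input holds a
    cell for output j. Returns the chosen output, or None."""
    best = None
    for j, ok in enumerate(eligible):
        if ok and (best is None or vacancies[j] > vacancies[best]):
            best = j   # strictly greater keeps the lowest index on ties
    return best
```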
17
Buffered Crossbar Solution Scheduler
  • The algorithm ECF-RR is composed of two parts:
  • Input scheduler: ECF (empty column first)
  • Each input selects the first empty column of
    internal buffers (destined to the same output).
    If there is no empty column, it selects on a
    round-robin basis.
  • Output scheduler: round-robin
  • Each output chooses the internal buffer that
    appears next on its static round-robin schedule,
    starting from the highest-priority one, and updates
    the pointer to one location beyond the chosen one.

18
Buffered Crossbar Solution Scheduler
  • The algorithm RR-REMOVE is composed of two parts:
  • Input scheduler: round-robin (with
    remove-request signal sending)
  • Each input chooses the non-empty VOQ that appears
    next on its static round-robin schedule, starting
    from the highest-priority one, and updates the
    pointer to one location beyond the chosen one. It
    then sends out at most one remove-request signal
    to the outputs.
  • Output scheduler: REMOVE
  • For each output, if it receives any
    remove-request signals, it chooses one of them
    based on its highest-priority pointer and removes
    the cell. If no signal is received, it does
    simple round-robin arbitration.

19
Buffered Crossbar Solution Scheduler
  • The algorithm ECF-REMOVE is composed of two
    parts:
  • Input scheduler: ECF (with remove-request signal
    sending)
  • Each input selects the first empty column of
    internal buffers (destined to the same output).
    If there is no empty column, it selects on a
    round-robin basis. It then sends out at most one
    remove-request signal to the outputs.
  • Output scheduler: REMOVE
  • For each output, if it receives any
    remove-request signals, it chooses one of them
    based on its highest-priority pointer and removes
    the cell. If no signal is received, it does
    simple round-robin arbitration.

20
Hardware Implementation of ECF-RR: An Input
Scheduling Block
21
Performance Evaluation: Simulation Study
Uniform Traffic
22
Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR (uniform traffic):

Load                           0.5   0.6   0.7   0.8   0.9   0.95  0.99
Improvement percentage (%)      1     1     3     6    13    17    12
Normalized improvement (%)      1     1     3     6    12    15    11
Improvement factor             1.01  1.01  1.03  1.06  1.13  1.17  1.12
23
Performance Evaluation: Simulation Study
Bursty Traffic
24
Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR (bursty traffic):

Load                           0.5   0.6   0.7   0.8   0.9   0.95  0.99
Improvement percentage (%)     10    13    16    20    22    18    11
Normalized improvement (%)      9    12    14    16    18    16    10
Improvement factor             1.10  1.13  1.16  1.20  1.22  1.18  1.11
25
Performance Evaluation: Simulation Study
Hotspot Traffic
26
Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR (hotspot traffic):

Load                           0.31   0.36   0.41   0.45   0.49   0.51
Improvement percentage (%)     0.2    0.3    0.5    0.8    1      0.7
Normalized improvement (%)     0.2    0.3    0.5    0.8    1      0.7
Improvement factor             1.002  1.003  1.005  1.008  1.01   1.007
27
Quality of Service Mechanisms for
Switches/Routers and the Internet
28
Recap
  • High-Performance Switch Design
  • We need scalable switch fabrics: crossbar,
    bit-sliced crossbar, Clos networks.
  • We need to solve the memory bandwidth problem.
  • Our conclusion is to go for input-queued
    switches.
  • We need to use VOQs instead of FIFO queues.
  • For these switches to function at high speed, we
    need efficient and practically implementable
    scheduling/arbitration algorithms.

29
Algorithms for VOQ Switching
  • We analyzed several algorithms for matching
    inputs and outputs.
  • Maximum size matching: these are based on
    bipartite maximum matching, which can be solved
    using max-flow techniques in O(N^2.5).
  • These are not practical for high-speed
    implementations.
  • They are stable (100% throughput) for uniform
    traffic.
  • They are not stable for non-uniform traffic.
  • Maximal size matching: these try to approximate
    maximum size matching.
  • PIM, iSLIP, SRR, etc.
  • These are practical: they can be executed in
    parallel in O(log N) or even O(1).
  • They are stable for uniform traffic and unstable
    for non-uniform traffic.
30
Algorithms for VOQ Switching
  • Maximum weight matching: these are maximum
    matchings based on weights such as queue length
    (LQF, LPF) or age of cell (OCF), with a
    complexity of O(N^3 log N).
  • These are not practical for high-speed
    implementations; much more difficult to implement
    than maximum size matching.
  • They are stable (100% throughput) under any
    admissible traffic.
  • Maximal weight matching: these try to approximate
    maximum weight matching. They use an RGA mechanism
    like iSLIP.
  • iLQF, iLPF, iOCF, etc.
  • These are somewhat practical: they can be executed
    in parallel in O(log N) or even O(1) like iSLIP,
    BUT the arbiters are much more complex to build.
  • They have recently been shown to be stable under
    any admissible traffic.

31
Algorithms for VOQ Switching
  • Randomized algorithms
  • They try in a smart way to approximate maximum
    weight matching while avoiding an iterative
    process.
  • They are stable under any admissible traffic.
  • Their time complexity is small (depending on the
    algorithm).
  • Their hardware complexity is as yet untested.
  • No schedulers deal with mis-sequencing of
    packets.
  • Distributed schedulers: buffered crossbars.
  • Two important points to remember:
  • The time complexity of an algorithm is not a
    true indication of its hardware implementation.
  • 100% throughput does not mean low delay.
  • Weak vs. strong stability.

32
VOQ Algorithms and Delay
  • But delay is key
  • Because users don't care about throughput alone
  • They care (more) about delays
  • Delay is QoS (for the network operator)
  • Why is delay difficult to approach theoretically?
  • Mainly because it is a statistical quantity
  • It depends on the traffic statistics at the
    inputs
  • It depends on the particular scheduling algorithm
    used
  • The last point makes it difficult to analyze
    delays in IQ switches
  • For example, in VOQ switches it is almost
    impossible to give any guarantees on delay.
  • All you can hope for is a high throughput
    and a bounded queue length, hence a bounded average
    delay (but even the bound on the queue length is
    beyond the control of the algorithm; we cannot
    say that the length of the queue should not be
    more than 10).

33
VOQ Algorithms and Delay
  • This does not mean that we cannot have an
    algorithm that can do that. It means that none
    exists at this moment.
  • For this exact reason, almost all quality of
    service schemes (whether for delay or bandwidth
    guarantees) assume that you have an output-queued
    switch.

34
VOQ Algorithms and Delay
  • Why? Because an OQ switch has no fabric
    scheduling/arbitration algorithm.
  • Delay simply depends on traffic statistics.
  • Researchers have shown that you can provide many
    QoS algorithms (like WFQ) using a single
    server, based on the traffic statistics.
  • But OQ switches are extremely expensive to build:
  • the memory bandwidth requirement is very high.
  • These QoS scheduling algorithms have little
    practical significance for scalable and
    high-performance switches/routers.

35
Output Queueing: The Ideal
36
How to get good delay cheaply?
  • Enter speedup
  • The fabric speedup for an IQ switch equals 1
    (memory bandwidth = 2).
  • The fabric speedup for an OQ switch equals N
    (memory bandwidth = N + 1).
  • Suppose we consider switches with a fabric speedup
    of S, 1 < S << N.
  • Such a switch will require buffers both at the
    input and the output.
  • We call these combined input- and output-queued
    (CIOQ) switches.
  • Such switches could help if,
  • with very small values of S,
  • we get the performance (both delay and
    throughput) of an OQ switch.

37
A CIOQ switch
  • Consists of:
  • An (internally non-blocking, e.g., crossbar)
    fabric with speedup S > 1
  • Input and output buffers
  • A scheduler to determine matchings

38
A CIOQ switch
  • For concreteness, suppose S = 2. The operation of
    the switch consists of:
  • Transferring no more than 2 cells from (to) each
    input (output).
  • Logically, we will think of each time slot as
    consisting of two phases.
  • Arrivals to (departures from) the switch occur at
    most once per time slot.
  • The transfer of cells from inputs to outputs can
    occur in each phase.

39
Using Speedup
40
Performance of CIOQ switches
  • Now that we have a higher speedup, do we get a
    handle on delay?
  • Can we say something about delay (e.g., every
    packet from a given flow should be delayed below
    15 msec)?
  • There is one way of doing this: competitive
    analysis
  • The idea is to compete with the performance of
    an OQ switch.

41
Intuition
Speedup 1: fabric throughput 0.58, average input queue too large
Speedup 2: fabric throughput 1.16, average input queue 6.25
42
Intuition (continued)
Speedup 3: fabric throughput 1.74, average input queue 1.35
Speedup 4: fabric throughput 2.32, average input queue 0.75
43
Performance of CIOQ switches
  • The setup:
  • Under arbitrary, but identical, inputs
    (packet-by-packet):
  • Is it possible to replace an OQ switch by a CIOQ
    switch and schedule the CIOQ switch so that the
    outputs are identical packet-by-packet? That is,
    to exactly mimic an OQ switch.
  • If yes, what is the scheduling algorithm?

44
What is exact mimicking?
Apply same inputs to an OQ and a CIOQ switch
  • packet by packet

Obtain same outputs
  • packet by packet

45
What is exact mimicking?
Why is a speedup of N not necessary?
It is useless to bring all packets to the output
if they must wait there.
Packets only need to reach the output before they
can leave.
46
Consequences
  • Suppose, for now, that a CIOQ switch is competitive
    with respect to an OQ switch. Then:
  • We get perfect emulation of an OQ switch.
  • This means we inherit all its throughput and
    delay properties.
  • Most importantly, all QoS scheduling algorithms
    originally designed for OQ switches can be
    directly used on a CIOQ switch.
  • But at the cost of introducing a scheduling
    algorithm, which is the key.

47
Emulating OQ Switches with CIOQ
  • Consider an N x N switch with (integer) speedup
    S > 1.
  • We're going to see if this switch can emulate an
    OQ switch.
  • We'll apply the same inputs, cell-by-cell, to
    both switches.
  • We'll assume that the OQ switch sends out packets
    in FIFO order.
  • And we'll see if the CIOQ switch can match cells
    on the output side.

48
Key concept Urgency
OQ switch
  • Urgency of a cell at any time = its departure
    time minus the current time.
  • It basically indicates the time at which this
    packet will depart the OQ switch.
  • This value is decremented after each time slot.
  • When the value reaches 0, the cell must depart (it
    is at the HoL of the output queue).

49
Key concept Urgency
  • Algorithm: Most Urgent Cell First (MUCF). In each
    phase:
  • Outputs try to get their most urgent cells from
    inputs.
  • Inputs grant to the output whose cell is most
    urgent.
  • In case of ties, the lower-numbered output takes
    priority.
  • Loser outputs try to obtain their next most
    urgent cell from another (unmatched) input.
  • When no more matchings are possible, cells are
    transferred.

50
Key concept Urgency - Example
  • At the beginning of phase 1, both outputs 1 and 2
    request input 1 to obtain their most urgent cells.
  • Since there is a tie, input 1 grants to
    output 1 (the lowest-numbered port).
  • Output 2 proceeds to get its next most urgent
    cell (from input 2, with urgency 3).

51
Key concept Urgency
  • Observation: a cell is not forwarded from input
    to output for one of two (and only two) reasons:
  • Input contention: its input sends a more urgent
    cell.
  • (Output 2 cannot receive its most urgent cell in
    phase 1 because input 1 wants to send output 1
    a more urgent cell.)
  • Output contention: its output receives a more
    urgent cell. (Input 2 cannot send its most urgent
    cell because output 3 wants to receive from input
    3.)

52
Implementing MUCF
  • The way in which MUCF matches inputs to outputs
    is similar to the stable marriage problem (SMP).
  • The SMP finds stable matchings in bipartite
    graphs.
  • There are N women and N men.
  • Each woman (man) ranks each man (woman) in order
    of preference for marriage.

53
The Gale-Shapley Algorithm (GSA)
  • What is a stable matching?
  • A matching is a pairing (i, p(i)) of each i with a
    partner p(i).
  • An unstable matching is one in which
  • there are matched pairs (i, p(i)) and (j, p(j))
    such that
  • i prefers p(j) to p(i), and p(j) prefers i to j.
  • The GSA algorithm is guaranteed to give a stable
    matching.
  • Its complexity is O(N^2).

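A minimal sketch of the Gale-Shapley proposal algorithm (man-proposing variant; the function name and list encoding are assumptions):

```python
def gale_shapley(men_pref, women_pref):
    """men_pref[m]: list of women in m's preference order;
    women_pref[w]: list of men in w's preference order.
    Returns wife[m], a stable matching."""
    n = len(men_pref)
    # rank[w][m]: position of man m in woman w's preference list
    rank = [[0] * n for _ in range(n)]
    for w in range(n):
        for r, m in enumerate(women_pref[w]):
            rank[w][m] = r
    next_prop = [0] * n        # next woman each man will propose to
    husband = [None] * n
    free = list(range(n))
    while free:
        m = free.pop()
        w = men_pref[m][next_prop[m]]
        next_prop[m] += 1
        if husband[w] is None:
            husband[w] = m
        elif rank[w][m] < rank[w][husband[w]]:
            free.append(husband[w])   # w trades up; old partner is free
            husband[w] = m
        else:
            free.append(m)            # w rejects m
    wife = [None] * n
    for w, m in enumerate(husband):
        wife[m] = w
    return wife
```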
54
An example
  • Consider the example we have already seen.
  • Executing GSA:
  • With men proposing, we get the matching
  • (1, 1), (2, 4), (3, 2), (4, 3); this takes 7
    proposals (iterations).
  • With women proposing, we get the matching
  • (1, 1), (2, 3), (3, 2), (4, 4); this takes 7
    proposals (iterations).
  • Both matchings are stable.
  • The first is man-optimal: men get the best
    partners of any stable matching.
  • Likewise, the second is woman-optimal.

55
Implementing MUCF by the GSA
  • MUCF can be implemented using the GSA algorithm
    with preference lists as follows:
  • Output j assigns a preference value to each input
    i, equal to the urgency of the cell at the HoL of
    VOQij.
  • If VOQij is empty, then the preference value of
    input i for output j is set to infinity.
  • The preference list of the output is the ordered
    set of its preference values for each input.
  • Each input assigns a preference value to each
    output based on the urgency of the cells, and
    creates its preference list accordingly.

56
Theorem
  • A CIOQ switch with a speedup of 4 operating under
    the MUCF algorithm exactly matches cells with a
    FIFO output-queued switch.
  • This is true even for non-FIFO OQ scheduling
    schemes (e.g., WFQ, strict priority, etc.).
  • We can achieve similar results with S = 2.

57
Implementation - a closer look
Main sources of difficulty:
  • Estimating urgency

(and communicating this info among inputs and outputs)
  • Matching process: too many iterations?

Estimating urgency depends on what is being
emulated:
  • FIFO, strict priorities: no problem
  • WFQ, etc.: problems

58
Other Work
Relax the stringent requirement of exact emulation
  • Least Occupied Output First Algorithm (LOOFA)

Keeps outputs busy whenever packets are present
  • Can provide QoS
  • A lot of work has been done in this direction

Conclusion: we must have a speedup if we want to
approach the performance of OQ switches, or to
provide QoS.
59
QoS Scheduling Algorithms
60
QoS Differentiation Two options
  • Stateful (per flow)
  • IETF Integrated Services (Intserv)/RSVP
  • Stateless (per class)
  • IETF Differentiated Services (Diffserv)

61
The Building Blocks (may contain more functions)
  • Classifier
  • Shaper
  • Policer
  • Scheduler
  • Dropper

62
QoS Mechanisms
  • Admission Control
  • Determines whether the flow can/should be allowed
    to enter the network.
  • Packet Classification
  • Classifies the data based on admission control
    for the desired treatment through the network.
  • Traffic Policing
  • Measures the traffic to determine if it is out of
    profile. Packets that are determined to be
    out-of-profile can be dropped or marked
    differently (so they may be dropped later if
    needed).
  • Traffic Shaping
  • Provides some buffering, thereby delaying some
    of the data, to make sure the traffic fits into
    the profile (may affect only bursts, or all
    traffic, to make it similar to constant bit rate).
  • Queue Management
  • Determines the behavior of data within a queue.
    Parameters include queue depth and drop policy.
  • Queue Scheduling
  • Determines how different queues empty onto the
    outbound link.

63
QoS Router
[Block diagram: a classifier feeds per-flow queues, each preceded by a policer and a shaper and governed by queue management; schedulers drain the per-flow queues onto the output links]
64
Queue Scheduling Algorithms
65
Scheduling at the output link of an OQ Switch
  • Sharing always results in contention.
  • A scheduling discipline resolves contention:
  • it decides when and what packet to send on the
    output link.
  • Usually implemented at the output interface.
  • Scheduling is key to fairly sharing resources
    and providing performance guarantees.

66
Output Scheduling
Allocating output bandwidth and controlling packet delay
[Diagram: queued packets drained by a scheduler onto the output link]
67
Types of Queue Scheduling
  • Strict Priority
  • Empties the highest-priority non-empty queue
    first, before servicing lower-priority queues. It
    can cause starvation of lower-priority queues.
  • Round Robin
  • Services each queue by emptying a certain amount
    of data and then going to the next queue in
    order.
  • Weighted Fair Queuing (WFQ)
  • Empties an amount of data from a queue based on
    the relative weight of the queue (driven by
    reserved bandwidth) before servicing the next
    queue.
  • Earliest Deadline First
  • Determines the latest time a packet must leave to
    meet its delay requirements and services the
    queues in that order.

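The Earliest Deadline First discipline from the list above can be sketched with a priority queue (illustrative helper, not from the slides):

```python
import heapq

def edf_schedule(packets):
    """packets: list of (deadline, name) pairs.
    Returns the service order: earliest deadline first."""
    heap = list(packets)
    heapq.heapify(heap)                 # min-heap keyed on deadline
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```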
68
Scheduling Deterministic Priority
  • A packet is served from a given priority level
    only if no packets exist at higher levels
    (multilevel priority with exhaustive service).
  • The highest level gets the lowest delay.
  • Watch out for starvation!
  • Usually priority levels are mapped to delay
    classes:
  • Low-bandwidth urgent messages
  • Realtime
  • Non-realtime
69
Scheduling Work conserving vs.
non-work-conserving
  • Work conserving discipline is never idle when
    packets await service
  • Why bother with non-work conserving? (sometimes
    useful for example to minimize delay jitter)

70
Scheduling Requirements
  • An ideal scheduling discipline:
  • is easy to implement (preferably in hardware);
  • is fair (each connection gets no more than what
    it wants;
  • the excess, if any, is equally shared);
  • provides performance bounds (deterministic or
    statistical); common parameters are
  • bandwidth,
  • delay,
  • delay-jitter,
  • loss;
  • allows easy admission control decisions (the
    choice of scheduling discipline affects the ease
    of the admission control algorithm)
  • to decide whether a new flow can be allowed.

71
Scheduling No Classification
  • This is the simplest possible, but we cannot
    provide any guarantees.
  • With FIFO queues, if the depth of the queue is
    not bounded, there is very little that can be
    done.
  • We can perform preferential dropping.
  • We can use other service disciplines on a single
    queue (e.g., EDF).

72
Scheduling Class Based Queuing
  • At each output port, packets of the same class
    are queued in distinct queues.
  • Service disciplines within each queue can vary
    (e.g., FIFO, EDF, etc.). Usually it is FIFO.
  • Service disciplines between classes can vary as
    well (e.g., strict priority, some kind of
    sharing, etc.).

73
Per Flow Packet Scheduling
  • Each flow is allocated a separate virtual
    queue.
  • Lowest level of aggregation.
  • Service disciplines between the flows vary (FIFO,
    SP, etc.).

[Diagram: a classifier places flows 1..n into per-flow queues with buffer management; a scheduler serves the queues onto the output]
74
The problems caused by FIFO queues in routers
  1. In order to maximize its chances of success, a
    source has an incentive to maximize the rate at
    which it transmits.
  2. (Related to 1) When many flows pass through it,
    a FIFO queue is unfair: it favors the greediest
    flow.
  3. It is hard to control the delay of packets
    through a network of FIFO queues.

These problems motivate fairness and delay
guarantees.
75
Round Robin (RR)
  • RR avoids starvation.
  • All sessions have the same weight and the same
    packet length.

[Diagram: sessions A, B, C served in round-robin order]
76
RR with variable packet length
[Diagram: sessions A, B, C with variable packet lengths; the session with longer packets gets more bandwidth]
But the weights are equal!
77
Solution
[Diagram: sessions A, B, C served so that each gets an equal share in bytes]
78
Weighted Round Robin (WRR)
W_A = 3, W_B = 1, W_C = 4
79
WRR: Non-Integer Weights
W_A = 1.4, W_B = 0.2, W_C = 0.8
80
Weighted round robin
  • Serve a packet from each non-empty queue in turn.
  • Can provide protection against starvation.
  • It is easy to implement in hardware.
  • Unfair if packets are of different lengths or
    weights are not equal.
  • What is the solution?
  • Different weights, fixed packet size:
  • serve more than one packet per visit, after
    normalizing to obtain integer weights.

81
Problems with Weighted Round Robin
  • Different weights, variable-size packets:
  • normalize weights by mean packet size.
  • E.g., weights 0.5, 0.75, 1.0; mean packet sizes
    50, 500, 1500.
  • Normalized weights: 0.5/50, 0.75/500, 1.0/1500 =
    0.01, 0.0015, 0.000666; normalized again to
    integers: 60, 9, 4.
  • With variable-size packets, we need to know the
    mean packet size in advance.
  • Fairness is only provided at time scales larger
    than the schedule.

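The normalization in the example above can be reproduced with exact rationals (hypothetical helper; `math.lcm` with multiple arguments needs Python 3.9+):

```python
from fractions import Fraction
from math import lcm

def wrr_packets_per_round(weights, mean_sizes):
    """Turn (weight, mean packet size) pairs into integer
    packets-per-round counts by scaling the per-packet service
    shares weight/size to a common integer grid."""
    per_packet = [Fraction(str(w)) / s for w, s in zip(weights, mean_sizes)]
    scale = lcm(*(f.denominator for f in per_packet))
    return [int(f * scale) for f in per_packet]
```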
82
Fairness
[Diagram: flows A and B, each e.g. an http flow with a given (IP SA, IP DA, TCP SP, TCP DP), enter router R1 over 10 Mb/s and 100 Mb/s access links and share a 1.1 Mb/s output link toward C, each shown receiving 0.55 Mb/s]
What is the fair allocation: (0.55 Mb/s, 0.55 Mb/s)
or (0.1 Mb/s, 1 Mb/s)?
83
Fairness
[Diagram: flows A, B, C enter router R1 over 10 Mb/s and 100 Mb/s access links toward D over a 1.1 Mb/s output link; flow C is limited to 0.2 Mb/s]
What is the fair allocation?
84
Max-Min Fairness
  • The minimum over the flows should be as large as
    possible.
  • Max-min fairness for a single resource:
  • Bottlenecked (unsatisfied) connections share the
    residual bandwidth equally.
  • Their share is >= the share held by the
    connections not constrained by this bottleneck.
85
Max-Min FairnessA common way to allocate flows
  • N flows share a link of rate C. Flow f wishes to
    send at rate W(f), and is allocated rate R(f).
  1. Pick the flow, f, with the smallest requested
    rate.
  2. If W(f) < C/N, then set R(f) = W(f).
  3. If W(f) > C/N, then set R(f) = C/N.
  4. Set N = N - 1 and C = C - R(f).
  5. If N > 0, go to step 1.

86
Max-Min Fairness: An Example
[Diagram: four flows with demands W(f1) = 0.1, W(f2) = 0.5, W(f3) = 10, W(f4) = 5 share a link of capacity C = 1 at router R1]
  • Round 1: Set R(f1) = 0.1
  • Round 2: Set R(f2) = 0.9/3 = 0.3
  • Round 3: Set R(f4) = 0.6/2 = 0.3
  • Round 4: Set R(f3) = 0.3/1 = 0.3

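The rounds above can be reproduced by a progressive-filling sketch of the allocation steps (illustrative helper, not from the slides):

```python
def max_min_allocate(demands, capacity):
    """Max-min fair allocation: repeatedly give the flow with the
    smallest remaining demand min(its demand, an equal share of the
    capacity still unallocated)."""
    alloc = {}
    remaining = dict(enumerate(demands))
    c = capacity
    while remaining:
        f = min(remaining, key=remaining.get)  # smallest requested rate
        share = c / len(remaining)             # equal share of what's left
        alloc[f] = min(remaining.pop(f), share)
        c -= alloc[f]
    return [alloc[i] for i in range(len(demands))]
```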
87
Max-Min Fairness
  • How can an Internet router allocate different
    rates to different flows?
  • First, let's see how a router can allocate the
    same rate to different flows.

88
Fair Queueing
  1. Packets belonging to a flow are placed in a FIFO.
    This is called per-flow queueing.
  2. FIFOs are scheduled one bit at a time, in a
    round-robin fashion.
  3. This is called Bit-by-Bit Fair Queueing.

[Diagram: arriving packets are classified into per-flow FIFOs (flows 1..N) and scheduled bit-by-bit round robin]
89
Weighted Bit-by-Bit Fair Queueing
  • Likewise, flows can be allocated different rates
    by servicing a different number of bits for each
    flow during each round.

Also called Generalized Processor Sharing (GPS)
90
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, equal weights
91
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, equal weights
(Round 3)
92
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, weights 3:2:2:1
93
Understanding bit-by-bit WFQ: 4 queues sharing
4 bits/sec of bandwidth, weights 3:2:2:1
(Rounds 1 and 2; compare with weights 1:1:1:1)
94
Packetized Weighted Fair Queueing (WFQ)
  • Problem: we need to serve a whole packet at a
    time.
  • Solution:
  • Determine the time at which a packet, p, would
    complete if we served flows bit-by-bit. Call this
    the packet's finishing time, Fp.
  • Serve packets in order of increasing
    finishing time.
Also called Packetized Generalized Processor
Sharing (PGPS)
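The finishing-time rule can be sketched as follows (a simplified illustration: all packets are assumed present at time 0, so the virtual-clock bookkeeping a real WFQ needs for later arrivals is omitted):

```python
def wfq_order(packets, weights):
    """packets: list of (flow, length) pairs, all queued at time 0.
    Each packet's bit-by-bit finishing time within its flow is
    F = F_prev(flow) + length / weight; serve in increasing F,
    breaking ties by arrival order."""
    finish = {}
    order = []
    for seq, (flow, length) in enumerate(packets):
        f = finish.get(flow, 0) + length / weights[flow]
        finish[flow] = f
        order.append((f, seq))
    return [packets[seq] for _, seq in sorted(order)]
```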
95
WFQ is complex
  • There may be hundreds to millions of flows; the
    linecard needs to manage a FIFO queue per
    flow.
  • The finishing time must be calculated for each
    arriving packet.
  • Packets must be sorted by their departure time.
  • Most of the effort in QoS scheduling algorithms
    goes into practical algorithms that can
    approximate WFQ!

[Diagram: packets arriving at the egress linecard are placed in per-flow queues 1..N; Fp is calculated for each arriving packet, and the packet with the smallest Fp departs]
96
When can we Guarantee Delays?
  • Theorem: if flows are leaky-bucket constrained
    and all nodes employ GPS (WFQ), then the network
    can guarantee worst-case delay bounds to
    sessions.
97
Deterministic Analysis of a Router Queue: FIFO Case
[Diagram: model of a router queue with cumulative arrivals A(t) and departures D(t) at service rate R; the backlog B(t) is the vertical gap and the FIFO delay d(t) the horizontal gap between the two curves]
98
[Diagram: flows 1..N with arrivals A1(t)..AN(t) are classified into per-flow queues served by a WFQ scheduler at rates R(f1)..R(fN), producing departures D1(t)..DN(t)]
Key idea: in general, we don't know the arrival
process, so let's constrain it.
[Plot: cumulative bytes of A1(t) and D1(t), where D1(t) has slope R(f1)]
99
Let's say we can bound the arrival process
[Plot: cumulative bytes; A1(t) lies below a line of slope r and intercept s]
The number of bytes that can arrive in any period
of length t is bounded by s + rt. This is called
(s, r) regulation.
100
The leaky bucket (s,r) regulator
[Diagram: tokens arrive at rate r into a token bucket of size s; packets wait in a packet buffer and consume one token per byte (or packet) before departing]
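The regulator can be sketched as a per-tick token-bucket loop (illustrative; unit-size packets and integer ticks are assumptions):

```python
def leaky_bucket(arrivals, sigma, rho):
    """(sigma, rho) regulator sketch: tokens accrue at rho per tick,
    capped at bucket size sigma; each departing packet consumes one
    token. arrivals[t]: packets arriving at tick t. Returns the
    number of packets released at each tick."""
    tokens, queue, out = 0.0, 0, []
    for a in arrivals:
        tokens = min(sigma, tokens + rho)   # refill, capped at sigma
        queue += a                          # buffer new arrivals
        sent = min(queue, int(tokens))      # release what tokens allow
        tokens -= sent
        queue -= sent
        out.append(sent)
    return out
```

Note how the bucket allows a burst of up to sigma packets at once, while the long-run output rate never exceeds rho.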
101
(s, r) Constrained Arrivals and Minimum Service
Rate
[Plot: cumulative bytes of A1(t), bounded above by s + rt, and D1(t) served at rate at least R(f1)]
Theorem [Parekh, Gallager '93]: if flows are
leaky-bucket constrained and routers use WFQ,
then end-to-end delay guarantees are possible.