Title: Can we make these scheduling algorithms simpler? Using a Simpler Architecture

1. Can we make these scheduling algorithms simpler? Using a Simpler Architecture

2. Buffered Crossbar Switches
- A buffered crossbar switch is a switch with a buffered fabric (memory inside the crossbar).
- A pure buffered crossbar switch architecture has buffering only inside the fabric and none anywhere else.
- Due to the HoL blocking problem, VOQs are used on the input side.
3. Buffered Crossbar Architecture
(Figure: input cards 1..N connected through a crossbar with internal buffers to output cards 1..N.)
4. Scheduling Process
- Scheduling is divided into three steps:
- Input scheduling: each input selects, in a certain way, one cell from the HoL of an eligible queue and sends it to the corresponding internal buffer.
- Output scheduling: each output selects, in a certain way, one cell from all the internally buffered cells in the crossbar to be delivered to the output port.
- Delivery notifying: for each delivered cell, inform the corresponding input of the internal buffer status.
5. Advantages
- Total independence between input and output arbiters (distributed design): 1/N complexity compared to centralized schedulers.
- Switch performance is much better (because there is much less output contention): a combination of IQ and OQ switches.
- Disadvantage: the crossbar is more complicated.
6. I/O Contention Resolution
(Figure: inputs 1-4 contending for outputs 1-4.)

7. I/O Contention Resolution
(Figure: inputs 1-4 and outputs 1-4, continued.)
8. The Round-Robin Algorithm
- InRr-OutRr
- Input scheduling, InRr (round-robin):
  - Each input selects the next eligible VOQ, based on its highest-priority pointer, and sends its HoL packet to the internal buffer.
- Output scheduling, OutRr (round-robin):
  - Each output selects the next non-empty internal buffer, based on its highest-priority pointer, and sends its cell to the output link.
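Both InRr and OutRr reduce to the same primitive: scan from a pointer and take the first eligible candidate. A minimal Python sketch (the function name and the boolean eligibility encoding are assumptions for illustration, not the authors' implementation):

```python
from typing import List, Optional

def in_rr_select(pointer: int, eligible: List[bool]) -> Optional[int]:
    """Round-robin arbitration: return the first eligible index at or
    after the highest-priority pointer, wrapping around; None if none.

    For InRr, eligible[j] marks a non-empty VOQ whose internal buffer
    has room; for OutRr, it marks a non-empty internal buffer."""
    n = len(eligible)
    for offset in range(n):
        j = (pointer + offset) % n
        if eligible[j]:
            return j
    return None
```

After a grant, the pointer would typically be advanced to one position beyond the selected index, as the later slides describe for the RR output schedulers.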
9. Input Scheduling (InRr)
(Figure: inputs 1-4 each selecting a VOQ.)

10. Output Scheduling (OutRr)
(Figure: outputs 1-4 each selecting among internal buffers in order 4, 1, 3, 2.)

11. Output Pointers Update / Notification Delivery
(Figure: output pointers updated and delivery notifications returned to the inputs.)
12. Performance Study
Delay/throughput under Bernoulli uniform and bursty uniform arrivals; stability performance.

13. Bernoulli Uniform Arrivals

14. Bursty Uniform Arrivals
15. Scheduling Process
- Because the arbitration is simple:
- We can afford algorithms based on weights (e.g., LQF, OCF).
- We can afford algorithms that provide QoS.
16. Buffered Crossbar Solution: Scheduler
- The algorithm MVF-RR is composed of two parts:
- Input scheduler, MVF (most vacancies first):
  - Each input selects the column of internal buffers (destined to the same output) with the most vacancies (non-full buffers).
- Output scheduler, round-robin:
  - Each output chooses the internal buffer that appears next on its static round-robin schedule, starting from the highest-priority one, and updates the pointer to one location beyond the chosen one.
17. Buffered Crossbar Solution: Scheduler
- The algorithm ECF-RR is composed of two parts:
- Input scheduler, ECF (empty column first):
  - Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis.
- Output scheduler, round-robin:
  - Each output chooses the internal buffer that appears next on its static round-robin schedule, starting from the highest-priority one, and updates the pointer to one location beyond the chosen one.
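The ECF input-scheduling rule above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name, the per-output occupancy counts, and the eligibility encoding are assumptions, not the paper's hardware design:

```python
from typing import List, Optional

def ecf_select(occupancy: List[int], eligible: List[bool],
               rr_pointer: int) -> Optional[int]:
    """ECF (empty column first) input scheduling.

    occupancy[j] = number of occupied internal buffers in column j
    (the buffers destined to output j); eligible[j] = True if this
    input has a HoL cell for output j that it can forward."""
    n = len(occupancy)
    # Prefer an eligible output whose whole column is empty.
    for j in range(n):
        if eligible[j] and occupancy[j] == 0:
            return j
    # Otherwise fall back to round-robin from the pointer.
    for offset in range(n):
        j = (rr_pointer + offset) % n
        if eligible[j]:
            return j
    return None
```

Favoring empty columns steers cells toward outputs that would otherwise idle, which is the intuition behind ECF's improvement over plain RR-RR in the simulation tables that follow.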
18. Buffered Crossbar Solution: Scheduler
- The algorithm RR-REMOVE is composed of two parts:
- Input scheduler, round-robin (with remove-request signal sending):
  - Each input chooses the non-empty VOQ that appears next on its static round-robin schedule, starting from the highest-priority one, and updates the pointer to one location beyond the chosen one. It then sends out at most one remove-request signal to the outputs.
- Output scheduler, REMOVE:
  - For each output: if it receives any remove-request signals, it chooses one of them based on its highest-priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.
19. Buffered Crossbar Solution: Scheduler
- The algorithm ECF-REMOVE is composed of two parts:
- Input scheduler, ECF (with remove-request signal sending):
  - Each input selects the first empty column of internal buffers (destined to the same output). If there is no empty column, it selects on a round-robin basis. It then sends out at most one remove-request signal to the outputs.
- Output scheduler, REMOVE:
  - For each output: if it receives any remove-request signals, it chooses one of them based on its highest-priority pointer and removes the cell. If no signal is received, it does simple round-robin arbitration.
20. Hardware Implementation of ECF-RR: An Input Scheduling Block

21. Performance Evaluation: Simulation Study
Uniform Traffic
22. Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR, uniform traffic:

Load                               0.5    0.6    0.7    0.8    0.9    0.95   0.99
Improvement Percentage             1      1      3      6      13     17     12
Normalized Improvement Percentage  1      1      3      6      12     15     11
Improvement Factor                 1.01   1.01   1.03   1.06   1.13   1.17   1.12
23. Performance Evaluation: Simulation Study
Bursty Traffic

24. Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR, bursty traffic:

Load                               0.5    0.6    0.7    0.8    0.9    0.95   0.99
Improvement Percentage             10     13     16     20     22     18     11
Normalized Improvement Percentage  9      12     14     16     18     16     10
Improvement Factor                 1.10   1.13   1.16   1.20   1.22   1.18   1.11
25. Performance Evaluation: Simulation Study
Hotspot Traffic

26. Performance Evaluation: Simulation Study
ECF-REMOVE over RR-RR, hotspot traffic:

Load                               0.31   0.36   0.41   0.45   0.49   0.51
Improvement Percentage             0.2    0.3    0.5    0.8    1      0.7
Normalized Improvement Percentage  0.2    0.3    0.5    0.8    1      0.7
Improvement Factor                 1.002  1.003  1.005  1.008  1.01   1.007
27. Quality of Service Mechanisms for Switches/Routers and the Internet
28. Recap
- High-performance switch design:
- We need scalable switch fabrics: crossbar, bit-sliced crossbar, Clos networks.
- We need to solve the memory bandwidth problem:
  - Our conclusion is to go for input-queued switches.
  - We need to use VOQs instead of FIFO queues.
- For these switches to function at high speed, we need efficient and practically implementable scheduling/arbitration algorithms.
29. Algorithms for VOQ Switching
- We analyzed several algorithms for matching inputs and outputs.
- Maximum size matching: based on bipartite maximum matching, which can be solved using max-flow techniques in O(N^2.5).
  - These are not practical for high-speed implementations.
  - They are stable (100% throughput) for uniform traffic.
  - They are not stable for non-uniform traffic.
- Maximal size matching: these try to approximate maximum size matching.
  - PIM, iSLIP, SRR, etc.
  - These are practical: they can be executed in parallel in O(log N) or even O(1).
  - They are stable for uniform traffic and unstable for non-uniform traffic.
30. Algorithms for VOQ Switching
- Maximum weight matching: maximum matchings based on weights such as queue length (LQF, LPF) or age of cell (OCF), with a complexity of O(N^3 log N).
  - These are not practical for high-speed implementations; they are much more difficult to implement than maximum size matching.
  - They are stable (100% throughput) under any admissible traffic.
- Maximal weight matching: these try to approximate maximum weight matching. They use an RGA mechanism like iSLIP.
  - iLQF, iLPF, iOCF, etc.
  - These are somewhat practical: they can be executed in parallel in O(log N) or even O(1) like iSLIP, BUT the arbiters are much more complex to build.
  - They were recently shown to be stable under any admissible traffic.
31. Algorithms for VOQ Switching
- Randomized algorithms:
  - They try, in a smart way, to approximate maximum weight matching while avoiding an iterative process.
  - They are stable under any admissible traffic.
  - Their time complexity is small (depending on the algorithm).
  - Their hardware complexity is as yet untested.
- No schedulers deal with mis-sequencing of packets.
- Distributed schedulers: buffered crossbars.
- Two important points to remember:
  - The time complexity of an algorithm is not a true indication of its hardware implementation.
  - 100% throughput does not mean low delay.
  - Weak vs. strong stability.
32. VOQ Algorithms and Delay
- But delay is key:
  - Users don't care about throughput alone; they care (more) about delays.
  - Delay means QoS (for the network operator).
- Why is delay difficult to approach theoretically?
  - Mainly because it is a statistical quantity.
  - It depends on the traffic statistics at the inputs.
  - It depends on the particular scheduling algorithm used.
- The last point makes it difficult to analyze delays in IQ switches.
- For example, in VOQ switches it is almost impossible to give any guarantees on delay.
- All you can hope for is a high throughput and a bounded queue length, hence a bounded average delay (but even the bound on the queue length is beyond the control of the algorithm: we cannot say that the length of the queue should not be more than 10).
33VOQ Algorithms and Delay
- This does not mean that we cannot have an
algorithm that can do that. It means there exist
none at this moment. - For this exact reason, almost all quality of
service schemes (whether for delay or bandwidth
guarantees) assume that you have an output-queued
switch
34. VOQ Algorithms and Delay
- WHY? Because an OQ switch has no fabric scheduling/arbitration algorithm.
- Delay simply depends on traffic statistics.
- Researchers have shown that you can provide many QoS algorithms (like WFQ) using a single server and based on the traffic statistics.
- But OQ switches are extremely expensive to build: the memory bandwidth requirement is very high.
- These QoS scheduling algorithms therefore have little practical significance for scalable, high-performance switches/routers.
35. Output Queueing: The Ideal

36. How to Get Good Delay Cheaply?
- Enter speedup:
  - The fabric speedup for an IQ switch equals 1 (memory bandwidth of 2).
  - The fabric speedup for an OQ switch equals N (memory bandwidth of N+1).
- Suppose we consider switches with a fabric speedup of S, 1 < S << N.
- Such a switch will require buffers both at the input and the output; call these combined input- and output-queued (CIOQ) switches.
- Such switches could help if, with very small values of S, we get the performance (both delay and throughput) of an OQ switch.
37. A CIOQ Switch
- Consists of:
  - An (internally non-blocking, e.g., crossbar) fabric with speedup S > 1
  - Input and output buffers
  - A scheduler to determine matchings

38. A CIOQ Switch
- For concreteness, suppose S = 2. The operation of the switch consists of:
  - Transferring no more than 2 cells from (to) each input (output).
  - Logically, we will think of each time slot as consisting of two phases:
    - Arrivals to (departures from) the switch occur at most once per time slot.
    - The transfer of cells from inputs to outputs can occur in each phase.
39. Using Speedup

40. Performance of CIOQ Switches
- Now that we have a higher speedup, do we get a handle on delay?
- Can we say something about delay (e.g., every packet from a given flow should be delayed less than 15 msec)?
- There is one way of doing this: competitive analysis. The idea is to compete with the performance of an OQ switch.

41. Intuition
- Speedup 1: fabric throughput 0.58; average input queue too large.
- Speedup 2: fabric throughput 1.16; average input queue 6.25.

42. Intuition (continued)
- Speedup 3: fabric throughput 1.74; average input queue 1.35.
- Speedup 4: fabric throughput 2.32; average input queue 0.75.
43. Performance of CIOQ Switches
- The setup:
  - Under arbitrary, but identical, inputs (packet-by-packet):
  - Is it possible to replace an OQ switch by a CIOQ switch and schedule the CIOQ switch so that the outputs are identical packet-by-packet, i.e., to exactly mimic an OQ switch?
  - If yes, what is the scheduling algorithm?

44. What Is Exact Mimicking?
Apply the same inputs to an OQ and a CIOQ switch; obtain the same outputs.

45. What Is Exact Mimicking?
Why is a speedup of N not necessary? It is useless to bring all packets to the output if they then need to wait at the output; we only need to bring packets to the output before they can leave.
46. Consequences
- Suppose, for now, that a CIOQ is competitive with respect to an OQ switch. Then:
  - We get perfect emulation of an OQ switch.
  - This means we inherit all its throughput and delay properties.
  - Most importantly, all QoS scheduling algorithms originally designed for OQ switches can be directly used on a CIOQ switch.
  - But at the cost of introducing a scheduling algorithm, which is the key.

47. Emulating OQ Switches with CIOQ
- Consider an N x N switch with (integer) speedup S > 1.
- We're going to see if this switch can emulate an OQ switch.
- We'll apply the same inputs, cell-by-cell, to both switches.
- We'll assume that the OQ switch sends out packets in FIFO order.
- And we'll see if the CIOQ switch can match cells on the output side.
48. Key Concept: Urgency
- OQ switch: the urgency of a cell at any time = its departure time - current time.
- It basically indicates the time at which this cell will depart the OQ switch.
- This value is decremented after each time slot.
- When the value reaches 0, the cell must depart (it is at the HoL of the output queue).
49. Key Concept: Urgency
- Algorithm: Most Urgent Cell First (MUCF). In each phase:
  - Outputs try to get their most urgent cells from inputs.
  - Each input grants to the output whose cell is most urgent.
  - In case of ties, output i takes priority over output i + k.
  - Loser outputs try to obtain their next most urgent cell from another (unmatched) input.
  - When no more matchings are possible, cells are transferred.
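The phase described above can be sketched as a proposal loop in the style of Gale-Shapley (introduced on a later slide). This is a hypothetical Python sketch, not the authors' implementation; ports are 0-indexed here, and the `mucf_phase` name and urgency-matrix encoding are assumptions:

```python
def mucf_phase(urgency):
    """One matching phase of Most Urgent Cell First (0-indexed ports).

    urgency[i][j] is the urgency of the HoL cell at input i for output j,
    or None if input i holds no cell for output j (smaller = more urgent).
    Outputs propose for their most urgent cells; an input keeps the most
    urgent proposal seen so far, ties going to the lower-numbered output.
    Returns a dict mapping each matched input to the output it serves."""
    n = len(urgency)
    match = {}                       # input -> output
    free = list(range(n))            # outputs still looking for a cell
    tried = {j: set() for j in range(n)}
    while free:
        j = free.pop(0)
        candidates = [i for i in range(n)
                      if urgency[i][j] is not None and i not in tried[j]]
        if not candidates:
            continue                 # output j stays unmatched this phase
        i = min(candidates, key=lambda i: urgency[i][j])
        tried[j].add(i)
        current = match.get(i)
        if current is None:
            match[i] = j
        elif (urgency[i][j], j) < (urgency[i][current], current):
            match[i] = j             # j's cell is more urgent (or wins the tie)
            free.append(current)     # displaced output tries again
        else:
            free.append(j)           # rejected output tries its next input
    return match
```

On the example of the next slide (outputs 0 and 1 both wanting input 0's equally urgent cells, with input 1 holding a cell of urgency 3 for output 1), the tie goes to output 0 and output 1 falls back to input 1.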
50. Key Concept: Urgency - Example
- At the beginning of phase 1, both outputs 1 and 2 request input 1 to obtain their most urgent cells.
- Since there is a tie, input 1 grants to output 1 (the lowest-numbered port wins).
- Output 2 proceeds to get its next most urgent cell (from input 2, with urgency 3).
51. Key Concept: Urgency
- Observation: a cell is not forwarded from input to output for one of two (and only two) reasons:
  - Input contention: its input sends a more urgent cell. (Output 2 cannot receive its most urgent cell in phase 1 because input 1 wants to send a more urgent cell to output 1.)
  - Output contention: its output receives a more urgent cell. (Input 2 cannot send its most urgent cell because output 3 wants to receive from input 3.)
52. Implementing MUCF
- The way in which MUCF matches inputs to outputs is similar to the stable marriage problem (SMP).
- The SMP finds stable matchings in bipartite graphs:
  - There are N women and N men.
  - Each woman (man) ranks each man (woman) in order of preference for marriage.
53. The Gale-Shapley Algorithm (GSA)
- What is a stable matching?
  - A matching is a pairing (i, p(i)) of each i with a partner p(i).
  - An unstable matching is one in which there are matched pairs (i, p(i)) and (j, p(j)) such that i prefers p(j) to p(i), and p(j) prefers i to j.
- The GSA is guaranteed to give a stable matching.
- Its complexity is O(N^2).
54. An Example
- Consider the example we have already seen.
- Executing GSA:
  - With men proposing we get the matching (1, 1), (2, 4), (3, 2), (4, 3); this takes 7 proposals (iterations).
  - With women proposing we get the matching (1, 1), (2, 3), (3, 2), (4, 4); this also takes 7 proposals (iterations).
- Both matchings are stable.
- The first is man-optimal: men get the best partners of any stable matching.
- Likewise, the second is woman-optimal.
55. Implementing MUCF by the GSA
- MUCF can be implemented using the GSA with preference lists as follows:
  - Output j assigns a preference value to each input i, equal to the urgency of the cell at the HoL of VOQij.
  - If VOQij is empty, then the preference value of input i for output j is set to infinity.
  - The preference list of the output is the ordered set of its preference values for each input.
  - Each input assigns a preference value to each output based on the urgency of the cells, and creates its preference list accordingly.
56. Theorem
- A CIOQ switch with a speedup of 4 operating under the MUCF algorithm exactly matches cells with a FIFO output-queued switch.
- This is true even for non-FIFO OQ scheduling schemes (e.g., WFQ, strict priority, etc.).
- We can achieve similar results with S = 2.
57. Implementation: A Closer Look
- Main sources of difficulty:
  - The matching process: too many iterations? (And communicating this information among the inputs and outputs.)
  - Estimating urgency: depends on what is being emulated.
    - FIFO, strict priorities: no problem.
58. Other Work
- Relax the stringent requirement of exact emulation:
  - Least Occupied Output First Algorithm (LOOFA): keeps outputs busy whenever there are packets.
  - A lot of work has been done in this direction.
- Conclusion: we must have a speedup if we want to approach the performance of OQ switches, or provide QoS.
59. QoS Scheduling Algorithms

60. QoS Differentiation: Two Options
- Stateful (per flow):
  - IETF Integrated Services (IntServ) / RSVP
- Stateless (per class):
  - IETF Differentiated Services (DiffServ)
61. The Building Blocks (may contain more functions)
- Classifier
- Shaper
- Policer
- Scheduler
- Dropper
62. QoS Mechanisms
- Admission control: determines whether the flow can/should be allowed to enter the network.
- Packet classification: classifies the data, based on admission control, for the desired treatment through the network.
- Traffic policing: measures the traffic to determine whether it is out of profile. Packets determined to be out of profile can be dropped or marked differently (so they may be dropped later if needed).
- Traffic shaping: provides some buffering, thereby delaying some of the data, to make sure the traffic fits into the profile (may affect only bursts, or all traffic to make it similar to constant bit rate).
- Queue management: determines the behavior of data within a queue; parameters include queue depth and drop policy.
- Queue scheduling: determines how the different queues empty onto the outbound link.
63. QoS Router
(Figure: a classifier directs traffic through policers and shapers into per-flow queues, with queue management; schedulers drain the per-flow queues onto the output links.)
64. Queue Scheduling Algorithms

65. Scheduling at the Output Link of an OQ Switch
- Sharing always results in contention.
- A scheduling discipline resolves contention: it decides when, and which, packet to send on the output link.
- It is usually implemented at the output interface.
- Scheduling is key to fairly sharing resources and providing performance guarantees.

66. Output Scheduling
Allocating output bandwidth; controlling packet delay.
(Figure: a scheduler serving multiple queues onto the output link.)
67. Types of Queue Scheduling
- Strict priority: empties the highest-priority non-empty queue first, before servicing lower-priority queues. It can cause starvation of lower-priority queues.
- Round-robin: services each queue by emptying a certain amount of data and then going to the next queue in order.
- Weighted Fair Queueing (WFQ): empties an amount of data from a queue based on the queue's relative weight (driven by reserved bandwidth) before servicing the next queue.
- Earliest Deadline First: determines the latest time a packet must leave to meet its delay requirements, and services the queues in that order.
68. Scheduling: Deterministic Priority
- A packet is served from a given priority level only if no packets exist at higher levels (multilevel priority with exhaustive service).
- The highest level gets the lowest delay.
- Watch out for starvation!
- Usually we map priority levels to delay classes:
  - Low-bandwidth urgent messages
  - Realtime
  - Non-realtime
69. Scheduling: Work-Conserving vs. Non-Work-Conserving
- A work-conserving discipline is never idle when packets await service.
- Why bother with non-work-conserving? It is sometimes useful, for example to minimize delay jitter.
70. Scheduling Requirements
- An ideal scheduling discipline:
  - is easy to implement (preferably in hardware);
  - is fair (each connection gets no more than what it wants; the excess, if any, is equally shared);
  - provides performance bounds (deterministic or statistical); common parameters are bandwidth, delay, delay-jitter, and loss;
  - allows easy admission control decisions (the choice of scheduling discipline affects the ease of the admission control algorithm), i.e., deciding whether a new flow can be allowed.
71. Scheduling: No Classification
- This is the simplest possible, but we cannot provide any guarantees.
- With FIFO queues, if the depth of the queue is not bounded, there is very little that can be done.
- We can perform preferential dropping.
- We can use other service disciplines on a single queue (e.g., EDF).
72. Scheduling: Class-Based Queueing
- At each output port, packets of the same class are queued at distinct queues.
- Service disciplines within each queue can vary (e.g., FIFO, EDF, etc.); usually it is FIFO.
- Service disciplines between classes can vary as well (e.g., strict priority, some kind of sharing, etc.).
73. Per-Flow Packet Scheduling
- Each flow is allocated a separate virtual queue.
- This is the lowest level of aggregation.
- Service disciplines between the flows vary (FIFO, SP, etc.).
(Figure: a classifier directs flows 1..n into per-flow queues, with buffer management and a scheduler draining them.)
74. The Problems Caused by FIFO Queues in Routers
- Fairness: in order to maximize its chances of success, a source has an incentive to maximize the rate at which it transmits.
- Fairness (related to the above): when many flows pass through it, a FIFO queue is unfair; it favors the most greedy flow.
- Delay guarantees: it is hard to control the delay of packets through a network of FIFO queues.
75. Round Robin (RR)
- RR avoids starvation.
- All sessions have the same weight and the same packet length.
(Figure: sessions A, B, C served in round-robin order.)

76. RR with Variable Packet Length
(Figure: sessions A, B, C with variable-length packets; plain RR gives unequal service, but the weights are equal!)

77. Solution
(Figure: service normalized so sessions A, B, C receive equal shares.)

78. Weighted Round Robin (WRR)
W_A = 3, W_B = 1, W_C = 4

79. WRR with Non-Integer Weights
W_A = 1.4, W_B = 0.2, W_C = 0.8
80. Weighted Round Robin
- Serve a packet from each non-empty queue in turn.
- Can provide protection against starvation.
- It is easy to implement in hardware.
- Unfair if packets are of different lengths or the weights are not equal.
- What is the solution?
  - Different weights, fixed packet size: serve more than one packet per visit, after normalizing to obtain integer weights.
81. Problems with Weighted Round Robin
- Different weights, variable-size packets: normalize the weights by the mean packet size.
  - E.g., weights 0.5, 0.75, 1.0 and mean packet sizes 50, 500, 1500.
  - Normalized weights: 0.5/50, 0.75/500, 1.0/1500 = 0.01, 0.0015, 0.000666...; normalizing again to integers gives 60, 9, 4.
- With variable-size packets, we need to know the mean packet size in advance.
- Fairness is only provided at time scales larger than the schedule.
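The normalization arithmetic above can be checked mechanically. A small Python sketch using exact rational arithmetic (the function name is an illustrative choice; it reproduces the slide's 60 : 9 : 4 example):

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def wrr_credits(weights, mean_pkt_sizes):
    """Packets-per-round for WRR with variable-size packets:
    divide each weight by its flow's mean packet size, then scale
    the ratios to the smallest integers preserving the proportions."""
    ratios = [Fraction(w).limit_denominator() / s
              for w, s in zip(weights, mean_pkt_sizes)]
    # Scale by the lcm of the denominators to clear all fractions...
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (r.denominator for r in ratios))
    ints = [int(r * common) for r in ratios]
    # ...then divide out any common factor.
    g = reduce(gcd, ints)
    return [i // g for i in ints]
```

One visible drawback: a schedule of 60 + 9 + 4 = 73 packets per round is exactly the "fairness only at time scales larger than the schedule" problem noted above.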
82. Fairness
(Figure: two flows, A and B, e.g. http flows identified by (IP SA, IP DA, TCP SP, TCP DP), enter router R1 over 10 Mb/s and 100 Mb/s access links and share a 1.1 Mb/s output link; each is shown receiving 0.55 Mb/s. What is the fair allocation: (0.55 Mb/s, 0.55 Mb/s) or (0.1 Mb/s, 1 Mb/s)?)

83. Fairness
(Figure: flows A, B, and C enter R1 over 10 Mb/s and 100 Mb/s links and share the 1.1 Mb/s output link toward D; one flow is limited to 0.2 Mb/s. What is the fair allocation?)
84. Max-Min Fairness
- The minimum of the flows should be as large as possible.
- Max-min fairness for a single resource:
  - Bottlenecked (unsatisfied) connections share the residual bandwidth equally.
  - Their share is at least the share held by the connections not constrained by this bottleneck.
85. Max-Min Fairness: A Common Way to Allocate Flows
- N flows share a link of rate C. Flow f wishes to send at rate W(f) and is allocated rate R(f).
  1. Pick the flow, f, with the smallest requested rate.
  2. If W(f) < C/N, then set R(f) = W(f); otherwise set R(f) = C/N.
  3. Set N = N - 1 and C = C - R(f).
  4. If N > 0, go to step 1.
86. Max-Min Fairness: An Example
- Flows share a link of rate C = 1 at R1, with requests W(f1) = 0.1, W(f2) = 0.5, W(f3) = 10, W(f4) = 5.
- Round 1: set R(f1) = 0.1
- Round 2: set R(f2) = 0.9/3 = 0.3
- Round 3: set R(f4) = 0.6/2 = 0.3
- Round 4: set R(f3) = 0.3/1 = 0.3
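The allocation procedure of slide 85 can be sketched directly in Python and checked against this example (the function name is an illustrative choice):

```python
def max_min_allocate(demands, capacity):
    """Max-min fair allocation of `capacity` among flows with the given
    demands: visit flows in increasing order of demand, giving each
    min(its demand, an equal share of the remaining capacity)."""
    order = sorted(range(len(demands)), key=lambda f: demands[f])
    alloc = [0.0] * len(demands)
    c, n = capacity, len(demands)
    for f in order:
        alloc[f] = min(demands[f], c / n)   # W(f) vs. fair share C/N
        c -= alloc[f]                       # C = C - R(f)
        n -= 1                              # N = N - 1
    return alloc
```

With demands (0.1, 0.5, 10, 5) and capacity 1, this yields (0.1, 0.3, 0.3, 0.3), matching rounds 1-4 above.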
87. Max-Min Fairness
- How can an Internet router allocate different rates to different flows?
- First, let's see how a router can allocate the same rate to different flows.
88. Fair Queueing
- Packets belonging to a flow are placed in a FIFO. This is called per-flow queueing.
- The FIFOs are scheduled one bit at a time, in round-robin fashion.
- This is called bit-by-bit fair queueing.
(Figure: flows 1..N classified into per-flow FIFOs served by a bit-by-bit round-robin scheduler.)
89. Weighted Bit-by-Bit Fair Queueing
- Likewise, flows can be allocated different rates by servicing a different number of bits for each flow during each round.
- Also called Generalized Processor Sharing (GPS).
90. Understanding Bit-by-Bit WFQ: 4 Queues Sharing 4 bits/sec of Bandwidth, Equal Weights

91. Understanding Bit-by-Bit WFQ: 4 Queues Sharing 4 bits/sec of Bandwidth, Equal Weights
(Figure: round 3.)

92. Understanding Bit-by-Bit WFQ: 4 Queues Sharing 4 bits/sec of Bandwidth, Weights 3:2:2:1

93. Understanding Bit-by-Bit WFQ: 4 Queues Sharing 4 bits/sec of Bandwidth, Weights 3:2:2:1
(Figure: rounds 1 and 2; comparison with weights 1:1:1:1.)
94. Packetized Weighted Fair Queueing (WFQ)
- Problem: we need to serve a whole packet at a time.
- Solution:
  - Determine the time at which a packet, p, would complete if we served all flows bit-by-bit. Call this the packet's finishing time, Fp.
  - Serve packets in order of increasing finishing time.
- Also called Packetized Generalized Processor Sharing (PGPS).
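The finish-time rule can be sketched as below. Note this is a deliberately simplified illustration: it stamps each packet with F = max(previous finish of its flow, arrival time) + length/weight, whereas real WFQ/PGPS computes finish numbers against the GPS virtual-time clock. The function name and packet encoding are assumptions:

```python
def wfq_order(packets, weights):
    """Simplified packetized fair queueing.

    packets: list of (arrival_time, flow_id, length), in arrival order.
    weights: dict flow_id -> weight (service rate share).
    Returns the (flow_id, length) service order by increasing finish tag.
    Real WFQ replaces `arrival` below with GPS virtual time."""
    last_finish = {}                 # per-flow finish tag of the previous packet
    tagged = []
    for arrival, flow, length in packets:
        start = max(last_finish.get(flow, 0.0), arrival)
        finish = start + length / weights[flow]
        last_finish[flow] = finish
        tagged.append((finish, arrival, flow, length))
    tagged.sort()                    # serve in order of increasing finish tag
    return [(flow, length) for _, _, flow, length in tagged]
```

For two equally weighted flows with simultaneous 100-byte and 50-byte packets, the 50-byte packet finishes first under bit-by-bit service, so it is served first, which is exactly the behavior the slide describes.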
95. WFQ Is Complex
- There may be hundreds to millions of flows; the linecard needs to manage a FIFO queue per flow.
- The finishing time must be calculated for each arriving packet.
- Packets must be sorted by their departure time.
- Most effort in QoS scheduling algorithms goes into practical algorithms that can approximate WFQ!
(Figure: egress linecard with per-flow queues 1..N; for each arriving packet Fp is calculated, the smallest Fp is found, and that packet departs.)
96. When Can We Guarantee Delays?
If flows are leaky-bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst-case delay bounds to sessions.
97. Deterministic Analysis of a Router Queue: FIFO Case
(Figure: model of a router queue served at rate R, with cumulative-bytes curves for arrivals A(t) and departures D(t); the vertical gap is the backlog B(t) and the horizontal gap is the FIFO delay d(t).)
98. (Figure: flows 1..N with arrivals A1(t)..AN(t) are classified into a WFQ scheduler; flow i is served at rate R(fi) with departures Di(t). A cumulative-bytes plot shows A1(t) and D1(t), with D1(t) of slope R(f1).)
Key idea: in general, we don't know the arrival process. So let's constrain it.
99. Let's Say We Can Bound the Arrival Process
The number of bytes that can arrive in any period of length t is bounded by σ + ρt. This is called (σ, ρ) regulation.
(Figure: cumulative-bytes plot; A1(t) stays below the line with intercept σ and slope ρ.)
100. The Leaky Bucket (σ, ρ) Regulator
(Figure: tokens arrive at rate ρ into a token bucket of size σ; packets wait in a packet buffer and consume one token per byte (or packet) to depart.)
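The regulator can be sketched in a few lines of Python (an illustrative sketch of a (σ, ρ) token bucket; the class and method names are assumptions, and tokens are counted per byte here):

```python
class LeakyBucket:
    """(sigma, rho) regulator: tokens accrue at rate rho, capped at a
    bucket of size sigma; a packet of L bytes may depart only when the
    bucket holds at least L tokens. Output over any window of length t
    is therefore bounded by sigma + rho * t."""

    def __init__(self, sigma, rho):
        self.sigma, self.rho = sigma, rho
        self.tokens = sigma          # bucket starts full
        self.last = 0.0              # time of the last conformance check

    def conforms(self, t, length):
        """True (consuming tokens) if a packet of `length` bytes may be
        released at time t; False if it must wait in the packet buffer."""
        self.tokens = min(self.sigma,
                          self.tokens + (t - self.last) * self.rho)
        self.last = t
        if length <= self.tokens:
            self.tokens -= length
            return True
        return False
```

A full bucket lets a burst of up to σ bytes through at once, after which departures are throttled to the token rate ρ, which is the arrival bound assumed by the Parekh-Gallager delay result on the final slide.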
101. (σ, ρ)-Constrained Arrivals and Minimum Service Rate
(Figure: cumulative-bytes plot of A1(t), bounded by the line with intercept σ and slope ρ, and of D1(t) with service rate R(f1).)
Theorem [Parekh, Gallager '93]: If flows are leaky-bucket constrained, and routers use WFQ, then end-to-end delay guarantees are possible.