Scalable Multimodule Switches with Quality of Service Thesis Defense

1
Scalable Multi-module Switches with Quality of Service: Thesis Defense
  • Santosh Krishnan
  • sk_at_cs.columbia.edu
  • May 1, 2006
  • Advisor: Prof. Henning G. Schulzrinne
  • Co-advisor: Dr. Fabio M. Chiussi

2
Outline
  • Problem Definition
  • Motivations, list of contributions
  • Switching Model Components
  • Related work: Formal methods in switching
  • Buffered Clos Switches
  • Concept of functional equivalence
  • BCS: Throughput and Quality of Service
  • Single-path BCS: CIOQ, aggregation, pipelining
  • Multi-path BCS: Parallelization
  • Conclusions

3
Problem Definition
Goals
  • How to methodically construct a high-capacity
    switch?
  • How to design high-performance algorithms for
    such switches?

Importance
  • Physical layer improvements: 10-G Ethernet, OC-768
  • Converged network requiring QoS: IPTV, MPLS VPN
  • Case for modular design: component reuse

What exists
  • Ad-hoc approach to switch design
  • No benchmarks, varying performance satisfaction
  • Non-blocking, 100% throughput, nominal capacity

4
Contributions
  • Taxonomy of multi-module switches: Buffered Clos Switches
  • Performance framework: Functional equivalence with ideal switch

Mimics circuit-switching rigor
Applications
Combined I/O Queueing
  • QoS: Online maximal matching
  • Throughput: Critical matching
  • Strict stability: Maximal matching, SOQF
  • Switched Fair Airport matching

Aggregation
  • Shadow CIOQ and Decompose
  • Virtual Element Queueing

Pipelining
  • Striping and Equal Dispatch
  • Concurrent Dispatch: 3D matching

Parallelization
  • Flow-based PPS: Clos fitting
  • Cell-based PPS: Striping, Equal Dispatch

Memory-Space-Memory
  • Combination methods
  • Recursive BCS

5
Switching Model
[Figure: switching model. Labels: Inputs, Outputs, PPU (on each port), Switch Fabric, Fast Path, CPU, Slow Path]
  • Basic property: Contention
  • Flows: Guaranteed QoS, Best-effort
  • Ideal Switch: Provide bandwidth trunks, sustain link capacity
  • Black box for network engineering purposes

6
Switching Model: Components
[Figure: memory element versus space element. Labels: Buffers, Link Scheduling, Mesh, 2D Matching, Conflict-free property, Matching complexity. Constraints: Memory bandwidth, Full-mesh circuitry. Monolithic: OQ Switch (Ideal), IQ Switch]
  • Architecture: Interconnect memory and space elements
  • Algorithms: Meaningfully emulate the ideal switch for throughput and QoS

7
Background: Clos Networks
[Figure: three-stage Clos network with one circuit routed from inputs to outputs; labels M and K]
Recognize
  • Space-time duality
  • Fitting: matrix decomposition

  • Strictly non-blocking: K ≥ 2M − 1 (Clos theorem)
  • Re-arrangeable: K ≥ M (Slepian-Duguid)

Fitting Algorithms
Inspiration: Replace selected elements with memory
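The two Clos conditions on this slide can be checked mechanically. The sketch below assumes the usual reading of the symbols, which the transcript does not spell out: M is the number of ports on each first-stage module and K is the number of middle-stage modules. The function name `clos_class` is made up for illustration.

```python
def clos_class(m: int, k: int) -> str:
    """Classify a symmetric three-stage Clos network.

    m: ports per first-stage module (assumed meaning of M on the slide)
    k: number of middle-stage modules (assumed meaning of K on the slide)
    """
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive")
    if k >= 2 * m - 1:
        # Clos theorem: a new circuit never needs existing ones moved.
        return "strictly non-blocking"
    if k >= m:
        # Slepian-Duguid: always routable, but may need re-arrangement.
        return "re-arrangeable"
    return "blocking"


if __name__ == "__main__":
    for k in (3, 4, 7):
        print(f"M=4, K={k}: {clos_class(4, k)}")
```

The gap between K = M and K = 2M − 1 is the price of never re-arranging circuits that are already set up.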
8
Background: CIOQ Switches
Pro
  • Low memory bandwidth

Con
  • Complexity of matching
  • Switch size
  • Frequency
  • Reconfiguration rate

[Figure: example 3x3 queue-state matrix and the switch configuration (matching) derived from it]
  • Offline: Templates
  • Maximum, Maximal, Critical
  • Heuristics

What performance results when applied to a changing queue state?
9
Background: CIOQ Switch Results
Based on combinatorics and stability theory
QoS (Weller-Hajek 97)
Throughput
Auxiliary Results: Envelope matching (Kar 00), Packet-mode matching (Marsan 02)
10
Framework: Buffered Clos Switches
Definition
  • Switch size
  • Type of elements
  • Number in first stage
  • Number in second
  • Speedup

Parallelize: Pool memory resources (PPS)
Aggregate: Smaller elements (CIOQ-A, G-MSM)
Pipeline: Lower speed, complexity (CIOQ-P, G-MSM)
  • Isomorphism: Non-blocking Clos network
  • Properties: Multi-stage, fully connected, symmetric, uniform

11
Framework: Functional Equivalence
Characterize relative performance: functional equivalence
  • f1: Allocate known rates (shape: bandwidth trunks)
  • f2: Relative stability for admissible traffic (literature: 100% throughput)
  • f3: Per-output relative stability (work conserving)
  • f4: Strict relative stability, all pairs
  • f5: Exact emulation

  • Emulate an ideal switch: exact, asymptotic
  • Bandwidth trunks, independent throughput optimization

12
CIOQ: Bandwidth Trunks
Shaping plus online matching is sufficient for bandwidth guarantees
Offline
  • BVN templates computed from the rate matrix
  • Cons: Template storage, centralized rate processing

Online
  • Arbitrary arrivals, shape/batch, VOQ, weight scheduler
  • Online maximal (speedup 2), online critical (speedup 1)
  • Split time into intervals of length T, derived from GCD(R)
  • Batch traffic in each interval
  • Simple counters

  • Extension of Weller-Hajek maximal matching theorem
  • Clos analogy: Maximal matching as a strategy for orderly assignments

13
CIOQ: Admissible Traffic
Best Throughput Results
  • No speedup: MWM (McKeown et al.); Speedup 2: Maximal (Dai-Prabhakar)
  • Can a simple maximum size matching suffice for admissible traffic?

Red Herring!
Critical matching suffices for asymptotic 100% throughput (f2)
[Figure: 3x3 queue-state example comparing a maximum-size matching (MSM), an augmented matrix, and a critical matching]
Intuition: 2x2 line buckets (rows R1, R2; columns C1, C2; Max)
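A small sketch may help make "critical matching" concrete. It uses the definition given in the supporting slides (a critical line is a row or column with the maximum sum, and a critical matching covers every critical line). The function names and the example matrix are illustrative assumptions, not taken from the thesis.

```python
def critical_lines(queue):
    """Return (rows, cols) whose sums equal the largest line sum."""
    n = len(queue)
    row_sums = [sum(row) for row in queue]
    col_sums = [sum(queue[i][j] for i in range(n)) for j in range(n)]
    peak = max(row_sums + col_sums)
    rows = [i for i, s in enumerate(row_sums) if s == peak]
    cols = [j for j, s in enumerate(col_sums) if s == peak]
    return rows, cols


def covers_critical_lines(match, queue):
    """Check that `match` (input -> output) is a valid matching on
    non-empty queues and that it serves every critical row and column."""
    if any(queue[i][j] == 0 for i, j in match.items()):
        return False  # a matched pair has nothing to send
    outputs = set(match.values())
    if len(outputs) != len(match):
        return False  # two inputs share an output
    rows, cols = critical_lines(queue)
    return all(i in match for i in rows) and all(j in outputs for j in cols)


if __name__ == "__main__":
    # Hypothetical queue state: column 0 carries the largest backlog.
    queue = [[6, 3, 0],
             [0, 1, 7],
             [5, 0, 2]]
    print(critical_lines(queue))
    print(covers_critical_lines({2: 0, 1: 2}, queue))
    print(covers_critical_lines({0: 1, 1: 2}, queue))
```

The second matching in the example leaves the most heavily loaded column unserved, which is exactly what a critical matching rules out.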
14
CIOQ: Strict Relative Stability
  • Maximal matching: Keeps under-subscribed outputs stable (f3, speedup 2)
  • Shortest Output-Queue First (SOQF) (f4, speedup 3)
  • Output element scheduler: Identical to the one in emulated switch
  • Intuition: Give preference to less congested pairs at the output
  • Asymptotic emulation of an ideal switch: long-term fairness
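The slide gives only the intuition behind Shortest Output-Queue First. One possible reading, sketched below, is a greedy maximal matching that visits outputs in order of increasing output-queue occupancy, so the least congested outputs get first pick of the inputs. The tie-breaking rules and the name `soqf_matching` are assumptions; the algorithm in the thesis may differ in detail.

```python
def soqf_matching(voq, out_occupancy):
    """Greedy maximal matching that favours short output queues.

    voq[i][j]        : cells waiting at input i for output j
    out_occupancy[j] : cells currently buffered at output j
    Returns a dict mapping input -> output.
    """
    n = len(voq)
    free_inputs = set(range(n))
    match = {}
    # Visit the least congested outputs first (ties broken by index).
    for j in sorted(range(n), key=lambda out: (out_occupancy[out], out)):
        candidates = [i for i in free_inputs if voq[i][j] > 0]
        if candidates:
            i = min(candidates)  # arbitrary but deterministic choice
            match[i] = j
            free_inputs.remove(i)
    return match


if __name__ == "__main__":
    voq = [[3, 2, 0],
           [0, 4, 1],
           [5, 0, 0]]
    print(soqf_matching(voq, out_occupancy=[7, 0, 2]))
```

The result is always maximal: when an output is visited, it is matched unless every input holding cells for it is already taken.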

15
Switched Fair Airport
  • Integrate two policies, M1 and M2
  • M1: Provides bandwidth trunks given rate reservations
  • M2: Optimize throughput independent of above rates

[Table: speedup required for M1 and M2 under multi-phase combination and exclusive combination]
Maximal matching is additive to any other policy, hence needs the least speedup
16
CIOQ-A: Aggregation
Advantages
  • Smaller space element
  • Lower arbitration complexity
  • Heterogeneous subports

  • Shadow-Decompose: CIOQ emulation (f5)
  • VEQ Matching: Less complex, only for admissible traffic (f2)

17
CIOQ-P: Pipelining
  • Sequential Dispatch: CIOQ emulation (f5)
  • Concurrent Dispatch
  • Limited candidates, stale-state issues
  • 3D Maximal Matching for relative stability
  • Striping: Shadow on envelope basis
  • Equal Dispatch
  • Explicitly equalize load
  • Separate occupancy counters for each SE

Implement arbitrarily complex policies!
Advantages
  • Slower space element
  • Lower arbitration complexity
18
G-MSM: Combination
  • Combination methods: CIOQ-A/P
  • No need for independent analysis
  • Recursion possible
19
PPS: Architecture
[Figure: PPS with Demux, Core, and Mux stages]
Advantages
  • Reuse low-capacity core switch
  • Implement arbitrarily slow memories!
provided
Memoryless first and third stages
Performance: Emulates OQ switch
  • Pool the resources on several switching paths
  • Dual of a CIOQ-P switch
  • Matching algorithm replaced by load balancing
  • Sequence control might be necessary

20
PPS: Flow-based
  • Model for clustered routers
  • Per-flow path assignment: explicit or hashed
  • No need for sequence control
  • Memory in first stage
  • High speedup (Clos fitting)
  • Unbalanced load assignment
  • Requires knowledge of loads

Split flows
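As a rough illustration of "explicit or hashed" per-flow path assignment, the sketch below pins each flow to one core element, which is why no sequence control is needed. The flow-identifier format, the choice of SHA-256, and the function name `core_for_flow` are all assumptions made for this example.

```python
import hashlib


def core_for_flow(flow_id, n_cores, explicit=None):
    """Pick the core element that carries every cell of a flow.

    flow_id  : any stable string identifying the flow (hypothetical format)
    n_cores  : number of core elements in the middle stage
    explicit : optional dict of flow_id -> core index that overrides hashing
    """
    if explicit and flow_id in explicit:
        return explicit[flow_id]
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same flow always maps
    # to the same core, so cells of a flow cannot be re-ordered.
    return int.from_bytes(digest[:8], "big") % n_cores


if __name__ == "__main__":
    pinned = {"10.0.0.1->10.0.0.9": 2}
    for flow in ("10.0.0.1->10.0.0.9", "10.0.0.2->10.0.0.9"):
        print(flow, "-> core", core_for_flow(flow, 4, explicit=pinned))
```

Hashing needs no knowledge of flow rates, so the load on the core elements can end up unbalanced, which matches the drawbacks listed on the slide.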
21
PPS: Cell-based
  • Uniformly distribute the load of each flow
  • Premise: Each core element receives 1/K of the cells of each flow
  • Equal dispatch and striping suffice for asymptotic OQ emulation
  • Bandwidth trunks: Large buffers required

22
Summary: A Recipe Book
  • Taxonomy of multi-module switches: Buffered Clos Switches
  • Performance framework: Functional equivalence with ideal switch

Applications
Combined I/O Queueing
  • QoS: Online maximal matching
  • Throughput: Critical matching
  • Strict stability: Maximal matching, SOQF
  • Switched Fair Airport matching

Aggregation
  • Shadow and Decompose
  • Virtual Element Queueing

Pipelining
  • Striping and Equal Dispatch
  • Concurrent Dispatch: 3D matching

Parallelization
  • Flow-based PPS: Clos fitting
  • Cell-based PPS: Striping, Equal Dispatch

Memory-Space-Memory
  • Combination methods
  • Recursive BCS

23
Avenues for Follow-on Research
  • Efficient policies for multicast
  • Similar treatment on other interconnection
    networks
  • Theory of backpressure
  • Recent interest in buffered crossbars
  • Quality of stability: Average delay analysis
  • Short-timescale equivalence
  • Emulation of a finite-memory ideal switch
  • Interplay of buffer management with matching
    algorithms

24
Supporting Slides
25
Relevant Publications
Switching Algorithms
  • Dynamic Partitioning Switch Memory Management, Infocom 99
  • Packet Switches with QoS Support, Hot Interconnects 00
  • Feedback Control for Distributed Scheduling, Globecom 00
  • Buffered Clos Switches, Columbia TR 02

Parallel Switches
  • Inverse Multiplexing for Switches, Globecom 98
  • Switched Connections Inverse Multiplexing, Intl. Conf. ATM 99
  • Recognition of Parallel Packet Switches, GBN, Infocom 01
  • Stability Analysis of Parallel Packet Switches, ICC 01
  • Open-loop Schemes for Multi-path Switches, ICC 03
26
Proposal Conjectures
Proposal: six conjectures
  • Maximal matching is sufficient to isolate oversubscribed outputs (DONE)
  • SOQF is sufficient for strict relative stability (DONE)
  • Equal dispatch for strict stability in CIOQ-P (DONE)
  • Equal dispatch plus decomposition for strict stability in G-MSM (DONE)
  • Rate shaping plus maximal matching suffices for QoS in CIOQ (DONE)
  • SOQF suffices for long-term fairness in CIOQ (DONE)
  • Plus many more to round out the work

27
Additional Contributions
Background: Survey of formal methods in switching, a new perspective
Applications
Combined I/O Queueing
Aggregation
  • Maximal Matching: Delay analysis
  • Perfect Sequences: Uniform Traffic
  • Multicast support using Recycling
  • Batch Decomposition (Optical)
  • Support for Heterogeneous Subports

Pipelining
  • Concurrent Dispatch: BVN and SPS

Parallelization
  • SMM Switches: PPS without backpressure
  • Fractional Dispatch for memoryless inputs

28
Matching Flavors
  • Maximal matching: Non-idling, greedy
  • Maximum-size matching: Maximum flow in a bipartite graph
  • Ford-Fulkerson, Hopcroft-Karp

Invariant: At least one connection in the marked lines
Queue State (non-empty entries marked)
3 0 6
0 7 1
5 0 0
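The difference between a maximal and a maximum-size matching is easy to see in code. The sketch below is a plain illustration rather than the thesis implementation: the greedy pass gives a maximal matching, and a simple augmenting-path search (the idea behind Ford-Fulkerson on a bipartite graph) gives a maximum-size one. The 3x3 queue state is chosen to resemble the example on this slide.

```python
def maximal_matching(queue):
    """Greedy, non-idling matching: scan inputs in order and take the
    first free output with a non-empty queue. Returns input -> output."""
    n = len(queue)
    used_outputs = set()
    match = {}
    for i in range(n):
        for j in range(n):
            if queue[i][j] > 0 and j not in used_outputs:
                match[i] = j
                used_outputs.add(j)
                break
    return match


def maximum_size_matching(queue):
    """Maximum-cardinality bipartite matching via augmenting paths."""
    n = len(queue)
    out_to_in = {}

    def augment(i, seen):
        for j in range(n):
            if queue[i][j] > 0 and j not in seen:
                seen.add(j)
                # Take j if it is free, or if its current input can move.
                if j not in out_to_in or augment(out_to_in[j], seen):
                    out_to_in[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, set())
    return {i: j for j, i in out_to_in.items()}


if __name__ == "__main__":
    queue = [[3, 0, 6],
             [0, 7, 1],
             [5, 0, 0]]
    print("maximal:", maximal_matching(queue))
    print("maximum:", maximum_size_matching(queue))
```

On this queue state the greedy pass leaves input 2 idle because output 0 is already taken, while the augmenting-path search re-routes input 0 to output 2 and serves all three inputs.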
29
Matching Flavors (continued)
  • Critical Matching: Covers all critical rows and columns
  • Critical line: A line with the maximum sum
  • Perfect Matching: Each configuration is a permutation
  • Maximum Weight Matching: Use queue length as weights
  • Optimization problem: simplex method
  • Template Matchings
  • BVN: Decompose rate matrix as convex combination of permutations
  • Double: Lower number of permutations, wasted slots
  • Min: N permutations will cover all entries, large number of wasted slots
  • Stable Matching: Gale-Shapley algorithm
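The BVN (Birkhoff-von Neumann) template idea can be sketched in a few lines. The version below assumes a rate matrix whose row and column sums are all equal, and peels off one permutation at a time using a simple augmenting-path perfect matching. It is a minimal illustration with hypothetical function names, not an optimized or authoritative implementation.

```python
from fractions import Fraction


def perfect_matching(support):
    """Return perm with perm[i] = output for input i, using only entries
    where support[i][j] is True, or None if no perfect matching exists."""
    n = len(support)
    out_to_in = [-1] * n

    def augment(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if out_to_in[j] < 0 or augment(out_to_in[j], seen):
                    out_to_in[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(out_to_in):
        perm[i] = j
    return perm


def bvn_decompose(rates):
    """Decompose a matrix with equal line sums into (weight, permutation)
    terms whose weighted sum rebuilds the matrix."""
    n = len(rates)
    r = [[Fraction(x) for x in row] for row in rates]
    line = sum(r[0])
    cols = [sum(r[i][j] for i in range(n)) for j in range(n)]
    if any(sum(row) != line for row in r) or any(c != line for c in cols):
        raise ValueError("all row and column sums must be equal")
    terms = []
    while any(x > 0 for row in r for x in row):
        perm = perfect_matching([[x > 0 for x in row] for row in r])
        assert perm is not None  # guaranteed while line sums stay equal
        weight = min(r[i][perm[i]] for i in range(n))
        for i in range(n):
            r[i][perm[i]] -= weight  # zeroes at least one entry per round
        terms.append((weight, perm))
    return terms


if __name__ == "__main__":
    h = Fraction(1, 2)
    for weight, perm in bvn_decompose([[h, h, 0], [h, 0, h], [0, h, h]]):
        print(weight, perm)
```

Each term is one template: the permutation is the crossbar configuration and the weight is the fraction of time slots it should occupy. Exact fractions are used so that the subtraction cannot leave floating-point residue behind.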

30
Stability Theory
  • Lyapunov functions: Kumar-Meyn 95
  • Mechanism to extend Foster's criterion to a system of queues
  • Weighted cartesian product of queue lengths
  • Symmetric and co-positive
  • Fluid limits: Dai-Prabhakar 00
  • Function of discrete time: Interpolate
  • Limit: Scale time to infinity
  • The scaling parameter may be drawn from an increasing sequence r_n

F(t) = \lim_{r \to \infty} \frac{1}{r} f(rt)
31
CIOQ: Bandwidth Trunks
Arrivals into GQ: Bounded, admissible
Bandwidth Trunk Timescale: 1/GCD(R)
Covers all entries in GQ before next batch
  • Delay comparable to BVN rate decomposition
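The 1/GCD(R) timescale can be illustrated with a short calculation. The sketch assumes that rates are expressed as exact fractions of the link rate, so that the GCD of the rate matrix R is itself a fraction and each pair's reservation is a whole number of cells per batch interval. That interpretation, and the function names, are assumptions rather than definitions taken from the slides.

```python
from fractions import Fraction
from math import gcd


def rate_gcd(rates):
    """GCD of a set of fractions: gcd of numerators over lcm of denominators."""
    num, den = 0, 1
    for row in rates:
        for r in row:
            f = Fraction(r)
            if f == 0:
                continue
            num = gcd(num, f.numerator)
            den = den * f.denominator // gcd(den, f.denominator)
    if num == 0:
        raise ValueError("at least one rate must be non-zero")
    return Fraction(num, den)


def batch_plan(rates):
    """Return (interval_length, credits): the batch interval in time slots
    and the number of cells each input-output pair may send per interval."""
    g = rate_gcd(rates)
    interval = 1 / g
    credits = [[int(Fraction(r) / g) for r in row] for row in rates]
    return interval, credits


if __name__ == "__main__":
    rates = [[Fraction(1, 2), Fraction(1, 4)],
             [Fraction(1, 4), Fraction(3, 4)]]
    interval, credits = batch_plan(rates)
    print("interval:", interval, "credits:", credits)
```

With per-pair credits like these, the shaper only needs simple counters that are reloaded at the start of every interval.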

32
CIOQ: Perfect Sequences
  • Sub-maximal Perfect Sequence (SPS)
  • A sequence of N permutations that covers the unit matrix
  • A repeating sequence guarantees 1/N to each pair
  • Suffices for 100% throughput under uniform traffic
  • Simple implementation: Staggered round-robin
  • Not even maximal!

Concurrent SPS for CIOQ-P: K turns in KN slots
Basis for iSLIP
Basis for Atlanta arbitration
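A staggered round-robin is probably the simplest way to build such a sequence. In the sketch below, which is an illustration rather than the thesis code, slot t connects input i to output (i + t) mod N, so N consecutive slots visit every input-output pair exactly once.

```python
def staggered_round_robin(n):
    """Return N permutations; entry [t][i] is the output that input i
    is connected to in slot t."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]


def covers_all_pairs(sequence):
    """True if every (input, output) pair appears exactly once."""
    n = len(sequence)
    pairs = [(i, perm[i]) for perm in sequence for i in range(n)]
    return len(pairs) == n * n and len(set(pairs)) == n * n


if __name__ == "__main__":
    seq = staggered_round_robin(4)
    for t, perm in enumerate(seq):
        print(f"slot {t}: {perm}")
    print("covers every pair once:", covers_all_pairs(seq))
```

Because the schedule ignores the queue state, a slot may connect a pair with nothing to send even when other queues are backlogged, which is why the sequence is not maximal.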
33
Hierarchical Scheduling
34
CIOQ-P: Equal Dispatch
Explicitly equalize the load for each input-output pair
Implemented as counters
No mis-sequencing issues
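A counter-based dispatcher along these lines might look as follows. The sketch keeps, for every input-output pair, a count of cells sent to each space element (SE) and always picks the least-used one. The class name, the tie-breaking rule, and the data layout are assumptions for illustration.

```python
class EqualDispatcher:
    """Spread the cells of each input-output pair evenly over the SEs."""

    def __init__(self, n_ports, n_elements):
        # sent[i][j][k] = cells of pair (i, j) dispatched to space element k
        self.sent = [[[0] * n_elements for _ in range(n_ports)]
                     for _ in range(n_ports)]

    def dispatch(self, i, j):
        """Choose the space element for the next cell from input i to output j."""
        counts = self.sent[i][j]
        k = counts.index(min(counts))  # least-loaded SE, lowest index on ties
        counts[k] += 1
        return k


if __name__ == "__main__":
    d = EqualDispatcher(n_ports=2, n_elements=3)
    print([d.dispatch(0, 1) for _ in range(7)])
```

For any single pair, the per-SE counts never differ by more than one cell, so each space element sees an equal share of that pair's load regardless of the arrival pattern.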
35
CIOQ-P: 3D Maximal Matching
Concurrent traversal of queue state matrix
Pointers do not coincide with each other
36
Recursive G-MSM
Any matching
SPS
SPS
Memory element of a G-MSM: Replace with a CIOQ switch
Virtual Element Queues: Organized per space element
37
PPS: Data Path