Title: Emerging Technologies for the New Internet Meeting the Challenge of Exponential Growth
1 Emerging Technologies for the New Internet: Meeting the Challenge of Exponential Growth
Jonathan Turner, jst_at_cs.wustl.edu, http://www.arl.wustl.edu/jst
2 Is IP the Last Word in Networking?
3 Challenges Ahead
- Near term - coping with deficiencies of IP
- scaling limitations
- inefficiencies
- limitations on applications
- inadequate tools for management and administration
- Medium term - extending the Internet service model
- development of broadband applications
- programmable network services
- active networking or signalling-based network services
- need more effective network services API
- Long term - the continuing quest for bandwidth
- can optical switching make a difference?
- finding the right combination of optics and electronics
4 Coping with IP's Deficiencies
- Reasons for IP's success
- the Internet
- runs over anything - Ethernet
- easy to build basic router
- minimal coordination needed for basic operation
- advances in electronics
- BSD Unix
- Cisco
- religious fanatics
- Deficiencies
- limited address space
- strong bias toward software implementation
- no effective support for QoS
- ineffective routing algorithms
- excessive memory demands
- fundamentally unscalable multicast model
- no support for security
- lack of reliable flow state
5 Coping with Limited Address Space
- Fixes past
- subnet masks
- classless interdomain routing
- fast algorithms for best matching prefix
- Fixes present
- Network Address Translators (NAT)
- Fixes future??
- Network Address Translation at ISP level? country level?
- full employment for packet classification algorithm designers
- Signaling protocol with expanded session-level addresses
- packet header becomes key for finding stored state information
- IPv6?
6 Routing and Bandwidth Reservation
- Problem
- IP routing protocols do poor job of distributing load
- excessive overload on some links
- sudden massive changes in load distribution when routes change
- applications cannot get dependable bandwidth
- no effective mechanism for bandwidth reservation
- need to route flows, not packets
- Solution
- signaling protocol for flow establishment
- select path for flow based on bandwidth needs and availability
- retain datagram routing for default packet handling
- multi-path routing algorithms
- route traffic where there is bandwidth available
- take advantage of multiple paths
- avoid rapid traffic changes when routes change
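The "select path for flow based on bandwidth needs and availability" step can be illustrated with a minimal sketch. It is not the signaling protocol the slide has in mind. It assumes a central view of free bandwidth per directed link, and the names `find_path`, `reserve` and the link table `avail` are hypothetical. Links that cannot carry the flow are ignored, and a fewest-hop path is chosen among the rest.

```python
from collections import deque


def find_path(avail, src, dst, need):
    """Return a fewest-hop path whose links all have `need` free bandwidth.

    avail maps a directed link (u, v) to its free bandwidth (assumed model).
    Returns a list of nodes, or None if no feasible path exists.
    """
    # Keep only links with enough free bandwidth for this flow.
    adj = {}
    for (u, v), free in avail.items():
        if free >= need:
            adj.setdefault(u, []).append(v)

    # Breadth-first search gives a fewest-hop path over the feasible links.
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None

    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def reserve(avail, path, need):
    """Subtract the flow's bandwidth from every link on the path."""
    for u, v in zip(path, path[1:]):
        avail[(u, v)] -= need


# Hypothetical topology; the bandwidth units are arbitrary.
avail = {("A", "B"): 10, ("A", "C"): 8, ("B", "D"): 3,
         ("B", "E"): 6, ("C", "D"): 8, ("E", "D"): 6}

for _ in range(3):
    path = find_path(avail, "A", "D", need=5)
    print(path)
    if path is not None:
        reserve(avail, path, 5)
```

In this example the second flow takes a different path from the first because the first reservation used up the shorter path, which is the "route traffic where there is bandwidth available" behaviour in a very reduced form. Default datagram traffic would still follow ordinary routing.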
7 Excessive Memory Requirements
- Problem - TCP flow control
- synchronizes transmission rates of different flows
- wide fluctuations in network bandwidth
- large buffers needed for efficient operation
- 128 MB for OC-48 links, 512 MB for OC-192
- even worse without RED (Random Early Discard)
- Solution - rate based flow control algorithm
- inexpensive implementation
- drastic reduction in router memory requirements - 100x
- virtually eliminates network queueing delays
- higher average link utilization
- superior end-to-end performance
8 Scalability Problems in IP Multicast
- Global multicast addresses with no location info.
- attempting to apply LAN model to global network
- makes multicast join/leave unnecessarily difficult
- Quadratic complexity of many-to-many multicast.
- need separate multicast tree per sender
- single shared tree gives linear scaling
- Control protocols with poor scaling and security.
- flooding protocol to announce multicast session
- excessive traffic, no private multicast sessions
9 Security
- IP security philosophy
- networks are inherently insecure
- so no point in providing network support for security
- total reliance on end-to-end security
- Network service providers can improve security and should be required to.
- physical security of facilities
- prevent address spoofing and denial-of-service attacks
- tracking down hackers
- link encryption, especially on vulnerable links
- Need better tools for network administration.
- traffic tracing, security event logging
10 The State Aggregation Fallacy
- Widely believed that effective scaling of IP network requires aggregating flow state.
- Why per flow state is non-problem.
- flow state only needed for reservation-oriented traffic
- memory cost for per flow state is small - .005 cents per hop
- besides, there's much more memory in packet buffers
- per flow queues not needed for reservation-oriented traffic
- signaling is straightforward - it's been done for >50 years
- Why aggregation is problematical.
- aggregation depends on packet classification with prefix matching in two dimensions - expensive, scales poorly
- managing bandwidth at aggregate level wastes bandwidth - and bandwidth is most precious resource
11 Scaling Up Routers
- Address lookup and packet classification at 10 Gb/s.
- tree bitmap algorithm (Eatherton & Dittia)
- pruned tuple space search (Srinivasan, Suri & Varghese)
- Multistage interconnection networks for multi-terabit routers.
- networks that can handle arbitrary traffic
- large-scale use of gigabit serial transmission
- dynamic routing for effective handling of IP traffic
- maintaining sequence with zero errors
- Queueing and packet scheduling.
- shared queues sufficient for reserved traffic - high priority
- fair queue scheduling sufficient for best effort
- limited payoff for more elaborate algorithms
12 IP Address Lookup
- Routing tables at router input ports contain (prefix, next hop) pairs
- Unicast address in packet is compared to stored prefixes, starting with left-most bit.
- Prefix that matches largest number of address bits is desired match.
- Packet is forwarded to the specified next hop.
- Next hop fields change as a result of topology changes and traffic changes.
- Set of prefixes changes infrequently.
routing table
prefix      next hop
10          7
01          5
110         3
1011        5
0001        0
01011       7
00010       1
001100      2
1011001     3
1011010     5
0100110     6
01001100    4
10110011    8
10011000    10
01011001    9
address: 10110011011 (matching prefixes highlighted on the slide: 10, 1011, 1011001, 10110011)
13 Address Lookup Using Tries
- Prefixes stored in binary trie.
- Green nodes denote prefixes.
- Search for prefix, using address bits to trace path.
- Remember most recent green node visited.
- When search ends, go back to last green node.
- Number of memory accesses proportional to address length.
- 32 in IPv4, 128 in IPv6
address: 1011 0001 1000
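The search procedure above can be sketched in a few lines. This is an illustrative sketch, not code from the talk. The names `TrieNode`, `insert` and `lookup` are hypothetical, and the prefixes and next hops are read off the routing table on the IP Address Lookup slide. A node with `next_hop` set plays the role of a green node.

```python
class TrieNode:
    """Binary trie node; next_hop is set only on nodes that end a prefix."""

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


def insert(root, prefix, next_hop):
    """Add a (prefix, next hop) pair, where prefix is a string of '0'/'1'."""
    node = root
    for bit in prefix:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def lookup(root, address):
    """Follow address bits and return the next hop of the last prefix node seen."""
    node = root
    best = root.next_hop
    for bit in address:
        node = node.children[int(bit)]
        if node is None:
            break                      # path ends, fall back to the last match
        if node.next_hop is not None:
            best = node.next_hop       # remember most recent prefix node
    return best


# Routing table as read from the IP Address Lookup slide.
ROUTES = {
    "10": 7, "01": 5, "110": 3, "1011": 5, "0001": 0, "01011": 7,
    "00010": 1, "001100": 2, "1011001": 3, "1011010": 5, "0100110": 6,
    "01001100": 4, "10110011": 8, "10011000": 10, "01011001": 9,
}

root = TrieNode()
for prefix, hop in ROUTES.items():
    insert(root, prefix, hop)

print(lookup(root, "10110011011"))  # longest match is 10110011, so next hop 8
```

Each loop iteration corresponds to one memory access, which is why the cost grows with address length.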
14 Multibit Tries
- Multibit tries match several bits at once.
- Next hop info stored in trie.
- Greatly speeds search at cost of more memory.
- 8 bit stride gives 4 memory accesses (steps) for IPv4
- requires much more memory than trie
- optimize space usage using variable length stride
- Like standard tries, supports in-place modification.
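A fixed-stride multibit trie can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides. It uses a 4-bit stride and expands each prefix to every slot it covers within its node (a standard technique often called prefix expansion). The names `Node`, `insert`, `lookup` and `STRIDE` are hypothetical, and only a few prefixes from the earlier routing table are loaded.

```python
STRIDE = 4  # bits consumed per memory access (assumed for this sketch)


class Node:
    def __init__(self):
        size = 1 << STRIDE
        self.hop = [None] * size     # best next hop for each stride value
        self.hop_len = [-1] * size   # length of the prefix remainder that set it
        self.child = [None] * size


def insert(root, prefix, next_hop):
    """Insert a bit-string prefix, expanding its last partial stride."""
    node = root
    while len(prefix) > STRIDE:
        idx = int(prefix[:STRIDE], 2)
        if node.child[idx] is None:
            node.child[idx] = Node()
        node = node.child[idx]
        prefix = prefix[STRIDE:]
    free = STRIDE - len(prefix)                  # unspecified low-order bits
    base = (int(prefix, 2) << free) if prefix else 0
    for idx in range(base, base + (1 << free)):
        if len(prefix) >= node.hop_len[idx]:     # longer prefix wins the slot
            node.hop[idx] = next_hop
            node.hop_len[idx] = len(prefix)


def lookup(root, address):
    """address: bit string whose length is a multiple of STRIDE.

    Returns (next hop, number of nodes accessed).
    """
    node, best, accesses = root, None, 0
    for i in range(0, len(address), STRIDE):
        idx = int(address[i:i + STRIDE], 2)
        accesses += 1
        if node.hop[idx] is not None:
            best = node.hop[idx]
        node = node.child[idx]
        if node is None:
            break
    return best, accesses


root = Node()
for prefix, hop in {"10": 7, "1011": 5, "1011001": 3,
                    "1011010": 5, "10110011": 8}.items():
    insert(root, prefix, hop)

# The 11-bit example address padded with one 0 to a multiple of the stride.
print(lookup(root, "101100110110"))
```

With this layout a lookup touches at most (address length / stride) nodes, so an 8-bit stride over a 32-bit IPv4 address gives the 4 accesses quoted above. The cost is the 2^stride slots allocated per node.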
15 Coding Subtrees with Bitmaps
- Compact form of multibit trie node
- Siblings stored contiguously.
- Separate array points to next hop info.
- with 4 bit stride
- <8 bytes per prefix
- also, <8 bytes per node
- 9 accesses for v4 or 7 with 9 bit starter array
- Good for hardware implementation.
address: 101 100 011 000
[Figure: trie nodes coded as 7-bit and 8-bit bitmaps, traversed for the address above]
16 Tuple Space Search
<0110,1101>
- Place filters in groups by length.
- Use hashing to look up in group.
- Markers used to speed up search.
- Filters include pointer to best matching sub-filter.
- Worst-case number of hash lookups equals address length.
- Typical performance much better.
- Slow to update.
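The grouping-and-hashing idea can be shown with a deliberately simplified sketch. It omits the markers and sub-filter pointers mentioned above and simply probes every length group once. The class `TupleSpace`, the action labels, and the tie-break rule (the filter specifying the most bits wins) are assumptions made for illustration.

```python
from collections import defaultdict


class TupleSpace:
    """Two-field filters grouped by (source length, destination length)."""

    def __init__(self):
        # (src_len, dst_len) -> {(src_prefix, dst_prefix): action}
        self.tuples = defaultdict(dict)

    def add(self, src_prefix, dst_prefix, action):
        key = (len(src_prefix), len(dst_prefix))
        self.tuples[key][(src_prefix, dst_prefix)] = action

    def classify(self, src, dst):
        """Probe each group with one hash lookup; return (action, probes)."""
        best, best_bits, probes = None, -1, 0
        for (slen, dlen), table in self.tuples.items():
            probes += 1
            action = table.get((src[:slen], dst[:dlen]))
            # Assumed priority rule: prefer the filter with the most bits.
            if action is not None and slen + dlen > best_bits:
                best, best_bits = action, slen + dlen
        return best, probes


ts = TupleSpace()
ts.add("", "", "default")
ts.add("0", "11", "f_0_11")
ts.add("01", "11", "f_01_11")
ts.add("011", "", "f_011")
ts.add("10", "1", "f_10_1")
ts.add("101", "111", "f_101_111")

print(ts.classify("0110", "1101"))
```

Each group costs one exact-match hash probe regardless of how many filters it holds, so the work per packet depends on the number of distinct length pairs, not on the number of filters.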
17 Pruned Tuple Space Search
[Figure: <0110,1101> shown against filters grouped by tuple: <,>, <,1>, <,10>, <,110>, <,011>, <1,>, <0,>, <1,0>, <0,11>, <0,10>, <1,011>, <10,1>, <10,01>, <01,11>, <110,>, <011,>, <101,111>. Stored bit strings identify rows with matching filters.]
- Tuples that match in both directions are searched.
- Typically just a few hash lookups needed.
- Fast insertion and removal.
18 Exponential Growth
- Rapid growth in traffic, fiber and electronics.
- Internet growth rates exceed technology growth rates.
- Public Internet traffic still surprisingly small.
- Keeping up with growth not hard now, but later?
- Near term challenge is better services to spur growth.
- Longer term - fix scaling problems, use new technologies.
19 Projected Router Capacities
- Router capacity growing with Internet.
- Scalable switch fabrics.
- Gigabit serial links.
- Optical interconnects.
- Cost breakdown.
- 35% external optics
- 20% internal optics
- 15% memory
- 20% port ASICs
- 10% fabric ASICs
- 10 Tb/s systems feasible today.
- 100 Tb/s by 2005-2007.
20 Making Routers More Programmable
- Technology advances adding new functionality.
- logic capabilities growing much faster than IO
- packet classification, per flow queueing becoming common
- single chip packet processing engines with 16 processors now becoming available
- Application-specific processing in routers could become routine.
- active networking is one way to exploit trend
- alternative model
- signalling and resource reservation
- packet classification and flow-specific routing
- Key challenge is supporting application development.
- Network Programming Interface (NPI) and network resource mgmt.
21 Scalable Programmable Router
[Figure labels: System Controller; Packet Class. Queueing; Filter Memory; Queue Memory; Port Processor; Active Proc. Chip]
22 Port Processor Details
Packet Classifier
Queue Controller
- Active Proc. Unit
- proc. mem. ctl.
- cache DRAM
I/O Controller
23 Active Processing Technology Projection
- Extensive flow-specific processing will be feasible.
- How to achieve greatest benefit?
24 Prototype Active Router
- Control Processor
- global coordination and control
- routing protocols
- build routing tables and other information needed by ANPEs
- first level code server
- reprogrammable for active processing
25 Cell Processing
26 Packet Processing
27 Principal Data Flows Through PE Kernel
[Figure labels: Packet Classification and Routing; IPv4/6 Header Processing; Packet Flow Id; Plain Packets; IP Packets; Kernel Plugins; Active Packets; Active Function Dispatcher; Driver; Packet Scheduler; SAPF Packet Selector/Dispatcher; Resource Controller]
- Std. proc. for plain IP packets.
- classification and routing, header processing, output queueing
- Active packets move through configured kernel plugins.
- active function dispatcher passes packets to instances of plugin objects
- instantiates objects or triggers download of plugin class, as needed
- streamlined processing of SAPF packets using pre-established state
28 System Level Software Organization
29 Towards an Open Research Router
- Modular components.
- ability to swap components - both hardware and software
- routing, signalling, management software
- address lookup and packet classification
- queueing and packet scheduling
- open, documented and straightforward interfaces
- Dynamic insertion of application-specific processing.
- active networking model and others
- High performance.
- gigabit links and scalability to large numbers of ports
- packet processing rates of at least a million/second per link
- application-specific processing on large fraction of traffic
- allows credible demonstrations needed to influence commercial practice
30 Objectives for Optical Routers/Switches
- Must compete with best electronic alternatives.
- multistage switch architectures can reach 10 Tb/s capacities
- need just 2-4 interconnection network ASICs per OC-192
- parts cost just $10-20 per Gb/s of capacity, at $50 per ASIC
- overall systems cost around $1K per Gb/s of capacity
- variable length packets, sophisticated routing, and queue management
- Requirements
- system throughput .1-10 Pb/s
- improved cost/performance $100-1K per Gb/s of capacity
- efficient handling of short (1KB) and long (1MB) data bursts
- sophisticated resource and congestion management
- multicast services
31 Burst Switching System Concept
32 Statistical Multiplexing Performance
33 Key Architectural Issues
- Separation of data path and control.
- use best electronic methods to manage resources
- keep data path simple to facilitate all-optical implementation
- Lots of channels per link for good stat. mux.
- Buffer sharing to maximize performance gains.
- Careful management of control/data timing skew.
- local timestamps and BHC offset field to track variations
- Look-ahead resource management.
- Optical parts cost is major implementation concern.
- need integrated multichannel devices
- e.g. fast multichannel wavelength selection/conversion
34 Burst Switch Architecture
[Figure labels: routing lookup; switching queueing]
- Input/Output Modules (IOM)
- Burst Switch Elements (BSE)
- Supports n = d^k ports with 2k-1 stages and d-port BSEs.
35 Input/Output Link Module
36 Look-Ahead Resource Management
Arriving Burst
37 Burst Switch Element
- Bursts switched through XBAR.
- Burst Store Unit (BSU) queues waiting bursts.
- Control section
- ATM switch element
- d Burst Processors (BP)
- Burst Storage Manager (BSM)
- Each BP manages channels for 1 output.
38 Possible Crossbar Implementation
- WDM crossbar with
- d inputs and outputs
- h WDM channels per link
- Each section requires
- dh optical space switch
- h wavelength selectors
- h wavelength converters (fixed output wavelength)
- n = d^k port system uses
- n(2k-1) dh optical switches
- nh(2k-1) λ-selectors
- nh(2k-1) λ-converters
- 1.3 Pb/s system with n=512, h=256, d=8, uses 5 λ-selectors per OC-192 channel.
- Require <$500/λ-selector.
[Figure: crossbar sections, each with d inputs and h channels feeding wavelength selectors (WS) and wavelength converters (WC)]
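The component counts on this slide can be checked with a short calculation. The sketch below reads the port count as n = d^k, an interpretation that matches n=512 and d=8 with k=3, and it assumes a round 10 Gb/s per OC-192 channel. The function name `crossbar_parts` and the dictionary keys are hypothetical.

```python
def crossbar_parts(d, k, h, channel_gbps=10):
    """Count optical parts for a system built from d-port elements.

    Assumes n = d**k ports, 2k-1 stages, h WDM channels per link and
    about 10 Gb/s per OC-192 channel (a rounded figure).
    """
    n = d ** k
    stages = 2 * k - 1
    selectors = n * h * stages
    return {
        "ports": n,
        "stages": stages,
        "space_switches": n * stages,
        "selectors": selectors,
        "converters": n * h * stages,
        "selectors_per_channel": selectors // (n * h),
        "capacity_pbps": n * h * channel_gbps / 1e6,
    }


parts = crossbar_parts(d=8, k=3, h=256)
for name, value in parts.items():
    print(f"{name}: {value}")
```

Because every stage needs one selector per channel, the number of selectors per channel equals the number of stages, which is why per-selector cost dominates the parts budget.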
39 Horizon Scheduling
- For each channel, maintain horizon.
- Determine viable channels and select best.
- Simple hardware implementation.
- Good results when only small variations in offsets.
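A minimal sketch of the idea follows. It assumes that a channel's horizon is the time at which its last scheduled burst ends, that a channel is viable when its horizon is no later than the burst's arrival, and that the best viable channel is the one with the latest horizon (leaving the smallest idle gap). These definitions and the name `schedule_burst` are assumptions for illustration, since the slide does not spell them out.

```python
def schedule_burst(horizons, arrival, length):
    """Assign a burst to a channel, or return None if no channel is viable.

    horizons[c] is the time at which channel c becomes free (assumed meaning).
    """
    viable = [c for c, h in enumerate(horizons) if h <= arrival]
    if not viable:
        return None                      # burst must be buffered or dropped
    # Assumed selection rule: latest horizon wastes the least channel time.
    best = max(viable, key=lambda c: horizons[c])
    horizons[best] = arrival + length    # channel is busy until the burst ends
    return best


horizons = [5, 12, 9]
print(schedule_burst(horizons, arrival=10, length=4), horizons)
print(schedule_burst(horizons, arrival=11, length=3), horizons)
print(schedule_burst(horizons, arrival=11, length=1), horizons)
```

Only one number per channel is stored, which is what makes a hardware implementation simple. The price is that idle gaps before a channel's horizon are never reused.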
40 Scheduling Bursts Out of Order
- Poor results for horizon scheduling when BHC and burst arrival order differ.
- Defer BHC processing to allow scheduling in burst arrival order.
- Uses resequencing buffer.
41 Burst Storage Management
[Figure: projected buffer usage plotted against time]
- Need good representation of projected buffer usage.
- fast lookup to determine if a new burst will fit
- incremental modification to add new burst to schedule
- cleanup to remove old data as time advances
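The three operations listed above can be illustrated with a simple list-based sketch. It is not the search-tree representation presented in the talk. It assumes each stored burst occupies a fixed amount of buffer over a half-open interval [start, end), and the class `BufferSchedule` and its method names are hypothetical.

```python
class BufferSchedule:
    """Projected buffer usage kept as a plain list of scheduled bursts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.bursts = []                 # (start, end, size) tuples

    def usage_at(self, t):
        return sum(size for s, e, size in self.bursts if s <= t < e)

    def peak_usage(self, start, end):
        # Usage only rises when a burst starts, so checking the window start
        # and every burst start inside the window finds the peak.
        points = [start] + [s for s, _, _ in self.bursts if start < s < end]
        return max(self.usage_at(t) for t in points)

    def fits(self, start, end, size):
        """Lookup: would a new burst stay within capacity?"""
        return self.peak_usage(start, end) + size <= self.capacity

    def add(self, start, end, size):
        """Incremental modification: add the burst if it fits."""
        if not self.fits(start, end, size):
            return False
        self.bursts.append((start, end, size))
        return True

    def cleanup(self, now):
        """Drop bursts that have already left the buffer."""
        self.bursts = [b for b in self.bursts if b[1] > now]


sched = BufferSchedule(capacity=10)
print(sched.add(0, 20, 5))     # first burst
print(sched.add(10, 30, 5))    # overlaps the first, buffer exactly full
print(sched.fits(15, 25, 1))   # no room while both bursts are stored
print(sched.fits(20, 25, 5))   # room again once the first burst has left
sched.cleanup(now=20)
print(len(sched.bursts))
```

Every operation here scans the whole list, so the cost grows with the number of scheduled bursts. A tree-structured representation is the natural way to make the lookup fast.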
42 Basic Search Tree
43 Differential Search Tree
44 Prototype System Configuration
45 Inserting Burst Switches into Data Nets
- Use packet classification in Edge Routers to separate traffic.
- long packets to Burst Switches, short packets to routers
- Long packets usually part of longer data transfers.
- lets burst switch interfaces assemble larger bursts
46 Alternate Strategy for Optics in Data Networking
- Hybrid network architecture
- optical circuit switches with limited (or no) wavelength conversion
- IP routers at boundaries
- Most transit traffic switched optically.
- if average of five hops per packet, network needs 5x as many transit as local ports
- replace most router ports with OXC ports
- Reconfigure inter-router paths based on traffic demand
- routing algorithm must understand both optical and IP layers
- perform global topology optimization
- Low cost optical switch
- no wavelength selectors or converters
- Other issues
- traffic monitoring to drive routing algorithm
- switch-over management
- striping data over parallel channels
47 Summary
- Short term.
- overcome deficiencies of IP
- make high bandwidth accessible and affordable
- and build real applications that use it!
- improve performance, reliability, ease-of-use
- Medium term.
- make network elements more programmable
- develop effective Network Programming Interface (NPI)
- create network resource management mechanisms
- Long term.
- look ahead to post-silicon era - photonics, quantum comp.
- develop strategies for exploiting new technologies
- find better strategies for network technology evolution