Routing algorithms - PowerPoint PPT Presentation


PPT – Routing algorithms PowerPoint presentation | free to download - id: 791078-MzcxM


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

Routing algorithms


Title: Advances in Ethernet Author: Yaakov Stein Keywords: Ethernet, 802, EFM, RPR Last modified by: yaakov_s Created Date: 12/6/2001 8:31:54 AM Document presentation ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 68
Provided by: Yaak4
Learn more at:


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Routing algorithms

  • Yaakov (J) Stein Sept 2009
  • Chief Scientist
  • RAD Data Communications

  • Control Plane
  • principles of IP routing
  • longest prefix match
  • RIBs and FIBs
  • Forwarding plane
  • classification and lookup mechanisms
  • switching fabrics

  • Introduction

  • A router is a combination of hardware, software,
    and memory
  • that is responsible for forwarding packets
    towards their destinations
  • Routers generally work at ISO layer 3 (network
  • but can also function at layer 2.5 (for MPLS)
  • and may inspect higher layers, but only for
  • (QoS management, load balancing, etc.)
  • Note that Ethernet switches technically filter
    rather than forward
  • In order to correctly fulfill their function
    (i.e. to know where to forward)
  • routers usually run routing protocols
  • to exchange information between themselves
  • Ethernet switches do not need such protocols as
    they learn how to filter
  • So the router performs 2 distinct algorithms
  • forwarding algorithm (forwarding component)
  • routing algorithm (control component)

What does a router do ?
  • Control plane (routing algorithm)
  • run routing protocols
  • identify interface and next hop L2 addresses
  • populate RIBs (if Link State, perform SPFs)
  • scan all RIBs, and produce FIB (entries map FEC
    to NH)
  • Data plane (forwarding algorithm)
  • deframing (CRC/checksum/defragmentation/reassembly
  • parsing (pulling values from appropriate fields
    simple IPv4 DA,
    complex finding URL or MIB variable)
  • FEC classification (add metadata, based on DA,
    DAToS, MPLS, )
  • lookup / search
  • packet modification and replication
  • framing
  • traffic management and queuing
  • compression, encryption, etc.

IP networks
  • IP networks are made up of
  • hosts
  • middleboxes (e.g., firewalls, NATs, NAPTs,
    Application Layer GWs)
  • routers (obsolete terminology gateways)
  • It will be useful to differentiate between
  • core routers (connect to other routers)
  • edge routers (connect to hosts)
  • We will see shortly that it is more complex than
  • To understand how a router is different from
    other network elements
  • we need to know the basic principles the IP
    protocol architecture
  • We will mainly deal with IPv4 unicast forwarding

Routing Slide 6
The basics (1)
  • The first principle of IP is the end-to-end (E2E)
  • All functionality should be implemented only with
    the knowledge
  • and help of the application at the end points
  • The second principle is the hourglass model
  • IP (l3) is the common layer
  • below IP (L3) is not part of IP suite, above is
  • Thus
  • most functionality and state is in the hosts
  • middlebox functionality is severely limited
  • routers are limited to forwarding packets
  • without extensive packet manipulation (exception
    - TTL)
  • The third principle is that forwarding is
  • connectionless
  • on a hop-by-hop basis

Routing Slide 7
The basics (2)
  • The fourth principle is that unicast IP
    forwarding is performed
  • based on a Destination Address (DA)
  • Addresses must usually be unique (end-to-end
  • Hosts usually have a single IP address, routers
    have many addresses
  • It is the responsibility of a service provider
    (SP) to allocate addresses
  • IP addresses are not arbitrary, like Ethernet MAC
  • The fifth principle is that IP addresses are
    aggregated into subnetworks
  • All addresses in a subnetwork share a common
  • Subnetworks may be further aggregated
  • The sixth principle is that it is the
    responsibility of the router
  • to forward towards the hosts subnet
  • but it is not its responsibility to deliver the
    packet on the subnet
  • the IP suite starts above L2
  • subnets L2 (e.g., Ethernet, PPP) delivers the
    packet to the host

IP Routing types
  • Distance Vector (Bellman-Ford), e.g. RIP, RIPv2,
  • send ltaddr,costgt to neighbors
  • routers maintain cost to all destinations
  • need to solve count to ? problem
  • Path Vector, e.g. BGP
  • send ltaddr,cost,pathgt to neighbors
  • similar to distance vector, but w/o count to ?
  • like distance vector has slow convergence
  • doesnt require consistent topology
  • can support hierarchical topology gt exterior
    protocol (EGP)
  • Link State, e.g. OSPF, IS-IS
  • send ltneighbor-addr,costgt to all routers
  • determine entire flat network topology (SPF -
    Dijkstras algorithm)
  • fast convergence, guaranteed loopless gt
    interior routing protocol (IGP)
  • convergence time is the time taken until all
    routers work consistently
  • before convergence is complete packets may be
    misforwarded, and there may be loops

  • Control Plane
  • (Routing)

  • Based on the 6 principles we can understand what
    a router does
  • The router looks at the packets DA
  • It deduces to which subnet the packet belongs
  • If the router can directly interface that subnet
  • it must use the appropriate L2 to send the
    packet to the host
  • Otherwise it must retrieve the next hop (router)
  • that sends the packet towards the subnet
  • The next router does the same
  • If routing has converged there will be no loops
    or black holes
  • but there may be during transients
  • The information needed by the router to properly
    forward packets
  • is stored in the Forwarding Information Base
  • The FIB associates address prefixes with Next
    Hops (NHs)
  • (and, to save an additional lookup, usually with
    L2 addresses as well)
  • Do not confuse the FIB with a Routing Information
    Base (RIB)

More on FIBs
  • Simple (and primitive) routers have a routing
  • modern large routers have several different
  • The FIB is designed to be fast to search
  • we will talk about its data structure later
  • RIBs are designed to be fast to update
  • which is quite a different structure
  • There may be many RIBs, one for each routing
    protocol running
  • and static routes may be entered into any of
  • There are sometime other databases as well
  • for example link state routing protocols
    require a LSDB
  • from which the RIB is built
  • The basic idea is that we build the FIB from the

Router interfaces
  • Routers connect to hosts and to other routers via
  • (from 1 to many thousands of interfaces per
  • Routers are responsible for forwarding packets
  • arriving at an ingress interface
  • to an egress interface
  • Interfaces have layer 3 and above properties
  • and also contain layer a and 2 properties (ports)
  • Each interface is assigned a unique IP address
  • Interfaces are grouped into subnetworks
  • All interfaces on a subnetwork share the same

Prefixes and masks
  • Since 1993 (RFC 1519 - CIDR) subnets can have any
    length prefix
  • There are two ways of specifying the prefix
  • slash notation, e.g.,
  • note unspecified bits are set to zero
  • mask notation, e.g., with mask
  • Note that means all addresses
  • from through
  • Note that it contains,, etc.
  • since they are in the range and have longer
    prefixes (larger masks)
  • /32 are fully qualified IP addresses
  • matches every IP address
  • it is the default route
  • route taken when there is no matching entry in
  • the gateway of last resort

Prefix tables
slash mask
A.B. 0.0/15
A.B. 0.0/14
A.B. 0.0/13
A.B. 0.0/12
A.B. 0.0/11
A.B. 0.0/10
A.0.0.0/7 254. 0.0.0
A.0.0.0/6 252. 0.0.0
A.0.0.0/5 248. 0.0.0
A.0.0.0/4 240. 0.0.0
A.0.0.0/3 224. 0.0.0
A.0.0.0/2 192. 0.0.0
A.0.0.0/1 128. 0.0.0
slash Mask
Note for /25 D0 or 128 for /26 D 0, 64,
128, or 192 etc.
Special IP addresses
  • Some IP addresses are reserved for special
  • they are not assigned by IANA
  • and may require special treatment by router

prefix range purpose defaults private addresses loopback addresses - zeroconf - private addresses - Documentation - IPv6-IPv4 relay - private addresses - device benchmark multicast reserved
Longest prefix match and FECs
  • To find the subnet, we need to look at the
    packets DA
  • and to find the best match for the DA that is
  • find known the FIB entry that matches the longest
    prefix of the DA (LPM)
  • All packets that are forwarded in the same way
    are grouped
  • into a Forwarding Equivalence Class (FEC)
  • In the simplest case, a FEC is simply a known IP
    address prefix
  • i.e., the packets subnet
  • In more complex cases it might get more complex
  • for example, ToS field, source address, etc.
  • Every packet has to be looked up in the FIB and
    classified to a FEC
  • and this forwarding has to be done fast

  • Assume the following FIB
  • and lets look up (last byte
  • It matches the first entry with prefix length 0
    (everything matches)
  • It matches the second with length 24 (first three
  • It matches the third entry with prefix length 25
    (last byte 1xxxxxxx)
  • It does not match the fourth entry (11xxxxxx )
  • So the packet is forwarded to next hop C

prefix next hop IP address (gateway of last resort) A B C D
Packet processing time
How much time do we have to process a packet
? disregarding L2 overhead, IPGs, etc. So
the FIB data structure has to be optimized for
fast look-up
100 Mbps 1 Gbps 10 Gbps
64 B 5 msec 500 nanosec 50 nanosec
256 B 20 msec 2 msec 200 nanosec
1500 B 120msec 12 msec 1.2 msec
Policy and autonomy
  • The FIB information is based on
  • static routes
  • dynamic routes determined by routing protocols
  • gateway of last resort (default route)
  • In general we need to apply policy, since
  • there are conflicting sources of information
  • may not want to use, or even believe, information
    received from peers
  • it is all a matter of autonomy
  • an Autonomous System can request service from
  • but can not force it to provide service

Autonomous systems
  • Routers are grouped into Autonomous Systems (ASs)
  • ASs may be grouped into domains
  • AS look to the outside world as single entity
    (they usually have an AS ID)
  • Routers in the same AS obey a common policy, and
    trust each other
  • AS are truly autonomous
  • one AS can request another to forward a packet,
    but can not force it to
  • Inside ASs we run Internal Gateway Protocols
  • e.g. OSPF, IS-IS
  • Between ASs we run External Gateway Protocols
  • e.g., BGP
  • A router that runs both is called an AS Border
    Router (ASBR)

More of the story
  • Actually, it can get a lot more complicated
  • In general, a router will be running multiple
    routing protocols
  • For example
  • one or more IGPs (RIP,OSPF, IS-IS) between
    routers in the same AS
  • internal BGP (iBGP) between routers in the same
  • (usually a full mesh, but when too complex we
    can use route reflectors)
  • external BGP (eBGP) between ASBRs in different
  • How does a router know if a BGP session is iBGP
    or eBGP ?
  • by the AS number !
  • IGP is used to find a path to another router
    (including ASBR) in the same AS
  • eBGP is used by ASBRs to learn / distribute
    routes to other ASs
  • iBGP is used for ASBR to inform core routers of
    external routes

Simplest example
  • Stub ASs (my home router)
  • single homed to outside world
  • single internal subnet, so dont need IGP
  • single homed, so dont need to run BGP to ISP
  • dont need to have an AS number
More complex example
  • Connecting to a server connected to another ISP
    with dual homing
  • Routers 3 and 6 learn from eBGP how to reach
  • Policy determines that 3 will be used (see later)
  • Router 1 learns from iBGP session that A.B.C.D is
    reachable via router 3
  • Router 1 learns from IGP that router 3 is
    reachable via router 2
  • Router 2 knows how to directly reach router 3
    because of IGP adjacency
  • Packet from a.b.c.d is forwarded via 1-2-3 to AS
    2 and to A.B.C.D

AS 2
AS 1
eBGP sessions
Full mesh of iBGP sessions
Even more complex example
  • Three ASs, with one possibly acting as a transit

AS 1 would like to hand off the traffic to AS
2 AS 2 has no economic incentive to carry this
traffic But AS 2 gets the route from AS3 What can
AS 2 do to stop this ? (remember autonomy!) What
can AS 1 then do ?
Rules for customer ASs
  • Stub AS
  • Single-homed AS does not need to learn routes
    from provider
  • It only has to send all traffic via its unique
    exit point (
  • Provider gets routes from static or IGP or
    private-AS eBGP
  • Multihomed Nontransit AS
  • AS advertises only its own routes to both SPs
  • AS filters out traffic for foreign routes that
    reach it via static/default routing
  • eBGP is not needed, but recommended for route
    propagation and filtering
  • Multihomed Transit AS
  • Uses eBGP to SPs and iBGP for transit traffic

BGP rules
  • There are
  • internal routes
  • external routes
  • customer routes
  • When eBGP learns a route it is repeated via iBGP
    to all others in AS
  • thus all routers in AS learn it
  • When iBGP learns a route it is repeated only to
    externals via eBGP
  • since internals also get it directly
  • When there is another ASBR that can reach the
    same other AS
  • a second route is repeated by iBGP to the ASBR
  • The ASBR will then make the decision as to which
    to use !

IGP rules
  • IGPs are used between routers in the same AS
  • so IGPs do not have sophisticated policy control
  • routers usually blindly accept all information
  • For proper operation (no routing loops)
  • all routers in AS must have the same IGP RIB
  • for link state protocols (OSPF, IS-IS)
  • there is a Link State Data Base (LSDB)
  • from which IGP RIBs can be constructed (will be
    explained shortly)
  • Because all routers have the same LSDB
  • Although the forwarding is hop-by-hop
  • the result is the same as if there were
  • IGPs
  • do not scale to inifinity
  • require complete knowledge
  • are not suitable for interaction with
    non-trusted routers
  • since a single misconfiguration can be fatal

  • We said before that LS routing protocols have
    another database
  • LSDB contains representation of every router and
    link in the AS
  • implicitly holding the complete topology of the
  • In addition, the LSDB associate costs (metric)
    with every link
  • RIP - the metric is always hop count, no
    non-trivial metric
  • OSPF - the metric is more general, for example
    link length
  • These costs form a matrix
  • The topology is symmetric, but the costs need not

From \ to Router A Router B Router C
Router A M(A,B) M(A,C)
Router B M(B,A) M(B,C)
Router C M(C,A) M(C,B)
  • Each router can independently calculate the
    least-cost path
  • to every other router in the AS
  • A Shortest Path First (SPF) algorithm (e.g.,
    Dijkstras algorithm)
  • is used to compute a tree of the shortest paths
    to all destinations
  • Each route in the SPF tree is an entire path
  • but for each router we can extract the next hop
  • and build the RIB for that router (each router
    has its own RIB)
  • From the RIBs we build the FIB needed for
    efficient forwarding of packets

Graph search algorithms
  • There are many algorithms for search on graphs
  • Breadth first
  • Bellman-Ford
  • Iterative deepening
  • Depth first (backtracking)
  • Depth limited
  • Best first
  • Greedy algorithms
  • Dijkstras algorithm
  • Beam search
  • A
  • B
  • etc. etc.
  • We will discuss graphs, trees, etc. later on

Dijkstras algorithm
  • Graph search algorithm first described by Edsger
    Dijkstra in 1959
  • It assumes additive, non-negative, costs for each
    link in graph
  • It is a best-first greedy algorithm
  • Think of a city street map
  • We want to from initial intersection to
    destination one with the least walking
  • Start at the initial intersection its distance
    is zero
  • Measure and label the distances to all adjacent
    intersections (breadth first)
  • Choose the closest one (this is the greedy step)
  • Consider all the neighbors of the chosen
  • If the distance (sum of the distance to the
    chosen intersection and the distance from chosen
    intersection to neighbor) is the shortest known
    way to get to that neighbor
  • then remember that distance (not a tree!)
  • Once you have considered all neighbors of the
  • mark the chosen intersection as visited (its
    distance is now known)
  • Choose the unvisited intersection with shortest
  • Continue until all intersections have been visited

Dijkstras algorithm - formal
  • Let's call the node we are starting with an
    initial node.
  • The COST of node X will be the distance from the
    initial node,
  • i.e., the sum of distances of all links along the
    path from the initial node to X
  • Initialization
  • Set initial nodes COST to zero, all others to
  • Set initial node as current, all other as
  • Main step
  • For all unvisited neighbours of current node
  • Calculate their distances from the initial node
  • COST(neighbor) COST(current) DIST(current
    to neighbor)
  • If this is less than what is presently marked,
    overwrite the marking
  • When all neighbors have been considered, mark
    current node as visited
  • (once visited, this nodes COST is final)
  • Recurse
  • Select the unvisited node with the smallest COST
    as current node
  • Go to Main step

Implementation issues
  • If our graph has N nodes and L links
  • In a straightforward implementation of Dijkstras
  • finding the unvisited node with lowest cost takes
  • this is done N times
  • so the total computation for this is O(N2)
  • the computation of the distances to every node
    takes O(L)
  • since each link is followed once (to the node it
    lands on)
  • So the total complexity is O(N2 L) O(N2)
  • By using more sophisticated data structures
    (Fibonacci heap)
  • this can be reduced to O(N log N L) O(N log

  • So we are finally ready to see how the FIB is
  • First rejection rules are applied, for example
  • do not accept routes from ASs without agreements
  • do not accept routes that loop
  • (e.g., BGP advertisements with AS number in the
  • Then install FIB entries according to policy, for
  • first install Static routes
  • then routes from IGP RIB
  • choose eBGP before iBGP
    (hot potato
    rule- get it out of my network - let someone else
  • if there are different routes from BGP
    the route with highest local preference
  • if routes have equal local preference
    choose the
    route with the shortest AS-PATH
  • if routes have equal AS-PATHs
    choose the route
    with the lowest origin number
  • if still equal choose highest BGP peer address

  • Not everything received is accepted for inclusion
    in the FIB
  • Not everything accepted for inclusion in the FIB
    is further advertised
  • Never advertise information not accepted to FIB !

  • Data Plane (Forwarding)

  • Now that we have a FIB installed, lets forward
    some packets !
  • There are main steps to forwarding
  • classification
  • switching
  • misc. (scheduling, queuing, QoS, compression,
    encryption, )
  • The operations
  • may be stateless vs. statefull
  • may change over time
  • may involve peeking into the packet (Deep Packet
  • may be recursive
  • Packet forwarding can be done by SW or HW or some
  • We normally differentiate between
  • the fast path (simple forwarding)
  • the slow path (control protocol packets)

Lookup and data structures
  • Lookup comes in several varieties, such as
  • Exact match (e.g., MAC addresses, VLANs, IP
  • Longest Prefix Matching (LPM) (needed for
    searching FIBs)
  • Range matching (e.g., ports, firewalls)
  • In order to optimize lookup, we use appropriate
    data structures
  • Wirths law Programs algorithms
    data structures
  • We need to perform the following on our data
  • Insert
  • Delete
  • Modify
  • Search
  • and to check the following metrics
  • Time complexity for each of the above
  • Size (spatial) efficiency
  • Scalability

LookUp Table
  • The simplest (and fastest) data structure is the
    LookUp Table (LUT)
  • AKA indexed array, Location Addressable Memory
  • The incoming address is used as an index to
    access the (NH) information
  • We can put in more info, e.g., L2 type and
    address, to save further lookups
  • Example
  • Limitations
  • only for exact match, not LPM
  • limitation only for small number of possible
    addresses, e.g. VLANs
  • We can use LUTs after other data structures that
    return a key as an index

address interface NH L2 type L2 NH address
0 1 Ethernet 00-17-42-F7-14-14
1 2 PPP -
Hash tables
  • Hash tables enable handling a large number of
    potential addresses
  • A hashing function is a function
  • from a large variable (large number of bits or
    large number of bytes)
  • to a small variable (small number of bits or
  • which is white (small changes in the input create
    widely different outputs)
  • Hashing long addresses returns a short index
  • The problem is that the hashing function is not
  • so there will always be the probability of hash
    clashes (collisions)
  • Solutions
  • perfect hashing only when addresses are known
    ahead of time
  • index in table returns list of all addresses
  • multiple hashing linear probing, quadratic
  • Hashing is very good for exact match (e.g. MAC
  • but is not suitable for LPM

Hash implementation
  • From an efficiency point of view
  • hash tables are between LUTs and search tables
    (to be discussed next)
  • To control collisions, we need a relatively large
    table (birthday paradox !)
  • Example
  • 99 probability of collision
  • when 3000 entries are put into a hash table of
    size 1 million
  • Using multiple hashing
  • average computational load is O(1
  • A primitive hash function is the modulo hash
  • H(key) addr mod table-size
  • table-size should be prime must not be a power
    of 2 or close
  • A better hash function is (for appropriate
    integer m and fraction f)
  • H(key) Trunc m Frac( f addr)
  • The quality of the hashing function depends on f
  • For Fibonacci hashing f 1 / ?
  • ? is the golden ratio ½ (v5 2) 1.618

23 people ½ 57 people 99
Search table
  • The most spatially efficient data structure is
    the search table
  • It is similar to a LUT
  • but the addresses being looked up are not
  • rather, we need to sequentially search the table
    for the address
  • We can reduce the search time by ordering the
  • Search tables are good for LPM !
  • For example
  • Order FIB from most specific (longest prefix) to
    least (
  • Loop through FIB list until find the first match

row prefix interface NH
0 1
1 2
2 2
3 3
Limitations of search tables
  • Search tables are not limited in size like LUTs
  • but this comes at a price
  • it is expensive to search for an address
  • it is expensive to modify the information for an
  • it is very expensive to insert or delete
    addresses (copy!)
  • Search table FIBs can be rebuilt from RIBs each
  • For exact match it is possible to speed up by
    binary search
  • order by address
  • guess position in middle of table range where key
    should be
  • choose new range by comparison

Linked lists
  • Search tables are hard to maintain
  • if we need to insert or delete an element we
    need extensive moving
  • Linked lists are designed to simplify mechanics
    of such updates
  • still need exhaustive search to find where to
  • Linked lists can be singly or doubly linked
  • Skip lists increase efficiency by enabling
    skipping over ranges

Linked list implementation
  • If we already know where to insert or what to
    delete or whom to modify
  • then linked lists are very efficient
  • However, search is O(N) where N is the number of
  • worst case N
  • average case N / 2
  • Properly constructed multi-level skip lists take
    O(log N) on average
  • but are still O(N) in the worst case
  • But if we already need to use double pointers
  • then trees are better

Tree structures
  • A graph is a collection of nodes and edges
    connecting them
  • In a directed graph the edges have direction
  • and so or every edge there is a father node and
    a child node
  • Nodes without children are called leaves
  • A forest is a directed graph without loops (only
    one path between 2 nodes)
  • A tree is a forest with a single root -- it thus
    defines a partial order
  • (it is conventional to draw the tree upsidedown)
  • A binary tree has no more than 2 output edges per
  • Trees can be implemented using arrays, pointers
    (like linked lists), heaps, etc.

Search trees
  • Search trees may store data
  • in their nodes
  • in leaves only
  • on edges
  • combinations of the above
  • For general search trees searching can be
    breadth-first or depth-first
  • Breadth-first
  • start at the root
  • find all children of root and check for desired
  • if not found, find all childrens children
  • recurse
  • Depth first
  • start at the root
  • find first child and check for desired data
  • if not found, find first child of first child
  • recurse until data found or leaf
  • if leaf backtrack to father and try the next

Binary trees
  • Binary Search Trees simplify storage and
  • We call the children of a node L and R
  • Perfect binary trees have exactly 2 children for
    each internal node
  • Thus there are exactly 2H leaves, where H is the
    height of the tree
  • Altogether there are (2H1 1) nodes
  • To store keys in a binary search tree
  • place keys in all nodes
  • for every intermediate node N
  • key(L) lt key(N)
  • key(R) gt key(N)
  • To search for a key in a BST
  • start at root (set current node to root)
  • check if key is stored in current node
  • if not if key lt key(N) then set current to L
    else set current to R
  • In the worst case this takes only H comparisons
    (for perfect trees O(log N))

Balanced binary trees
  • Since the search complexity is proportional to
    the BSTs height H
  • we would like to use BSTs with minimal height
  • Balanced BSTs are not perfect BSTs, but as close
    as possible to perfect
  • It is more complex to build a balanced BST, but
    faster to search

  • From retrieve, but usually pronounced try
  • A trie is an ordered tree with
  • subvalues on edges
  • values at leaves
  • Tries are related to prefix search
  • Tries were originally developed for exact match
  • Tries can be binary or not
  • Tries are good for all lexicographical searches
    O(log n)
  • but particularly efficient for LPM
  • Trie variants are the fastest known lookups -
    O(log log n)
  • LC-trie used in modern Linux router

Example trie
  • Assume the following FIB
  • Note
  • in the plain trie,
  • all IP prefixes have the same length

Example compact trie
  • Same FIB
  • Now maximum length is 2
  • We could also save memory by exploiting common

More trie variants
  • Patricia trie
  • Practical Algorithm to Retrieve Information Coded
    in Alphanumeric
  • Binary trie which store in nodes the number of
  • to skip before next decision point
  • For LPM store prefixes in internal nodes
  • Bucket tries
  • Store multiple keys in leaves
  • Multibit tries (M-tries)
  • Fewer branching decisions than binary tries
  • Multibit strides (usually variable strides)
  • Level-compressed tries (LC-tries)
  • Replace perfect subtrees in binary trie with
    single degree 2k nodes
  • LPC trie level and path compressed
  • And there are many more (Lulea, full-tree, )

Example LC-trie
  • binary trie

Content Addressable Memory
  • CAM (AKA associative memory)
  • Addressable by content, rather than by location
  • Special purpose hardware
  • Fastest possible lookup (essentially searches
    entire table in one clock)
  • but limited in size
  • usually drives regular memory for additional
  • Binary CAM (BCAM) stores 0 or 1 in each bit
  • Ternary CAM (TCAM) allows wildcards
  • Can be used for LPM
  • Can prioritize solution by
  • number of bits matched
  • order in table
  • CAM technology today
  • 32 to 144 bit keys
  • 128K 512 K memories
  • hundreds of millions searches per second

Example 3 bit BCAM
encode output to access LAM
Other uses of lookup
  • Deep Packet Inspection
  • URL lookup (often partial or with wildcards)
  • can use Trees and tries
  • XML information, patterns, etc.
  • multiple encapsulations
  • e.g. Ethernet in IP, MPLS over IP, etc.
  • there are special-purpose languages to describe
    such cases
  • Firewalls, Access Control
  • source/destination IP addresses
    source/destination ports 4-tuples
  • TCP/UDP ports in ranges

  • Once the packet has been classified we need to
    properly forward it
  • The classifier result (the FEC) becomes a
    metadata field
  • that controls the switch
  • A switch fabric is combination of HW and SW
  • that enable moving the packet from input
    interface to output interface
  • The metaphor is borrowed from woven material
  • At low packets per second (pps) switching can be
    all software
  • At high speeds highly parallel hardware is needed

  • The major design constraint in switches is
  • Blocking occurs when some resource is being used
  • and the present packet can not be processed
  • We differentiate three types of blocking
  • Output blocking (the egress port is in use)
  • Internal blocking (some internal resource of the
    switch is in use)
  • Input (head of line) blocking (packet can not
    enter the switch)
  • A switch which is designed such that there is
    never internal blocking
  • is called a nonblocking switch
  • To alleviate blocking we can add buffers to store
    the waiting packets
  • According to the type of blocking we wish to
    avoid we have
  • output queues
  • internal buffers
  • input queues

Switch types
  • In the TDM days there were two switching
    mechanism types
  • time domain switches (time slot interchange)
  • spatial domain switches
  • For packet networks there are no time slots
  • but time domain switching can still be used
  • shared memory switches (packets from all inputs
    placed in single memory)
  • shared bus switches (all inputs and outputs share
    a ring/hypercube, etc. )
  • In spatial switching
  • the packets destined for different output
  • travel different internal paths
  • The simplest spatial switch is the crossbar
  • invented for analog telephone circuits
  • does not scale well O(N2) possible connections
  • Multilayer crossbars can scale better

Shared memory/bus
  • Shared memory switches are simple and cheap to
  • The memory is the heart of the fabric
  • it determines the throughput and delay
  • The memory must run N times faster than the
    ingress rates
  • Packets may be divided into frame buffers
  • Scalability can be achieved by parallelism (bit
    or byte slicing)
  • The architecture breaks down at high speeds
  • Shared bus switches are similar in principle to
    shared memory ones
  • The shared medium is the heart of the switch
  • and determines its characteristics
  • Scalability can be achieved by parallelism
    (parallel buses)
  • The architecture breaks down at high speeds

Multistage crossbars
  • Crossbars are fast, but
  • are not nonblocking
  • do not scale
  • These problems are solved by multistage crossbars
  • The only blocking of a single stage crossbar is
    output blocking
  • more than one packet needs to leave the egress
  • Time-space switches solve this by sorting the
    frames before switching
  • More complex architectures are time-space-time
  • There are many multistage solutions to the
    scaling problem

Clos network
  • The Clos network is more efficient than a single
    NN crossbar
  • We divide N into r groups of n N n r
  • It has 3 stages of crossbars
  • the first stage has r groups of nn crossbars
  • the second stage has n groups of rr crossbars
  • the third stage has r groups of nn crossbars
  • The shuffle rule Xth output of Yth crossbar
    connects to Yth input of Xth crossbar
  • Instead of N2 n2r2 connections
  • just r2 n 2 r n2
  • For example
  • if r n vN
  • then instead of n4 just 3n 3
  • a reduction by 3/n 3 / vN

Benes network
  • The Benes network contains only 22 crossbars
  • and is built recursively
  • For N 2n, the network has 2n-1 stages and 4 N
    log (N-1/2) connections
  • The shuffle rule is that 1st outputs go to the
    1st block, 2nd to the 2nd

Banyan network
  • The Banyan network is a binary tree of 22
  • with exactly one path from each ingress
    interface to each egress interface
  • At each stage the next bit of the identifier
    determines the switching
  • Banyan switches are time and space efficient
  • O(log N) delay
  • ½ N log N 22 crossbars
  • The main problem is internal blocking at any of
    the crossbars
  • One way to relieve this is to speed up the
    crossbars and to add buffers

Batchers sort
  • To prevent internal blocking in a Banyan switch
  • we can first sort the frames according to egress
  • When this is done with Batchers parallel sorter
  • ½ log N ( log N 1) stages of simple sorters
  • we get the Batcher-Banyan switch