1
Longest Prefix Matching: Trie-based Techniques
  • CS 685 Network Algorithmics
  • Spring 2006

2
The Problem
  • Given:
  • A database of prefixes with associated next hops, say:
  • 1000101 → 128.44.2.3
  • 01101100 → 4.33.2.1
  • 10001 → 124.33.55.12
  • 10 → 151.63.10.111
  • 01 → 4.33.2.1
  • 1000100101 → 128.44.2.3
  • A destination IP address, e.g. 120.16.8.211
  • Find the longest matching prefix and its next hop

3
Constraints
  • Handle 150,000 prefixes in the database
  • Complete a lookup in one minimum-sized (40-byte) packet transmission time
  • At OC-768 (40 Gbps): 8 nsec
  • High degree of multiplexing: packets from 250,000 flows interleaved
  • Database updated every few milliseconds
  • ⇒ performance is determined by the number of memory accesses

4
Basic ("Unibit") Trie Approach
  • Recursive data structure (a tree)
  • Nodes represent prefixes in the database
  • The root corresponds to the prefix of length zero
  • The node for prefix x has three fields:
  • 0-branch: pointer to the node for prefix x0 (if present)
  • 1-branch: pointer to the node for prefix x1 (if present)
  • Next-hop info for x (if present)

Example Database:
  a: 0     → x
  b: 01000 → y
  c: 011   → z
  d: 1     → w
  e: 100   → u
  f: 1100  → z
  g: 1101  → u
  h: 1110  → z
  i: 1111  → x
5
[Figure: the unibit trie for the example database, with 0- and 1-branches]
6
Trie Search Algorithm
    typedef struct foo {
        struct foo *trie_0, *trie_1;
        NEXTHOPINFO trie_info;
    } TRIENODE;

    NEXTHOPINFO lookup(TRIENODE *root, unsigned int addr) {
        NEXTHOPINFO best = NULL;
        TRIENODE *np = root;
        unsigned int bit = 0x80000000;

        while (np != NULL) {
            if (np->trie_info) best = np->trie_info;
            // check next bit
            if (addr & bit) np = np->trie_1;
            else np = np->trie_0;
            bit >>= 1;
        }
        return best;
    }
7
Conserving Space
  • A sparse database ⇒ wasted space
  • Long chains of trie nodes with only one non-NULL pointer
  • Solution: handle "one-way" branches with special nodes
  • Encode the bits corresponding to the missing nodes as text strings (see the sketch below)

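A minimal sketch of such a special node, with assumed names (the slides give no layout) and the address represented as a '0'/'1' string for clarity:

    /* Hypothetical path-compressed node: `skip` holds the bits of the
       collapsed one-way chain; they must match before branching. */
    typedef struct PCNODE {
        const char *skip;          /* e.g. "00" for a two-node chain */
        struct PCNODE *branch[2];  /* 0- and 1-branches after the skip */
        NEXTHOPINFO info;          /* next-hop info, if any */
    } PCNODE;

    /* Consume n's skip bits; returns 0 on mismatch, in which case the
       search stops with the best match remembered so far. */
    int match_skip(const PCNODE *n, const char *addrbits, int *pos) {
        for (const char *s = n->skip; *s != '\0'; s++, (*pos)++)
            if (addrbits[*pos] != *s)
                return 0;
        return 1;
    }
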
8
[Figure: the unibit trie for the example database, showing a long one-way chain of nodes on the path to prefix b]
9
[Figure: the same trie with the one-way chain replaced by a special node holding the text string "00"]
10
Bigger Issue: Slow!
  • Computing one bit at a time is too slow
  • Worst case: one memory access per bit (32 accesses!)
  • Solution: compute n bits at a time
  • n = the stride length
  • Use n-bit chunks of the address as an index into an array in each trie node
  • How do we handle prefixes whose length is not a multiple of n?
  • Extend them, replicating entries as needed (see the sketch below)
  • E.g. n = 3: prefix 1 becomes 100, 101, 110, 111

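A small sketch of the expansion step in C, assuming a hypothetical Table type and table_insert helper for storing (expanded prefix, length, next hop) entries:

    #include <stdint.h>

    typedef struct Table Table;                 /* hypothetical prefix table */
    extern void table_insert(Table *t, uint32_t prefix, int len,
                             NEXTHOPINFO info); /* hypothetical helper */

    /* Expand a prefix of length len up to the next multiple of the
       stride n by enumerating the missing low-order bits. */
    void expand_prefix(uint32_t prefix, int len, int n,
                       NEXTHOPINFO info, Table *out) {
        int target = ((len + n - 1) / n) * n;   /* round len up to a stride multiple */
        int extra  = target - len;              /* number of bits to fill in */
        for (uint32_t fill = 0; fill < (1u << extra); fill++)
            table_insert(out, (prefix << extra) | fill, target, info);
    }
    /* E.g. n = 3: prefix 1 (length 1) expands to 100, 101, 110, 111. */
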
11
Extending Prefixes
Example: stride length = 2

Original Database:
  a: 0     → x
  b: 01000 → y
  c: 011   → z
  d: 1     → w
  e: 100   → u
  f: 1100  → z
  g: 1101  → u
  h: 1110  → z
  i: 1111  → x

Expanded Database:
  a0: 00     → x
  a1: 01     → x
  b0: 010000 → y
  b1: 010001 → y
  c0: 0110   → z
  c1: 0111   → z
  d0: 10     → w
  d1: 11     → w
  e0: 1000   → u
  e1: 1001   → u
  f:  1100   → z
  g:  1101   → u
  h:  1110   → z
  i:  1111   → x
12
[Figure: the stride-2 multibit trie built from the expanded database above]
Total cost: 40 pointers (22 null). Max memory accesses: 3.
13
[Figure: for comparison, the unibit trie (with a text-string node) for the same database]
Total cost: 46 pointers (21 null). Max memory accesses: 5.
14
Choosing Fixed Stride Lengths
  • We are trading space for time
  • Larger stride length ? fewer memory accesses
  • Larger stride length ? more wasted space
  • Use the largest stride length that will fit in
    memory and complete required accesses within the
    time budget

15
Updating
  • Insertion:
  • Keep a unibit version of the trie, with each node labeled with its longest matching prefix and that prefix's length
  • To insert P, search for P, remembering the last node visited, until reaching either:
  • a null pointer (P not present), or
  • the last stride in P
  • Expand P as needed to match the stride length
  • Overwrite any existing entries whose prefix length is less than P's (see the sketch below)
  • Deletion is similar:
  • Find the entry for the prefix to be deleted
  • Remove its entry (from the unibit copy also!)
  • Re-expand any entries that were "covered" by the deleted prefix

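A sketch of the overwrite step in the final trie node, assuming each array entry records the length of the prefix that set it (the Entry and Node fields here are illustrative, not from the slides):

    typedef struct {
        NEXTHOPINFO info;   /* next-hop info for this entry */
        int plen;           /* length of the prefix that set it (0 = none) */
    } Entry;

    typedef struct Node { Entry entry[1]; } Node;  /* array sized 2^stride */

    /* Write P's info into every entry it covers in the last node, without
       clobbering entries set by longer (more specific) prefixes. `bits`
       holds P's remaining nbits bits within this node. */
    void overwrite_expanded(Node *n, unsigned bits, int nbits,
                            int stride, int plen, NEXTHOPINFO info) {
        int fill = stride - nbits;              /* low-order bits to expand */
        unsigned base = bits << fill;           /* first covered array index */
        for (unsigned i = 0; i < (1u << fill); i++) {
            Entry *e = &n->entry[base + i];
            if (e->plen < plen) {               /* only shorter prefixes lose */
                e->info = info;
                e->plen = plen;
            }
        }
    }
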
16
Variable Stride Lengths
  • It is not necessary for every node to have the same stride length
  • Reduce waste by allowing the stride length to vary per node
  • The actual stride length is encoded in the pointer to the trie node
  • Nodes with fewer used pointers can have smaller stride lengths

17
Expanded Database (variable strides):
  a0: 00    → x
  a1: 01    → x
  b:  01000 → y
  c0: 0110  → z
  c1: 0111  → z
  d0: 10    → w
  d1: 11    → w
  e:  100   → u
  f:  1100  → z
  g:  1101  → u
  h:  1110  → z
  i:  1111  → x

[Figure: the variable-stride trie, with node strides of 1 and 2 bits]
Total waste: 16 pointers. Max memory accesses: 3.
Note: encoding the stride length costs 2 bits per pointer.
18
Calculating Stride Lengths
  • How do we pick stride lengths?
  • We have two variables to play with: height and stride length
  • Trie height determines lookup speed ⇒ set the max height first
  • Call it h
  • Then choose strides to minimize storage
  • Define the cost C(T) of a trie T:
  • If T is a single node: the number of array locations in the node
  • Else: the number of array locations in the root + Σi C(Ti), where the Ti are the subtries of T
  • Straightforward recursive solution (see the sketch below):
  • A root stride of s results in y = 2^s subtries T1, ..., Ty
  • For each possible s, recursively compute the optimal strides and costs C(Ti) using height limit h − 1
  • Choose the root stride s that minimizes the total cost 2^s + Σi C(Ti)

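The recursion might look as follows; the UnibitNode type and cost conventions are assumptions, and memoizing opt_cost on (node, height) yields the dynamic program of the next slide:

    #include <limits.h>

    typedef struct UnibitNode {
        struct UnibitNode *child[2];
    } UnibitNode;

    static int height(const UnibitNode *t) {    /* levels below t, inclusive */
        if (t == NULL) return 0;
        int l = height(t->child[0]), r = height(t->child[1]);
        return 1 + (l > r ? l : r);
    }

    int opt_cost(const UnibitNode *t, int h);

    /* Sum the optimal costs of the subtries hanging s levels below t. */
    static int child_costs(const UnibitNode *t, int s, int h) {
        if (t == NULL) return 0;
        if (s == 0) return opt_cost(t, h);
        return child_costs(t->child[0], s - 1, h)
             + child_costs(t->child[1], s - 1, h);
    }

    /* C(T) under height limit h: try each root stride s, keep the cheapest. */
    int opt_cost(const UnibitNode *t, int h) {
        int d = height(t);
        if (d <= 1) return 0;             /* leaf: lives in the parent's array */
        if (h == 1) return 1 << (d - 1);  /* one node must cover all remaining bits */
        int best = INT_MAX;
        for (int s = 1; s <= d - 1; s++) {
            int cost = (1 << s)           /* 2^s array locations in the root */
                     + child_costs(t->child[0], s - 1, h - 1)
                     + child_costs(t->child[1], s - 1, h - 1);
            if (cost < best) best = cost;
        }
        return best;
    }
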
19
Calculating Stride Lengths
  • Problem: this is expensive, with many repeated subproblems
  • Solution (Srinivasan & Varghese): dynamic programming
  • Observe that each subtree of a variable-stride trie contains the same set of prefixes as some subtree of the original unibit trie
  • For each node of the unibit trie, compute the optimal stride and the cost of that stride
  • Start at the bottom (height 1) and work up
  • Determine the optimal grouping of leaves in each subtree
  • Given the subtrees' optimal costs, compute the parent's optimal cost
  • This yields the optimal stride-length selections for the given set of prefixes

20
[Figure: the example unibit trie used to illustrate stride-length selection]
21
Alternative Method: Level Compression
  • The LC-trie (Nilsson & Karlsson '98) is a variable-stride trie with no empty entries in its trie nodes
  • Procedure (see the sketch below):
  • Select the largest root stride that produces no empty entries
  • Apply this recursively down through the tree
  • Disadvantage: the height cannot be controlled precisely

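A sketch of the stride choice, reusing the UnibitNode type from the earlier sketch: the root stride is the largest s for which the unibit trie is complete down to depth s, so no array slot would be empty (assumes the root has both children):

    typedef struct UnibitNode {
        struct UnibitNode *child[2];
    } UnibitNode;

    /* Is the unibit trie below t complete for s more levels? */
    static int complete_to(const UnibitNode *t, int s) {
        if (s == 0) return 1;
        if (t == NULL || t->child[0] == NULL || t->child[1] == NULL) return 0;
        return complete_to(t->child[0], s - 1)
            && complete_to(t->child[1], s - 1);
    }

    int lc_stride(const UnibitNode *t) {
        int s = 1;
        while (complete_to(t, s + 1))
            s++;                    /* grow while no slot would be empty */
        return s;
    }
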
22
[Figure: the LC-trie for the example database; the nodes use strides of 1, 1, 1, and 2 bits]
23
Performance Comparisons
  • MAE-East database (1997 snapshot): 40K prefixes
  • "Unoptimized" multibit trie: 2003 KB
  • Optimal fixed-stride: 737 KB, computed in 1 msec
  • Height limit 4 (⇒ 1 Gbps wire speed @ 80 nsec/access)
  • Optimized (S&V) variable-stride: 423 KB, computed in 1.6 sec, height limit 4
  • LC-compressed: 700 KB, height 7

24
Lulea Compressed Tries
  • Goals:
  • Minimize the number of memory accesses
  • Aggressively compress the trie, so that it can fit in SRAM (or even cache)
  • Three-level trie with strides of 16, 8, 8
  • 8 memory accesses typical
  • Main techniques:
  • Leaf-pushing
  • Eliminating duplicate pointers from trie node arrays
  • Efficient bit-counting using precomputation for large bitmaps
  • Using indices instead of full pointers for next-hop info

25
1. Leaf-Pushing
  • In general, a trie node entry has associated with it:
  • a pointer to the next trie node,
  • a prefix (i.e., a pointer to next-hop info),
  • or both, or neither
  • Observation: we don't need a prefix pointer along the way until we reach a leaf
  • So "push" prefix pointers down to the leaves
  • Keep only one set of pointers per node (see the sketch below)

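One way to picture the saving, with assumed layout and names (NEXTHOPINFO as in the search-code slide): before leaf-pushing every slot may need both fields; afterwards each slot is one or the other, so a single word plus a tag bit suffices:

    struct Node;                      /* trie node, fields elided */

    typedef struct {                  /* before: two fields per entry */
        struct Node *child;           /* next trie node, or NULL */
        NEXTHOPINFO  info;            /* best prefix so far, or NULL */
    } EntryBefore;

    typedef union {                   /* after leaf-pushing: either/or */
        struct Node *child;           /* internal entry */
        NEXTHOPINFO  info;            /* leaf entry: pushed-down prefix */
    } EntryAfter;                     /* one tag bit distinguishes the two */
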
26
Leaf-Pushing: the Concept
[Figure: prefixes drawn as nested ranges, with their next-hop pointers pushed down to the leaf segments]
27
[Figure: the stride-2 expanded database and trie from before leaf-pushing]
Before: cost 40 pointers (22 wasted).
28
2. Removing Duplicate Pointers
  • Leaf-pushing results in many consecutive duplicate pointers
  • We would like to remove the redundancy and store only one copy in each node
  • Problem: now we can't directly index into the array using address bits
  • Example: for k = 2, the bits 01 (index 1) need to be converted to index 0 somehow

29
2. Removing Duplicate Pointers
  • Solution: add a bitmap with one bit per original entry
  • 1 indicates a new value
  • 0 indicates a duplicate of the previous value
  • To convert index i, count the 1s up to position i in the bitmap and subtract 1 (see the sketch below)
  • Example: old index 1 → new index 0; old index 2 → new index 1

[Figure: a node array (u, u, w, w) indexed by 00-11; with the bitmap 1010 it compresses to just (u, w)]
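A small sketch of the index conversion in C, assuming the node's bitmap fits in one 32-bit word and that bit j (least-significant first, an assumption) describes entry j:

    #include <stdint.h>

    static int popcount32(uint32_t x) {     /* portable 1-bit count */
        int n = 0;
        while (x) { x &= x - 1; n++; }
        return n;
    }

    /* Bit j of `bitmap`: 1 = entry j holds a new value, 0 = duplicate.
       Returns the index into the compressed (duplicate-free) array. */
    int compressed_index(uint32_t bitmap, int i) {
        uint32_t upto = bitmap & (uint32_t)(((uint64_t)1 << (i + 1)) - 1);
        return popcount32(upto) - 1;        /* 1s in bits 0..i, minus 1 */
    }

    /* For the (u, u, w, w) node above, bitmap = 0b0101: old index 1
       maps to 0 (u) and old index 2 maps to 1 (w), as on the slide. */
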
30
Bitmap for Duplicate Elimination
[Figure: the leaf-pushed first-level entry array and its duplicate-elimination bitmap, with a 1 marking each new value]
31
3. Efficient Bit-Counting
  • The Lulea first level has a 16-bit stride ⇒ 64K entries
  • It is impractical to count bits up to, say, entry 34578 on the fly!
  • Solution: precompute (P2a)
  • Divide the bitmap into chunks (say, 64 bits each)
  • Store the number of 1 bits preceding each chunk in an array B
  • Count the 1 bits up to bit k by:

    chunkNum = k >> 6;            /* k div 64 */
    posInChunk = k & 0x3f;        /* k mod 64 */
    numOnes = B[chunkNum] + count1sInChunk(chunkNum, posInChunk) - 1;

32
Bit-Counting Precomputation Example
Chunk size: 8 bits. Example: convert index 35.

Bitmap (by 8-bit chunk):  10010100  00000000  01110000  00001000  00110000  00010101
Running count of 1s:          3         3         6         7         9

Index 35 lies in chunk 4, at position 3 within the chunk. There are 7 ones
before chunk 4 and 2 ones in the chunk up to position 3, so the
converted index = 7 + 2 - 1 = 8.
Cost: 2 memory accesses (maybe less).
33
4. Efficient Pointer Representation
  • Observation: the number of distinct next-hop pointers is limited
  • Each corresponds to an immediate neighbor of the router
  • Most routers have at most a few dozen neighbors
  • In some cases a router might have a few hundred distinct next hops, or even a thousand
  • Apply P7: avoid unnecessary generality
  • Only a few bits (say 8-12) are needed to distinguish the actual next-hop possibilities
  • Store indices into a table of next-hop info
  • E.g., supporting up to 1024 next hops takes 10 bits
  • 40K prefixes ⇒ 40K pointers ⇒ 160 KB @ 32 bits vs. 50 KB @ 10 bits

34
Other Lulea Tricks
  • The first trie level uses two levels of bit-counting arrays:
  • the first counts the bits before the 64-bit chunk
  • the second counts the bits in the 16-bit word within the chunk
  • Second- and third-level trie nodes are laid out differently depending on the number of pointers in them
  • Each node has 256 entries, categorized by pointer count:
  • 1-8, "sparse": store 8-bit indices + 8 16-bit pointers (24 B)
  • 9-64, "dense": like the first level, but with only one bit-counting array (only six bits of count needed)
  • 65-256, "very dense": like the first level, with two bit-counting arrays: 4 64-bit chunks, 16 16-bit words

35
Lulea Performance Results
  • 1997 MAE-East database: 32K entries, 58K leaves, 56 distinct next hops
  • Resulting trie size: 160 KB
  • Build time: 99 msec
  • Almost all lookups took < 100 clock cycles (333 MHz Pentium)

36
Tree Bitmap (Eatherton, Dittia & Varghese)
  • Goal: storage and speed comparable to Lulea, plus fast insertion
  • The main culprit in slow insertion is leaf-pushing, so get rid of leaf-pushing:
  • Go back to storing node and prefix pointers explicitly
  • Use the same compression-bitmap trick on both lists
  • Store next-hop information separately; retrieve it only at the end
  • Like leaf-pushing, only in the control plane!
  • Use smaller strides to limit memory accesses to one per trie node (Lulea requires at least two)

37
Storing Prefixes Explicitly
  • To avoid expansion and leaf-pushing, we have to store prefixes in the node explicitly
  • There are 2^(k+1) − 1 possible prefixes of length ≤ k
  • Store a list of (unique) next-hop pointers for the prefixes covered by this node
  • Use the same bitmap/bit-counting technique as Lulea to find the pointer index (see the sketch below)
  • Keep trie nodes small (stride 4 or less); exploit hardware (P5) to do the prefix matching and bit counting

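A sketch of the in-node prefix match for a stride-4 node, under these assumptions (not spelled out on the slides): the node's prefix bitmap has 2^5 − 1 bits covering prefixes of length 0..4, laid out in level order, with prefix p of length L at position 2^L − 1 + p:

    #include <stdint.h>

    #define STRIDE 4

    /* Position of the longest prefix of `chunk` (the next STRIDE address
       bits) present in this node's prefix bitmap, or -1 if none matches.
       Counting the 1s up to that position then yields the pointer index,
       exactly as in the Lulea bitmap trick. */
    int longest_internal_match(uint32_t prefix_bitmap, unsigned chunk) {
        for (int len = STRIDE; len >= 0; len--) {       /* longest first */
            unsigned p = chunk >> (STRIDE - len);       /* leading len bits */
            int pos = (1 << len) - 1 + (int)p;          /* level-order slot */
            if ((prefix_bitmap >> pos) & 1)
                return pos;
        }
        return -1;
    }
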
38
Example: Root node, stride 3
[Figure: the tree-bitmap root node for the example database. A prefix bitmap
has one bit per possible prefix of length ≤ 3 (0, 1, 00, 01, ..., 111), with 1s
marking the prefixes stored in this node and a parallel list of their next hops
(x, w, z, u); an external bitmap has one bit per 3-bit chunk (000-111) marking
which child pointers exist, followed by the pointers to the child nodes.]
39
Tree Bitmap Results
  • Insertions are as in simple multibit tries
  • May cause complete revamp of trie node, but that
    requires only one memory allocation
  • Performance comparable to Lulea, but insertion
    much faster

40
A Different Lookup Paradigm
  • Can we use binary search to do longest-prefix lookups?
  • Observe that each prefix corresponds to a range of addresses
  • E.g. 204.198.76.0/24 covers the range 204.198.76.0 - 204.198.76.255
  • Each prefix contributes two range endpoints
  • N disjoint prefixes divide the entire space into at most 2N+1 disjoint segments
  • By sorting the range endpoints and comparing the address against them, we can determine the exact prefix match

41
Prefixes as Ranges
[Figure: the example prefixes drawn as nested ranges along the address line]
42
Binary Search on Ranges
  • Store the 2N endpoints in sorted order
  • Including the endpoints of the full address range
  • Store two pointers for each entry (see the sketch below):
  • ">" entry: next-hop info for addresses strictly greater than that value
  • "=" entry: next-hop info for addresses equal to that value

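A sketch of the lookup, assuming a sorted table of endpoint records with the two per-entry next-hop fields described above (the type and field names are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t    value;   /* range endpoint */
        NEXTHOPINFO eq;      /* next hop for addr == value */
        NEXTHOPINFO gt;      /* next hop for value < addr < next endpoint */
    } RangeEntry;

    NEXTHOPINFO range_lookup(const RangeEntry *tbl, int n, uint32_t addr) {
        int lo = 0, hi = n - 1, best = -1;
        while (lo <= hi) {                 /* find largest endpoint <= addr */
            int mid = lo + (hi - lo) / 2;
            if (tbl[mid].value <= addr) { best = mid; lo = mid + 1; }
            else                          hi = mid - 1;
        }
        if (best < 0) return NULL;         /* can't happen if 0 is stored */
        return (tbl[best].value == addr) ? tbl[best].eq : tbl[best].gt;
    }
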
43
Example: 6-bit addresses

Example Database:          As ranges:
  a: 0     → x             a: 000000-011111 → x
  b: 01000 → y             b: 010000-010001 → y
  c: 011   → z             c: 011000-011111 → z
  d: 1     → w             d: 100000-111111 → w
  e: 100   → u             e: 100000-100111 → u
  f: 1100  → z             f: 110000-110011 → z
  g: 1101  → u             g: 110100-110111 → u
  h: 1110  → z             h: 111000-111011 → z
  i: 1111  → x             i: 111100-111111 → x
44
Range Binary Search Results
  • N prefixes can be searched in about log2 N + 1 steps
  • Slow compared to multibit tries
  • Insertion can also be expensive
  • Memory-expensive: requires 2 full-size entries per prefix
  • 40K prefixes at 32-bit addresses: 320 KB, not counting next-hop info
  • Advantage: no patent restrictions!

45
Binary Search on Prefix Lengths (Waldvogel et al.)
  • For same-length prefixes, a hash table gives fast comparisons
  • But a linear search over prefix lengths is too expensive
  • Can we do a faster (binary) search on prefix lengths?
  • Challenge: how do we know whether to move "up" or "down" in length on a failure?
  • Solution: include extra information to indicate the presence of a longer prefix that might match
  • These are called marker entries
  • Each marker entry also contains the best-matching prefix for that node
  • When a marker sends the search on to longer lengths, remember its best-matching prefix, in case of a later failure (see the sketch below)

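A sketch of the search loop, with heavy assumptions: tab-style hash tables per length, a hypothetical hash_lookup helper, and each entry carrying the marker flag and bmp field described above. On a hit we remember the entry's best-matching prefix and try longer lengths; on a miss we try shorter ones:

    #include <stdint.h>

    typedef struct {
        NEXTHOPINFO bmp;      /* best-matching prefix stored with the entry */
        int         marker;   /* a longer matching prefix may exist */
    } HashEntry;

    extern HashEntry *hash_lookup(int len, uint32_t key);  /* hypothetical */

    NEXTHOPINFO search_on_lengths(uint32_t addr, int min_len, int max_len) {
        NEXTHOPINFO best = NULL;
        int lo = min_len, hi = max_len;
        while (lo <= hi) {
            int m = (lo + hi) / 2;
            HashEntry *e = hash_lookup(m, addr >> (32 - m)); /* top m bits */
            if (e == NULL) {
                hi = m - 1;          /* nothing here: try shorter prefixes */
            } else {
                best = e->bmp;       /* remember in case longer ones fail */
                lo = m + 1;          /* marker or prefix: try longer lengths */
            }
        }
        return best;
    }
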
46
Example: Binary Search on Prefix Length
Prefix lengths in the database: 1, 3, 4, 5

  length 1:  0 (BMP: a,x)      1 (BMP: d,w)
  length 3:  011 (BMP: c,z)    100 (BMP: e,u)    110 M (BMP: d,w)
             111 M (BMP: d,w)  010 M (BMP: a,x)
  length 4:  1100 (BMP: f,z)   1101 (BMP: g,u)   1110 (BMP: h,z)
             1111 (BMP: i,x)   0100 M (BMP: a,x)
  length 5:  01000 (BMP: b,y)

  (M = marker entry; BMP = the best-matching prefix stored with the entry)

Example: search for addresses 011000 and 101000.
47
Binary Search on Prefix Length Performance
  • Worst-case number of hash-table accesses: 5
  • However, most prefixes are 16 or 24 bits long
  • Arrange the hash tables so that these are handled in one or two accesses
  • This technique scales very well to longer addresses (e.g. 128 bits for IPv6)
  • A unibit trie for IPv6 could take 128 accesses!

48
Memory Allocation for Compressed Schemes
  • Problem: compressed schemes (like Lulea) keep trie nodes at minimal size
  • If a node grows (changes size), it must be reallocated and copied
  • As we have discussed, memory allocators can perform very badly
  • Assume M is the size of the largest possible request
  • Then no more than a fraction 1/log2 M of memory can be guaranteed to be in use!
  • E.g. if M = 32, 20% is the maximum guaranteed utilization
  • So router vendors cannot claim to support large databases

49
Memory Allocation for Compressed Schemes
  • Solution: compaction, i.e. copying memory from one location to another
  • General-purpose OSes avoid compaction!
  • Reason: it is very hard to find and update all pointers into a moved region
  • The good news:
  • Pointer usage is very constrained in IP lookup algorithms
  • Most lookup structures are trees ⇒ at most one pointer to any node
  • By storing a "parent" pointer, pointers can easily be updated as needed (see the sketch below)
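
A sketch of a node move during compaction, under assumed field names (the entry array, the parent pointer, and the slot index of this node within its parent):

    #include <string.h>

    typedef struct Node {
        struct Node *parent;       /* unique owner: trees have one pointer in */
        int          parent_slot;  /* which parent entry points here */
        int          nentries;
        struct { struct Node *child; } entry[1];  /* flexible layout, sketch */
    } Node;

    /* Move n to dest (already allocated): one fixup in the parent, plus
       refreshing the back-pointers of n's children. */
    void move_node(Node *n, Node *dest, size_t nbytes) {
        memcpy(dest, n, nbytes);
        if (dest->parent != NULL)
            dest->parent->entry[dest->parent_slot].child = dest;
        for (int i = 0; i < dest->nentries; i++)
            if (dest->entry[i].child != NULL)
                dest->entry[i].child->parent = dest;
        /* the old region at n can now be reused by the compactor */
    }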