Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture - PowerPoint PPT Presentation

1
Chisel: A Storage-efficient, Collision-free
Hash-based Network Processing Architecture
  • Jahangir Hasan, Srihari Cadambi,
  • Venkatta Jakkula, Srimat Chakradhar
  • Proceedings of the 33rd International Symposium
    on Computer Architecture (ISCA '06)

2
Outline
  • Introduction
  • Bloomier Filter
  • Chisel Architecture
  • Results
  • Conclusion

3
Introduction
  • There are three major families of techniques
    for performing LPM:
  • TCAM: prohibitive cost and power dissipation
  • Trie-based: large memory requirements and long
    lookup latencies
  • Hash-based

4
Introduction
  • Hash-Based Schemes Advantages
  • an order-of-magnitude lower power
  • small memory sizes
  • key-length-independent O(1) latencies
  • Hash-Based Schemes Disadvantages
  • incur collisions
  • hash functions cannot directly operate on
    wildcard bits

5
Introduction
  • Collision-free Hashing Scheme for LPM (Chisel)
  • builds upon a recent collision-free hashing
    scheme called the Bloomier filter
  • provides support for wildcard bits with small
    additional storage
  • supports fast and incremental updates

6
Bloomier Filter
  • An extension of Bloom filters.
  • Supports storage and retrieval of arbitrary
    per-key information.
  • Guarantees collision-free hashing for a
    constant-time lookup in the worst case.

7
Bloomier Filter
  • The Bloomier filter stores some function
    f : t → f(t) for all keys t.
  • The collection of all the keys is the key set.
  • The process of storing these f(t) values for all
    t is called function encoding.
  • The process of retrieving f(t) for a given t is
    called a lookup.

8
Bloomier Filter
  • The data structure consists of a table indexed by
    k hash functions. We call this the Index Table.
  • The k hash values of a key are collectively
    referred to as its hash neighborhood, represented
    by HN(t).
  • If some hash value of a key is not in the hash
    neighborhood of any other key in the set, then
    that value is said to be a singleton.

9
Bloomier Filter
  • The Index Table is set up such that for every t,
    we find a location τ(t) among HN(t) such that
    there is a one-to-one mapping between all t and
    τ(t).
  • Because τ(t) is unique for each t we can
    guarantee collision-free lookups.
  • Let us call the hash function that hashes t to
    location τ(t) the function hτ(t).

10
Bloomier Filter
  • The idea is to set up the Index Table so that a
    lookup for t returns τ(t).
  • Then we can store f(t) in a separate Result Table
    at address τ(t) and thereby guarantee
    deterministic, collision-free lookups of
    arbitrary functions.

11
Bloomier Filter
  • Encoding the Index Table
  • During encoding, once we find τ(t) for a certain
    t, we write V(t) from Equation 1 into the
    location τ(t).
  • Because τ(t) is unique for each t, we are
    guaranteed that this location will not be altered
    by the encoding of other keys.
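Equations 1 and 2 are not reproduced in this transcript. A reconstruction consistent with the XOR-based lookup the slides describe (writing Table[i] for the Index Table entries, and treating hτ(t) as the identifier of the hash function that maps t to τ(t)) would be:

```latex
% Equation 1 (encoding): the value written into location tau(t)
V(t) = h_{\tau}(t) \oplus \bigoplus_{\substack{i \in HN(t)\\ i \neq \tau(t)}} \mathrm{Table}[i]

% Equation 2 (lookup): XOR over the entire hash neighborhood
h_{\tau}(t) = \bigoplus_{i \in HN(t)} \mathrm{Table}[i]
```

Since Table[τ(t)] = V(t), the other Table[i] terms in Equation 2 cancel against those folded into V(t), leaving hτ(t).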

12
Bloomier Filter
  • Encoding the Index Table
  • Now during a lookup for t, hτ(t) can be retrieved
    by a simple XOR operation of the values in all k
    hash locations of t, as given in Equation 2.
  • We can use hτ(t) in turn to obtain τ(t), then
    read f(t) from the location τ(t) in the Result
    Table.
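The XOR-based encode/lookup pair can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the table size, hash construction via sha256, and encoding the looked-up value directly (rather than a hash-function identifier plus checksum) are all our own simplifications.

```python
import hashlib

M, K = 64, 3  # illustrative Index Table size and number of hash functions

def hashes(t):
    """K distinct hash locations of key t: its hash neighborhood HN(t)."""
    locs, i = [], 0
    while len(locs) < K:
        h = int(hashlib.sha256(f"{i}:{t}".encode()).hexdigest(), 16) % M
        if h not in locs:
            locs.append(h)
        i += 1
    return locs

def encode(table, t, value, tau):
    """Equation 1: write V(t) into t's unique location (index tau within
    HN(t)) so that the XOR in lookup() recovers `value`.  Which location
    is safe to use comes from the setup algorithm's ordering G."""
    hn = hashes(t)
    v = value
    for loc in hn:
        if loc != hn[tau]:
            v ^= table[loc]
    table[hn[tau]] = v

def lookup(table, t):
    """Equation 2: XOR of the values in all K hash locations of t."""
    v = 0
    for loc in hashes(t):
        v ^= table[loc]
    return v
```

With an all-zero table, `encode(table, "10.1.0.0/16", 7, 0)` makes `lookup(table, "10.1.0.0/16")` return 7; encoding several keys into one table safely requires the ordering G produced by the setup algorithm described on the following slides.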

13
Bloomier Filter
14
Bloomier Filter
  • The Bloomier Filter Setup Algorithm
  • We first hash all keys into the Index Table and
    then make a single pass over the entire table to
    find keys that have any singletons (locations
    with no collisions).
  • All these keys with singletons are then pushed
    onto the top of a stack.

15
Bloomier Filter
  • The keys are considered one by one starting from
    the bottom of the stack, and removed from all of
    their k hash locations in the Index Table.
  • The affected locations are examined for any new
    singletons, which are then pushed onto the top of
    the stack.

16
Bloomier Filter
  • This process is repeated until the Index Table
    becomes empty.
  • The final stack, considered top to bottom,
    represents an ordering G.
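The peeling procedure on the last three slides can be sketched in software as follows. This is our own simplified rendering (hash construction and table size are arbitrary choices), returning for each key the singleton location τ(t) it was peeled at:

```python
import hashlib
from collections import defaultdict

M, K = 257, 3  # illustrative Index Table size and number of hash functions

def hashes(t):
    """K distinct hash locations (hash neighborhood) of key t."""
    locs, i = [], 0
    while len(locs) < K:
        h = int(hashlib.sha256(f"{i}:{t}".encode()).hexdigest(), 16) % M
        if h not in locs:
            locs.append(h)
        i += 1
    return locs

def setup_order(keys):
    """Repeatedly peel keys that own a singleton location, removing them
    from all K of their hash locations; the reversed removal sequence is
    the encoding ordering G, as (key, tau(t)) pairs."""
    occupants = defaultdict(set)          # location -> keys hashing there
    for t in keys:
        for loc in hashes(t):
            occupants[loc].add(t)
    removed_seq, done = [], set()
    progress = True
    while len(done) < len(keys) and progress:
        progress = False
        for t in keys:
            if t in done:
                continue
            tau = next((loc for loc in hashes(t)
                        if occupants[loc] == {t}), None)
            if tau is None:
                continue                  # no singleton for t (yet)
            removed_seq.append((t, tau))
            done.add(t)
            for loc in hashes(t):
                occupants[loc].discard(t)
            progress = True
    if len(done) < len(keys):
        return None                       # setup failure: no new singleton
    return list(reversed(removed_seq))    # G: last-removed key first
```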

17
Bloomier Filter
18
Bloomier Filter
19
Bloomier Filter
  • G ensures that every key t has at least one
    unique hash location τ(t) in its hash
    neighborhood.
  • We can now process the keys in order G, encoding
    V(t) into τ(t) for each t, using Equation 1.
  • Therefore, lookups are guaranteed to obtain the
    correct hτ(t) values for all t using Equation 2.
  • The running time of the setup algorithm is O(n)
    for n keys.

20
Chisel Architecture
  • Convergence of the Setup Algorithm
  • Removing False Positives
  • Supporting Wildcards
  • Incremental Updates

21
Convergence of the Setup Algorithm
  • At each step the setup algorithm removes some key
    from the Index Table and then searches for new
    singletons.
  • If at some step a new singleton is not found then
    the algorithm fails to converge.
  • For a Bloomier filter with k hash functions, n
    keys and an Index Table of size m ≥ kn, the
    probability of setup failure P(fail) is
    upper-bounded as follows

22
Convergence of the Setup Algorithm
23
Convergence of the Setup Algorithm
  • In the event that a setup failure does occur, we
    move a few problematic keys into a spillover TCAM
    and resume setup.
  • The probability of the same setup subsequently
    failing 1, 2, 3, and 4 times is 10^-14, 10^-21,
    10^-28, and 10^-35, respectively.
  • Therefore, a small spillover TCAM (e.g., 16 to 32
    entries) suffices.

24
Removing False Positives
  • A false positive can occur when a Bloomier filter
    lookup involves some key t which was not in the
    set of original keys used for setup.
  • Prior work [6] addresses such false positives by
    concatenating a checksum c(t) to hτ(t) and using
    this concatenation in place of hτ(t) in Equation
    1 during setup.
  • The wider this checksum field, the smaller the
    probability of false positives (PFP).

25
Removing False Positives
  • A non-zero PFP means that some specific keys will
    always incur false positives.
  • Therefore a non-zero PFP, no matter how small, is
    unacceptable for LPM.

26
Removing False Positives
  • We propose a storage-efficient scheme to
    eliminate false positives for our LPM
    architecture.
  • The basic idea is to store all original keys in
    the data structure and match them against the
    lookup keys.

27
Removing False Positives
  • During setup, we encode a pointer p(t) for each t
    instead of encoding hτ(t).
  • p(t) directly points into a Result Table having n
    locations.
  • Thus, the Index Table encoding equation (Equation
    1) is modified as follows

28
Removing False Positives
  • During lookup, we extract p(t) from the Index
    Table (using Equation 2), and read out both f(t)
    and t from the location p(t) in the Result Table.
  • We then compare the lookup key against the value
    of t.
  • If the two match then f(t) is a correct lookup
    result, otherwise it is a false positive.
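The pointer-then-compare lookup can be sketched as follows. This is a simplified software model: the `index_lookup` callback stands in for the Equation 2 XOR over the Index Table, and both tables are plain Python lists with hypothetical contents.

```python
def chisel_lookup(index_lookup, result_table, filter_table, key):
    """Resolve key to a pointer p(t) via the Index Table, then verify the
    stored original key to reject false positives."""
    p = index_lookup(key)                 # pointer extracted per Equation 2
    if p is None or not (0 <= p < len(result_table)):
        return None                       # pointer itself is garbage
    if filter_table[p] != key:            # stored original t vs. lookup key
        return None                       # mismatch => false positive
    return result_table[p]                # f(t), read from the Result Table

# Toy tables with two encoded keys; "10.9.0.0/16" was never encoded but
# (by construction here) aliases to pointer 1, simulating a false positive.
result_table = ["next-hop-A", "next-hop-B"]
filter_table = ["10.0.0.0/8", "10.8.0.0/13"]
index = {"10.0.0.0/8": 0, "10.8.0.0/13": 1, "10.9.0.0/16": 1}
```

Here `chisel_lookup(index.get, result_table, filter_table, "10.0.0.0/8")` returns "next-hop-A", while the unencoded key "10.9.0.0/16" reaches the same pointer but fails the Filter Table comparison and returns None.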

29
Removing False Positives
  • In order to facilitate hardware implementation,
    the actual architecture uses two separate tables
    to store f(t) and t, the former being the Result
    Table and the latter the Filter Table.

30
Chisel Architecture
31
Supporting Wildcards
  • We propose a novel technique called prefix
    collapsing which efficiently supports wildcard
    bits.
  • In contrast to CPE, prefix collapsing converts a
    prefix of length x into a single prefix of
    shorter length x−l (l ≥ 1) by replacing its l
    least significant bits with wildcard bits.
  • The maximum number of bits collapsed is called
    the stride.
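As a concrete (hypothetical) illustration, with prefixes written as '0'/'1' strings, collapsing amounts to truncation, since the dropped bits become don't-cares:

```python
def collapse(prefix, target_len):
    """Prefix collapsing: replace the least-significant bits of `prefix`
    (a '0'/'1' string) beyond `target_len` with wildcards, which for a
    bit string amounts to truncation."""
    assert 0 <= target_len <= len(prefix)
    return prefix[:target_len]

# A /6 prefix collapsed to length 4: '101100' and '101111' both
# collapse to the same key '1011'.
```

Unlike CPE, which expands a length-x prefix into 2^l longer prefixes, collapsing yields a single shorter key; prefixes that become identical after collapsing must then be disambiguated, which is the role of the Bit-vector Table mentioned later in the slides.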

32
Chisel Architecture
33
Supporting Wildcards
  • Each instance of Figure 6 is referred to as a
    Chisel sub-cell.
  • The Chisel architecture for the LPM application
    consists of a number of such sub-cells, one for
    each of the collapsed prefix lengths l1, ..., lj.
  • Prefixes having lengths between li and li+1 are
    stored in the sub-cell for li.

34
Supporting Wildcards
  • A lookup collapses the lookup key to lengths
    l1, ..., lj, and searches all sub-cells in
    parallel.
  • The results from all sub-cells are sent to a
    priority encoder, which picks the result from
    that matching sub-cell which corresponds to the
    longest collapsed length.
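A software model of this parallel search and priority encoding (sequential here for clarity; each sub-cell is modelled as a dict from collapsed prefix to result, which omits the per-sub-cell Chisel lookup machinery):

```python
def lpm_lookup(key_bits, subcells):
    """Probe every sub-cell with the key collapsed (truncated) to that
    sub-cell's length; a priority encoder keeps the hit belonging to the
    longest collapsed length."""
    best_len, best = -1, None
    for length, table in subcells.items():   # {collapsed_len: {prefix: f(t)}}
        hit = table.get(key_bits[:length])   # collapse = truncate to length
        if hit is not None and length > best_len:
            best_len, best = length, hit
    return best

# Two hypothetical sub-cells, for collapsed lengths 4 and 8.
subcells = {
    4: {"1010": "next-hop-A"},
    8: {"10101100": "next-hop-B"},
}
```

For the key "101011001111", both sub-cells match and the priority encoder picks the length-8 result "next-hop-B"; a key matching only the length-4 entry falls back to "next-hop-A".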

35
Chisel Architecture
36
Incremental Updates
  • The Bloomier Filter supports only a static set of
    keys.
  • To address this shortcoming, we equip the Chisel
    architecture with extensions based on certain
    heuristics, in order to support fast and
    incremental updates.

37
Incremental Updates
  • We observe that in real update traces, 99.9% of
    prefixes added by updates are such that when
    those prefixes are collapsed to the appropriate
    length, they become identical to some collapsed
    prefix already present in the Index Table.
  • Therefore, we need to update only the Bit-vector
    Table, and not the Index Table, for these updates.

38
Incremental Updates
  • We also observe that in real update traces a
    large fraction of updates are actually route
    flaps (i.e., a prefix is added back after being
    recently removed).
  • We temporarily mark the prefix dirty and
    temporarily retain it in the Index Table, instead
    of immediately removing it.

39
Incremental Updates
  • We maintain a shadow copy of the data structures
    in software.
  • When an update command is received, we first
    incrementally update the shadow copy, and then
    transfer the modified portions of the data
    structure to the hardware engine.

40
Results
  • Chisel versus EBF+CPE
  • Comparison against EBF with No Wildcards
  • Prefix Collapsing vs. Prefix Expansion
  • Chisel versus EBF+CPE
  • Scalability
  • Scaling with Router Table Size
  • Scaling with Key Width
  • Power using Embedded DRAM
  • Updates
  • Comparison with Other Families
  • Chisel vs. Tree Bitmap
  • Chisel vs. TCAMs

41
Results
42
Results
43
Results
44
Conclusion
  • Based upon a recently proposed hashing scheme
    called the Bloomier filter, we architected an LPM
    solution and proposed a novel technique, called
    prefix collapsing, for supporting wildcard bits.
  • We also built support for fast and incremental
    updates by exploiting key characteristics found
    in real update traces.

45
Conclusion
  • Another significant advantage of Chisel is that
    it has memory requirements small enough to be
    implemented on-chip using embedded DRAM.
  • Chisel performs only one off-chip access at the
    end of a lookup, when a pointer is sent to an
    off-chip next-hop table.