Title: A Low-Power Network Search Engine Based on Statistical Partitioning
1 A Low-Power Network Search Engine Based on Statistical Partitioning
- Taskin Kocak and Faysal Basci
- Dept. of Electrical and Computer Engineering
- University of Central Florida, Orlando, FL 32816
- tkocak_at_cpe.ucf.edu
2 Introduction
- Stringent memory access and search speed requirements are two of the main bottlenecks in wire-speed packet processing.
- Most viable search engines are implemented in Content Addressable Memories (CAMs).
- In networking applications, CAMs are generally used in VLAN lookup tables (layer 2), packet forwarding lookup tables (layer 3), session lookup tables (layer 4), and filtering (layers 5-7).
- CAMs have a high operational speed advantage over other memory search approaches, such as look-aside tag buffers and binary or tree-based searches.
- However, this performance advantage comes at the price of larger silicon area and higher power consumption (3-5 W/chip).
3 CAM Architecture
4 IP lookups
- IP lookups are based on the classless inter-domain routing (CIDR) scheme.
- The IP address space is broken into line segments.
- Each line segment is described by a prefix.
- A prefix is of the form x/y, where x indicates the prefix of all addresses in the line segment, and y indicates the length of the prefix in bits.
- e.g., the prefix 134.26/16 represents the line segment containing addresses in the range 134.26.0.0 to 134.26.255.255.
- The most specific route is the longest prefix match.
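The x/y notation and the longest-prefix-match rule can be sketched with Python's standard `ipaddress` module. This is only an illustration: the routing table contents, the next-hop labels, and the function name `longest_prefix_match` are made up for the example and do not come from the slides.

```python
import ipaddress

# 134.26/16 written in full dotted form, as ipaddress expects.
net = ipaddress.ip_network("134.26.0.0/16")
first, last = net.network_address, net.broadcast_address  # ends of the line segment

# Hypothetical routing table: prefix -> next hop (illustrative values only).
ROUTES = {
    ipaddress.ip_network("134.0.0.0/8"): "A",
    ipaddress.ip_network("134.26.0.0/16"): "B",
    ipaddress.ip_network("134.26.7.0/24"): "C",
}

def longest_prefix_match(addr, routes=ROUTES):
    """Return the next hop of the most specific (longest) matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [n for n in routes if ip in n]
    if not matches:
        return None
    best = max(matches, key=lambda n: n.prefixlen)
    return routes[best]
```

Here `first` and `last` are the two ends of the 134.26/16 segment, and an address such as 134.26.7.9 matches all three hypothetical prefixes but resolves to the /24 entry.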
5 TCAMs in IP forwarding
- The TCAM's "don't care" storage capability is favorable in lookup engines.
- e.g., for a 24-bit prefix, the last 8 bits will be "x" (don't care).
- Routing table entries stored in TCAMs are ordered according to their prefix lengths.
- In the case of multiple matches, the priority encoder will choose the one with the lowest address, which is the longest matching prefix.
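The behaviour described above can be mimicked in software. The sketch below is a simplified model, not a description of the hardware: ternary words are plain strings over `0`, `1`, and `x`, and the helper names (`make_entry`, `ternary_match`, `tcam_search`, `to_bits`) and the sample prefixes are assumptions made for illustration.

```python
def make_entry(prefix_bits, width=32):
    """Store a prefix as a ternary word: known bits followed by 'x' (don't care) cells."""
    return prefix_bits + "x" * (width - len(prefix_bits))

def ternary_match(entry, key):
    """A stored word matches when every non-'x' cell equals the corresponding key bit."""
    return all(e == "x" or e == k for e, k in zip(entry, key))

def tcam_search(table, key):
    """Compare the key against all entries; the priority encoder picks the lowest address."""
    hits = [addr for addr, entry in enumerate(table) if ternary_match(entry, key)]
    return min(hits) if hits else None

def to_bits(dotted):
    """Convert a dotted-quad IPv4 address to a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in dotted.split("."))

# Entries are stored longest prefix first, so the lowest matching address
# is also the longest matching prefix.
prefixes = [
    to_bits("134.26.7.0")[:24],
    to_bits("134.26.0.0")[:16],
    to_bits("134.0.0.0")[:8],
]
table = [make_entry(p) for p in prefixes]
```

With this ordering, a key inside 134.26.7.0/24 matches all three entries, and the priority encoder returns address 0.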
6 32-bit IP Prefix Distribution
7 Partitioned TCAM
Perform a search in TCAM1
If there is a match
    DONE
otherwise
    search TCAM2
    If there is a match
        DONE
    otherwise
        issue mismatch
end
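The two-stage search above could be written as follows. This is a minimal sketch under stated assumptions: each TCAM is modelled as a list of (prefix, next hop) pairs stored longest prefix first, each searched TCAM is assumed to cost one clock cycle, and TCAM1 is assumed to hold longer prefixes than TCAM2 so that stopping early still returns the longest match. The table contents and function names are hypothetical.

```python
import ipaddress

def search_tcam(tcam, addr):
    """Return the next hop of the first (lowest-address) matching entry, or None on a miss."""
    ip = ipaddress.ip_address(addr)
    for network, next_hop in tcam:  # entries are kept longest prefix first
        if ip in network:
            return next_hop
    return None

def partitioned_search(tcam1, tcam2, addr):
    """Search TCAM1 first; activate TCAM2 only if TCAM1 misses.

    Returns (next_hop, cycles); next_hop is None on an overall mismatch.
    """
    result = search_tcam(tcam1, addr)
    if result is not None:
        return result, 1              # match in TCAM1: done
    return search_tcam(tcam2, addr), 2  # TCAM2 result, or None for a mismatch

# Hypothetical partition: the /24 entry in TCAM1, shorter prefixes in TCAM2.
TCAM1 = [(ipaddress.ip_network("134.26.7.0/24"), "C")]
TCAM2 = [
    (ipaddress.ip_network("134.26.0.0/16"), "B"),
    (ipaddress.ip_network("134.0.0.0/8"), "A"),
]
```

A lookup that hits in TCAM1 never activates TCAM2, which is where the power saving comes from; the price is a second cycle for lookups that miss in TCAM1.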
8 Power Consumption
$P_{TCAM} = P_{STATIC} + P_{CLOCK} + P_{MATCH} + P_{MISS} + P_X$
Here, we are concerned with the dynamic power consumption caused by the search (comparison) operation.
where ?0 represents the probability that the search word has a match in the whole table, and ?1 represents the fraction of entries that reside in TCAM1
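To give a feel for why the fraction of entries in TCAM1 matters, the sketch below estimates how much of the table is compared per lookup when two partitions are searched one after the other. It is not the power equation from the slides: it assumes, as a simplification, that dynamic search power scales with the number of entries compared. `frac_in_tcam1` plays the role of the fraction of entries that reside in TCAM1, while `p_hit_tcam1` (the probability that a lookup is resolved in TCAM1) is a hypothetical parameter introduced for this example.

```python
def expected_fraction_searched(frac_in_tcam1, p_hit_tcam1):
    """Expected fraction of table entries compared per lookup with two partitions.

    TCAM1 is always searched; TCAM2 is searched only when TCAM1 misses.
    """
    if not (0.0 <= frac_in_tcam1 <= 1.0 and 0.0 <= p_hit_tcam1 <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    return frac_in_tcam1 + (1.0 - p_hit_tcam1) * (1.0 - frac_in_tcam1)

def relative_saving(frac_in_tcam1, p_hit_tcam1):
    """Saving relative to a single TCAM that compares every entry on each lookup."""
    return 1.0 - expected_fraction_searched(frac_in_tcam1, p_hit_tcam1)
```

Under this simplified model the saving equals `p_hit_tcam1 * (1 - frac_in_tcam1)`, so it grows when TCAM1 is small yet resolves most lookups.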
9 Power Consumption (cont.)
Assuming that the probability of match for a
single cell is 0.5
10 Experimental Results: dynamic power
An example TCAM circuit is implemented in Cadence for TSMC 0.18-µm CMOS. Simulations are run at 100 MHz.
$P_{MATCH} = 397$ nW, $P_{MISS} = 515$ nW, $P_X = 336$ nW
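As a rough worked example, the three measured values can be combined into an expected comparison power for one stored prefix. This is a sketch under several assumptions that the slides do not state: the values are taken to be per-cell figures, the third value is taken to be the power of a cell storing "don't care", and cell contributions are assumed to add up. The 0.5 match probability for a single cell is the one assumed earlier in the slides; the function names are hypothetical.

```python
P_MATCH_NW = 397.0  # slide value; assumed here to be per cell, on a match
P_MISS_NW = 515.0   # slide value; assumed here to be per cell, on a miss
P_X_NW = 336.0      # slide value; assumed here to be a cell storing don't care

def expected_cell_power_nw(p_match=0.5):
    """Expected power of a 0/1 cell that matches with probability p_match."""
    return p_match * P_MATCH_NW + (1.0 - p_match) * P_MISS_NW

def expected_entry_power_nw(prefix_len, width=32, p_match=0.5):
    """Expected compare power of one entry: prefix_len 0/1 cells plus don't-care cells."""
    if not 0 <= prefix_len <= width:
        raise ValueError("prefix_len must be between 0 and width")
    specified = prefix_len * expected_cell_power_nw(p_match)
    dont_care = (width - prefix_len) * P_X_NW
    return specified + dont_care
```

Under these assumptions a 0/1 cell averages 456 nW per comparison, and a 24-bit prefix in a 32-bit word comes to 24 × 456 + 8 × 336 = 13632 nW.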
11 Experimental Results: dynamic power
12 Experimental Results: power-latency
E[L] = 2 - ?0, where L is the search latency in clock cycles
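The latency cost of searching partitions one after the other can be sketched as an expected value. The function below assumes one clock cycle per partition searched and that a lookup missing everywhere still visits every partition; `stage_hit_probs` and the function name are hypothetical, introduced only for this example.

```python
def expected_latency(stage_hit_probs):
    """Expected search latency in clock cycles for sequentially searched partitions.

    stage_hit_probs[i] is the probability that a lookup is resolved in
    partition i + 1. Lookups that match nowhere cost one cycle per partition.
    """
    total = sum(stage_hit_probs)
    if any(p < 0.0 for p in stage_hit_probs) or total > 1.0 + 1e-12:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    n = len(stage_hit_probs)
    hit_cycles = sum((i + 1) * p for i, p in enumerate(stage_hit_probs))
    return hit_cycles + n * (1.0 - total)
```

For two partitions this reduces to 2 minus the probability of resolving the lookup in the first partition, which has the same "2 minus a probability" form as the latency expression above. The same function also covers the three-partition case by passing three probabilities.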
13 Power consumption with different miss rates for Telstra
14 Experimental Results
Expected power savings and average latency when
partitioning into three TCAMs
15 Reconfigurable Architecture
16 Conclusions
- We presented a TCAM partitioning scheme that utilizes the statistical distribution of prefixes.
- We showed that partitioning indeed helps reduce the power consumption in IP lookup applications.
- Partitioning into two or three TCAMs reduces power consumption considerably, whereas further partitioning shows little improvement.
17 ANCHOR workshop at ISCA 2004
18 Dynamic power