Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit - PowerPoint PPT Presentation


1
Hardware Implementation of Fast Forwarding Engine
using Standard Memory and Dedicated Circuit
  • Kazuya ZAITSU, Shingo ATA, Ikuo OKA
  • (Osaka City University, Japan)
  • Koji YAMAMOTO
  • (Renesas Design Corporation, Japan)
  • Yasuto KURODA, Kazunari INOUE
  • (Renesas Electronics Corporation, Japan)

2
Outline
  • Background
  • Objective
  • Proposed hardware architecture
  • Hardware architecture evaluation
  • FPGA implementation
  • Hardware evaluation
  • Conclusion

3
What is TCAM?
  • TCAM: Ternary Content Addressable Memory
  • Features
    • Very high-speed searching
    • Takes the data to match as input and outputs the matching memory address
    • A third matching state, "don't care", in addition to 1 and 0
  • Application
    • Looking up the routing table in IP routers

Routing table (* = don't care)
Addr  Prefix
1     192.168.*.*
2     192.168.100.*
3     192.168.101.*

Input:  192.168.101.1
Output: 3
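The lookup above can be modelled in software as a minimal sketch. Each entry is stored as a value plus a care-mask (mask bit 0 = "don't care"), and the key is compared against every entry, as a TCAM does in parallel. The function names `ternary_entry` and `tcam_lookup` are hypothetical. To reproduce Output 3, the sketch returns the match with the most specified bits (the longest prefix); a physical TCAM typically returns the lowest matching address instead, which is why routers usually store longer prefixes at lower addresses.

```python
import ipaddress


def ternary_entry(pattern):
    """Convert a dotted pattern such as '192.168.100.*' into (value, mask).
    Mask bits set to 1 must match; 0 bits are "don't care"."""
    value = mask = 0
    for octet in pattern.split("."):
        value <<= 8
        mask <<= 8
        if octet != "*":
            value |= int(octet)
            mask |= 0xFF
    return value, mask


def tcam_lookup(entries, key):
    """Compare the key against every entry (a TCAM does this in parallel)
    and return the address of the matching entry with the most care bits."""
    k = int(ipaddress.IPv4Address(key))
    best_addr, best_bits = None, -1
    for addr, pattern in entries.items():
        value, mask = ternary_entry(pattern)
        if (k & mask) == value:  # don't-care positions are ignored
            care_bits = bin(mask).count("1")
            if care_bits > best_bits:
                best_addr, best_bits = addr, care_bits
    return best_addr


routing_table = {1: "192.168.*.*", 2: "192.168.100.*", 3: "192.168.101.*"}
print(tcam_lookup(routing_table, "192.168.101.1"))  # 3
```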
4
TCAM problems
  • Manufacturing cost
    • Cost per bit is 4 times that of SRAM.
  • Power consumption
    • All logic gates must be energized for every search.
  • Capacity
    • The high price per bit and the need to save power make denser TCAMs hard to build.

                Search performance   Manufacturing cost   Power consumption   Capacity
Requirements    High                 Low                  Low                 High
TCAM            High                 High                 High                Low
5
Objective
  • Propose a new hardware architecture
  • Focus on address lookup in the routing tables of routers
  • RAM-based design, named Custom Memory
  • Hardware design of the Custom Memory
  • Verify the effectiveness of the Custom Memory
  • Effectiveness of our architecture: dramatically reduced cost and power consumption
  • Implementation on an FPGA

                 Speed   Cost   Power   Capacity
Custom Memory    High    Low    Low     High
TCAM             High    High   High    Low
6
Design concepts
               Speed   Cost   Power   Capacity   Interface
Custom Memory  High    Low    Low     High       Same as TCAM
  • Divide the memory area into equal-sized tables
    • Low power
  • RAM-based design
    • Low cost, low power, high capacity
  • Lookup with a single memory access
    • High search performance
  • Same physical user interface as TCAM
    • Aims to replace TCAM in the market
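The low-power argument for equal-sized tables can be made concrete with a rough count. In our own notation (not the slides'), let one search device hold $2^k$ tables of $W$ entries each, so it stores up to $2^k W$ prefixes. If the index bits select exactly one table per search, only that table's entries are compared, while a TCAM of the same capacity compares every entry:

$$\frac{\text{entries compared per search (one table)}}{\text{entries compared per search (whole array)}} = \frac{W}{2^k W} = 2^{-k}$$

For example, with 256 tables per device ($k = 8$), each device compares $1/256$ of its entries per search. This counts comparisons only and ignores RAM read energy and other overheads, so it is an intuition, not a power estimate.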

7
Architectural overview
[Figure: Custom Memory overview. The memory is divided into search devices 0 to N; each is a RAM-based design divided into subtables (Table 0, Table 1, ...) with a comparator. A command and an IP address go in and an address comes out, through the same physical user interface as TCAM.]
8
Search device partitioning
  • How is the search device that stores a prefix chosen?
  • Partitioning based on prefix length: each search device holds the prefixes of one length
  • Example prefixes: 6.0.0.0/8, 24.128.0.0/9, 62.30.0.0/16, 112.63.240/20, 184.128.191.0/24, 232.95.225.1/32
    • Search device 0 (prefix length 8)
    • Search device 1 (prefix length 9)
    • ...
    • Search device N (prefix length 32)
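Partitioning by prefix length can be sketched in a few lines. This assumes, following the example, that device $i$ holds prefixes of length $8 + i$; the names `MIN_LENGTH`, `device_for`, and `partition_by_length` are hypothetical. The example prefix 112.63.240/20 is left out because the slide writes it with only three octets, which Python's `ipaddress` module does not accept.

```python
import ipaddress
from collections import defaultdict

MIN_LENGTH = 8  # assumption: search device i holds prefixes of length 8 + i


def device_for(prefix):
    """Return the search-device number for a prefix, based only on its length."""
    length = ipaddress.IPv4Network(prefix, strict=False).prefixlen
    if length < MIN_LENGTH:
        raise ValueError(f"/{length} is shorter than the shortest device")
    return length - MIN_LENGTH


def partition_by_length(prefixes):
    """Group prefixes by the search device that would store them."""
    devices = defaultdict(list)
    for prefix in prefixes:
        devices[device_for(prefix)].append(prefix)
    return dict(devices)


example = ["6.0.0.0/8", "24.128.0.0/9", "62.30.0.0/16",
           "184.128.191.0/24", "232.95.225.1/32"]
for device, members in sorted(partition_by_length(example).items()):
    print(f"search device {device}: {members}")
```

Under this mapping, device 24 holds the /32 prefixes.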
9
Table partitioning
  • How is the table that stores a prefix chosen?
    • A fixed number of bits of the prefix are extracted as index bits; they select the table.
    • The remaining bits are stored in that table.
  • How are the index bits determined?
    • Extract the last bits of the prefix.
[Figure: Example with 8 index bits in the search device for prefix length 16. 154.1.0.0/16 → 10011010.00000001; the last 8 bits (00000001) select Table 1, and the remainder bits (10011010) are stored there. Other tables hold further remainders (e.g. Table 0: 01011000, 01101011, ..., 00110111), and unused entries are empty.]
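The index/remainder split in the example can be written as a minimal sketch, assuming the index is the last `index_bits` bits of the prefix and the remainder is everything before them. The function name `split_prefix` is hypothetical.

```python
import ipaddress


def split_prefix(prefix, index_bits):
    """Split a prefix into (table index, stored remainder).

    The last `index_bits` bits of the prefix select the table; the bits
    before them are the remainder written into that table.
    """
    net = ipaddress.IPv4Network(prefix, strict=False)
    length = net.prefixlen
    if index_bits > length:
        raise ValueError("prefix is shorter than the index")
    bits = int(net.network_address) >> (32 - length)  # keep only the prefix bits
    index = bits & ((1 << index_bits) - 1)             # last index_bits bits
    remainder = bits >> index_bits                      # remaining leading bits
    return index, remainder


index, remainder = split_prefix("154.1.0.0/16", 8)
print(index, format(remainder, "08b"))  # 1 10011010
```

This reproduces the figure: 154.1.0.0/16 goes to Table 1 and stores 10011010.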
10
Search operation
[Figure: Search datapath. A search command and the destination IP address enter the Custom Memory through an input-output controller and go to every search device (prefix length 8, 9, ..., 32). In each device, an index calculator selects one table (Table 0, Table 1, ...) in RAM and a comparator checks its entries; an LPM (longest prefix match) comparator combines the devices' results into the hit address.]
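The search operation can be sketched in software as follows. This is a behavioural model under stated assumptions, not the hardware design: `SearchDevice`, `CustomMemory`, and `hit_address` are hypothetical names; one device per prefix length 8 to 32 and 8 index bits (256 tables, as in the FPGA build) are assumed; the hit address is whatever value the user assigned at insertion, mimicking a TCAM address. Tables here are unbounded Python lists, whereas real tables have a fixed size. The hardware queries all devices in parallel; the loop from longest to shortest length stands in for the LPM comparator.

```python
import ipaddress


class SearchDevice:
    """All prefixes of one length, split into 2**index_bits tables."""

    def __init__(self, length, index_bits):
        if index_bits > length:
            raise ValueError("prefix length must be >= index_bits")
        self.length = length
        self.index_bits = index_bits
        # Each table is a list of (remainder, hit_address) entries.
        self.tables = [[] for _ in range(1 << index_bits)]

    def _split(self, value):
        bits = value >> (32 - self.length)              # leading `length` bits
        return bits & ((1 << self.index_bits) - 1), bits >> self.index_bits

    def insert(self, prefix, hit_address):
        net = ipaddress.IPv4Network(prefix, strict=False)
        index, remainder = self._split(int(net.network_address))
        self.tables[index].append((remainder, hit_address))

    def search(self, dest):
        index, remainder = self._split(dest)             # index calculator
        for stored, hit_address in self.tables[index]:    # compare one table only
            if stored == remainder:
                return hit_address
        return None


class CustomMemory:
    def __init__(self, lengths=range(8, 33), index_bits=8):
        self.devices = {n: SearchDevice(n, index_bits) for n in lengths}

    def insert(self, prefix, hit_address):
        length = ipaddress.IPv4Network(prefix, strict=False).prefixlen
        self.devices[length].insert(prefix, hit_address)

    def search(self, dest_ip):
        dest = int(ipaddress.IPv4Address(dest_ip))
        # LPM comparator model: the longest prefix length with a hit wins.
        for length in sorted(self.devices, reverse=True):
            hit_address = self.devices[length].search(dest)
            if hit_address is not None:
                return hit_address
        return None


cm = CustomMemory()
cm.insert("154.0.0.0/8", 20)
cm.insert("154.1.0.0/16", 10)
cm.insert("154.1.2.0/24", 30)
print(cm.search("154.1.2.3"))  # 30
```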
11
Evaluation of partitioning
  • Which bits are better to use as index bits?
    • The distribution of prefixes across tables affects the cost.
  • Evaluation metric
    • Maximum number of prefixes in a table
[Figure: Index bits taken from the last bits of the prefix select a table in RAM; the number of word lines and comparators of a table is tied to the number of prefixes it holds.]
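The metric can be written out explicitly in our own notation (not the slides'). A search device for prefix length $L$ uses $k$ index bits, so it has $2^k$ tables, and $n_i$ is the number of its prefixes whose index bits equal $i$, with $N = \sum_i n_i$. Because the tables are equal-sized, each must be built for the fullest one:

$$W = \max_{0 \le i < 2^k} n_i \;\ge\; \left\lceil \frac{N}{2^k} \right\rceil$$

Since word lines and comparators per table follow the number of prefixes in the table, both grow with $W$. If each word holds only the $L - k$ remainder bits (a simplifying assumption that ignores valid flags or other per-entry data), one device needs

$$2^k \cdot W \cdot (L - k) \ \text{bits of RAM.}$$

The lower bound $\lceil N / 2^k \rceil$ is reached only when prefixes are spread perfectly evenly across the tables.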
12
Effectiveness of indexing
  • Top k bits: the top (first) bits of the prefix are used as index bits
  • Proposal: the last bits of the prefix are used as index bits
  • Bottom: ideal value (not realizable)
[Plot: maximum number of prefixes in a table versus prefix length]
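The effect of the index choice can be illustrated with a minimal sketch that computes the evaluation metric for both indexing methods and for an even spread. The function names are hypothetical, and the prefix set is synthetic and deliberately skewed: it shows the mechanism only and does not reproduce the measured curves. When many prefixes share their leading bits, taking the top bits as the index sends them all to one table, while the last bits spread them out.

```python
import ipaddress
from collections import Counter
from math import ceil


def max_table_load(prefixes, k, use_last_bits=True):
    """Largest number of prefixes that land in one table for k index bits."""
    loads = Counter()
    for prefix in prefixes:
        net = ipaddress.IPv4Network(prefix, strict=False)
        bits = int(net.network_address) >> (32 - net.prefixlen)
        if use_last_bits:
            index = bits & ((1 << k) - 1)        # proposal: last k bits
        else:
            index = bits >> (net.prefixlen - k)  # "Top k bits": first k bits
        loads[index] += 1
    return max(loads.values())


def ideal_load(n_prefixes, k):
    """Perfectly even spread over 2**k tables."""
    return ceil(n_prefixes / (1 << k))


# Synthetic, deliberately skewed set: 256 /16 prefixes that all start with 10.
prefixes = [f"10.{i}.0.0/16" for i in range(256)]
print(max_table_load(prefixes, 8, use_last_bits=False))  # 256
print(max_table_load(prefixes, 8, use_last_bits=True))   # 1
print(ideal_load(len(prefixes), 8))                     # 1
```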
13
FPGA implementation
  • Altera Stratix IV GX FPGA Development Kit
  • Verilog-HDL
  • Parameters
  • 4 search devices
  • 256 tables/device
  • 128 prefixes/table

[Figure: Implemented configuration. Search devices 0 to 3, each with a RAM divided into Table 0 to Table 255 (128 prefixes per table) and a comparator.]
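The FPGA parameters imply the following capacity. The arithmetic is ours, and the index-bit count assumes, as in table partitioning, that index bits select one of the tables in a device:

$$4 \times 256 \times 128 = 131{,}072 \ \text{prefix entries}, \qquad 256 = 2^{8} \;\Rightarrow\; 8 \ \text{index bits per device}$$

Under the same assumption, one search reads a single 128-entry table in each of the 4 devices, i.e. $4 \times 128 = 512$ entries.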
14
Hardware evaluation
                Search performance      Power consumption (mA)   Chip area (ratio)
Custom Memory   Every clock (125 MHz)   6.53 (52%)               62
TCAM            Every clock (360 MHz)   12.43                    100
(4k-bit array, Vdd = 1.0 V, room temperature, 125 Msps)
[Figure: Custom Memory layout with RAM blocks, word lines, and comparators; the operation area is marked.]
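The percentages follow from the raw numbers in the table. Assuming the footnote conditions apply to both rows, both designs run at $V_{dd} = 1.0$ V, so the current ratio equals the power ratio ($P = V_{dd} I$):

$$\frac{6.53\ \text{mA}}{12.43\ \text{mA}} \approx 0.525 \approx 52\%, \qquad \frac{62}{100} = 62\% \ \text{chip area}$$

One search per clock at 125 MHz gives $125 \times 10^{6}$ searches per second, matching the 125 Msps measurement condition.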
15
FPGA experiment
  • Verify the hardware operation
  • Use raw data (a BGP routing table)

16
Conclusion
  • Designed a RAM-based fast forwarding engine
    • Hardware architecture
    • FPGA implementation
  • Reduced cost and power
    • Cost: 62% of TCAM
    • Power consumption: 52% of TCAM
  • Future work
    • Optimization of implementation parameters
    • Handling of table overflow