Reconfigurable Architectures for High Bandwidth Network Processing Systems

Author: Sakir Sezer. Last modified by: John McCanny. Created: 1/13/2005 10:43:13 AM. Presentation format: A4 Paper (210x297 mm).

Transcript and Presenter's Notes


1
Reconfigurable Architectures for High Bandwidth
Network Processing Systems
  • Professor John McCanny CBE FRS FREng
  • Dr Sakir Sezer, Dr Maire McLoone

2
Institute of Electronics, Communications and
Information Technology
"You see things and say 'Why?' but I dream things
that never were and say 'Why not?'"
George Bernard Shaw
3
Purpose of Talk
  • An overview of research on reconfigurable
    architectures for network processing applications
  • Three aspects:
      • Node throughput
      • QoS
      • Data security

4
Structure of talk
  • Convergence of communication systems
  • Processing demands of future networks
  • Trade-offs of reconfigurability in network
    processing, in the context of application-specific
    architectures for:
      • Programmable data-link layer datagram processing
      • Programmable packet scheduling architectures
      • Configurable cryptographic architectures
  • Conclusions

5
Convergence of Communication and Information
Systems
[Figure: convergence of technology, applications and
services. Labels: broadcast and multicast (VoD, TV,
radio); real-time interactive services; best-effort
services; WLAN]
6
Bandwidth Demand vs Moore's Law
  • Data processing demand at network and access
    nodes doubles every 6-9 months
  • Internet traffic doubles every 12 months
  • Moore's Law: silicon integration capability
    doubles every 18 months
  • The differences appear as a technology gap and a
    network processing gap
  • Existing data processing architectures are unable
    to keep up with network processing demands
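
As a rough worked example of the gap described on this slide, the sketch below compares the quoted doubling periods over a common horizon. The 36-month horizon and the name `growth_factor` are illustrative assumptions, not figures from the talk.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months` for a quantity that doubles every period."""
    return 2.0 ** (months / doubling_period_months)


HORIZON = 36  # months; arbitrary horizon chosen only for illustration

silicon = growth_factor(HORIZON, 18)      # Moore's Law: doubles every 18 months
traffic = growth_factor(HORIZON, 12)      # Internet traffic: doubles every 12 months
demand_slow = growth_factor(HORIZON, 9)   # processing demand, 9-month doubling
demand_fast = growth_factor(HORIZON, 6)   # processing demand, 6-month doubling

print(f"silicon capability grows x{silicon:.0f}")
print(f"internet traffic grows   x{traffic:.0f}")
print(f"processing demand grows  x{demand_slow:.0f} to x{demand_fast:.0f}")
print(f"demand / silicon ratio:  {demand_slow / silicon:.0f} to {demand_fast / silicon:.0f}")
```

Under these assumptions the ratio between processing demand and silicon capability keeps widening, which is the "network processing gap" the slide refers to.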
7
Issues
  • Internet traffic is continuously doubling every
    12 months
  • Emerging services require:
      • Higher bandwidth (VoD, DVB-IP, VoIP)
      • A higher degree of security (Internet banking,
        Internet shopping, e-business)
  • Network processing demands are a consequence of:
      • Smaller packet sizes of real-time and
        interactive services
      • QoS requirements of real-time and interactive
        services
      • Complex security processing of sensitive data
      • Network protection from viruses and intruders

8
Network Processor Architectures: High
Performance with Flexibility
  • To cope with exponential growth in bandwidth
    demands
  • Complex traffic profiles and heterogeneity of
    service
  • To efficiently utilise resources by dynamically
    adapting network nodes to various traffic
    patterns
  • Capability for on-demand and customised QoS
    support
  • Cost effective upgrades to new communication
    protocols
  • Ideally, both high levels of compute power and
    high levels of flexibility

9
Application Specific, Configurable Network
Processing Architectures
  • Programmable data-link layer datagram processing
      • Frame delineation
      • Frame check sequence
  • Programmable packet scheduling architectures
      • Logic-level reconfigurable packet scheduling
        architecture
      • System-level configurable packet scheduling
        architecture
  • Configurable cryptographic architectures
      • Iterative and non-iterative architectures

10
  • Data-Link Layer
  • Protocol processing

11
Data-Link layer Protocol processing
  • Data link layer protocols enable a
    point-to-point connection between two peers over
    a physical link.
  • Common data link layer protocols are ATM,
    Ethernet, PPP, GFP, Frame Relay, Fibre Channel,
    etc.
12
IP over SDH/SONET Data Link Layer Protocols
[Figure: protocol stack. Network layer: Internet
Protocol. Data link layer: Ethernet (bridge, VLAN),
Frame Relay, ATM (MPOA, MPLS, ...), PPP, GFP, each
marked as a legacy or an emerging protocol. PHY layer:
SDH/SONET.]
13
Data-link layer frame processing
  • Frame processing involves two key functions:
      • Frame delineation
      • Frame check sequence (FCS)
  • The circuit architectures for both functions are
    determined by:
      • The protocols
      • The data-path width
  • Numerous frame delineation and FCS architectures
    for PPP, ATM and GFP were investigated for:
      • Scalability
      • Throughput
      • Hardware costs
  • A programmable frame processing architecture is
    desirable to support a variety of protocols

14
  • Data-Link Layer
  • Protocol processing
  • Frame Delineation

15
PPP 32-bit ACCM Transmitter Circuit
Includes Asynchronous-Control-Character-Map
(ACCM) function.
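
The ACCM function can be illustrated in software. The following is a minimal sketch of PPP octet stuffing in the style of HDLC-like framing (flag 0x7E, escape 0x7D, escaped octet XORed with 0x20), where the 32-bit ACCM selects which control characters 0x00 to 0x1F are escaped. It is not the circuit from the talk: the function names are hypothetical, and the FCS and the receiver-side discarding of unescaped control characters are omitted.

```python
FLAG = 0x7E    # frame delimiter
ESCAPE = 0x7D  # control escape octet


def ppp_stuff(payload: bytes, accm: int = 0xFFFFFFFF) -> bytes:
    """Wrap `payload` in flags, escaping the flag, the escape octet, and any
    control character (0x00-0x1F) whose bit is set in the 32-bit ACCM."""
    out = bytearray([FLAG])
    for octet in payload:
        mapped = octet < 0x20 and (accm >> octet) & 1
        if octet in (FLAG, ESCAPE) or mapped:
            out += bytes([ESCAPE, octet ^ 0x20])
        else:
            out.append(octet)
    out.append(FLAG)
    return bytes(out)


def ppp_unstuff(frame: bytes) -> bytes:
    """Inverse of ppp_stuff for a single, complete frame."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame must start and end with the flag octet")
    out = bytearray()
    escaped = False
    for octet in frame[1:-1]:
        if escaped:
            out.append(octet ^ 0x20)
            escaped = False
        elif octet == ESCAPE:
            escaped = True
        else:
            out.append(octet)
    return bytes(out)


data = bytes([0x7E, 0x11, 0x41])
print(ppp_stuff(data).hex())          # all control characters mapped
print(ppp_stuff(data, accm=0).hex())  # no control characters mapped
```

Because any octet may expand to two, the position of data within a wide word shifts from octet to octet. That is one way to see why a 32-bit data path needs the extra rearrangement hardware discussed on the following slides.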
16
PPP Frame Delineation Circuit: Post-layout
Synthesis - Altera Stratix II
Hardware complexity: O(N) = N^2
17
8 bit and 32 bit Data Paths
  • A 32-bit data path requires additional hardware
    to rearrange data words before and after
    transmission.
  • Scaling involves a significant area penalty.
  • Complex data reorganization circuits are needed
    to overcome the limitations set by an octet-based
    protocol.
  • Logic cost increases by factors of 15 and 26 for
    the ACCM receiver and the ACCM transmitter
    circuits respectively.
  • The majority of the logic increase is due to the
    number of byte comparators, together with the
    extra routing and the conditional multiplexers.

18
ATM Frame Delineation
  • ATM frame: 5-byte header, 48-byte payload
  • Based on cyclic coding
      • Header Error Check (HEC)
      • Cyclic Redundancy Check (CRC)
  • Provides header error detection and frame
    delineation
  • 5th header byte (HEC) calculated from CRC
    computation of the first 4 header bytes
  • CRC polynomial G(x) = x^8 + x^2 + x + 1

[Figure: ATM header, 8 bits wide. First 4 header
bytes: GFC (UNI) / VPI, VPI, VCI, PT, CLP. A CRC
computation with G(x) = x^8 + x^2 + x + 1 over these
bytes produces the 5th byte, the HEC.]
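
The HEC computation can be sketched bit-serially in software. The code below uses the generator polynomial from the slide, G(x) = x^8 + x^2 + x + 1 (0x07 without the leading term). The final XOR with 0x55 is not mentioned on the slide; it is the coset value that ITU-T I.432 specifies for ATM. The function name `atm_hec` is a hypothetical choice for this example.

```python
def atm_hec(header4: bytes) -> int:
    """CRC-8 over the first four header bytes, MSB first, plus the 0x55 coset."""
    if len(header4) != 4:
        raise ValueError("expected the first 4 header bytes")
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            # Shift left; on carry-out, subtract (XOR) the generator polynomial.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55


idle_header = bytes([0x00, 0x00, 0x00, 0x01])
print(hex(atm_hec(idle_header)))
```

A hardware implementation would unroll the inner loop into an XOR network sized to the data-path width instead of iterating one bit at a time.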
19
ATM Bit-by-Bit HEC HUNT 4-Bit Data-Path
Architecture
[Figure: block diagram. Labels: 8 cycles of data;
compare with next 2 nibbles; enable output if match;
reset CRC unit.]
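
The idea behind a HEC hunt can be sketched at byte granularity: slide a window over the stream until the fifth byte matches the HEC of the preceding four, then confirm the candidate on the following cells. This is a simplified software analogue, not the 4-bit data-path circuit in the figure. It assumes byte-aligned data, and the `confirm` count and all function names are assumptions of this sketch.

```python
CELL_BYTES = 53  # 5-byte header + 48-byte payload


def hec(header4: bytes) -> int:
    """CRC-8 with G(x) = x^8 + x^2 + x + 1, plus the I.432 coset 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55


def make_cell(header4: bytes, payload: bytes) -> bytes:
    """Build one 53-byte cell (helper for the demonstration only)."""
    assert len(header4) == 4 and len(payload) == 48
    return header4 + bytes([hec(header4)]) + payload


def hunt(stream: bytes, confirm: int = 3):
    """Return the first offset where `confirm` consecutive cells carry a
    valid HEC, or None if no such offset exists."""
    span = CELL_BYTES * confirm
    for offset in range(len(stream) - span + 1):
        starts = range(offset, offset + span, CELL_BYTES)
        if all(hec(stream[p:p + 4]) == stream[p + 4] for p in starts):
            return offset
    return None


cells = b"".join(make_cell(bytes([0, 0, 0, 1]), bytes(48)) for _ in range(4))
stream = bytes(3) + cells   # three stray bytes before the first cell boundary
print(hunt(stream))
```

Requiring several consecutive matches guards against payload bytes that happen to look like a valid header.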
20
4-, 8-, 16-, 32- and 64-bit implementations
Altera Stratix Technology
  • 16-bit design: 2.5 Gbps, supports SONET OC48
    line rate
  • 64-bit design: 6.8 Gbps
21
Generic Frame Procedure (GFP)
  • The Generic Frame Procedure is a Layer-2 framing
    protocol for data over high-capacity optical
    networks.
  • Recently standardised (ITU-T G.7041) to replace
    ATM and PPP in high capacity Wide Area Networks
    (WANs)
  • GFP is scalable, allowing the implementation of
    wide data-path architectures.
  • GFP deploys a CRC-based frame delineation
    architecture similar to the ATM HEC hunt and
    synchronisation technique

22
GFP Frame Structure
16-bit GFP Core Header Error Check (CHEC) field
is used for frame delineation
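
A minimal software sketch of cHEC generation and single-bit header correction follows. It assumes the cHEC is a CRC-16 with generator x^16 + x^12 + x^5 + 1 and zero initial value computed over the 2-byte length field, as in ITU-T G.7041, and it omits the core header scrambling that the standard also defines. The syndrome-table approach and all names are assumptions of this example, not the circuit from the talk.

```python
POLY = 0x1021  # x^16 + x^12 + x^5 + 1, leading term implicit


def crc16(data: bytes) -> int:
    """Bit-serial CRC-16, MSB first, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc


def core_header(length: int) -> bytes:
    """2-byte length field followed by its 2-byte cHEC (unscrambled)."""
    field = length.to_bytes(2, "big")
    return field + crc16(field).to_bytes(2, "big")


# Syndrome of every possible single-bit error in the 32-bit core header.
SYNDROMES = {crc16((1 << bit).to_bytes(4, "big")): bit for bit in range(32)}


def check_and_correct(header: bytes):
    """Return the (possibly corrected) header, or None if it is uncorrectable."""
    syndrome = crc16(header)
    if syndrome == 0:
        return header                       # header is consistent
    bit = SYNDROMES.get(syndrome)
    if bit is None:
        return None                         # more than one bit in error
    fixed = int.from_bytes(header, "big") ^ (1 << bit)
    return fixed.to_bytes(4, "big")


good = core_header(100)
damaged = bytes([good[0] ^ 0x10]) + good[1:]
print(check_and_correct(damaged) == good)
```

A delineation circuit in hunt mode would typically use the check without correction, and enable correction only once it is synchronised.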
23
GFP Frame Delineation 64-bit Datapath with 1-bit
Header Error Correction Circuit
Preliminary design study
  • Max clock: 165 MHz
  • ALUTs: 1107
  • Registers: 653
  • ALMs: 751
  • LABs: 149
  • Throughput: 10.5 Gbps
Altera Stratix II-3 FPGA Technology
24
GFP Frame Delineation 64-bit Datapath with 1-bit
Header Error Correction Circuit
  • Cadence Encounter, UMC 130nm
  • Clock frequency: 250 MHz
  • Total area: 0.12 mm^2
  • Throughput: 16.0 Gbps
  • Total power: 1.6 x 10^-2 W
  • Internal power: 1.4 x 10^-2 W
  • Switching power: 2.3 x 10^-3 W
  • Leakage power: 8.1 x 10^-5 W

UMC-130nm Reference Design
Fastest implementation in the literature
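
The throughput figures quoted for the 64-bit GFP delineation designs follow from data-path width multiplied by clock frequency. The short check below assumes one 64-bit word is accepted every clock cycle; the function name is illustrative.

```python
def throughput_gbps(datapath_bits: int, clock_mhz: float) -> float:
    """Peak throughput assuming one data-path word per clock cycle."""
    return datapath_bits * clock_mhz / 1000.0


fpga = throughput_gbps(64, 165)   # Stratix II figure: 165 MHz
asic = throughput_gbps(64, 250)   # UMC 130nm figure: 250 MHz

print(f"FPGA: {fpga:.2f} Gbps")
print(f"ASIC: {asic:.2f} Gbps")
```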
25
Programmable Frame Delineation
The 10 Gbps target is not achievable in FPGA
technology, but should be achievable with an ASIC
Altera Stratix II-3 FPGA Technology
26
32-Bit Protocol Processing Circuit Decomposition
27
Programmable ATM/GFP Protocol Frame Delineation
Architecture
Separate Data Path
Shared Common Elements
28
Architectural Studies Frame Delineation
Architectures
  • Header Error Check architectures are easier to
    scale than pattern-based architectures.
  • Examined the feasibility of deriving a common
    programmable frame delineation architecture for
    layer-2 protocols (PPP, ATM, GFP) operating at
    2.5 Gbps or more.
  • Unable to derive one due to the diverse nature of
    the techniques used.
  • Implementations of the GFP and ATM frame
    delineation techniques, which are based on a
    similar header error check method, have shown
    significant diversity and restrictions for a
    common architecture.
  • However, a programmable architecture that is
    slightly faster and smaller can be derived, and it
    is highly suitable for standard-cell-based
    implementation. Key aspect: reduction of
    registers, a key cost, by 50%.

29
Frame Delineation Architectures - Conclusions
  • Options:
      • Multiplexed specific-purpose circuits
      • An FPGA that can be reconfigured to implement a
        specific protocol
  • The first is the more efficient implementation
    (area and speed) for 10 or fewer protocols
  • Derivation of a programmable datapath based on
    common low-level functional elements is a
    potential low hardware cost option

30
  • Data-Link Layer
  • Protocol processing
  • Frame Check Sequence

31
Frame Check Sequence Circuits
  • Data integrity is paramount for data-link layer
    protocols
  • The Cyclic Redundancy Check (CRC) is the preferred
    method for detecting bit and burst errors in
    protocol payloads caused by medium-related
    noise.
  • Commonly used CRC types for layer-2 protocols

32
Investigated Architectures
  • Hardwired parallel CRC circuits for a given port
    size and generator polynomial G(x).
  • Semi-reconfigurable parallel CRC circuit with
    reconfigurable input port size and a given
    generator polynomial G(x).
  • Fully reconfigurable CRC computation circuit for
    any generator polynomial G(x) of degree up to 32
    and port sizes of 4, 8, 16, 24, and 32 bits.
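
The difference between a hardwired and a reconfigurable parallel CRC can be sketched in software. A CRC update is linear over GF(2), so consuming a whole port word per step reduces to XORing together precomputed columns, one per register bit and one per input bit. That XOR network is what a hardwired parallel circuit fixes at design time; a reconfigurable circuit must be able to change it when G(x) or the port size changes. The code is an illustrative model with hypothetical names, not the patented circuit.

```python
def serial_step(state: int, bit: int, poly: int, degree: int) -> int:
    """Shift one message bit (MSB first) into a `degree`-bit CRC register."""
    top = (state >> (degree - 1)) & 1
    state = (state << 1) & ((1 << degree) - 1)
    return state ^ poly if top ^ bit else state


def build_xor_network(poly: int, degree: int, port_bits: int):
    """Columns of the linear map (state, word) -> next state for one port word."""
    def run(state: int, word: int) -> int:
        for i in reversed(range(port_bits)):
            state = serial_step(state, (word >> i) & 1, poly, degree)
        return state

    state_cols = [run(1 << i, 0) for i in range(degree)]
    data_cols = [run(0, 1 << j) for j in range(port_bits)]
    return state_cols, data_cols


def parallel_step(state: int, word: int, network) -> int:
    """One 'clock cycle': XOR the columns selected by the set bits."""
    state_cols, data_cols = network
    nxt = 0
    for i, col in enumerate(state_cols):
        if (state >> i) & 1:
            nxt ^= col
    for j, col in enumerate(data_cols):
        if (word >> j) & 1:
            nxt ^= col
    return nxt


def crc(data: bytes, poly: int, degree: int, port_bits: int, init: int = 0) -> int:
    """CRC of `data`, consuming `port_bits` bits per step."""
    total_bits = 8 * len(data)
    if total_bits % port_bits:
        raise ValueError("this sketch needs a frame that fills whole port words")
    network = build_xor_network(poly, degree, port_bits)
    value = int.from_bytes(data, "big")
    mask = (1 << port_bits) - 1
    state = init
    for shift in range(total_bits - port_bits, -1, -port_bits):
        state = parallel_step(state, (value >> shift) & mask, network)
    return state


msg = b"123456789"
for port in (4, 8, 24):
    print(port, hex(crc(msg, 0x07, 8, port)), hex(crc(msg, 0x1021, 16, port)))
```

The result does not depend on the port size, only the number of steps does. Frames that do not fill a whole number of port words are rejected here; handling them is what requires the programmable input bus.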

33
Parallel Hardwired CRC-8 Circuit
34
Parallel CRC-32 with Programmable Input Bus
Input bus configuration
  • A programmable input bus is required if the frame
    size is not a multiple of the port size, or if the
    frame data is not aligned to the least-significant
    byte (LSB) of the input bus
  • Requires feedback circuit reconfiguration
35
Dedicated Parallel CRC architectures
Altera Stratix II-3 FPGA Technology
36
Programmable CRC with Partial Programmability
(uses multiplexing)
Altera Stratix II-3 FPGA Technology
37
Parallel and Fully Reconfigurable CRC Computation
Circuit for High Speed Data Processing
Patent Pending
  • Max clock: 114.98 MHz
  • ALUTs: 2240
  • Registers: 1365
  • ALMs: 1620
  • LABs: 292
  • Supports throughput rates above 2.5 Gbps
    (3.68 Gbps)
Altera Stratix II-3 FPGA Technology
38
Parallel and Fully Reconfigurable CRC Computation
Circuit for High Speed Data Processing
  • Cadence Encounter, UMC 130nm
  • Clock frequency: 125 MHz
  • Total area: 0.27 mm^2
  • Throughput: 8.0 Gbps
  • Total power: 5.9 x 10^-3 W
  • Internal power: 4.2 x 10^-3 W
  • Switching power: 1.6 x 10^-3 W
  • Leakage power: 1.2 x 10^-4 W

UMC-130nm Reference Design
39
Performance Evaluation
  • There is a trade-off cost for programmability
  • Fully reconfigurable CRC is 8x larger and 2x
    slower than the hard-wired CRC-32
  • For CRCs with small polynomials and input bus
    sizes, the area cost difference can be a factor
    of 100
  • Hardware efficient programmability for parallel
    FEC circuits can be achieved by multiplexing
    between different custom implementations

40
Performance Evaluation
  • If polynomial G(x) is not known, then a fully
    programmable implementation is an appropriate
    solution
  • Other applications include storage, where CRC
    computation can be 30% of overall

41
  • Programmable Packet Scheduling

42
Programmable IP packet scheduling
  • Programmability of Internet Protocol packet
    scheduling is an essential feature for:
      • Dealing with complex traffic profiles and
        heterogeneity of service
      • Efficient bandwidth resource utilisation
      • Providing on-demand and customised QoS support

43
Current Internet QoS Problems
  • Internet routers support best-effort packet
    delivery only (Best Effort Service)
  • A delay guarantee for delay-sensitive services
    cannot be provided
  • Real-time and interactive services (voice, video)
    will not meet users' expectations of quality

44
How we can provide QoS in the Internet
  • Single lane model: the current Internet, Best
    Effort Service
  • Multiple lane model (the motorway): proposed and
    partially deployed method, Differentiated Services
    (DiffServ)
  • Resource reservation model (railway / aircraft):
    proposed method, Integrated Services (IntServ)
45
Packet scheduling is paramount for QoS
  • A packet scheduler decides when to send each
    packet based on:
      • Traffic type ID (tag, DiffServ)
      • Flow ID (source and destination address,
        IntServ)
  • The scheduling algorithm and the deployed service
    policy determine the QoS performance of the
    network
  • Scheduling method trade-off: computational
    complexity vs desired fairness

[Figure: router / switch with output control]
46
Switch adaptation via partial reconfiguration
Adaptation is achieved by partially reconfiguring
the FPGA, adding or removing packet FIFO circuits
and output packet schedulers
47
Issues relating to run-time, gate level
configurable logic
  • Limited memory resources, off chip memory access
    a bottleneck
  • Reconfiguration interrupts traffic flow - QoS
    degradation.
  • Partial reconfiguration is limited to similar
    scheduling policies.
  • Run-time and partial reconfiguration add
    complexity
  • Partial reconfiguration control remains an
    unsolved challenge despite a promising model.
  • Current FPGA technology is immature in terms of
    design tools and run-time reconfiguration support

48
Systems Level Approach for Programmable Packet
Scheduling
Patent Pending
Packet handling isolated from schedule policy
functions
49
Systems Level Approach for Programmable Packet
Scheduling
  • Packet handling functions are isolated from
    scheduler policy specific functions.
  • Individual queues are replaced by a more complex
    shared buffer architecture that can support
    multiple queues, controlled via address pointers
    and linked lists.
  • Provides clear separation of the packet
    scheduling architecture into:
      • A circuit purely responsible for dealing with
        packet service policies, i.e. scheduling
        algorithms
      • A circuit concerned with packet handling, e.g.
        store/retrieve
  • Allows flexible programmability of the scheduling
    policy and number of packet queues without
    reconfiguring the hardware
  • Comparable throughput rates to implementations
    with physically built queues
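
The separation described on this slide can be modelled in a few lines of software: one component only stores and retrieves packets in a shared buffer whose logical queues are linked lists of slots, and a separate policy function only decides which queue to serve. This is a sketch under assumed names (`SharedBuffer`, `strict_priority`) and an assumed strict-priority policy; it does not reproduce the patented architecture.

```python
class SharedBuffer:
    """Packet handling only: many logical queues in one slot array,
    managed with head/tail pointers and a next-pointer table."""

    def __init__(self, slots: int, queues: int):
        self.data = [None] * slots
        self.next = list(range(1, slots)) + [-1]  # free list threaded through slots
        self.free = 0                             # head of the free list
        self.head = [-1] * queues
        self.tail = [-1] * queues

    def enqueue(self, queue: int, packet) -> bool:
        slot = self.free
        if slot == -1:
            return False                          # buffer full, packet dropped
        self.free = self.next[slot]
        self.data[slot], self.next[slot] = packet, -1
        if self.head[queue] == -1:
            self.head[queue] = slot
        else:
            self.next[self.tail[queue]] = slot
        self.tail[queue] = slot
        return True

    def dequeue(self, queue: int):
        slot = self.head[queue]
        if slot == -1:
            return None
        packet = self.data[slot]
        self.head[queue] = self.next[slot]
        if self.head[queue] == -1:
            self.tail[queue] = -1
        self.data[slot], self.next[slot] = None, self.free
        self.free = slot                          # return slot to the free list
        return packet


def strict_priority(buf: SharedBuffer):
    """Policy only: pick the lowest-numbered non-empty queue."""
    for queue, head in enumerate(buf.head):
        if head != -1:
            return queue
    return None


def serve(buf: SharedBuffer, policy):
    """Drain the buffer in the order chosen by `policy`."""
    order = []
    while (queue := policy(buf)) is not None:
        order.append(buf.dequeue(queue))
    return order


buf = SharedBuffer(slots=4, queues=2)
for queue, packet in [(1, "a"), (0, "b"), (1, "c")]:
    buf.enqueue(queue, packet)
print(serve(buf, strict_priority))
```

Changing the number of queues or swapping the policy function leaves the buffer code untouched, which mirrors the claim that policy changes need no hardware reconfiguration.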

50
Configurable Packet Scheduling
Patent Pending
  • Cadence Encounter, UMC 130nm
  • Clock frequency: 143 MHz
  • Number of I/Os: 478 pins
  • Total area: 14.4 mm^2
  • Number of sessions: 1,000,000
  • Packet storage: external DDR; up to 30 million
    packets can be supported
  • Throughput: 35.8M packets/sec
  • Throughput: 40 Gbps line rate (assuming a mean IP
    packet size of 130 bytes)
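
The packet rate and the line rate quoted here can be related by simple arithmetic. The check below multiplies the stated packet rate by the stated mean packet size; it ignores any framing overhead, which the slide does not specify.

```python
packets_per_second = 35.8e6   # stated scheduler throughput
mean_packet_bytes = 130       # stated mean IP packet size

payload_gbps = packets_per_second * mean_packet_bytes * 8 / 1e9
print(f"{payload_gbps:.1f} Gbps of IP packet data")
```

This gives roughly 37 Gbps of packet data, in the region of the nominal 40 Gbps line rate.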

[Figure labels: Address Translation Table; Search Trie Memory]
51
Conclusions
  • Queue and scheduling policy adaptation via
    address pointers, lookup tables and packet
    time-stamp processing
      • Low programming complexity
      • Service requirements can be translated
        immediately
  • No traffic interruption is required
      • Instant change of queues and scheduling policies
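
Time-stamp based scheduling can be sketched as follows. Each arriving packet receives a finish tag derived from its length and a per-flow weight held in a lookup table, and the packet with the smallest tag is sent first. The talk does not say which time-stamp algorithm is used; the tag rule here is a self-clocked fair queueing style rule chosen for illustration, and all names are hypothetical.

```python
import heapq
from itertools import count


class TimestampScheduler:
    def __init__(self, weights):
        self.weights = dict(weights)  # lookup table: flow id -> weight
        self.last_finish = {}         # finish tag of the latest packet per flow
        self.virtual_time = 0.0       # tag of the packet most recently served
        self.heap = []                # (finish tag, arrival order, packet)
        self.order = count()

    def enqueue(self, flow, packet, length):
        start = max(self.virtual_time, self.last_finish.get(flow, 0.0))
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, next(self.order), packet))

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, packet = heapq.heappop(self.heap)
        self.virtual_time = finish
        return packet


sched = TimestampScheduler({"A": 2, "B": 1})   # flow A gets twice the share of B
for i in (1, 2, 3):
    sched.enqueue("A", f"A{i}", 100)
    sched.enqueue("B", f"B{i}", 100)

served = []
while (packet := sched.dequeue()) is not None:
    served.append(packet)
print(served)
```

Changing an entry in `weights` changes the service policy for packets that arrive afterwards, with no change to the queueing structure.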

52
Conclusions
  • Performance comparable with customized
    implementations
  • Complex, but affordable data processing hardware
  • Programmability at the system level, NOT
    reconfigurability at gate level
  • Programming the scheduler does not require place
    and route at gate level

53
  • Programmable
  • Cryptographic Architectures

54
Programmable Cryptographic Architectures
  • Encryption needs to be performed on data in
    real-time
  • 100 Mbps networks, 1G Ethernet, 10G Ethernet
  • This holds the key to successful growth of
    applications such as WLANs, satellite
    communications, e-businesses
  • Software architectures are too slow
  • Hardware solutions required

55
Programmable Cryptographic Architectures
  • Reconfigurable Cryptographic Architectures can be
    used to provide the security requirements of many
    applications
  • FPGAs are well suited for crypto algorithms
      • They allow algorithm agility
      • They support alterable architecture parameters
        and scalable security (DES / 3DES)
  • Clever mapping of complex math operations onto
    special purpose silicon architectures

56
Private Key Algorithms: AES
  • NIST requested a new Advanced Encryption Standard
    (AES) to replace DES, Sept 1997
  • Interim measure: Triple DES
  • Rijndael: AES winner, Oct 2000
  • Developed by Joan Daemen and Vincent Rijmen
  • Replaced DES as Federal Standard in November 2001
  • 128-bit data block; 128-, 192- or 256-bit key

57
Reconfigurable AES Architecture
  • In conjunction with AES, NIST recommended 5 modes
    of operation:
      • Electronic Codebook (ECB) mode
      • Cipher Block Chaining (CBC) mode
      • Output Feedback (OFB) mode
      • Cipher Feedback (CFB) mode
      • Counter (CTR) mode, a simplification of OFB mode
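
The relationship between OFB and CTR mode can be shown structurally. In the sketch below the block function is NOT AES: it is a hash-based stand-in from the Python standard library, used only so that the example is self-contained. It must not be used for real encryption, and all names are assumptions of this example.

```python
import hashlib

BLOCK = 16  # bytes, matching the 128-bit AES block size


def stand_in_block(key: bytes, block: bytes) -> bytes:
    """Placeholder for a block cipher encryption (NOT AES, illustration only)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ofb(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Output Feedback: each keystream block is derived from the previous one."""
    out = bytearray()
    feedback = iv
    for i in range(0, len(data), BLOCK):
        feedback = stand_in_block(key, feedback)
        out += xor(data[i:i + BLOCK], feedback)
    return bytes(out)


def ctr(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Counter mode: keystream block n depends only on nonce and n."""
    out = bytearray()
    for n, i in enumerate(range(0, len(data), BLOCK)):
        counter_block = nonce + n.to_bytes(BLOCK - len(nonce), "big")
        out += xor(data[i:i + BLOCK], stand_in_block(key, counter_block))
    return bytes(out)


key, iv, nonce = b"k" * 16, b"\x00" * 16, b"\x01" * 8
message = b"reconfigurable network processing"
print(ofb(key, iv, ofb(key, iv, message)) == message)
print(ctr(key, nonce, ctr(key, nonce, message)) == message)
```

In OFB each keystream block depends on the previous one, so blocks are produced serially. In CTR the blocks are independent and can be computed in any order or in parallel. In both modes the same operation encrypts and decrypts.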

58
Private Key Algorithms AES
59
Reconfigurable AES Architecture
  • Reconfigurable AES architecture with the following
    features:
      • Iterative architecture
      • On-chip key scheduling
      • Support for 3 key lengths
      • Encryption and decryption
      • Support for feedback modes of operation

60
Performance Evaluation
Reconfigurable AES vs Specific-purpose Enc/Dec

Design                                       Device    Area                    Throughput (Mbps)
AES Encryptor, 128-bit key                   XCV400E   1987 slices, 18 BRAMs   423
AES Decryptor, 128-bit key                   XCV600E   2121 slices, 18 BRAMs   557
AES Enc/Dec, 128-bit key, supports 5 modes   XCV600E   4681 slices, 20 BRAMs   310
61
Performance Evaluation
  • Only 2 additional BRAMs are required in the
    reconfigurable design, as memory re-use is possible
  • Reconfigurable design:
      • Throughput reduced by up to 40%
      • Area increased by 10%
      • However, modes of operation are supported
  • => The area/speed penalty is an acceptable
    trade-off in favour of using reconfiguration over
    multiple specific-purpose circuits

62
  • Conclusions

63
Conclusions
  • Frame Delineation
      • A common architecture could not be found
      • Use a separate FPGA circuit for each protocol,
        or multiplex between separate circuits on an
        ASIC
      • Derivation of a programmable datapath based on
        common low-level functional elements is a
        potential low hardware cost option
  • CRC circuits: well-defined options (i.e. 8)
      • A fully reconfigurable ASIC is possible, but
        larger and slower than 8 separate versions
      • If G(x) and the number of options are not
        known, then use a fully programmable solution

64
Conclusions
  • Packet Scheduler
      • A systems-level approach deploying address
        pointers, lookup tables and packet time-stamp
        processing is the most appropriate approach
      • Enables programmability while supporting line
        rates beyond 100 Gbps
      • Best approach: tackle at the systems and
        architecture level rather than at the FPGA level
      • Current FPGA technology and design tools for
        run-time reconfiguration are too immature for
        packet scheduling

65
Conclusions
  • Encryption/Decryption
      • Reconfigurable architecture identified
      • Supports a number of modes of operation
  • Reconfigurable design:
      • Throughput reduced by up to 40%
      • Area increased by 10%
      • The area/speed penalty is an acceptable
        trade-off compared with reconfiguration of
        multiple specific circuits

66
Reconfigurable Architectures for High Bandwidth
Network Processing Systems
  • Professor John McCanny CBE FRS FREng
  • Dr Sakir Sezer (s.sezer_at_ecit.qub.ac.uk)
  • Dr Maire McLoone (m.mcloone_at_ecit.qub.ac.uk)

67
Institute of Electronics, Communications and
Information Technology
"You see things and say 'Why?' but I dream things
that never were and say 'Why not?'"
George Bernard Shaw