Scott Poretsky - PowerPoint PPT Presentation

1
Core Router Testing for High Availability
  • Scott Poretsky
  • Avici Systems, Inc.
  • June 3, 2002

2
Outline
  • IP Network Availability
  • Test Coverage for 99.999% Availability
  • Commercial Test Equipment Requirements

3
IP Network Availability
4
High Reliability = More Revenue
  • Reliability is the single biggest criterion in
    selecting an ISP, according to Interactive
    Week/Telechoice

[Figure: ISP Customer Survey. Bar chart of Relative Importance (scale 4.0 to 4.8) for Reliability, Value, Performance, Customer Service, and Provisioning Speed.]
New IP services demand higher levels of network
reliability
5
High Reliability = More Profit
  • Compensation for poor router reliability through
    redundancy and interconnects can increase network
    cost by up to 50%

[Figure: IP Backbone. Service provider network drawn as layers: Peering (service provider peers), Core Layer (backbone routers), Aggregation Layer (hub routers), Edge Layer, and Access Devices (VoIP, DSLAM, L3/4 switch, CMTS, GGSN, direct connects).]
6
Definitions
  • Reliable: capable of being dependable (Webster)
  • Availability: measure of reliability using router/switch uptime
  • Mission Reliability: Mean Time Between Critical Failures (MTBCF), or
    the average time between hardware or software
    failures that interrupt service (the mission)
  • Maintenance Reliability: Mean Time Between Failures (MTBF), or the
    average time between hardware failures that require
    corrective maintenance actions
  • Defects Per Million (DPM): measure of downtime equal to
    (1 - Availability) x 10^6
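
The DPM definition, (1 - Availability) x 10^6, can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `dpm` is an illustrative assumption, and availability is passed as a fraction (0.99999 for 99.999%).

```python
def dpm(availability: float) -> float:
    """Defects Per Million: (1 - availability) x 10^6, availability as a fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * 1_000_000


if __name__ == "__main__":
    # Availability levels that appear in the slides, written as percentages.
    for pct in (99.9, 99.999, 99.9999):
        print(f"{pct}% availability -> {dpm(pct / 100):.0f} DPM")
```

Under this definition, 99.9% availability corresponds to 1,000 DPM, 99.999% to 10 DPM, and 99.9999% to 1 DPM.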

7
Contributing Factors for Availability
  • Mission Reliability: total time to restore a router/switch after a
    software failure
      • Timeline (not to scale) from "Software Failure Occurs" to
        "Full Operation Restored"
      • Components shown: CrashDump Time, Boot Time, Protocol
        Convergence Time, Image Upgrade Time
  • Maintenance Reliability: total time to restore a module after a
    hardware failure
      • Timeline (not to scale) from "Hardware Failure Occurs" to
        "Full Operation Restored"
      • Components shown: Maintainer Response Time, Boot Time, Protocol
        Convergence Time, Removal and Replacement Time
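
To see how the restore-time components feed into availability, the sketch below sums a set of phase durations and applies the standard steady-state relation availability = time between failures / (time between failures + restore time). That relation is general reliability arithmetic, not something stated on the slide, and every number and name here is a hypothetical placeholder.

```python
from typing import Mapping


def total_restore_seconds(phases: Mapping[str, float]) -> float:
    """Sum the durations of the phases that make up one restore."""
    return sum(phases.values())


def availability(mean_time_between_failures_s: float, restore_s: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return mean_time_between_failures_s / (mean_time_between_failures_s + restore_s)


if __name__ == "__main__":
    # Hypothetical durations (seconds) for a software failure.
    software_restore = {
        "crash_dump": 60.0,
        "boot": 180.0,
        "protocol_convergence": 60.0,
    }
    restore = total_restore_seconds(software_restore)
    mtbcf = 30 * 24 * 3600.0  # hypothetical: one critical failure every 30 days
    print(f"restore time: {restore:.0f} s")
    print(f"availability: {availability(mtbcf, restore) * 100:.4f}%")
```

Shortening any one phase (for example, boot time) raises availability without changing how often failures occur, which is why each phase is worth measuring separately.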
8
The Availability Goal
  • The Goal: 99.999% Router Availability
  • The Reality: 99.9% Router Availability
  • Features to achieve 99.999% availability:
      • Non-Stop Routing
      • Graceful Restart
  • What if testing could improve Mission
    Reliability to achieve 99.999% Availability in the
    absence of new features?
  • What if the addition of these new features would
    then achieve 99.9999% Availability?
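
The gap between the goal and the reality is easier to judge as downtime per year. A minimal sketch, assuming a 365-day year; the function and constant names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.999, 99.9999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% -> {minutes:.2f} minutes/year")
```

With these assumptions, 99.9% allows about 8.8 hours of downtime per year, 99.999% about 5.3 minutes, and 99.9999% about 32 seconds.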

9
Test Coverage
10
Traditional Test Coverage
  • Isolated testing of protocols
      • Functionality
      • Conformance
      • Interoperability
      • Scaling
  • Forwarding Performance in the absence of
    protocols
  • Disadvantages
      • Operational environment is not tested
      • Operational conditions are not tested
      • The router under test is not completely stressed
      • Deployed routers run multiple protocols
        simultaneously

11
Test Program for 99.999% Availability
  • Stress Testing
  • Longevity Testing
  • Convergence Testing
  • Network-Specific Topology Testing
  • Automated Regression Testing

12
Stress Testing
  • Simultaneous configuration and scaling of
    multiple protocols
      • BGP, IGP
      • MPLS-TE, LDP (optional)
      • MBGP, PIM-SM, MSDP (optional)
  • Traffic Forwarding
      • Line Rate Traffic Forwarding
      • Overutilize links
      • Enable QoS
  • Network Instability
      • Repeated Route Flaps
      • Link Loss
      • Tunnel Reroutes (optional)
  • Serviceability
      • Repeated SNMP Gets
      • Logging Enabled
      • Debug Enabled
      • Telnet with SHOW commands (stressful and invalid)

13
Stress Configuration
[Figure: Stress Configuration. Router Under Test, Neighbor Routers, Test Equipment, and an Optional Neighbor Router for Tunnel Reroutes.]
14
Stress Execution Guidelines
  • Configure ECMP, Parallel Paths, and Composite
    Links between routers
  • Use Live BGP Feed for Route Table
  • Mix traffic types across links (IP Unicast, IP
    Multicast, MPLS)
  • One neighbor router should be a different vendor
    to show interoperability under stress
  • Run Stress for many days (if the router lasts
    that long)
  • Router should experience more in a couple of days
    than it likely would in its operational lifetime.

15
Typical Stress Metrics
  • Flap 1 million BGP routes per hour
  • Forward 10 Terabits of data per hour
  • Perform 100,000 SNMP Gets per hour
  • Simulate 100 fiber cuts per hour (use every
    remote interface)
  • Along with:
      • Full BGP Table
      • Full IGP Table
      • Full Multicast Cache
      • Required MPLS-TE Tunnels (protection optional)
      • Required LDP FECs
      • Enable Logging and Protocol Debug
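
The hourly targets above translate into sustained per-second rates that the test equipment has to generate. A minimal sketch of the conversion, assuming "10 Terabits" means 10 x 10^12 bits; the dictionary keys are illustrative labels.

```python
SECONDS_PER_HOUR = 3600

# Hourly stress targets from the slide; 10 Terabits taken as 10e12 bits (assumption).
HOURLY_TARGETS = {
    "BGP route flaps": 1_000_000,
    "bits forwarded": 10e12,
    "SNMP Gets": 100_000,
    "simulated fiber cuts": 100,
}


def per_second(per_hour: float) -> float:
    """Convert an hourly count into a sustained per-second rate."""
    return per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    for name, hourly in HOURLY_TARGETS.items():
        print(f"{name}: {per_second(hourly):,.2f} per second")
    cuts = HOURLY_TARGETS["simulated fiber cuts"]
    print(f"one fiber cut every {SECONDS_PER_HOUR / cuts:.0f} seconds")
```

That works out to roughly 278 route flaps per second, about 2.8 Gbit/s of forwarded traffic, about 28 SNMP Gets per second, and one simulated fiber cut every 36 seconds, all running at the same time.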

16
Longevity Testing
  • Similar to Stress Testing, but with more operational
    (less stressful) conditions injected over many
    weeks.
  • Simultaneous configuration and scaling of
    multiple protocols
  • Traffic Forwarding
  • More realistic Network Instability
  • More typical Serviceability actions
  • Use Live Internet feed.

17
Convergence Terms
  • Network Convergence: the point in time at which all nodes in a
    network have updated their routing tables for a route
    entry change (new, withdrawal, or modification)
  • Protocol Convergence: the point in time at which a single node
    updates its routing table and advertises the route table
    change to its peer in a routing protocol
    advertisement (or update) message
  • Route Convergence: the point in time at which a single node
    updates its routing table and reroutes traffic out the
    new interface
  • Route Convergence is the common Router Benchmark.
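
The slides do not say how Route Convergence is measured. One common data-plane approach, assumed here, is to offer traffic at a constant rate, trigger the event, and divide the number of lost packets by the offered rate. The function name and the sample counters below are hypothetical.

```python
def convergence_time_seconds(packets_sent: int,
                             packets_received: int,
                             offered_rate_pps: float) -> float:
    """Estimate route convergence time from packet loss at a constant offered rate."""
    if offered_rate_pps <= 0:
        raise ValueError("offered rate must be positive")
    lost = packets_sent - packets_received
    if lost < 0:
        raise ValueError("received more packets than were sent")
    # Each lost packet represents 1/rate seconds during which traffic was not rerouted.
    return lost / offered_rate_pps


if __name__ == "__main__":
    # Hypothetical counters: 60 s of traffic at 1,000,000 packets per second.
    seconds = convergence_time_seconds(60_000_000, 59_750_000, 1_000_000)
    print(f"estimated route convergence: {seconds * 1000:.0f} ms")
```

This estimate only holds if the convergence event is the sole cause of loss, so the offered load should stay below the point where the router drops packets for other reasons.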

18
Convergence Test Issues
  • Large number of protocols for which convergence is
    important.
  • Number of conditions that can impact results.
  • Technical difficulty in testing convergence of
    one protocol due to flap or instability of
    another protocol.

19
Convergence Test Conditions
  • Interface shutdown
      • on Local Interface
      • on Remote Interface
  • Fiber Pull
      • on Local Interface
      • on Remote Interface
  • Peer removal via CLI
      • on Local router
      • on Peer router
  • Peer node failure
  • Route Table changes
      • Route Withdrawal
      • Route Flap
      • Next-Hop Change
      • Metric Change
      • Dynamic Constraint Change
      • Policy Change

All conditions must be tested because each can
produce different results.
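
Because every condition has to be run for every protocol of interest, the test matrix grows quickly. The sketch below enumerates it. Grouping the sub-items under their parent conditions is an assumption about the slide's layout, and the protocol list is a hypothetical selection.

```python
from itertools import product

# Conditions from the slide; the parent/child grouping is an assumption.
CONDITIONS = {
    "Interface shutdown": ["Local Interface", "Remote Interface"],
    "Fiber Pull": ["Local Interface", "Remote Interface"],
    "Peer removal via CLI": ["Local router", "Peer router"],
    "Peer node failure": [""],
    "Route Table changes": [
        "Route Withdrawal", "Route Flap", "Next-Hop Change",
        "Metric Change", "Dynamic Constraint Change", "Policy Change",
    ],
}

# Hypothetical protocol selection for illustration.
PROTOCOLS = ["BGP", "OSPF", "ISIS"]


def build_test_cases() -> list:
    """Return one named test case per (protocol, condition, variant) combination."""
    cases = []
    for condition, variants in CONDITIONS.items():
        for protocol, variant in product(PROTOCOLS, variants):
            name = f"{protocol}: {condition}"
            if variant:
                name += f" ({variant})"
            cases.append(name)
    return cases


if __name__ == "__main__":
    cases = build_test_cases()
    print(f"{len(cases)} convergence test cases")
    for case in cases[:5]:
        print(" ", case)
```

Even with three protocols this yields 39 cases, which is one reason the convergence program depends on automation.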
20
Network-Specific Topology Testing
  • Large network with many routers (e.g. 10)
  • Use multiple vendors for interoperability/functionality
    testing.
  • Multiple protocols configured in deployment
    scenario
  • Run test cases to match deployment scenario

21
Automated Regression Testing
  • Adding bug fixes and new features puts previously
    working features at risk.
  • Regression testing ensures that the previously
    working features still work.
  • As the number of releases with new features grows,
    it becomes more difficult to provide complete
    regression coverage through manual testing
    (increasingly labor intensive).
  • Automated regression testing enables more
    coverage in less time.
  • Automation is typically achieved using TCL
    scripts.
  • Configuration

[Figure: Router Under Test and Test Equipment.]
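
The slides name TCL as the usual automation language. The sketch below uses Python purely to illustrate the core idea of automated regression: rerun every test case on the new release and flag any that passed before but fail now. The test names and result dictionaries are hypothetical.

```python
from typing import Dict, List


def find_regressions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return tests that passed on the previous release but do not pass now.

    A test missing from the current run is treated as failing.
    """
    return sorted(
        name
        for name, passed in previous.items()
        if passed and not current.get(name, False)
    )


if __name__ == "__main__":
    # Hypothetical results keyed by test-case name.
    previous_release = {"bgp_route_flap": True, "ospf_adjacency": True, "ldp_fec": False}
    new_release = {"bgp_route_flap": True, "ospf_adjacency": False, "ldp_fec": False}
    for name in find_regressions(previous_release, new_release):
        print("REGRESSION:", name)
```

Tests that already failed on the previous release are deliberately excluded, so the report shows only what the new release broke.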
22
Commercial Test Equipment Requirements
23
The State of the Union
  • Test Equipment fails to meet today's requirements
    for testing 99.999% Availability.
  • Router vendors have been forced to develop their
    own specialized test tools.
  • Carriers have been forced to use the router
    vendor test tools.
  • Test Equipment vendors must respond to the
    challenge today.

24
Stress Testing Requirements
  • Maintain BGP Sessions and IGP Adjacencies
  • Flap BGP Routes
  • Signal and maintain RSVP-TE tunnels
  • Distribute LDP FECs
  • Signal and maintain Multicast Groups
  • Perform SNMP GETs and check validity
  • Forward Traffic (IP Unicast, IP Multicast, and
    MPLS)
  • Make the network seem much bigger than it really
    is without having to obtain hundreds of routers.

25
Required Protocol Emulation/Conformance Suites Coverage
  • Routing Protocols
  • BGP
  • OSPF, ISIS
  • OSPF-TE, ISIS-TE
  • RSVP-TE
  • Fast Reroute
  • Standby Tunnels
  • Ingress, Mid-Point, Egress
  • LDP
  • RFC 2547 Layer 3 VPNs
  • Martini Layer 2 VPNs
  • P and PE
  • LDP over RSVP
  • Multicast
  • MBGP
  • PIM-SM
  • MSDP

26
Protocol Emulation Requirements
  • Run any protocols in combination on the same
    interface
  • Forward traffic for emulated protocols
  • Protocol Emulation on any interface type: GigE,
    10GigE, and POS (including 192c).
  • Scaling
      • BGP Sessions: >500/system, >100/interface
      • BGP Routes: >3M/system, >500K/session
      • MPLS-TE Tunnels: >10K (Ingress, Mid-Point, Egress)
      • FECs: >10K
  • Load external BGP table for advertisement
  • Controlled BGP Route Flapping
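
The scaling figures above can be treated as a checklist when evaluating a tester. A minimal sketch, assuming each figure is a strict "greater than" threshold; the key names and the sample datasheet values are hypothetical.

```python
from typing import Dict, List

# Scaling thresholds from the slide; capacity must exceed each value.
REQUIRED_MINIMUMS = {
    "bgp_sessions_per_system": 500,
    "bgp_sessions_per_interface": 100,
    "bgp_routes_per_system": 3_000_000,
    "bgp_routes_per_session": 500_000,
    "mpls_te_tunnels": 10_000,
    "fecs": 10_000,
}


def unmet_requirements(spec: Dict[str, int]) -> List[str]:
    """Return the requirement names a tester spec fails (missing keys count as 0)."""
    return sorted(
        name
        for name, minimum in REQUIRED_MINIMUMS.items()
        if spec.get(name, 0) <= minimum
    )


if __name__ == "__main__":
    # Hypothetical datasheet values for a tester under evaluation.
    candidate = {
        "bgp_sessions_per_system": 1000,
        "bgp_sessions_per_interface": 128,
        "bgp_routes_per_system": 2_000_000,
        "bgp_routes_per_session": 600_000,
        "mpls_te_tunnels": 16_000,
    }
    print("unmet:", unmet_requirements(candidate))
```

A value that merely equals a threshold is reported as unmet, matching the ">" wording of the requirements.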

27
Automated Regression Requirements
  • Commercial test equipment vendors offer protocol
    conformance TCL suites.
  • Test Case coverage must be improved within each
    suite
  • Interaction between protocols must be tested
  • Need each script to test multiple interfaces (4
    or more)
  • Full Protocol Coverage
  • Multicast protocols have been the forgotten son

28
System Requirements
  • Multiple ports per chassis (>32)
  • Automated Convergence measurement
  • Automated reroute/failover measurement
  • Support for ECMP and Composite Links
  • System/Protocol Stability For Many Days
  • Ability to store GUI configuration for
    repeatability.
  • Ability to TCL script any GUI test case.