OSPF Monitor Architecture, Design and Deployment Experience

1
OSPF Monitor Architecture, Design and Deployment
Experience
  • Aman Shaikh
  • Albert Greenberg
  • AT&T Labs - Research
  • NSDI 2004

2
Objectives for OSPF Monitor
  • Real-time analysis of OSPF behavior
  • Trouble-shooting, alerting, validation of
    maintenance
  • Real-time snapshots of OSPF network topology
  • Off-line analysis
  • Post-mortem analysis of recurring problems
  • Generate statistics and reports about network
    performance
  • Identify anomaly signatures
  • Facilitate tuning of configurable parameters
  • Improve maintenance procedures
  • Analyze OSPF behavior in commercial networks

3
OSPF Monitor in a Nutshell
  • Collect OSPF LSAs (Link State Advertisements)
    passively from network
  • Every router describes its local connectivity in
    an LSA
  • Router originates an LSA due to...
  • Change in network topology
  • Periodic soft-state refresh
  • LSA is flooded to other routers in the domain
  • Flooding is reliable and hop-by-hop
  • Flooding leads to duplicate copies of LSAs being
    received
  • Every router stores LSAs (self-originated and
    received) in its link-state database (the topology
    graph)
  • Real-time analysis of LSA streams
  • Archive LSAs for off-line analysis

4
Components
  • Data collection: LSA Reflector (LSAR)
  • Passively collects OSPF LSAs from network
  • Reflects streams of LSAs to LSAG
  • Archives LSAs for analysis by OSPFScan
  • Real-time analysis: LSA aGgregator (LSAG)
  • Monitors network for topology changes, LSA
    storms, node flaps and anomalies
  • Off-line analysis: OSPFScan
  • Supports queries on LSA archives
  • Allows playback and modeling of topology changes
  • Allows emulation of OSPF routing

5
Example
[Figure: example deployment diagram. Labels: OSPF Network (Area 0, Area 1, Area 2), LSAR 1, LSAR 2, LSAs, Reflect LSA, replicate, TCP Connection]

6
How LSAR attaches to Network
  • Host mode
  • Join multicast group
  • Advantage: completely passive
  • Disadvantage: not reliable, delayed initialization
    of LSDB
  • Full adjacency mode
  • Form full adjacency (a peering session) with a
    router
  • Advantage: reliable, immediate initialization of
    LSDB
  • Disadvantage: LSAR's instability can impact the
    entire network
  • Partial adjacency mode
  • Keep adjacency in a state that allows LSAR to
    receive LSAs, but does not allow data forwarding
    over link
  • Advantage: reliable, LSAR's instability does not
    impact the entire network, immediate
    initialization of LSDB
  • Disadvantage: can raise alarms on the router

7
Partial Adjacency for LSAR
[Figure: partial adjacency between router R and LSAR. Labels: "I need LSA L from LSAR", "Partial state"]
  • Router R does not advertise a link to LSAR
  • LSAR does not originate any LSAs
  • Routers (except R) are not aware of LSAR's presence
  • Does not trigger routing calculations in network
  • LSAR going up/down does not impact network
  • LSAR-R link is not used for data forwarding

8
LSA aGgregator (LSAG)
  • Analyzes reflected LSAs from LSARs in real-time
  • Generates console messages
  • Change in OSPF network topology
  • ADJACENCY COST CHANGE rtr 10.0.0.1 (intf
    10.0.0.2) -> rtr 10.0.0.5 old_cost 1000 new_cost
    50000 area 0.0.0.0
  • Node flaps
  • RTR FLAP rtr 10.0.0.12 no_flaps 7 flap_window
    570 sec
  • LSA storms
  • LSA STORM lstype 3 lsid 10.1.0.0 advrt 10.0.0.3
    area 0.0.0.0 no_lsas 7 storm_window 470 sec
  • Anomalous behavior
  • TYPE-3 ROUTE FROM NON-BORDER RTR ntw 10.3.0.0/24
    rtr 10.0.0.6 area 0.0.0.0
  • Dumps snapshots of network topology
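
The node-flap message above reports a count of flaps inside a time window. A minimal sketch of how such a check could work is a per-router sliding window over flap timestamps. The class name, the threshold, and the window length are assumptions for illustration, and the exact meaning of flap_window in LSAG is not stated on the slide; only the message layout is copied from it.

```python
from collections import defaultdict, deque


class FlapDetector:
    """Counts router flaps inside a sliding time window (hypothetical helper)."""

    def __init__(self, threshold=5, window_sec=600):
        self.threshold = threshold      # assumed alert threshold, not from the slides
        self.window_sec = window_sec    # assumed window length, not from the slides
        self.events = defaultdict(deque)  # router id -> timestamps of recent flaps

    def record(self, router_id, timestamp):
        """Record one flap; return a message if the threshold is reached, else None."""
        recent = self.events[router_id]
        recent.append(timestamp)
        # Drop flaps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_sec:
            recent.popleft()
        if len(recent) >= self.threshold:
            span = int(timestamp - recent[0])
            return (f"RTR FLAP rtr {router_id} no_flaps {len(recent)} "
                    f"flap_window {span} sec")
        return None
```

Feeding the detector one timestamp per observed router state change keeps memory bounded, since old timestamps are discarded as the window slides forward.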

9
OSPFScan
  • Tools for off-line analysis of LSA archives
  • Parse, select (based on queries), and analyze
  • Functionality supported by OSPFScan
  • Classification of LSA traffic
  • Change LSAs, refresh LSAs, duplicate LSAs
  • Emulation of OSPF Routing
  • How OSPF routing tables evolved in response to
    network changes
  • What an end-to-end path within the OSPF domain
    looked like at any instant
  • Modeling of topology changes
  • Vertex addition/deletion and link
    addition/deletion/change_cost
  • Playback of topology change events
  • Statistics and report generation
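
The change/refresh/duplicate classification listed above can be sketched as follows. This is a simplified illustration, not OSPFScan's actual logic: it assumes each archived LSA is reduced to a tuple (ls_type, ls_id, adv_router, seq, body), ignores sequence-number wraparound, LS age, and checksums, and treats the first sighting of an LSA as a change.

```python
def classify_lsas(lsas):
    """Label each LSA record as 'change', 'refresh', or 'duplicate'.

    Each record is a tuple (ls_type, ls_id, adv_router, seq, body); this record
    layout is an assumption made for the example.
    """
    latest = {}  # LSA key -> (seq, body) of the newest instance seen so far
    labels = []
    for ls_type, ls_id, adv_router, seq, body in lsas:
        key = (ls_type, ls_id, adv_router)
        prev = latest.get(key)
        if prev is not None and seq <= prev[0]:
            # Same (or older) instance received again via flooding.
            labels.append("duplicate")
            continue
        if prev is not None and body == prev[1]:
            # Newer instance with unchanged contents: periodic refresh.
            labels.append("refresh")
        else:
            # First sighting or changed contents.
            labels.append("change")
        latest[key] = (seq, body)
    return labels
```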

10
Performance Evaluation
  • Performance of LSAR and LSAG through lab
    experiments
  • LSAR and LSAG are key to real-time monitoring
  • How performance scales with LSA-rate and network
    size

11
Experimental Setup
[Figure: experimental setup. Labels: PC, SUT, Zebra, LSAR, LSAG, OSPF adjacency, TCP connection]

12
Methodology
  • Send a burst of LSAs from Zebra to LSAR
  • Vary number of LSAs (l) in a burst of 1 sec
    duration
  • Use of fully connected graph as the emulated
    topology
  • Vary number of nodes (n) in the topology
  • Performance measurements
  • LSAR performance: LSA pass-through time
  • Zebra measures time difference between sending
    and receiving an LSA from LSAR
  • LSAG performance: LSA processing time
  • Instrumentation of LSAG code

13
LSAR Performance

14
LSAG Performance

15
Deployment
  • Tier-1 ISP network
  • Area 0, 100 routers, point-to-point links
  • Deployed since January 2003
  • LSA archive size: 8 MB/day
  • LSAR connection: partial adjacency mode
  • Enterprise network
  • 15 areas, 500 routers, Ethernet-based LANs
  • Deployed since February 2002
  • LSA archive size: 10 MB/day
  • LSAR connection: host mode

16
LSAG in Day-to-day Operations
  • Generation of alarms by feeding messages into
    higher layer network management systems
  • Grouping of messages to reduce the number of
    alarms
  • Prioritization of messages
  • Validation of maintenance steps and monitoring
    the impact of these steps on network-wide OSPF
    behavior
  • Example
  • Network operators use cost-out/cost-in of links
    to carry out maintenance
  • A link-audit web-page allows operators to keep
    track of link costs in real-time
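
A link audit of this kind can be sketched as replaying cost-change events over a baseline and flagging links whose cost is high enough to count as costed out. The function name, the link identifiers, and the 50000 threshold are assumptions for illustration; the slides do not say how the web page decides a link's status.

```python
def audit_links(baseline, cost_changes, cost_out_threshold=50000):
    """Return {link: (current_cost, status)} after applying cost changes in order.

    baseline: {link: cost}; a link is any hashable id (here a router pair).
    cost_changes: iterable of (link, new_cost), oldest first.
    cost_out_threshold: hypothetical cost at or above which a link is
    considered costed out.
    """
    current = dict(baseline)
    for link, new_cost in cost_changes:
        current[link] = new_cost
    return {
        link: (cost, "costed-out" if cost >= cost_out_threshold else "in-service")
        for link, cost in current.items()
    }
```

An operator could compare the result before and after a maintenance window to confirm that every link that was costed out has been costed back in.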

17
Problems Caught by LSAG
  • Equipment problem
  • Detected internal problems in a crucial router in
    enterprise network
  • Problem manifested as episodes of OSPF adjacency
    flapping
  • Configuration problem
  • Identified assignment of same router-id to two
    routers in enterprise network
  • OSPF implementation bug
  • Caught a bug in type-3 LSA generation code of a
    router vendor in ISP network
  • Faster refresh of LSAs than standards-mandated
    rate

18
Long Term Analysis by OSPFScan
  • LSA traffic analysis
  • Identified excessive duplicate LSA traffic in
    some areas of Enterprise Network
  • Led to root-cause analysis and preventative steps
  • Statistics generation
  • Inter-arrival time of change LSAs in ISP network
  • Fine-tuning configurable timers related to route
    calculation (SPF calculation)
  • Mean down-time and up-time for links and routers
    in ISP network
  • Assessment of reliability and availability
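
Mean up-time and down-time can be derived from a time-ordered list of state changes for one link or router. The sketch below assumes events of the form (timestamp, state) with alternating "up" and "down" states, and it counts the final interval as ending at end_time, so that last interval is truncated rather than complete. The event format is an assumption, not the archive format used by OSPFScan.

```python
def mean_up_down(events, end_time):
    """Return (mean_up_time, mean_down_time) for one link or router.

    events: time-ordered list of (timestamp, state), state is 'up' or 'down'.
    end_time: end of the observation period; closes the last interval.
    A mean is None if that state was never observed.
    """
    durations = {"up": [], "down": []}
    boundaries = events[1:] + [(end_time, None)]
    for (start, state), (stop, _) in zip(events, boundaries):
        durations[state].append(stop - start)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(durations["up"]), mean(durations["down"])
```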

19
Lessons Learned through Deployment
  • New tools reveal new failure modes
  • Real-time alerting and off-line analysis are
    complementary
  • Distributed architecture helped a lot
  • OSPF exhibits significant activity in real
    networks
  • Maintenance and genuine problems
  • Add functionality incrementally and through
    interaction with users
  • Archive all LSAs
  • LSA volume is manageable
  • Don't throw away refresh and duplicate LSAs

20
Conclusion
  • Three component architecture
  • LSAR: data collection
  • LSAG: real-time analysis
  • OSPFScan: off-line analysis
  • Performance analysis
  • LSAR and LSAG scale well as LSA-rate and network
    size increases
  • Deployment
  • Deployed in Tier-1 ISP and Enterprise network
  • Has proved to be an extremely valuable tool for
    network management
  • "OSPF Monitor was a Lifesaver"
  • VP of Networking, Enterprise network

21
Future Work
  • Real-time analysis
  • Correlation with other fault and performance data
    for more meaningful alerting
  • Prioritization of alerts
  • Off-line analysis
  • Correlation with other data sources
  • Work already underway: BGP, fault, performance
  • Identification of problem signatures and feeding
    them into real-time component for problem
    prediction

22
Backup Slides

23
Overview of OSPF
  • OSPF is a link-state protocol
  • Every router learns entire network topology
  • Topology is represented as graph
  • Routers are vertices, links are edges
  • Every link is assigned weight through
    configuration
  • Every router uses Dijkstra's single-source
    shortest path algorithm to build its forwarding
    table
  • Router builds Shortest Path Tree (SPT) with
    itself as root
  • Shortest Path First (SPF) calculation
  • Packets are forwarded along shortest paths
    defined by link weights
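
The SPF calculation described above can be sketched with Dijkstra's algorithm over a weighted adjacency map. This is a minimal illustration: the graph representation and function name are assumptions, and it records a single next hop per destination, ignoring equal-cost multipath.

```python
import heapq


def spf(graph, root):
    """Run Dijkstra from root over graph = {router: {neighbor: cost}}.

    Returns (dist, next_hop): shortest distance to each reachable router and
    the first hop on one shortest path from root to it.
    """
    dist = {root: 0}
    next_hop = {}
    done = set()
    heap = [(0, root, None)]  # (distance, router, first hop from root)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry; a shorter path was already finalized
        done.add(node)
        if hop is not None:
            next_hop[node] = hop
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nbr not in done and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Direct neighbors of root are their own first hop.
                heapq.heappush(heap, (nd, nbr, nbr if node == root else hop))
    return dist, next_hop
```

Changing one link weight and rerunning spf shows how a cost change on a single link can move traffic onto a different path.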

24
Areas in OSPF
  • OSPF allows domain to be divided into areas for
    scalability
  • Areas are numbered 0, 1, 2, ...
  • Hub-and-spoke with area 0 as hub
  • Every link is assigned to exactly one area
  • Routers with links in multiple areas are called
    border routers

25
Summarization with Areas
  • Each router learns
  • Entire topology of its attached areas
  • Information about subnets in remote areas and
    their distance from the border routers
  • Distance = sum of link costs from border router
    to subnet
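
To reach a subnet in a remote area, a router combines its own distance to each border router with the distance that border router advertises for the subnet, and picks the smallest total. A minimal sketch follows, with hypothetical names and data layout:

```python
def inter_area_distance(dist_to_border, summaries, subnet):
    """Pick the best border router for a subnet in a remote area.

    dist_to_border: {border_router: intra-area distance from this router}
    summaries: {border_router: {subnet: distance advertised by that border router}}
    Returns (total_distance, border_router), or None if no border router
    advertises the subnet.
    """
    best = None
    for border, advertised in summaries.items():
        if subnet in advertised and border in dist_to_border:
            total = dist_to_border[border] + advertised[subnet]
            if best is None or total < best[0]:
                best = (total, border)
    return best
```

The nearest border router is not always the best exit: a closer border router can still lose if the distance it advertises to the subnet is large.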

26
Link State Advertisements (LSAs)
  • Every router describes its local connectivity in
    Link State Advertisements (LSAs)
  • Router originates an LSA due to
  • Change in network topology
  • Example: link goes down or comes up
  • Periodic soft-state refresh
  • Recommended value of interval is 30 minutes
  • LSA is flooded to other routers in the domain
  • Flooding is reliable and hop-by-hop
  • Includes change and refresh LSAs
  • Flooding leads to duplicate copies of LSAs being
    received
  • Every router stores LSAs (self-originated and
    received) in its link-state database (the topology
    graph)

27
Adjacency
  • Neighbor routers (i.e., routers connected by a
    physical link) form an adjacency
  • The purpose is to make sure
  • Link is operational and routers can communicate
    with each other
  • Neighbor routers have consistent view of network
    topology
  • To avoid loops and black holes
  • Link gets used for data forwarding only after
    adjacency is established
  • Use of periodic Hellos to monitor the status of
    link and adjacency

28
Equipment Problem at Enterprise Network
  • Internal errors in a router in area 0
  • Episodes where router would drop adjacencies with
    other routers
  • Problem manifested in LSAG as ADJ UP and ADJ
    DOWN messages
  • Not visible in other network management systems
  • Led to proactive maintenance

29
LSA Traffic in Enterprise Network
[Figure: LSA traffic in the enterprise network, broken down into refresh LSAs, change LSAs, and duplicate LSAs]

30
Overhead: Duplicate LSAs
[Figure: duplicate-LSA overhead; x-axis: Days]
  • Why do some areas witness substantial duplicate
    LSA traffic, while other areas do not witness
    any?
  • OSPF flooding over LANs leads to control plane
    asymmetries and to imbalances in duplicate LSA
    traffic