Internet Routing COS 598A Today: Detecting Anomalies Inside an AS


1
Internet Routing (COS 598A) Today: Detecting
Anomalies Inside an AS
  • Jennifer Rexford
  • http://www.cs.princeton.edu/jrex/teaching/spring2005
  • Tuesdays/Thursdays 11:00am-12:20pm

2
Outline
  • Traffic
  • SNMP link statistics
  • Packet and flow monitoring
  • Network topology
  • IP routers and links
  • Fault data, layer-2 topology, and configuration
  • Intradomain route monitoring
  • Interdomain routes
  • BGP route monitoring
  • Analysis of BGP update data
  • Conclusions

3
Why is Traffic Measurement Important?
  • Billing the customer
  • Measure usage on links to/from customers
  • Applying billing model to generate a bill
  • Traffic engineering and capacity planning
  • Measure the traffic matrix (i.e., offered load)
  • Tune routing protocol or add new capacity
  • Denial-of-service attack detection
  • Identify anomalies in the traffic
  • Configure routers to block the offending traffic
  • Analyze application-level issues
  • Evaluate benefits of deploying a Web caching
    proxy
  • Quantify fraction of traffic that is P2P file
    sharing

4
Collecting Traffic Data: SNMP
  • Simple Network Management Protocol
  • Standard Management Information Base (MIB)
  • Protocol for querying the MIBs
  • Advantage: ubiquitous
  • Supported on all networking equipment
  • Multiple products for polling and analyzing data
  • Disadvantages: dumb
  • Coarse granularity of the measurement data
  • E.g., number of bytes/packets per interface per 5
    minutes
  • Cannot express complex queries on the data
  • Unreliable delivery of the data using UDP
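
As a rough illustration of how per-interface byte counts polled every 5 minutes become a utilization figure, the sketch below subtracts two counter readings and divides by the link capacity. The function name, the 32-bit counter width, and the sample numbers are assumptions made for the example, not details from the lecture.

```python
def utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Average link utilization (0..1) between two polls of a byte counter.

    Assumes the counter wrapped around at most once between the two polls.
    """
    modulus = 1 << counter_bits
    # Modular subtraction gives the right byte count across a single wraparound.
    delta_bytes = (curr_octets - prev_octets) % modulus
    return (delta_bytes * 8) / (interval_s * link_bps)


# Two hypothetical polls, 300 seconds apart, on a 100 Mbit/s link.
print(utilization(1_000_000, 376_000_000, 300, 100_000_000))
```

Because only the average over the polling interval is visible, short bursts inside the 5 minutes are invisible, which is the coarse-granularity limitation listed above.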

5
Collecting Traffic Data: Packet Monitoring
  • Packet monitoring
  • Passively collecting IP packets on a link
  • Recording IP, TCP/UDP, or application-layer
    traces
  • Advantages: details
  • Fine-grain timing information
  • E.g., can analyze the burstiness of the traffic
  • Fine-grain packet contents
  • Addresses, port numbers, TCP flags, URLs, etc.
  • Disadvantages: overhead
  • Hard to keep up with high-speed links
  • Often requires a separate monitoring device

6
Collecting Traffic Data: Flow Statistics
  • Flow monitoring (e.g., Cisco NetFlow)
  • Statistics about groups of related packets (e.g.,
    same IP/TCP headers and close in time)
  • Recording header information, counts, and time
  • Advantages: detail with less overhead
  • Almost as good as packet monitoring, except no
    fine-grain timing information or packet contents
  • Often implemented directly on the interface card
  • Disadvantages: trade-off between detail and overhead
  • Less detail than packet monitoring
  • Less ubiquitous than SNMP statistics
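
The idea of a flow as "related packets, close in time" can be sketched as follows. This is a simplified illustration, not how any particular router implements flow export: the `Packet` record, the 5-tuple key, and the 60-second idle timeout are all assumptions chosen for the example.

```python
from collections import namedtuple

# Hypothetical packet record; real monitors see many more header fields.
Packet = namedtuple("Packet", "time src dst sport dport proto size")


def build_flows(packets, idle_timeout=60.0):
    """Group packets into flows: same 5-tuple, gaps no longer than idle_timeout."""
    active = {}     # 5-tuple -> flow record that is still open
    finished = []   # flows closed by the idle timeout
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and p.time - flow["end"] > idle_timeout:
            finished.append(flow)   # too long since the last packet: close it
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return finished
```

Each flow record keeps header information, counts, and start/end times, but the per-packet timing and contents are gone, matching the trade-off described above.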

7
Using the Traffic Data in Network Operations
  • SNMP byte/packet counts: everywhere
  • Tracking link utilizations and detecting
    anomalies
  • Generating bills for traffic on customer links
  • Inference of the offered load (i.e., traffic
    matrix)
  • Packet monitoring: selected locations
  • Analyzing the small time-scale behavior of
    traffic
  • Troubleshooting specific problems on demand
  • Flow monitoring: selective, e.g., network edge
  • Tracking the application mix
  • Direct computation of the traffic matrix
  • Input to denial-of-service attack detection
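
For the "direct computation of the traffic matrix" item, a minimal sketch is shown below. It assumes each flow record collected at the network edge has already been labeled with its ingress and egress points; in practice the egress usually has to be derived from routing data. The record layout and the names are hypothetical.

```python
from collections import defaultdict


def traffic_matrix(flow_records):
    """Sum bytes per (ingress, egress) pair.

    flow_records: iterable of (ingress, egress, byte_count) tuples.
    """
    matrix = defaultdict(int)
    for ingress, egress, nbytes in flow_records:
        matrix[(ingress, egress)] += nbytes
    return dict(matrix)


# Hypothetical records labeled with ingress and egress points.
records = [("SEA", "NYC", 100), ("SEA", "NYC", 50), ("CHI", "NYC", 10)]
print(traffic_matrix(records))
```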

8
Network Topology
9
IP Topology
  • Topology information
  • Routers
  • Links, and their capacities
  • Internal links inside the AS
  • Edge links connecting to neighboring domains
  • Ways to learn the topology
  • Inventory database
  • SNMP polling/traps
  • Traceroute
  • Route monitoring
  • Router configuration data

10
Below IP
  • Layer-2 paths
  • ATM virtual circuits
  • Frame Relay virtual circuits
  • Mapping to lower layers
  • Specific fibers
  • Shared optical amplifiers
  • Shared conduits
  • Physical length (propagation delay)
  • Information not visible to IP
  • Stored in an inventory database
  • Not necessarily generated/updated automatically

11
Intradomain Monitoring: OSPF Protocol
  • Link-state protocol
  • Routers flood Link State Advertisements (LSAs)
  • Routers compute shortest paths based on weights
  • Routers identify next-hop to reach other routers
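
The shortest-path step can be sketched with Dijkstra's algorithm, which is the usual choice for link-state protocols. The graph representation and the router names below are assumptions for illustration; the code returns, for each destination, the path cost and the first hop out of the source router.

```python
import heapq


def next_hops(graph, source):
    """Return {destination: (cost, next_hop)} for shortest paths from source.

    graph maps each router name to {neighbor: link_weight}; weights are positive.
    """
    best = {}
    # Heap entries: (cost so far, router, first hop used to leave the source).
    heap = [(0, source, None)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in best:
            continue                      # already settled with a lower cost
        best[node] = (cost, first)
        for neighbor, weight in graph[node].items():
            if neighbor not in best:
                hop = neighbor if node == source else first
                heapq.heappush(heap, (cost + weight, neighbor, hop))
    del best[source]
    return best


# Hypothetical four-router topology with link weights.
example = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"C": 1},
}
print(next_hops(example, "A"))
```

A route monitor that has collected all LSAs can run the same computation offline to reproduce the paths each router should be using.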

[Figure: example network topology with link weights]

12
Intradomain Route Monitoring
  • Construct continuous view of topology
  • Detect when equipment goes up or down
  • Input to traffic-engineering and planning tools
  • Detect routing anomalies
  • Identify failures, LSA storms, and route flaps
  • Verify that LSA load matches expectations
  • Flag strange weight settings as misconfigurations
  • Analyze convergence delay
  • Monitor LSAs in multiple locations with go
  • Compare the times when LSAs arrive
  • Detect router implementation mistakes

13
Passive Collection of LSAs
  • OSPF is a flooding protocol
  • Every LSA sent on every participating link
  • Very helpful for simplifying the monitor
  • Can participate in the protocol
  • Shared media (e.g., Ethernet)
  • Join multicast group and listen to LSAs
  • Point-to-point links
  • Establish an adjacency with a router
  • or passively monitor packets on a link
  • Tap a link and capture the OSPF packets

14
Reducing the Volume of Information
  • Prioritizing the messages
  • Router failure over router recovery
  • Link failure or weight change over a refresh
  • Informational messages about weight settings
  • Grouping related messages
  • Link failure: group messages for the two ends
  • Router failure: group the affected links
  • Common failure: group links failing close in time
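
The grouping rules above can be sketched in two passes: first merge the reports from the two ends of a link, then group links whose failures start close together in time. The message format `(time, reporting_router, neighbor)` and the 5-second window are assumptions for the example, not values from the lecture.

```python
def group_link_failures(messages, window=5.0):
    """Group link-failure reports.

    messages: iterable of (time, reporting_router, neighbor_router).
    Returns a list of groups, each with a start time, end time, and list of links.
    """
    # Pass 1: both ends report the same link, so keep only the earliest report.
    first_seen = {}
    for time, router, neighbor in sorted(messages):
        link = frozenset((router, neighbor))
        first_seen.setdefault(link, time)

    # Pass 2: a link joins the current group if it failed within `window`
    # seconds of the previous link in that group.
    groups = []
    for link, time in sorted(first_seen.items(), key=lambda item: item[1]):
        name = tuple(sorted(link))
        if groups and time - groups[-1]["end"] <= window:
            groups[-1]["links"].append(name)
            groups[-1]["end"] = time
        else:
            groups.append({"start": time, "end": time, "links": [name]})
    return groups
```

A group in which every link shares one router would suggest a router failure rather than independent link failures; that check is left out to keep the sketch short.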

15
Anomalies Found in the Shaikh04 paper
  • Intermittent hardware problem
  • Router periodically losing OSPF adjacencies
  • Risk of network partition if 2nd failure occurred
  • External link flaps
  • Congestion on edge link causing lost messages
  • Lost adjacency leading to flapping routes
  • Configuration errors
  • Two routers assigned the same IP address
  • Inefficient config leading to duplicate LSAs
  • Vendor implementation bug
  • More frequent refreshing of LSAs than specified

16
Interdomain Route Monitoring
17
Motivation for BGP Monitoring
  • Visibility into external destinations
  • What neighboring ASes are telling you
  • How you are reaching external destinations
  • Detecting anomalies
  • Increases in number of destination prefixes
  • Lost reachability to some destinations
  • Route hijacking
  • Instability of the routes
  • Input to traffic-engineering tools
  • Knowing the current routes in the network
  • Workload for testing routers
  • Realistic message traces to play back to routers

18
BGP Monitoring: A Wish List
  • Ideally: knowing what the router knows
  • All externally-learned routes
  • Before policy has modified the attributes
  • Before a single best route is picked
  • How to achieve this
  • Special monitoring session on routers that tells
    everything they have learned
  • Packet monitoring on all links with BGP sessions
  • If you can't do that, you could always do
  • Periodic dumps of routing tables
  • BGP session to learn best route from router
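
Even the fallback of periodic routing-table dumps supports simple anomaly checks, such as lost reachability or a jump in the number of prefixes. The sketch below compares two dumps; representing a dump as a dictionary from prefix to best route is an assumption for illustration, and the prefixes and AS numbers are reserved documentation values.

```python
def compare_dumps(old, new):
    """Compare two routing-table snapshots.

    old, new: dict mapping prefix -> best route (here, an AS-path tuple).
    """
    old_prefixes, new_prefixes = set(old), set(new)
    return {
        "lost": sorted(old_prefixes - new_prefixes),      # no longer reachable
        "gained": sorted(new_prefixes - old_prefixes),    # newly announced
        "changed": sorted(p for p in old_prefixes & new_prefixes
                          if old[p] != new[p]),           # different best route
    }


earlier = {"192.0.2.0/24": (64500,), "198.51.100.0/24": (64501, 64502)}
later = {"198.51.100.0/24": (64501, 64503), "203.0.113.0/24": (64504,)}
print(compare_dumps(earlier, later))
```

What this cannot show is anything that happened between the two dumps, which is why update dynamics require a live BGP session.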

19
Using Routers to Monitor BGP
  • Establish a passive BGP session (eBGP or iBGP)
    from a workstation running BGP software
  • (+) BGP table dumps do not burden operational
    routers
  • (-) Receives only best routes from BGP neighbor
  • (+) Update dynamics captured
  • (+) Not restricted to interfaces provided by
    vendors
  • Talk to operational routers using SNMP or telnet
    at command line
  • (-) BGP table dumps are expensive
  • (+) Table dumps show all alternate routes
  • (-) Update dynamics lost
  • (-) Restricted to interfaces provided by vendors

20
Collect BGP Data From Many Routers
[Figure: map of routers in cities across the U.S., from Seattle and New York to San Diego and Orlando, all feeding a central Route Monitor]
BGP is not a flooding protocol

21
Detecting Important Routing Changes
  • Large volume of BGP update messages
  • Around 2 million/day, and very bursty
  • Too much for an operator to manage
  • Identify important anomalies
  • Lost reachability
  • Persistent flapping
  • Large traffic shifts
  • Not the same as root-cause analysis
  • Identify changes and their effects
  • Focus on mitigation, rather than diagnosis
  • Diagnose causes if they occur in/near the AS

22
Challenge 1: Excess Update Messages
  • A single routing change
  • Leads to multiple update messages
  • Affects routing decision at multiple routers

Persistent Flapping Prefixes
Group updates for a prefix with inter-arrival <
70 seconds, and flag prefixes with changes
lasting > 10 minutes.
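
The grouping rule can be sketched as follows, using the 70-second timeout and 10-minute duration stated on the slide. The input format, a list of `(timestamp_seconds, prefix)` pairs, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def persistent_flaps(updates, timeout=70.0, min_duration=600.0):
    """Find prefixes with long-lasting update events.

    updates: iterable of (timestamp_seconds, prefix).
    Returns {prefix: [(start, end), ...]} for events longer than min_duration.
    """
    times = defaultdict(list)
    for ts, prefix in updates:
        times[prefix].append(ts)

    flagged = {}
    for prefix, stamps in times.items():
        stamps.sort()
        start = prev = stamps[0]
        events = []
        for ts in stamps[1:]:
            if ts - prev >= timeout:     # gap too long: close the current event
                events.append((start, prev))
                start = ts
            prev = ts
        events.append((start, prev))     # close the last open event
        long_events = [(s, e) for s, e in events if e - s > min_duration]
        if long_events:
            flagged[prefix] = long_events
    return flagged
```

A prefix that is updated once a minute for eleven minutes forms a single event of 660 seconds and is flagged, while a prefix with a few isolated updates is not.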
23
Determine Event Timeout
[Figure: cumulative distribution of BGP update inter-arrival time, with a BGP beacon curve; marked point (70, 98)]
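
If the marked point (70, 98) is read as "98% of inter-arrival times fall within 70 seconds", which is an assumption about the lost figure, then the timeout is simply a high percentile of the observed gaps. A minimal sketch of that selection, with a hypothetical function name and made-up sample gaps:

```python
import math


def timeout_at_fraction(inter_arrivals, fraction=0.98):
    """Smallest observed gap g such that at least `fraction` of gaps are <= g."""
    ordered = sorted(inter_arrivals)
    index = math.ceil(fraction * len(ordered)) - 1
    return ordered[max(index, 0)]


# Made-up gaps in seconds between consecutive updates for the same prefix.
gaps = [2, 5, 5, 9, 14, 28, 30, 31, 55, 400]
print(timeout_at_fraction(gaps, fraction=0.9))
```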
24
Event Duration: Persistent Flapping
[Figure: complementary cumulative distribution of event duration; marked point (600, 0.1)]
25
Detecting Persistent Flapping
  • Significant persistent flapping
  • 15.2% of all BGP update messages
  • though a small number of destination prefixes
  • Surprising, especially since flap dampening is
    used
  • Types of persistent flapping
  • Conservative flap-damping parameters (78.6%)
  • Protocol oscillations, e.g., MED oscillation
    (18.3%)
  • Unstable interface or BGP session (3.0%)

26
Example: Unstable eBGP Session
[Figure: AT&T network with a Peer, a Customer, and destination prefix p]
  • Flap damping parameters are session-based
  • Damping not implemented for iBGP sessions

27
Challenge 2: Identify Important Events
  • Major concerns of network operators
  • Changes in reachability
  • Heavy load of routing messages on the routers
  • Flow of the traffic through the network

Classify events by the type of impact they have on
the network
28
Event Category: No Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; labeled "No Traffic Shift"]
No Disruption: each of the border routers has
no traffic shift
29
Event Category: Internal Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; labeled "Internal Traffic Shift"]
Internal Disruption: all of the traffic shifts
are internal traffic shifts
30
Event Type: Single External Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; labeled "External Traffic Shift"]
Single External Disruption: traffic at one exit
point shifts to other exit points
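
The three categories can be expressed as a small classification function. This sketch assumes each border router has already been labeled with the kind of traffic shift it saw ("none", "internal", or "external"), and it interprets "single external disruption" as exactly one router seeing an external shift; both the labels and that interpretation are assumptions. Anything outside the three categories shown on the slides is returned as "other".

```python
def classify_event(shifts):
    """Classify a routing event by its impact.

    shifts: {border_router: "none" | "internal" | "external"}.
    """
    labels = list(shifts.values())
    external = labels.count("external")
    if all(label == "none" for label in labels):
        return "no disruption"
    if external == 0:
        return "internal disruption"      # every shift that occurred is internal
    if external == 1:
        return "single external disruption"
    return "other"                        # not covered by the slides


# Hypothetical border routers r1..r3.
print(classify_event({"r1": "none", "r2": "internal", "r3": "none"}))
```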
31
Statistics on Event Classification
32
Challenge 3: Multiple Destinations
  • A single routing change
  • Affects multiple destination prefixes

Group events of same type that occur close in time
33
Main Causes of Large Clusters
  • External BGP session resets
  • Failure/recovery of external BGP session
  • E.g., session to another large tier-1 ISP
  • Caused single external disruption events
  • Validated by looking at syslog reports on routers
  • Hot-potato routing changes
  • Failure/recovery of an intradomain link
  • E.g., leads to changes in IGP path costs
  • Caused internal disruption events
  • Validated by looking at OSPF measurements

34
Challenge 4: Popularity of Destinations
  • Impact of event on traffic
  • Depends on the popularity of the destinations

NetFlow Data
Weight the group of destinations by the traffic
volume
35
Traffic Impact Prediction
  • Traffic weight
  • Per-prefix measurements from NetFlow
  • 10% of prefixes account for 90% of traffic
  • Traffic weight of a cluster
  • The sum of traffic weight of the prefixes
  • Flag clusters with heavy traffic
  • A few large clusters have large traffic weight
  • Mostly session resets and hot-potato changes
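
The traffic weight of a cluster, as defined above, is a plain sum over its prefixes. The sketch below computes it and flags heavy clusters; the 5% threshold, the cluster names, and the per-prefix weights are made-up values for illustration.

```python
def heavy_clusters(clusters, prefix_weight, threshold=0.05):
    """Return {cluster_id: traffic_weight} for clusters at or above threshold.

    clusters: {cluster_id: iterable of prefixes}
    prefix_weight: {prefix: fraction of total traffic}; unknown prefixes count as 0.
    """
    weights = {
        cid: sum(prefix_weight.get(p, 0.0) for p in prefixes)
        for cid, prefixes in clusters.items()
    }
    return {cid: w for cid, w in weights.items() if w >= threshold}


# Hypothetical clusters and per-prefix traffic fractions.
clusters = {
    "session-reset": ["192.0.2.0/24", "198.51.100.0/24"],
    "minor": ["203.0.113.0/24"],
}
weights = {"192.0.2.0/24": 0.04, "198.51.100.0/24": 0.03, "203.0.113.0/24": 0.001}
print(heavy_clusters(clusters, weights))
```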

36
Conclusions
  • Network troubleshooting from the inside
  • Traffic, topology, and routing data
  • Easier to understand what's going on
  • though still challenging to collect/analyze
    data
  • Traffic measurement
  • SNMP, packet monitoring, and flow monitoring
  • Routing monitors
  • Track network state and identify anomalies
  • Intradomain monitor capturing LSAs
  • BGP monitor capturing BGP updates

37
Next Time: BGP Routing Table Size
  • Three papers
  • On characterizing BGP routing table growth
  • An empirical study of router response to large
    BGP routing table load
  • A framework for interdomain route aggregation
  • Review only of the first paper
  • Summary
  • Why accept
  • Why reject
  • Avenues for future work
  • Optional
  • Vannevar Bush, "As We May Think" (1945)