Internet Routing COS 598A Today: Detecting Anomalies Inside an AS

Slides: 38

Provided by: albertgr

Learn more at: https://www.cs.princeton.edu

1
Internet Routing (COS 598A)
Today: Detecting Anomalies Inside an AS

Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

2
Outline

Traffic
SNMP link statistics
Packet and flow monitoring
Network topology
IP routers and links
Fault data, layer-2 topology, and configuration
Intradomain route monitoring
Interdomain routes
BGP route monitoring
Analysis of BGP update data
Conclusions

3
Why is Traffic Measurement Important?

Billing the customer
Measure usage on links to/from customers
Applying billing model to generate a bill
Traffic engineering and capacity planning
Measure the traffic matrix (i.e., offered load)
Tune routing protocol or add new capacity
Denial-of-service attack detection
Identify anomalies in the traffic
Configure routers to block the offending traffic
Analyze application-level issues
Evaluate benefits of deploying a Web caching proxy
Quantify fraction of traffic that is P2P file sharing

4
Collecting Traffic Data: SNMP

Simple Network Management Protocol
Standard Management Information Base (MIB)
Protocol for querying the MIBs
Advantage: ubiquitous
Supported on all networking equipment
Multiple products for polling and analyzing data
Disadvantages: dumb
Coarse granularity of the measurement data
E.g., number of bytes/packets per interface per 5 minutes
Cannot express complex queries on the data
Unreliable delivery of the data using UDP
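
As a rough illustration of how coarse SNMP byte counters turn into link-utilization figures, the sketch below converts two successive polls of an interface byte counter into average utilization over the polling interval. The function name, the 32-bit counter width, and the sample numbers are assumptions made for this example and are not taken from the lecture.

```python
def link_utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Average utilization (0..1) between two polls of a byte counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    modulus = 1 << counter_bits
    # Modular subtraction gives the right delta even after one wraparound.
    delta_octets = (curr_octets - prev_octets) % modulus
    bits_sent = delta_octets * 8
    return bits_sent / (interval_s * link_bps)

# Hypothetical 5-minute poll on a 100 Mb/s link; the counter wrapped in between.
util = link_utilization(prev_octets=4_000_000_000, curr_octets=500_000_000,
                        interval_s=300, link_bps=100_000_000)
print(f"average utilization: {util:.3f}")
```

Because only one number per interval survives, any burst shorter than the polling interval is averaged away, which is the coarse granularity noted above.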

5
Collecting Traffic Data: Packet Monitoring

Packet monitoring
Passively collecting IP packets on a link
Recording IP, TCP/UDP, or application-layer traces
Advantages: details
Fine-grain timing information
E.g., can analyze the burstiness of the traffic
Fine-grain packet contents
Addresses, port numbers, TCP flags, URLs, etc.
Disadvantages: overhead
Hard to keep up with high-speed links
Often requires a separate monitoring device

6
Collecting Traffic Data: Flow Statistics

Flow monitoring (e.g., Cisco NetFlow)
Statistics about groups of related packets (e.g., same IP/TCP headers and close in time)
Recording header information, counts, and time
Advantages: detail with less overhead
Almost as good as packet monitoring, except no fine-grain timing information or packet contents
Often implemented directly on the interface card
Disadvantages: trade-off between detail and overhead
Less detail than packet monitoring
Less ubiquitous than SNMP statistics
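
The idea of a flow as a group of packets with the same headers that are close in time can be sketched as follows. This is a minimal illustration only: the 5-tuple key, the 60-second idle timeout, and the record layout are assumptions chosen for the example and do not describe NetFlow's actual record format.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "time src dst sport dport proto size")

def packets_to_flows(packets, idle_timeout=60.0):
    """Group time-ordered packets into flow records.

    A flow is a run of packets with the same header key whose
    inter-packet gaps never exceed idle_timeout seconds.
    """
    active = {}     # header key -> currently open flow record
    finished = []   # flows closed by the idle timeout
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and p.time - flow["end"] > idle_timeout:
            finished.append(active.pop(key))  # gap too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
        flow["bytes"] += p.size
    return finished + list(active.values())

# Hypothetical trace: two packets close together, then one 90 seconds later.
pkts = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 40),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    Packet(90.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 40),
]
flows = packets_to_flows(pkts)
for f in flows:
    print(f["start"], f["end"], f["packets"], f["bytes"])
```

Each record keeps header fields, counts, and start/end times, but the individual packet timestamps and payloads are gone, which matches the trade-off listed above.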

7
Using the Traffic Data in Network Operations

SNMP byte/packet counts: everywhere
Tracking link utilizations and detecting anomalies
Generating bills for traffic on customer links
Inference of the offered load (i.e., traffic matrix)
Packet monitoring: selected locations
Analyzing the small time-scale behavior of traffic
Troubleshooting specific problems on demand
Flow monitoring: selective, e.g., network edge
Tracking the application mix
Direct computation of the traffic matrix
Input to denial-of-service attack detection

8
Network Topology
9
IP Topology

Topology information
Routers
Links, and their capacities
Internal links inside the AS
Edge links connecting to neighboring domains
Ways to learn the topology
Inventory database
SNMP polling/traps
Traceroute
Route monitoring
Router configuration data

10
Below IP

Layer-2 paths
ATM virtual circuits
Frame Relay virtual circuits
Mapping to lower layers
Specific fibers
Shared optical amplifiers
Shared conduits
Physical length (propagation delay)
Information not visible to IP
Stored in an inventory database
Not necessarily generated/updated automatically

11
Intradomain Monitoring: OSPF Protocol

Link-state protocol
Routers flood Link State Advertisements (LSAs)
Routers compute shortest paths based on weights
Routers identify next-hop to reach other routers
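
The shortest-path step can be sketched with Dijkstra's algorithm, which each router conceptually runs over the weighted graph it learns from the flooded LSAs. The four-router graph and its weights below are hypothetical, and the function name is an assumption for this example; ties between equal-cost paths are broken arbitrarily here.

```python
import heapq

def next_hops(graph, source):
    """Dijkstra's algorithm: return {destination: (path cost, first hop)}."""
    best = {}
    heap = [(0, source, None)]  # (path cost, node, first hop on the path)
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in best:
            continue  # already settled via a path that is no longer
        best[node] = (cost, first)
        for neighbor, weight in graph[node].items():
            if neighbor not in best:
                # Leaving the source, the neighbor itself is the first hop.
                hop = neighbor if node == source else first
                heapq.heappush(heap, (cost + weight, neighbor, hop))
    del best[source]
    return best

# Hypothetical topology: graph[u][v] is the weight of link u-v.
graph = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 3, "D": 1},
    "C": {"A": 1, "B": 3, "D": 5},
    "D": {"B": 1, "C": 5},
}
table = next_hops(graph, "A")
for dest in sorted(table):
    print(dest, table[dest])
```

A route monitor that collects the same LSAs can run the same computation for every router, which is what makes a continuous view of intradomain paths possible.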

[Figure: example network graph with numeric link weights]

12
Intradomain Route Monitoring

Construct continuous view of topology
Detect when equipment goes up or down
Input to traffic-engineering and planning tools
Detect routing anomalies
Identify failures, LSA storms, and route flaps
Verify that LSA load matches expectations
Flag strange weight settings as misconfigurations
Analyze convergence delay
Monitor LSAs in multiple locations with go
Compare the times when LSAs arrive
Detect router implementation mistakes
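
Comparing LSA arrival times across monitoring locations can be sketched as below. The record format `(monitor, lsa_id, arrival_time)`, the monitor names, and the assumption that monitor clocks are synchronized are all hypothetical; the spread computed here is only a rough proxy for flooding delay, not a full convergence-delay analysis.

```python
from collections import defaultdict

def flooding_spread(observations):
    """Return {lsa_id: latest arrival - earliest arrival} across monitors."""
    arrivals = defaultdict(list)
    for monitor, lsa_id, arrival_time in observations:
        arrivals[lsa_id].append(arrival_time)
    return {lsa_id: max(times) - min(times)
            for lsa_id, times in arrivals.items()
            if len(times) > 1}  # need at least two vantage points to compare

# Hypothetical observations: lsa_id is (originating router, sequence number).
observations = [
    ("monitor-1", ("r1", 17), 100.00),
    ("monitor-2", ("r1", 17), 100.25),
    ("monitor-1", ("r2", 4), 230.00),   # seen at only one monitor
]
spread = flooding_spread(observations)
print(spread)
```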

13
Passive Collection of LSAs

OSPF is a flooding protocol
Every LSA sent on every participating link
Very helpful for simplifying the monitor
Can participate in the protocol
Shared media (e.g., Ethernet)
Join multicast group and listen to LSAs
Point-to-point links
Establish an adjacency with a router
or passively monitor packets on a link
Tap a link and capture the OSPF packets

14
Reducing the Volume of Information

Prioritizing the messages
Router failure over router recovery
Link failure or weight change over a refresh
Informational messages about weight settings
Grouping related messages
Link failure: group messages for the two ends
Router failure: group the affected links
Common failure: group links failing close in time
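
Grouping failures that occur close in time can be sketched as a single pass over time-sorted reports. The 5-second window, the `(time, link)` report format, and the link names are assumptions for illustration; the slides do not specify a threshold.

```python
def group_failures(failures, window=5.0):
    """Group (time, link) failure reports whose successive gaps are <= window seconds."""
    groups = []
    for time, link in sorted(failures):
        # Compare against the time of the most recent report in the last group.
        if groups and time - groups[-1][-1][0] <= window:
            groups[-1].append((time, link))
        else:
            groups.append([(time, link)])
    return groups

# Hypothetical reports: both ends of link A-B, a nearby failure, and an unrelated one.
failures = [(10.0, "A-B"), (10.4, "B-A"), (12.0, "A-C"), (300.0, "C-D")]
groups = group_failures(failures)
for g in groups:
    print([link for _, link in g])
```

An operator would then see two alarms instead of four messages, with the first alarm hinting at a common cause around router A.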

15
Anomalies Found in the Shaikh04 paper

Intermittent hardware problem
Router periodically losing OSPF adjacencies
Risk of network partition if 2nd failure occurred
External link flaps
Congestion on edge link causing lost messages
Lost adjacency leading to flapping routes
Configuration errors
Two routers assigned the same IP address
Inefficient config leading to duplicate LSAs
Vendor implementation bug
More frequent refreshing of LSAs than specified

16
Interdomain Route Monitoring
17
Motivation for BGP Monitoring

Visibility into external destinations
What neighboring ASes are telling you
How you are reaching external destinations
Detecting anomalies
Increases in number of destination prefixes
Lost reachability to some destinations
Route hijacking
Instability of the routes
Input to traffic-engineering tools
Knowing the current routes in the network
Workload for testing routers
Realistic message traces to play back to routers

18
BGP Monitoring: A Wish List

Ideally: knowing what the router knows
All externally-learned routes
Before policy has modified the attributes
Before a single best route is picked
How to achieve this
Special monitoring session on routers that tells everything they have learned
Packet monitoring on all links with BGP sessions
If you can't do that, you could always do
Periodic dumps of routing tables
BGP session to learn best route from router

19
Using Routers to Monitor BGP
Establish a passive BGP session from a workstation running BGP software (eBGP or iBGP)
(+) BGP table dumps do not burden operational routers
(-) Receives only best routes from BGP neighbor
(+) Update dynamics captured
(+) Not restricted to interfaces provided by vendors
Talk to operational routers using SNMP or telnet at command line
(-) BGP table dumps are expensive
(+) Table dumps show all alternate routes
(-) Update dynamics lost
(-) Restricted to interfaces provided by vendors
20
Collect BGP Data From Many Routers
[Figure: map of router locations in U.S. cities, with a Route Monitor]
BGP is not a flooding protocol
21
Detecting Important Routing Changes

Large volume of BGP update messages
Around 2 million/day, and very bursty
Too much for an operator to manage
Identify important anomalies
Lost reachability
Persistent flapping
Large traffic shifts
Not the same as root-cause analysis
Identify changes and their effects
Focus on mitigation, rather than diagnosis
Diagnose causes if they occur in/near the AS

22
Challenge 1: Excess Update Messages

A single routing change
Leads to multiple update messages
Affects routing decision at multiple routers

Persistent Flapping Prefixes
Group updates for a prefix with inter-arrival < 70 seconds, and flag prefixes with changes lasting > 10 minutes.
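
The grouping rule above can be sketched as follows, using the 70-second timeout and 10-minute duration stated on the slide. The `(timestamp, prefix)` input format, the function names, and the sample prefixes are assumptions for this example, not the tool described in the lecture.

```python
from collections import defaultdict

EVENT_TIMEOUT = 70        # seconds: updates closer than this belong to one event
FLAP_DURATION = 10 * 60   # seconds: longer events count as persistent flapping

def group_into_events(updates, timeout=EVENT_TIMEOUT):
    """updates: iterable of (timestamp, prefix). Returns {prefix: [(start, end), ...]}."""
    by_prefix = defaultdict(list)
    for timestamp, prefix in updates:
        by_prefix[prefix].append(timestamp)
    events = {}
    for prefix, times in by_prefix.items():
        times.sort()
        spans = []
        start = end = times[0]
        for t in times[1:]:
            if t - end < timeout:
                end = t              # same event: extend it
            else:
                spans.append((start, end))
                start = end = t      # gap too long: start a new event
        spans.append((start, end))
        events[prefix] = spans
    return events

def persistent_flappers(events, min_duration=FLAP_DURATION):
    """Prefixes with at least one event lasting longer than min_duration."""
    return sorted(prefix for prefix, spans in events.items()
                  if any(end - start > min_duration for start, end in spans))

# Hypothetical trace: one prefix updated every 60 s for 11 minutes, one updated rarely.
updates = [(t, "192.0.2.0/24") for t in range(0, 661, 60)]
updates += [(0, "198.51.100.0/24"), (30, "198.51.100.0/24"), (500, "198.51.100.0/24")]
events = group_into_events(updates)
print(persistent_flappers(events))
```

Collapsing updates into events first means the operator reasons about a handful of events per prefix rather than every individual update message.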
23
Determine Event Timeout
Cumulative distribution of BGP update inter-arrival time
[Figure: CDF plot labeled "BGP beacon", with the point (70, 98) marked]
24
Event Duration: Persistent Flapping
Complementary cumulative distribution of event duration
[Figure: CCDF plot with the point (600, 0.1) marked]
25
Detecting Persistent Flapping

Significant persistent flapping
15.2% of all BGP update messages
though a small number of destination prefixes
Surprising, especially since flap dampening is used
Types of persistent flapping
Conservative flap-damping parameters (78.6%)
Protocol oscillations, e.g., MED oscillation (18.3%)
Unstable interface or BGP session (3.0%)

26
Example: Unstable eBGP Session
[Figure: AT&T network with a Peer, a Customer, and destination prefix p]

Flap damping parameters are session-based
Damping not implemented for iBGP sessions

27
Challenge 2: Identify Important Events

Major concerns of network operators
Changes in reachability
Heavy load of routing messages on the routers
Flow of the traffic through the network

Classify events by the type of impact they have on the network
28
Event Category: No Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; label: No Traffic Shift]
No Disruption: each of the border routers has no traffic shift
29
Event Category: Internal Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; label: Internal Traffic Shift]
Internal Disruption: all of the traffic shifts are internal traffic shifts
30
Event Type: Single External Disruption
[Figure: AT&T network, neighboring AS1 and AS2, and destination prefix p; label: External Traffic Shift]
Single External Disruption: traffic at one exit point shifts to other exit points
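
The three categories described on these slides can be sketched as a small classifier. The input format, a per-border-router label of "none", "internal", or "external" for the observed traffic shift, is an assumption for this example, as is the fallback "other" category for cases these slides do not name.

```python
def classify_event(shifts):
    """shifts: {border_router: "none" | "internal" | "external"}."""
    kinds = list(shifts.values())
    if all(kind == "none" for kind in kinds):
        return "no disruption"            # no border router shifts traffic
    if all(kind in ("none", "internal") for kind in kinds):
        return "internal disruption"      # every shift that occurs is internal
    if kinds.count("external") == 1:
        return "single external disruption"  # exactly one exit point loses traffic
    return "other"  # assumption: cases not covered by the slides shown here

# Hypothetical border routers r1..r3.
print(classify_event({"r1": "none", "r2": "none", "r3": "none"}))
print(classify_event({"r1": "internal", "r2": "none", "r3": "none"}))
print(classify_event({"r1": "external", "r2": "internal", "r3": "none"}))
```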
31
Statistics on Event Classification
32
Challenge 3: Multiple Destinations