Routing Measurements: Three Case Studies

Slides: 42

Provided by: albertgr

Learn more at: https://www.cs.princeton.edu

1
Routing Measurements: Three Case Studies

Jennifer Rexford

2
Motivations for Measuring the Routing System

Characterizing the Internet
Internet path properties
Demands on Internet routers
Routing convergence
Improving Internet health
Protocol design problems
Protocol implementation problems
Configuration errors or attacks
Operating a network
Detecting and diagnosing routing problems
Traffic shifts, routing attacks, flaky equipment, etc.

3
Techniques for Measuring Internet Routing

Active probing
Inject probes along path through the data plane
E.g., using traceroute
Passive route monitoring
Capture control-plane messages between routers
E.g., using tcpdump or a software router
E.g., dumping the routing table on a router
Injecting network events
Cause failure/recovery at planned time and place
E.g., BGP route beacon, or planned maintenance

4
Challenges in Measuring Routing

Data vs. control plane
Understand relationship between routing protocol
messages and the impact on data traffic
Cause vs. effect
Identify the root cause for a change in the
forwarding path or control-plane messages
Visibility and representativeness
Collect routing data from many vantage points
Across many Autonomous Systems, or within a single AS
Large volume of data
Many end-to-end paths
Many prefixes and update measurements

5
Measurement Tools: Traceroute

Traceroute tool exploits TTL-limited probes
Observation of the forwarding path
Useful, but introduces many challenges
Path changes
Non-participating nodes
Inaccurate, two-way measurements
Hard to map interfaces to routers and ASes

[Figure: path of routers from source to destination]
Send packets with TTL = 1, 2, 3, ... and record the
source of each "time exceeded" message
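
The TTL mechanism can be sketched as a small simulation. This is only an illustration, not a real traceroute: the path is a hypothetical list of router names, `probe` is an invented helper standing in for sending a packet, and a hop of `None` models a non-participating node that never replies.

```python
from typing import List, Optional


def probe(path: List[Optional[str]], ttl: int) -> Optional[str]:
    """Return which hop would answer a probe sent with the given TTL.

    `path` lists the hops from the first router to the destination
    (hypothetical names). A hop of None never sends "time exceeded".
    """
    # Each hop decrements the TTL; the hop where it reaches zero replies.
    index = min(ttl, len(path)) - 1
    return path[index]


def traceroute(path: List[Optional[str]], max_ttl: int = 30) -> List[str]:
    """Send probes with TTL = 1, 2, 3, ... and record each replier."""
    hops: List[str] = []
    if not path:
        return hops
    for ttl in range(1, max_ttl + 1):
        replier = probe(path, ttl)
        # "*" marks a hop that stayed silent, as traceroute output does.
        hops.append(replier if replier is not None else "*")
        if ttl >= len(path):  # this probe reached the destination
            break
    return hops
```

Calling `traceroute(["r1", None, "r3", "dst"])` gives one entry per TTL value, with `"*"` standing in for the silent second hop.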
6
Measurement: Intradomain Route Monitoring

OSPF is a flooding protocol
Every link-state advertisement is sent on every
link
Very helpful for simplifying the monitor
Can participate in the protocol
Shared media (e.g., Ethernet)
Join multicast group and listen to LSAs
Point-to-point links
Establish an adjacency with a router
or passively monitor packets on a link
Tap a link and capture the OSPF packets

7
Measurement: Interdomain Route Monitoring

Establish a passive BGP session from a workstation running BGP software (BGP session over TCP)
(+) BGP table dumps do not burden operational routers
(-) Receives only best routes from BGP neighbor
(+) Update dynamics captured
(+) Not restricted to interfaces provided by vendors
Talk to operational routers using SNMP or telnet at the command line
(-) BGP table dumps are expensive
(+) Table dumps show all alternate routes
(-) Update dynamics lost
(-) Restricted to interfaces provided by vendors
8
Collect BGP Data From Many Routers
[Figure: map of router locations (Seattle, Cambridge, Chicago, Detroit, New York, Kansas City, Philadelphia, Denver, San Francisco, St. Louis, Washington, D.C., Los Angeles, Dallas, Atlanta, San Diego, Phoenix, Austin, Orlando, Houston) and a route monitor]
BGP is not a flooding protocol
9
Two Kinds of BGP Monitoring Data

Wide-area, from many ASes
RouteViews or RIPE-NCC data
Pro: available from many vantage points
Con: often just one or two views per AS
Single AS, from many routers
Abilene and GEANT public repositories
Proprietary data at individual ISPs
Pro: comprehensive view of a single AS
Con: limited public examples, mostly research nets

10
Measurement: Injecting Events

Equipment failure/recovery
Unplug/reconnect the equipment
Packet filters that block all packets
Knowing when planned event will take place
Shutting down a routing-protocol adjacency
Injecting route announcements
Acquire some blocks of IP addresses
Acquire a routing-protocol adjacency to a router
Announce/withdraw routes on a schedule
Beacons: http://psg.com/~zmao/BGPBeacon.html

11
Two Papers for Today

Both early measurement studies
Initially appeared at SIGCOMM '96 and '97
Both won the best student paper award
Early glimpses into the health of Internet
routing
Early wave of papers on Internet measurement
Differences in emphasis
Paxson '96: end-to-end active probing to measure
the characteristics of the data plane
Labovitz '97: passive monitoring of BGP update
messages from several ISPs to characterize the
(in)stability of the interdomain routing system

12
Paxson Study: Forwarding Loops

Forwarding loop
Packet returns to same router multiple times
May cause traceroute to show a loop
If the loop lasts long enough
that many packets traverse the looping path
Traceroute may reveal false loops
Path change that leads to a longer path
Causing later probe packets to hit same nodes
Heuristic solution
Require traceroute to return same path 3 times
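
One way to read the heuristic is sketched below. The function names are invented for illustration, traces are assumed to be lists of router names with `"*"` for silent hops, and "3 times" is interpreted here as three consecutive identical traces, which is an assumption rather than something the slide states.

```python
from typing import Dict, List, Sequence


def has_loop(hops: Sequence[str]) -> bool:
    """True if a router reappears after at least one other hop."""
    last_seen: Dict[str, int] = {}
    for i, hop in enumerate(hops):
        if hop == "*":  # silent hops carry no identity, so skip them
            continue
        if hop in last_seen and i - last_seen[hop] > 1:
            return True
        last_seen[hop] = i
    return False


def confirmed_loop(traces: List[Sequence[str]], required: int = 3) -> bool:
    """Accept a loop only after `required` consecutive identical looping traces."""
    run = 0
    previous = None
    for trace in traces:
        current = tuple(trace)
        if has_loop(current) and current == previous:
            run += 1          # same looping path seen again
        elif has_loop(current):
            run = 1           # a new looping path starts a fresh run
        else:
            run = 0           # a loop-free trace resets the count
        previous = current
        if run >= required:
            return True
    return False
```

A single looping trace caused by a mid-measurement path change would not reach the threshold, which is the point of requiring repetition.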

13
Paxson Study: Causes of Loops

Transient vs. persistent
Transient: routing-protocol convergence
Persistent: likely configuration problem
Challenges
Appropriate time boundary between the two?
What about flaky equipment going up and down?
Determining the cause of persistent loops?
Anecdote on recent study of persistent loops
Provider has static route for customer prefix
Customer has default route to the provider

14
Paxson Study: Path Fluttering

Rapid changes between paths
Multiple paths between a pair of hosts
Load balancing policies inside the network
Packet-based load balancing
Round-robin or random
Multiple paths for packets in a single flow
Flow-based load balancing
Hash of some fields in the packet header
E.g., IP addresses, port numbers, etc.
To keep packets in a flow on one path
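
The two load-balancing styles can be contrasted in a short sketch. Everything here is illustrative: `pick_path` and `round_robin` are made-up names, the paths are plain strings, and SHA-256 is used only because it is deterministic and in the standard library; real routers use much cheaper hash functions.

```python
import hashlib
import itertools
from typing import Iterator, Sequence


def pick_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              paths: Sequence[str]) -> str:
    """Flow-based: hash header fields so one flow always maps to one path."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:8], "big") % len(paths)]


def round_robin(paths: Sequence[str]) -> Iterator[str]:
    """Packet-based: successive packets alternate across the paths."""
    return itertools.cycle(paths)
```

With `round_robin`, consecutive packets of the same flow take different paths, which is what makes the path appear to flutter; with `pick_path`, every packet carrying the same addresses and ports gets the same answer.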

15
Paxson Study: Routing Stability

Route prevalence
Likelihood of observing a particular route
Relatively easy to measure with sound sampling
Poisson arrivals see time averages (PASTA)
Most host pairs have a dominant route
Route persistence
How long a route endures before a change
Much harder to measure through active probes
Look for cases of multiple observations
Typical host pair has path persistence of a week
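
Prevalence can be estimated directly from samples, as in this minimal sketch. It assumes the observations for one host pair were taken at times independent of the routing state (for example, Poisson-spaced probes), so that the fraction of samples approximates the fraction of time; `dominant_route` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def dominant_route(observations: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Return the most frequently observed route and its prevalence.

    Each observation is one measured route, e.g. a tuple of router names.
    Prevalence is the fraction of observations that show that route.
    """
    counts = Counter(observations)
    route, count = counts.most_common(1)[0]
    return route, count / len(observations)
```

Persistence cannot be read off the same way, because two identical samples say nothing about whether the route changed and changed back in between.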

16
Paxson Study: Route Asymmetry

Hot-potato routing

Other causes
Asymmetric link weights in intradomain routing
Cold-potato routing, where AS requests traffic
enter at particular place
Consequences
Lots of asymmetry
One-way delay is not necessarily half of the
round-trip time

[Figure: Customer A attached to Provider A and Customer B attached to Provider B; the providers meet at multiple peering points and use early-exit routing]
17
Labovitz Study: Interdomain Routing

AS-level topology
Destinations are IP prefixes (e.g., 12.0.0.0/8)
Nodes are Autonomous Systems (ASes)
Links are connections and business relationships

[Figure: AS-level topology of ASes 1 through 7 between a client and a web server]
18
Labovitz Study: BGP Background

Extension of distance-vector routing
Support flexible routing policies
Avoid count-to-infinity problem
Key idea: advertise the entire path
Distance vector: send distance metric per dest d
Path vector: send the entire path for each dest d

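[Figure: advertisements "d: path (1)" and "d: path (2,1)" propagate from node 1 toward node 3, while data traffic flows the opposite way toward destination d]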
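
Advertising the whole path lets a node spot a loop by looking for its own number in the path it receives. The sketch below shows only that check; `receive` is an invented name, AS numbers are plain integers, and policy and route selection are left out entirely.

```python
from typing import Optional, Tuple

Path = Tuple[int, ...]


def receive(my_as: int, advertised: Path) -> Optional[Path]:
    """Return the path this AS would pass on, or None if it rejects the route."""
    if my_as in advertised:
        # Our own AS is already on the path, so using it would form a loop.
        return None
    # Prepend our own AS number before advertising the route onward.
    return (my_as,) + advertised
```

For example, `receive(2, (1,))` yields `(2, 1)`, while `receive(1, (2, 1))` yields `None`.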
19
Labovitz Study: BGP Background

BGP is an incremental protocol
In theory, no update messages in steady state
Two kinds of update messages
Announcement: advertising a new route
Withdrawal: withdrawing an old route
Study saw an alarming number of updates
At the time, Internet had around 45,000 prefixes
Routers were exchanging 3-6 million updates/day
Sometimes as high as 30 million in a day
Placing a very high load on the routers
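
To put these volumes in perspective, the figures quoted above can be turned into an average per-prefix rate. This is plain arithmetic on the slide's numbers; the function name is made up, and the average hides the fact that updates are not spread evenly across prefixes.

```python
PREFIXES = 45_000  # approximate number of prefixes quoted above


def updates_per_prefix_per_day(updates_per_day: float,
                               prefixes: int = PREFIXES) -> float:
    """Average number of daily updates per prefix."""
    return updates_per_day / prefixes


for label, total in [("low", 3_000_000), ("high", 6_000_000), ("peak", 30_000_000)]:
    rate = updates_per_prefix_per_day(total)
    print(f"{label}: {rate:.0f} updates per prefix per day")
```

That works out to roughly 67 to 133 updates per prefix per day on typical days, and about 667 on the heaviest days.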

20
Labovitz Study: Classifying Update Messages

Analyze update messages
For each (prefix, peer) tuple
Classify the kinds of routing changes
Forwarding instability
WADiff: explicit withdraw, replaced by alternate
AADiff: implicit withdraw, replaced by alternate
Pathological
WADup: explicit withdraw, and then reannounced
AADup: duplicate announcement
WWDup: duplicate withdrawal
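
The taxonomy can be sketched as a small classifier that tracks state per (prefix, peer) pair. This is an illustration under simplifying assumptions, not the study's actual code: each update is a hypothetical tuple `(prefix, peer, kind, route)`, `kind` is `"A"` or `"W"`, and a route is an opaque string compared for equality.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (prefix, peer)
Update = Tuple[str, str, str, Optional[str]]  # (prefix, peer, kind, route)


def classify(updates: Iterable[Update]) -> List[Optional[str]]:
    """Label each update; None means it fits none of the five categories."""
    last_kind: Dict[Key, str] = {}
    last_route: Dict[Key, Optional[str]] = {}  # most recently announced route
    labels: List[Optional[str]] = []
    for prefix, peer, kind, route in updates:
        key = (prefix, peer)
        prev_kind = last_kind.get(key)
        prev_route = last_route.get(key)
        label = None
        if kind == "W":
            if prev_kind == "W":
                label = "WWDup"  # withdrawal repeated
        elif prev_kind == "A":
            # Announcement after announcement: an implicit withdrawal.
            label = "AADup" if route == prev_route else "AADiff"
        elif prev_kind == "W" and prev_route is not None:
            # Announcement after an explicit withdrawal.
            label = "WADup" if route == prev_route else "WADiff"
        labels.append(label)
        last_kind[key] = kind
        if kind == "A":
            last_route[key] = route
    return labels
```

The first announcement for a pair and an ordinary withdrawal after an announcement get no label, since neither is a repeat or a replacement.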

21
Labovitz Study: Duplicate Withdrawals

Time-space trade-off in router implementation
Common system building technique
Trade one resource for another
Can have surprising side effects
The gory details
Ideally, you should not send a withdrawal if you
never sent a neighbor a corresponding
announcement
Requires remembering what update message you sent
to each neighbor
Easier to just send everyone a withdrawal when
your route goes away
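
The trade-off can be made concrete with two toy versions of the withdrawal logic. Both functions and the `announced` table are hypothetical names for illustration; they return the list of neighbors that would receive a withdrawal.

```python
from typing import Dict, List, Set


def withdrawals_stateless(neighbors: List[str], prefix: str) -> List[str]:
    """No per-neighbor memory: withdraw the prefix from everyone."""
    return list(neighbors)


def withdrawals_stateful(neighbors: List[str], prefix: str,
                         announced: Dict[str, Set[str]]) -> List[str]:
    """Withdraw only from neighbors that were sent an announcement.

    `announced` maps each neighbor to the prefixes advertised to it,
    which is the extra state the stateless version avoids keeping.
    """
    targets = [n for n in neighbors if prefix in announced.get(n, set())]
    for n in targets:
        announced[n].discard(prefix)  # the route is no longer advertised
    return targets
```

The stateless version saves memory on the sender, but every neighbor that never heard the announcement still has to process a withdrawal it did not need.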

22
Labovitz Study: Practical Impact

Stateless BGP is compliant with the standard
But, it forces other routers to handle more load
So that you don't have to maintain state
Arguably very unfair, and bad for global Internet
One router vendor was largely at fault
Router vendor modified its implementation
ISPs then deployed the updated software

23
Labovitz Study: Still Hard to Diagnose Problems

Despite having very detailed view into BGP
Some pathologies were very hard to diagnose
Possible causes
Flaky equipment
Synchronization of BGP timers
Interaction between BGP and intradomain routing
Policy oscillation
These topics were studied in follow-up studies
Example: study of BGP data within a large ISP
http://www.cs.princeton.edu/~jrex/papers/nsdi05-jian.pdf

24
ISP Study: Detecting Important Routing Changes

Large volume of BGP update messages
Around 2 million/day, and very bursty
Too much for an operator to manage
Identify important anomalies
Lost reachability
Persistent flapping
Large traffic shifts
Not the same as root-cause analysis
Identify changes and their effects
Focus on mitigation, rather than diagnosis
Diagnose causes if they occur in/near the AS

25
Challenge 1: Excess Update Messages

A single routing change
Leads to multiple update messages
Affects routing decision at multiple routers

Persistent Flapping Prefixes
Group updates for a prefix with inter-arrival <
70 seconds, and flag prefixes with changes
lasting > 10 minutes.
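
The grouping rule can be sketched for a single prefix as follows. Timestamps are assumed to be in seconds, the function names are invented, and the defaults mirror the thresholds above (70 seconds between updates, 10 minutes of total duration).

```python
from typing import Iterable, List, Tuple


def group_events(timestamps: Iterable[float],
                 timeout: float = 70.0) -> List[Tuple[float, float]]:
    """Merge updates into events; a gap of `timeout` or more starts a new one.

    Returns (start, end) times, one pair per event.
    """
    events: List[Tuple[float, float]] = []
    for t in sorted(timestamps):
        if events and t - events[-1][1] < timeout:
            events[-1] = (events[-1][0], t)  # extend the current event
        else:
            events.append((t, t))            # begin a new event
    return events


def persistent_flapping(timestamps: Iterable[float], timeout: float = 70.0,
                        min_duration: float = 600.0) -> bool:
    """Flag the prefix if any event lasts longer than `min_duration`."""
    return any(end - start > min_duration
               for start, end in group_events(timestamps, timeout))
```

A prefix that receives one update every 60 seconds for 11 minutes forms a single long event and is flagged, while two short bursts separated by a quiet period form two separate events and are not.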
26
Determine Event Timeout
[Figure: cumulative distribution of BGP update inter-arrival time, with a BGP beacon curve; marked point (70 seconds, 98%)]
27
Event Duration: Persistent Flapping
[Figure: complementary cumulative distribution of event duration; marked point (600, 0.1)]
28
Detecting Persistent Flapping

Significant persistent flapping
15.2% of all BGP update messages
though a small number of destination prefixes
Surprising, especially since flap dampening is used
Types of persistent flapping
Conservative flap-damping parameters (78.6%)
Policy oscillations, e.g., MED oscillation (18.3%)
Unstable interface or BGP session (3.0%)

29
Example: Unstable eBGP Session
[Figure: AT&T network with a peer, a customer, and prefix p]
30
Challenge 2: Identify Important Events

Major concerns of network operators
Changes in reachability
Heavy load of routing messages on the routers
Flow of the traffic through the network

Classify events by the type of impact they have on
the network
31
Event Category: No Disruption
[Figure: AT&T network with routes to prefix p via AS1 and AS2; labeled "No Traffic Shift"]
No Disruption: each of the border routers has
no traffic shift
32
Event Category: Internal Disruption
[Figure: AT&T network with routes to prefix p via AS1 and AS2; labeled "Internal Traffic Shift"]
Internal Disruption: all of the traffic shifts
are internal traffic shifts
33
Event Type: Single External Disruption
[Figure: AT&T network with routes to prefix p via AS1 and AS2; labeled "External Traffic Shift"]
Single External Disruption: traffic at one exit
point shifts to other exit points
34
Statistics on Event Classification
Event category                 Events    Updates
No Disruption                  50.3%     48.6%
Internal Disruption            15.6%      3.4%
Single External Disruption     20.7%      7.9%
Multiple External Disruption    7.4%     18.2%
Loss/Gain of Reachability       6.0%     21.9%
35
Challenge 3: Multiple Destinations