1
End-to-End Routing Behavior in the Internet
  • By Vern Paxson, 1997
  • ECE697A, Nov. 2002
  • Prof. Lixin Gao
  • Presenter: Teng Fei

2
Outline
  • Introduction
  • Methodology
  • Raw Data
  • Routing Pathologies
  • End-to-End Routing Stability
  • Asymmetry
  • Summary

3
Objective
  • Study large-scale routing behavior over the
    Internet
  • Find out what sorts of pathologies and failures
    occur in Internet routing
  • Study routing stability
  • Study routing symmetry

4
Framework
  • Measure a large sample of Internet routes between
    geographically diverse hosts
  • Argue the route set is large enough to plausibly
    represent Internet routing behavior
  • Gain insight into how routing behavior changes
    over time

5
One Distinction
  • Routing protocols
  • Mechanisms for disseminating routing information
    and forwarding traffic
  • Routing behavior
  • How the routing algorithms actually perform in
    practice
  • Routing protocols have been heavily studied, but
    routing behavior has not.

6
Related Research
  • Network routing has been studied for over 20 years
  • A number of books have been written
  • Discussions of ARPANET, EGP, BGP, OSPF, IS-IS,
    multicast, high-speed network routing, etc.

7
Other Research
  • Most protocol studies are qualitative
  • Of the measurement studies, only Chinoy's and
    Labovitz et al.'s are devoted to characterizing
    large-scale Internet routing behavior

8
Chinoy's Research
  • Routing updates are sent out periodically,
    regardless of connectivity changes
  • Most routing changes occur at the edge of the
    network
  • Network outage durations span a large range of
    time

9
Labovitz et al.'s Research
  • Pathological routing updates are common
  • The total volume of BGP routing updates is 1-2
    orders of magnitude higher than necessary
  • Routing instability and network load are related
  • Excluding pathological updates, 80% of routes are
    highly stable

10
Experimental Apparatus
  • Use traceroute to do end-to-end measurements
  • Recruited 37 Internet sites to run a network
    probe daemon (NPD), controlled by npdcontrol
    at UC Berkeley
  • Measure Internet paths between the NPDs
  • O(N²) scaling means a fairly modest framework can
    observe a wide range of Internet behavior (see the
    sketch below)
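A minimal sketch (not from the original slides) of the coverage argument, using hypothetical NPD hostnames: with N cooperating probe sites, every ordered source-destination pair is a measurable path, so coverage grows as O(N²).

    import itertools
    import subprocess

    # Hypothetical NPD hostnames; the real study used 37 cooperating sites.
    sites = ["npd1.example.edu", "npd2.example.org", "npd3.example.net"]

    # N sites give N*(N-1) ordered (source, destination) pairs -- O(N^2) paths.
    paths = list(itertools.permutations(sites, 2))
    print(len(sites), "sites ->", len(paths), "measurable paths")

    def trace(destination):
        # In the real framework the probe runs at the source NPD itself;
        # this only illustrates the per-path traceroute invocation.
        result = subprocess.run(["traceroute", "-n", destination],
                                capture_output=True, text=True)
        return result.stdout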

11
Two Measurement Sets
  • First set, D1: Nov. 8 - Dec. 24, 1994; 27 sites
  • Mean interval between measurements is 1-2 days
  • Interval too large to resolve a number of routing
    stability questions
  • Second set, D2: Nov. 3 - Dec. 21, 1995; 33 sites
  • 60% with a mean interval of 2 hours, 40% with a
    mean interval of about 2.75 days

12
Exponential Sampling
  • Time intervals between consecutive measurements of
    the same path were exponentially distributed
  • Conforms to additive random sampling
  • Measurement times form a Poisson process (sketched
    below)
  • Means we can compare the two data sets even though
    their sampling rates differ
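A minimal sketch of the sampling scheme, assuming a chosen mean interval: exponentially distributed gaps between measurements of the same path make the measurement times a Poisson process, which is what allows the two data sets to be compared despite different rates.

    import random

    def poisson_sample_times(mean_interval_s, horizon_s):
        # Measurement timestamps with exponentially distributed gaps.
        t, times = 0.0, []
        while True:
            t += random.expovariate(1.0 / mean_interval_s)  # exponential gap
            if t > horizon_s:
                return times
            times.append(t)

    # e.g. a mean gap of 2 hours over a roughly 60-day run, as in the
    # denser part of D2
    schedule = poisson_sample_times(2 * 3600, 60 * 24 * 3600)
    print(len(schedule), "measurements scheduled")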

13
Is It a Good Observation?
  • As of July 1995, an estimated 6.6 million Internet
    hosts
  • As of April 1995, 50,000 networks known to the
    NSFNET
  • Not plausibly representative of the whole Internet,
    but gives a considerably richer cross-section of
    Internet routing behavior

14
Participating Sites
15
Links Traversed
16
Routing Pathologies - Loops
  • 10 loops in D1 (0.13%) and 50 loops in D2 (0.16%)
  • Duration
  • Short loops: under 3 hours
  • Long loops: more than half a day
  • Two long-lived loops (14-17 hr and 16-32 hr) show a
    lack of good tools for diagnosing network problems
    (loop detection is sketched below)
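A simplified sketch of how a loop can be flagged from a single traceroute's hop list (the paper further distinguishes persistent from temporary loops); the IPs below are illustrative.

    def has_loop(hops):
        # hops: ordered router IPs from one traceroute; '*' marks a non-reply.
        seen = set()
        for hop in hops:
            if hop == "*":
                continue            # unresponsive hop: no evidence either way
            if hop in seen:
                return True         # a revisited router suggests a routing loop
            seen.add(hop)
        return False

    print(has_loop(["10.0.0.1", "192.0.2.5", "198.51.100.9", "192.0.2.5"]))  # True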

17
- Loops
  • Geographical and temporal correlation
  • Loops are clustered
  • Two AlterNet loops in D.C. and a separate Sprint
    loop at MAE-East
  • Suggesting loops may affect nearby routers
  • Cross-AS loops exist, but are not likely to be
    BGP's fault

18
- Erroneous routing
  • One route (connix -> London via Israel)
  • Can't assume where the packets might travel

19
- Connectivity altered
  • 0.16% in D1, 0.44% in D2
  • Some accompanied by outages
  • Recovery is bimodal
  • Some are very quick (100s of ms to seconds)
  • Maybe new routes are being announced
  • Some are in minutes
  • Existing routes are lost

20
Fluttering
  • Rapidly oscillating routing

21
- Fluttering
  • Pro
  • Balance network load
  • Con
  • Unstable network paths
  • If fluttering only happens in one direction, then
    the routes are asymmetric
  • Estimating path characteristics like RTT becomes
    difficult
  • If the two routes have different propagation times,
    then TCP performs worse

22
- Infrastructure Failure
  • Classified as such when traceroute reports host
    unreachable
  • More serious when the reporting router is remote
    from the destination, since more networks might
    have lost connectivity
  • D1 availability: 99.8%
  • D2 availability: 99.5%
  • These figures might overstate the true availability

23
- Too Many Hops
  • 30 hops were enough for all of D1, and for all but
    6 traces in D2
  • Mean path length
  • D1: 15.6
  • D2: 16.2
  • Median
  • Both are 16
  • Hop count is sometimes assumed to equate to
    geographical distance, but there are remarkable
    exceptions

24
- Temporary outages
  • Consecutive traceroute packets are lost
  • In D1 (D2): 55% (43%) had no loss, 44% (55%) had
    1-5 losses, and 0.96% (2.2%) had 6 or more losses

25
Time-of-Day Patterns
  • Study temporary outages and infrastructure
    failures in D2
  • Try to find correlation with heavy traffic
  • Use the mean of the time-of-day at the source and
    the destination (sketched below)
  • Outages: 0.4% during 1:00-2:00 am vs. 8.0% during
    3:00-4:00 pm
  • Failures: 9.3% during 3:00-4:00 pm vs. 1.2% during
    9:00-10:00 am
  • 7.6% during 6:00-7:00 am
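A small sketch of the time-of-day bucketing described above (the offsets and event tuples are illustrative): the local hour at the source and at the destination are averaged and pathologies are tallied per hour bucket.

    from collections import Counter

    def mean_local_hour(utc_hour, src_offset, dst_offset):
        # Naive mean of the two local hours; a circular mean would be needed
        # when the two local times straddle midnight.
        src = (utc_hour + src_offset) % 24
        dst = (utc_hour + dst_offset) % 24
        return ((src + dst) / 2.0) % 24

    # Illustrative (UTC hour of event, source offset, destination offset) tuples.
    events = [(20, -8, -5), (7, 0, 1)]
    outages_per_hour = Counter(int(mean_local_hour(*e)) for e in events)
    print(outages_per_hour)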

26
Representative Pathologies
  • For temporary outages of 30 s or more, assign the
    outage to the router directly upstream from the
    first completely missing hop (sketched below)
  • AS 3561 (MCI-RESTON): 25%; AS 1800 (ICM-Atlantic /
    Sprint): 16%; AS 1239 (Sprint): 9%
  • These correspond to the three heaviest-weight ASes
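A minimal sketch of the attribution rule above: blame the router immediately upstream of the first hop that returned no replies at all (the reply lists below are illustrative).

    def responsible_router(hops):
        # hops: per-TTL lists of replying routers; an empty list = hop missing.
        for i, replies in enumerate(hops):
            if not replies:                   # first completely missing hop
                return hops[i - 1][0] if i > 0 else None
        return None                           # no missing hop, nothing to assign

    trace = [["128.32.0.1"], ["192.0.2.7"], [], []]   # illustrative reply lists
    print(responsible_router(trace))                  # -> 192.0.2.7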

27
Pathologies Summary
28
Explanation
  • Hard to tell the reason for the trend or its
    significance
  • 1995 may have been an atypical year, but that might
    not be the reason (only 1/3 of D1)
  • More data needs to be collected

29
Routing Stability
  • Two types of routing stability
  • Prevalence: how predictable the network is
  • Persistence: given a route, how long before it
    changes
  • Confine the analysis to D2, remove pathological
    observations, and merge tightly coupled routers
  • Reduce the routes to 3 different granularities
  • Host: any change
  • City: 57% (major changes)
  • AS: 36%

30
Routing Prevalence
31
Routing Prevalence
  • Wide range of prevalence values, especially at
    host granularity
  • However, the median at host granularity is 82%
  • In general, Internet paths are strongly dominated
    by a single route (prevalence is sketched below)
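A minimal sketch of route prevalence as used here: the fraction of a virtual path's observations on which its single most common route was seen (the routes below are illustrative city tuples).

    from collections import Counter

    def prevalence(observed_routes):
        # observed_routes: one hashable route (e.g. a tuple of cities) per probe.
        counts = Counter(observed_routes)
        dominant, hits = counts.most_common(1)[0]
        return dominant, hits / len(observed_routes)

    obs = [("SF", "Chicago", "Boston")] * 8 + [("SF", "Denver", "Boston")] * 2
    print(prevalence(obs))    # dominant route and its prevalence, here 0.8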

32
Routing Prevalence
  • The median at city granularity is 97%, and 100%
    at AS granularity
  • Aggregating all virtual paths by source or
    destination shows considerable site-to-site
    variation
  • 50% at ucl, just under 90% at unij
  • In general, Internet paths are very strongly
    dominated by a single route, but there is also
    significant site-to-site variation

33
Routing Persistence
  • Question 1: does routing alternate on short time
    scales?
  • 54 measurement pairs less than 60 s apart all show
    no change
  • 1302 measurement pairs less than 10 min apart show
    25 route changes
  • Analysis shows the fast oscillations are tied to
    particular sources and destinations
  • Conclude that observations made less than an hour
    apart will not completely miss a routing change

34
Medium-Scale Route Alternation
  • 10 out of 1517 observations spanning about an hour
    show the route changing twice
  • The oscillations are again tied to particular
    sources and destinations
  • The remaining routes change slowly
  • On time scales of 12 hours to 1.5 days

35
Large-Scale Route Alternation
  • For measurements 6 hours apart or less, 75 out of
    10,660 measurement triples show the route changing
    twice
  • After removing outliers, the 11,174 measurements
    show a maximum transition rate of about once every
    two days, and a median of once per 4 days

36
Duration of Long-Lived Routes
37
Duration of Long-Lived Routes
  • Half of the long-lived routes persisted for under
    a week
  • The other half accounts for 90% of the total
    persistence
  • At any given time, excluding the outliers, the
    chance that the currently observed route lasts
    more than a week is about 90%

38
Summary of Routing Persistence
39
Routing Symmetry
  • Confine the analysis to major asymmetries
  • Past studies show the two directions of a path do
    have considerably different latency variations
  • Asymmetry complicates network troubleshooting

40
Sources of Routing Asymmetry
  • Asymmetric link costs in the two directions
  • Configuration errors and inconsistencies
  • Economics of the commercial Internet
  • Hot-potato and cold-potato routing

41
Analysis of Routing Symmetry
  • 49% of the measurements observed an asymmetry in
    which the two directions visited at least one
    different city (a simple test is sketched below)
  • 30% of the measurements traversed different ASes
    in the two directions
  • 20% showed more than 2 differences at city
    granularity
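One simple way to test city-level asymmetry, as a sketch (the cities below are illustrative): the path pair is asymmetric if the two directions visit different sets of cities.

    def is_city_asymmetric(cities_forward, cities_reverse):
        # Asymmetric if the sets of cities visited in the two directions differ.
        return set(cities_forward) != set(cities_reverse)

    fwd = ["Berkeley", "Chicago", "Washington", "London"]
    rev = ["London", "New York", "Chicago", "Berkeley"]
    print(is_city_asymmetric(fwd, rev))   # True: Washington vs. New York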

42
Summary
  • The likelihood of a major routing pathology rose
    from 1.5% to 3.3% between the end of 1994 and the
    end of 1995
  • Internet routes are heavily dominated by a single
    prevalent route, but the period over which a route
    persists varies widely
  • At the end of 1995, about half of the routes were
    asymmetric at the city level, and 30% traversed at
    least one different AS

43
Summary
  • Repeatedly find that different sites encounter
    very different routing characteristics
  • There is no typical Internet site or Internet
    path
  • The study helps us learn how the Internet actually
    works, from the end points' view

44
Shortcomings of the Design
  • It does not uncover the reasons for the routing
    difficulties
  • Because it is hard for end-to-end measurements to
    uncover what's happening inside the network
  • Could ask the network administrators, but that may
    not scale well
  • Possible improvements: use batch measurements
    rather than single requests
  • Use a more sophisticated tool than traceroute

45
The End
  • Questions?