A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance

Provided by: zzbng
1
A Measurement Study on the Impact of Routing
Events on End-to-End Internet Path Performance
  • by zzb

2
Motivation
  • How to improve network performance?
  • End-to-end Internet path performance degradation is
    correlated with routing dynamics.
  • The root cause of the correlation is unknown.

3
Questions
  • Routing changes
  • Failover events
  • Recovery events
  • Factors
  • Topological properties,
  • Routing policies
  • IBGP configurations

4
Content
  • Background
  • Experiment Methodology
  • Failover events
  • Recovery events
  • Representativeness of the experiment
  • Conclusions and recommendations

5
Background
  • MRAI
  • 30 seconds for eBGP sessions.
  • 5 seconds for iBGP sessions.
  • Non-valley
  • Customers do not transit traffic from one
    provider to another
  • Peers do not transit traffic from one peer to
    another.
  • Prefer customer
  • Routes received from a network provider's
    customers are always preferred over those
    received from its peers or any other routes.
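
The two policies above can be illustrated with a minimal sketch. The names `Rel`, `LOCAL_PREF`, `may_export` and `best_route` are hypothetical, and the local-preference numbers are made-up values chosen only to rank customers above peers above providers; real networks configure their own.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of the neighbor a route was learned from."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Hypothetical local-preference values (assumption, not from the slides).
LOCAL_PREF = {Rel.CUSTOMER: 300, Rel.PEER: 200, Rel.PROVIDER: 100}


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Non-valley export rule.

    Routes learned from a customer may be exported to anyone.
    Routes learned from a peer or provider may only go to customers,
    so no network transits traffic between its peers or providers.
    """
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


def best_route(routes):
    """Pick the preferred route from (learned_from, as_path) tuples.

    Prefer-customer comes first; a shorter AS path only breaks ties.
    """
    return max(routes, key=lambda r: (LOCAL_PREF[r[0]], -len(r[1])))
```

For example, `best_route([(Rel.PEER, [1]), (Rel.CUSTOMER, [2, 3, 4])])` selects the customer route even though its AS path is longer.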

6
Experiment Methodology
  • Beacon controlled routing changes

7
Experiment Methodology
  • Probing architecture
  • Probing methods
  • UDP packet
  • 50ms interval
  • Ping
  • Traceroute
  • Every hour for 20 minutes

8
Experiment Methodology
  • Data plane performance metrics
  • Packet loss
  • Bursty loss size (consecutive)
  • Packet delay
  • RTT
  • Out of order
  • Num of reordering
  • Reordering offset (buffer size)
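
As a rough illustration of the loss and reordering metrics, the sketch below computes them from probe sequence numbers. The function names and the exact definition of the reordering offset (the number of higher-numbered packets that arrived earlier) are assumptions made for this example; the study may define them differently.

```python
def loss_bursts(sent_count, received_seqs):
    """Return the lengths of runs of consecutively lost probes.

    Probes are assumed to carry sequence numbers 0..sent_count-1.
    """
    received = set(received_seqs)
    bursts, run = [], 0
    for seq in range(sent_count):
        if seq in received:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:  # a burst that lasts until the final probe
        bursts.append(run)
    return bursts


def reordering(received_seqs):
    """Return (number of reordered packets, largest reordering offset).

    A packet counts as reordered if some packet with a higher sequence
    number arrived before it. Its offset is how many such packets there
    are, which approximates the buffer needed to restore the order.
    """
    count, max_offset = 0, 0
    for i, seq in enumerate(received_seqs):
        offset = sum(1 for earlier in received_seqs[:i] if earlier > seq)
        if offset:
            count += 1
            max_offset = max(max_offset, offset)
    return count, max_offset
```

With probes sent every 50ms, a burst length in packets converts to an approximate duration by multiplying by 0.05 seconds.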

9
Experiment Methodology
  • Identifying Routing Failures
  • Reasons of Loss
  • Routing dynamics
  • Route loss
  • Forwarding loops
  • Congestion
  • Ideal way to identify
  • By traceroute and ping (ICMP)
  • Route loss -> destination is unreachable
  • Loops -> TTL exceeded
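
The ICMP-based identification can be sketched as a time-window match between a loss burst and the ICMP replies seen by ping and traceroute. ICMP type 3 (destination unreachable) and type 11 (time exceeded) are standard values; the function name, the `slack` window, and the choice to report a loop when both types appear are assumptions for illustration only.

```python
ICMP_DEST_UNREACHABLE = 3   # standard ICMP type: destination unreachable
ICMP_TIME_EXCEEDED = 11     # standard ICMP type: TTL exceeded in transit


def classify_burst(burst_start, burst_end, icmp_replies, slack=1.0):
    """Label a loss burst using ICMP replies observed around it.

    icmp_replies is a list of (timestamp_seconds, icmp_type) tuples.
    slack is a hypothetical tolerance, in seconds, around the burst.
    """
    types = {
        icmp_type
        for ts, icmp_type in icmp_replies
        if burst_start - slack <= ts <= burst_end + slack
    }
    if ICMP_TIME_EXCEEDED in types:
        return "forwarding loop"
    if ICMP_DEST_UNREACHABLE in types:
        return "route loss"
    return "unverified"  # no ICMP evidence, e.g. congestion or ICMP blocking
```

A burst with no matching reply stays unverified rather than being attributed to congestion, since some networks block ICMP.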

10
Failover Events
  • Probe hosts: 37 PlanetLab sites
  • 14 choose ISP1
  • 23 choose ISP2
  • Two failover Events
  • Failover-1
  • Failover-2
  • Entire month of July 2005

11
Data Plane Performance
  • Majority of Loss Bursts Occur at 0

12
Data Plane Performance
  • Three intervals
  • Significant impact on loss burst length and RTT

13
Data Plane Performance
  • Num of reordering is small for all
  • Reordering offset is impacted

14
Root Causes of Loss Bursts
  • 50% of loss bursts (LBs) are caused by failure events for
    failover-1
  • 52% for failover-2
  • Verified LBs are longer than unverified ones

15
Root Causes of Loss Bursts
  • Verified LBs last longer than unverified ones
  • LBs caused by forwarding loops last longer

16
A strange problem
  • More than half of the routing failures occur
    within ISP1. On the contrary, only a small
    portion of the routing failures occur within ISP2
    upon withdrawal of the preferred route via ISP2.
  • Over 80% of all the failover events have routing
    failures.
  • We also observe that the occurrence of withdrawal
    messages is right after the occurrence of
    failover events, and the withdrawal message is
    quickly replaced by an announcement.

17
How Routing Failures Occur
  • Non-valley policy
  • MRAI of ISP1 is 5s
  • MRAI of ISP2 is small
  • Interval against the non-valley policy
    can be up to 30s

18
Multiple Loss Bursts
  • 75% of hosts -> fewer than two
  • One host: up to 6
  • First two -> majority
  • 57% of the first and 40% of the second
    are caused by routing failures

19
Location of Routing Failures
  • Via ICMP messages and DNS
  • The first loss burst
  • The second loss burst
  • 55% in other tier-1 ASes during failover-1
  • 73% in other tier-1 ASes during failover-2
  • Routing failures are propagated

20
Location of Routing Failures
  • Via BGP updates
  • In a tier-1 AS 134 withdrawals from 4 monitored
    routers
  • In other ASes 210 withdrawals from 7 ASes which
    do not include ISP1 and ISP2
  • Categories of probe hosts

21
Methodology Evaluation
  • Can we correlate ICMP messages with loss bursts?
  • Ping the Beacon when there is no Beacon event:
    0.6% are not caused by Beacon events
  • ICMP blocking in some ISPs
  • 53% of 10 tier-1 ASes
  • 52% of ISP1 and 95% of ISP2

22
Recovery Events
  • Probe hosts: 37 PlanetLab sites
  • 12 choose restored path via ISP1
  • 25 choose restored path via ISP2
  • Two recovery Events
  • Recovery-1
  • Recovery-2

23
Data Plane Performance
  • For (a): no large loss burst is observed
  • For (b): a large loss burst lasts for 100s
  • Overall: 29 hosts experience packet loss

24
Data Plane Performance
  • Loss burst length: no difference
  • But the longest ones can be up to 180/140, which
    must be caused by routing failures
    (counter-intuitive)

25
Data Plane Performance
  • Similar to Failover events
  • Recovery events have impact on RTT

26
Data Plane Performance
  • Reordering offset(ISP1)
  • Failover-1
    Recovery-1

27
Data Plane Performance
  • Num of Reordering (ISP1)
  • Failover-1
    Recovery-1

28
Data Plane Performance
  • Conclusion
  • Recovery does not contribute to reordering
  • Recovery has impact on RTT
  • Recovery has the most impact on Loss Burst Length

29
Root Causes of Loss Bursts
  • Recovery indeed causes Routing Failure
  • May be more

30
Root Causes of Loss Bursts
  • Evaluate from BGP updates of ISP2
  • 12 withdrawals among 724 recovery events
  • Little difference between withdrawal and
    announcement
  • Shows that ISP2 temporarily loses its routes to
    the beacon

31
Root Causes of Loss Bursts
  • Duration of loss burst
  • Loss bursts caused by recovery events are
    shorter

32
How Routing Failures occur
  • Recovery-1
  • R3---R1 has to wait due to MRAI
  • R3---R2 timer has just expired
  • R2 will send a message to R1 to poison the
    previous route
  • A will experience packet loss while B will not
  • But if R2-R3 are logical link.
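
The timing asymmetry in the Recovery-1 case (one session still waiting on MRAI while the other timer has just expired) can be sketched with a small helper. The function `next_send_time` and the timestamps in the usage note are hypothetical; only the 30-second eBGP MRAI value comes from the Background slide.

```python
def next_send_time(update_time, last_sent, mrai):
    """Earliest time an update can be sent to a neighbor under MRAI.

    update_time: when the router has a new route to advertise (seconds).
    last_sent:   when the previous advertisement went to this neighbor,
                 or None if nothing has been sent yet.
    mrai:        minimum interval between advertisements (seconds).
    """
    if last_sent is None:
        return update_time
    # The update is held back until the MRAI timer for this session expires.
    return max(update_time, last_sent + mrai)
```

With a 30-second timer, a route that becomes available at t=100 goes out immediately on a session whose last advertisement was at t=60, but is held until t=125 on a session that last advertised at t=95. During those 25 seconds the two neighbors hold inconsistent views, which is the kind of window in which the transient failures described above can occur.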

33
Multiple Loss Bursts Caused by Routing Failures
  • 16% of the first loss bursts
  • 8% of the second
  • More than half of the second failures
    are forwarding loops
  • Why withdrawal...propagate.explore.loop

34
Location of Routing Failures
  • The same reason as failover events

35
Representativeness of the experiment
  • Characterizing Connectivity of Destination
    Prefixes

36
Representativeness of the experiment
  • Routing Failures During Failover Events
  • Multi-homed via a single link
  • Prefer customer Policy
  • Route from other peers or providers has lower
    preference
  • Single-homed via multiple links
  • Can avoid some failures
  • Failures might still occur
  • Hot-potato
  • Routing Failures During Recovery Events

37
Conclusions
  • Routing changes can cause
  • End-to-end loss (loss burst)
  • Multiple loss burst
  • RTT
  • Reordering
  • Root cause is
  • Routing policy and iBGP configuration
  • Topology is important (I think)
  • Simply adding physical connectivity does not
    necessarily minimize the impact of routing
    changes on end-to-end path performance

38
Recommendations
  • Reevaluate the mechanism to which MRAI timer is
    applied and the value of the timer.
  • Store not only the best path but also the second
    best one at each router
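
The second recommendation can be pictured with a toy per-prefix table that always knows its runner-up route, so a withdrawal of the best path leaves a usable alternative at once. `PrefixEntry`, its methods, and the preference numbers are hypothetical; this is a sketch of the idea, not of any router implementation.

```python
class PrefixEntry:
    """Candidate routes for one prefix, ranked by preference."""

    def __init__(self):
        self.routes = {}  # neighbor name -> preference (higher is better)

    def announce(self, neighbor, preference):
        """Record or update the route learned from a neighbor."""
        self.routes[neighbor] = preference

    def withdraw(self, neighbor):
        """Drop the route learned from a neighbor, if any."""
        self.routes.pop(neighbor, None)

    def best_two(self):
        """Return (best, second_best) neighbors; either may be None."""
        ranked = sorted(self.routes, key=self.routes.get, reverse=True)
        best = ranked[0] if ranked else None
        backup = ranked[1] if len(ranked) > 1 else None
        return best, backup
```

If routes via ISP2 (preference 200) and ISP1 (preference 100) are both stored, withdrawing ISP2 promotes ISP1 immediately instead of leaving the prefix unreachable while a replacement route is learned.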

39
What can we learn
  • How to analyze a problem
  • How to experiment
  • Useful methods
  • Hard-working