A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance
1 A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance
by zzb
2 Motivation
How to improve network performance?
End-to-end Internet path performance degradation is correlated with routing dynamics.
The root cause of the correlation is unknown
3 Questions
Routing changes
Failover events
Recovery events
Factors
Topological properties
Routing policies
iBGP configurations
4 Content
Background
Experiment Methodology
Failover events
Recovery events
Representativeness of the experiment
Conclusions and recommendations
5 Background
MRAI
30 seconds for eBGP sessions.
5 seconds for iBGP sessions.
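To make the timer's effect concrete, here is a minimal Python sketch of MRAI rate limiting on one BGP session. It is a simplified model under stated assumptions, not how any particular router implements the MinRouteAdvertisementInterval of RFC 4271: the class name `MraiLimiter`, the per-prefix bookkeeping, and treating withdrawals like any other held update are all hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# MRAI defaults quoted on the slide, in seconds.
MRAI_EBGP = 30.0
MRAI_IBGP = 5.0


@dataclass
class MraiLimiter:
    """Simplified MRAI rate limiter for one BGP session (hypothetical model).

    Updates for a prefix are sent at most once per `mrai` seconds; anything
    offered earlier is held, and only the newest held update is kept.
    """
    mrai: float
    last_sent: Dict[str, float] = field(default_factory=dict)  # prefix -> last send time
    pending: Dict[str, Any] = field(default_factory=dict)      # prefix -> held update

    def offer(self, now: float, prefix: str, update: Any) -> Optional[Any]:
        """Return the update if it may be sent now; otherwise hold it and return None."""
        last = self.last_sent.get(prefix)
        if last is None or now - last >= self.mrai:
            self.last_sent[prefix] = now
            self.pending.pop(prefix, None)
            return update
        self.pending[prefix] = update  # a newer update replaces an older held one
        return None

    def tick(self, now: float) -> List[Tuple[str, Any]]:
        """Release held updates whose MRAI timer has expired by time `now`."""
        released = []
        for prefix in list(self.pending):
            if now - self.last_sent[prefix] >= self.mrai:
                released.append((prefix, self.pending.pop(prefix)))
                self.last_sent[prefix] = now
        return released
```

With `MRAI_EBGP`, an update offered 10 s after the previous one for the same prefix stays held until `tick(30.0)`, while with `MRAI_IBGP` an update offered 5 s later goes out at once. That holding period is the window in which neighbors can keep forwarding on stale information.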
Non-valley
Customers do not transit traffic from one provider to another
Peers do not transit traffic from one peer to another.
Prefer customer
Routes received from a network provider's customers are always preferred over those received from its peers or any other routes.
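The non-valley and prefer-customer rules above are commonly modeled as an export filter plus a route ranking. The Python sketch below is a simplification under stated assumptions: the names `Rel`, `Route`, `may_export`, and `best_route` are hypothetical, the relationship rank stands in for BGP LOCAL_PREF, and AS-path length is the only tie-breaker (real routers apply further decision steps, such as the IGP-cost "hot-potato" step).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Tuple


class Rel(IntEnum):
    """Relationship of the neighbor a route was learned from.
    Higher value = more preferred under the prefer-customer rule."""
    PROVIDER = 1
    PEER = 2
    CUSTOMER = 3


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # ASes from the neighbor to the origin
    learned_from: Rel


def may_export(route: Route, to: Rel) -> bool:
    """Non-valley export rule.

    Customer routes may go to anyone; routes learned from peers or providers
    are exported only to customers, so a network never transits traffic
    between two of its providers or between two of its peers."""
    return route.learned_from is Rel.CUSTOMER or to is Rel.CUSTOMER


def best_route(candidates: Iterable[Route]) -> Route:
    """Prefer-customer selection: relationship first, then shortest AS path."""
    return max(candidates, key=lambda r: (r.learned_from, -len(r.as_path)))
```

Note that a customer route wins even when a peer offers a shorter AS path, which is exactly the "always preferred" behavior stated on the slide.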
6 Experiment Methodology
Beacon controlled routing changes
7 Experiment Methodology
Probing architecture
Probing methods
UDP packet
50ms interval
Ping
Traceroute
Every hour for 20 minutes
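A minimal sketch of the 50 ms UDP probe stream, assuming each probe carries a sequence number and a send timestamp so the receiver can detect loss, delay, and reordering. The slide does not specify the probe format; the 12-byte layout `PROBE_FMT` and the function names are hypothetical.

```python
import socket
import struct
import time

# Hypothetical probe payload: sequence number (uint32) + send time (float64),
# in network byte order.
PROBE_FMT = "!Id"
PROBE_LEN = struct.calcsize(PROBE_FMT)  # 12 bytes


def make_probe(seq: int, sent_at: float) -> bytes:
    return struct.pack(PROBE_FMT, seq, sent_at)


def parse_probe(data: bytes) -> tuple:
    """Return (seq, sent_at) from a received probe payload."""
    return struct.unpack(PROBE_FMT, data[:PROBE_LEN])


def send_probes(sock: socket.socket, dest: tuple, count: int, interval: float = 0.05) -> None:
    """Send `count` probes to `dest`, one every `interval` seconds (50 ms by default)."""
    start = time.monotonic()
    for seq in range(count):
        # Schedule against the start time so per-packet overhead does not accumulate as drift.
        delay = start + seq * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        sock.sendto(make_probe(seq, time.time()), dest)
```

The receiver only needs to log sequence numbers in arrival order. One-way delays computed from the embedded `time.time()` stamps are meaningful only if sender and receiver clocks are synchronized, so RTT is better taken from ping, as listed above.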
8 Experiment Methodology
Data plane performance metrics
Packet loss
Loss burst size (consecutive losses)
Packet delay
RTT
Out of order
Num of reordering
Reordering offset (buffer size)
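Given the sequence numbers that arrived, the loss and reordering metrics can be computed roughly as follows. The definitions are assumptions for illustration and may differ from the study's: a loss burst is a run of consecutive missing sequence numbers, and a packet counts as reordered when it arrives after a higher-numbered one, its offset being how far it lags the highest sequence number seen so far (the receive buffer, in packets, needed to restore order).

```python
from typing import Iterable, List, Tuple


def loss_bursts(sent_count: int, received_seqs: Iterable[int]) -> List[int]:
    """Lengths (in packets) of runs of consecutive missing sequence numbers."""
    received = set(received_seqs)
    bursts, run = [], 0
    for seq in range(sent_count):
        if seq in received:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:  # burst still open at the end of the stream
        bursts.append(run)
    return bursts


def reordering(arrival_order: Iterable[int]) -> Tuple[int, int]:
    """Return (number of reordered packets, maximum reordering offset)."""
    max_seen, count, max_offset = -1, 0, 0
    for seq in arrival_order:
        if seq < max_seen:
            count += 1
            max_offset = max(max_offset, max_seen - seq)
        else:
            max_seen = seq
    return count, max_offset
```

With 50 ms spacing, a burst of n lost packets corresponds to roughly n * 0.05 s of outage, which is how packet counts translate into the burst durations discussed later.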
9 Experiment Methodology
Identifying Routing Failures
Causes of Loss
Routing dynamics
Route loss
Forwarding loops
Congestion
Ideal way to identify
By traceroute and ping (ICMP)
Route loss -> destination is unreachable
Loops -> TTL exceeded
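The ICMP-based rule above can be sketched as a small classifier. The type and code values are the standard ICMPv4 ones (RFC 792); everything else (the `IcmpMsg` record, the labels, and checking loops before route loss when both appear) is a hypothetical choice. It assumes data probes are sent with a normal TTL, so a Time Exceeded reply to a probe points to a forwarding loop rather than to a deliberately low traceroute TTL.

```python
from dataclasses import dataclass
from typing import Iterable

# Standard ICMPv4 values (RFC 792).
ICMP_DEST_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11
TTL_EXCEEDED_IN_TRANSIT = 0  # code 0 of Time Exceeded


@dataclass
class IcmpMsg:
    icmp_type: int
    code: int
    router: str  # source address of the router that generated the message


def classify_loss(icmp_msgs: Iterable[IcmpMsg]) -> str:
    """Label a loss burst from the ICMP errors received during it (simplified)."""
    msgs = list(icmp_msgs)
    if any(m.icmp_type == ICMP_TIME_EXCEEDED and m.code == TTL_EXCEEDED_IN_TRANSIT for m in msgs):
        return "forwarding loop"
    if any(m.icmp_type == ICMP_DEST_UNREACHABLE for m in msgs):
        return "route loss"
    # No ICMP evidence: could be congestion, or ICMP filtered along the path.
    return "unverified"
```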
10 Failover Events
Probe hosts: 37 PlanetLab sites
14 choose ISP1
23 choose ISP2
Two failover events
Failover-1
Failover-2
Entire month of July 2005
11 Data Plane Performance
Majority of Loss Bursts Occur at 0
12 Data Plane Performance
Three intervals
Significant impact on loss burst length and RTT
13 Data Plane Performance
Num of reordering is small for all
Reordering offset is impacted
14 Root Causes of Loss Bursts
50% of loss bursts (LBs) are caused by routing failure events for failover-1
52% for failover-2
Verified LBs are longer than unverified ones
15 Root Causes of Loss Bursts
Verified LBs last longer than unverified ones
LBs caused by forwarding loops last longer
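One way to split loss bursts into "verified" and "unverified" is to check whether any ICMP error arrived while the burst was in progress, then compare mean lengths. This Python sketch assumes that definition; the matching rule and the `slack` tolerance (to absorb timing jitter) are hypothetical and not taken from the study.

```python
import bisect
from typing import List, Sequence, Tuple

Burst = Tuple[float, float]  # (start, end) in seconds


def verify_bursts(bursts: Sequence[Burst], icmp_times: Sequence[float],
                  slack: float = 1.0) -> List[bool]:
    """True for each burst with at least one ICMP error within [start - slack, end + slack]."""
    times = sorted(icmp_times)
    flags = []
    for start, end in bursts:
        i = bisect.bisect_left(times, start - slack)  # first ICMP at or after the window start
        flags.append(i < len(times) and times[i] <= end + slack)
    return flags


def mean_length(bursts: Sequence[Burst], flags: Sequence[bool], want: bool) -> float:
    """Mean duration of the bursts whose verification flag equals `want`."""
    lengths = [end - start for (start, end), f in zip(bursts, flags) if f == want]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Because some ISPs block ICMP, an "unverified" burst is not proof that no routing failure occurred; the flag only records whether ICMP evidence exists.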
16 A strange problem
More than half of the routing failures occur within ISP1. In contrast, only a small portion of the routing failures occur within ISP2 upon withdrawal of the preferred route via ISP2.
Over 80% of all the failover events have routing failures.
We also observe that withdrawal messages appear right after failover events, and each withdrawal is quickly replaced by an announcement.
17 How Routing Failures Occur
non-valley policy
MRAI of ISP1 is 5s
MRAI of ISP2 is small
Interval against non-valley policy can be up to 30 s
18 Multiple Loss Bursts
75% of hosts -> fewer than two loss bursts
A single host -> up to 6
The first two loss bursts are the majority
57% of the first and 40% of the second are caused by routing failures
19 Location of Routing Failures
Via ICMP messages and DNS
The first loss burst
The second loss burst
55% in other tier-1 ASes during failover-1
73% in other tier-1 ASes during failover-2
Routing failures are propagated
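Locating a failure "via ICMP messages and DNS" amounts to resolving the router that sent the ICMP error and mapping its name to a network. A minimal Python sketch, assuming the mapping is done by DNS suffix; `locate`, the suffix table, and the example names below are hypothetical, and real router naming conventions vary by ISP.

```python
import socket
from typing import Callable, Dict, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """PTR lookup for a router address; None if it has no usable name."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def locate(ip: str, suffix_to_network: Dict[str, str],
           resolve: Callable[[str], Optional[str]] = reverse_dns) -> str:
    """Map the router that sent an ICMP error to a network via its DNS suffix."""
    name = resolve(ip)
    if not name:
        return "unknown"
    name = name.lower().rstrip(".")
    for suffix, network in suffix_to_network.items():
        if name == suffix or name.endswith("." + suffix):
            return network
    return "unknown"
```

The `resolve` parameter lets the lookup be replaced by a precomputed table, which keeps offline analysis of a month of traces from issuing live DNS queries.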
20 Location of Routing Failures
Via BGP updates
In a tier-1 AS: 134 withdrawals from 4 monitored routers
In other ASes: 210 withdrawals from 7 ASes, excluding ISP1 and ISP2
Categories of probe hosts
21 Methodology Evaluation
Can we correlate ICMP messages with loss bursts?
Ping the Beacon when there is no Beacon event: 0.6% are not caused by Beacon events
ICMP blocking in some ISPs
53% of 10 tier-1 ASes
52% of ISP1 and 95% of ISP2
22 Recovery Events
Probe hosts: 37 PlanetLab sites
12 choose restored path via ISP1
25 choose restored path via ISP2
Two recovery events
Recovery-1
Recovery-2
23 Data Plane Performance
For (a), no large loss burst is observed
For (b), a large loss burst lasts for 100 s
For all, 29 hosts experience packet loss
24 Data Plane Performance
Loss burst length: no difference
But the longest ones can be up to 180/140, which must be caused by routing failures (counter-intuitive)
25 Data Plane Performance
Similar to Failover events
Recovery events have an impact on RTT
26 Data Plane Performance
Reordering offset (ISP1)
Failover-1 vs. Recovery-1
27 Data Plane Performance
Number of reorderings (ISP1)
Failover-1 vs. Recovery-1
28 Data Plane Performance
Conclusion
Recovery events do not contribute to reordering
Recovery events have an impact on RTT
Recovery events have the most impact on loss burst length
29 Root Causes of Loss Bursts
Recovery events do cause routing failures
Possibly more than observed
30 Root Causes of Loss Bursts
Evaluate from BGP updates of ISP2
12 withdrawals among 724 recovery events
Little time difference between each withdrawal and the following announcement
Shows that ISP2 routers temporarily lose their routes to the Beacon
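The "withdrawal quickly followed by an announcement" pattern can be measured by pairing each withdrawal with the next announcement from the same router. This Python sketch assumes updates for the Beacon prefix have already been extracted as `(time, router, kind)` tuples; that record format and the function name are hypothetical, and real BGP archives (e.g. MRT dumps) would need a separate parser.

```python
from typing import Dict, Iterable, List, Tuple

Update = Tuple[float, str, str]  # (time in seconds, router id, "A" or "W")


def withdrawal_gaps(updates: Iterable[Update]) -> List[Tuple[str, float, float]]:
    """For one prefix, return (router, withdrawal_time, gap) for every withdrawal
    that is later followed by an announcement from the same router."""
    pending: Dict[str, float] = {}
    gaps = []
    for t, router, kind in sorted(updates):
        if kind == "W":
            pending.setdefault(router, t)  # keep the first withdrawal of a run
        elif kind == "A" and router in pending:
            w = pending.pop(router)
            gaps.append((router, w, t - w))
    return gaps
```

A short gap means the router had no route to the Beacon for that interval, so packets it forwarded toward the Beacon during the gap would be dropped.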
31 Root Causes of Loss Bursts
Duration of loss burst
Loss burst caused by recovery events lasts shorter
32 How Routing Failures occur
Recovery-1
R3-R1: R3 has to wait due to MRAI
R3-R2: the MRAI timer has just expired
R2 sends an update to R1 that poisons the previous route
A will experience packet loss while B will not
But what if R2-R3 is a logical link?
33 Multiple Loss Bursts Caused by Routing Failures
16 of the first loss bursts
8 of the second
More than half of the second failures are forwarding loops
Why: withdrawal -> propagation -> path exploration -> loop
34 Location of Routing Failures
The same reason as failover events
35 Representativeness of the experiment
Characterizing Connectivity of Destination Prefixes
36 Representativeness of the experiment
Routing Failures During Failover Events
Multi-homed via a single link
Prefer-customer policy
Routes from other peers or providers have lower preference
Single-homed via multiple links
Can avoid some failures
Failures might still occur
Hot-potato
Routing Failures During Recovery Events
37 Conclusions
Routing changes can cause
End-to-end loss (loss burst)
Multiple loss burst
RTT
Reordering
Root cause is
Routing policy and iBGP configuration
Topology is important (I think)
Simply adding physical connectivity does not necessarily minimize the impact of routing changes on end-to-end path performance
38 Recommendations
Re-evaluate the mechanism by which the MRAI timer is applied and the value of the timer.
Store not only the best path but also the second-best one at each router.
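The second recommendation can be illustrated with a per-prefix table that retains exactly two routes, so that when the best one is withdrawn the router can forward on the backup immediately instead of waiting for new updates. This Python sketch is a hypothetical illustration of the idea, not the authors' design: `TwoPathRib`, the `rank` callback, and the (next hop, preference) route tuples are assumptions.

```python
from typing import Any, Callable, Dict, Optional


class TwoPathRib:
    """Keeps the best and second-best route per prefix (hypothetical sketch)."""

    def __init__(self, rank: Callable[[Any], Any]):
        self.rank = rank                            # higher rank = more preferred
        self.table: Dict[str, Dict[str, Any]] = {}  # prefix -> {neighbor: route}

    def announce(self, prefix: str, neighbor: str, route: Any) -> None:
        candidates = self.table.setdefault(prefix, {})
        candidates[neighbor] = route
        # Retain only the two highest-ranked routes: the best path and its backup.
        top_two = sorted(candidates.items(), key=lambda kv: self.rank(kv[1]), reverse=True)[:2]
        self.table[prefix] = dict(top_two)

    def withdraw(self, prefix: str, neighbor: str) -> None:
        self.table.get(prefix, {}).pop(neighbor, None)

    def lookup(self, prefix: str) -> Optional[Any]:
        """Current forwarding choice; after the best is withdrawn, the backup is used at once."""
        candidates = self.table.get(prefix, {})
        return max(candidates.values(), key=self.rank, default=None)
```

The trade-off is visible in the sketch: keeping only two routes bounds memory, but once both are withdrawn the router again has no route until new updates arrive.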