A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance

Provided by: zzbng

1
A Measurement Study on the Impact of Routing
Events on End-to-End Internet Path Performance

by zzb

2
Motivation

How to improve network performance?
End-to-end Internet path performance degradation
is correlated with routing dynamics.
The root cause of the correlation is unknown

3
Questions

Routing changes
Failover events
Recovery events
Factors
Topological properties,
Routing policies
IBGP configurations

4
Content

Background
Experiment Methodology
Failover events
Recovery events
Representativeness of the experiment
Conclusions and recommendations

5
Background

MRAI (Minimum Route Advertisement Interval)
30 seconds for eBGP sessions.
5 seconds for iBGP sessions.
Non-valley policy
Customers do not transit traffic from one
provider to another.
Peers do not transit traffic from one peer to
another.
Prefer-customer policy
Routes received from a network provider's
customers are always preferred over those
received from its peers or any other routes.
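
The two policies above can be sketched in a few lines of Python. This is only an illustration: the relationship names, the numeric preference values and the AS-path tie-break are assumptions, and the export rule is the common textbook form of the non-valley policy, which covers the two cases listed on the slide.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, as seen from our own AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Hypothetical preference values: only "customer first" comes from the slide.
LOCAL_PREF = {Rel.CUSTOMER: 300, Rel.PEER: 200, Rel.PROVIDER: 100}


def may_export(learned_from: Rel, send_to: Rel) -> bool:
    """Non-valley rule: customer routes go to everyone,
    peer and provider routes go only to customers."""
    return learned_from is Rel.CUSTOMER or send_to is Rel.CUSTOMER


def best_route(routes):
    """Pick a route from (learned_from, as_path) tuples.

    Prefer-customer comes first; a shorter AS path breaks ties (assumption).
    """
    return max(routes, key=lambda r: (LOCAL_PREF[r[0]], -len(r[1])))
```

For example, `may_export(Rel.PROVIDER, Rel.PROVIDER)` is `False`, which is the "customers do not transit traffic from one provider to another" case.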

6
Experiment Methodology

Beacon controlled routing changes

7
Experiment Methodology

Probing architecture
Probing methods
UDP packet
50ms interval
Ping
Traceroute
Every hour for 20 minutes

8
Experiment Methodology

Data plane performance metrics
Packet loss
Bursty loss size (consecutive)
Packet delay
RTT
Out of order
Number of reorderings
Reordering offset (buffer size)
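
A minimal sketch of how the loss and reordering metrics above could be computed from probe sequence numbers. The slides do not define the metrics precisely, so the definitions here are assumptions: a loss burst is a run of consecutively missing sequence numbers, a packet counts as reordered if it arrives after a packet with a larger sequence number, and its offset is the number of such earlier arrivals (the buffer needed to restore order).

```python
def loss_bursts(sent_count, received):
    """Return the lengths of runs of consecutively lost probes.

    sent_count: number of probes sent, numbered 0 .. sent_count - 1
    received:   sequence numbers in arrival order
    """
    got = set(received)
    bursts, run = [], 0
    for seq in range(sent_count):
        if seq in got:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)
    return bursts


def reordering(received):
    """Return (number of reordered packets, list of their offsets)."""
    seen, offsets = [], []
    max_seen = -1
    for seq in received:
        if seq < max_seen:
            # Late packet: count earlier arrivals that should have come after it.
            offsets.append(sum(1 for s in seen if s > seq))
        else:
            max_seen = seq
        seen.append(seq)
    return len(offsets), offsets
```

With 9 probes sent and arrivals `[0, 1, 4, 2, 3, 8]`, probes 5 to 7 form one loss burst of length 3, and packets 2 and 3 are each reordered with offset 1.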

9
Experiment Methodology

Identifying Routing Failures
Reasons of Loss
Routing dynamics
Route loss
Forwarding loops
Congestion
Ideal way to identify
By traceroute and ping (ICMP)
Route loss -> destination is unreachable
Loops -> TTL exceeded
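
The identification step above could look roughly like the sketch below, which matches ICMP replies against the time span of a loss burst. The record layout, the `kind` strings and the one-second `slack` window are hypothetical; the slides do not say how the matching was done.

```python
from dataclasses import dataclass


@dataclass
class IcmpReply:
    time: float  # arrival time in seconds
    kind: str    # "unreachable" or "ttl_exceeded" (hypothetical labels)


def classify_burst(start, end, replies, slack=1.0):
    """Return the routing-failure causes seen around a loss burst.

    An empty set means the burst is unverified: it may be due to
    congestion, or the ICMP messages may have been blocked.
    """
    causes = set()
    for r in replies:
        if start - slack <= r.time <= end + slack:
            if r.kind == "unreachable":
                causes.add("route loss")
            elif r.kind == "ttl_exceeded":
                causes.add("forwarding loop")
    return causes
```

A burst from 10.0 s to 12.0 s with a destination-unreachable reply at 10.2 s would be labelled `{"route loss"}`.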

10
Failover Events

Probe hosts: 37 PlanetLab sites
14 choose ISP1
23 choose ISP2
Two failover events
Failover-1
Failover-2
Entire month of July 2005

11
Data Plane Performance

Majority of Loss Bursts Occur at 0

12
Data Plane Performance

Three intervals
Significant impact on loss burst length and RTT

13
Data Plane Performance

Number of reorderings is small for all
Reordering offset is impacted

14
Root Causes of Loss Bursts

50% of loss bursts (LBs) are caused by failure
events for failover-1
52% for failover-2
Verified LBs are longer than unverified ones

15
Root Causes of Loss Bursts

Verified LBs last longer than unverified ones
LBs caused by forwarding loops last longer

16
A strange problem

More than half of the routing failures occur
within ISP1. On the contrary, only a small
portion of the routing failures occur within ISP2
upon withdrawal of the preferred route via ISP2.
Over 80% of all the failover events have routing
failures.
We also observe that the occurrence of withdrawal
messages is right after the occurrence of
failover events, and the withdrawal message is
quickly replaced by an announcement.

17
How Routing Failures Occur

Non-valley policy
MRAI of ISP1 is 5s
MRAI of ISP2 is small
The interval that goes against the
non-valley policy can last up to 30s
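
To make the role of the MRAI timer concrete, here is a minimal sketch of how a rate-limited router might decide when an update can leave. The function name and arguments are hypothetical, and jitter and other implementation details are ignored.

```python
def next_send_time(now, last_sent, mrai):
    """Earliest time an update can be sent to a neighbor.

    now:       time the new route becomes available (seconds)
    last_sent: time of the previous advertisement, or None if there was none
    mrai:      Minimum Route Advertisement Interval (seconds)
    """
    if last_sent is None:
        return now
    return max(now, last_sent + mrai)
```

If the previous advertisement went out at 0 s and a new route appears at 1 s, a 30 s timer holds the update until 30 s, while a 5 s timer holds it until 5 s.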

18
Multiple Loss Bursts

75% of hosts -> fewer than two loss bursts
A host can see up to 6
First two -> majority
57% of the first and 40% of the second
are caused by routing failures

19
Location of Routing Failures

Via ICMP messages and DNS
The first loss burst
The second loss burst
55% in other tier-1 ASes during failover-1
73% in other tier-1 ASes during failover-2
Routing failures are propagated

20
Location of Routing Failures

Via BGP updates
In a tier-1 AS: 134 withdrawals from 4 monitored
routers
In other ASes: 210 withdrawals from 7 ASes, which
do not include ISP1 and ISP2
Categories of probe hosts

21
Methodology Evaluation

Can we correlate ICMP messages with loss bursts?
Ping the Beacon when there is no Beacon event
0.6% are not caused by Beacon events
ICMP blocking in some ISPs
53% of 10 tier-1 ASes
52% of ISP1 and 95% of ISP2

22
Recovery Events

Probe hosts: 37 PlanetLab sites
12 choose restored path via ISP1
25 choose restored path via ISP2
Two recovery events
Recovery-1
Recovery-2

23
Data Plane Performance

For (a): no large loss burst is observed
For (b): a large loss burst lasts for 100s
In all, 29 hosts experience packet loss

24
Data Plane Performance

Loss burst length: no difference
But the longest ones can reach 180/140, which
must be caused by routing failures
(counter-intuitive)

25
Data Plane Performance

Similar to Failover events
Recovery events have impact on RTT

26
Data Plane Performance

Reordering offset (ISP1)
Failover-1
Recovery-1

27
Data Plane Performance

Number of reorderings (ISP1)
Failover-1
Recovery-1

28
Data Plane Performance

Conclusion
Recovery does not contribute to reordering
Recovery has impact on RTT
Recovery has the most impact on Loss Burst Length

29
Root Causes of Loss Bursts

Recovery indeed causes Routing Failure
May be more

30
Root Causes of Loss Bursts

Evaluate from BGP updates of ISP2
12 withdrawals among 724 recovery events
Little difference between withdrawal and
announcement
Shows that ISP2 temporarily loses its routes to
the Beacon

31
Root Causes of Loss Bursts

Duration of loss burst
Loss bursts caused by recovery events are
shorter

32
How Routing Failures Occur

Recovery-1
R3---R1 has to wait due to MRAI
R3---R2 timer has just expired
R2 will send a message to R1 to poison the
previous route
A will experience packet loss while B will not
But if R2-R3 is a logical link.

33
Multiple Loss Bursts Caused by Routing Failures

16% of the first loss bursts
8% of the second
More than half of
the second failures
are forwarding loops
Why withdrawal...propagate.explore.loop

34
Location of Routing Failures

The same reason as failover events

35
Representativeness of the experiment

Characterizing Connectivity of Destination
Prefixes

36
Representativeness of the experiment

Routing Failures During Failover Events
Multi-homed via a single link
Prefer customer Policy
Route from other peers or providers has lower
preference
Single-homed via multiple links
Can avoid some failures
Failures might still occur
Hot-potato
Routing Failures During Recovery Events

37
Conclusions

Routing changes can cause
End-to-end loss (loss burst)
Multiple loss bursts
RTT
Reordering
Root cause is
Routing policy and iBGP configuration
Topology is important (I think)
Simply adding physical connectivity does not
necessarily minimize the impact of routing
changes on end-to-end path performance

38
Recommendations

Re-evaluate the mechanism by which the MRAI timer
is applied and the value of the timer.
Store not only the best path but also the second
best one at each router.
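
The second recommendation can be illustrated with a small sketch of a per-prefix table entry that keeps a backup next to the best path, so a router can switch immediately when the best path is withdrawn. The class, the neighbor names and the preference values are hypothetical and do not describe any particular router implementation.

```python
class PrefixEntry:
    """Paths known for one destination prefix, keyed by neighbor."""

    def __init__(self):
        self.paths = {}  # neighbor -> preference (higher is better)

    def announce(self, neighbor, pref):
        self.paths[neighbor] = pref

    def withdraw(self, neighbor):
        self.paths.pop(neighbor, None)

    def best_two(self):
        """Return (best, backup) neighbors; either may be None."""
        ranked = sorted(self.paths, key=self.paths.get, reverse=True)
        best = ranked[0] if ranked else None
        backup = ranked[1] if len(ranked) > 1 else None
        return best, backup
```

After announcing via `"ISP2"` with preference 200 and `"ISP1"` with preference 100, `best_two()` gives `("ISP2", "ISP1")`; withdrawing `"ISP2"` leaves `("ISP1", None)` without waiting for a new announcement.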

39
What can we learn?