On the Power of Off-line Data in Approximating Internet Distances

1
On the Power of Off-line Data in Approximating
Internet Distances
  • Danny Raz (danny_at_cs.technion.ac.il)
    Technion - Israel Institute of Technology
  • Prasun Sinha (prasunsinha_at_lucent.com)
    Bell Labs, Lucent Technologies

2
Outline
  • Internet Distance
  • Off-line metrics
    • Geographic distance, hop-count, AS-count, depth
  • Linear regression for Internet distance estimation
    • Multi-variable linear regression
  • Accuracy of picking closest mirror site
  • The next step

3
Internet Distance
  • Internet distance: one-way delay between hosts
  • Components of Internet distance
    • Dynamic
      • Server load
      • Network congestion / router load
    • Static
      • Propagation delay over the links
      • Router processing delay
      • Edge-router processing delay

Goal: To study the power of off-line metrics in
estimating the static Internet distance
4
Importance of Internet Distance Estimation
  • Picking closest mirror-site/cache
  • For use in Content Distribution Networks

5
Approaches
  • Dynamic
    • Dynamic probing [Dykes et al., Infocom '00]
    • Passive monitoring [Andrews et al., Infocom '02]
  • Static
    • Semi-active probing (IDMaps) [Jamin et al., Infocom '00]
  • Other relevant work
    • Geographic distance and RTT [Padmanabhan, Sigcomm '02]

6
Static Internet Distance
[Figure: AS 1, AS 2, and AS 3, with core routers and edge routers]
  • Propagation delay: geographical distance
  • Router processing delay: hop-count
  • Edge-router processing delay: AS-count

AS: Autonomous System
Static Internet Distance ≈ α · geo-distance + β · hop-count + γ · AS-count + δ
7
Data Collection
  • Clients: 2500 public libraries in the US
  • Servers (mirrors/caches): 8 traceroute locations
    in the US
  • The location (latitude, longitude) is known for
    every host
  • For every client-server pair
    • Run multiple (10) traceroutes
    • Pick the traceroute result with the smallest RTT
    • Compute
      • Geo-distance, from latitude and longitude
      • Hop-count, from traceroute
      • AS-count, from traceroute, based on router
        names and IP address prefixes
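
The two per-pair computations above that need no network knowledge can be sketched as follows. This is a minimal illustration, not the authors' code: the haversine formula is one common way to get a great-circle distance from latitude and longitude (the slides do not say which formula was used), and the function names, the mean Earth radius, and the `(rtt_ms, hop_count)` tuple layout are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_traceroute(runs):
    """Return the run with the smallest RTT.

    `runs` is a list of (rtt_ms, hop_count) tuples, one per traceroute
    (hypothetical layout chosen for this sketch).
    """
    return min(runs, key=lambda run: run[0])
```

Taking the minimum RTT over repeated runs is what filters out most of the dynamic component (load and congestion), leaving something close to the static distance.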

8
Linear Regression (Geo-distance and Hop-count)
minRTT vs. Geo-distance: SE (Std. Error) = 26.93
minRTT vs. Hop-count: SE (Std. Error) = 25.71
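
A single-metric fit of this kind can be sketched with ordinary least squares. The slides do not define how SE was computed; the sketch assumes it is the standard error of the regression, sqrt(SSE / (n - 2)). Function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_error(xs, ys, slope, intercept):
    """Standard error of the regression: sqrt(SSE / (n - 2)).

    Assumed definition of "SE"; needs at least three data points.
    """
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))
```

Here `xs` would hold one metric (geo-distance or hop-count) per client-server pair and `ys` the matching minRTT values.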
9
Multiple Linear Regression (Multiple metrics)
minRTT vs. Geo-distance, Hop-count: SE = 21.52
minRTT vs. Geo-distance, AS-count: SE = 23.80
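
Fitting minRTT against several metrics at once can be sketched by solving the least-squares normal equations. This standard-library version is only an illustration under that assumption; the slides do not say which tool was used, and a numerical library would normally be preferred. All names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ...

    `rows` holds one tuple of predictor values per observation,
    e.g. (geo_distance, hop_count). Returns [b0, b1, b2, ...].
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

The normal equations become ill-conditioned when two predictors are strongly correlated, which is one practical reason to avoid pairing such metrics.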
10
minRTT ≈ α · geo-distance + β · hop-count + γ · AS-count + δ

Term           Coefficient    p-value
Geo-distance   12.53 (α)      < 0.0001
Hop-count       2.45 (β)      < 0.0001
AS-count       -0.64 (γ)      0.0387
  • Hop-count and AS-count are highly correlated
    (more than any other pair of metrics)
  • Hop-count and AS-count should therefore not be
    used together
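
One way to check for this kind of overlap between two metrics is the Pearson correlation coefficient. The slides do not state which correlation measure was used, so treat the choice below as an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 or -1 for two predictors means they carry largely the same information, so including both adds little and makes the individual coefficients hard to interpret.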

11
A New Off-line Metric: Depth
  • Hop-count requires dynamic probing
  • Introduce an alternative metric: Depth
    • Average hop-count to the nearest backbone network
      (from a hand-made list of 30 big core networks)
    • Constant per host (client/server)
    • Alternatively, measure in units of time rather
      than hops
  • (Client depth + Server depth) as a metric
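
The depth of a host could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: each traceroute path is assumed to be already mapped to a list of network names, one per hop, `backbone_networks` stands in for the hand-made list of core networks, and paths that never reach a listed backbone are simply skipped.

```python
def hops_to_backbone(path, backbone_networks):
    """Hops until the first router inside a backbone network, or None.

    `path` is a list of network names, one per hop, starting at the host
    (hypothetical representation).
    """
    for hop_index, network in enumerate(path, start=1):
        if network in backbone_networks:
            return hop_index
    return None


def host_depth(paths, backbone_networks):
    """Average hop-count from a host to the nearest backbone over its paths.

    Returns None when no path reaches a listed backbone network.
    """
    counts = [hops_to_backbone(p, backbone_networks) for p in paths]
    counts = [c for c in counts if c is not None]
    return sum(counts) / len(counts) if counts else None
```

Because each host's depth is measured once and then reused, a per-pair value built from the client's and the server's depths needs no probing between that particular pair.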

12
Linear Regression (Depth)
minRTT vs. Depth: SE = 41.02
minRTT vs. Depth and Geo-distance: SE = 24.52
13
Squared Errors in Estimating minRTT
Metric                     SE (Standard Error)
Geo-distance, Hop-count    21.52
Geo-distance, AS-count     23.80
Geo-distance, Depth        24.52
Hop-count                  25.71
Geo-distance               26.93
Depth                      41.02
14
Accuracy of picking the nearest mirror site
Allowed delta   Random   Geo-distance   Hop-count   Geo-distance, Hop-count   Geo-distance, Depth
0 ms            12.50    37.84          44.32       38.41                     33.98
10 ms           21.15    53.07          58.98       55.91                     50.45
20 ms           33.75    73.18          76.70       74.89                     70.91
30 ms           46.25    90.91          88.75       91.36                     89.43
Accuracy (%) over 880 clients and 8 servers
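
An accuracy figure of this kind could be computed as sketched below. The interpretation of "allowed delta" is an assumption: a pick counts as correct when the measured minRTT of the chosen server is within the delta of the best measured minRTT for that client. The function name and the row-per-client layout are illustrative.

```python
def selection_accuracy(estimated, measured, allowed_delta_ms=0.0):
    """Percentage of clients for which the estimated-closest server is
    within `allowed_delta_ms` of the truly closest one.

    `estimated[i][j]` is the predicted distance from client i to server j;
    `measured[i][j]` is the measured minRTT in milliseconds.
    """
    hits = 0
    for est_row, rtt_row in zip(estimated, measured):
        # Server the metric would pick for this client.
        chosen = min(range(len(est_row)), key=est_row.__getitem__)
        if rtt_row[chosen] <= min(rtt_row) + allowed_delta_ms:
            hits += 1
    return 100.0 * hits / len(measured)
```

With 8 servers and no tolerance, a uniformly random pick is right one time in eight, which matches the 12.50 in the Random column.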
15
Summary
  • Combining hop-count and geographic distance
    improves on either metric alone
    (SE 21.52 vs. 25.71 and 26.93)
  • Using Depth along with Geo-distance lowers the SE
    relative to Geo-distance alone (24.52 vs. 26.93)
    and is completely off-line
  • For closest-mirror selection with 30 ms allowed
    deviation, almost any metric gives about 90% accuracy

Is there much room for improvement?
16
The Next Step
  • Global data
    • Collection and analysis of data from clients
      and servers spread across the globe
  • Using both off-line and on-line estimation
    • Techniques to combine the power of off-line
      estimation with on-line estimation