Title: On the Power of Off-line Data in Approximating Internet Distances
1. On the Power of Off-line Data in Approximating Internet Distances
- Danny Raz (danny_at_cs.technion.ac.il), Technion - Israel Institute of Technology
- Prasun Sinha (prasunsinha_at_lucent.com), Bell Labs, Lucent Technologies
2. Outline
- Internet Distance
- Off-line metrics
  - Geographic distance, hops, AS, depth
- Linear regression for Internet distance estimation
- Multi-variable linear regression
- Accuracy of picking closest mirror site
- The next step
3. Internet Distance
- Internet Distance: one-way delay between hosts
- Components of Internet Distance
  - Dynamic
    - Server load
    - Network congestion / router load
  - Static
    - Propagation delay over the links
    - Router processing delay
    - Edge-router processing delay
Goal: to study the power of estimating the Static Internet Distance using off-line metrics
4. Importance of Internet Distance Estimation
- Picking closest mirror-site/cache
- For use in Content Distribution Networks
5. Approaches
- Dynamic
  - Dynamic probing: Dykes et al., Infocom '00
  - Passive monitoring: Andrews et al., Infocom '02
- Static
  - Semi-active probing (IDMaps): Jamin et al., Infocom '00
- Other relevant work
  - Geographic distance and RTT: Padmanabhan, Sigcomm '02
6. Static Internet Distance
[Figure: network diagram labeled AS 1, AS 2, AS 3, Core Router, Edge Router]
- Propagation delay → geographical distance
- Router processing delay → hops
- Edge-router processing delay → AS
AS: Autonomous System
Static Internet Distance ≈ α · geo-distance + β · hop-count + γ · AS-count
7. Data Collection
- Clients: 2500 public libraries in the US
- Servers (mirrors/caches): 8 traceroute locations in the US
- The location (latitude, longitude) is known for every host
- For every client-server pair
  - Run multiple (10) traceroutes
  - Pick the traceroute result with the smallest RTT
  - Compute
    - Geo-distance, based on latitude and longitude
    - Hop-count, from traceroute
    - AS-count, from traceroute, based on names of routers and IP address prefixes
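
The per-pair steps above can be sketched in a few lines. This is only an illustration: the slides do not say which distance formula was used, so the haversine great-circle formula, the Earth radius constant, and the record layout (`rtt_ms`, `hops`) are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def best_trace(traces):
    """Return the traceroute record with the smallest RTT among repeated runs."""
    return min(traces, key=lambda t: t["rtt_ms"])

# Hypothetical records for one client-server pair (the study used 10 runs).
traces = [
    {"rtt_ms": 48.2, "hops": 14},
    {"rtt_ms": 41.7, "hops": 13},
    {"rtt_ms": 45.0, "hops": 14},
]
best = best_trace(traces)
min_rtt, hop_count = best["rtt_ms"], best["hops"]
```

Taking the minimum over repeated runs is what filters out the dynamic components (load, congestion) and leaves an estimate of the static distance.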
8. Linear Regression (Geo-distance and Hop-count)
- minRTT vs. Geo-distance: SE (Std. Error) = 26.93
- minRTT vs. Hop-count: SE (Std. Error) = 25.71
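
A single-metric fit of this kind can be sketched as follows. The sketch assumes that SE means the residual standard error of an ordinary least-squares line, sqrt(SSE / (n - 2)); the slides do not define it. The data values are made up for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y ~= a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def standard_error(x, y, a, b):
    """Residual standard error of the fitted line: sqrt(SSE / (n - 2))."""
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Made-up sample: geo-distance (km) and measured minRTT (ms).
geo_km = [100.0, 500.0, 1200.0, 2500.0, 4000.0]
min_rtt_ms = [12.0, 20.0, 35.0, 60.0, 85.0]

a, b = fit_line(geo_km, min_rtt_ms)
se = standard_error(geo_km, min_rtt_ms, a, b)
```

The same two functions apply unchanged when the predictor is hop-count instead of geo-distance.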
9. Multiple Linear Regression (Multiple metrics)
- minRTT vs. Geo-distance, Hop-count: SE = 21.52
- minRTT vs. Geo-distance, AS-count: SE = 23.80
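
Combining two metrics amounts to a least-squares fit with several predictors. The sketch below solves the normal equations with plain Gaussian elimination so that it needs no external library. The intercept term, the function names, and the sample values are assumptions made for illustration, not details taken from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multi(rows, y):
    """Least squares for y ~= c0 + c1*x1 + c2*x2 + ... via (X^T X) c = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(X[0])
    XtX = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    Xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coef, row):
    """Estimated minRTT for one (geo-distance, hop-count) row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

# Made-up sample: (geo-distance in km, hop-count) and measured minRTT (ms).
rows = [(100.0, 5), (500.0, 8), (1200.0, 10), (2500.0, 14), (4000.0, 15), (800.0, 12)]
min_rtt_ms = [15.0, 24.0, 38.0, 66.0, 92.0, 37.0]
coef = fit_multi(rows, min_rtt_ms)
```

With k fitted coefficients (including the intercept), the residual standard error would be sqrt(SSE / (n - k)), again assuming that is the SE the slides report.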
10. minRTT ≈ α · geo-distance + β · hop-count + γ · AS-count
Term           Coefficient   p-value
Geo-distance   12.53 (α)     <0.0001
Hop-count       2.45 (β)     <0.0001
AS-count       -0.64 (γ)      0.0387
- High correlation between hop-count and AS-count (higher than for any other pair of metrics)
- Hop-count and AS-count should not be used together
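
One way to check for this kind of redundancy is to compute the pairwise Pearson correlation between candidate metrics before fitting. The slides do not say how the correlation was measured, so the choice of Pearson's r, the metric names, and the sample values below are assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def most_correlated_pair(metrics):
    """Return (name_a, name_b, r) for the pair of metrics with the largest |r|."""
    best = None
    for a, b in combinations(metrics, 2):
        r = pearson(metrics[a], metrics[b])
        if best is None or abs(r) > abs(best[2]):
            best = (a, b, r)
    return best

# Made-up per-pair measurements for three metrics.
metrics = {
    "hop_count": [8, 10, 12, 15, 18],
    "as_count": [2, 3, 3, 4, 5],
    "geo_km": [300, 2500, 800, 1200, 4000],
}
name_a, name_b, r = most_correlated_pair(metrics)
```

When two predictors are this strongly correlated, their individual coefficients become unstable, which is consistent with the small negative AS-count coefficient in the table.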
11. A New Off-line Metric: Depth
- Hop-count requires dynamic probing
- Introduce an alternate metric: Depth
  - Average hop-count to the nearest backbone network (a hand-made list of 30 big core networks)
  - Constant per host (client/server)
  - Alternatively, measure in units of time rather than hops
- (Client depth + Server depth) as a metric
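
A minimal sketch of how Depth might be computed from traceroute paths. Several things here are assumptions: that each hop has already been mapped to a network name, that the average is taken over several paths leaving the same host, that the pair metric is the sum of the two host depths, and the placeholder backbone names, which stand in for the hand-made list.

```python
# Hypothetical stand-in for the hand-made list of core networks.
BACKBONES = {"backbone-a.example", "backbone-b.example"}

def hops_to_backbone(path, backbones=BACKBONES):
    """1-based position of the first hop inside a backbone network, or None."""
    for i, network in enumerate(path, start=1):
        if network in backbones:
            return i
    return None

def host_depth(paths, backbones=BACKBONES):
    """Average hops-to-backbone over the paths that reach a listed backbone."""
    reached = [d for d in (hops_to_backbone(p, backbones) for p in paths)
               if d is not None]
    if not reached:
        raise ValueError("no path reaches a listed backbone")
    return sum(reached) / len(reached)

# Made-up paths: each entry is the network that owns the router at that hop.
client_paths = [
    ["library.example", "isp.example", "backbone-a.example", "other.example"],
    ["library.example", "isp.example", "isp.example", "backbone-b.example"],
]
server_paths = [
    ["hosting.example", "backbone-a.example"],
]

client_depth = host_depth(client_paths)        # (3 + 4) / 2
server_depth = host_depth(server_paths)        # 2
pair_depth = client_depth + server_depth       # assumed pair metric
```

Because each host's depth is computed once and stored, estimating the distance for a new client-server pair needs no probing between the two hosts.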
12. Linear Regression (Depth)
- minRTT vs. Depth: SE = 41.02
- minRTT vs. Depth and Geo-distance: SE = 24.52
13. Standard Errors in Estimating minRTT
Metric                    SE (Standard Error)
Geo-distance, Hop-count   21.52
Geo-distance, AS-count    23.80
Geo-distance, Depth       24.52
Hop-count                 25.71
Geo-distance              26.93
Depth                     41.02
14. Accuracy of picking the nearest mirror site
Accuracy (%) by metric used for the pick:
Allowed delta   Random   Geo-distance   Hop-count   Geo-distance, Hop-count   Geo-distance, Depth
0               12.50    37.84          44.32       38.41                     33.98
10 ms           21.15    53.07          58.98       55.91                     50.45
20 ms           33.75    73.18          76.70       74.89                     70.91
30 ms           46.25    90.91          88.75       91.36                     89.43
Data set: 880 clients and 8 servers
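
The accuracy figures can be read as follows. This is an assumed interpretation of "allowed delta", not a definition from the slides: a pick counts as correct when the actual minRTT to the chosen server is within the allowed delta of the actual minRTT to the truly nearest server. Under that reading, a random pick among 8 servers is right 1/8 = 12.5% of the time at delta 0, which matches the Random column. The function name and sample values below are hypothetical.

```python
def pick_accuracy(actual_rtt, estimated, delta_ms=0.0):
    """
    Percentage of clients for which the server chosen by the estimate is
    within delta_ms of the truly nearest server.

    actual_rtt[c][s]: measured minRTT (ms) from client c to server s
    estimated[c][s]:  estimated distance for the same pair (any unit)
    """
    hits = 0
    for rtts, ests in zip(actual_rtt, estimated):
        chosen = min(range(len(ests)), key=ests.__getitem__)
        if rtts[chosen] - min(rtts) <= delta_ms:
            hits += 1
    return 100.0 * hits / len(actual_rtt)

# Made-up data: 3 clients, 3 servers. Estimates here are geo-distances (km).
actual_rtt = [
    [20.0, 35.0, 50.0],
    [40.0, 42.0, 90.0],
    [30.0, 10.0, 25.0],
]
estimated = [
    [100.0, 300.0, 900.0],
    [500.0, 400.0, 2000.0],
    [200.0, 600.0, 150.0],
]

acc_0 = pick_accuracy(actual_rtt, estimated, 0.0)
acc_10 = pick_accuracy(actual_rtt, estimated, 10.0)
acc_20 = pick_accuracy(actual_rtt, estimated, 20.0)
```

In this made-up sample the second client's pick is off by 2 ms and the third client's by 15 ms, so accuracy rises as the allowed delta grows, the same trend the table shows.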
15. Summary
- Combining hop-count and geographic distance improves over either metric alone
- Using Depth along with Geo-distance improves performance and is completely off-line
- For closest mirror selection with 30 ms allowed deviation, almost any metric gives about 90% accuracy
Is there much room to improve?
16. The Next Step
- Global data
  - Collection and analysis of data based on clients and servers spread across the globe
- Using both off-line and on-line
  - Techniques to combine the power of off-line estimation with on-line estimation