Title: On the Accuracy of Embeddings for Internet Coordinate Systems
1 On the Accuracy of Embeddings for Internet Coordinate Systems
- Eng Keong Lua, Tim Griffin, Marcelo Pias, Han Zheng, Jon Crowcroft
- University of Cambridge, Computer Laboratory
2 RTT Estimation: Is this a good one?
Depends on the APPLICATIONS!
[Figure: estimated and measured RTT (ms) from planetlab1.comet.columbia.edu; x-axis: PlanetLab sites, from closest to farthest using measured RTT]
3 RTT Estimation: Is this a good one?
[Figure: estimated and measured RTT (ms) from planetlab1.pop-mg.rnp.br; x-axis: PlanetLab sites, from closest to farthest using measured RTT]
4 Internet Coordinates: How accurate are they?
Both of the previous examples were generated using the same Internet coordinate technique on the same data set.
Outline
- What are Internet Coordinates?
- A Close Look at the Lipschitz Embedding
- New Sets of Accuracy Metrics
- Experimental Methodology - PlanetLab Experiments
- Using Other Embeddings
- Revisiting Previous Work
- Conclusion
5 What are Internet Coordinates?
- Internet Coordinate System
- Embed Round-Trip Times (RTTs) into geometric spaces
- Unmeasured RTTs are estimated using geometric distance
- Why Internet Coordinate Systems?
- Extensive measurement of network delays can be time consuming and adds to network load
- Construction of overlay topologies through scalable distance estimation
- If accurate, embedding techniques allow us to predict Internet RTTs without extensive measurements.
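As a minimal sketch of the idea above, assuming a 2-dimensional Euclidean space: once each host has a coordinate, an unmeasured RTT is estimated as the geometric distance between two points. The host names and coordinate values are hypothetical and only serve the illustration.

```python
import math

# Hypothetical 2-D coordinates (in ms) for three hosts; illustrative values only.
coords = {
    "a": (0.0, 0.0),
    "b": (30.0, 40.0),
    "c": (60.0, 80.0),
}

def estimate_rtt(u, v):
    """Estimate the RTT between hosts u and v as the Euclidean distance of their coordinates."""
    return math.dist(coords[u], coords[v])

print(estimate_rtt("a", "b"))  # 50.0
print(estimate_rtt("a", "c"))  # 100.0
```

No measurement between "a" and "c" is needed here; the estimate comes from the coordinates alone, which is why the accuracy of the embedding matters.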
6 How embeddings work
L = Landmarks, H = Hosts, N = Nodes = L ∪ H
[Diagram: measured RTT matrix over landmarks (L) and hosts (H) -> Embed -> a metric space -> compute distance matrix -> estimated RTT matrix]
- Embed: associate a point with each node in N
- Compute distance matrix
- Annotation on the measured RTT matrix: This data is not used in embedding. (But is needed for judging accuracy!) Why we don't use Skitter data!
7 Full Embedding: L = N
[Diagram: measured RTT matrix (L x L) -> Embed -> a metric space -> compute distance matrix -> estimated RTT matrix]
In general, some accuracy is lost even when the full mesh of data is used.
8 Two Basic Approaches: Method I
- Predicting Internet Network Distance with Coordinates-based Approaches (GNP). Ng, Zhang. INFOCOM 2002
- Big Bang Simulation (BBS). Shavitt, Tankel. INFOCOM 2003, 2004
- Vivaldi. Dabek, Cox, Kaashoek, Morris. SIGCOMM 2004
- PIC. Costa, Castro, Rowstron, Key. ICDCS 2004
[Diagram: measured RTT matrix over landmarks (L, with |L| = m) and hosts (H) -> space of n dimensions]
Embed using optimization algorithms w.r.t. an accuracy metric (n < m)
9 Two Basic Approaches: Method II
- Virtual Landmarks. Tang, Crovella. IMC 2003
- Constructing Internet Coordinate Systems based on Delay Measurements. Lim, Hou, Choi. IMC 2003
- Lighthouses for Scalable Distributed Location. Pias, Crowcroft, Wilbur, Harris, Bhatti. IPTPS 2003
[Diagram: measured RTT matrix over landmarks (L, with |L| = m) and hosts (H) -> Lipschitz embedding -> Euclidean space of m dimensions -> dimensionality reduction (n < m) -> Euclidean space of n dimensions]
May attempt to optimize this using a specific accuracy metric w.r.t. the measured RTTs and/or the m-dimensional distances.
Accuracy may be lost. We will look at the inherent loss of accuracy of this step.
10 Lipschitz Embedding: Example using binary trees
- Full Lipschitz embedding into R^7: each row is read as the 7-dimensional coordinate of a node
- E.g. the coordinates of Node 1 are F(1) = (0, 1, 2, 2, 1, 2, 2)
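The example can be reproduced with a short sketch. It assumes a complete binary tree of depth 2 (7 nodes), hop count as the distance, and nodes numbered in preorder starting from the root; these assumptions are not stated on the slide but are consistent with the coordinates shown for Node 1. The function names are illustrative.

```python
from collections import deque

def binary_tree_adjacency(depth):
    """Complete binary tree, nodes numbered 1, 2, ... in preorder (assumed numbering)."""
    adj = {}
    counter = [0]

    def build(level):
        counter[0] += 1
        node = counter[0]
        adj[node] = []
        if level < depth:
            for _ in range(2):
                child = build(level + 1)
                adj[node].append(child)
                adj[child].append(node)
        return node

    build(0)
    return adj

def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def full_lipschitz(adj):
    """Every node is a landmark: the coordinate of x is its row of the distance matrix."""
    nodes = sorted(adj)
    coords = {}
    for x in nodes:
        d = hop_distances(adj, x)
        coords[x] = [d[y] for y in nodes]
    return coords

F = full_lipschitz(binary_tree_adjacency(2))
print(F[1])  # [0, 1, 2, 2, 1, 2, 2]
```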
11 View from a leaf in a binary tree, depth 4
Full 32-D Lipschitz
12 View from root in a binary tree, depth 4
13 What should Accuracy Mean?
- Several ways to capture accuracy formally
- The notion depends on the needs of an application
- Some applications require that the distances in the embedding accurately reflect the original distances
- In the earlier example, we have F(7) = (2, 3, 4, 4, 1, 2, 0)
- The embedded distance between Nodes 1 and 7 is d(F(1), F(7)) = 4.47
- But it is only 2 in the original metric space
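The 4.47 figure can be checked by hand, assuming the embedded distance is the Euclidean norm and taking the coordinates of Node 1 from the earlier example, (0, 1, 2, 2, 1, 2, 2), together with those of Node 7, (2, 3, 4, 4, 1, 2, 0):

```latex
\begin{aligned}
F(1) - F(7) &= (0-2,\ 1-3,\ 2-4,\ 2-4,\ 1-1,\ 2-2,\ 2-0) = (-2,\ -2,\ -2,\ -2,\ 0,\ 0,\ 2) \\
\lVert F(1) - F(7) \rVert_2 &= \sqrt{4 + 4 + 4 + 4 + 0 + 0 + 4} = \sqrt{20} \approx 4.47
\end{aligned}
```

The embedded distance is therefore more than twice the original distance of 2.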
14 Relative Rank Loss (rrl)
- Relative distance of other nodes
- Is Node A closer than Node B?
- Relative ranking of distances is not lost
- We define Relative Rank Loss (rrl)
- From Node z, if sign(R) ≠ sign(R'), where R and R' are the differences between the distances to the two nodes before and after embedding, the order has changed!
15 Formal Definition - rrl
rrl is a type of swap distance
Define
16 Formal Definition - rrl
- Define: Local rrl at Node z is
- Note that 0 (0%) ≤ rrl(F,z) ≤ 1 (100%)
- Maximal Local rrl: MAX(rrl(F,z)) over all nodes z
- Average Local rrl: average of rrl(F,z) over all nodes z
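The formula itself is not preserved in this transcript, so the following is only a sketch of one plausible reading: the local rrl at z is taken to be the fraction of unordered node pairs (x, y), both different from z, whose distance order as seen from z differs between the original matrix and the embedded one. The normalization by the number of pairs and the treatment of ties as swaps are assumptions, as are the function names.

```python
from itertools import combinations

def sign(v):
    return (v > 0) - (v < 0)

def local_rrl(d, e, z):
    """Fraction of pairs (x, y) whose order of distance from z differs between d and e.

    d is the original distance matrix, e the matrix of embedded distances.
    A tie in one space that is broken in the other counts as a swap (assumption).
    """
    others = [x for x in range(len(d)) if x != z]
    pairs = list(combinations(others, 2))
    swapped = sum(
        1 for x, y in pairs
        if sign(d[z][x] - d[z][y]) != sign(e[z][x] - e[z][y])
    )
    return swapped / len(pairs)

def rrl_summary(d, e):
    """Minimum, average and maximum local rrl over all nodes."""
    values = [local_rrl(d, e, z) for z in range(len(d))]
    return min(values), sum(values) / len(values), max(values)

# Four nodes on a line at positions 0, 1, 2, 3; the "embedding" swaps nodes 1 and 2.
d = [[abs(i - j) for j in range(4)] for i in range(4)]
p = [0, 2, 1, 3]
e = [[abs(p[i] - p[j]) for j in range(4)] for i in range(4)]

print(local_rrl(d, e, 0))  # 1 of the 3 pairs seen from node 0 is swapped
```

The value depends only on the order of distances, not on their magnitude, so multiplying every embedded distance by a positive constant leaves it unchanged.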
17 Closest Neighbor Loss (cnl)
- Some applications are interested only in determining which nodes are closest
- Accurately preserve the set of closest nodes
- For a Node x, its Closest Neighbor Loss, cnl(F,x), is 0 if any of the nodes closest to x in X is mapped to one of the nodes closest to F(x)
- Otherwise, cnl(F,x) is 1
- The global average of cnl(F,x) is denoted cnl(F)
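A minimal sketch of the definition above, assuming the original and embedded distances are given as square matrices `d` and `e` indexed by node; the function names and the small example matrices are hypothetical.

```python
import math

def closest_set(row, x):
    """Indices other than x at minimal distance from x; near-equal values count as ties."""
    others = [y for y in range(len(row)) if y != x]
    best = min(row[y] for y in others)
    return {y for y in others if math.isclose(row[y], best)}

def cnl_local(d, e, x):
    """0 if some closest neighbor of x in d is also a closest neighbor of x in e, else 1."""
    return 0 if closest_set(d[x], x) & closest_set(e[x], x) else 1

def cnl_global(d, e):
    """Average of the local cnl over all nodes."""
    n = len(d)
    return sum(cnl_local(d, e, x) for x in range(n)) / n

# Hypothetical original (d) and embedded (e) distance matrices for three nodes.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
e = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]

print([cnl_local(d, e, x) for x in range(3)])  # [1, 0, 0]
```

Node 0 loses its closest neighbor (node 1 originally, node 2 after embedding), while nodes 1 and 2 each keep at least one, so the global cnl is 1/3.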
18 Relative error for Lipschitz embedding on binary trees, depth 1 (3 nodes) to 8 (511 nodes)
It is not obvious or intuitive how to interpret
19 Scalar independent measures for Lipschitz embedding on binary trees, depth 1 to 8
- cnl tells us that about 96% of the 511 nodes in a tree of depth 8 have a different closest neighbor
- Maximal Local rrl tells us that at least 1 node sees over 30% of its relative distance relationships swapped
- rrl shows that on average nodes see over 20% of their relative distance relationships swapped
20 View from a leaf in a hub with 30 spokes
Root node is PUSHED away to a distance of 3.3
21 Hub and Spoke Accuracy: n spokes and 1 root, where n ranges from 1 to 30
Rising cnl and falling rrl after n = 6
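The pushing-away effect can be illustrated with a sketch that assumes hop-count distances and an unscaled full Lipschitz embedding with Euclidean distance. The absolute values it prints need not match the 3.3 on the slide, which may include a scaling step not described in this transcript; the point is only the ordering of the two distances.

```python
import math

def hub_distance_matrix(n):
    """Hop distances for a star: node 0 is the root, nodes 1..n are the spokes."""
    size = n + 1
    d = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                d[i][j] = 1 if 0 in (i, j) else 2
    return d

def embedded_distance(d, x, y):
    """Euclidean distance between rows x and y, i.e. between full Lipschitz coordinates."""
    return math.dist(d[x], d[y])

d = hub_distance_matrix(30)
print(round(embedded_distance(d, 1, 0), 3))  # leaf to root: sqrt(31), about 5.568
print(round(embedded_distance(d, 1, 2), 3))  # leaf to another leaf: sqrt(8), about 2.828
```

In the original metric the root is the closest node to every leaf (1 hop against 2), yet under these assumptions the root ends up farther from a leaf than the other leaves are.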
22 Why PlanetLab?
- The Skitter project makes RTT data available from a small number of monitoring nodes n to m target nodes, where m is on the order of hundreds of thousands
- Yields an asymmetric n x m matrix
- Embedded distances between target nodes cannot be verified
- PlanetLab testbed for Internet planetary-scale mesh topology
23 Methodology
- RTT measurement data collected between PlanetLab nodes from March 22-28, 2004
- Minimum value between each pair of nodes over consecutive 15-minute periods
- Each day has 96 matrices of pair-wise RTTs, each of size 325 x 325
- Over the 7-day period, we have 672 matrices
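The aggregation step can be sketched as follows. The input format (tuples of timestamp in seconds, source, destination, RTT in ms) is hypothetical; the transcript does not say how the raw measurements were stored.

```python
from collections import defaultdict

WINDOW_SECONDS = 15 * 60

def min_rtt_per_window(samples):
    """Keep the minimum RTT per node pair in each 15-minute window.

    samples: iterable of (timestamp_seconds, src, dst, rtt_ms) tuples (hypothetical format).
    Returns {window_index: {(src, dst): minimum rtt_ms seen in that window}}.
    """
    windows = defaultdict(dict)
    for ts, src, dst, rtt in samples:
        w = int(ts // WINDOW_SECONDS)
        pair = (src, dst)
        current = windows[w].get(pair)
        if current is None or rtt < current:
            windows[w][pair] = rtt
    return dict(windows)

samples = [
    (10, "a", "b", 42.0),
    (500, "a", "b", 40.5),
    (950, "a", "b", 44.1),
]
result = min_rtt_per_window(samples)
print(result)  # {0: {('a', 'b'): 40.5}, 1: {('a', 'b'): 44.1}}

# Window counts quoted above: 96 per day, 672 over 7 days.
print(24 * 60 // 15, 7 * 24 * 60 // 15)  # 96 672
```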
24 Methodology
- A representative node is selected in each site to build a site-by-site matrix, which is cleaned up for missing entries
- Finally, we have a 69 x 69 site-by-site RTT matrix
- We further classify by geographical location
- North America (NA-PL): 44 x 44 RTT site matrix; the majority of sites obtain connectivity through Abilene
- Outside North America (ONA-PL): 25 x 25 RTT site matrix between research and commercial; includes Australia, Europe, Latin America and Asia
- ALL (ALL-PL): 69 x 69 RTT site matrix; consists of NA-PL + ONA-PL
25 Results and Observations: ALL-PL
- Apply full Lipschitz Embedding
- Minimum, Mean and Maximum rrl
- Difference between Max and Min rrls is high (57.71%). Flipping a coin is better!
- Global cnl measure is 84.06%; only about 15% of the sites retain their closest neighbors in their embedding
26 Scalability (Meta-) Metric: Can embeddings scale?
- Suppose applications are only interested in a subset of nodes, e.g. North America
- Would it be better to use an Internet Coordinate System from ALL-PL or from NA-PL?
- The answer to this question will determine if embedding services could scale
- If Y ⊆ X, we could first use the full Lipschitz embedding to obtain F(X), then restrict this to the nodes in Y; we denote this the Superspace embedding
- F(Y) and the restriction of F(X) to Y may be very different embeddings, with different accuracy for the metric space spanned by Y
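The two constructions can be compared on a toy example. This sketch assumes a full Lipschitz embedding with Euclidean distance; the 4-node RTT matrix and the choice of subset Y are hypothetical.

```python
import math

def full_lipschitz(d, nodes):
    """Coordinate of each node = its distances to every node in `nodes` (all act as landmarks)."""
    return {x: [d[x][y] for y in nodes] for x in nodes}

def embedded_distances(coords, subset):
    """Euclidean distance between the coordinates of every pair in `subset`."""
    return {(a, b): math.dist(coords[a], coords[b])
            for a in subset for b in subset if a < b}

# Hypothetical 4-node RTT matrix (ms); nodes 0 and 1 play the role of the subset Y.
d = [
    [0, 10, 50, 60],
    [10, 0, 55, 52],
    [50, 55, 0, 20],
    [60, 52, 20, 0],
]
X = [0, 1, 2, 3]
Y = [0, 1]

# Superspace: embed all of X, then keep only the nodes in Y.
superspace = {x: c for x, c in full_lipschitz(d, X).items() if x in Y}
# Subspace: embed Y on its own.
subspace = full_lipschitz(d, Y)

print(embedded_distances(superspace, Y))  # {(0, 1): 17.0}
print(embedded_distances(subspace, Y))    # (0, 1) maps to sqrt(200), about 14.14
```

The same pair of nodes gets a different embedded distance depending on which landmarks take part, which is why the two embeddings can differ in accuracy.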
27 Superspace and Subspace Embeddings: Looking at NA-PL
28 Superspace and Subspace Results
- We used NA-PL as a Subspace of ALL-PL
- F(NA-PL): Subspace Embedding of NA-PL
- F(ALL-PL) restricted to NA-PL: Superspace Embedding of NA-PL
- The Lipschitz Subspace embedding in Euclidean space is a much better one
29 North America (Superspace Embedding): PlanetLab site with Maximum rrl, planetlab1.flux.utah.edu
30 North America (Subspace Embedding): PlanetLab site with Maximum rrl, planetlab1.enel.ucalgary.ca
31 CDFs of rrl for Subspace and Superspace Embeddings
32 Using Other Embeddings with our PlanetLab ALL-PL sites using our Accuracy Metrics
- Both BBS (Euclidean) and Vivaldi embeddings in Euclidean space have the same cnl measure of 75.36%
- BBS (Hyperbolic) LRN has the lowest cnl
- Vivaldi has a higher maximum rrl than BBS (Euclidean)
- BBS (Euclidean) has the lowest maximum rrl
- BBS (Hyperbolic) TP embedding has a much higher maximum rrl than BBS (Hyperbolic) LRN embedding
- It has the largest maximum rrl
- Its minimum rrl is lower than that of BBS (Hyperbolic) LRN
33 Signature plots: BBS (Hyperbolic) TP
Lists of close neighbors are being pushed away in embedded geometric space
34 Signature plots: Vivaldi
Lists of close neighbors are being pushed away in embedded geometric space
35 Scalability (Meta-) Metric: Superspace and Subspace embeddings
- Vivaldi and BBS embeddings in Euclidean space have the same behavior as the Lipschitz embedding
- Subspace embedding has better rrl accuracy than Superspace embedding in Euclidean space
- BBS embeddings in Hyperbolic space
- Superspace embedding tends to have a close or better rrl accuracy than Subspace embedding in Hyperbolic space
36 Revisiting Previous Work with their data sets using our Accuracy Metrics
- BBS (Hyperbolic) TP in Hyperbolic space has similar inaccuracy behaviors in rrl as Lipschitz embedding in Euclidean space for tree-like network topology
- All experiments show lists of closest nodes being pushed away with sharp bi-modal errors
- With BBS (Hyperbolic) LRN, the list of close neighbors is pushed away much further and has a higher maximum rrl
37 BBS (Euclidean) using Jan 2000 AS Hierarchical Tree Network Topology of 150 nodes
38 BBS (Hyperbolic) TP using Jan 2000 AS Hierarchical Tree Network Topology of 150 nodes
39 BBS (Hyperbolic) TP using BA Network Topology of 150 nodes
40 BBS (Hyperbolic) TP using Mar 2001 AS Network Topology of 200 nodes
41 BBS (Hyperbolic) LRN using Mar 2001 AS Network Topology of 200 nodes
42 Conclusion
- Goal of this work is to apply our new accuracy metrics to study the accuracy of embeddings for Internet Coordinate systems
- Results of this attempt are not encouraging
- Worthwhile to develop a collection of accuracy metrics that are able to quantify different aspects of user-oriented quality
- Can we characterize the impact of network topologies that have good embeddings with respect to an accuracy metric?
- Embeddable Overlay Network (EON)
- Routing nodes are selected to avoid violations of triangle inequality (for overlay forwarding)
- Overlay topology selected to embed with high accuracy with respect to multiple useful accuracy metrics
43 Thank you.
44 BACK UP SLIDES
45 Distortion
- Intuition: imagine that contraction(F) = 1 (achieved with scalar multiplication)
- No contraction in embedding, only expansion
- Largest expansion achieved by some pair x,y where d(F(x),F(y)) is expansion(F) · d(x,y)
- Relevant to applications that require a global picture of entire embedding
46 Relative Error
- Improve the notion of accuracy by redefining the embedding as β · F(x)
- Choose β to minimize relative error
- From the example, β = 0.5, which reduces the distance between the embeddings of Node 1 and 7 to 2.3
47 Stress
- A similar accuracy metric is Stress
- Both Stress and Relative Error quantify the magnitude of differences between original and embedded distances
48 Distortion
- In theoretical work on embeddings, the most common notion of accuracy
- Invariant under scalar multiplication
- distortion(F) = distortion(a · F), for all a ≠ 0
- Worst-case ratio of the expansion to the shrinkage of relative distances
49 Distortion
- Define
- expansion(F) = maximum value of r(F,x,y)
- contraction(F) = minimum value of r(F,x,y)
- where x,y ∈ X (x ≠ y)
- Distortion(F) is always ≥ 1
- Note: Distortion is a global worst-case measure
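The definition of r(F,x,y) is not preserved in this transcript. The sketch below assumes the usual reading, r(F,x,y) = (embedded distance between F(x) and F(y)) / d(x,y), and takes distortion as expansion divided by contraction; both are assumptions, as are the function names and the 3-node example.

```python
import math
from itertools import combinations

def ratios(d, coords):
    """r(F,x,y) for every pair x != y: embedded distance over original distance (assumed form)."""
    n = len(d)
    return {(x, y): math.dist(coords[x], coords[y]) / d[x][y]
            for x, y in combinations(range(n), 2)}

def distortion(d, coords):
    """Return (expansion, contraction, expansion / contraction)."""
    r = list(ratios(d, coords).values())
    expansion, contraction = max(r), min(r)
    return expansion, contraction, expansion / contraction

# Three nodes on a line; full Lipschitz coordinates are the rows of the distance matrix.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
coords = d

expansion, contraction, dist = distortion(d, coords)
print(round(expansion, 3), round(contraction, 3), round(dist, 3))  # 1.732 1.414 1.225
```

Multiplying all coordinates by a positive constant scales expansion and contraction by the same factor, so their ratio stays the same.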
50 Local Distortion
- Internet Coordinate Systems are interested only in Local Distortion, from one node's perspective
- For node x, we define
- expansion(F,x) = maximum value of r(F,x,y)
- contraction(F,x) = minimum value of r(F,x,y)
- where y ∈ X (x ≠ y)
- Maximum Local Distortion: MAX(Distortion(F,x)) over all x
51 Basic Properties of Metric Space
- We define a metric space: a pair (X,d) where X is a set of points with distance function d
- d : X x X → R; for each a,b ∈ X, d(a,b) is the distance between a and b
- We require for all a,b,c ∈ X
- (anti-reflexivity) d(a,b) = 0 iff a = b
- (symmetry) d(a,b) = d(b,a)
- (triangle inequality) d(a,b) ≤ d(a,c) + d(c,b)
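Measured RTTs do not always satisfy these axioms. A minimal sketch for checking symmetry and listing triangle inequality violations in a distance matrix follows; the matrix values and function names are hypothetical.

```python
from itertools import permutations

def is_symmetric(d, tol=1e-9):
    """True if d[a][b] equals d[b][a] for every pair, up to a small tolerance."""
    n = len(d)
    return all(abs(d[a][b] - d[b][a]) <= tol for a in range(n) for b in range(n))

def triangle_violations(d, tol=1e-9):
    """Triples (a, b, c) for which d[a][b] > d[a][c] + d[c][b]."""
    n = len(d)
    return [(a, b, c) for a, b, c in permutations(range(n), 3)
            if d[a][b] > d[a][c] + d[c][b] + tol]

# Hypothetical RTTs (ms): the direct path between nodes 0 and 2 is slower than the detour via node 1.
rtt = [
    [0, 10, 50],
    [10, 0, 20],
    [50, 20, 0],
]
print(is_symmetric(rtt))         # True
print(triangle_violations(rtt))  # [(0, 2, 1), (2, 0, 1)]
```

A matrix with such violations is not a metric space in the sense above, so no embedding into a metric space can reproduce all of its distances exactly.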
52 Basic Properties of An Embedding
- An embedding of a finite metric space (X,d) into (R^k,d') is a mapping F : X → R^k
- k is the dimensionality
- d' : R^k x R^k → R is the distance metric function of the embedding space
- If we denote the norm in R^k by ||·||_p, the distance metric d' is defined as d'(x,y) = ||x - y||_p, with p = 2 for Euclidean distance
- Ideally, the distance d'(F(a),F(b)) = d(a,b)
53 PlanetLab site with Mean rrl: pli-br-2.hpl.hp.com
54 PlanetLab site with Minimum rrl, with the scaling factor that minimizes relative error applied: planetlab1.comet.columbia.edu
Closest neighbor is not preserved
55 PlanetLab site with Maximum rrl: planetlab1.pop-mg.rnp.br
56 Using Other Embeddings
- Experiments to apply the new accuracy metrics to the Vivaldi and Big Bang Simulation (BBS) systems (in Euclidean and Hyperbolic spaces) using our PlanetLab sites of ALL-PL
- 3-dimensional coordinates generated
- p2psim simulator used to generate Vivaldi results
- BBS (Hyperbolic) Two Phases (TP) embedding is done using landmarks, similar to GNP: 15 landmarks
- BBS (Hyperbolic) Log-Random and Neighbors (LRN) embedding concurrently embeds nodes, using node pairs whose distance is below a certain threshold
- Number of randomly sampled distance pairs is n · log n