On the Accuracy of Embeddings for Internet Coordinate Systems

1
On the Accuracy of Embeddings for Internet
Coordinate Systems
  • Eng Keong Lua, Tim Griffin, Marcelo Pias,
  • Han Zheng, Jon Crowcroft.
  • University of Cambridge, Computer Laboratory.

2
RTT Estimation: Is this a good one?
Depends on the APPLICATIONS!
[Plot: measured and estimated RTT (ms) from
planetlab1.comet.columbia.edu; x-axis: PlanetLab
sites, from closest to farthest using measured RTT]
3
RTT Estimation: Is this a good one?
[Plot: measured and estimated RTT (ms) from
planetlab1.pop-mg.rnp.br; x-axis: PlanetLab sites,
from closest to farthest using measured RTT]
4
Internet Coordinates: How accurate are they?
Both of the previous examples were
generated using the same Internet coordinate
technique on the same data set
Outline
  • What are Internet Coordinates?
  • A Close Look at the Lipschitz Embedding
  • New Sets of Accuracy Metrics
  • Experimental Methodology - PlanetLab Experiments
  • Using Other Embeddings
  • Revisiting Previous Work
  • Conclusion

5
What are Internet Coordinates?
  • Internet Coordinate System
  • Embed Round-Trip-Times (RTTs) into geometric
    spaces
  • Unmeasured RTTs are estimated using geometric
    distance
  • Why Internet Coordinate Systems?
  • Extensive measurement of network delays can be
    time consuming and can add to network load
  • Construction of overlay topologies through
    scalable distance estimation
  • If accurate, embedding techniques allow us to
    predict Internet RTTs without extensive
    measurements.
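
A minimal sketch of the idea on this slide: once every host has a coordinate, an unmeasured RTT is estimated as the geometric distance between two points. The host names and 2-D coordinates below are hypothetical and chosen only for illustration; a real system would derive them from measured RTTs.

```python
import math

# Hypothetical 2-D coordinates (units: ms). Illustrative values only.
coords = {
    "host-a": (0.0, 0.0),
    "host-b": (30.0, 40.0),
    "host-c": (60.0, 0.0),
}


def estimate_rtt(a, b):
    """Estimated RTT between two hosts: Euclidean distance between their points."""
    return math.dist(coords[a], coords[b])


# No direct measurement between host-a and host-b is needed for this estimate.
print(estimate_rtt("host-a", "host-b"))
```

Whether such an estimate is good enough is exactly the accuracy question the rest of the talk examines.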

6
How embeddings work
L = Landmarks, H = Hosts, N = Nodes = L ∪ H
[Diagram: measured RTT matrix over L and H → Embed →
a metric space → compute distance matrix →
estimated RTT matrix over L and H]
Embed: associate a point with each node in N
This data is not used in embedding. (But is
needed for judging accuracy!)
Why we don't use Skitter data!
7
Full Embedding: L = N
[Diagram: measured RTT matrix over L → Embed →
a metric space → compute distance matrix →
estimated RTT matrix over L]
In general, some accuracy is lost even when the
full mesh of data is used
8
Two Basic Approaches: Method I
  • Predicting Internet Network Distance with
    Coordinates-Based Approaches (GNP): Ng, Zhang.
    INFOCOM 2002
  • Big Bang Simulation (BBS): Shavitt, Tankel.
    INFOCOM 2003, 2004
  • Vivaldi: Dabek, Cox, Kaashoek, Morris. SIGCOMM
    2004
  • PIC: Costa, Castro, Rowstron, Key. ICDCS 2004

[Diagram: measured RTT matrix (H by L, |L| = m) →
embed using optimization algorithms w.r.t. an
accuracy metric → space of n dimensions (n < m)]
9
Two Basic Approaches: Method II
  • Virtual Landmarks: Tang, Crovella. IMC 2003
  • Constructing Internet Coordinate Systems based
    on Delay Measurements: Lim, Hou, Choi. IMC 2003
  • Lighthouses for Scalable Distributed Location:
    Pias, Crowcroft, Wilbur, Harris, Bhatti. IPTPS
    2003

[Diagram: measured RTT matrix (H by L, |L| = m) →
Lipschitz embedding → Euclidean space of m
dimensions → dimensionality reduction (n < m) →
Euclidean space of n dimensions]
May attempt to optimize this using a specific
accuracy metric w.r.t. the measured RTTs, and/or
the m-dimensional distances
Accuracy may be lost. We will look at the
inherent loss of accuracy of this step
10
Lipschitz Embedding: Example using binary trees
  • Full Lipschitz embedding into R^7 by reading
    each row of the distance matrix as the
    7-dimensional coordinate of the node
  • E.g. the coordinates of Node 1 are
    F(1) = (0, 1, 2, 2, 1, 2, 2)
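
The full Lipschitz embedding of the 7-node tree can be sketched as follows. The node labelling is an assumption (the tree figure is not in this transcript): node 1 is taken as the root, with 2 and 5 as its children. This labelling was chosen because it reproduces the coordinates quoted for Node 1.

```python
from collections import deque

# Assumed labelling: 1 is the root, 2 and 5 its children,
# 3 and 4 hang under 2, 6 and 7 hang under 5.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 4), (5, 6), (5, 7)]


def hop_distances(edges, source):
    """Breadth-first search: hop count from `source` to every node."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def full_lipschitz(edges):
    """Map each node to its row of the distance matrix (distances to all nodes)."""
    nodes = sorted({n for edge in edges for n in edge})
    rows = {}
    for x in nodes:
        d = hop_distances(edges, x)
        rows[x] = [d[y] for y in nodes]
    return rows


F = full_lipschitz(EDGES)
print(F[1])  # [0, 1, 2, 2, 1, 2, 2]
```

Every node is used as a landmark here, so the number of dimensions equals the number of nodes.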

11
View from a leaf in a binary tree, depth 4
Full 32-D Lipschitz
12
View from root in a binary tree, depth 4
13
What should Accuracy Mean?
  • Several ways to capture Accuracy formally
  • The appropriate notion depends on the needs of
    an application
  • Some applications require that the distances in
    the embedding accurately reflect the original
    distances
  • In the earlier example, we have
    F(7) = (2, 3, 4, 4, 1, 2, 0), so the embedded
    distance between Nodes 1 and 7 is d(1,7) = 4.47
  • But it is only 2 in the original metric space
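
The 4.47 figure can be checked by hand from the two coordinate vectors quoted on the slides, F(1) = (0, 1, 2, 2, 1, 2, 2) and F(7) = (2, 3, 4, 4, 1, 2, 0), assuming the embedded distance is the Euclidean one:

```latex
\[
\lVert F(1) - F(7) \rVert_2
  = \sqrt{(0-2)^2 + (1-3)^2 + (2-4)^2 + (2-4)^2 + (1-1)^2 + (2-2)^2 + (2-0)^2}
  = \sqrt{20} \approx 4.47
\]
```

The original distance is 2 hops, so this pair is stretched by a factor of about 2.2.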

14
Relative Rank Loss (rrl)
  • Relative distance of other nodes
  • Is Node A closer than Node B?
  • Relative ranking of distances should not be lost
  • We define Relative Rank Loss (rrl)
  • From Node z, if sign(R) in the original space ≠
    sign(R) in the embedding, the order has changed!

15
Formal Definition - rrl
rrl is a type of swap distance
Define
16
Formal Definition - rrl
  • Define the local rrl at Node z, rrl(F,z)
  • Note that 0 (0%) ≤ rrl(F,z) ≤ 1 (100%)
  • Maximal local rrl: MAX(rrl(F,z)) over all nodes z
  • Average local rrl: the mean of rrl(F,z) over all
    nodes z
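
The formula for the local rrl did not survive in this transcript, so the sketch below is one plausible reading rather than the authors' exact definition. It assumes rrl(F,z) is the fraction of unordered pairs (x, y) of other nodes whose ordering by distance from z differs between the original and embedded spaces. The tie tolerance `eps` and the hub-and-spoke test matrix are also assumptions.

```python
from itertools import combinations
import math


def sign(v, eps=1e-9):
    """-1, 0 or +1; differences smaller than eps count as ties (an assumption)."""
    if v > eps:
        return 1
    if v < -eps:
        return -1
    return 0


def local_rrl(d, coords, z):
    """Fraction of pairs (x, y) whose order by distance from z is changed by the embedding."""
    others = [n for n in range(len(d)) if n != z]
    pairs = list(combinations(others, 2))
    swapped = 0
    for x, y in pairs:
        original = sign(d[z][x] - d[z][y])
        embedded = sign(math.dist(coords[z], coords[x]) - math.dist(coords[z], coords[y]))
        if original != embedded:
            swapped += 1
    return swapped / len(pairs)


def hub_and_spoke(n):
    """Hop-count matrix for a root (index 0) with n leaves attached to it."""
    size = n + 1
    d = [[0 if i == j else 2 for j in range(size)] for i in range(size)]
    for leaf in range(1, size):
        d[0][leaf] = d[leaf][0] = 1
    return d


d = hub_and_spoke(8)
coords = d  # full Lipschitz embedding: each row of d is that node's coordinate
local = [local_rrl(d, coords, z) for z in range(len(d))]
print(max(local), sum(local) / len(local))
```

With 8 spokes, each leaf sees the root overtaken by the 7 other leaves, so 7 of its 28 pairs are swapped, while the root sees no swaps.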

17
Closest Neighbor Loss (cnl)
  • Some applications are interested only in
    determining which nodes are closest
  • Accurately preserve the set of closest nodes
  • For a Node x, its Closest Neighbor Loss,
    cnl(F,x), is 0 if any of the nodes closest to x
    in X are mapped to the nodes closest to F(x)
  • Otherwise, cnl(F,x) is 1
  • The global average of cnl(F,x) is denoted cnl(F)
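
A minimal sketch of cnl as worded on this slide, treating "closest" as the set of all nodes tied for the smallest distance. The tie tolerance `eps` is an assumption, and the example uses an unscaled full Lipschitz embedding of a hub with 30 spokes.

```python
import math


def closest(distances, x, eps=1e-9):
    """Indices of all nodes tied for the smallest distance from x (x itself excluded)."""
    others = {n: dist for n, dist in enumerate(distances) if n != x}
    best = min(others.values())
    return {n for n, dist in others.items() if dist - best <= eps}


def cnl(d, coords, x):
    """0 if some original closest neighbour of x is still closest in the embedding, else 1."""
    embedded = [math.dist(coords[x], point) for point in coords]
    return 0 if closest(d[x], x) & closest(embedded, x) else 1


def global_cnl(d, coords):
    """Average of cnl(F, x) over all nodes x."""
    return sum(cnl(d, coords, x) for x in range(len(d))) / len(d)


# Hub with 30 spokes: the root is index 0, leaves are 1..30 (hop-count metric).
n = 30
d = [[0 if i == j else 2 for j in range(n + 1)] for i in range(n + 1)]
for leaf in range(1, n + 1):
    d[0][leaf] = d[leaf][0] = 1

print(global_cnl(d, d))  # rows of d double as full Lipschitz coordinates
```

Every leaf loses the root as its closest neighbour in this embedding, so only the root keeps cnl = 0.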

18
Relative error for Lipschitz embedding on binary
trees, depth 1 (3 nodes) to 8 (511 nodes)
It is not obvious or intuitive how to interpret
19
Scalar independent measures for Lipschitz
embedding on binary trees, depth 1 to 8
cnl tells us that about 96% of the 511 nodes in a
tree of depth 8 have different closest neighbors
Maximal local rrl tells us that at least 1 node
sees over 30% of its relative distance
relationships swapped
rrl shows that on average nodes see over 20% of
their relative distance relationships swapped
20
View from a leaf in a hub with 30 spokes
Root node is PUSHED away to a distance of 3.3
21
Hub and Spoke Accuracy: n spokes and 1 root, where
n ranges from 1 to 30
Rising cnl and falling rrl after n = 6
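
One way to see why behaviour changes around this point is to compare, from a leaf's view, the embedded distance to the root with the embedded distance to another leaf. The sketch assumes hop-count distances and an unscaled full Lipschitz embedding, so its absolute values are not expected to match the 3.3 quoted for the 30-spoke plot.

```python
import math


def hub_rows(n):
    """Full Lipschitz coordinates for a root (index 0) with n leaves: rows of the hop-count matrix."""
    size = n + 1
    rows = [[0 if i == j else 2 for j in range(size)] for i in range(size)]
    for leaf in range(1, size):
        rows[0][leaf] = rows[leaf][0] = 1
    return rows


def leaf_view(n):
    """Embedded distance from leaf 1 to the root and to another leaf (needs n >= 2)."""
    rows = hub_rows(n)
    to_root = math.dist(rows[1], rows[0])
    to_leaf = math.dist(rows[1], rows[2])
    return to_root, to_leaf


for n in (2, 6, 7, 8, 30):
    to_root, to_leaf = leaf_view(n)
    print(n, round(to_root, 2), round(to_leaf, 2))
```

Under these assumptions the root sits at sqrt(n + 1) and every other leaf at sqrt(8), so the root stops being the closest embedded neighbour once n exceeds 7.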
22
Why PlanetLab?
  • Skitter project makes RTT data available from a
    small number n of monitoring nodes to m target
    nodes, where m is on the order of hundreds of
    thousands
  • Yields an asymmetric n x m matrix
  • Embedded distances between target nodes cannot be
    verified
  • PlanetLab: testbed for Internet planetary-scale
    mesh topology

23
Methodology
  • RTT measurement data collected between PlanetLab
    nodes from March 22-28, 2004
  • Minimum RTT value between each pair of nodes over
    consecutive 15-min periods
  • Each day has 96 matrices of pair-wise RTTs, each
    of size 325 x 325
  • Over the 7-day period, we have 672 matrices
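
The matrix counts follow from the 15-minute period length. The sketch below checks that arithmetic and shows one plausible way to take a per-pair minimum. It assumes each period contains several sampled RTT matrices, which the slide does not spell out, and the sample values are hypothetical.

```python
PERIOD_MINUTES = 15
DAYS = 7

matrices_per_day = 24 * 60 // PERIOD_MINUTES  # one matrix per 15-minute period
total_matrices = matrices_per_day * DAYS


def period_minimum(samples):
    """Element-wise minimum over several RTT matrices from the same period."""
    size = len(samples[0])
    return [[min(m[i][j] for m in samples) for j in range(size)] for i in range(size)]


# Two hypothetical 2 x 2 samples (ms) from one period.
samples = [
    [[0, 12], [12, 0]],
    [[0, 9], [11, 0]],
]
print(matrices_per_day, total_matrices, period_minimum(samples))
```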

24
Methodology
  • A representative node is selected at each site
    to build a site-by-site matrix, which is then
    cleaned up for missing entries
  • Finally, we have a 69 x 69 RTT site-by-site
    matrix
  • We further classify sites by geographical
    location
  • North America (NA-PL): 44 x 44 RTT site matrix;
    the majority of sites obtain connectivity
    through Abilene
  • Outside North America (ONA-PL): 25 x 25 RTT site
    matrix between research and commercial sites;
    includes Australia, Europe, Latin America and
    Asia
  • ALL (ALL-PL): 69 x 69 RTT site matrix, consisting
    of NA-PL and ONA-PL

25
Results and Observations: ALL-PL
  • Apply full Lipschitz embedding
  • Minimum, mean and maximum rrl
  • Difference between max and min rrl is high
    (57.71%). Flipping a coin is better!
  • Global cnl measure is 84.06%; only about 15% of
    the sites retain their closest neighbors in their
    embedding

26
Scalability (Meta-) Metric: Can embeddings scale?
  • Suppose applications are only interested in a
    subset of nodes, e.g. North America
  • Would it be better to use an Internet Coordinate
    System from ALL-PL or from NA-PL?
  • The answer to this question will determine
    whether embedding services can scale
  • If Y ⊆ X, we could first use the full Lipschitz
    embedding to obtain F(X), then restrict this to
    nodes in Y; denote this the Superspace embedding
  • The Subspace embedding F(Y) and the Superspace
    embedding (F(X) restricted to Y) may be very
    different embeddings, with different accuracy
    for the metric space spanned by Y
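
The two ways of embedding a subset Y can be sketched as below for the full Lipschitz case. The 4-node RTT matrix is hypothetical: nodes 0 to 2 stand in for nearby sites and node 3 for a distant one.

```python
import math


def lipschitz(d, nodes):
    """Full Lipschitz embedding of `nodes`: each coordinate is a row of distances to `nodes`."""
    return {x: [d[x][y] for y in nodes] for x in nodes}


def superspace(d, all_nodes, subset):
    """Embed every node of X, then keep only the points for nodes in Y."""
    full = lipschitz(d, all_nodes)
    return {y: full[y] for y in subset}


def subspace(d, subset):
    """Embed Y on its own, ignoring nodes outside Y."""
    return lipschitz(d, subset)


# Hypothetical RTTs in ms.
d = [
    [0, 10, 12, 80],
    [10, 0, 15, 90],
    [12, 15, 0, 85],
    [80, 90, 85, 0],
]
X = [0, 1, 2, 3]
Y = [0, 1, 2]

sup = superspace(d, X, Y)
sub = subspace(d, Y)
print(math.dist(sup[0], sup[1]), math.dist(sub[0], sub[1]))
```

The Superspace points carry an extra coordinate for the distant node, so the same pair of nearby nodes ends up at a different embedded distance in each case.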

27
Superspace and Subspace Embeddings: Looking at
NA-PL
28
Superspace and Subspace Results
  • We used NA-PL as a subspace of ALL-PL: F(NA-PL)
    is the Subspace embedding of NA-PL
  • F(ALL-PL) restricted to NA-PL is the Superspace
    embedding of NA-PL
  • The Lipschitz Subspace embedding in Euclidean
    space is a much better one

29
North America (Superspace Embedding): PlanetLab
site with maximum rrl: planetlab1.flux.utah.edu
30
North America (Subspace Embedding): PlanetLab
site with maximum rrl: planetlab1.enel.ucalgary.ca
31
CDFs of rrl for Subspace and Superspace Embeddings
32
Using Other Embeddings with our PlanetLab ALL-PL
sites using our Accuracy Metrics
  • Both BBS (Euclidean) and Vivaldi embeddings in
    Euclidean space have the same cnl measure of
    75.36%
  • BBS (Hyperbolic) LRN has the lowest cnl
  • Vivaldi has a higher maximum rrl than BBS
    (Euclidean)
  • BBS (Euclidean) has the lowest maximum rrl
  • BBS (Hyperbolic) TP embedding has a much higher
    maximum rrl than BBS (Hyperbolic) LRN embedding
  • BBS (Hyperbolic) TP has the largest maximum rrl
  • Its minimum rrl is lower than that of BBS
    (Hyperbolic) LRN

33
Signature plots: BBS (Hyperbolic) TP
Lists of close neighbors are being pushed away in
embedded geometric space
34
Signature plots: Vivaldi
Lists of close neighbors are being pushed away in
embedded geometric space
35
Scalability (Meta-) Metric: Superspace and
Subspace embeddings
  • Vivaldi and BBS embeddings in Euclidean space
    have the same behavior as the Lipschitz
    embedding: Subspace embedding has better rrl
    accuracy than Superspace embedding
  • BBS embeddings in Hyperbolic space: Superspace
    embedding tends to have close or better rrl
    accuracy than Subspace embedding

36
Revisiting Previous Work with their data sets
using our Accuracy Metrics
  • BBS (Hyperbolic) TP in Hyperbolic space has
    similar inaccuracy behaviors in rrl as Lipschitz
    embedding in Euclidean space for tree-like
    network topology
  • All experiments show the list of closest nodes
    being pushed away, with sharp bi-modal errors
  • With BBS (Hyperbolic) LRN, the list of close
    neighbors is pushed away much further and the
    maximum rrl is higher

37
BBS (Euclidean) using Jan 2000 AS Hierarchical
Tree Network Topology of 150 nodes
38
BBS (Hyperbolic) TP using Jan 2000 AS
Hierarchical Tree Network Topology of 150 nodes
39
BBS (Hyperbolic) TP using BA Network Topology of
150 nodes
40
BBS (Hyperbolic) TP using Mar 2001 AS Network
Topology of 200 nodes
41
BBS (Hyperbolic) LRN using Mar 2001 AS Network
Topology of 200 nodes
42
Conclusion
  • Goal of this work is to apply our new accuracy
    metrics to study the accuracy of embeddings for
    Internet Coordinate systems
  • Results of this attempt are not encouraging
  • Worthwhile to develop a collection of accuracy
    metrics that are able to quantify different
    aspects of user-oriented quality
  • Can we characterize the impact of network
    topologies that have good embeddings with respect
    to an accuracy metric?
  • Embeddable Overlay Network (EON)
  • Routing nodes are selected to avoid violations of
    triangle inequality (for overlay forwarding)
  • Overlay topology selected to embed with high
    accuracy with respect to multiple useful accuracy
    metrics

43
Thank you.
  • Questions?

44
BACK UP SLIDES
45
Distortion
  • Intuition: imagine that contraction(F) = 1
    (achieved with scalar multiplication)
  • No contraction in embedding, only expansion
  • Largest expansion is achieved by some pair x,y
    where d(F(x),F(y)) = expansion(F)·d(x,y)
  • Relevant to applications that require a global
    picture of the entire embedding

46
Relative Error
  • Improve the notion of accuracy by redefining the
    embedding, using β·F(x) in place of F(x)
  • Choose β to minimize relative error
  • From the example, β ≈ 0.5, which reduces the
    distance between the embeddings of Nodes 1 and 7
    to about 2.3

47
Stress
  • A similar accuracy metric is Stress
  • Both Stress and Relative Error quantify the
    magnitude of differences between original and
    embedded distances

48
Distortion
  • In the theory of embeddings, the most common
    notion of accuracy
  • Invariant under scalar multiplication:
    distortion(F) = distortion(a·F), for all a ≠ 0
  • Worst-case ratio of the expansion to the
    shrinkage of relative distances

49
Distortion
  • Define
  • expansion(F) = maximum value of r(F,x,y)
  • contraction(F) = minimum value of r(F,x,y)
  • where x,y ∈ X (x ≠ y)
  • Distortion(F) is always ≥ 1
  • Note: Distortion is a global worst-case measure
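
The definition of r(F,x,y) is missing from this transcript. The sketch below assumes the usual form, embedded distance divided by original distance, and takes distortion as expansion divided by contraction. The 3-node path metric is only a small illustrative input.

```python
import math
from itertools import combinations


def ratios(d, coords):
    """r(F, x, y) for every unordered pair: embedded distance over original distance (assumed form)."""
    return [math.dist(coords[x], coords[y]) / d[x][y]
            for x, y in combinations(range(len(d)), 2)]


def distortion(d, coords):
    """expansion(F) / contraction(F): largest ratio over smallest ratio."""
    r = ratios(d, coords)
    return max(r) / min(r)


# Path 0 - 1 - 2 with hop-count distances; rows double as full Lipschitz coordinates.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
coords = d
scaled = [[2.5 * c for c in point] for point in coords]

print(distortion(d, coords), distortion(d, scaled))
```

Scaling every coordinate by the same factor multiplies expansion and contraction alike, which is why the two printed values agree.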

50
Local Distortion
  • Internet Coordinate Systems are interested only
    in Local Distortion from one node's perspective
  • For node x, we define
  • expansion(F,x) = maximum value of r(F,x,y)
  • contraction(F,x) = minimum value of r(F,x,y)
  • where y ∈ X (x ≠ y)
  • Maximum Local Distortion = MAX(Distortion(F,x))
    over all x

51
Basic Properties of Metric Space
  • We define a metric space as a pair (X,d), where
    X is a set of points with distance function
    d: X x X → R; for each a,b ∈ X, d(a,b) is the
    distance between a and b
  • We require for all a,b,c ∈ X
  • (anti-reflexivity) d(a,b) = 0 iff a = b
  • (symmetry) d(a,b) = d(b,a)
  • (triangle inequality) d(a,b) ≤ d(a,c) + d(c,b)
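
Measured RTT matrices need not satisfy these axioms, so a small checker can be useful before embedding. The RTT values below are hypothetical and deliberately violate the triangle inequality. The tolerance `tol` is an assumption for floating-point data.

```python
from itertools import product


def metric_violations(d, tol=1e-9):
    """List (axiom, nodes) tuples where the matrix d breaks a metric-space axiom."""
    n = len(d)
    problems = []
    for a, b in product(range(n), repeat=2):
        if (d[a][b] == 0) != (a == b):
            problems.append(("zero-iff-equal", (a, b)))
        if abs(d[a][b] - d[b][a]) > tol:
            problems.append(("symmetry", (a, b)))
    for a, b, c in product(range(n), repeat=3):
        if d[a][b] > d[a][c] + d[c][b] + tol:
            problems.append(("triangle", (a, b, c)))
    return problems


# Hypothetical RTTs (ms): going 0 -> 1 -> 2 costs 30, but 0 -> 2 directly costs 50.
rtt = [
    [0, 10, 50],
    [10, 0, 20],
    [50, 20, 0],
]
print(metric_violations(rtt))
```

Each reported triangle tuple (a, b, c) means the direct distance from a to b exceeds the detour through c.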

52
Basic Properties of An Embedding
  • An embedding of a finite metric space (X,d) into
    (R^k,d) is a mapping F: X → R^k
  • k is the dimensionality
  • d: R^k x R^k → R is the distance metric function
    of the embedding space
  • If we denote the norm in R^k with ||·||_p, the
    distance metric d is defined as
    d(x,y) = ||x − y||_p, with p = 2 for Euclidean
    distance
  • Ideally, the distance d(F(a),F(b)) = d(a,b)

53
PlanetLab site with mean rrl: pli-br-2.hpl.hp.com
54
PlanetLab site with minimum rrl (scaling factor
that minimizes relative error applied):
planetlab1.comet.columbia.edu
Closest neighbor is not preserved
55
PlanetLab site with maximum rrl:
planetlab1.pop-mg.rnp.br
56
Using Other Embeddings
  • Experiments apply the new accuracy metrics to
    Vivaldi and Big Bang Simulation (BBS) systems
    (in Euclidean and Hyperbolic spaces) using our
    PlanetLab ALL-PL sites
  • 3-dimensional coordinates generated
  • p2psim simulator used to generate Vivaldi results
  • BBS (Hyperbolic) Two Phases (TP) embedding is
    done using landmarks, similar to GNP: 15
    landmarks
  • BBS (Hyperbolic) Log-Random and Neighbors (LRN)
    embedding concurrently embeds nodes, using node
    pairs whose distance is below a certain threshold
  • Number of randomly sampled distance pairs is
    n·log n