
Fast Shortest Path Distance Estimation in Large Networks

- Michalis Potamias, Francesco Bonchi, Carlos Castillo, Aristides Gionis

Context-aware Search

Use shortest-path distance in the Wikipedia links graph!

Social Search

- John searches for Mary
- Ranking
- Mary A
- Mary B
- Mary C

Use shortest-path distance in the friendship graph!

Problem and Solutions

- DB: Graph G = (V, E)
- Query: Nodes s and t in V
- Goal: Quickly compute the shortest-path distance d(s,t)
- Exact Solution
- BFS, Dijkstra
- Bidirectional Dijkstra with A* (aka ALT methods) (Ikeda, 1994; Pohl, 1971; Goldberg and Harrelson, SODA 2005)
- Heuristic Solution
- Random Landmarks (Kleinberg et al., FOCS 2004; Vieira et al., CIKM 2007)
- Better Landmarks!

The Landmarks Method

- Offline
- Precompute the distance of all nodes to a small set of nodes (landmarks)
- Each node is associated with a vector of its SP-distances from each landmark (embedding)
- Query-time
- d(s,t) = ?
- Combine the embeddings of s and t to get an estimate of the query

Contribution

- Proved that covering the network with landmarks is NP-hard.
- Devised heuristics for good landmarks.
- Experiments with 5 large real-world networks and more than 30 heuristics. Comparison with state of the art.
- Application to Social Search.

Algorithmic Framework

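- Triangle Inequality: $|d(s,u) - d(u,t)| \le d(s,t) \le d(s,u) + d(u,t)$
- Observation, the case of equality: $d(s,t) = d(s,u) + d(u,t)$ when u lies on a shortest path from s to t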

The Landmarks Method

- Selection: Select k landmarks
- Offline: Run k BFS/Dijkstra and store the embeddings of each node
- $F(s) = \langle d_G(u_1, s), d_G(u_2, s), \ldots, d_G(u_k, s) \rangle = \langle s_1, s_2, \ldots, s_k \rangle$
- Query-time: $d_G(s,t) = ?$
- Fetch F(s) and F(t)
- Compute $\min_i \{s_i + t_i\}$ (i.e. the infimum of the upper bounds) in time O(k)
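The offline and query-time steps above can be sketched roughly as follows. This is a minimal illustration for an unweighted, undirected graph stored as an adjacency dict; the function names, the toy path graph, and the choice of landmarks `b` and `d` are assumptions made for the example and do not come from the slides.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def build_embeddings(adj, landmarks):
    """Offline step: one BFS per landmark; F[v][i] = d(landmarks[i], v)."""
    per_landmark = [bfs_distances(adj, u) for u in landmarks]
    return {v: [d.get(v, float("inf")) for d in per_landmark] for v in adj}


def estimate_upper(F, s, t):
    """Query step in O(k): the tightest triangle-inequality upper bound."""
    return min(si + ti for si, ti in zip(F[s], F[t]))


# Toy path graph a - b - c - d - e with hypothetical landmarks b and d.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
F = build_embeddings(adj, ["b", "d"])
estimate = estimate_upper(F, "a", "e")
```

On this toy graph both landmarks lie on the only path from `a` to `e`, so the upper bound coincides with the true distance; in general it can overestimate.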

Example query d(s,t)

| | d(_,u1) | d(_,u2) | d(_,u3) | d(_,u4) |
|---|---|---|---|---|
| s | 2 | 4 | 5 | 2 |
| t | 3 | 5 | 1 | 4 |
| UB | 5 | 9 | 6 | 6 |
| LB | 1 | 1 | 4 | 2 |
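The UB and LB rows of this example follow from the triangle inequality applied per landmark: each upper bound is the sum of the two distances and each lower bound is their absolute difference. A small sketch that recomputes them from the s and t rows (the function name `bounds` is made up for illustration):

```python
def bounds(s_vec, t_vec):
    """Return (best lower bound, best upper bound) on d(s, t)."""
    upper = [a + b for a, b in zip(s_vec, t_vec)]       # d(s,u) + d(u,t)
    lower = [abs(a - b) for a, b in zip(s_vec, t_vec)]  # |d(s,u) - d(u,t)|
    return max(lower), min(upper)


s_vec = [2, 4, 5, 2]  # distances from s to u1..u4, taken from the example
t_vec = [3, 5, 1, 4]  # distances from t to u1..u4, taken from the example
best_lower, best_upper = bounds(s_vec, t_vec)
midpoint = (best_lower + best_upper) / 2
```

For the example values this gives a best lower bound of 4 (from u3) and a best upper bound of 5 (from u1), so d(s,t) is either 4 or 5.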

Coverage Using Upper Bounds

- A landmark u covers a pair (s, t) if u lies on a shortest path from s to t
- Problem Definition: find a set of k landmarks that cover as many pairs (s,t) in V x V as possible
- NP-hard
- k = 1: node with the highest betweenness centrality
- k > 1: greedy set-cover (too expensive)
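The coverage definition can be checked directly on a small graph: u covers (s, t) exactly when d(s,u) + d(u,t) = d(s,t). The brute-force sketch below counts covered pairs per node on a toy star graph. It assumes unordered pairs and lets a landmark cover pairs in which it is an endpoint; the helper names and the graph are illustrative only, and the all-pairs BFS it relies on is far too expensive for the large networks the slides target.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def covered_pairs(dist, u):
    """Unordered pairs (s, t) for which u lies on some shortest s-t path."""
    inf = float("inf")
    return {
        (s, t)
        for s, t in combinations(sorted(dist), 2)
        if t in dist[s]
        and dist[s].get(u, inf) + dist[u].get(t, inf) == dist[s][t]
    }


# Toy star graph: centre c with leaves x, y, z.
adj = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
dist = {v: bfs_distances(adj, v) for v in adj}
coverage = {u: len(covered_pairs(dist, u)) for u in adj}
best = max(coverage, key=coverage.get)  # the k = 1 choice on this toy graph
```

Here the centre covers all six pairs, while each leaf covers only the three pairs it belongs to.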

Basic Heuristics

- Random (baseline)
- Choose central nodes!
- Degree
- Closeness centrality
- Closeness of u is the average distance of u to any vertex in G
- Caveat: The selected landmarks may cover the same pairs; we need to make sure that landmarks cover different pairs!!

Constrained Heuristics

- Spread the landmarks in the graph!
- Rank all nodes according to Degree or Centrality
- Iteratively choose the highest ranking nodes. Remove the h-neighbors of each selected node from the candidate set
- Denote as
- Degree/h
- Closeness/h
- Best results for h = 1
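A rough sketch of the Degree/h variant described above: rank nodes by degree, pick greedily, and drop every node within h hops of a pick from the candidate set. The function name, the alphabetical tie-breaking, and the toy graph are assumptions for illustration; the slides do not specify these details.

```python
def constrained_degree_landmarks(adj, k, h=1):
    """Pick up to k landmarks by degree, skipping h-hop neighbours of picks."""
    # Highest degree first; ties broken by node label to stay deterministic.
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    excluded = set()
    landmarks = []
    for v in ranked:
        if len(landmarks) == k:
            break
        if v in excluded:
            continue
        landmarks.append(v)
        # Collect everything within h hops of v and remove it from candidates.
        seen = {v}
        frontier = {v}
        for _ in range(h):
            frontier = {w for x in frontier for w in adj[x]} - seen
            seen |= frontier
        excluded |= seen
    return landmarks


# Toy graph: hubs a and b are adjacent; hub x sits further away.
adj = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "f", "g"],
    "c": ["a"],
    "d": ["a"],
    "e": ["a"],
    "f": ["b", "x"],
    "g": ["b"],
    "x": ["f", "y", "z"],
    "y": ["x"],
    "z": ["x"],
}
spread = constrained_degree_landmarks(adj, k=2, h=1)
plain = constrained_degree_landmarks(adj, k=2, h=0)
```

With h = 0 nothing beyond the pick itself is excluded, so the result matches the plain Degree heuristic and selects the two adjacent hubs; with h = 1 the second pick moves to the distant hub.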

Partitioning-based Heuristics

- Use partitioning to spread nodes!
- Utilize any partitioning scheme and
- Degree/P
- Pick the node with the highest degree in each partition
- Closeness/P
- Pick the node with the highest closeness in each partition
- Border/P
- Pick the nodes close to the border in each partition. Maximize the border-value that is given by the following formula

Border/P

- d1(u) = 3
- d2(u) = 3
- d3(u) = 2
- b(u) = d1(u) (d2(u) + d3(u)) = 3 (3 + 2) = 15
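The arithmetic in this example can be reproduced with a short sketch. It assumes, as the example suggests but the slides do not state outright, that di(u) counts the neighbours of u that fall in partition i, and that u itself belongs to partition 1. The function name and the neighbour labels are invented for illustration.

```python
from collections import Counter


def border_value(adj, part, u):
    """b(u) = (neighbours in u's own partition) * (neighbours in all others)."""
    counts = Counter(part[w] for w in adj[u])
    own = counts.pop(part[u], 0)  # neighbours inside u's own partition
    return own * sum(counts.values())


# u has 3 neighbours in partition 1, 3 in partition 2, and 2 in partition 3.
adj = {"u": ["a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2"]}
part = {
    "u": 1,
    "a1": 1, "a2": 1, "a3": 1,
    "b1": 2, "b2": 2, "b3": 2,
    "c1": 3, "c2": 3,
}
b_u = border_value(adj, part, "u")
```

Under these assumptions a node scores highly only when it has many links both inside its partition and across the border, which matches the intent of picking nodes close to the border.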

Versus Random - error

Versus Random - triangulation

Versus ALT - efficiency

- Ours (10% error), operations: 20, 100, 500, 50, 50
- ALT, operations: 60K, 40K, 80K, 20K, 2K
- ALT, visited nodes: 7K, 10K, 20K, 2K, 2K

Social Search Task

Conclusion

- Heuristic landmarks yield remarkable tradeoffs for SP-distance estimation in huge graphs
- Hard to find the optimal landmarks
- Border/P and Centrality heuristics outperform Random even by a factor of 250.
- For a 10% error, a thousand times faster than state-of-the-art exact algorithms (ALT)
- Novel search paradigms need distance as a primitive
- Approximations should be computed in milliseconds
- Future Work
- Provide fast estimation for more graph primitives!

Thank you!

- ?

Datasets

- Five real world datasets

Social Search Task

- A node is searching for nodes that satisfy some tag. The target is to rank the relevant nodes according to SP-distance from the query-issuer node.
- Example: (John, Mary)
- Tags
- Flickr: Users are tagged with tags they have used on photos
- YM-IM: Users are tagged with movies they have rated
- Wikipedia: Pages are tagged with terms they contain
- DBLP: We randomly tagged authors
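The ranking task described above could be sketched as follows, using the landmark upper bound as the distance estimate. The embeddings here are invented numbers for two hypothetical landmarks, and the function name is illustrative; the slides do not specify which estimator the social search experiments used.

```python
def rank_by_estimated_distance(F, query, candidates):
    """Sort candidates by the landmark upper-bound estimate of d(query, c)."""
    def upper(c):
        return min(a + b for a, b in zip(F[query], F[c]))
    # Ties are broken by candidate name to keep the order deterministic.
    return sorted(candidates, key=lambda c: (upper(c), c))


# Hypothetical embeddings: distances to two landmarks per user.
F = {
    "john": [1, 3],
    "mary_a": [2, 1],
    "mary_b": [0, 4],
    "mary_c": [4, 2],
}
ranking = rank_by_estimated_distance(F, "john", ["mary_a", "mary_b", "mary_c"])
```

Each query costs O(k) per candidate, so ranking a handful of tag matches needs no graph traversal at query time.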

Why is it important?

- Shortest-path distance can serve as a primitive of ranking functions in search tasks.
- But the graphs are massive and the queries need real-time responses
- Precomputing and storing the SP-distance for $n^2$ pairs costs $O(n^2)$ space
- Infeasible: a graph with 5M nodes corresponds to 12.5 trillion pairs
- Use landmarks!
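The 12.5 trillion figure corresponds to counting unordered pairs, n(n-1)/2, for n = 5 million. A quick check, alongside the storage a landmark embedding would need instead (the value k = 100 is an arbitrary assumption for comparison, not a number from the slides):

```python
n = 5_000_000  # nodes, as in the example above
k = 100        # hypothetical number of landmarks

all_pairs = n * (n - 1) // 2   # unordered node pairs to precompute exactly
landmark_entries = n * k       # one stored distance per (node, landmark)

ratio = all_pairs / landmark_entries
```

Under these assumptions the embedding stores roughly 25,000 times fewer distances than the full pairwise table.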

Efficiency

- Selection step
- Closeness is computed via BFS from random nodes.
- Offline step
- Embedding construction costs d BFS traversals: O(md)
- Online step is just O(d)!!!
- In contrast with BFS times

Related Work

- Exact solution: BFS/Dijkstra
- Goldberg (SOFSEM07)
- Dijkstra, A*, landmarks for LB pruning
- Road networks
- Kleinberg et al. (FOCS04)
- They provide theoretical bounds for approximating SP-distance with random landmarks
- Landmarks in networks literature
- Ng and Zhang (INFOCOM01) use landmarks for internet measurements
- Tang and Crovella (IMC03) define virtual landmarks for internet measurements
- Vieira et al. (CIKM07)
- They use landmark indexing for the task of social search in a social network
- They select the landmarks randomly
- Amer-Yahia et al. (VLDB08)
- Neighborhood is used in results ranking in tagging websites

Landmarks Selection Cost

Landmarks

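- Assume we are given the landmarks and the embedding
- $u \mapsto F(u) = \langle d(l_1, u), d(l_2, u), \ldots, d(l_d, u) \rangle = \langle u_1, u_2, \ldots, u_d \rangle$
- Bounding d(s,t): $\max_i |s_i - t_i| \le d(s,t) \le \min_i \{s_i + t_i\}$
- Estimates
- Best Upper Bound
- Best Lower Bound
- Middle point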

Landmarks-1

- Define centrality: $C(u) = \sum_{s,t} I_{st}(u)$
- where $I_{st}(u)$ is 1 if u lies on at least one SP from s to t, and 0 otherwise
- Optimal landmark for Landmarks-1 is the vertex that maximizes C(u)
- ...similar to Betweenness Centrality

Landmarks-Cover is NP-hard

- Proof
- Consider the decision version of Vertex-Cover
- VC: Given G and an integer k, decide if there exists a cover of at most k vertices for all edges of G
- Consider set D, a solution to Landmarks-Cover
- All 1-hop neighbors are covered, thus all edges are covered; therefore D is also a Vertex-Cover for G.
- Consider set C, a solution to Vertex-Cover
- Consider any pair of vertices (s,t), and any shortest path between them. Some vertices of C lie on this path, thus C is also a Landmarks-Cover ∎
- Landmarks-d is also hard
- Approximation Algorithm from Set-Cover runs in $O(n^3)$