Fast Shortest Path Distance Estimation in Large Networks

Michalis Potamias, Francesco Bonchi, Carlos Castillo, Aristides Gionis. Presented at CIKM 2009.



1
Fast Shortest Path Distance Estimation in Large Networks
  • Michalis Potamias, Francesco Bonchi,
    Carlos Castillo, Aristides Gionis

2
Context-aware Search
Use shortest-path distance in the Wikipedia link graph!
3
Social Search
  • John searches for Mary
  • Ranking
    • Mary A
    • Mary B
    • Mary C

Use shortest-path distance in the friendship graph!
4
Problem and Solutions
  • DB: Graph G = (V, E)
  • Query: Nodes s and t in V
  • Goal: Quickly compute the shortest-path distance d(s,t)
  • Exact Solution
    • BFS / Dijkstra
    • Bidirectional Dijkstra with A* (aka ALT
      methods)
    • Ikeda, 1994; Pohl, 1971; Goldberg and
      Harrelson, SODA 2005
  • Heuristic Solution
    • Random Landmarks
    • Kleinberg et al., FOCS 2004; Vieira et al.,
      CIKM 2007
    • Better Landmarks!

5
The Landmarks Method
  • Offline
    • Precompute the distance of all nodes to a small
      set of nodes (landmarks)
    • Each node is associated with a vector of its
      SP-distances from each landmark (embedding)
  • Query-time
    • d(s,t) = ?
    • Combine the embeddings of s and t to get an
      estimate of d(s,t)

6
Contribution
  1. Proved that covering the network with landmarks is
    NP-hard.
  2. Devised heuristics for good landmarks.
  3. Experiments with 5 large real-world networks and
    more than 30 heuristics. Comparison with state of
    the art.
  4. Application to Social Search.

7
Algorithmic Framework
  • Triangle Inequality
  • Observation: the case of equality
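
The formulas for this slide did not survive in the transcript. As a reminder, and assuming an undirected graph with SP-distance d and a single landmark u, the triangle inequality gives the following bounds (standard facts, not copied from the slide):

```latex
\[
  \lvert d(s,u) - d(u,t) \rvert \;\le\; d(s,t) \;\le\; d(s,u) + d(u,t)
\]
% The upper bound holds with equality exactly when u lies on
% some shortest path from s to t.
```

The equality case is what motivates choosing landmarks that lie on many shortest paths.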

8
The Landmarks Method
  • Selection: Select k landmarks
  • Offline: Run k BFS/Dijkstra and store the
    embeddings of each node
    • $F(s) = \langle d_G(u_1, s), d_G(u_2, s), \ldots, d_G(u_k, s) \rangle = \langle s_1, s_2, \ldots, s_k \rangle$
  • Query-time: $d_G(s,t) = ?$
    • Fetch F(s) and F(t)
    • Compute $\min_i (s_i + t_i)$ (i.e. inf of UB) ... in
      time O(k)
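
A minimal Python sketch of the offline and query-time steps, assuming an unweighted, undirected, connected graph stored as an adjacency dict. The function names and the toy graph are illustrative assumptions, not taken from the slides.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def build_embeddings(adj, landmarks):
    """Offline step: one BFS per landmark, F(v) = [d(u1, v), ..., d(uk, v)]."""
    per_landmark = [bfs_distances(adj, u) for u in landmarks]
    return {v: [d[v] for d in per_landmark] for v in adj}


def estimate_upper(embeddings, s, t):
    """Query step: min_i (s_i + t_i), computed in O(k)."""
    return min(si + ti for si, ti in zip(embeddings[s], embeddings[t]))


# Hypothetical toy graph: the path a-b-c-d-e plus an extra edge b-f.
adj = {
    "a": ["b"],
    "b": ["a", "c", "f"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
embeddings = build_embeddings(adj, landmarks=["b", "d"])
estimate = estimate_upper(embeddings, "a", "e")
```

Here both landmarks lie on the shortest path from a to e, so the estimate equals the true distance; in general the result is only an upper bound.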

9
Example query d(s,t)
      d(_,u1)  d(_,u2)  d(_,u3)  d(_,u4)
s        2        4        5        2
t        3        5        1        4

UB       5        9        6        6
LB       1        1        4        2
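
The UB and LB rows follow from the two embeddings by a per-landmark sum and absolute difference. A short Python check of this example (variable names are illustrative):

```python
# Embeddings of s and t with respect to landmarks u1..u4, as in the example.
F_s = [2, 4, 5, 2]
F_t = [3, 5, 1, 4]

# Per-landmark bounds from the triangle inequality.
upper = [si + ti for si, ti in zip(F_s, F_t)]
lower = [abs(si - ti) for si, ti in zip(F_s, F_t)]

best_upper = min(upper)  # tightest upper bound
best_lower = max(lower)  # tightest lower bound
```

For this query the bounds give 4 <= d(s,t) <= 5.
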
10
Coverage Using Upper Bounds
  • A landmark u covers a pair (s, t) if u lies on a
    shortest path from s to t
  • Problem Definition: find a set of k landmarks
    that cover as many pairs (s,t) in V x V as possible
  • NP-hard
  • k = 1: node with the highest betweenness
    centrality
  • k > 1: greedy set-cover (too expensive)
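
A small Python sketch of the greedy coverage idea, to show why it is expensive: it needs all-pairs distances before it can count covered pairs. It assumes an unweighted, undirected, connected graph and k no larger than the number of nodes; the function names and toy graph are illustrative, not from the slides.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def greedy_landmarks(adj, k):
    """Pick k landmarks, each time taking the node covering most new pairs."""
    nodes = sorted(adj)
    # All-pairs BFS: this is the step that does not scale to large graphs.
    dist = {v: bfs_distances(adj, v) for v in nodes}
    # u covers (s, t) when d(s,u) + d(u,t) == d(s,t); endpoints count too.
    cover = {
        u: {(s, t) for s, t in combinations(nodes, 2)
            if dist[s][u] + dist[u][t] == dist[s][t]}
        for u in nodes
    }
    chosen, covered = [], set()
    for _ in range(k):
        best = max((u for u in nodes if u not in chosen),
                   key=lambda u: len(cover[u] - covered))
        chosen.append(best)
        covered |= cover[best]
    return chosen, covered


# Hypothetical toy graph: the path a-b-c-d-e plus an extra edge b-f.
adj = {
    "a": ["b"],
    "b": ["a", "c", "f"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
chosen, covered = greedy_landmarks(adj, k=2)
```

On this toy graph two landmarks already cover all 15 unordered pairs.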

11
Basic Heuristics
  • Random (baseline)
  • Choose central nodes!
    • Degree
    • Closeness centrality
      • Closeness of u is the average distance of u to
        any vertex in G
  • Caveat: the selected landmarks may cover the same
    pairs; we need to make sure that landmarks cover
    different pairs!
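
A minimal Python sketch of the two centrality heuristics, assuming an unweighted, undirected, connected graph. Closeness is computed exactly here for clarity, and smaller average distance is treated as more central; the names and the toy graph are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def top_by_degree(adj, k):
    """k nodes with the highest degree (ties broken by node name)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]


def closeness(adj, u):
    """Average distance from u to every other node."""
    return sum(bfs_distances(adj, u).values()) / (len(adj) - 1)


def top_by_closeness(adj, k):
    """k nodes with the smallest average distance (ties broken by name)."""
    return sorted(adj, key=lambda v: (closeness(adj, v), v))[:k]


# Hypothetical toy graph: the path a-b-c-d-e plus an extra edge b-f.
adj = {
    "a": ["b"],
    "b": ["a", "c", "f"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
by_degree = top_by_degree(adj, 2)
by_closeness = top_by_closeness(adj, 2)
```

Both heuristics select the adjacent nodes b and c on this graph, which illustrates the caveat: the two landmarks cover largely the same pairs.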

12
Constrained Heuristics
  • Spread the landmarks in the graph!
  • Rank all nodes according to Degree or Closeness
    centrality
  • Iteratively choose the highest-ranking node and
    remove the h-neighbors of each selected node from
    the candidate set
  • Denoted as
    • Degree/h
    • Closeness/h
  • Best results for h = 1
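
A possible Python sketch of the Degree/h heuristic, assuming an unweighted, undirected graph and reading "h-neighbors" as all nodes within h hops of the selected node. The function names and toy graph are illustrative, and the sketch may return fewer than k landmarks if the candidate set runs out.

```python
from collections import deque


def within_h(adj, source, h):
    """Nodes at distance <= h from source, including source itself."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def constrained_by_degree(adj, k, h=1):
    """Degree/h: take nodes by decreasing degree, skipping excluded ones."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    candidates = set(adj)
    chosen = []
    for v in ranked:
        if len(chosen) == k:
            break
        if v in candidates:
            chosen.append(v)
            candidates -= within_h(adj, v, h)
    return chosen


# Hypothetical toy graph: the path a-b-c-d-e plus an extra edge b-f.
adj = {
    "a": ["b"],
    "b": ["a", "c", "f"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
landmarks = constrained_by_degree(adj, k=2, h=1)
```

With h = 1 the second landmark is d rather than c, because c is a neighbor of the first pick b and is removed from the candidate set.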

13
Partitioning-based Heuristics
  • Use partitioning to spread nodes!
  • Utilize any partitioning scheme, then apply one of:
    • Degree/P
      • Pick the node with the highest degree in each
        partition
    • Closeness/P
      • Pick the node with the highest closeness in
        each partition
    • Border/P
      • Pick the nodes close to the border in each
        partition. Maximize the border value b(u),
        given by the formula on the next slide

14
Border/P
  • $d_1(u) = 3$
  • $d_2(u) = 3$
  • $d_3(u) = 2$
  • $b(u) = d_1(u)\,(d_2(u) + d_3(u)) = 3\,(3 + 2) = 15$
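
A hedged Python sketch of the border value, assuming that d_i(u) counts the neighbors of u in partition i and that u itself belongs to partition 1 in this example (the slide's figure is not preserved, so both points are assumptions). The function and node names are illustrative.

```python
from collections import Counter


def border_value(adj, partition, u):
    """b(u) = d_p(u) * sum of d_i(u) over i != p, with p the partition of u."""
    counts = Counter(partition[v] for v in adj[u])
    own = partition[u]
    return counts[own] * sum(c for p, c in counts.items() if p != own)


# Node u with 3 neighbors in partition 1, 3 in partition 2, 2 in partition 3.
adj = {"u": ["a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2"]}
partition = {
    "u": 1,
    "a1": 1, "a2": 1, "a3": 1,
    "b1": 2, "b2": 2, "b3": 2,
    "c1": 3, "c2": 3,
}
b_u = border_value(adj, partition, "u")
```

A node scores high only when it has many neighbors both inside its own partition and across the border.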

15
Versus Random - error
16
Versus Random - triangulation
17
Versus ALT - efficiency

Ours (10%), operations     20    100    500     50     50
ALT, operations           60K    40K    80K    20K     2K
ALT, visited nodes         7K    10K    20K     2K     2K
18
Social Search Task
19
Conclusion
  • Heuristic landmarks yield remarkable tradeoffs
    for SP-distance estimation in huge graphs
    • Hard to find the optimal landmarks
    • Border/P and Centrality heuristics outperform
      Random, even by a factor of 250
    • For a 10% error, a thousand times faster than
      state-of-the-art exact algorithms (ALT)
  • Novel search paradigms need distance as a primitive
    • Approximations should be computed in milliseconds
  • Future Work
    • Provide fast estimation for more graph primitives!

20
Thank you!
  • Questions?

21
Datasets
  • Five real-world datasets

22
Social Search Task
  • A node searches for nodes that match some tag.
    The goal is to rank the relevant nodes by
    SP-distance from the query-issuer node.
  • Example: (John, Mary)
  • Tags
    • Flickr: Users are tagged with tags they have
      used on photos
    • YM-IM: Users are tagged with movies they have
      rated
    • Wikipedia: Pages are tagged with terms they
      contain
    • DBLP: We randomly tagged authors

23
Why is it important?
  • Shortest-path distance can serve as a primitive of
    ranking functions in search tasks.
  • But the graphs are massive and the queries need
    real-time responses
  • Precomputing and storing SP-distance for $n^2$ pairs
    costs $O(n^2)$ space
  • Infeasible: a graph with 5M nodes corresponds to
    12.5 trillion pairs
  • Use landmarks!
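
The 12.5 trillion figure corresponds to counting unordered pairs, roughly n^2 / 2. A quick Python check, with a landmark count of 100 chosen purely as a hypothetical value for comparison:

```python
n = 5_000_000

# Unordered pairs of distinct nodes: n * (n - 1) / 2, about 12.5 trillion.
unordered_pairs = n * (n - 1) // 2

# Landmark embeddings store one distance per (node, landmark) pair instead.
k = 100  # hypothetical number of landmarks, not a value from the slides
embedding_entries = n * k
```

Under that assumption the embeddings need 500 million entries, several orders of magnitude fewer than the full pair table.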

24
Efficiency
  • Selection step
    • Closeness is estimated via BFS from random nodes.
  • Offline step
    • Embedding construction costs d BFS traversals:
      O(md)
  • Online step is just O(d)!
    • Contrast with BFS running times
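
One way the sampled closeness estimate could look in Python, assuming an unweighted, undirected, connected graph: run BFS from a few random source nodes and average the resulting distances per node. The sampling scheme, function names, seed, and toy graph are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def sampled_closeness(adj, num_samples, seed=0):
    """Average distance of each node to num_samples random source nodes."""
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), num_samples)
    totals = {v: 0 for v in adj}
    for src in sources:
        # In an undirected graph d(src, v) == d(v, src).
        for v, d in bfs_distances(adj, src).items():
            totals[v] += d
    return {v: totals[v] / num_samples for v in adj}


# Hypothetical toy graph: the path a-b-c-d-e plus an extra edge b-f.
adj = {
    "a": ["b"],
    "b": ["a", "c", "f"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
estimates = sampled_closeness(adj, num_samples=len(adj))
most_central = min(estimates, key=lambda v: (estimates[v], v))
```

Each sample costs one BFS, so the selection step stays at a small multiple of O(m) when the number of samples is small.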

25
Related Work
  • Exact solution: BFS/Dijkstra
  • Goldberg, SOFSEM 2007
    • Dijkstra, A*, landmarks for LB pruning
    • Road networks
  • Kleinberg et al., FOCS 2004
    • They provide theoretical bounds for
      approximating SP-distance with random landmarks
  • Landmarks in the networking literature
    • Ng and Zhang, INFOCOM 2001, use landmarks for
      internet measurements
    • Tang and Crovella, IMC 2003, define virtual
      landmarks for internet measurements
  • Vieira et al., CIKM 2007
    • They use landmark indexing for the task of
      social search in a social network
    • They select the landmarks randomly
  • Amer-Yahia et al., VLDB 2008
    • Neighborhood is used for result ranking in
      tagging websites

26
Landmarks Selection Cost
27
Landmarks
  • Assume we are given the landmarks and the
    embedding
    • $u \mapsto F(u) = \langle d(l_1, u), d(l_2, u), \ldots, d(l_d, u) \rangle = \langle u_1, u_2, \ldots, u_d \rangle$
  • Bounding d(s,t)
  • Estimates
    • Best Upper Bound
    • Best Lower Bound
    • Middle point
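
The formulas for the three estimates are not preserved in the transcript. Under the embedding notation above, with $s_i = d(l_i, s)$ and $t_i = d(l_i, t)$, they can plausibly be written as follows; the labels UB, LB, and Mid are assumed names, and the middle point is read here as the arithmetic mean of the two bounds.

```latex
\begin{align*}
  \mathrm{UB}(s,t)  &= \min_{1 \le i \le d} \,(s_i + t_i) \\
  \mathrm{LB}(s,t)  &= \max_{1 \le i \le d} \,\lvert s_i - t_i \rvert \\
  \mathrm{Mid}(s,t) &= \frac{\mathrm{UB}(s,t) + \mathrm{LB}(s,t)}{2}
\end{align*}
```

For the example query on slide 9 this reading gives UB = 5, LB = 4, and a middle point of 4.5.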

28
Landmarks-1
  • Define centrality C(u) as a sum, over pairs (s, t),
    of an indicator that is 1 if u lies on at least
    one SP from s to t
  • The optimal landmark for Landmarks-1 is the vertex
    that maximizes C(u)
  • Similar to Betweenness Centrality

29
Landmarks-Cover is NP-hard
  • Proof
    • Consider the decision version of Vertex-Cover
      • VC: Given G and integer k, decide if there
        exists a cover of at most k vertices for all
        edges of G
    • Consider a set D, a solution to Landmarks-Cover
      • All pairs of 1-hop neighbors are covered, thus
        all edges are covered; therefore D is also a
        Vertex-Cover for G.
    • Consider a set C, a solution to Vertex-Cover
      • Consider any pair of vertices (s,t) and any
        shortest path between them. Some vertices of C
        lie on this path, thus C is also a
        Landmarks-Cover.
  • Landmarks-d is also hard
  • Approximation algorithm from Set-Cover runs in
    $O(n^3)$