Energy Efficient Exact kNN Search in Wireless Broadcast Environments

1
Energy Efficient Exact kNN Search in Wireless
Broadcast Environments
  • Bugra Gedik, Aameek Singh, Ling Liu
  • College of Computing
  • Georgia Institute of Technology

2
Motivation
  • There is strong interest in location based
    services (LBSs)
  • Popularity of mobile wireless communications
  • Emergence of positioning technologies like GPS
  • LBSs provide mobile users with information that
    is specialized to their location, e.g.,
  • the nearest gas station or the five closest
    restaurants
  • LBSs are accessed through a common wireless
    channel, which connects the users to the service
    provider
  • Limited network bandwidth -> Wireless
    broadcast
  • Power constrained mobile devices -> Energy
    efficient search
  • An interesting problem in this domain is,
  • Investigation of indexing and searching
    mechanisms for energy efficient querying of
    location dependent data in wireless broadcast
    environments

3
Problem Definition
  • kNN Search on the Air problem
  • Broadcasting location dependent data together
    with a spatial index on the wireless medium and
    searching this broadcast on mobile client devices
    in order to answer k nearest neighbor (kNN)
    queries in an energy efficient way
  • Traditional mechanisms do not work well
  • Access to the medium is sequential
  • Methods based on approximate results [30]
  • How about exact search?

4
Some Example Applications
  • Commercial Setting
  • The server can broadcast locations of gas
    stations
  • A car driver can pose a query like "Give me the
    positions of the 3 nearest gas stations"
  • Military Setting
  • The server can build a spatial index on positions
    of military units and broadcast it to the
    battlefield (possibly encrypted)
  • Then the mobile devices can tune in and process
    the broadcast to answer spatial queries
  • One such query can be posed by a tank as "Give
    me the positions and names of the 10 nearest
    friendly units"
  • kNN search on low dimensional spaces for
    sequential access mediums

5
Background: Air Indexing
  • Air indexes
  • In wireless broadcast, it is crucial that energy
    is conserved on the mobile unit side when
    answering queries on the broadcasted data.
  • To alleviate the vast energy consumption problem
    of searching un-indexed data, air indexes were
    introduced.
  • Access Time: latency measure
  • the time difference between the point at which a
    query is posed and the point at which the result of
    the query is fully computed
  • Tune-in Time: energy consumption measure
  • the total time during which the mobile unit was
    listening to data from the wireless medium (# of
    packets read)
  • Air indexing strives to decrease tune-in time
    while keeping the increase in access latency due
    to the broadcast of extra index information
    minimal.
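
The two measures above can be illustrated with a small sketch. It assumes time is counted in broadcast packet slots; the function name `access_and_tune_in` and the slot numbers are hypothetical and only serve the illustration.

```python
def access_and_tune_in(query_slot, slots_read):
    """Return (access_time, tune_in_time), both measured in packet slots.

    query_slot: slot at which the query is posed.
    slots_read: slots during which the client was actively listening.
    """
    if not slots_read:
        return 0, 0
    access_time = max(slots_read) - query_slot + 1  # slots elapsed until the last read ends
    tune_in_time = len(set(slots_read))             # number of packets actually received
    return access_time, tune_in_time


# Query posed at slot 3; the client reads packets at slots 5, 9 and 20
# and dozes in between.
print(access_and_tune_in(3, [5, 9, 20]))  # (18, 3)
```

The client waits 18 slots for its answer but spends energy on only 3 of them, which is the gap an air index tries to widen.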

6
Background: R-trees
  • R-trees are spatial index structures widely used
    to index n-dimensional points or rectangles
  • Practical for secondary storage, nodes correspond
    to disk blocks (packets in wireless broadcast)
  • R-trees can be thought of as the multidimensional
    version of B-trees

7
Background: R-trees (kNN search)
  • A branch and bound algorithm
  • Uses a heuristic to select branches to follow
  • The need for backtracking makes the algorithm ill
    suited for the wireless medium, because of its
    sequential access nature
  • Metrics used in the algorithm
  • Given a point P and the mbr of a node N,
  • MINDIST(P,N) is the minimum distance from P to
    N's mbr
  • MAXDIST(P,N) is the maximum distance from P to
    N's mbr
  • MINMAXDIST(P,N) is the maximum possible minimum
    distance between P and the mbr of the closest object
    residing in N's mbr
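
A minimal sketch of the three metrics for a point and an axis-aligned rectangle follows. The function signatures (a point `p` plus the rectangle's lower and upper corners `lo`, `hi`) are assumptions made for this example. The MINMAXDIST computation follows the usual construction: in one dimension take the nearer face, in all other dimensions the farther one, and minimize over the choice of dimension.

```python
import math


def mindist(p, lo, hi):
    """Smallest distance from point p to the rectangle [lo, hi] (0 if p is inside)."""
    return math.sqrt(sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))


def maxdist(p, lo, hi):
    """Distance from p to the farthest corner of the rectangle."""
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)))


def minmaxdist(p, lo, hi):
    """Upper bound on the distance from p to the nearest object in a tight mbr."""
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    dims = range(len(p))
    return math.sqrt(min(sum(near[i] if i == d else far[i] for i in dims) for d in dims))


p, lo, hi = (0, 0), (1, 1), (3, 2)
print(mindist(p, lo, hi), minmaxdist(p, lo, hi), maxdist(p, lo, hi))
```

For any point and rectangle the three values are ordered MINDIST <= MINMAXDIST <= MAXDIST; here they are the square roots of 2, 5 and 13.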

8
Background: R-trees (kNN search)
  • ItemQueue is a priority queue that stores nodes
    to explore. It is sorted on the MINDIST measure
    and does not have a predefined size.
  • ResultQueue is a list of at most k entries that
    stores the best k nodes seen so far. It is sorted
    on the MINMAXDIST measure.
  • kthdist: the MINMAXDIST value of the kth item in
    ResultQueue, infinity if fewer than k items are
    present. It is an upper bound on the distance of
    the kth nearest neighbor.

Example (k = 2):
ItemQueue: (R0,0); ResultQueue: (empty); kthdist = ∞
ItemQueue: (R1,0) (R2,a); ResultQueue: (R2,b) (R1,c); kthdist = c
9
Background: R-trees (kNN search)
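Example (k = 2):
ItemQueue: (R0,0); ResultQueue: (empty); kthdist = ∞
ItemQueue: (R1,0) (R2,a); ResultQueue: (R2,b) (R1,c); kthdist = c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a); ResultQueue: (R3,g) (R2,b); kthdist = b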
10
Background: R-trees (kNN search)
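Example (k = 2):
ItemQueue: (R0,0); ResultQueue: (empty); kthdist = ∞
ItemQueue: (R1,0) (R2,a); ResultQueue: (R2,b) (R1,c); kthdist = c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a); ResultQueue: (R3,g) (R2,b); kthdist = b
ItemQueue: (R3,f) (R5,h) (R2,a); ResultQueue: (p13,d13) (R3,g); kthdist = g
p11, p12 are pruned since d11, d12 are both > b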
11
Background: R-trees (kNN search)
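Example (k = 2):
ItemQueue: (R0,0); ResultQueue: (empty); kthdist = ∞
ItemQueue: (R1,0) (R2,a); ResultQueue: (R2,b) (R1,c); kthdist = c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a); ResultQueue: (R3,g) (R2,b); kthdist = b
ItemQueue: (R3,f) (R5,h) (R2,a); ResultQueue: (p13,d13) (R3,g); kthdist = g
ItemQueue: (R5,h) (R2,a); ResultQueue: (p13,d13) (p1,d1); kthdist = d1
Result: p13, p1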
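
The walkthrough on these slides can be approximated by the sketch below. It is not the authors' code: the `Node` class, the `knn` function and the sample tree are hypothetical, the tree is 2-D, and each mbr is computed tightly from its contents so that MINMAXDIST is a valid bound. When a node is expanded, its own entry in ResultQueue is replaced by entries for its children or points, as in the example above.

```python
import heapq
import math
from itertools import count


class Node:
    """Tiny R-tree-like node (2-D); the mbr is computed tightly from its contents."""
    def __init__(self, children=(), points=()):
        self.children, self.points = list(children), list(points)
        corners = [c for n in self.children for c in (n.lo, n.hi)] + self.points
        self.lo = tuple(min(c[d] for c in corners) for d in range(2))
        self.hi = tuple(max(c[d] for c in corners) for d in range(2))


def mindist(p, n):
    return math.sqrt(sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(p, n.lo, n.hi)))


def minmaxdist(p, n):
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, n.lo, n.hi)]
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, n.lo, n.hi)]
    dims = range(len(p))
    return math.sqrt(min(sum(near[i] if i == d else far[i] for i in dims) for d in dims))


def knn(root, p, k):
    tie = count()                                # tie-breaker, so Nodes are never compared
    item_queue = [(0.0, next(tie), root)]        # heap sorted on MINDIST
    result_queue = []                            # (bound, tie, node or point), at most k kept
    kthdist = math.inf
    while item_queue:
        d, _, node = heapq.heappop(item_queue)
        if d > kthdist:
            break                                # all remaining nodes are at least this far
        # The expanded node's own bound is replaced by entries for its contents.
        result_queue = [e for e in result_queue if e[2] is not node]
        result_queue += [(math.dist(p, pt), next(tie), pt) for pt in node.points]
        result_queue += [(minmaxdist(p, c), next(tie), c) for c in node.children]
        result_queue = sorted(result_queue)[:k]
        kthdist = result_queue[-1][0] if len(result_queue) == k else math.inf
        for child in node.children:
            md = mindist(p, child)
            if md <= kthdist:                    # otherwise the child is pruned
                heapq.heappush(item_queue, (md, next(tie), child))
    return [e[2] for e in result_queue]


root = Node(children=[Node(points=[(1, 1), (2, 3)]),
                      Node(points=[(8, 8), (9, 6)]),
                      Node(points=[(4, 5), (6, 4)])])
print(knn(root, (3, 3), 2))  # [(2, 3), (4, 5)]
```

In this run the leaf holding (8, 8) and (9, 6) is never visited, because its MINDIST already exceeds kthdist after the root is processed.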
12
Adapting the Algorithm for Wireless Medium
(w-conv alg)
  • It is not possible to sort ItemQueue on the
    MINDIST measure
  • MINDIST ordering of tree nodes is not consistent
    with their order of appearance in the broadcast
  • Reading a node from the medium based on the
    topmost item in the MINDIST sorted ItemQueue may
    result in leaving behind other tree nodes that
    have entries in ItemQueue
  • As a result, the items in ItemQueue have to be
    sorted based on their appearance order on the
    medium
  • We cannot immediately stop the search when the
    ResultQueue consists of objects only
  • It is not guaranteed that the rest of the items
    in ItemQueue cannot generate an object closer
    than the current kth candidate
  • Because the queue is no longer sorted on the
    MINDIST measure
  • As a result, the search halts when the ItemQueue
    becomes empty
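
A rough sketch of this adaptation is given below. The broadcast layout is an assumption made for illustration: `packets` is a list in order of appearance, where each packet is either an index node listing `(child position, lo, hi)` entries or a leaf listing points. ItemQueue is keyed on broadcast position rather than MINDIST, and the loop only ends when the queue is empty.

```python
import heapq
import math


def rect_dists(p, lo, hi):
    """Return (MINDIST, MINMAXDIST) from point p to the rectangle [lo, hi]."""
    gap = [max(l - x, 0, x - h) ** 2 for x, l, h in zip(p, lo, hi)]
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    dims = range(len(p))
    mmd = min(sum(near[i] if i == d else far[i] for i in dims) for d in dims)
    return math.sqrt(sum(gap)), math.sqrt(mmd)


def w_conv(packets, p, k):
    item_queue = [(0, 0.0)]            # (broadcast position, MINDIST); packet 0 is the root
    result_queue = []                  # (bound, key); key tags a node position or a point
    kthdist, tune_in = math.inf, 0
    while item_queue:                  # halts only when ItemQueue becomes empty
        pos, md = heapq.heappop(item_queue)   # next node in order of appearance
        if md > kthdist:
            continue                   # pruned: the packet is skipped, not read
        tune_in += 1
        result_queue = [e for e in result_queue if e[1] != ("node", pos)]
        kind, entries = packets[pos]
        children = []
        if kind == "leaf":
            result_queue += [(math.dist(p, pt), ("point", pt)) for pt in entries]
        else:
            children = [(cpos,) + rect_dists(p, lo, hi) for cpos, lo, hi in entries]
            result_queue += [(mmd, ("node", cpos)) for cpos, _, mmd in children]
        result_queue = sorted(result_queue)[:k]
        kthdist = result_queue[-1][0] if len(result_queue) == k else math.inf
        for cpos, cmd, _ in children:
            if cmd <= kthdist:
                heapq.heappush(item_queue, (cpos, cmd))
    return [key[1] for _, key in result_queue], tune_in


packets = [
    ("index", [(1, (1, 1), (2, 3)), (2, (4, 4), (6, 5)), (3, (8, 6), (9, 8))]),
    ("leaf", [(1, 1), (2, 3)]),
    ("leaf", [(4, 5), (6, 4)]),
    ("leaf", [(8, 8), (9, 6)]),
]
print(w_conv(packets, (3, 3), 2))  # ([(2, 3), (4, 5)], 3)
```

The second return value counts the packets actually read, which stands in for tune-in time in this toy setting.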

13
Optimizing the Algorithm for Wireless Medium
(w-opt alg)
  • After the root node is processed
  • ItemQueue: u, v, w
  • ResultQueue: (v,5), (w,14)
  • kthdist = 14
  • Node v cannot be pruned
  • In fact, in this example it is possible to prune
    it
  • Knowledge
  • If the minimum fanout of the tree is f, then we
    know that there are at least $f^{l-1}$ objects under
    node w's mbr

14
Optimizing the Algorithm for Wireless Medium
(w-opt alg)
  • Given this knowledge, at the time when we add w
    to ItemQueue
  • There is one object at most 5 away from the query
    point
  • There are at least $f^{l-1} - 1 \geq 1$ objects at most 10
    away from the query point
  • Then kthdist = 10 after the root node is
    processed
  • Now v can be pruned
  • New Rule (w-opt algorithm)
  • While adding a node, say node N at level i, with
    its MINMAXDIST measure to ResultQueue, we also
    insert $f^{i} - 1$ additional entries with the MAXDIST
    measure of node N. ResultQueue is sorted on the
    associated measures of the entries
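
The effect of the new rule on kthdist can be sketched as follows. The function `kthdist` and the two sample nodes are hypothetical; the distances 5, 10 and 14 are chosen to echo the numbers on these slides, and the level is assumed to be numbered so that a node at level i holds at least f^i objects.

```python
import math


def kthdist(entries, k, fanout, w_opt=True):
    """Upper bound on the kth nearest neighbor distance.

    entries: (minmaxdist, maxdist, level) for each node added to ResultQueue.
    With w_opt, a node at level i contributes fanout**i - 1 extra entries
    carrying its MAXDIST, in addition to the entry carrying its MINMAXDIST.
    """
    bounds = []
    for minmax, mx, level in entries:
        bounds.append(minmax)
        if w_opt:
            extra = min(fanout ** level - 1, k)  # more than k copies can never matter
            bounds.extend([mx] * extra)
    bounds.sort()
    return bounds[k - 1] if len(bounds) >= k else math.inf


# Two hypothetical nodes at level 1 of a tree with minimum fanout 2.
entries = [(5, 10, 1), (14, 20, 1)]
print(kthdist(entries, k=2, fanout=2, w_opt=False))  # 14
print(kthdist(entries, k=2, fanout=2))               # 10
```

With k = 1 the extra entries never change the first bound, since MINMAXDIST is never larger than MAXDIST.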

15
Does Serialization Order Matter?
  • The way a spatial index is organized on the
    broadcast medium
  • affects the # of index nodes read by the kNN
    search, i.e., the tune-in time
  • Previous work on range and kNN search in
    broadcast environments [28, 10, 30, 15] used a
    depth first search (dfs) order serialization of
    the tree,
  • conventional algorithm is based on dfs guided by
    a heuristic

Proof: In the paper.
16
Does Serialization Order Matter?
  • Result circle: the circle formed around the query
    point using its distance from the kth nearest
    neighbor
  • A wrong decision causes high cost in terms of
    tune-in time, as we are trapped in a branch
  • BFS serialization does not share the same
    problem, but it may result in having a large
    ItemQueue size
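
The two serialization orders can be compared on a toy tree. The tree shape and node names below are hypothetical; the sketch only shows the order in which index nodes would appear on the broadcast.

```python
from collections import deque


def dfs_order(tree, root):
    """Depth-first (pre-order) serialization of the index nodes."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))  # keep left-to-right child order
    return order


def bfs_order(tree, root):
    """Breadth-first (level by level) serialization of the index nodes."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


tree = {"R0": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5", "R6"]}
print(dfs_order(tree, "R0"))  # ['R0', 'R1', 'R3', 'R4', 'R2', 'R5', 'R6']
print(bfs_order(tree, "R0"))  # ['R0', 'R1', 'R2', 'R3', 'R4', 'R5', 'R6']
```

In the dfs layout a client must pass the whole subtree of R1 before it can hear R2, whereas in the bfs layout all nodes of one level are heard before any node of the next, at the price of keeping more pending entries in ItemQueue.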

17
Histograms for Better Pruning? (w-hist alg)
  • A simple grid-like histogram can be used to
    obtain an upper bound on size of the result
    circle of a kNN query
  • Given a query point P and any set of k objects,
    the distance between P and the object in the set
    that is furthest from P is at least the radius of
    the result circle
  • As a result, any circle centered at point P that
    covers a set of histogram cells, such that the
    total number of objects located in these cells
    is at least k, covers the result circle.
  • Pick the closest non-empty cells to the query
    point, such that the total number of objects
    contained in these cells is at least k
  • Set r as the maximum of MAXDISTs of all cells
    that are picked
  • Circle centered at P with radius r is called
    the pruning circle (PC)
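
A minimal sketch of the pruning circle computation follows. The histogram representation (a list of `(lo, hi, count)` cells) and the function name are assumptions; "closest" is interpreted here as ordering cells by their MAXDIST from P, which is one possible reading of the slide.

```python
import math


def pruning_radius(p, cells, k):
    """Radius r of the pruning circle around p, or infinity if fewer than k objects exist.

    cells: (lo, hi, count) for each cell of a grid-like histogram.
    """
    def cell_maxdist(lo, hi):
        return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)))

    # Assumption: pick non-empty cells in increasing MAXDIST order.
    ranked = sorted((cell_maxdist(lo, hi), cnt) for lo, hi, cnt in cells if cnt > 0)
    total = 0
    for r, cnt in ranked:
        total += cnt
        if total >= k:
            return r       # the largest MAXDIST among the picked cells
    return math.inf


# 2 x 2 grid over [0, 4] x [0, 4] with object counts per cell.
cells = [((0, 0), (2, 2), 1), ((2, 0), (4, 2), 0),
         ((0, 2), (2, 4), 3), ((2, 2), (4, 4), 2)]
print(pruning_radius((1, 1), cells, 3))  # sqrt(10), about 3.162
```

The two picked cells hold 1 + 3 = 4 objects, all within sqrt(10) of P, so the 3rd nearest neighbor cannot be farther than that.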

18
Histograms for Proving Bounds? (w-hist alg)
  • New Rule (w-hist algorithm)
  • When a new node is to be added to ItemQueue, it
    can be discarded if its mbr does not intersect
    with the pruning circle
  • The histogram should not be too large, otherwise
    the gain from pruning cannot compensate for the
    cost of reading the histogram
  • In fact we have a formula for an upper bound on
    the tune-in time (for uniformly distributed
    data)
  • Histogram cell size cannot be taken smaller than
    the value that maximizes the above equation
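
The w-hist discard test amounts to a circle-rectangle intersection check, sketched below with a hypothetical helper name. An mbr intersects the pruning circle exactly when its MINDIST from P does not exceed the radius r.

```python
def intersects_circle(p, r, lo, hi):
    """True if the rectangle [lo, hi] intersects the circle of radius r centered at p."""
    d2 = sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(p, lo, hi))  # squared MINDIST
    return d2 <= r * r


# A node is added to ItemQueue only if its mbr intersects the pruning circle.
print(intersects_circle((1, 1), 2, (2, 2), (4, 4)))  # True
print(intersects_circle((1, 1), 1, (2, 2), (4, 4)))  # False
```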

19
Histograms for Proving Bounds?
  • Histograms help decrease the max. length of the
    ItemQueue through pruning
  • They may not always decrease the tune-in time,
    especially when the distribution is skewed
  • As a result, the tradeoffs require further study

20
Non-spatial Predicates
  • Non-spatial attributes may need to be taken
    into account when answering queries, e.g., the
    type of an object
  • We consider two techniques to answer kNN queries
    that may specify an optional type constraint
  • The t-index Method
  • Use t separate indexes each indexing only its
    associated type
  • Keep a lookup structure that has pointers for
    each type
  • Really efficient for queries with type constraints
  • Costly for other queries, which need to look up
    all indexes
  • To improve the performance of queries without type
    constraints, the order of the indexes can be
    selected such that an index corresponding to type
    i comes before the one corresponding to j on the
    broadcast medium if $n_i > n_j$.

21
Non-spatial Predicates
  • The t-hist/1-index Method
  • Use a single spatial index which indexes all
    objects
  • Include t histograms, each one built for a
    particular type
  • Keep a lookup structure that has pointers for
    each type
  • Let the leaf nodes of the tree also mark the type
    of each object
  • Queries with type constraints can be processed by
    only using the pruning circle derived from the
    associated histogram of the given type
  • Really efficient for queries without type
    constraints
  • To improve the performance of queries with type
    constraints, we can prune a node whose mbr does
    not intersect with a non-empty cell of the
    histogram of the associated type
  • We name this latter optimization as
    t-hist/1-index/hp method
  • Note that it is only fair to compare them when
    the total index size (together with histograms)
    occupied by the two methods is the same
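
The extra histogram pruning of the t-hist/1-index/hp method can be sketched as a rectangle overlap test. The cell representation `(lo, hi, count)` and the function names are assumptions made for this example.

```python
def overlaps(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned rectangles share at least one point."""
    return all(la <= hb and lb <= ha for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b))


def can_prune(node_lo, node_hi, type_cells):
    """A node is pruned if its mbr touches no non-empty cell of the queried type's histogram."""
    return not any(cnt > 0 and overlaps(node_lo, node_hi, lo, hi)
                   for lo, hi, cnt in type_cells)


# Hypothetical histogram for one object type on a 2 x 2 grid over [0, 4] x [0, 4].
gas_station_cells = [((0, 0), (2, 2), 0), ((2, 0), (4, 2), 0),
                     ((0, 2), (2, 4), 4), ((2, 2), (4, 4), 0)]
print(can_prune((2.5, 0.5), (3.5, 1.5), gas_station_cells))  # True
print(can_prune((1.0, 1.0), (3.0, 3.0), gas_station_cells))  # False
```

The first node lies entirely in a cell with no objects of the queried type, so it can be skipped without being read.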

22
Tune-in Time and Queue Size
  • tune-in time
  • with dfs organization
  • w-opt 33% better than w-conv
  • with bfs organization
  • w-opt 30% better than w-conv
  • bfs organization provides up to 55% improvement
    over dfs organization
  • queue size
  • bfs organization has larger memory requirement
    than dfs
  • w-opt decreases the memory requirement for both
    organizations
  • tune-in time
  • after k = 4 w-opt starts providing significant
    improvement over w-conv
  • after k = 8 w-opt performs better than w-conv
    independent of the index serialization order
  • queue size
  • memory requirement of bfs organization grows fast
    with increasing k when w-conv is used and
    significantly drops when w-opt is used.

23
Scalability w.r.t. Number of Objects
  • Similar trends are observed
  • w-opt search and bfs organization prevailing over
    other configurations.
  • However, comparing Figure 9 and Figure 7 reveals
    that the improvement in tune-in time increases
    with increasing k
  • In fact one can prove that for NN search (k = 1)
    w-opt reduces to w-conv.

24
Merits and Demerits of Histograms
  • Figure 10
  • a single packet histogram improves the tune-in
    time
  • for this setup, histograms with larger sizes do
    not provide better tune-in times
  • the effect of using a histogram is more prominent
    with the dfs organization
  • with histograms the bfs and dfs organizations are
    effectively the same
  • Figure 11
  • with skewed data the latter observation no longer
    holds: bfs organization with w-opt search
    outperforms all alternatives
  • this does not indicate that histograms are
    useless for skewed data sets
  • a small histogram significantly (more than 50%)
    decreases the queue size while increasing the
    tune-in time only marginally (around 5%)

25
More on Histograms and Search with Non-spatial
Predicates
  • Figure 12
  • for larger number of objects it is better to use
    larger histograms
  • dfs organization is more sensitive to the
    increase in the number of objects
  • Figure 15
  • t-index performs much better for queries with
    type constraints
  • t-hist/1-index/hp method (powered by the
    additional histogram pruning), achieves
    significant improvement in tune-in time,
    especially for queries having constraints on
    infrequent types
  • Figure 16
  • t-index performs poorly for queries without type
    constraints
  • Although not outperforming t-index for type
    constrained queries, t-hist/1-index/hp strikes a
    good balance between two types of queries

26
Conclusions
  • The introduced w-opt search technique
    significantly decreases tune-in time,
    irrespective of how the index is organized (bfs
    or dfs)
  • Organizing the index in bfs manner provides
    considerably better tune-in time but a higher
    memory requirement due to queue size
  • Using histograms can improve tune-in time,
    although the improvement is marginal for bfs
    organization as opposed to dfs organization
  • The use of histograms can be extended to support
    answering kNN queries with type constraints.

27
The End !
  • Thanks
  • ?? Questions ??