Title: SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases Mohamed F. Mokbel, Xiaopeng Xiong, Walid G. Aref
1 SINA: Scalable Incremental Processing of
Continuous Queries in Spatio-temporal
Databases
Mohamed F. Mokbel, Xiaopeng Xiong,
Walid G. Aref
Presented by Nilu Thakur and Prasad Sriram
SIGMOD 2004 June 13-18, Paris, France.
2 Outline
- Introduction
- Problem definition
- Contributions
- Key Concepts
- Shared Execution
- Hashing
- Invalidation
- Joining
- Validations
- Assumptions
- Rewrite Today
3 Introduction (1/3)
- Moving query on stationary objects
Find the nearest gas station(s) within 1 mile of
the moving red car
4 Introduction (2/3)
- Another Example.
- Moving query on moving objects
Continuously find all police cars within 3 miles
of the moving red car
5 Introduction (3/3)
- Another Example.
- Stationary query on moving objects
Continuously find all vehicles within 1 mile of
my house
My House
6 Problem Definition
- Input
- Given a large number of mobile/stationary objects
and continuous spatio-temporal queries
- Output
- Produce fast, complete and correct results
- Objective
- Continuous evaluation
- Scalability in terms of number of queries
- Report only updates to previous answer
- Constraints
- Any delay in query response might result in
an outdated answer
- Limited network bandwidth
7 Contributions
- Shared execution paradigm
- Groups similar queries in a query table
- Spatial join between moving queries and moving
objects
- Differs from previous approaches that use R-tree
and Q-index structures for moving queries on moving
objects (instead uses a spatial join, assuming no
indexing structure)
- Incremental evaluation (most significant)
- Maintains an in-memory table to store positive
and negative updates
- Negative updates may cancel previous positive
updates and vice versa
- Sends a set of updates to queries every T time units
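The incremental-evaluation bullets above can be illustrated with a minimal sketch. It is not the authors' code: the class `UpdateBuffer` and its methods are hypothetical names, and an update is modeled simply as a signed (query, object) pair. A pending positive update is cancelled by a later negative update for the same pair (and vice versa), so only net changes are sent at each flush.

```python
class UpdateBuffer:
    """In-memory table of pending updates, keyed by (query_id, object_id).

    Hypothetical sketch: +1 is a positive update (object enters the query
    answer), -1 is a negative update (object leaves the answer).
    """

    def __init__(self):
        self.pending = {}

    def _add(self, query_id, object_id, sign):
        key = (query_id, object_id)
        if self.pending.get(key) == -sign:
            # An opposite update is already pending: the two cancel out.
            del self.pending[key]
        else:
            self.pending[key] = sign

    def add_positive(self, query_id, object_id):
        self._add(query_id, object_id, +1)

    def add_negative(self, query_id, object_id):
        self._add(query_id, object_id, -1)

    def flush(self):
        """Return the net updates gathered since the last flush, then clear."""
        updates = sorted(self.pending.items())
        self.pending.clear()
        return updates
```

In this sketch, `flush()` would be called once every T time units, and its result is what gets shipped to the clients.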
8 Key concept: Shared execution
- Spatial join between moving objects and moving
queries
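As a rough illustration of shared execution, the sketch below evaluates all range queries in one join against the object table instead of running each query separately. It uses a naive nested-loop join purely for clarity; the function names and the rectangle encoding `(x_min, y_min, x_max, y_max)` are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# A range query is modeled as a rectangle: (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]


def contains(rect: Rect, point: Point) -> bool:
    """True if the point lies inside the rectangle (borders included)."""
    x_min, y_min, x_max, y_max = rect
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def spatial_join(objects: Dict[str, Point],
                 queries: Dict[str, Rect]) -> List[Tuple[str, str]]:
    """Evaluate every query at once: all (query_id, object_id) matches."""
    return sorted(
        (qid, oid)
        for qid, rect in queries.items()
        for oid, point in objects.items()
        if contains(rect, point)
    )
```

For example, `spatial_join({"p1": (1, 1)}, {"Q1": (0, 0, 2, 2)})` pairs `Q1` with `p1`.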
9 Shared Execution
Slides courtesy of Mokbel et al.
10 Key concepts (continued)
- Shared Execution
- Spatial join can use an R-tree index for stationary
objects
- Q-index can be used for stationary queries
- No index structure when both query and object are
moving
- Incremental Evaluation
- Hashing
- Invalidation
- Joining
11 State diagram of SINA
[Figure: state diagram. A stream of moving objects and moving queries feeds the HASHING phase (in-memory hashing). On "memory full or timeout" the INVALIDATION phase runs, followed by the JOINING phase (memory-disk join against DISK). Arrows labeled "negative update", "positive update" and "negative/positive update" feed the incremental result; when joining is done, the incremental results are sent to queries Q1, Q2, ..., Qn-1, Qn.]
12 Key concept: An example
Q1-Q5 represent 5 continuous range queries; p1-p9
represent objects. White circles are moving objects
(p1, p2, p3, p4); black circles are non-moving
objects; dashed lines represent moving queries (Q1,
Q3, Q5).
13 Key concept: Step I - Hashing
- Two in-memory hash tables with N buckets for
storing moving objects and moving queries
- One in-memory query table to keep track of the upper
left and lower right corners of each query region
- Hashing --> probing --> storing --> (Q3, p2)
reported
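The hash, probe, store sequence above might look roughly like the following sketch. Several details are assumptions made only for illustration: space is the unit square, the hash function is a uniform N x N grid, and the names `receive_object` and `receive_query` are hypothetical. Each arriving tuple first probes the opposite hash table, reports any matches as positive updates, and is then stored in its own table.

```python
from collections import defaultdict

N = 4  # buckets per axis (assumption: unit-square space, grid-based hashing)

object_table = defaultdict(dict)   # bucket -> {object_id: (x, y)}
query_buckets = defaultdict(set)   # bucket -> {query_id}
query_table = {}                   # query_id -> (x_min, y_min, x_max, y_max)


def bucket_of(x, y):
    """Hash a location in the unit square to a bucket."""
    return (min(int(x * N), N - 1), min(int(y * N), N - 1))


def buckets_of(rect):
    """All buckets overlapped by a query rectangle."""
    x_min, y_min, x_max, y_max = rect
    (c0, r0), (c1, r1) = bucket_of(x_min, y_min), bucket_of(x_max, y_max)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def inside(rect, x, y):
    return rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]


def receive_object(oid, x, y):
    """Probe the query hash table, then store the object.

    Returns the positive updates as (query_id, object_id) pairs.
    """
    b = bucket_of(x, y)
    positives = [(qid, oid) for qid in sorted(query_buckets[b])
                 if inside(query_table[qid], x, y)]
    object_table[b][oid] = (x, y)
    return positives


def receive_query(qid, rect):
    """Probe the object hash table, then store the query.

    Returns the positive updates as (query_id, object_id) pairs.
    """
    positives = []
    for b in buckets_of(rect):
        positives += [(qid, oid)
                      for oid, (x, y) in sorted(object_table[b].items())
                      if inside(rect, x, y)]
        query_buckets[b].add(qid)
    query_table[qid] = rect
    return positives
```

With these definitions, storing a query `Q3` and then receiving an object `p2` that falls inside it reports the pair `(Q3, p2)`, mirroring the slide's example.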
14 Key concept: Step II - Invalidation
- Map objects and queries to one or more disk-based
N x N grid cells
- Flush out the buckets containing
moved objects and queries
- If an object maps to the same
grid cell, the object has not moved
- Else, add the
object entry to this grid cell
- Look for queries
that contain this object. Remove the object
from these queries by sending negative
updates.
- Repeat the same procedure for
invalidating queries.
[Figure legend: query entry, object entry]
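A minimal sketch of the object side of invalidation follows. It is a simplification under stated assumptions: the disk-based grid is stood in for by a Python dictionary, the function and parameter names are hypothetical, and the sketch scans every query's answer set rather than only the queries registered in the affected grid cells.

```python
def cell_of(x, y, n=4):
    """Map a location in the unit square to a cell of an n x n grid."""
    return (min(int(x * n), n - 1), min(int(y * n), n - 1))


def invalidate_object(oid, new_xy, grid, object_cell, queries, answers):
    """Flush one moved object into the grid; return the negative updates.

    grid:        cell -> {object_id: (x, y)}  (stand-in for the disk grid)
    object_cell: object_id -> cell currently holding the object's entry
    queries:     query_id -> (x_min, y_min, x_max, y_max)
    answers:     query_id -> set of object ids already reported to the query
    """
    new_cell = cell_of(*new_xy)
    old_cell = object_cell.get(oid)
    if old_cell is not None and old_cell != new_cell:
        del grid[old_cell][oid]            # entry leaves its old cell
    grid.setdefault(new_cell, {})[oid] = new_xy
    object_cell[oid] = new_cell

    negatives = []
    x, y = new_xy
    for qid in sorted(answers):
        x_min, y_min, x_max, y_max = queries[qid]
        still_inside = x_min <= x <= x_max and y_min <= y <= y_max
        if oid in answers[qid] and not still_inside:
            answers[qid].discard(oid)      # object left the query region
            negatives.append((qid, oid))
    return negatives
```

The query side would follow the same pattern with the roles of objects and queries swapped.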
15 Key concept: Step III - Joining
- No additional data structure
- Two spatial join operations for each grid cell
- Join in-memory objects with in-disk queries
- Join in-memory moving queries with in-disk
objects
- Send updated answers to clients
- Clear all memory data structures
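The two per-cell joins listed above can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: the name `join_cell` and the dictionary layouts are hypothetical, nested loops stand in for the actual join, and only positive updates are produced (negative updates are left out for brevity).

```python
def inside(rect, point):
    x_min, y_min, x_max, y_max = rect
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max


def join_cell(mem_objects, mem_queries, disk_objects, disk_queries, answers):
    """Run the two joins for one grid cell and return new positive updates.

    mem_* hold entries received since the last evaluation; disk_* hold the
    entries already stored for this cell. answers maps query_id -> set of
    object ids already reported, so only changes are emitted.
    """
    pairs = []
    # Join 1: in-memory objects with in-disk queries.
    for oid, point in mem_objects.items():
        for qid, rect in disk_queries.items():
            if inside(rect, point):
                pairs.append((qid, oid))
    # Join 2: in-memory moving queries with in-disk objects.
    for qid, rect in mem_queries.items():
        for oid, point in disk_objects.items():
            if inside(rect, point):
                pairs.append((qid, oid))

    positives = []
    for qid, oid in sorted(set(pairs)):
        if oid not in answers.setdefault(qid, set()):
            answers[qid].add(oid)
            positives.append((qid, oid))
    # Memory structures are cleared once the cell has been joined.
    mem_objects.clear()
    mem_queries.clear()
    return positives
```

Calling `join_cell` once per grid cell and concatenating the results gives the batch of updates to send to clients in this simplified model.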
16 Performance Analysis (1)
[Figures: answer size; impact of grid size N]
17 Performance Analysis (2)
[Figures: scalability with number of objects; scalability with number of queries]
18 Performance Analysis (3)
[Figures: scalability with update rates; a truncated axis label reads "of moving objects"]
19 Extensibility
- Querying the future
- K-Nearest Neighbor queries
- Aggregate queries
- Out-of-Sync clients
20 Assumptions
- No computational capabilities on the client side
- No storage capabilities on the client side
- Both assumptions are fair, considering that
clients often use cheap, low-battery,
passive devices that have no computational or
storage capabilities.
- No velocity assumptions.
- Optimal time interval for sending updates to
queries is set to 10 seconds.
21 Validations
- Methodology
- Experiments performed on synthetic data
- Used Network-Based Generator of Moving Objects
- Input to the generator is the road map of the city of
Oldenburg, Germany
- Theorem-Proving
- Validation criteria
- Comparison with other non-incremental algorithms
based on
- Size of the results
- Impact of grid size
- Scalability with number of objects
- Performance in terms of CPU and I/O time
- Advantages
- Well suited to checking the correctness and
efficiency of the proposed algorithm where rich
datasets with various problem features are not
available.
- Disadvantages
- Results under real-world conditions might differ from
the experimental results
22 Rewrite today
- Assumptions
- No unreasonable assumptions made. In fact, the paper
removes some assumptions made by other
related techniques
- Preservations
- Incremental way of sending updates
- Shared execution
- Not having assumptions about computational
capabilities of client
- Improvements
- Incorporate some techniques to determine the
optimal T, i.e., the time between sending updates
- Through experiments
- Learning based on past statistics about how
valid the previous updates were
- Extend to handle queries involving huge object
histories