Stream Database Systems' Survey

1
Stream Database Systems: Survey
  • Moustafa A. Hammad
  • Jan. 2003

2
Introduction to data streams.
  • New applications
  • Emerging applications (e.g., pervasive computing
    and sensor-based environments).
  • Habitat monitoring. Bird Counter Query: keep a
    count of the number of times birds have entered a
    particular nest.
    http://hake.stanford.edu/stream/sqr/birdmon.html
  • Real-time business processing and network
    management.
  • Monitoring sales from different department
    stores: report the count of items sold by all
    stores in the last two hours, grouped by store.
  • Customer Monitoring Query: maintain the fraction
    of packets on a particular backbone link B
    generated by a particular customer network C in
    the past hour.
    http://hake.stanford.edu/stream/sqr/netmon.html
  • Telecommunication management.
  • Video security monitoring and surveillance
    applications.
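
As an illustration of the department-store query above, here is a minimal Python sketch of a sliding two-hour count per store. The class name SalesMonitor, its methods, and the use of plain numeric timestamps in seconds are assumptions made for this example; they do not come from any of the surveyed systems. The sketch also assumes sales arrive in timestamp order.

```python
from collections import Counter, deque

WINDOW_SECONDS = 2 * 60 * 60  # "the last two hours"


class SalesMonitor:
    """Hypothetical helper: per-store sale counts over a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()    # (timestamp, store_id), oldest first
        self.counts = Counter()  # store_id -> number of sales in the window

    def _expire(self, now):
        # Remove sales that are `window` seconds old or older.
        while self.events and self.events[0][0] <= now - self.window:
            _, store = self.events.popleft()
            self.counts[store] -= 1
            if self.counts[store] == 0:
                del self.counts[store]

    def add_sale(self, timestamp, store_id):
        # Assumes timestamps are non-decreasing across calls.
        self.events.append((timestamp, store_id))
        self.counts[store_id] += 1
        self._expire(timestamp)

    def report(self, now):
        """Return {store_id: count} for sales inside the window ending at `now`."""
        self._expire(now)
        return dict(self.counts)
```

Because the answer is maintained incrementally, each report costs only the work of expiring old sales, which is the property a long-running continuous query needs.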

3
Introduction.
  • New requirements
  • Input streams with an infinite supply of data,
    arriving in real time and not necessarily
    synchronized.
  • Long-running continuous queries (CQs), a large
    number of concurrent CQs, both exact and
    approximate answers are relevant, quality of
    service (QoS) (e.g., real-time requirements).
  • Human-Active, Database-Passive (HADP) versus
    Database-Active, Human-Passive (DAHP) model.
    Aurora-VLDB02
  • Traditional query processing needs rethinking to
    deal with data streams.

4
Background
  • Stream Model
  • Data stream
  • An unbounded sequence of data items, ordered by
    the time-stamp at which each item is added to the
    stream.
  • A single data item is a binary tuple (value(s),
    Sequence_order).
  • Sequence_order is either
  • a time-stamp: the time of creation (valid time)
    or the time the item is received by the system
    (transaction time), or
  • a sequence number (1, 2, 3, ...).
  • Sensor (a source of a data stream)
  • continuously sends data (push), or
  • is read asynchronously (data is retrieved on
    request).
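
The stream model above can be sketched in Python as follows. StreamItem, PushSensor, and PullSensor are hypothetical names chosen for this illustration, and a running sequence number stands in for Sequence_order; a time-stamp would work the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamItem:
    values: tuple        # the value(s) carried by the item
    sequence_order: int  # time-stamp or running sequence number


class PushSensor:
    """Source that pushes every reading to its subscribers as it is produced."""

    def __init__(self):
        self._subscribers = []
        self._next_seq = 1

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def emit(self, *values):
        item = StreamItem(values, self._next_seq)
        self._next_seq += 1
        for callback in self._subscribers:
            callback(item)


class PullSensor:
    """Source that produces a reading only when the system requests one."""

    def __init__(self, read):
        self._read = read  # zero-argument function returning one reading
        self._next_seq = 1

    def request(self):
        item = StreamItem((self._read(),), self._next_seq)
        self._next_seq += 1
        return item
```

In the push case the sensor drives the data flow; in the pull case the consumer decides when a new item enters the stream.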

[Figure: timeline of stream tuples t1 to t5 along a time axis, labeled "Old tuples" and "New tuples"]
5
Outline
  • Aurora, Aurora-VLDB02
  • Aurora and Medusa, Aurora-CIDR03
  • STREAM, STREAM-CIDR03
  • Telegraph, Tele-CIDR03
  • Eddy, SteM, PSoup

6
Aurora
Aurora-VLDB02
7
Aurora Operators
  • Windowed operators (group operators)
  • Slide (e.g., report the count over the last hour)
  • Tumble (e.g., report the daily count)
  • Latch: like Tumble, but maintains state between
    windows
  • Resample: interpolates tuples between the actual
    tuples of an input stream
  • Filter (e.g., report tuples with input value > 5)
  • Drop (e.g., drop one input tuple out of every 10)
  • Map (e.g., compute the square of every input
    tuple)
  • GroupBy (e.g., categorize stream readings by
    location value)
  • Join (e.g., find similar values in two input data
    streams over the last 30 minutes)
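
To make the difference between some of these operators concrete, the following Python generators sketch Filter, Drop, Map, Tumble, and Slide. This is an illustration only, not Aurora's operator interface: the function names and signatures are assumptions, and the windowed functions assume a stream of (timestamp, value) pairs arriving in timestamp order.

```python
from collections import deque


def filter_op(stream, predicate):
    """Pass through only the tuples that satisfy `predicate`."""
    return (x for x in stream if predicate(x))


def drop_op(stream, every):
    """Discard every `every`-th tuple (the 10th, 20th, ... when every=10)."""
    return (x for i, x in enumerate(stream, start=1) if i % every != 0)


def map_op(stream, fn):
    """Apply `fn` to every tuple."""
    return (fn(x) for x in stream)


def tumble_count(stream, size):
    """Emit (window_start, count) once per non-overlapping window of `size`."""
    current, count = None, 0
    for ts, _ in stream:
        window = ts // size
        if current is not None and window != current:
            yield current * size, count
            count = 0
        current = window
        count += 1
    if current is not None:
        yield current * size, count


def slide_count(stream, size):
    """After each arrival, emit (timestamp, count over the last `size` units)."""
    window = deque()
    for ts, _ in stream:
        window.append(ts)
        while window[0] <= ts - size:
            window.popleft()
        yield ts, len(window)
```

Tumble produces one result per window and then starts from scratch, while Slide produces a result on every arrival over an overlapping window. In this sketch, windows that receive no tuples produce no output.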

8
Aurora Query Model
Aurora-VLDB02
9
Aurora Run-time Architecture
  • Train scheduler (a set of heuristics that
    minimizes trips to disk and reduces processing
    time per box)
  • Load shedding by dropping tuples or applying
    filters
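
The two load-shedding options can be sketched as a bounded input queue. This is a simplified illustration under stated assumptions, not Aurora's actual shedding policy: SheddingQueue, its `capacity` threshold, and the `keep` predicate are hypothetical names, and overload is detected only by queue length.

```python
from collections import deque


class SheddingQueue:
    """Hypothetical input queue that sheds load once it holds `capacity` tuples."""

    def __init__(self, capacity, keep=None):
        self.capacity = capacity
        self.keep = keep      # optional filter, consulted only under overload
        self.queue = deque()
        self.shed = 0         # number of tuples discarded so far

    def offer(self, item):
        """Queue `item`, or discard it if the queue is overloaded."""
        overloaded = len(self.queue) >= self.capacity
        if overloaded and (self.keep is None or not self.keep(item)):
            self.shed += 1
            return False
        self.queue.append(item)
        return True

    def poll(self):
        """Hand the oldest queued tuple to the next operator, if any."""
        return self.queue.popleft() if self.queue else None
```

Without a `keep` filter the queue drops every tuple that arrives while it is full. With a filter, tuples judged important are still admitted under overload, so the queue may temporarily exceed `capacity`.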

Aurora-VLDB02
10
Aurora and Medusa
  • Aurora: intra-participant distribution
  • Multiple Aurora nodes each execute a sub-query
    (sub-network) under a single administrative
    control for load balancing and dynamic
    reconfiguration.
  • Medusa: inter-participant federated operation
  • Provides service delivery among multiple
    participants. A single participant may itself be
    configured as an Aurora architecture.
  • Further reference: Aurora-CIDR03

11
STREAM
  • Centralized system (similar to Aurora).
  • Shares many similarities with the Aurora system.
  • Current published work is distinguished by a new
    query definition and semantics over data streams.

12
Query Syntax
  • Modified SQL language: Continuous Query Language
    (CQL).
  • A stream in the From clause may be followed by an
    optional sliding window specification, enclosed
    in brackets, and an optional sampling clause.
  • A window specification consists of an optional
    partitioning clause, a mandatory window size, and
    an optional filtering predicate.
  • Example
  • Select Count(*)
  • From Requests S [Partition By S.client_id Rows 10
    Preceding Where S.domain = 'stanford.edu'] Sample
    (10)
  • Where S.URL Like 'http://cs.stanford.edu/%'
  • A similar query without the sample clause appears
    in STREAM-CIDR03
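
One way to read the example query is sketched below in Python: keep, for each client, a window of the last 10 requests from the stanford.edu domain, then count the windowed requests whose URL starts with the CS prefix. This reading of the semantics is an assumption, the sampling clause is left out, and PartitionedRowWindow and its methods are hypothetical names rather than part of CQL or the STREAM system.

```python
from collections import defaultdict, deque


class PartitionedRowWindow:
    """Hypothetical evaluator for the example query, without the Sample clause."""

    def __init__(self, rows=10, domain="stanford.edu",
                 url_prefix="http://cs.stanford.edu/"):
        self.domain = domain
        self.url_prefix = url_prefix
        # Partition By client_id, Rows N Preceding: one bounded deque per client.
        self.windows = defaultdict(lambda: deque(maxlen=rows))

    def insert(self, client_id, domain, url):
        # Window filtering predicate: only matching rows enter the window.
        if domain == self.domain:
            self.windows[client_id].append(url)

    def count(self):
        # Outer Where clause plus Count(*) over the union of all client windows.
        return sum(url.startswith(self.url_prefix)
                   for window in self.windows.values()
                   for url in window)
```

Note the two different filters: the predicate inside the window specification decides which rows occupy the 10 slots per client, while the outer Where clause only decides which windowed rows are counted.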

13
Query Semantics
STREAM-CIDR03
14
Telegraph
  • Emphasizes the execution model for continuous
    queries over data streams.
  • Introduces Eddy, SteM, and PSoup.
  • Integrating the above technologies in a dataflow
    processing system using

15
Eddy and SteM
Tele-CIDR03
  • Adaptive routing and processing using eddies and
    SteMs.
  • An eddy routes each tuple to the next operator.
    Each SteM acts as one half of a traditional join
    operator.
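
The "half join" idea can be sketched as a symmetric hash join split into two state modules. This is a simplified illustration, not TelegraphCQ code: the class and method names are assumptions, the join is an equi-join on a single key, and the routing order is fixed (build, then probe), whereas a real eddy chooses routes adaptively.

```python
from collections import defaultdict


class SteM:
    """State module: stores the tuples of one input, indexed by join key."""

    def __init__(self):
        self.index = defaultdict(list)

    def build(self, key, row):
        self.index[key].append(row)

    def probe(self, key):
        return list(self.index.get(key, []))


class Eddy:
    """Routes each tuple: build into its own SteM, then probe the other SteM."""

    def __init__(self):
        self.stems = {"R": SteM(), "S": SteM()}

    def route(self, source, key, row):
        other = "S" if source == "R" else "R"
        self.stems[source].build(key, row)
        matches = self.stems[other].probe(key)
        # Emit results as (R row, S row) regardless of arrival order.
        if source == "R":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]
```

Since each arriving tuple is stored before it probes the opposite side, every matching pair is produced exactly once, whichever input delivers its tuple first.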

16
PSoup
Tele-CIDR03
17
Related Work
  • Aurora-CIDR03: M. Cherniack, H. Balakrishnan,
    M. Balazinska, D. Carney, U. Cetintemel, Y. Xing,
    S. Zdonik. Scalable Distributed Stream
    Processing. In Proceedings of the First Biennial
    Conference on Innovative Data Systems Research
    (CIDR'03), Asilomar, CA, January 2003.
  • Aurora-VLDB02: D. Carney, U. Cetintemel, M.
    Cherniack, C. Convey, S. Lee, G. Seidman, M.
    Stonebraker, N. Tatbul, S. Zdonik. Monitoring
    Streams: A New Class of Data Management
    Applications. In Proceedings of the 28th
    International Conference on Very Large Data Bases
    (VLDB'02), Hong Kong, China, August 20-23, 2002.
  • Tele-CIDR03: S. Chandrasekaran, O. Cooper, A.
    Deshpande, M. J. Franklin, J. M. Hellerstein, W.
    Hong, S. Krishnamurthy, S. R. Madden, V. Raman,
    F. Reiss, M. A. Shah. TelegraphCQ: Continuous
    Dataflow Processing for an Uncertain World. In
    Proceedings of the First Biennial Conference on
    Innovative Data Systems Research (CIDR'03),
    Asilomar, CA, January 2003.
  • STREAM-CIDR03: R. Motwani, J. Widom, A. Arasu,
    B. Babcock, S. Babu, M. Datar, G. Manku, C.
    Olston, J. Rosenstein, R. Varma. Query
    Processing, Resource Management, and
    Approximation in a Data Stream Management System.
    In Proceedings of the First Biennial Conference
    on Innovative Data Systems Research (CIDR'03),
    Asilomar, CA, January 2003.