Continuous%20Query%20Languages%20for%20DSMS - PowerPoint PPT Presentation

About This Presentation
Title:

Continuous%20Query%20Languages%20for%20DSMS

Description:

In PODS, Santa Barbara, CA, May 2001. [25]P. Seshadri. ... Tribeca: A stream database manager for network traffic analysis. In VLDB, 1996. ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 32
Provided by: mir135
Learn more at: http://web.cs.ucla.edu
Category:

less

Transcript and Presenter's Notes

Title: Continuous%20Query%20Languages%20for%20DSMS


1
Continuous Query Languages for DSMS
  • CS240B Notes
  • by
  • Carlo Zaniolo

2
DSMS Languages for Continuous Queries
  • Relational Algebra Operators
  • SQL and User-Defined aggregates
  • The Blocking problem
  • The expressive Power problem
  • XML streams and their query languages.

3
CQLs for DSMS
  • Most of DSMS projects use SQL for continuous
    queriesfor good reasons, since
  • Many applications span data streams and DB tables
  • A CQL based on SQL will be easier to learn use
  • Moreover the fewer the differences the better!
  • But DSMS were designed for persistent data and
    transient queries---not for persistent queries on
    transient data
  • Adaptation of SQL and its enabling technology
    presents many research challenges
  • Lack of expressive powereven worse now since
    only nonblocking operators are allowed.

4
Continuous Query Graph many componentsarbitrary
DAGs
?
Source1
U
Sink
Source2
s
5
Relational Algebra Operators
  • Stored data
  • Selection, Projection
  • Union
  • Join (including X) on tables
  • Set Difference
  • Aggregates
  • Traditional Blocking aggregates
  • OLAP functions on windows or unlimited preceding
  • Data Streams
  • ... same
  • Union by Sort-Merging on timestamps
  • Join of Stream with table
  • Window joins on streams (timestamps merged into 1
    column)
  • No stream difference (blockingdiff of stream
    with table OK).
  • Aggregates
  • No blocking aggregate
  • OLAP functions on windows or unlimited preceding
  • Slides, and tumbles.

6
Bolts and Nuts
  • create stream bids(bid, item, offer, Time)
  • create stream mybids as (select bid, offer, Time
  • from bids
  • where itembolt
  • union
  • select bid, offer,
    Time
  • from bids
  • where itemnut)
  • Result same as select bid, offer, Time where
  • item bolt or itemnut

7
Joins
  • We could create a stream called interesting bids
    by say joining bids with the interesting_items
    table.
  • We next find the bolt bids for which there was a
    nut bid offered in the last 5 minutes for the
    same price.
  • create stream selfjoinbids as
  • (select S1.bid, S1.offer, S2.bid,
    S2.Time
  • from bids as S1, bids as S2 window of
    5 minutes
  • where S1.itembolt and S2.itemnut and
  • S1.offerS2.offer)
  • The window condition implies that S1.Time gt
    S2.Time and S2.Time gt S1.Time-5 minutes.
  • Windows on both streams are used very often.

\
8
Processing Unions
  • Union When tuples are present at all inputs,
    select one with minimal timestamp and
  • Production add this tuple to the output, and
  • Consumption remove it from the input.

9
Window Joins
A
B
  • Window Join of Stream A and Stream BWhen tuples
    are present at both inputs, and the timestamp of
    A is less or equal than that of B, then perform
    the following operations (symmetric operations
    are performed if timestamp of B is less or equal
    than that of A)
  • Production compute the join of the tuple in A
    with the tuples in W(B) and add the resulting
    tuples to output buffer (these tuple have the
    same timestamp a the tuple in A)
  • Consumption the current tuple in A is removed
    from the input and added to the window buffer
    W(A) (from which the expired tuples are also
    removed)

10
Relational Algebra Operators
  • Stored data
  • Selection, Projection
  • Union
  • Join (including X) on tables
  • Set Difference
  • Aggregates
  • Traditional Blocking aggregates
  • OLAP functions on windows or unlimited preceding
  • Data Streams
  • ... same
  • Union by Sort-Merging on timestamps
  • Join of Stream with table
  • Window joins on streams (timestamps merged into 1
    column)
  • No stream difference (blockingdiff of stream
    with table OK).
  • Aggregates
  • No blocking aggregate
  • OLAP functions on windows or unlimited preceding
  • Slides, and tumbles.
  • Including UDAs

11
User-Defined AggregatesMax Power via Min SQL
Extensions
  • Windows (logical, physical, slides, tumbles,)
    flexible synopses that solve the blocking problem
    for aggregates
  • DSMS only support these constructs on built-in
    aggregates
  • ESL is the first to support the complete
    integration of these two
  • User Defined Aggregates (UDAs) the key to power
    and extensibility, and
  • And thus can support data mining,
  • XML,
  • sequences not supported by other DSMS
  • One framework for aggregates and windows, whether
    they are built-ins or user-defined, and
    independent on the language used to define
    them.

12
Defining Traditional Aggregates
  • Specification consists of 3 blocks of code---
    Written in an external PL (as DBMS and other DSMS
    do), or
  • In SQL itself (SQL becomesTuring Complete!)
  • INITIALIZE
  • Executed upon the arrival of the first tuple
  • ITERATE
  • Executed upon the arrival of each subsequent
    tuples (an incremental computation suitable for
    streams)
  • TERMINATE
  • Executed after the end of the relation/stream has
    been reached
  • Invocation SELECT myavg(start_price)  FROM
    OpenAuction

13
The UDA AVG in SQL
  • AGGREGATE avg(Next Int) Real
  • TABLE state(tsum Int, cnt Int)
  • INITIALIZE
  • INSERT INTO state VALUES (Next, 1)
  • ITERATE
  • UPDATE state
  • SET tsumtsumNext, cntcnt1
  • TERMINATE
  • INSERT INTO RETURN
  • SELECT tsum/cnt FROM state
  • INSERT INTO RETURN in TERMINATE ? a blocking
    UDA

14
NonBlocking UDA AVG of last 200 Values
  • AGGREGATE myavg(Next Int) Real
  • TABLE state(tsum Int, cnt Int)
  • INITIALIZE
  • INSERT INTO state VALUES (Next, 1)
  • ITERATE
  • UPDATE state SET tsumtsumNext, cntcnt1
  • INSERT INTO RETURN
  • SELECT tsum/cnt FROM state
  • WHERE cnt 200 0
  • UPDATE state SET tsumNext, cnt1
  • WHERE cnt 200 1
  • TERMINATE
  • Empty TERMINATE Denotes a non-blocking UDA

15
UDAs in ESL
  • In ESL user-defined Aggregates (UDAs) can be
    defined directly in SQL, rather than in a PL
  • Native extensibility in SQL via UDAs (which can
    also be defined in a PL for better performance)
  • No impedance mismatch
  • Access to DB tables from UDAs
  • Data Independence and optimization
  • Good ease of use and performance
  • Turing completeness nb-completeness.

16
Data Intensive Applications UDAs
  • Complex Applications can expressed concisely,
    with good performance
  • ATLAS a single-user DBMS developed at UCLA.
  • Support for SQL with UDAs
  • On top of Berkeley-DB record manager.
  • Data Mining Algorithms in ATLAS
  • Decision Tree Classifiers 18 lines of codes
  • APriori 40 lines of codes
  • Modest overhead lt50 w.r.t procedural UDA
  • Data Stream Applications in ESL
  • Data Stream Mining, approximate aggregates,
    sketches, histograms,

17
SQL2003 OLAP FunctionsAggregates on Windows
CREATE STREAM ClosedAuction (/auction closings
/ itemID, /id of
the item in this auction./ buyerID
/buyer of this item./) Final price real
/final price of the item /,
Current_time) order by source
Auctions
  • For each seller, show the average selling price
    over the last 10 items sold (physical window)

CREATE STREAM LastTenAvg SELECT
sellerID, AVG(price) OVER(PARTITION BY sellerID
ROWS 9 PRECEDING), Current_time FROM
ClosedPrice
18
Optimizing Window AVG in ESL
  • For each expired tuple decrease the count by one
    and the sum by the expired valueworks for
    logical physical windows
  • WINDOW AGGREGATE avg(Next Real) Real TABLE
    state(tsum Int, cnt Real) TABLE
    inwindow(wnext Real)
  • INITIALIZE INSERT INTO state
    VALUES (Next, 1)
  • ITERATE UPDATE state SET tsumtsumNext,
    cntcnt1 INSERT INTO RETURN SELECT
    tsum/cnt FROM state
  • EXPIRE /if there are expired tuples, take the
    oldest / UPDATE state SET cnt
    cnt-1, tsum tsum (select wnext
    FROM inwindow
    WHERE oldest(inwindow))

19
MAX
  • System maintains inwindow
  • Remove dominated (less older) values
  • The oldest is always the max.

WINDOW AGGREGATE max (Next Real) Real TABLE
inwindow(wnext real) INITIALIZE etc.
/system adds new tuples to inwindow/
ITERATE DELETE FROM inwindow WHERE
wnext ltNext INSERT INTO RETURN
SELECT wnext FROM inwindow
WHERE oldest(inwindow)
EXPIRE /expired tuples removed
automatically/
20
For Each Aggregate two versions
  • The traditional Base aggregate with terminate
  • The Window aggregate with inwindow and expire.
  • These definitions will take care of both logical
    and physical windows.
  • But there are more complications slides and
  • tumbles.

21
Slides and Tumbles
  • Every two minutes, show the average selling
    price over the last 10 minutes (logical window)


CREATE STREAM LastTenAvg SELECT sellerID,
max(price) OVER(RANGE 10 MINUTE PRECEDING
SLIDE 2
MINUTE), Current_time FROM ClosedPrice
Here the window is W10 and the slide is S2.
Tumble When S W
22
SLIDEs
  • The slide constructs divides a window into panes,
    results only returned at the end of each pane
  • Slide is conducive to optimization.
  • Combine summaries into the desired aggregation
  • E.g. MAX(1, 2, 3, 4) MAX(MAX(1,2), MAX(3,4))
    4
  • I.e., for MAX, we can perform MAX on subsets of
    numbers as local summaries, then combine them
    together to get the true MAX
  • Proposed before but what constructs should be
    used to integrate these concepts into the
    language?

23
Slides Tumbles--Examples
  • Tumble where the SLIDE size is equal or larger
    than the window size
  • E.g. Once every 50 tuples, compute and return
    average over the last 10 tuples
  • Easy to optimize
  • Skip the first 40 tuples of every 50 tuples, and
    compute the blocking base version of the
    aggregate on the last 10
  • Slide where slide size is smaller than the
    window size
  • E.g. Once every 10 tuples, compute and return
    average over the last 50 tuples
  • Naïve implementation--not optimized
  • Perform incremental maintenance on every incoming
    tuple
  • Ignore RETURN statements for most incoming tuples
  • Only invoke RETURN once every 10 tuples

24
Pane-based SLIDE Optimization
  • Two-level cascading aggregates using two existing
    aggregates
  • Perform sub-aggregation inside each pane using
    the base aggregate No need for incremental
    maintenance here
  • Computed with a blocking aggregate once for each
    pane
  • Combine the summary tuples using the window
    aggregate that returns on every incoming tuple
    (non-blocking)
  • With incremental maintenance here
  • At any time, only the last un-finished pane needs
    to store data tuples
  • all finished panes are reduced to one reusable
    summary tuple

25
Pane-based SLIDE optimization
  • Example SUM with window size 50 tuples, and
    slide size 10 tuples
  • First create a stream of summary tuples using
    base aggregate
  • CREATE STREAM temp AS (
  • SELECT itemID, base_max(sale_price) as s
  • OVER(PARTITION BY itemID ROWS 9 PRECEDING
    SLIDE 10)
  • FROM Auction)
  • Then apply the window version of the aggregate
  • SELECT itemID, window_max(s)
  • OVER(PARTITION BY itemID ROWS 4 PRECEDING)
  • FROM temp
  • This simple approach can be used to implement
    very complex aggregations (e.g. ensemble
    classifiers)
  • Applies uniformly to logical/physical windows
    defined in SQL or in an external language

26
Summary
  • Logical, Physical x tumble, slide,
    unlimited_preceding
  • Six different types of calls, supported by two
    definitions
  • Both SQL or procedural languages can be used in
    the definition.

27
Window UDAs vs. Base UDAs
  • Base UDAs
  • called as traditional SQL-2 aggregates, with
  • optional GROUP BY
  • Window UDAs
  • called with SQL2003 OVER clause
  • logical or physical windows
  • optional PARTITION BY and SLIDE clauses in ESL
  • Clear semantics and optimization rules unify
  • UDAsSQL or PL-defined, algebraic or not
  • window (logical physical), slice, tumbles,
    etc.
  • System and user roles in optimization.

28
Window UDAs Physical Optimization
  • The Stream Mill System provides efficient support
    for
  • Management of new expiring tuples in buffer
  • Main memory intelligent paging into disk
  • Events caused by tuple expiration
  • Users can access the buffer as the table called
    inwindow

29
Conclusion
  • Language Technology
  • ESL a very powerful language for data stream and
    DB applications
  • Simple semantics and unified syntax conforming
    to SQL2003 standards
  • Strong case for the DB-oriented approach to data
    streams
  • System Technology
  • Some performance-oriented techniques
    well-developede.g., buffer management for
    windows
  • For others work is still in progressstay tuned
    for latest news
  • Stream Mill is up and running http//wis.cs.ucla.
    edu/stream-mill

30
  • The End
  • THANK YOU !

31
References
  • 1ATLaS user manual. http//wis.cs.ucla.edu/atlas
    .
  • 2SQL/LPP A Time Series Extension of SQL Based
    on Limited Patience Patterns, volume 1677 of
    Lecture Notes in Computer Science. Springer,
    1999.
  • 4A. Arasu, S. Babu, and J. Widom. An abstract
    semantics and concrete language for continuous
    queries over streams and relations. Technical
    report, Stanford University, 2002.
  • 5B. Babcock, S. Babu, M. Datar, R. Motwani, and
    J. Widom. Models and issues in data stream
    systems. In PODS, 2002.
  • 9D. Carney, U. Cetintemel, M. Cherniack, C.
    Convey, S. Lee, G. Seidman, M. Stonebraker, N.
    Tatbul, and S. Zdonik. Monitoring streams - a new
    class of data management applications. In VLDB,
    Hong Kong, China, 2002.
  • 10J. Celko. SQL for Smarties, chapter Advanced
    SQL Programming. Morgan Kaufmann, 1995.
  • 11S. Chandrasekaran and M. Franklin. Streaming
    queries over streaming data. In VLDB, 2002.
  • 12J. Chen, D. J. DeWitt, F. Tian, and Y. Wang.
    NiagaraCQ A scalable continuous query system for
    internet databases. In SIGMOD, pages 379-390, May
    2000.
  • 13C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk,
    and O. Spatscheck. Gigascope A stream database
    for network applications. In SIGMOD Conference,
    pages 647-651. ACM Press, 2003.
  • 14Lukasz Golab and M. Tamer Özsu. Issues in
    data stream management. ACM SIGMOD Record,
    32(2)5-14, 2003.
  • 15J. M. Hellerstein, P. J. Haas, and H. J.
    Wang. Online aggregation. In SIGMOD, 1997.
  • 16 Yijian Bai, Hetal Thakkar, Chang Luo, Haixun
    Wang, Carlo Zaniolo A Data Stream Language and
    System Designed for Power and Extensibility.
    Proc. of the ACM 15th Conference on Information
    and Knowledge Management (CIKM'06), 2006
  • 17 Yijian Bai, Hetal Thakkar, Haixun Wang and
    Carlo Zaniolo Optimizing Timestamp Management in
    Data Stream Management Systems. ICDE 2007.

32
References (Cont.)
  • 18 Yan-Nei Law, Haixun Wang, Carlo Zaniolo
    Query Languages and Data Models for Database
    Sequences and Data Streams. VLDB 2004 492-503
  • 19 Sam Madden, Mehul A. Shah, Joseph M.
    Hellerstein, and Vijayshankar Raman. Continuously
    adaptive continuous queries over streams. In
    SIGMOD, pages 49-61, 2002.
  • 20R. Motwani, J. Widom, A. Arasu, B. Babcock,
    M. Datar S. Babu, G. Manku, C. Olston, J.
    Rosenstein, and R. Varma. Query processing,
    approximation, and resource management in a data
    stream management system. In First CIDR 2003
    Conference, Asilomar, CA, 2003.
  • 21R. Ramakrishnan, D. Donjerkovic, A.
    Ranganathan, K. Beyer, and M. Krishnaprasad.
    SRQL Sorted relational query language, 1998.
  • 23Reza Sadri, Carlo Zaniolo, and Amir M.
    Zarkesh andJafar Adibi. A sequential pattern
    query language for supporting instant data
    minining for e-services. In VLDB, pages 653-656,
    2001.
  • 24Reza Sadri, Carlo Zaniolo, Amir Zarkesh, and
    Jafar Adibi. Optimization of sequence queries in
    database systems. In PODS, Santa Barbara, CA, May
    2001.
  • 25P. Seshadri. Predator A resource for
    database research. SIGMOD Record, 27(1)16-20,
    1998.
  • 26P. Seshadri, M. Livny, and R. Ramakrishnan.
    SEQ A model for sequence databases. In ICDE,
    pages 232-239, Taipei, Taiwan, March 1995.
  • 27Praveen Seshadri, Miron Livny, and Raghu
    Ramakrishnan. Sequence query processing. In ACM
    SIGMOD 1994, pages 430-441. ACM Press, 1994.
  • 28M. Sullivan. Tribeca A stream database
    manager for network traffic analysis. In VLDB,
    1996.
  • 29D. Terry, D. Goldberg, D. Nichols, and B.
    Oki. Continuous queries over append-only
    databases. In SIGMOD, pages 321-330, 6 1992.
  • 30Peter A. Tucker, David Maier, Tim Sheard, and
    Leonidas Fegaras. Exploiting punctuation
    semantics in continuous data streams. IEEE Trans.
    Knowl. Data Eng, 15(3)555-568, 2003.
  • 31Haixun Wang and Carlo Zaniolo. ATLaS a
    native extension of SQL for data minining. In
    Proceedings of Third SIAM Int. Conference on Data
    MIning, pages 130-141, 2003.
Write a Comment
User Comments (0)
About PowerShow.com