Tuning the top-k view update process - PowerPoint PPT Presentation

Slides: 34
Provided by: OZZY

Transcript and Presenter's Notes

Title: Tuning the top-k view update process


1
Tuning the top-k view update process
  • Eftychia Baikousi
  • Panos Vassiliadis


2
Forecast
  • Problem: maintaining materialized top-k views
    when updates occur in the base relation
  • Extra difficulty: addressing the problem in the
    presence of high deletion rates
  • The crux of the approach is to materialize an
    appropriate number of extra tuples, kcomp, to
    sustain deletion rates that are drastically
    higher than average
  • The correct estimation and fine-tuning of kcomp
    is not obvious
  • We use appropriate probabilistic methods

3
Contents
  • Motivation and Problem Definition
  • Overview of our Method
  • Computation of rates affecting the view
  • Computation of kcomp
  • Fine tuning kcomp
  • Experiments
  • Conclusions

5
Top-k query
  • Given
    • a relation R (id, x1, x2, x3) and
    • a query Q = sum(x1, x2, x3)
  • Find the k tuples with the highest grades according to Q

R
id  x1   x2   x3   sum
a   0.3  0.6  0.7  1.6
b   0.2  0.3  0.4  0.9
c   0.4  0.5  0.9  1.8
d   0.7  0.6  0.1  1.4

Top-2 tuples: c (1.8) and a (1.6)

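The top-k query above can be sketched in a few lines of Python. This is only an illustration of the definition on the example relation R; the helper name `top_k` is made up for this sketch and is not part of the presentation.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3).
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def top_k(relation, k, score=sum):
    """Return the k (id, score) pairs with the highest score.

    `score` plays the role of the query Q; here it defaults to
    sum(x1, x2, x3) as in the example.
    """
    scored = ((tid, score(values)) for tid, values in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

top2 = top_k(R, 2)
print([tid for tid, _ in top2])  # ['c', 'a']
```
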
6
Motivating Example
  • Shopping center
  • Customers sign in with a palmtop (PDA)
  • Need for advertisements and special offers to
    customers
  • Given
    • a relation Customers (id, name, age, salary, ...)
    • a materialized view V of the top-2 customers
      (younger and highly paid) according to the query
      Q = -age + 2*salary
  • Maintain the view V
    • Customers sign in and out (e.g., train
      departures, working hours)

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22

7
Problem definition
  • Given
    • a base relation R (ID, X, Y) that originally
      contains N tuples,
    • a materialized view V that contains the top-k
      tuples of the form (id, val), where val is the
      score according to a function Q(x, y) = ax + by
      and a, b are constant parameters,
    • the update ratios λins, λdel and λupd for
      insertions, deletions and updates respectively
      over the base relation R
  • Compute
    • kcomp, of the form kcomp = k + Δk
  • Such that
    • the view will contain at least k tuples,
      k ≤ kcomp, with probability p, after a period T

V (id, Q)

8
Related Work
  • Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo
    Chen
  • Efficient Maintenance of Materialized Top-k
    Views, ICDE '03
  • Maintain a materialized top-k view when updates
    occur in the base table
  • Compute a kmax (instead of the necessary k),
    adjusted at runtime so that a refill query is
    rarely needed
  • Formulates the problem through a random walk model
  • The method is theoretically guaranteed to work
    well only when
    • insertions and deletions are equally probable,
      pins = pdel, or
    • insertions are more frequent than deletions,
      pins > pdel
  • There is no quality-of-service guarantee when
    deletions are more probable than insertions,
    pins < pdel

9
Motivating Example
  • Customers sign in and out
    • Due to train departures, working hours
  • At certain time periods, deletions are more
    probable than insertions, pins < pdel
  • The view will not contain at least k tuples

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22

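To make the failure mode concrete, here is a minimal sketch in which a single sign-out (a deletion) leaves the top-2 view with fewer than k tuples, so a refill query over the base relation becomes necessary. The function names `score` and `build_view` are hypothetical helpers for this illustration only.

```python
K = 2

def score(age, salary):
    # Q = -age + 2 * salary, as in the motivating example.
    return -age + 2 * salary

# Base relation Customers: id -> (name, age, salary).
customers = {
    1: ("John", 18, 20),
    2: ("Mary", 42, 25),
    3: ("Bill", 26, 35),
    4: ("Peter", 57, 37),
}

def build_view(relation, size):
    """Materialize the `size` best tuples as a dict id -> (name, Q)."""
    ranked = sorted(
        relation.items(),
        key=lambda item: score(item[1][1], item[1][2]),
        reverse=True,
    )
    return {cid: (name, score(age, salary))
            for cid, (name, age, salary) in ranked[:size]}

view = build_view(customers, K)      # Bill (44) and John (22)

# Bill signs out: the deletion hits both the base relation and the view.
del customers[3]
view.pop(3, None)

needs_refill = len(view) < K         # True: only John is left in the view
if needs_refill:
    view = build_view(customers, K)  # expensive refill query over Customers

print(view)  # {1: ('John', 22), 4: ('Peter', 17)}
```

Materializing extra tuples up front, as the rest of the presentation proposes, aims at making this refill step rare.
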

11
Overview of the method
  1. Compute the ratios of the incoming source updates
    that affect the view
  2. Compute kcomp
  3. Fine tune kcomp

12
Empirical Cumulative Distribution Function ECDF
  • ECDF is a non-parametric cumulative distribution
    function that adapts itself to the data
  • Definition: Fn(x)
    • represents the proportion of observations in a
      sample less than or equal to x
    • assigns the probability 1/n to each of the n
      observations in the sample
    • estimates the true population proportion F(x)
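The definition of Fn(x) translates directly into code. The sketch below builds an ECDF from a sample of scores (here the four Q values of the Customers example); `make_ecdf` is an illustrative name, not something defined in the presentation.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return Fn, where Fn(x) is the fraction of observations <= x.

    Each of the n observations contributes probability 1/n.
    """
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

scores = [22, 8, 44, 17]   # Q values of the four customers
Fn = make_ecdf(scores)
print(Fn(17))  # 0.5
print(Fn(7))   # 0.0
print(Fn(44))  # 1.0
```
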

13
Computation of update rates that affect V
  • Given
    • a relation Customers (id, name, age, salary, ...)
      having N = 4 tuples
    • a materialized view V containing the top-2 tuples
      (k = 2) of the form (id, Q), where
      Q = -age + 2*salary is the score
    • Update ratios λins = 1, λdel = 2, λupd = 0
  • Find λins_aff and λdel_aff (insertions and
    deletions affecting the view)

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name  Q
Bill  44
John  22

14
Computation of update rates that affect V
  • Given
    • N = 4, λins = 1, λdel = 2, λupd = 0
  • We compute the following
    • updates are treated as a combination of deletions
      and insertions
    • from the ECDF, the probability of a new tuple
      affecting the view
    • the ratios affecting the view
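The exact formulas of this slide did not survive in the transcript, so the following is only a rough sketch of the three steps under stated assumptions. It assumes that each update counts as one deletion plus one insertion, that an inserted tuple affects the view when its score exceeds the score of the k-th view tuple (estimated from the ECDF of the current scores), and, as a further assumption that is not taken from the slides, that deletions pick base tuples uniformly at random so that a deletion hits the view with probability k/N. The function name `affecting_rates` is hypothetical.

```python
from bisect import bisect_right

def affecting_rates(scores, k, lam_ins, lam_del, lam_upd):
    """Estimate how many insertions/deletions affect a top-k view.

    ASSUMPTIONS (not confirmed by the slides):
      * an update = one deletion + one insertion,
      * insertions follow the empirical score distribution,
      * deletions are uniform over the N base tuples.
    """
    ordered = sorted(scores)
    n = len(ordered)
    val_k = ordered[-k]                      # score of the k-th best tuple

    ins_total = lam_ins + lam_upd            # updates also insert
    del_total = lam_del + lam_upd            # updates also delete

    # P(z > val_k) = 1 - Fn(val_k), with Fn the ECDF of the scores.
    p_ins = 1.0 - bisect_right(ordered, val_k) / n
    p_del = k / n                            # uniform-deletion assumption

    return ins_total * p_ins, del_total * p_del

# Slide values: N = 4 scores, k = 2, ratios 1 / 2 / 0.
ins_aff, del_aff = affecting_rates([22, 8, 44, 17], k=2,
                                   lam_ins=1, lam_del=2, lam_upd=0)
print(ins_aff, del_aff)  # 0.25 1.0
```

With only four tuples the estimates are crude; the point is the shape of the computation, not the numbers.
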

16
Computation of kcomp
  • Compute kcomp
    • such that it will guarantee that the view will
      contain at least k tuples, k ≤ kcomp, with
      probability p, after a period of operation T
    • that is of the form kcomp = k + Δk

Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17

V
name   Q
Bill   44
John   22
Peter  17

17
Computation of kcomp
Customers
id  name   age  salary  Q
1   John   18   20      22
2   Mary   42   25      8
3   Bill   26   35      44
4   Peter  57   37      17
5   Kate   25   30      25

V
name   Q
Bill   44
Kate   25
John   22
Peter  17

  • There is 1 insertion and 2 deletions affecting
    the view
    • Tuple (5, Kate, 25, 30) is inserted and
    • Tuples (3, Bill, 26, 35) and (4, Peter, 57, 37)
      are deleted from the view
  • The view will contain 2 tuples, as initially
    needed
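The worked example can be replayed as a small script. The formula used for the extension below (extra tuples = affecting deletions minus affecting insertions, floored at zero) is an assumption chosen because it reproduces the three-tuple view of the example; the actual formulas for kcomp are on an auxiliary slide that is not part of this transcript.

```python
k = 2
ins_aff, del_aff = 1, 2       # updates affecting the view in the example

# ASSUMPTION: simplest possible extension, i.e. the deletions that are
# not compensated by insertions. Not the authors' confirmed formula.
delta_k = max(0, del_aff - ins_aff)
k_comp = k + delta_k          # 3 tuples are materialized instead of 2

# View with k_comp tuples: name -> Q (scores as listed on the slide).
view = {"Bill": 44, "John": 22, "Peter": 17}
assert len(view) == k_comp

# One insertion affecting the view.
view["Kate"] = 25

# Two deletions affecting the view.
for name in ("Bill", "Peter"):
    del view[name]

remaining = sorted(view.items(), key=lambda item: item[1], reverse=True)
print(remaining)        # [('Kate', 25), ('John', 22)]
print(len(view) >= k)   # True: the view still holds k tuples
```
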

19
Fine tune kcomp
  • kcomp is expressed as a formula depending on
    λins_aff and λdel_aff, the ratios of insertions
    and deletions affecting the view
  • The probability of a tuple affecting the view may
    vary according to probabilistic properties
  • Fine-tune kcomp by adding the appropriate variance

20
Fine tune kcomp
  • The probability of a new tuple z affecting the
    view is p(z > val_k)
  • Bernoulli experiment with 2 possible events
    • New tuple z affecting the view, with probability
      p(z)
    • New tuple z not affecting the view, with
      probability 1 - p(z)
  • The number of successes of λins Bernoulli
    experiments follows a Binomial distribution with
    variance λins · p(z) · (1 - p(z))

21
Fine tune kcomp
  • In the worst case, in order to guarantee that the
    view will contain at least k tuples with
    confidence 95%, kcomp is computed as a function of
    • VARins, the variance of the insertions
    • VARdel, the variance of the deletions
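The formula itself is missing from the transcript, so the sketch below shows only one plausible reading of the idea. It assumes a worst case of "fewest affecting insertions, most affecting deletions", with each count shifted by about two standard deviations (roughly 95% under a normal approximation of the Binomial). The function names, the factor `z`, and the input numbers are illustrative assumptions, not values from the presentation.

```python
import math

def binomial_variance(n_trials, p):
    """Variance of a Binomial(n_trials, p) count: n * p * (1 - p)."""
    return n_trials * p * (1.0 - p)

def fine_tuned_k_comp(k, lam_ins, p_ins, lam_del, p_del, z=2.0):
    """Extend k so that the view survives a pessimistic update period.

    ASSUMPTION: worst case = expected affecting insertions minus z
    standard deviations, expected affecting deletions plus z standard
    deviations. This is a sketch, not the authors' confirmed formula.
    """
    var_ins = binomial_variance(lam_ins, p_ins)
    var_del = binomial_variance(lam_del, p_del)

    worst_ins = max(0.0, lam_ins * p_ins - z * math.sqrt(var_ins))
    worst_del = lam_del * p_del + z * math.sqrt(var_del)

    delta_k = max(0.0, worst_del - worst_ins)
    return k + math.ceil(delta_k)

# Illustrative numbers only: 1000 insertions and 2000 deletions, each
# affecting the view with probability 0.001.
example = fine_tuned_k_comp(k=10, lam_ins=1000, p_ins=0.001,
                            lam_del=2000, p_del=0.001)
print(example)  # 15
```

Without the variance terms (z = 0) the same inputs would ask for only one extra tuple, which illustrates why the fine-tuning step costs some memory in exchange for fewer misses.
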

23
Experimental methodology
  • Test the following methods
    • kcomp without fine-tuning
    • kcomp with fine-tuning
    • Yi et al. @ ICDE '03
  • For the following measures
    • Number of tuples (# tuples) deleted from the view
      that fall below the threshold value of k
    • Memory overhead for kcomp with/without
      fine-tuning, as the number of extra tuples that
      need to be kept in the view
    • Number of extra tuples for kcomp with/without
      fine-tuning compared to the number of extra
      tuples of the related work

24
Experimental methodology
  • Experimental parameters

Parameter                                  Symbol  Values
Size of source table R (tuples)            R       1x10^5, 5x10^5, 1x10^6, 2x10^6
Size of mat. view (tuples)                 k       5, 10, 100, 1000
Size of update stream (pct over R)         λ       1/1000, 1/100
Deletion rate over insertion rate (ratio)  D/I     1.0, 1.5, 2.0

  • Synthetic data sets
    • Gaussian distribution with mean μ = 50 and
      variance σ = 10
    • Negative exponential distribution with parameters
      a = 1.0 for X and a = 2.0 for Y
    • Zipf distribution with parameter a = 2.1

25
Max average misses
  • kcomp without fine-tuning
  • Gaussian distribution
  • Plots: as a function of R and λ; as a function of
    k and D/I

26
Memory overhead
  • Number of extra tuples as a function of R and D/I

27
Comparison with related work
  • Number of extra tuples of kcomp with fine tuning
    compared with kmax of the related work as a
    function of R

28
Comparison with related work
  • Number of extra tuples of kcomp with fine tuning
    compared with kmax of the related work as a
    function of k

30
Conclusions
  • We handled the problem of maintaining
    materialized top-k views in the presence of high
    deletion rates
  • The method comprises the following steps
    • a computation of the rate that actually affects
      the materialized view,
    • a computation of the necessary extension to k in
      order to handle the augmented number of deletions
      that occur, and
    • a fine-tuning part that adjusts this value to
      take the fluctuation of the statistical
      properties of this value into consideration

31
Thank you for your attention!
  • many thanks to our hosts!

This research was co-funded by the European Union
in the framework of the program Pythagoras II
of the Operational Program for Education and
Initial Vocational Training of the 3rd Community
Support Framework of the Hellenic Ministry of
Education, funded by 25% from national sources
and by 75% from the European Social Fund (ESF).

32
Auxiliary slides: Formulas for kcomp

33
Time to build top-k view in microseconds
N     K     Gauss    Negative exponential  Zipf
100K  5     328000   348500                242000
100K  10    333000   345667                239667
100K  100   335500   343000                239667
100K  1000  395333   406000                299500
500K  5     1650667  1715500               1216333
500K  10    1650667  1713000               1208333
500K  100   1653167  1710500               1205667
500K  1000  1736667  1796167               1291833
1M    5     3298667  3429000               2427167
1M    10    3301333  3426667               2429667
1M    100   3304000  3439500               2422167
1M    1000  3403167  3520500               2606667
2M    5     6650667  6900500               5406333
2M    10    6653167  6900833               4909000
2M    100   6747167  6906000               4906500
2M    1000  6895500  7082833               4992167