Title: Tuning the top-k view update process
1 Tuning the top-k view update process
- Eftychia Baikousi
- Panos Vassiliadis
2 Forecast
- Problem: maintaining materialized top-k views when updates occur in the base relation
- Extra difficulty: addressing the problem in the presence of high deletion rates
- The crux of the approach is to materialize an appropriate number of extra tuples, kcomp, to sustain deletion rates that are drastically higher than average
- The correct estimation and fine tuning of kcomp is not obvious
- We use appropriate probabilistic methods
3 Contents
- Motivation & Problem Definition
- Overview of our Method
- Computation of rates affecting the view
- Computation of kcomp
- Fine tuning kcomp
- Experiments
- Conclusions
5 Top-k query
- Given
- a relation R (id, x1, x2, x3) and
- a query Q: sum(x1, x2, x3)
- Find the k tuples with the highest grades according to Q
R
id x1 x2 x3 sum
a 0.3 0.6 0.7 1.6
b 0.2 0.3 0.4 0.9
c 0.4 0.5 0.9 1.8
d 0.7 0.6 0.1 1.4
Top-2 tuples: c (1.8) and a (1.6)
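As a minimal sketch of the top-k query on this slide, the snippet below scores each tuple of R with sum(x1, x2, x3) and keeps the k best. The function name `top_k` and the dictionary layout are illustrative choices, not part of the original slides.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3)
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}


def top_k(relation, k, score=sum):
    """Return the k (id, score) pairs with the highest score, best first."""
    scored = ((tid, score(values)) for tid, values in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


top2 = top_k(R, 2)
print([tid for tid, _ in top2])  # ['c', 'a']
```

`heapq.nlargest` avoids sorting the whole relation, which matters when k is much smaller than the number of tuples.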
6 Motivating Example
- Shopping Center
- Customers sign in with a palmtop (PDA)
- Need for advertisements and special offers to customers
- Given
- a relation Customers (id, name, age, salary, ...)
- a materialized view V of the top-2 customers (younger and highly paid) according to the query Q = -age + 2·salary
- Maintain the view V
- Customers sign in and out (e.g., train departures, working hours)
Customers
id name age salary Q
1 John 18 20 22
2 Mary 42 25 8
3 Bill 26 35 44
4 Peter 57 37 17
V
name Q
Bill 44
John 22
7 Problem definition
- Given
- a base relation R (ID, X, Y) that originally contains N tuples,
- a materialized view V that contains the top-k tuples of the form (id, val), where val is the score according to a function Q(x, y) = a·x + b·y and a, b are constant parameters,
- the update ratios λins, λdel and λupd for insertions, deletions and updates respectively over the base relation R,
- Compute
- kcomp, of the form kcomp = k + Δk
- Such that
- the view will contain at least k tuples, k ≤ kcomp, with probability p, after a period T
V (id, Q)
8 Related Work
- Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo Chen: Efficient Maintenance of Materialized Top-k Views, ICDE 2003
- Maintains a materialized top-k view when updates occur in the base table
- Computes a kmax (instead of the necessary k)
- kmax is adjusted at runtime so that a refill query is rarely needed
- Formulates the problem through a random walk model
- The method is theoretically guaranteed to work well only when
- insertions and deletions are equally probable, pins = pdel
- insertions are more probable than deletions, pins > pdel
- There is no quality-of-service guarantee when
- deletions are more probable than insertions, pins < pdel
9 Motivating Example
- Customers sign in and out
- Due to train departures, working hours
- At certain time periods, deletions are more probable than insertions, pins < pdel
- The view will no longer contain at least k tuples
Customers
id name age salary Q
1 John 18 20 22
2 Mary 42 25 8
3 Bill 26 35 44
4 Peter 57 37 17
V
name Q
Bill 44
John 22
11 Overview of the method
- Compute the ratios of the incoming source updates that affect the view
- Compute kcomp
- Fine tune kcomp
12 Empirical Cumulative Distribution Function (ECDF)
- The ECDF is a non-parametric cumulative distribution function that adapts itself to the data
- Definition
- $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[x_i \le x]$
- Fn(x)
- represents the proportion of observations in a sample less than or equal to x
- assigns the probability 1/n to each of the n observations in the sample
- estimates the true population proportion F(x)
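The ECDF described above can be sketched in a few lines. The helper name `make_ecdf` is hypothetical; the sample is the list of Q scores of the four Customers tuples from the running example.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return Fn, where Fn(x) is the fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(data, x) / n

    return ecdf


# Q scores of the four Customers tuples in the running example
scores = [22, 8, 44, 17]
Fn = make_ecdf(scores)
print(Fn(7), Fn(17), Fn(22), Fn(44))  # 0.0 0.5 0.75 1.0
```

Each observation contributes a step of height 1/n, so Fn jumps by 0.25 at every score in this four-tuple sample.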
13 Computation of update rates that affect V
- Given
- a relation Customers (id, name, age, salary, ...) having N = 4 tuples
- a materialized view V containing the top-2 tuples (k = 2) of the form (id, Q), where Q = -age + 2·salary is the score
- Update ratios λins = 1, λdel = 2, λupd = 0
- Find λins_aff and λdel_aff (the insertions and deletions affecting the view)
Customers
id name age salary Q
1 John 18 20 22
2 Mary 42 25 8
3 Bill 26 35 44
4 Peter 57 37 17
V
name Q
Bill 44
John 22
14 Computation of update rates that affect V
- Given
- N = 4, λins = 1, λdel = 2, λupd = 0
- We compute the following
- updates are treated as a combination of deletions and insertions
- from the ECDF, the probability of a new tuple affecting the view
- the ratios affecting the view
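The steps listed above can be sketched as follows. The slides give no explicit formulas in this text, so this is one possible reading rather than the authors' definition. Treating an update as one deletion plus one insertion, and taking the probability that a new tuple beats the k-th score from the ECDF, both follow the bullets. The assumption that deletions pick base tuples uniformly, so that a deletion hits the view with probability k/N, is purely illustrative, as are the function and parameter names.

```python
from bisect import bisect_right


def affecting_rates(scores, k, lam_ins, lam_del, lam_upd):
    """Estimate the insertions and deletions per period that touch the view.

    Assumptions (illustrative):
      * an update counts as one deletion plus one insertion;
      * an inserted tuple z affects the view when z > val_k, with
        probability 1 - Fn(val_k) read off the ECDF of current scores;
      * deletions pick base tuples uniformly, so a deletion hits the
        view with probability k / N (NOT stated in the slides).
    """
    n = len(scores)
    ranked = sorted(scores)
    val_k = ranked[-k]                            # k-th highest score
    p_ins = 1 - bisect_right(ranked, val_k) / n   # P(z > val_k)
    p_del = k / n
    ins_total = lam_ins + lam_upd
    del_total = lam_del + lam_upd
    return ins_total * p_ins, del_total * p_del


# Running example: N = 4, k = 2, one insertion and two deletions per period
ins_aff, del_aff = affecting_rates([22, 8, 44, 17], k=2,
                                   lam_ins=1, lam_del=2, lam_upd=0)
print(ins_aff, del_aff)  # 0.25 1.0
```

Under these assumptions the view loses tuples faster than it gains them, which is the situation the extra kcomp tuples are meant to absorb.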
16 Computation of kcomp
- Compute kcomp
- such that it guarantees that the view will contain at least k tuples, k ≤ kcomp, with probability p, after a period of operation T
- that is of the form kcomp = k + Δk
Customers
id name age salary Q
1 John 18 20 22
2 Mary 42 25 8
3 Bill 26 35 44
4 Peter 57 37 17
V
name Q
Bill 44
John 22
Peter 17
17 Computation of kcomp
Customers
id name age salary Q
1 John 18 20 22
2 Mary 42 25 8
3 Bill 26 35 44
4 Peter 57 37 17
5 Kate 25 30 35
V
name Q
Bill 44
Kate 35
John 22
Peter 17
- There are 1 insertion and 2 deletions affecting the view
- Tuple (5, Kate, 25, 30) is inserted and
- Tuples (3, Bill, 26, 35) and (4, Peter, 57, 37) are deleted from the view
- The view will contain 2 tuples, as initially needed
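The example on this slide can be replayed with a small simulation. This is a sketch under stated assumptions: the view is materialized with kcomp = 3 tuples for k = 2, an inserted tuple enters the view when its score beats the current lowest score in the view, and the helper names (`build_view`, `on_insert`, `on_delete`) are hypothetical.

```python
def score(age, salary):
    """Q = -age + 2*salary, the scoring function of the running example."""
    return -age + 2 * salary


def build_view(customers, size):
    """Materialize the `size` best tuples as a dict id -> score."""
    scored = {cid: score(age, salary)
              for cid, (_name, age, salary) in customers.items()}
    best = sorted(scored, key=scored.get, reverse=True)[:size]
    return {cid: scored[cid] for cid in best}


def on_insert(view, cid, age, salary):
    """Add the new tuple if it beats the lowest score currently in the view."""
    q = score(age, salary)
    if not view or q > min(view.values()):
        view[cid] = q


def on_delete(view, cid):
    """Drop the tuple from the view if it is there; no refill query."""
    view.pop(cid, None)


customers = {
    1: ("John", 18, 20),
    2: ("Mary", 42, 25),
    3: ("Bill", 26, 35),
    4: ("Peter", 57, 37),
}
k, k_comp = 2, 3

view = build_view(customers, k_comp)   # Bill, John, Peter
on_insert(view, 5, age=25, salary=30)  # Kate signs in
on_delete(view, 3)                     # Bill signs out
on_delete(view, 4)                     # Peter signs out
print(sorted(view), len(view) >= k)    # [1, 5] True
```

With only k = 2 tuples materialized, the two deletions would have left a single tuple in the view and forced a refill query against the base relation.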
19 Fine tune kcomp
- kcomp is expressed as a formula depending on
- λins_aff and λdel_aff, the ratios of insertions and deletions affecting the view
- The probability of a tuple affecting the view may vary according to probabilistic properties
- Fine tune kcomp by adding the appropriate variance
20 Fine tune kcomp
- The probability of a new tuple z affecting the view is p(z > valk)
- Bernoulli experiment with 2 possible events
- New tuple z affecting the view, with probability p(z)
- New tuple z not affecting the view, with probability 1 - p(z)
- The number of successes of λins Bernoulli experiments follows a Binomial distribution with
- VARIANCE $= \lambda_{ins} \, p(z) \, (1 - p(z))$
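To make the Binomial step concrete, the sketch below computes the mean and variance of the number of insertions that affect the view, using the standard Binomial formulas n·p and n·p·(1 - p). The numbers 1000 and 0.25 are hypothetical inputs chosen for illustration.

```python
def binomial_mean_var(n_trials, p):
    """Mean and variance of the number of successes in n_trials Bernoulli trials."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    mean = n_trials * p
    var = n_trials * p * (1 - p)
    return mean, var


# Hypothetical period: 1000 insertions, each beating val_k with probability 0.25
mean_ins, var_ins = binomial_mean_var(1000, 0.25)
print(mean_ins, var_ins)  # 250.0 187.5
```

The same helper applies to deletions once the probability of a deletion hitting the view is known.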
21 Fine tune kcomp
- In the worst case, in order to guarantee with 95% confidence that the view will contain at least k tuples, kcomp is computed with a formula in which
- VARins denotes the variance of the insertions
- VARdel denotes the variance of the deletions
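The sketch below illustrates how such a variance-based safety margin could work. It is not the authors' formula, which does not appear in this text. It assumes a normal approximation, treats insertions and deletions as independent, and uses the one-sided 95% quantile z = 1.645. The function name and all input values are hypothetical.

```python
import math


def k_comp_sketch(k, ins_aff, del_aff, var_ins, var_del, z=1.645):
    """Illustrative k_comp = k + delta_k with a 95% one-sided safety margin.

    Assumption (not from the slides): the net loss of view tuples over the
    period is roughly normal with mean del_aff - ins_aff and variance
    var_ins + var_del.
    """
    expected_loss = del_aff - ins_aff
    margin = z * math.sqrt(var_ins + var_del)
    delta_k = max(0, math.ceil(expected_loss + margin))
    return k + delta_k


# Hypothetical inputs: k = 10, 5 affecting insertions, 20 affecting deletions
print(k_comp_sketch(k=10, ins_aff=5.0, del_aff=20.0, var_ins=4.0, var_del=12.0))  # 32
```

Here the expected net loss is 15 tuples and the margin adds about 6.58 more, so 22 extra tuples are kept on top of k = 10.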
23 Experimental methodology
- Test the following methods
- kcomp without fine tuning
- kcomp with fine tuning
- Yi et al. @ ICDE 2003
- For the following measures
- Number of tuples (# tuples) deleted from the view that fall below the threshold value of k
- Memory overhead for kcomp with / without fine tuning, as the number of extra tuples that need to be kept in the view
- Number of extra tuples for kcomp with / without fine tuning, compared to the number of extra tuples of the related work
24 Experimental methodology
Parameter                                    Symbol  Values
Size of source table R (tuples)              R       1x10^5, 5x10^5, 1x10^6, 2x10^6
Size of materialized view (tuples)           k       5, 10, 100, 1000
Size of update stream (pct over R)           λ       1/1000, 1/100
Deletion rate over insertion rate (ratio)    D/I     1.0, 1.5, 2.0
- Synthetic data sets
- Gaussian distribution with mean µ = 50 and variance σ = 10
- Negative exponential distribution with parameters a = 1.0 for X and a = 2.0 for Y
- Zipf distribution with parameter a = 2.1
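A minimal sketch of how synthetic relations R(ID, X, Y) like these could be generated with the Python standard library. Several choices are assumptions: the value 10 is passed as the standard deviation of the Gaussian, the exponential parameter a is read as a rate, and the Zipf distribution is truncated at a hypothetical maximum rank of 1000 because the standard library has no Zipf sampler.

```python
import random
from itertools import accumulate


def synthetic_relation(n, dist, seed=0, zipf_max_rank=1000):
    """Generate n tuples (id, x, y) whose attributes follow the given distribution."""
    rng = random.Random(seed)
    if dist == "gaussian":
        # Assumption: 10 is used as the standard deviation
        draw_x = draw_y = lambda: rng.gauss(50, 10)
    elif dist == "exponential":
        # Assumption: a is the rate parameter (1.0 for X, 2.0 for Y)
        draw_x = lambda: rng.expovariate(1.0)
        draw_y = lambda: rng.expovariate(2.0)
    elif dist == "zipf":
        # Truncated Zipf: P(rank r) proportional to r^(-2.1), r = 1..zipf_max_rank
        ranks = range(1, zipf_max_rank + 1)
        cum = list(accumulate(r ** -2.1 for r in ranks))
        draw_x = draw_y = lambda: rng.choices(ranks, cum_weights=cum)[0]
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return [(i, draw_x(), draw_y()) for i in range(n)]


rows = synthetic_relation(1000, "gaussian")
print(len(rows))  # 1000
```

Fixing the seed makes runs repeatable, which helps when comparing methods on the same data.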
25 Max & average misses
- kcomp without fine tuning
- Gaussian distribution
- As a function of R and λ
- As a function of k and D/I
26 Memory overhead
- Number of extra tuples as a function of R and D/I
27 Comparison with related work
- Number of extra tuples of kcomp with fine tuning compared with kmax of the related work, as a function of R
28 Comparison with related work
- Number of extra tuples of kcomp with fine tuning compared with kmax of the related work, as a function of k
30 Conclusions
- We handled the problem of maintaining materialized top-k views in the presence of high deletion rates
- The method comprises the following steps
- a computation of the rate that actually affects the materialized view,
- a computation of the necessary extension to k in order to handle the augmented number of deletions that occur, and
- a fine tuning part that adjusts this value to take the fluctuation of its statistical properties into consideration
31 Thank you for your attention!
- many thanks to our hosts!
This research was co-funded by the European Union in the framework of the program Pythagoras II of the Operational Program for Education and Initial Vocational Training of the 3rd Community Support Framework of the Hellenic Ministry of Education, funded by 25% from national sources and by 75% from the European Social Fund (ESF).
32 Auxiliary slides: Formulas for kcomp
33 Time to build top-k view in microseconds
N K Gauss Negative exponential Zipf
100K 5 328000 348500 242000
100K 10 333000 345667 239667
100K 100 335500 343000 239667
100K 1000 395333 406000 299500
500K 5 1650667 1715500 1216333
500K 10 1650667 1713000 1208333
500K 100 1653167 1710500 1205667
500K 1000 1736667 1796167 1291833
1M 5 3298667 3429000 2427167
1M 10 3301333 3426667 2429667
1M 100 3304000 3439500 2422167
1M 1000 3403167 3520500 2606667
2M 5 6650667 6900500 5406333
2M 10 6653167 6900833 4909000
2M 100 6747167 6906000 4906500
2M 1000 6895500 7082833 4992167