View by Category

Loading...

PPT – Advanced Database System PowerPoint presentation | free to view

The Adobe Flash plugin is needed to view this content

About This Presentation

Write a Comment

User Comments (0)

Transcript and Presenter's Notes

Advanced Database System

- CS 641
- Lecture 13

Cost-based enumeration

- When we turn the preferred logical query plan

into a physical plan, we need to consider

different physical plans and evaluate or estimate

the cost of each - We pick the physical query plan with the least

cost and pass this plan to the query execution

engine.

Generating physical plans

- To estimate each physical plan, we select
- An order and grouping for associative-and-commutat

ive operations (joins, unions and intersections) - An algorithm for each operator in the logical

plan - Additional operators that are needed
- The way in which arguments are passed from one

operator to the next.

Estimate sizes of the intermediate relations

- We need
- Given accurate estimates
- Easy to compute
- Are logically consistent

Estimate the size of a projection

- For projection, the size of the result is

computable. - It produces a result tuple for every argument

tuple. - The only change is the length of tuples
- Consider as bag projection
- For set projection, d operator need to be used.

Example

- R(a,b,c)
- a, b 4 bytes integer c 100 bytes string
- Tuple header 12 bytes
- 120 bytes per tuple
- Block 1024 bytes, block header 24 bytes.
- T(R)10,000 B(R)1250
- Spab, c(R)
- 116 bytes per tuple, T(S)10,000, B(S)1250
- Upa, b(R)
- 20 bytes per tuple, T(U)10,000, B(U)200

Estimate the size of a selection

- Selection operation reduces the number of tuples,

but the size of each tuple remains the same - SsAc(R)
- Estimation T(S)T(R)/V(R,a)
- Ssaltc(R)
- Estimation T(S)T(R)/3
- SsAc(R)
- Estimation T(S)T(R) or T(S)T(R)(V(R,a)-1)/V(R,a

) - For condition C is the AND of several equalities

and inequalities, the effect will be the

multiplication of the selectivity factor for each

condition.

Example 16.21

- R(a,b,c), Ssa10 AND blt20(R)
- T(R)10,000, V(R,a)50
- Estimation T(R)/(503)
- Special cases
- Ssa10 AND agt20(R)
- Estimation T(R)/(503)
- actual result 0
- Query optimizer have to deal with special cases.

OR of conditions

- SsC1 OR C2(R)
- If assume no tuple will satisfy both conditions
- The size will be the sum of the tuples satisfy

each - It will even larger than size of R!
- We can take the smaller one as the estimation
- Another estimation (C1 and C2 are independent)
- n(1-(1-m1/n)(1-m2/n))
- n number of tuples
- m1 number of tuples satisfy C1
- m2 number of tuples satisfy C2

Example 16.22

- R(a,b), T(R)10,000, V(R,a)50
- Ssa10 OR blt20(R)
- n10,000 m1200, m23333
- Estimation
- 10000(1-(1-200/10000)(1-3333/10000))

NOT in a selection

- For NOT C, the estimation number of tuples
- T(R) the estimated number that satisfy C

Estimation the size of a join

- R(X,Y) S(Y,Z), Y is a single attribute
- According to Y values
- Disjointed T(R S)0
- Y might be the key of S and a foreign key of R

T(R S)T(R) - Almost all the tuples of R and S could have the

same Y-values T(R S)T(R)T(S)

Simplified assumption

- Containment of value sets
- If Y appearing in many relations, then each

relation get its values from a common set

y1,y2,y3,. Thus, if V(R,Y)V(S,Y), every Y-value

of R will be a Y-value in S. - Preservation of value sets
- An attribute which is not the join attribute does

not lose values from its set of possible values.

For example, A V(R S, A)V(R,A) as well as V(S

R, A)V(R,A)

Estimation under assumptions

- R(X,Y) S(Y,Z), Y is a single attribute
- Let V(R,Y)V(S,Y)
- For each tuple t in R, it has 1/(V(S,Y) chance to

join with a given tuple in S, thus expected

number of tuples that t joins is T(S)/V(S,Y) - The estimated size of R S will be

T(R)T(S)/V(S,Y) - If V(R,Y)V(S,Y)
- The estimated size of R S will be

T(R)T(S)/V(R,Y) - In general
- Size T(R S) will be T(R)T(S)/max(V(R,Y),

V(S,Y))

Example 16.23

Compute R S U

Example (Cont.)

- (R S) U
- T(R S) T(R)T(S)/max(V(R,b), V(S,b))40,000
- Next
- T(R S)T(U)/max(V(R S,c), V(U,c))400,000
- R (S U)
- T(S U) T(S)T(U)/max(V(S,c), V(U,c))20,000
- Next
- T(R)T(S U)/max(V(R,b), V(S U,b))400,000
- Coincidence
- NO!

Multiple join attributes

- R(X,Y) S(Y,Z), Y contains several attributes
- Suppose R(x,y1,y2) S(y1,y2,z)
- For a give tuple r in R, the possibility to agree

on y1 with a tuple s in S is - 1/max(V(R,y1), V(S,y1))
- For y2 1/max(V(R,y2), V(S,y2))
- Thus, the expected joined pairs
- T(R)T(S)/(max(V(R,y1),V(S,y1)) max(V(R,y2),

V(S,y2)))

General rule

- The size of natural join between two relations
- The estimate of the size of R S is computed by

multiplying T(R) by T(S) and dividing by the

larger of V(R,y) and V(S,y) for each attribute y

that is common to R and S.

Example 16.24

Compute

The estimate size is T(R)T(S)/max(V(R,b),V(S,d))m

ax(V(R,c),V(S,e))400

Example 16.25

Example 16.25

- R(a,b) U(c,d)
- Size T(R)T(U)5,000,000
- Then, join S(b,c)
- T(R)T(U)T(S)/(max(V(R,b),V(S,b))max(V(U,c),V(S,c)

))400,000 - Same result generated!

Equijoins

- The number of tuples in the result of an equijoin

can be computed exactly as for a natural join,

after accounting for the change in variable names.

Other theta-joins

- A selection following a product, and
- The number of tuples in a product is the number

of tuples in the relations involved - Equality and inequality comparisons can be

estimated using the techniques developed for

selections

Joins of many relations

- SR1R2R3Rn, attribute A appears in k of the

Ris, and the number of its sets of values in

these k relations viV(Ri,A) are v1v2v3

vk - Suppose pick a tuple from each relation, what is

the probability that all tuples selected agree on

A - Answer
- 1/v2v3...vk
- start with the product of the number of tuples in

each relation. Then, for each attribute A

appearing at least twice, divided by all but the

least of these V(R,A)s. - The number of values that remains in A after join

will be the least of these V(R,A)s

Example 16.26

- R(a,b,c)S(b,c,d)U(b,e)

Example 16.26 (Cont.)

- The estimate size
- T(R)T(S)T(U)/(( two largest of (V(R,b), V(S,b),

V(U,b)max(V(R,c),V(S,c)100020005000/((50200)

200)5000 - The number of values for each of the attribute

the least value count - a 100 b20 c100 d400 e500

Conclusion

- No matter how to order the natural joins of n

relations - The estimate rules, applied to each join

individually, yields the same estimate size, and - This size is the same as we apply the rules for

the join all n relations as a whole.

The size of Union

- Bag union
- The sum
- Set union
- The larger of two relations to the sum
- Using something in the middle

The size of intersection

- result
- Maybe 0 to the smaller of the two arguments
- Estimation
- Half of the smaller
- Consider as a special case of the natural join
- Not good! (Sometimes for bags)
- E.g. R(a,b) B S(a,b), R 2 copies of (0,1), S 3

copies - V(R,a)V(S,a)V(R,b)V(S,b)1
- T(R)2, T(S)3
- Estimation will be 2x3/(max(1,1)max(1,1))6, not

correct.

Difference

- R-S
- Result between T(R) to T(R)-T(S)
- Estimation T(R)-(1/2)T(S)

Duplication elimination

- Result
- Between 1 to the size of R.
- Estimation smaller of (1/2)T(R) and the product

of all the V(R,ai)s

Grouping and aggregation

- Similar to duplication elimination
- Estimation smaller of (1/2)T(R) or the product

of V(R,A)s

Exercises

- 16.4.1 a) b) g) i)
- 16.4.2

Estimate the cost of certain expressions

- The cost (number of I/Os) is influenced by
- The particular logical operator chosen to

implement the query - The size of intermediate results
- The physical operators used to implement logical

operators - The ordering of similar operations
- The method of passing arguments from one physical

operator to the next

Statistical information

- DBMS have to keep important information such as

T(R), B(R), V(R,a) etc. - Histogram used to compute values for a given

attribute - Equal-wdith a width w is chosen, along with a

constant v0(the smallest value). Counts for

v0vltv0w, v0wvltv02w, - Equal-height common percentiles, pick some

fraction p, and list the lowest value, the

fraction 2p, and so on - Most-frequent-values list most common values and

their numbers of occurrences. Can be combined

with previous two mechanisms

Example 16.27

- R(a,b)S(b,c)
- R.b 1200, 0150, 5100, others 550
- S.b 0100, 180, 270, others 250
- V(R,b)14, V(S,b)13.
- R 50 occurrence of each other value
- S 25 occurrence of each other value
- For 0 15010015,000
- For 1 200 7016,000
- For others
- 2 70503500 5 100252500
- Other nine values 5025911,250
- Total 1500016000350025001125048250
- compare to simple estimation 1000500/1435714

Example 16.28

- Jan(day, temp)
- July(day, temp)
- Query
- SELECT Jan.day, July.day
- FROM Jan, July
- WHERE Jan.tempJuly.temp
- Using equal-width histogram.

Example 16.28 (Cont.)

Example 16.28 (Cont.)

- We calculate for each bands
- V10 for each band
- 40-49 105/105
- 50-59 520/1010
- All other ranges have 0 joined tuples
- Total 51015 tuples
- If no histogram information
- V100 for 0-100
- 245245/100600 tuples

Computation of statistics

- Recomputation too often is not good
- Statistics tend not to change radically in a

short time - Somehow, inaccurate statistics are still useful
- Keeping up-to-date will cause hot spot
- Recomputation should be triggered automatically

after some period of time or after some number of

updates - Database administrator can issue recomputation

command directly if poor-performing query plans

are observed. - Recomputation is complex and time-consuming.
- Sometimes, we can sample a fraction of data when

doing recomputation.

Heuristics for reducing cost of logical query plan

- Estimating cost both before and after a

transformation can reduce the cost or avoid the

transformation otherwise.

Example 16.29

Example 16.29 (Cont.)

Example 16.29 (Cont.)

- To compare two plans, we add the estimated sizes

for all the nodes except the root and the leaves. - (a) 1005010001150
- (b) 10010001100
- Different answer if we have a smaller V(R,b) or

V(S,b)

Enumerating physical plans

- Baseline approach
- Exhaustive consider all the combinations of

choices, and choose the one with the smallest

cost - Other approaches are more efficient
- Top-down work down the tree of the logical query

plan from the root. - Bottom-up for each subexpression, compute the

costs of all possible ways. - Both approaches can limit the choices effectively

Bottom-up approaches

- Dynamic programming
- Simplified bottom-up method
- Consider only the best plan for each

subexpression - Not guarantee to yield the best plan, but it

often does - Selinger-style(system-R-style)
- Exploit additional properties that some of the

plans for a subexpression may have - Produce optimal overall plan.

Heuristic selection

- If the logical plan calls for a selection sAc

(R), and R has an index on A, then perform an

index-scan to obtain the tuples with Ac - More generally, for selections involving Ac and

other conditions, implement as the index-scan

selection followed by a further selection. - If an argument of a join has an index on the join

attribute(s), use an index-join with that

relation in the inner loop - If one argument of a join is sorted on the join

attribute(s), prefer a sort-join to a hash-join,

but not necessarily to an index-join - Computing the union or intersection of three or

more relations, group the smallest relations

first

Branch-and-bound enumeration

- Using heuristics to find a physical plan first,

let its cost be C - Consider other plans for subqueries
- If cost gt C, ignore
- If cost lt C, replace the corresponding

subexpression - Advantage
- We can choose to cut off the search and take the

best plan found so far. Reduce the calculation

time

Hill climbing

- Start with a heuristically selected physical plan
- Then, make small changes to the plan
- Replacing one method for an operator by another
- Reordering join, etc.
- If find a plan with smaller cost, change
- If no small modification can yield lower cost,

choose the current plan

Dynamic programming

- Keep for each subexpression only the plan of

least cost - As we work up the tree, consider possible

implementations of each node, assuming the best

plan for each subexpression is also used

Selinger-style optimization

- Improvement on dynamic programming
- Keeping other plans that have higher cost for

each subexpression, and produce a result that is

sorted in an order that may useful in higher

layers - The attribute(s) specified in a sort operator at

the root - The grouping attribute(s) of a later group-by

operator - The join attribute(s) of a later join

Exercises

- 16.5.2
- 16.5.3
- 16.5.4 a) b)

About PowerShow.com

PowerShow.com is a leading presentation/slideshow sharing website. Whether your application is business, how-to, education, medicine, school, church, sales, marketing, online training or just for fun, PowerShow.com is a great resource. And, best of all, most of its cool features are free and easy to use.

You can use PowerShow.com to find and download example online PowerPoint ppt presentations on just about any topic you can imagine so you can learn how to improve your own slides and presentations for free. Or use it to find and download high-quality how-to PowerPoint ppt presentations with illustrated or animated slides that will teach you how to do something new, also for free. Or use it to upload your own PowerPoint slides so you can share them with your teachers, class, students, bosses, employees, customers, potential investors or the world. Or use it to create really cool photo slideshows - with 2D and 3D transitions, animation, and your choice of music - that you can share with your Facebook friends or Google+ circles. That's all free as well!

For a small fee you can get the industry's best online privacy or publicly promote your presentations and slide shows with top rankings. But aside from that it's free. We'll even convert your presentations and slide shows into the universal Flash format with all their original multimedia glory, including animation, 2D and 3D transition effects, embedded music or other audio, or even video embedded in slides. All for free. Most of the presentations and slideshows on PowerShow.com are free to view, many are even free to download. (You can choose whether to allow people to download your original PowerPoint presentations and photo slideshows for a fee or free or not at all.) Check out PowerShow.com today - for FREE. There is truly something for everyone!

You can use PowerShow.com to find and download example online PowerPoint ppt presentations on just about any topic you can imagine so you can learn how to improve your own slides and presentations for free. Or use it to find and download high-quality how-to PowerPoint ppt presentations with illustrated or animated slides that will teach you how to do something new, also for free. Or use it to upload your own PowerPoint slides so you can share them with your teachers, class, students, bosses, employees, customers, potential investors or the world. Or use it to create really cool photo slideshows - with 2D and 3D transitions, animation, and your choice of music - that you can share with your Facebook friends or Google+ circles. That's all free as well!

For a small fee you can get the industry's best online privacy or publicly promote your presentations and slide shows with top rankings. But aside from that it's free. We'll even convert your presentations and slide shows into the universal Flash format with all their original multimedia glory, including animation, 2D and 3D transition effects, embedded music or other audio, or even video embedded in slides. All for free. Most of the presentations and slideshows on PowerShow.com are free to view, many are even free to download. (You can choose whether to allow people to download your original PowerPoint presentations and photo slideshows for a fee or free or not at all.) Check out PowerShow.com today - for FREE. There is truly something for everyone!

presentations for free. Or use it to find and download high-quality how-to PowerPoint ppt presentations with illustrated or animated slides that will teach you how to do something new, also for free. Or use it to upload your own PowerPoint slides so you can share them with your teachers, class, students, bosses, employees, customers, potential investors or the world. Or use it to create really cool photo slideshows - with 2D and 3D transitions, animation, and your choice of music - that you can share with your Facebook friends or Google+ circles. That's all free as well!

For a small fee you can get the industry's best online privacy or publicly promote your presentations and slide shows with top rankings. But aside from that it's free. We'll even convert your presentations and slide shows into the universal Flash format with all their original multimedia glory, including animation, 2D and 3D transition effects, embedded music or other audio, or even video embedded in slides. All for free. Most of the presentations and slideshows on PowerShow.com are free to view, many are even free to download. (You can choose whether to allow people to download your original PowerPoint presentations and photo slideshows for a fee or free or not at all.) Check out PowerShow.com today - for FREE. There is truly something for everyone!

For a small fee you can get the industry's best online privacy or publicly promote your presentations and slide shows with top rankings. But aside from that it's free. We'll even convert your presentations and slide shows into the universal Flash format with all their original multimedia glory, including animation, 2D and 3D transition effects, embedded music or other audio, or even video embedded in slides. All for free. Most of the presentations and slideshows on PowerShow.com are free to view, many are even free to download. (You can choose whether to allow people to download your original PowerPoint presentations and photo slideshows for a fee or free or not at all.) Check out PowerShow.com today - for FREE. There is truly something for everyone!

Recommended

«

/ »

«

/ »

Promoted Presentations

Related Presentations

CrystalGraphics Sales Tel: (800) 394-0700 x 1 or Send an email

Home About Us Terms and Conditions Privacy Policy Contact Us Send Us Feedback

Copyright 2014 CrystalGraphics, Inc. — All rights Reserved. PowerShow.com is a trademark of CrystalGraphics, Inc.

Copyright 2014 CrystalGraphics, Inc. — All rights Reserved. PowerShow.com is a trademark of CrystalGraphics, Inc.

The PowerPoint PPT presentation: "Advanced Database System" is the property of its rightful owner.

Do you have PowerPoint slides to share? If so, share your PPT presentation slides online with PowerShow.com. It's FREE!

Committed to assisting Suffolk University and other schools with their online training by sharing educational presentations for free