1
Confidence-Aware Join Algorithms
  • Parag Agrawal, Jennifer Widom
  • Stanford University

2
Uncertain Databases
  • Tuples have confidences
  • Result confidence computation
  • Combine input confidences
  • Assume independence

[Figure: input tuples with confidences 0.8 and 0.6 join into a result tuple with confidence 0.8 × 0.6 = 0.48]
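
The combination rule on this slide can be sketched in a few lines. This is only an illustration of the independence assumption (result confidence is the product of the input confidences); the function name `combine` is made up for the example.

```python
from math import prod

def combine(*confidences: float) -> float:
    """Confidence of a joined tuple, assuming the inputs are independent."""
    return prod(confidences)

# The slide's example: inputs 0.8 and 0.6 give a result of about 0.48.
result_confidence = combine(0.8, 0.6)
print(round(result_confidence, 2))
```
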
3
Queries
  • High-confidence result tuples are more important
  • Query results based on tuple confidence
  • Threshold
  • Top-k
  • Sorted
  • Efficient algorithms for join queries
  • IO cost

4
Traditional Approach
  • Leave it to the optimizer
  • Treat confidence values as another column
  • Result confidence computed in query
  • Threshold in WHERE clause
  • SELECT R.A, S.C, R.conf * S.conf AS conf
  • FROM R, S
  • WHERE R.B = S.B AND R.conf * S.conf > threshold
  • Sorted using ORDER-BY
  • Top-k using ORDER-BY and LIMIT
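
A runnable sketch of this "leave it to the optimizer" approach, using Python's built-in `sqlite3` module. The table contents, the threshold 0.45, and k = 2 are invented for illustration, and the result alias is renamed to `result_conf` (the slide uses `conf`) so it cannot be confused with the input columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, conf REAL);
    CREATE TABLE S (B INTEGER, C TEXT, conf REAL);
    INSERT INTO R VALUES ('a1', 1, 0.9), ('a2', 1, 0.5), ('a3', 2, 0.8);
    INSERT INTO S VALUES (1, 'c1', 0.8), (2, 'c2', 0.6), (2, 'c3', 0.3);
""")

# Confidence is treated as just another column; the result
# confidence is computed inside the query.
BASE = """
    SELECT R.A, S.C, R.conf * S.conf AS result_conf
    FROM R, S
    WHERE R.B = S.B
"""

# Threshold: extra predicate in the WHERE clause.
threshold_rows = conn.execute(
    BASE + " AND R.conf * S.conf > ?", (0.45,)
).fetchall()

# Sorted: ORDER BY on the computed confidence.
sorted_rows = conn.execute(BASE + " ORDER BY result_conf DESC").fetchall()

# Top-k: ORDER BY plus LIMIT.
top2_rows = conn.execute(
    BASE + " ORDER BY result_conf DESC LIMIT ?", (2,)
).fetchall()

print(threshold_rows, sorted_rows, top2_rows, sep="\n")
```

In all three variants the engine computes the confidence of every joining pair before filtering or sorting, which is the cost the following slides try to avoid.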

5
Can Do Better
  • Exploit monotonicity of combining function
  • Fagin's NRA Algorithm, Rank-Aware Joins
  • Monotonicity condition
  • C(p, q) ≥ C(r, s) if p ≥ r and q ≥ s
  • Assume sorted access by confidence
  • Limited memory
  • In contrast to previous work
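
To see why monotonicity plus sorted access is useful, here is a small sketch. `is_monotonic` brute-force checks the condition on a grid of confidence values, and `unseen_upper_bound` computes a bound in the style of rank-aware joins on the confidence of any pair not yet seen. Both helper names and the sample values are assumptions made for this example, not taken from the talk.

```python
from itertools import product

def is_monotonic(combine, grid):
    """Check C(p, q) >= C(r, s) whenever p >= r and q >= s, on a finite grid."""
    return all(
        combine(p, q) >= combine(r, s)
        for p, q, r, s in product(grid, repeat=4)
        if p >= r and q >= s
    )

def unseen_upper_bound(combine, r_top, r_last, s_top, s_last):
    """Upper bound on any pair involving an unread tuple.

    With sorted access, unread R tuples have confidence <= r_last and
    unread S tuples have confidence <= s_last; r_top and s_top are the
    highest confidences in each relation.
    """
    return max(combine(r_last, s_top), combine(r_top, s_last))

grid = [i / 10 for i in range(11)]
product_ok = is_monotonic(lambda p, q: p * q, grid)        # independence
min_ok = is_monotonic(min, grid)                           # another monotonic choice
not_ok = is_monotonic(lambda p, q: p * (1 - q), grid)      # decreasing in q

bound = unseen_upper_bound(lambda p, q: p * q, 1.0, 0.5, 0.9, 0.6)
print(product_ok, min_ok, not_ok, bound)
```

Once the bound drops below a threshold (or below the current kth result), the unread parts of both relations can be skipped.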

6
Outline
  • Introduction
  • Algorithms and guarantees
  • Threshold
  • Top-k
  • Sorted
  • Experiments
  • Conclusion

7
Join Visualization
  • Nested Block Join
  • Memory size M
  • Repeat
  • Load part of R into memory
  • Scan S and evaluate
  • Explore cross-product
  • IO Cost
  • Number of tuples read
  • Load
  • Scan

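[Figure: cross-product of R and S, with R loaded in blocks of size M and S scanned once per block]
Cost = |R| + (|R| / M) × |S|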
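
The load-and-scan loop can be sketched as follows, counting IO as the number of tuples read, as the slide does. This is a minimal in-memory illustration; the tuple layout `(join_key, payload)` and the sample data are assumptions. For inputs whose size is not a multiple of M, the number of S scans is the ceiling of |R| / M.

```python
from math import ceil

def nested_block_join(R, S, M, match):
    """Nested block join that also counts tuples read (IO cost)."""
    results, io = [], 0
    for start in range(0, len(R), M):
        block = R[start:start + M]      # load part of R into memory
        io += len(block)
        for s in S:                     # scan S and evaluate
            io += 1
            results.extend((r, s) for r in block if match(r, s))
    return results, io

R = [(b, f"r{i}") for i, b in enumerate([1, 2, 3, 1, 2, 3])]
S = [(b, f"s{i}") for i, b in enumerate([1, 2, 3, 4])]

pairs, io = nested_block_join(R, S, M=3, match=lambda r, s: r[0] == s[0])
expected_io = len(R) + ceil(len(R) / 3) * len(S)
print(len(pairs), io, expected_io)
```
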
8
Threshold
  • Sorted access
  • Monotonicity
  • Threshold Stair
  • Explore pruned area
  • Algorithm
  • Threshold1

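[Figure: cross-product of R and S, each axis sorted by confidence between 1.0 and 0.0, divided by a stair-shaped boundary into a region labeled "≥ threshold" and a region labeled "< threshold"; M marks the memory-sized block of R]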
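
The pruning idea can be sketched as a nested block join that stops early. This is a simplified illustration of stair pruning under sorted access and a monotonic combining function, not the Threshold1 algorithm from the talk itself. It assumes tuples are `(join_key, conf)` pairs sorted by confidence in descending order and that results with confidence ≥ threshold qualify.

```python
def threshold_join(R, S, M, threshold, combine=lambda p, q: p * q):
    """Join R and S, skipping regions that cannot reach the threshold.

    Returns (results, io) where io counts tuples read.
    """
    results, io = [], 0
    s_top = S[0][1] if S else 0.0
    for start in range(0, len(R), M):
        # If the best remaining R tuple cannot qualify even with the
        # best S tuple, no later R tuple can either: stop loading.
        if combine(R[start][1], s_top) < threshold:
            break
        block = R[start:start + M]
        io += len(block)
        block_max = block[0][1]
        for s_key, s_conf in S:
            io += 1
            # Below the stair for this block: end the scan of S early.
            if combine(block_max, s_conf) < threshold:
                break
            for r_key, r_conf in block:
                conf = combine(r_conf, s_conf)
                if r_key == s_key and conf >= threshold:
                    results.append((r_key, conf))
    return results, io

R = [("x", 0.9), ("y", 0.8), ("x", 0.5), ("z", 0.4)]
S = [("x", 0.9), ("y", 0.7), ("z", 0.6), ("x", 0.3)]

results, io = threshold_join(R, S, M=2, threshold=0.5)
print(results, io)
```

On this toy input the pruned join reads 6 tuples, while a full nested block join with the same M would read 4 + 2 × 4 = 12.
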
9
Threshold1 Guarantee
  • IO cost less than 2 times optimal
  • Assuming no indexes
  • Bad case

Threshold1: Cost = a × M + a × M
Optimal: Cost = M + a × M
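
A quick arithmetic check of the bad case, assuming the two costs on the slide read a × M + a × M for Threshold1 and M + a × M for the optimal algorithm (the operators were lost in the transcript, so this reading is an assumption). Under that reading the ratio is 2a / (1 + a), which approaches but never reaches 2.

```python
def bad_case_ratio(a: float, M: float = 1.0) -> float:
    """Ratio of the assumed Threshold1 cost to the assumed optimal cost."""
    threshold1_cost = a * M + a * M
    optimal_cost = M + a * M
    return threshold1_cost / optimal_cost

ratios = {a: bad_case_ratio(a) for a in (1, 10, 100, 1000)}
for a, ratio in ratios.items():
    print(a, round(ratio, 3))
```
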
10
Threshold2
  • Picks which relation to load in each step
  • Longer scan
  • Optimality Ratio 3/2
  • Close to optimal if pruned area is large compared to memory

11
Top-k
  • Threshold = confidence of kth tuple
  • Need to explore shaded region
  • Threshold value is not known
  • Explore in (approximate) order of result confidence

[Figure: example with k = 3 over the R × S grid; labeled result confidences 0.75, 0.65, 0.62, 0.55]
12
Top-k
  • Scan length L
  • Explore blocks (M x L)
  • In order of max confidence
  • Top-k tuples maintained during algorithm
  • Treat confidence of current kth tuple as threshold
  • Exit when stair explored

[Figure: block exploration example; labeled confidence values 0.6, 0.7, 0.63, 0.72, 0.8, 1.0, 0.9, 0.6]
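
The block-exploration loop can be sketched as below. This is a simplified in-memory illustration, not the algorithm from the talk: it ignores how blocks are actually loaded and scanned, and it assumes `(join_key, conf)` tuples sorted by confidence in descending order. Each M × L block is bounded by the combined confidence of its first R and first S tuple, blocks are explored in descending order of that bound, and the loop exits once the next bound cannot beat the current kth result.

```python
import heapq

def topk_join(R, S, M, L, k, combine=lambda p, q: p * q):
    """Return the k highest-confidence join results and the blocks explored."""
    r_blocks = [R[i:i + M] for i in range(0, len(R), M)]
    s_blocks = [S[j:j + L] for j in range(0, len(S), L)]
    # Max-heap (via negation) of blocks keyed by their maximum confidence.
    frontier = [(-combine(rb[0][1], sb[0][1]), i, j)
                for i, rb in enumerate(r_blocks)
                for j, sb in enumerate(s_blocks)]
    heapq.heapify(frontier)

    top = []        # min-heap of (conf, key); top[0] is the current kth tuple
    explored = 0
    while frontier:
        neg_bound, i, j = heapq.heappop(frontier)
        # Current kth confidence acts as the threshold.
        if len(top) == k and -neg_bound <= top[0][0]:
            break
        explored += 1
        for r_key, r_conf in r_blocks[i]:
            for s_key, s_conf in s_blocks[j]:
                if r_key != s_key:
                    continue
                item = (combine(r_conf, s_conf), r_key)
                if len(top) < k:
                    heapq.heappush(top, item)
                elif item > top[0]:
                    heapq.heapreplace(top, item)
    return sorted(top, reverse=True), explored

R = [("x", 0.9), ("y", 0.8), ("x", 0.5), ("z", 0.4)]
S = [("x", 0.9), ("y", 0.7), ("z", 0.6), ("x", 0.3)]

top2, explored = topk_join(R, S, M=2, L=2, k=2)
print(top2, explored)
```

On this toy input only one of the four blocks is explored before the exit condition holds.
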
13
Top-k
  • Optimality ratio
  • 3 in general
  • 2 if pruned area is large compared to memory
  • Large L (large blocks)
  • Good: explore efficiently (Area / IO)
  • Bad: may explore unnecessary area
  • Experiment
  • Effect of L

14
Top-k Parameter L
[Plot axes: Cost (Number of Tuples, ×10^5) and Parameter L (×10^3)]
15
Experiments
  • Synthetic datasets
  • 1M to 10M tuples in each relation
  • Various confidence distributions
  • Algorithms perform well
  • Not affected by confidence distribution
  • Results in paper
  • Sorted experiment

16
Sorted
  • Explores blocks like top-k
  • Result memory buffer (priority queue)
  • Emit a result tuple after exploring corresponding
    stair
  • Non-blocking operator
  • Experiment
  • Non-blocking behavior
  • Result memory buffer size
  • Effect of L
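
The non-blocking behavior can be sketched with a generator. As with any sketch here, this is a simplified in-memory illustration rather than the algorithm from the talk, assuming `(join_key, conf)` tuples sorted by confidence in descending order. Results are buffered in a priority queue and emitted as soon as no unexplored block can produce anything higher.

```python
import heapq

def sorted_join(R, S, M, L, combine=lambda p, q: p * q):
    """Yield (key, conf) join results in descending order of confidence."""
    r_blocks = [R[i:i + M] for i in range(0, len(R), M)]
    s_blocks = [S[j:j + L] for j in range(0, len(S), L)]
    frontier = [(-combine(rb[0][1], sb[0][1]), i, j)
                for i, rb in enumerate(r_blocks)
                for j, sb in enumerate(s_blocks)]
    heapq.heapify(frontier)

    buffer = []  # result memory buffer: max-heap via negated confidence
    while frontier:
        _, i, j = heapq.heappop(frontier)
        for r_key, r_conf in r_blocks[i]:
            for s_key, s_conf in s_blocks[j]:
                if r_key == s_key:
                    heapq.heappush(buffer, (-combine(r_conf, s_conf), r_key))
        # Best confidence any unexplored block could still produce.
        next_bound = -frontier[0][0] if frontier else float("-inf")
        while buffer and -buffer[0][0] >= next_bound:
            neg_conf, key = heapq.heappop(buffer)
            yield key, -neg_conf

R = [("x", 0.9), ("y", 0.8), ("x", 0.5), ("z", 0.4)]
S = [("x", 0.9), ("y", 0.7), ("z", 0.6), ("x", 0.3)]

stream = sorted_join(R, S, M=2, L=2)
first = next(stream)            # available after exploring a single block
rest = list(stream)
print(first, rest)
```

The size the buffer reaches depends on how many results sit between consecutive block bounds, which is what the buffer-size experiment on the following slides measures.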

17
Sorted Non-Blocking
[Plot axes: Cost (Number of Tuples, ×10^6) and Result Tuples Emitted (×10^5)]
18
Sorted Result Memory Buffer
[Plot axes: Result Memory Buffer (×10^4) and Result Tuples Emitted (×10^5)]
19
Future Work
  • Use as operator in query plans
  • Parameter L (block size)
  • Memory allocation
  • Cost estimation
  • Non-independence
  • Interval approximations
  • Monte-Carlo simulations

20
Conclusion
  • Join algorithms using threshold techniques
  • Limited memory
  • Theoretical guarantees
  • Algorithms apply in any setting with monotonic
    combining function and sorted access
  • Middleware, Multimedia Databases, IR

Thank You!