A Black-Box Approach to Query Cardinality Estimation

Provided by: zacharyp

Learn more at: https://database.cs.wisc.edu

1
A Black-Box Approach to Query Cardinality Estimation

Tanu Malik, Randal Burns
The Johns Hopkins University
Nitesh V. Chawla
Notre Dame University

2
The Black Box Approach

Estimate query result sizes without knowledge of
Underlying data distributions
Query execution plan
Machine learning techniques
Group queries into syntactic families (templates)
Learn in a high-dimension, complex input space
Attributes, operators, function arguments, aggregates
Partition input space
Learn regression functions in each partition
Self-tuning, self-correcting models
Compared with bottom-up estimation:
Produces accurate, highly compact, and fast models
Loses the ability to evaluate sub-plans
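
The grouping of queries into syntactic families can be illustrated with a minimal sketch. It assumes a template is simply the query text with its literals replaced by placeholders; the helper name to_template, the regular expression, and the sample table and column names are hypothetical and are not taken from the talk.

```python
import re

# Matches a single-quoted string or an unsigned number. The string alternative
# comes first so that digits inside quotes are not pulled out separately.
_LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")

def to_template(sql):
    """Return (template, arguments): literals are replaced by '?' in order."""
    arguments = []

    def grab(match):
        arguments.append(match.group(0))
        return "?"

    template = _LITERAL.sub(grab, sql)
    # Collapse whitespace so formatting differences do not split a family.
    return " ".join(template.split()), arguments

q1 = "SELECT COUNT(*) FROM photo WHERE ra > 180.5 AND type = 'GALAXY'"
q2 = "SELECT COUNT(*)  FROM photo WHERE ra > 12 AND type = 'STAR'"
t1, a1 = to_template(q1)
t2, a2 = to_template(q2)
print(t1)          # both queries share this template
print(a1, a2)      # the extracted arguments form the learning input
```

Under this assumption, one model would be learned per template, with the extracted arguments as its inputs.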

3
Are new techniques needed?

Working with federated and remote data sources
No access to data (privacy and performance concerns)
Many data sources (can't keep estimates for all)
Our motivation: caching in federations

Ask the DB optimizer?
Other applications
Replica maintenance
Grid workflow
Distributed query schedulers

4
Astronomy Example

Typical query
User-defined functions
Mathematical expressions
Sample bottom-up plan
Many sub-estimates

5
The Spatial Function

Executed at the backend database
Data distribution and queries in attribute domains
Function computes a range query

6
Workload Observed at Cache

Point queries in 3-dimensional space
2-d projection on attributes shown
Query result-size (log cardinality)

7
Learning

Query yields are k-means clustered into classes
Two shown; typically 4-8
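
As a rough illustration of clustering yields into classes, the sketch below runs plain Lloyd's k-means on scalar values. It assumes yields are clustered on a log10 scale and uses a simple deterministic initialisation; the function names and the sample yields are made up for the example and do not come from the talk.

```python
import math

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars; returns the k centroids in sorted order."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def label_yield(value, centroids):
    """Index of the yield class whose centroid is closest to value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))

# Made-up yields (result sizes) observed for past queries.
yields = [3, 5, 4, 6, 9000, 12000, 10000, 11000]
centroids = kmeans_1d([math.log10(y) for y in yields], k=2)
print(centroids)
print(label_yield(math.log10(4), centroids), label_yield(math.log10(9500), centroids))
```

The class labels produced this way would serve as training targets; they are computed from observed yields, so they are not available for a new query until a classifier over the query inputs has been learned.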

8
Learning

Query yields are k-means clustered into classes
Class boundaries and regression functions
Learning techniques: model trees, classification and regression, and locally-weighted regression
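
The idea of class boundaries with a regression function in each partition can be sketched as a model tree with a single split. This is a simplified stand-in, not the authors' learner: it assumes one numeric input attribute, picks the boundary that minimises total squared error, and fits an ordinary least-squares line on each side. All names and the sample data are illustrative.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y  # all x equal: fall back to a constant
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return a, mean_y - a * mean_x

def squared_error(points, line):
    a, b = line
    return sum((y - (a * x + b)) ** 2 for x, y in points)

def fit_one_split_tree(points, min_leaf=2):
    """Return (boundary, (left_line, right_line)) with the lowest total error."""
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot place a boundary between equal x values
        left, right = pts[:i], pts[i:]
        lines = (fit_line(left), fit_line(right))
        err = squared_error(left, lines[0]) + squared_error(right, lines[1])
        if best is None or err < best[0]:
            best = (err, (pts[i - 1][0] + pts[i][0]) / 2, lines)
    if best is None:
        raise ValueError("not enough distinct points to split")
    return best[1], best[2]

def predict(tree, x):
    boundary, (left, right) = tree
    a, b = left if x < boundary else right
    return a * x + b

# Made-up training data: x is a query argument, y a log cardinality.
data = [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5),
        (4, 6.0), (5, 5.0), (6, 4.0), (7, 3.0)]
tree = fit_one_split_tree(data)
print(tree[0], predict(tree, 2.5), predict(tree, 6.5))
```

A full model tree would split recursively over many attributes; the single split here only shows how a boundary and its two regression functions fit together.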

9
Virtues of the Black Box

No errors from modeling assumptions, because it makes no assumptions
Conditional independence
Join distributions
Accurate estimates for complex queries
User-defined functions
High-dimensional queries
Multi-way joins
Point queries
Performance (later)

10
Drawbacks of the Black Box

Semantic losses
Does not use indexes, uniqueness, constraints
When available, treat as exceptions
Not integrated with query execution plans
No sub-plan estimates
No what-if scenarios can be explored
Parallel execution
Operator re-ordering
Not naturally suited to database optimization
It's a middleware technique

11
Overview of Results

How many trees?
How accurate?

12
Space and Time

How big?
How fast?

14
Quick Comparison

Self-tuning histograms, e.g. STHoles, STGrid, others
Machine learning, self-tuning, based on observed workload
Produce an estimated data distribution
Histograms limited to range queries
Costing User-Defined Functions (He et al. 2005)
Estimate based on weighted nearest k-neighbors
Restricted to function arguments
Does not build a model
The Black Box approach
Data independent in both inputs and estimation
Rich input space: enumerated domains, operators, and aggregates
Compact models, summary data structures
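
For contrast with the model-building approach, the weighted nearest-neighbor style of estimate mentioned in the comparison above can be sketched generically. This is plain inverse-distance weighting over stored observations, not the exact method of He et al.; the function name, the Euclidean distance, and the sample history are assumptions made for the example.

```python
import math

def knn_estimate(history, args, k=3, eps=1e-9):
    """Inverse-distance weighted mean of the k nearest stored observations.

    history: list of (argument_tuple, observed_value) pairs.
    No model is built; every estimate scans the stored history.
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], args))[:k]
    # eps avoids division by zero when args exactly matches a stored point.
    weights = [1.0 / (math.dist(a, args) + eps) for a, _ in nearest]
    total = sum(w * value for w, (_, value) in zip(weights, nearest))
    return total / sum(weights)

# Made-up history of function arguments and observed values.
history = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0),
           ((0.0, 1.0), 30.0), ((5.0, 5.0), 1000.0)]
print(knn_estimate(history, (0.0, 0.0), k=3))
print(knn_estimate(history, (0.5, 0.0), k=2))
```

Because the history itself is the estimator, space grows with the number of observations, which is the trade-off the compact-model approach is meant to avoid.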
