A Black-Box Approach to Query Cardinality Estimation

Slides: 15
Provided by: zacharyp

Transcript and Presenter's Notes

1
A Black-Box Approach to Query Cardinality
Estimation
  • Tanu Malik, Randal Burns
  • The Johns Hopkins University
  • Nitesh V. Chawla
  • Notre Dame University

2
The Black Box Approach
  • Estimate query result sizes without knowledge of:
      • Underlying data distributions
      • Query execution plan
  • Machine learning techniques:
      • Group queries into syntactic families (templates)
      • Learn in a high-dimensional, complex input space:
        attributes, operators, function arguments, aggregates
      • Partition the input space
      • Learn regression functions in each partition
      • Self-tuning, self-correcting models
  • Compared with bottom-up estimation:
      • Produces accurate, highly compact, and fast models
      • Loses the ability to evaluate sub-plans
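
One way to picture grouping queries into syntactic families is to strip the literals out of a query so that queries differing only in constants share a template, while the constants become a point in the input space. The sketch below is an illustration only, not the authors' implementation: the function name `to_template`, the regular expression, and the sample table and column names are all assumptions made for this example.

```python
import re

# One pass over the text: quoted strings first, then standalone numbers.
# This pattern is a simplification chosen for the sketch, not a SQL parser.
_LITERAL = re.compile(r"'[^']*'|(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")


def to_template(sql):
    """Return (template, args): literals become '?' and are collected in order."""
    args = []

    def grab(match):
        text = match.group(0)
        # Strings lose their quotes; numbers are converted to float.
        args.append(text[1:-1] if text.startswith("'") else float(text))
        return "?"

    # Collapse whitespace after substitution so string literals stay untouched.
    template = " ".join(_LITERAL.sub(grab, sql).split())
    return template, args


# Hypothetical queries: same template, different points in the input space.
q1 = "SELECT count(*) FROM PhotoObj WHERE r < 19.5 AND type = 'GALAXY'"
q2 = "SELECT count(*)  FROM PhotoObj WHERE r < -2 AND type = 'STAR'"
template1, args1 = to_template(q1)
template2, args2 = to_template(q2)
```

Here both queries map to the same template, so observed result sizes for either one can feed a single per-template model keyed on the extracted arguments.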

3
Are new techniques needed?
  • Working with federated and remote data sources
  • No access to data (privacy and performance
    concerns)
  • Many data sources (can't keep estimates for all)
  • Our motivation: caching in federations
  • Ask the DB optimizer?
  • Other applications
  • Replica maintenance
  • Grid workflow
  • Distributed query schedulers

4
Astronomy Example
  • Typical query
  • User-defined functions
  • Mathematical expressions
  • Sample bottom-up plan
  • Many sub-estimates

5
The Spatial Function
  • Executed at the backend database
  • Data distribution and queries in attribute
    domains
  • Function computes a range query

6
Workload Observed at Cache
  • Point queries in 3-dimensional space
  • 2-d projection on attributes shown
  • Query result-size (log cardinality)

7
Learning
  • Query yields are k-means clustered into classes
  • Two shown; typically 4-8
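
A minimal sketch of clustering query yields into classes, assuming the clustering runs on one-dimensional log yields (the workload slide plots log cardinality). The helper names, the clamping of empty results, and the deterministic initialization are assumptions of this sketch; the slides do not describe those details.

```python
import math


def log_yield(rows):
    # Clamp at 1 so empty results map to 0 instead of failing on log10(0).
    return math.log10(max(rows, 1))


def assign(values, centers):
    # Index of the nearest center for every value (ties go to the lower index).
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm in one dimension; returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


# Made-up yields (rows returned) for seven observed queries.
yields = [1, 2, 3, 1000, 2000, 1500, 2]
centers, labels = kmeans_1d([log_yield(y) for y in yields], k=2)
```

With these made-up yields the small and large results separate into two classes; a real workload would use more classes, as the slide notes.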

8
Learning
  • Query yields are k-means clustered into classes
  • Class boundaries and regression functions
  • Learning techniques: model trees, classification
    and regression, and locally-weighted regression
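
To make "class boundaries and regression functions" concrete, here is a deliberately simplified stand-in: each class gets a centroid in a one-dimensional input space (a crude boundary via nearest centroid) and its own least-squares line. This is not a model tree and not the presenters' code; the single input feature, the function names, and the training data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; constant fit if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_piecewise(xs, ys, labels):
    """Per class: (input-space centroid, fitted line) from its training points."""
    model = {}
    for lab in sorted(set(labels)):
        cx = [x for x, l in zip(xs, labels) if l == lab]
        cy = [y for y, l in zip(ys, labels) if l == lab]
        model[lab] = (sum(cx) / len(cx), fit_line(cx, cy))
    return model


def predict(model, x):
    """Pick the class whose centroid is nearest to x, then apply its line."""
    _, (a, b) = min(model.values(), key=lambda entry: abs(x - entry[0]))
    return a * x + b


# Hypothetical training set: one numeric query argument x, log yield y,
# and class labels such as those produced by clustering the yields.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [2.0, 4.0, 6.0, 1.0, 1.0, 1.0]
labels = [0, 0, 0, 1, 1, 1]
model = fit_piecewise(xs, ys, labels)
```

A model of this shape stores only a centroid and two coefficients per class, which is the sense in which partitioned regression can stay compact.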

9
Virtues of the Black Box
  • No errors from modeling assumptions, because it
    makes no assumptions
  • Conditional independence
  • Join distributions
  • Accurate estimates for complex queries
  • User-defined functions
  • High-dimensional queries
  • Multi-way joins
  • Point queries
  • Performance (later)

10
Drawbacks of the Black Box
  • Semantic losses
  • Does not use indexes, uniqueness, constraints
  • When available, treat as exceptions
  • Not integrated with query execution plans
  • No sub-plan estimates
  • No what-if scenarios can be explored
  • Parallel execution
  • Operator re-ordering
  • Not naturally suited to database optimization
  • It's a middleware technique

11
Overview of Results
  • How many trees?
  • How accurate?

12
Space and Time
  • How big?
  • How fast?

13
A Black-Box Approach to Query Cardinality
Estimation
  • Tanu Malik, Randal Burns
  • The Johns Hopkins University
  • Nitesh V. Chawla
  • Notre Dame University

14
Quick Comparison
  • Self-tuning histograms, e.g. STHoles, STGrid,
    others
  • Machine learning, self-tuning, based on observed
    workload
  • Produce an estimated data distribution
  • Histograms limited to range queries
  • Costing User-Defined Functions (He et al. 2005)
  • Estimate based on weighted k-nearest neighbors
  • Restricted to function arguments
  • Does not build a model
  • The Black Box approach
  • Data independent in both inputs and estimation
  • Rich input space: enumerated domains, operators,
    and aggregates
  • Compact models, summary data structures
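
For contrast, the weighted nearest-neighbor style of estimate listed in this comparison can be sketched as follows. It keeps raw observations rather than building a model, and it looks only at function arguments. The inverse-distance weighting, the function name `knn_estimate`, and the sample history are assumptions of this sketch; the slide does not say how the neighbors are weighted.

```python
import math


def knn_estimate(history, args, k=3, eps=1e-9):
    """Inverse-distance-weighted mean of the k nearest observed results.

    history: list of (argument_tuple, observed_value) pairs.
    args:    argument tuple of the new function call.
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], args))[:k]
    # eps avoids division by zero when the exact arguments were seen before.
    weights = [1.0 / (math.dist(a, args) + eps) for a, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Made-up history of (arguments, observed value) for one function.
history = [((0.0,), 10.0), ((2.0,), 30.0), ((100.0,), 1000.0)]
midpoint = knn_estimate(history, (1.0,), k=2)
repeat = knn_estimate(history, (0.0,), k=2)
```

Every estimate scans the stored history, which is the trade-off against a compact learned model.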