Efficient Evaluation of Queries with Mining Predicates by Chaudhuri, Narasayya, and Sarawagi

1
Efficient Evaluation of Queries with Mining
Predicates by Chaudhuri, Narasayya, and Sarawagi

CSci 8701 Group G07
Charles Braxmeier

2
Problem Statement

Find more efficient ways to execute queries in which
one or more of the predicates are predictions made by
a data mining model
Example query: Find fans who went to a Minnesota
hockey game last year and who may be football fans as
well
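
To make the example query concrete, here is a minimal Python sketch of the straightforward evaluation strategy: filter on the ordinary predicate, then call the mining model on every remaining row. The `Fan` record, its columns, and the rule inside `predict_football_fan` are hypothetical stand-ins for a trained model; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Fan:
    name: str
    age: int
    hockey_games_last_year: int


def predict_football_fan(fan: Fan) -> bool:
    """Hypothetical stand-in for a trained mining model (assumed rule)."""
    return fan.age >= 25 and fan.hockey_games_last_year >= 2


def naive_query(fans):
    """Apply the ordinary predicate, then invoke the model row by row."""
    return [
        f.name
        for f in fans
        if f.hockey_games_last_year >= 1   # ordinary relational predicate
        and predict_football_fan(f)        # mining predicate
    ]


FANS = [Fan("Ana", 31, 3), Fan("Ben", 19, 5), Fan("Cy", 40, 0), Fan("Di", 27, 2)]
RESULT = naive_query(FANS)  # names of hockey attendees predicted to be football fans
```

The cost concern is that the model is treated as a black box, so the engine cannot use indexes or statistics on `age` or `hockey_games_last_year` to skip rows the model would reject.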

3
Contributions of the Paper

Detailed treatment of different types of mining
models (clustering, decision trees, etc.)
Discussion of the different ways mining
predicate(s) can be joined within a query
Analysis of the experiments done to test theories
regarding query optimization based on the
structure of the mining model

4
Key Concepts

Upper Envelope Predicate
Tightness of the Query's Predicates
Mining Model
Decision Tree
Naïve Bayes Classifiers
Bottom-Up
Top-Down
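
As an illustration of the upper envelope idea for decision trees, the sketch below collects the root-to-leaf conditions for every leaf that predicts a given class and renders them as a filter over the model's input columns. It assumes the upper envelope is a predicate satisfied by every row the model assigns to that class. The tree encoding, the column names, and the helper names `upper_envelope` and `to_sql` are hypothetical and chosen only for this example.

```python
# Hypothetical tree encoding (an assumption for this sketch):
#   internal node: (attribute, threshold, left, right); left means attribute <= threshold
#   leaf: a class label string
TREE = ("age", 25,
        "not_fan",
        ("games", 1, "not_fan", "fan"))


def upper_envelope(tree, label, path=()):
    """Return one conjunction of (attr, op, value) tests per leaf predicting `label`."""
    if isinstance(tree, str):                      # reached a leaf
        return [path] if tree == label else []
    attr, threshold, left, right = tree
    return (upper_envelope(left, label, path + ((attr, "<=", threshold),))
            + upper_envelope(right, label, path + ((attr, ">", threshold),)))


def to_sql(envelope):
    """Render the envelope as a disjunction of conjunctions in SQL-like text."""
    if not envelope:
        return "FALSE"
    return " OR ".join(
        "(" + " AND ".join(f"{a} {op} {v}" for a, op, v in conj) + ")"
        for conj in envelope
    )


ENVELOPE_SQL = to_sql(upper_envelope(TREE, "fan"))
```

A filter like this can be added to the query next to the mining predicate, which gives the optimizer ordinary column conditions to match against indexes. For a decision tree the paths describe the class exactly; for other model types the derived predicate may admit extra rows, which is where tightness matters.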

5
Key Concepts (contd.)

Mining Model (continued)
Clustering
Centroid-based
Model-based
Boundary-based

6
Validation Methodology

Experiments test the theories posed
regarding query reorganization
Twenty (20) different data sets used. Data sets
vary based on:
Data set size
Number of dimensions in data set
Size of data set used to train the mining model

7
Validation Methodology (contd.)

Analysis of Experiment Results
65% of query access paths affected by
re-arranging the query based on the upper
envelope predicate
Average run-time decreased by 65% by re-arranging
the query based on the upper envelope predicate
More variance in run-time decrease than in access
paths affected

8
Assumptions

Clustering can be evaluated via Bayes classifiers
Therefore, the paper gives little background on
clustering or on how its experiments differed
from the Bayes experiments
Continuous attributes are split into discrete
ranges to assist in mining predictions
Not necessarily realistic
Example: latitude / longitude
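
The discretization concern can be illustrated with a small Python sketch that maps a continuous value to a bin index. The cut points in `LAT_EDGES` and the sample latitudes are made up for illustration and are not taken from the paper.

```python
from bisect import bisect_right


def discretize(value, edges):
    """Return the index of the bin holding `value`; `edges` are sorted cut points."""
    return bisect_right(edges, value)


LAT_EDGES = [44.0, 45.0, 46.0]  # hypothetical cut points, in degrees of latitude
BINS = [discretize(v, LAT_EDGES) for v in (43.7, 44.98, 45.0, 47.2)]
```

Here 44.98 and 45.0 land in different bins even though they are nearly the same location, which is one way fixed bins can misrepresent continuous data such as latitude and longitude.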

9
Possible Revisions to Paper