Efficient Evaluation of Queries with Mining Predicates by Chaudhuri, Narasayya, and Sarawagi

1
Efficient Evaluation of Queries with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
  • CSci 8701 Group G07
  • Charles Braxmeier

2
Problem Statement
  • Find more efficient ways to execute queries where
    one or more of the predicates are the results of
    data mining decisions
  • Example query: find fans who went to a Minnesota
    hockey game last year and who may be football
    fans as well

3
Contributions of the Paper
  • Detailed coverage of the different types of mining
    models (clustering, decision trees, etc.)
  • Discussion of the different ways mining
    predicates can be joined within a query
  • Analysis of the experiments done to test the
    theories regarding query optimization based on
    the structure of the mining model

4
Key Concepts
  • Upper Envelope Predicate
  • Tightness of the Query's Predicates
  • Mining Model
    • Decision Tree
    • Naïve Bayes Classifiers
      • Bottom-up
      • Top-down
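
As an illustration of an upper envelope predicate, the sketch below derives one from a small decision tree: for a given class label, it collects the conditions along every root-to-leaf path that ends in that label and ORs the paths together. The tree, the attribute names, and the helper functions are hypothetical, and this is a sketch of the general idea rather than the paper's own algorithm.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attr: str
    threshold: float
    left: Union["Node", Leaf]   # taken when row[attr] <= threshold
    right: Union["Node", Leaf]  # taken when row[attr] > threshold


def predict(tree, row):
    # Walk from the root to a leaf and return that leaf's label.
    while isinstance(tree, Node):
        tree = tree.left if row[tree.attr] <= tree.threshold else tree.right
    return tree.label


def upper_envelope(tree, label, path=()):
    # Returns a list of conjunctions (an OR of ANDs). Each conjunction is a
    # tuple of (attr, op, threshold) conditions leading to a `label` leaf.
    if isinstance(tree, Leaf):
        return [path] if tree.label == label else []
    left_cond = (tree.attr, "<=", tree.threshold)
    right_cond = (tree.attr, ">", tree.threshold)
    return (upper_envelope(tree.left, label, path + (left_cond,))
            + upper_envelope(tree.right, label, path + (right_cond,)))


def satisfies(envelope, row):
    # True if the row meets every condition of at least one conjunction.
    def holds(attr, op, t):
        return row[attr] <= t if op == "<=" else row[attr] > t
    return any(all(holds(*c) for c in conj) for conj in envelope)


# Hypothetical tree: predicts "yes" only for age <= 40 and more than 5 games.
TREE = Node("age", 40,
            Node("games_attended", 5, Leaf("no"), Leaf("yes")),
            Leaf("no"))

ENVELOPE = upper_envelope(TREE, "yes")
print(ENVELOPE)
```

Because the envelope is written only in terms of ordinary column comparisons, a query optimizer could in principle use it like any other filter, for example to choose an index, before the model itself is applied. Every row the model labels "yes" satisfies the envelope, so no qualifying row is lost by filtering on it first.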

5
Key Concepts (contd.)
  • Mining Model (continued)
    • Clustering
      • Centroid-based
      • Model-based
      • Boundary-based

6
Validation Methodology
  • Experimentation based on the theories posed
    regarding query reorganization
  • Twenty (20) different data sets used. Data sets
    vary based on:
    • Data set size
    • Number of dimensions in data set
    • Size of data set used to train the mining model

7
Validation Methodology (contd.)
  • Analysis of Experiment Results
    • 65% of query access paths affected by
      re-arranging the query based on the upper
      envelope predicate
    • Average run-time decreased by 65% by
      re-arranging the query based on the upper
      envelope predicate
    • More variance in run-time decrease than in
      access paths affected

8
Assumptions
  • Clustering can be evaluated via Bayes classifiers
    • Therefore, not much background info on
      clustering or on how its experiments differed
      from the Bayes experiments
  • Continuous data sets are split into discrete data
    sets to assist in mining predictions
    • Not necessarily realistic
    • Example: latitude / longitude
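
The discretization assumption can be sketched as follows. The cut points in `LAT_BOUNDARIES` are hypothetical and chosen only for illustration; the slides do not say how the paper picks its bucket boundaries.

```python
from bisect import bisect_right

# Hypothetical cut points for latitude, in degrees (assumption).
LAT_BOUNDARIES = [44.0, 45.0, 46.0]


def discretize(value, boundaries):
    # Map a continuous value to a bucket index. `boundaries` must be sorted
    # ascending; a value equal to a boundary falls in the bucket to its right.
    return bisect_right(boundaries, value)


for lat in (44.99, 45.01, 45.99):
    print(lat, discretize(lat, LAT_BOUNDARIES))
```

One possible reading of the "not necessarily realistic" concern: 44.99 and 45.01 are nearly the same location but land in different buckets, while 45.01 and 45.99 are much farther apart yet share a bucket.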

9
Possible Revisions to Paper
  • Spend more time on analysis of the experiments
    and results, rather than on background info
    • Background information took up approximately
      60% of the paper

10
Questions?