Treatment Learning: Implementation and Application

Ying Hu, Electrical & Computer Engineering, University of British Columbia

Slides: 19

1
Treatment Learning: Implementation and Application
  • Ying Hu
  • Electrical & Computer Engineering
  • University of British Columbia

2
Outline
  • An example
  • Background Review
  • TAR2 Treatment Learner
  • TARZAN (Tim Menzies)
  • TAR2 (Ying Hu, Tim Menzies)
  • TAR3: improved TAR2
  • TAR3 (Ying Hu)
  • Evaluation of treatment learning
  • Application of Treatment Learning
  • Conclusion

3
First Impression
  • Boston Housing Dataset
  • (506 examples, 4 classes)

4
Review Background
  • What is KDD ?
  • KDD: Knowledge Discovery in Databases [fayyad96]
  • Data mining: one step in the KDD process
  • Machine learning: learning algorithms
  • Common data mining tasks
  • Classification
  • Decision tree induction (C4.5) [quinlan86]
  • Nearest neighbors [cover67]
  • Neural networks [rosenblatt62]
  • Naive Bayes classifier [duda73]
  • Association rule mining
  • APRIORI algorithm [agrawal93]
  • Variants of APRIORI

5
Treatment Learning Definition
  • Input: classified dataset
  • Assume classes are ordered
  • Output: Rx = conjunction of attribute-value pairs
  • Size of Rx: number of pairs in the Rx
  • confidence(Rx w.r.t. Class) = P(Class | Rx)
  • Goal: to find Rx that have different levels of
    confidence across classes
  • Evaluate Rx: lift
  • Visualization form of output
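
Note: the definitions on this slide can be illustrated with a minimal sketch. The confidence function follows the slide directly, P(Class | Rx). The slide does not define lift, so the version here is an assumption: each ordered class gets a numeric score, and lift is the average score of the rows selected by Rx divided by the average score of the whole dataset. The toy rows, attribute names, and `class_scores` values are hypothetical.

```python
def matches(row, rx):
    """True if the row satisfies every attribute-value pair in the treatment."""
    return all(row.get(attr) == value for attr, value in rx.items())

def confidence(rows, rx, cls):
    """P(cls | Rx): fraction of the rows selected by Rx that belong to cls."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return sum(r["class"] == cls for r in selected) / len(selected)

def worth(rows, class_scores):
    """Average class score of a set of rows (assumed scoring scheme)."""
    return sum(class_scores[r["class"]] for r in rows) / len(rows)

def lift(rows, rx, class_scores):
    """Assumed lift: worth of the treated subset relative to the baseline."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return worth(selected, class_scores) / worth(rows, class_scores)

# Hypothetical toy data with two ordered classes (worst < best).
rows = [
    {"rooms": "high", "crime": "low", "class": "best"},
    {"rooms": "high", "crime": "low", "class": "best"},
    {"rooms": "high", "crime": "high", "class": "worst"},
    {"rooms": "low", "crime": "low", "class": "worst"},
]
class_scores = {"worst": 1, "best": 2}
rx = {"rooms": "high", "crime": "low"}  # a treatment of size 2

print(confidence(rows, rx, "best"))
print(lift(rows, rx, class_scores))
```

A lift above 1 means the treated subset has a better class distribution than the untreated dataset under the assumed scores.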

6
Motivation Narrow Funnel Effect
  • When is enough learning enough?
  • Attributes < 50%, accuracy decrease 3-5%
    [shavlik91]
  • 1-level decision tree is comparable to C4
    [holte93]
  • Data engineering: ignoring 81% of features results in
    a 2% increase in accuracy [kohavi97]
  • Scheduling: random sampling outperforms complete
    search (depth-first) [crawford94]
  • Narrow funnel effect
  • Control variables vs. derived variables
  • Treatment learning: finding funnel variables

7
TAR2 The Algorithm
  • Search: attribute utility estimation
  • Estimation heuristic: confidence1
  • Search: depth-first search
  • Search space: confidence1 > threshold
  • Discretization: equal-width interval binning
  • Reporting Rx
  • Lift(Rx) > threshold
  • Software package and online distribution
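
Note: the discretization step named on this slide, equal-width interval binning, can be sketched as follows. This is a generic illustration and not the TAR2 source code; the function name `equal_width_bins` and the choice of four bins are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1.

    The range [min, max] is split into n_bins intervals of equal width.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical numeric attribute split into four bins.
print(equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4))
```

After binning, each (attribute, bin) combination becomes a discrete attribute-value pair that a treatment can contain.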

8
The Pilot Case Study
  • Requirement optimization
  • Goal: optimal set of mitigations in a cost-effective
    manner
  • [Diagram: Requirements, Risks, Mitigations, Cost, and Benefit, connected by the relations "relates", "incur", "reduce", and "achieve"]
  • Iterative learning cycle

9
The Pilot Study (continued)
  • Cost-benefit distribution (30/99 mitigations)

10
Problem of TAR2
  • Runtime vs. Rx size
  • To generate Rx of size r
  • To generate Rx from size 1..N

11
TAR3 the improvement
  • Random sampling
  • Key idea
  • Treat the confidence1 distribution as a probability
    distribution
  • Sample Rx from the confidence1 distribution
  • Steps
  • Place items (ai) in increasing order of
    confidence1 value
  • Compute the CDF of each ai
  • Sample a uniform value u in [0..1]
  • The sample is the least ai whose CDF > u
  • Repeat until we get an Rx of the given size
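
Note: the sampling steps above can be sketched as follows. This is a minimal illustration and not the TAR3 source code. It assumes the confidence1 scores are positive, normalizes them so the CDF ends at 1, and skips a sampled item when its attribute is already in the treatment; the function name `sample_treatment` and the example scores are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_treatment(scores, size, rng=random):
    """Sample a treatment of `size` attribute-value pairs.

    scores: dict mapping (attribute, value) -> positive confidence1 score.
    """
    attributes = {attr for attr, _ in scores}
    if size > len(attributes):
        raise ValueError("size exceeds the number of distinct attributes")
    # Step 1: place items in increasing order of confidence1.
    items = sorted(scores, key=scores.get)
    # Step 2: compute the CDF of each item over the normalized scores.
    total = sum(scores.values())
    cdf = list(accumulate(scores[item] / total for item in items))
    rx = {}
    while len(rx) < size:
        # Step 3: sample a uniform value u in [0, 1).
        u = rng.random()
        # Step 4: pick the least item whose CDF > u (clamped against rounding).
        idx = min(bisect_right(cdf, u), len(items) - 1)
        attr, value = items[idx]
        # Step 5: repeat until the treatment has the requested size.
        if attr not in rx:
            rx[attr] = value
    return rx

# Hypothetical confidence1 scores for four attribute-value pairs.
scores = {
    ("rooms", "high"): 3.0,
    ("crime", "low"): 2.0,
    ("age", "old"): 1.0,
    ("rooms", "low"): 0.5,
}
print(sample_treatment(scores, 2, random.Random(0)))
```

Items with higher confidence1 cover a wider slice of the CDF, so they are drawn more often, while low-scoring items still have a nonzero chance of being selected.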

12
Comparison of Efficiency

13
Comparison of Results
  • 10 UCI domains, identical best Rx
  • Final Rx: TAR2 = 19, TAR3 = 20

14
External Evaluation
  • FSS framework
  • Input: all attributes (10 UCI datasets)
  • Feature subset selector: TAR2less
  • Learners: C4.5, Naive Bayes

15
The Results
  • Number of attributes
  • Accuracy using C4.5
  • (avg decrease 0.9%)
  • Accuracy using Naïve Bayes
  • (avg increase 0.8%)

16
Compare to other FSS methods
  • Number of attributes selected (C4.5)
  • Number of attributes selected (Naive Bayes)
  • 17/20, fewest attributes selected
  • Further evidence for funnels

17
Applications of Treatment Learning
  • Download site: http://www.ece.ubc.ca/yingh/
  • Collaborators: JPL, WV, Portland, Miami
  • Application examples
  • pair programming vs. conventional programming
  • identify software metrics that are superior error
    indicators
  • identify attributes that make FSMs easy to test
  • find the best software inspection policy for a
    particular software development organization
  • Other applications
  • 1 journal, 4 conference, 6 workshop papers

18
Main Contributions
  • New learning approach
  • A novel mining algorithm
  • Algorithm optimization
  • Complete package and online distribution
  • Narrow funnel effect
  • Treatment learner as FSS
  • Applications in various research domains