Transcript and Presenter's Notes

Title: CS 461: Machine Learning, Lecture 9


1
CS 461: Machine Learning, Lecture 9
  • Dr. Kiri Wagstaff
  • kiri.wagstaff@calstatela.edu

2
Plan for Today
  • Review Reinforcement Learning
  • Ensemble Learning
  • How to combine forces?
  • Voting
  • Error-Correcting Output Codes
  • Bagging
  • Boosting
  • Homework 5
  • Evaluations

3
Review from Lecture 8
  • Reinforcement Learning
  • How different from supervised, unsupervised?
  • Key components
  • Actions, states, transition probs, rewards
  • Markov Decision Process
  • Episodic vs. continuing tasks
  • Value functions, optimal value functions
  • Learn policy (based on V, Q)
  • Model-based: value iteration, policy iteration
  • TD learning
  • Deterministic: backup rules (max)
  • Nondeterministic: TD learning, Q-learning
    (running avg)

4
Ensemble Learning
  • Chapter 15

5
What is Ensemble Learning?
  • No Free Lunch Theorem
  • No single algorithm wins all the time!
  • Ensemble = collection of base learners
  • Combine the strengths of each to make a
    super-learner
  • Also considered meta-learning
  • How can you get different learners?
  • How can you combine learners?

6
Where do Learners come from?
  • Different learning algorithms
  • Algorithms with different parameter choices
  • Data set with different features
  • Data set with different subsets
  • Different sub-tasks

7
Combine Learners: Voting
  • Linear combination (weighted vote)
  • Classification

Alpaydin 2004 © The MIT Press
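
A minimal LaTeX sketch of the weighted-voting rule this slide refers to, in the notation of Alpaydin's Chapter 15 (d_ji = learner j's output for class i); the exact equation on the original slide is not preserved in this transcript:

    % Weighted vote over L base learners; simple (unweighted) voting is w_j = 1/L
    y_i = \sum_{j=1}^{L} w_j \, d_{ji},
    \qquad w_j \ge 0, \qquad \sum_{j=1}^{L} w_j = 1

For classification, the predicted class is the one with the largest combined vote, \arg\max_i y_i.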
8
Exercise: xs and os
9
Different Learners: ECOC
  • Error-Correcting Output Code
  • How to define sub-tasks to get different
    learners
  • Maybe use the same base learner, maybe not
  • Key: want to be able to detect errors!
  • Example: dance steps to convey a secret command
  • Three valid commands

  Attack   Retreat   Wait
  R L R    L L R     R R R

  Attack   Retreat   Wait
  R L R    L L L     R R L
10
Error-Correcting Output Code
  • Specifies how to interpret (and detect errors in)
    learner outputs
  • K classes, L learners
  • One learner per class: L = K

Column defines task for learner l
Row: encoding of class k
Alpaydin 2004 © The MIT Press
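
As a concrete illustration (a standard construction, not necessarily the matrix drawn on this slide), the one-per-class code for K = 4 uses L = 4 learners, each trained on "class k vs. the rest":

    % One-per-class ECOC matrix for K = 4 (rows = classes, columns = learners)
    W = \begin{pmatrix}
          +1 & -1 & -1 & -1 \\
          -1 & +1 & -1 & -1 \\
          -1 & -1 & +1 & -1 \\
          -1 & -1 & -1 & +1
        \end{pmatrix}

Any two rows differ in only two positions, so a single learner error can be detected but not corrected.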
11
ECOC: Pairwise Classification
  • L = K(K-1)/2
  • 0 = don't care

Alpaydin 2004 © The MIT Press
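
An illustrative pairwise code for K = 4 (L = 4·3/2 = 6 learners), one column per class pair; the 0 entries mark examples that a learner simply never sees. This is the standard construction, offered as an example rather than a reproduction of the slide's figure:

    % Pairwise ECOC matrix for K = 4: columns correspond to the class pairs
    % (1,2), (1,3), (1,4), (2,3), (2,4), (3,4); 0 = don't care
    W = \begin{pmatrix}
          +1 & +1 & +1 &  0 &  0 &  0 \\
          -1 &  0 &  0 & +1 & +1 &  0 \\
           0 & -1 &  0 & -1 &  0 & +1 \\
           0 &  0 & -1 &  0 & -1 & -1
        \end{pmatrix}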
12
ECOC: Full Code
  • Total columns: 2^(K-1) - 1
  • For K = 4: 7 columns
  • Goal: choose L sub-tasks (columns)
  • Maximize row distance: detect errors
  • Maximize column distance: different sub-tasks
  • Combine outputs by weighted voting

Alpaydin 2004 © The MIT Press
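
For K = 4 the full code has 2^3 - 1 = 7 columns. One standard exhaustive construction, shown as an illustration (it need not match the slide's figure exactly):

    % Full (exhaustive) ECOC matrix for K = 4, L = 7
    W = \begin{pmatrix}
          +1 & +1 & +1 & +1 & +1 & +1 & +1 \\
          -1 & -1 & -1 & -1 & +1 & +1 & +1 \\
          -1 & -1 & +1 & +1 & -1 & -1 & +1 \\
          -1 & +1 & -1 & +1 & -1 & +1 & -1
        \end{pmatrix}

Every pair of rows differs in four positions, so a single learner error can be both detected and corrected.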
13
Different Learners: Bagging
  • Bagging = bootstrap aggregation
  • Bootstrap: draw N items from X with replacement
  • Want unstable learners
  • Unstable = high variance
  • Decision trees and ANNs are unstable
  • K-NN is stable
  • Bagging
  • Train L learners on L bootstrap samples
  • Combine outputs by voting
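
A minimal Python sketch of this recipe, using scikit-learn decision trees as the unstable base learner; the function names and the choice of 25 learners are illustrative, not from the lecture:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_learners=25, rng=np.random.default_rng(0)):
        """Train n_learners trees, each on a bootstrap sample of (X, y)."""
        n = len(X)
        learners = []
        for _ in range(n_learners):
            idx = rng.integers(0, n, size=n)   # draw N items from X with replacement
            learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return learners

    def bagging_predict(learners, X):
        """Combine outputs by (unweighted) majority vote; assumes integer class labels."""
        votes = np.stack([l.predict(X) for l in learners])    # shape (L, n_samples)
        return np.array([np.bincount(col).argmax() for col in votes.T])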

14
Different Learners: Boosting
  • Boosting: train the next learner on mistakes made by
    previous learner(s)
  • Want weak learners
  • Weak: P(correct) > 50%, but not necessarily by a
    lot
  • Idea: solve easy problems with a simple model
  • Save complex model for hard problems

15
Original Boosting
  • Split data X into X1, X2, X3
  • Train L1 on X1
  • Test L1 on X2
  • Train L2 on L1's mistakes on X2 (plus some it got right)
  • Test L1 and L2 on X3
  • Train L3 on disagreements between L1 and L2
  • Testing: apply L1 and L2; if they disagree, use L3
  • Drawback: needs a large X
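
A compact Python sketch of this three-learner scheme, assuming NumPy arrays, integer labels, and decision stumps as base learners; the "plus some right" mixing ratio and the helper names are illustrative assumptions:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def original_boosting(X1, y1, X2, y2, X3, y3):
        def base():
            return DecisionTreeClassifier(max_depth=1)
        L1 = base().fit(X1, y1)
        # Train L2 on L1's mistakes on X2, plus roughly half of the points it got right
        wrong = L1.predict(X2) != y2
        keep = wrong | (np.random.default_rng(0).random(len(y2)) < 0.5)
        L2 = base().fit(X2[keep], y2[keep])
        # Train L3 on the X3 points where L1 and L2 disagree
        disagree = L1.predict(X3) != L2.predict(X3)
        L3 = base().fit(X3[disagree], y3[disagree])

        def predict(X):
            p1, p2, p3 = L1.predict(X), L2.predict(X), L3.predict(X)
            return np.where(p1 == p2, p1, p3)   # if L1 and L2 disagree, use L3
        return predict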

16
AdaBoost = Adaptive Boosting
  • Arbitrary number of base learners
  • Re-use data set (like bagging)
  • Use errors to adjust probability of drawing
    samples for next learner
  • Reduce probability if it's correct
  • Testing: vote, weighted by training accuracy
  • Key difference from bagging:
  • Data sets are not chosen by chance; instead, use the
    performance of previous learners to select data
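
A minimal Python sketch of the resampling version of AdaBoost described above: each round draws a new training set with probabilities shaped by earlier errors, and the final vote is weighted by each learner's log-odds training accuracy. Binary labels in {-1, +1} and decision stumps are assumptions of this sketch, not details from the lecture:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=20, rng=np.random.default_rng(0)):
        """y must be in {-1, +1}. Returns (learners, vote_weights)."""
        n = len(X)
        p = np.full(n, 1.0 / n)                  # probability of drawing each example
        learners, alphas = [], []
        for _ in range(n_rounds):
            idx = rng.choice(n, size=n, p=p)     # re-use the data set, sampled by p
            h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            pred = h.predict(X)
            err = max(p[pred != y].sum(), 1e-12)  # weighted training error
            if err >= 0.5:                        # weak learner must beat chance
                break
            alpha = 0.5 * np.log((1 - err) / err)
            p *= np.exp(-alpha * y * pred)        # reduce p where correct, raise where wrong
            p /= p.sum()
            learners.append(h)
            alphas.append(alpha)
        return learners, alphas

    def adaboost_predict(learners, alphas, X):
        scores = sum(a * h.predict(X) for h, a in zip(learners, alphas))
        return np.sign(scores)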

17
AdaBoost
Alpaydin 2004 © The MIT Press
18
AdaBoost Applet
  • http://www.cs.ucsd.edu/~yfreund/adaboost/index.html

19
Summary: Key Points for Today
  • No Free Lunch theorem
  • Ensemble = combine learners
  • Voting
  • Error-Correcting Output Codes
  • Bagging
  • Boosting

20
Homework 5
21
Next Time
  • Final Project Presentations (no reading assignment!)
  • Use order on website
  • Submit slides on CSNS by midnight March 7
  • No, really
  • You may not be able to present if you don't
  • Reports are due on CSNS by midnight March 8
  • Early submission: March 1