Final Project: Project 9. Part 1: Neural Networks. Part 2: Overview of Classifiers (PowerPoint transcript)
1
Final Project: Project 9
Part 1: Neural Networks
Part 2: Overview of Classifiers
  • Aparna S. Varde
  • April 28, 2005
  • CS539: Machine Learning
  • Course Instructor: Prof. Carolina Ruiz

2
Part 1: Neural Networks
  • Data: The data sets used in this project are as follows (a loading sketch is given after this list).
  • CPU Data Set
  • Attributes describe features of computer CPUs, such as vendor.
  • Target attribute denotes CPU performance (real).
  • Other attributes are a mixture of real and nominal.
  • 8 attributes, 209 instances.
  • Iris Data Set
  • Attributes describe features of Iris flowers, such as sepals and petals.
  • Target attribute denotes the species of Iris (nominal).
  • 5 attributes, 150 instances.
  • Covtype Data Set
  • Attributes describe features of forests, such as soil type and elevation.
  • Target attribute denotes the forest cover type (nominal).
  • 55 attributes, approximately 58,000 instances.
  • The attributes describe 12 features; some, namely type1, type2, etc., are Boolean, giving 54 attributes plus the target.
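The data sets above can be loaded through the WEKA Java API roughly as in the sketch below. This is a minimal sketch, not project code: the ARFF file names are assumptions, and the target is taken to be the last attribute, as described above.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LoadDataSets {
        public static void main(String[] args) throws Exception {
            // Hypothetical file names; substitute the actual ARFF files used in the project.
            String[] files = {"cpu.arff", "iris.arff", "covtype.arff"};
            for (String f : files) {
                Instances data = DataSource.read(f);
                // The target attribute is the last one for all three data sets.
                data.setClassIndex(data.numAttributes() - 1);
                System.out.printf("%s: %d attributes, %d instances, class = %s%n",
                        f, data.numAttributes(), data.numInstances(),
                        data.classAttribute().name());
            }
        }
    }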

3
Preprocessing of Data
  • Resampling
  • The WEKA instance-based unsupervised filter was used as a preprocessing step for the Covtype data to select a subset of instances for running experiments.
  • 3 subsets were selected, with 5000, 3000, and 1000 instances respectively.
  • This was done to observe the impact of the neural network on data sets of different sizes.
  • Supervised Discretization
  • Used to convert continuous attributes to ranges for the Iris data, to observe the impact on accuracy.
  • Supervised discretization was done with the default MDL criterion, since the target class is nominal.
  • Done using the WEKA attribute-based supervised preprocessing filter.
  • Followed by nominal-to-binary conversion, which is done directly in the neural net. This is discussed in the experiments section.
  • Unsupervised Discretization
  • Done for the CPU data set only, to be able to compare it with the other classifiers such as decision trees and ZeroR.
  • The target is initially not nominal, so simple discretization was done by binning. Discretization was done for this data because the J4.8 classifier used for comparison works with nominal targets only.
  • Discretization was done using the WEKA attribute-based unsupervised preprocessing filter. (A filter sketch follows below.)
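A rough sketch of these three preprocessing steps with WEKA filters is shown below. It assumes the unsupervised instance Resample filter for subset selection, the supervised attribute Discretize filter (MDL-based) for Iris, and the unsupervised attribute Discretize filter for binning the CPU target; the sample percentage and bin count shown are illustrative assumptions, not values taken from the project.

    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.Discretize;
    import weka.filters.unsupervised.instance.Resample;

    public class Preprocess {
        // Resampling: select a subset of the Covtype instances (e.g. roughly 1000 of ~58,000).
        static Instances resample(Instances data, double percent) throws Exception {
            Resample r = new Resample();
            r.setSampleSizePercent(percent);   // e.g. about 1.7 for roughly 1000 instances
            r.setInputFormat(data);
            return Filter.useFilter(data, r);
        }

        // Supervised discretization (default MDL criterion) of the Iris attributes;
        // requires the nominal class to be set.
        static Instances discretizeSupervised(Instances data) throws Exception {
            data.setClassIndex(data.numAttributes() - 1);
            Discretize d = new Discretize();
            d.setInputFormat(data);
            return Filter.useFilter(data, d);
        }

        // Unsupervised discretization of the numeric CPU target by simple binning,
        // applied before the class index is set so the target attribute is included.
        static Instances binCpuTarget(Instances data, int bins) throws Exception {
            weka.filters.unsupervised.attribute.Discretize d =
                    new weka.filters.unsupervised.attribute.Discretize();
            d.setAttributeIndices("last");     // the target attribute
            d.setBins(bins);                   // bin count is an illustrative assumption
            d.setInputFormat(data);
            Instances out = Filter.useFilter(data, d);
            out.setClassIndex(out.numAttributes() - 1);
            return out;
        }
    }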

4
Experiments with Covertype Data
  • Experiments conducted with 4-fold cross-validation used for testing.
  • Data set size: 1000, 3000, 5000; other parameters default.
  • Learning Rate: 0.1, 0.2, 0.3, ..., 1.0, with best settings from above.
  • Momentum: 0.1, 0.2, 0.3, ..., 1.0, with best settings from above.
  • Number of epochs: 100, 200, 300, ..., 1000, with best settings from above.
  • Validation Set: 0, 5, 10, 15, ..., 50 percent of the data set, with best settings from above.
  • Validation Threshold: 10, 20, 30, ..., 100 epochs, with best settings from above.
  • Number of Hidden Layer Units: a, i, o, t, with best settings from above, where
  • i = number of input values
  • o = number of output classes
  • t = i + o
  • a = (i + o)/2
  • Two Hidden Layers: (x,a), (x,t), where x is the best setting from above.
  • Normalization: True/False, experiment for default/best settings (a WEKA sketch of this experiment harness follows below).
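As an illustration of how one such run can be set up programmatically, the sketch below evaluates WEKA's MultilayerPerceptron with 4-fold cross-validation while sweeping the learning rate; the other sweeps (momentum, epochs, validation set size, etc.) follow the same pattern by changing the corresponding setter. The input file name and random seed are assumptions.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CovtypeSweep {
        public static void main(String[] args) throws Exception {
            // Hypothetical file name for the resampled 1000-instance Covertype subset.
            Instances data = DataSource.read("covtype_1000.arff");
            data.setClassIndex(data.numAttributes() - 1);

            for (int step = 1; step <= 10; step++) {
                double lr = step / 10.0;          // 0.1, 0.2, ..., 1.0
                MultilayerPerceptron mlp = new MultilayerPerceptron();
                mlp.setLearningRate(lr);
                mlp.setMomentum(0.2);             // WEKA defaults for the remaining parameters
                mlp.setTrainingTime(500);         // number of epochs
                mlp.setHiddenLayers("a");         // a = (attributes + classes) / 2
                mlp.setValidationSetSize(0);      // no validation set in the default runs

                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(mlp, data, 4, new Random(1));   // 4-fold CV
                System.out.printf("learning rate %.1f -> accuracy %.2f%%%n",
                        lr, eval.pctCorrect());
            }
        }
    }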

5
Experiment 1: Effect of Data Set Size
  • Default Settings: Learning Rate = 0.3, Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a, Normalization = True.
  • The highest accuracy is obtained for the data set with 1000 instances.
  • This 1000-instance data set also requires the least time to model.
  • The lowest accuracy is obtained for the 3000-instance data set.
  • The longest time to model was with the 5000-instance data set.
  • Based on this, the 1000-instance data set is selected for the remaining experiments.

6
Experiment 2: Effect of Learning Rate
  • Settings: Data Size = 1000, Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a, Normalization = True.
  • The maximum time to model is 202.59 seconds, for a learning rate of 0.1.
  • The minimum time to model is 184.57 seconds, for a learning rate of 0.3.
  • The lowest accuracy (86%) is for a learning rate of 1.0.
  • In general, as the learning rate increases, accuracy tends to drop. Time to model also decreases, though the drop in time is not as steady as the drop in accuracy.
  • The highest accuracy is 87.85%, obtained for learning rates of 0.2 and 0.4.
  • However, the time to model is less for a learning rate of 0.4 than for 0.2.
  • Thus the learning rate of 0.4 is selected as the setting for further experiments.

7
Experiment 3: Effect of Momentum
  • Settings: Data Size = 1000, Learning Rate = 0.4, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a, Normalization = True.
  • The lowest accuracy is 34.28%, obtained for a momentum of 0.9.
  • The highest accuracy is 87.85%, obtained for a momentum of 0.2.
  • The longest time to model is 204.36 seconds, for a momentum of 0.8.
  • The shortest time to model is 178.06 seconds, for a momentum of 1.0.
  • In general, accuracy drops after a momentum of 0.7 and becomes very low at a momentum of 1.0.
  • The setting selected for further experiments is a momentum of 0.2, since it gives the highest accuracy of 87.85%.

8
Experiment 4: Effect of Number of Epochs
  • Settings: Data Size = 1000, Learning Rate = 0.4, Momentum = 0.2, Validation Set = 0, Number of Hidden Layer Units = a, Normalization = True.
  • Accuracy increases as the number of epochs increases.
  • The time to model, as expected, also increases as the number of epochs increases.
  • The best accuracy of 88.17% is obtained for 900 epochs.
  • The lowest accuracy is 84%, for 100 epochs.
  • The setting used for further experiments is Number of Epochs = 900.

9
Experiment 5: Effect of Validation Set Size
  • Settings: Data Size = 1000, Learning Rate = 0.4, Momentum = 0.2, Number of Epochs = 900, Number of Hidden Layer Units = a, Normalization = True.
  • As the size of the validation set increases, the training time tends to go down in most cases.
  • The accuracy tends to go down as the validation set size increases.
  • The best accuracy is actually obtained for a validation set size of 0 (i.e., no validation set). However, this model carries the risk of overfitting the training data.
  • Hence the setting selected for further experiments is one that is likely to avoid overfitting, i.e., one with a validation set.
  • With a validation set of 50%, the accuracy is as low as 81%, likely because less data is available for training.
  • The setting selected is a validation set of 10%, for two reasons: this model gives very high accuracy, and it is considerably faster than the others.

10
Experiment 6: Effect of Validation Threshold
  • Settings: Data Size = 1000, Learning Rate = 0.4, Momentum = 0.2, Validation Set = 10%, Number of Epochs = 900, Number of Hidden Layer Units = a, Normalization = True.
  • The accuracy stays constant at and after a validation threshold of 50.
  • The time to model is also more or less the same after a validation threshold of 50.
  • Validation thresholds of 10 and 20 require distinctly less time to model than the others.
  • The setting selected for further experiments is the one that gives the highest accuracy (87.79%): a validation threshold of 20, with a time to model of 32.48 seconds. (A sketch of these two validation parameters follows below.)
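Experiments 5 and 6 correspond to two options of WEKA's MultilayerPerceptron. A minimal sketch of setting them with the values selected above is shown below.

    import weka.classifiers.functions.MultilayerPerceptron;

    public class ValidationSettings {
        public static MultilayerPerceptron configure() {
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setLearningRate(0.4);
            mlp.setMomentum(0.2);
            mlp.setTrainingTime(900);        // number of epochs
            // Hold out 10% of the training data as a validation set,
            // so training can stop early based on validation error.
            mlp.setValidationSetSize(10);
            // Number of consecutive epochs the validation error may worsen
            // before training is stopped.
            mlp.setValidationThreshold(20);
            return mlp;
        }
    }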

11
Experiment 7: Effect of Number of Units
  • Settings: Data Size = 1000, Learning Rate = 0.4, Momentum = 0.2, Validation Set = 10%, Number of Epochs = 900, Validation Threshold = 20, Normalization = True.
  • The number of hidden units t = i + o requires the longest time to model and gives the lowest accuracy.
  • The number of hidden units a = (i + o)/2 requires the shortest time to model and gives the highest accuracy.
  • The setting selected for the next experiments is the one with number of hidden units = a, which gives an accuracy of 87.79% and a time to model of 32.48 seconds.

12
Experiment 8: Effect of 2 Hidden Layers
  • Settings: Data Size = 1000, Learning Rate = 0.4, Momentum = 0.2, Validation Set = 10%, Number of Epochs = 900, Validation Threshold = 20, Normalization = True.
  • The longest time to model and also the lowest accuracy are obtained for (a,i), which means a units in the first layer and i units in the second.
  • The fastest model is obtained with the (a,o) topology.
  • The highest accuracy is with the (a,a) topology. This also happens to be the highest accuracy in all the experiments so far.
  • Hence this is considered the best overall setting and is used for the next experiment. (A topology sketch follows below.)
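In WEKA's MultilayerPerceptron the topology is given by the hiddenLayers string, where the wildcards a, i, o and t stand for (attributes + classes)/2, the number of attributes, the number of classes, and attributes + classes respectively; for the resampled Covertype subset (54 inputs, 7 cover types) this works out to roughly a = 30, i = 54, o = 7, t = 61. The snippet below sets the single-layer and two-layer topologies tried in Experiments 7 and 8; it is a sketch, not the exact code used in the project.

    import weka.classifiers.functions.MultilayerPerceptron;

    public class HiddenLayerTopologies {
        public static void main(String[] args) {
            // Single hidden layer: one of the wildcards tried in Experiment 7.
            MultilayerPerceptron single = new MultilayerPerceptron();
            single.setHiddenLayers("a");      // a = (attributes + classes) / 2

            // Two hidden layers: comma-separated sizes, e.g. the best (a,a) topology
            // from Experiment 8; (a,i), (a,o) and (a,t) are set the same way.
            MultilayerPerceptron twoLayers = new MultilayerPerceptron();
            twoLayers.setHiddenLayers("a,a");
        }
    }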

13
Experiment 9: Effect of Normalized Attributes
  • Default Settings: Learning Rate = 0.3, Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a.
  • Best Settings: Learning Rate = 0.4, Momentum = 0.2, Validation Set = 10%, Number of Epochs = 900, Validation Threshold = 20, Hidden Units = (a,a).
  • The settings without normalization give distinctly lower accuracy than those with normalization, implying that normalization favorably affects accuracy.
  • However, the settings with normalization require much more time to model, implying that normalization makes learning slower.
  • The best accuracy obtained in all the Covertype experiments with neural nets so far is 89.14%, with the best settings from the previous experiments and with normalization. The time required to build this model is 508.72 seconds. (The full configuration is sketched below.)
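Putting the selected values together, the best Covertype configuration described above would look roughly as follows in the WEKA Java API (attribute normalization is the normalizeAttributes option, which is on by default).

    import weka.classifiers.functions.MultilayerPerceptron;

    public class BestCovtypeModel {
        public static MultilayerPerceptron build() {
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setLearningRate(0.4);
            mlp.setMomentum(0.2);
            mlp.setTrainingTime(900);          // epochs
            mlp.setValidationSetSize(10);      // percent of training data held out
            mlp.setValidationThreshold(20);    // epochs of worsening validation error allowed
            mlp.setHiddenLayers("a,a");        // two hidden layers of (attributes + classes)/2 units
            mlp.setNormalizeAttributes(true);  // Experiment 9: normalization on
            return mlp;
        }
    }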

14
Experiments with CPU and Iris Data
  • CPU
  • Exp 1: Learning Rate varied from 0.1 to 1.0, other parameters default.
  • Exp 2: Momentum varied from 0.1 to 1.0, with the best setting from above.
  • Exp 3: Normalize Numeric Class, True/False, with default and best settings (a sketch for these numeric-class runs follows this list).
  • Iris
  • Exp 1: Number of Units in 1 hidden layer as a, i, o, t, with other parameters default.
  • Exp 2: Number of Units in 2 hidden layers, with the 1st layer having the best setting from above.
  • Exp 3: Nominal to Binary Conversion, True/False, with default settings and best settings overall.
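Because the CPU target is numeric, these runs are evaluated by the correlation coefficient rather than by accuracy. A minimal sketch of one such run is given below; the file name is an assumption, the class is assumed to be the last (performance) attribute, and 4-fold cross-validation is assumed to match the Covertype runs.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CpuExperiment {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("cpu.arff");   // hypothetical file name
            data.setClassIndex(data.numAttributes() - 1);   // numeric performance target

            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setLearningRate(0.1);              // best value from CPU Experiment 1
            mlp.setMomentum(0.1);                  // best value from CPU Experiment 2
            mlp.setNormalizeNumericClass(true);    // option studied in CPU Experiment 3

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(mlp, data, 4, new Random(1));
            System.out.printf("correlation coefficient = %.4f%n",
                    eval.correlationCoefficient());
        }
    }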

15
CPU Experiment 1: Effect of Learning Rate
  • Settings: Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a, Normalize Numeric Class = True.
  • The highest correlation coefficient is observed for Learning Rate = 0.1.
  • The lowest correlation coefficient is for Learning Rate = 1.0.
  • In general, the correlation coefficient decreases as the learning rate increases.
  • The time to model is almost the same across settings for this data set, and is very fast compared to the Covertype data set.
  • For the next experiment, the setting selected is the one that gives the best correlation, i.e., the one with learning rate = 0.1.

16
CPU Experiment 2: Effect of Momentum
Settings: Learning Rate = 0.1, Number of epochs = 500, Validation Set = 0, Number of Hidden Layer Units = a, Normalize Numeric Class = True.
  • The highest correlation is achieved for momentum = 0.1.
  • The lowest correlation is achieved for momentum = 1.0.
  • In most cases, the correlation coefficient tends to decrease as the momentum increases.
  • The best setting is selected as the one that shows the highest correlation: momentum = 0.1.

17
CPU Experiment 3: Effect of Normalizing Numeric Class
  • Default Settings: Learning Rate = 0.3, Momentum = 0.2.
  • Best Settings: Learning Rate = 0.1, Momentum = 0.1.
  • The default settings with no normalization give a negative correlation coefficient, implying that the predictions are poorly correlated with the actual target values.
  • For both settings, the correlation coefficient increases with normalization.
  • The best overall setting for the CPU data set is selected as the last one in the above table, i.e., learning rate = 0.1, momentum = 0.1, normalize numeric class = true, and other parameters default.

18
Iris Experiment 1: Effect of Units in One Hidden Layer
  • Settings: Learning Rate = 0.3, Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Nominal to Binary = True.
  • The highest accuracy (98%) is observed for number of units = i.
  • The lowest accuracy, and also the longest time to model, is observed for number of units = t.
  • The shortest time to model is for number of units = o.
  • The best setting selected is the one with number of units = i, because it gives the highest accuracy of 98%.

19
Iris Experiment 2: Effect of Units in Two Hidden Layers
  • Settings: Learning Rate = 0.3, Momentum = 0.2, Number of epochs = 500, Validation Set = 0, Units in 1st Hidden Layer = i, Nominal to Binary = True.
  • In general, two hidden layers give lower accuracy than one hidden layer for this data set.
  • The best accuracy obtained is for the (i,a) and (i,o) settings; however, this is still lower than the best accuracy with 1 hidden layer.
  • The lowest accuracy is for the (i,i) and (i,t) topologies.
  • The time to model is the longest with the (i,t) topology.
  • The fastest time to model is with the (i,a) topology.

20
Iris Experiment 3: Effect of Nominal to Binary Conversion
  • Default Topology: Hidden Units = a
  • Best Topology: Hidden Units = i
  • Data: Discretized Iris Data Set
  • The best accuracy obtained is 96%, which is still lower than the best accuracy with the raw data set. This is without nominal-to-binary conversion.
  • The lowest accuracy is obtained for the same settings with nominal-to-binary conversion. (A sketch of this option follows below.)
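Nominal-to-binary conversion is controlled by the nominalToBinaryFilter option of WEKA's MultilayerPerceptron; a minimal sketch of the two configurations compared above, on the discretized Iris data, is shown below.

    import weka.classifiers.functions.MultilayerPerceptron;

    public class IrisNominalToBinary {
        // Builds the two configurations compared in Iris Experiment 3.
        public static MultilayerPerceptron build(boolean nominalToBinary) {
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setHiddenLayers("i");    // best topology from Iris Experiment 1
            // When true, nominal attributes (here, the discretized ranges)
            // are converted to binary indicator attributes inside the network.
            mlp.setNominalToBinaryFilter(nominalToBinary);
            return mlp;
        }
    }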

21
Summary of Results
Best Models Obtained
  • L.R. is Learning Rate, M is Momentum, H is hidden units, V is validation set size (percent), and T is validation threshold.
  • Covertype has the longest time to model; Iris has the shortest.
  • Iris gives the highest accuracy.

22
Summary (Contd.)
Comparison with Other Classifiers
  • The CPU data set shows a negative correlation for ZeroR, while the best neural net model shows a very high positive correlation of 0.9967.
  • The best accuracy for Iris is 98% with neural networks, which is better than that with decision trees.
  • Covertype gives a very high accuracy with decision trees, but the best neural network model gives an accuracy of 89.14%, which is even higher. (A comparison sketch follows below.)
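For reference, the baselines mentioned here are WEKA's ZeroR and J4.8 (class J48). A rough sketch of evaluating them alongside the neural network on the same cross-validation folds is shown below; the file name and the 4-fold setup are assumptions.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.rules.ZeroR;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareClassifiers {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("covtype_1000.arff");   // hypothetical file
            data.setClassIndex(data.numAttributes() - 1);

            Classifier[] classifiers = { new ZeroR(), new J48(), new MultilayerPerceptron() };
            for (Classifier c : classifiers) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(c, data, 4, new Random(1));   // same folds via same seed
                System.out.printf("%s: accuracy %.2f%%%n",
                        c.getClass().getSimpleName(), eval.pctCorrect());
            }
        }
    }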

23
Part 2: Overview of Classifiers
  • Decision Trees
  • Neural Networks
  • Bayesian Classifiers
  • Genetic Algorithms
  • Instance-Based Learning
  • Classification Rules
  • Final Project Neural Networks Improved

24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Conclusions
  • Machine Learning: Very good course
  • Excellent Professor
  • Great Classmates
  • Very Interactive, Learned a Lot
  • Thank you