Data Stream Classification and Novel Class Detection - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Data Stream Classification and Novel Class
Detection
  • Mehedy Masud, Latifur Khan, Qing Chen
  • and Bhavani Thuraisingham
  • Department of Computer Science, University of
    Texas at Dallas
  • Jing Gao, Jiawei Han
  • Department of Computer Science, University of
    Illinois at Urbana-Champaign
  • Charu Aggarwal
  • IBM T. J. Watson

This work was funded in part by
2
Outline of the Presentation
  • Background
  • Data Stream Classification
  • Novel Class Detection

3
Introduction
  • Characteristics of data streams:
  • Examples

Network traffic
Sensor data
Call center records
4
Data Stream Classification
  • Uses past labeled data to build classification
    model
  • Predicts the labels of future instances using the
    model
  • Helps decision making
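A minimal sketch of this train-then-predict cycle (hypothetical types; the 1-nearest-neighbor base learner is a stand-in, not the method used in this work):

    import java.util.List;

    // Build a model from past labeled data, then predict the labels of
    // future (unlabeled) stream instances with it.
    class StreamClassifierSketch {
        interface Classifier { int predict(double[] x); }

        // Stand-in base learner: 1-nearest-neighbor over the labeled history.
        static Classifier train(List<double[]> X, List<Integer> y) {
            return x -> {
                int best = -1; double bestDist = Double.MAX_VALUE;
                for (int i = 0; i < X.size(); i++) {
                    double d = 0;
                    for (int j = 0; j < x.length; j++) {
                        double t = x[j] - X.get(i)[j];
                        d += t * t;
                    }
                    if (d < bestDist) { bestDist = d; best = y.get(i); }
                }
                return best;
            };
        }

        // Predicted labels of future instances support decision making.
        static void classifyStream(Classifier model, Iterable<double[]> stream) {
            for (double[] x : stream) System.out.println(model.predict(x));
        }
    }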

5
Data Stream Classification (cont.)
  • What are the applications?
  • Security monitoring
  • Network monitoring and traffic engineering
  • Business: credit card transaction flows
  • Telecommunication: calling records
  • Web logs and web page click streams

6
Challenges
  • Infinite length
  • Concept-drift
  • Concept-evolution
  • Feature-evolution

7
Infinite Length
  • Impractical to store and use all historical data
  • It would require infinite storage
  • and infinite running time

8
Concept-Drift
[Figure: a data chunk of positive and negative instances; as the stream progresses the class boundary shifts, and the instances left on the wrong side become victims of concept-drift]
9
Concept-Evolution
[Figure: a feature space (x, y) partitioned at x = x1, y = y1, y = y2 into regions A, B, C, D of + and - instances; a novel class later appears inside this space]

Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = -

Existing classification models misclassify novel class instances
10
Dynamic Features
  • Why do new features evolve?
  • Infinite data stream
  • The global feature set is normally unknown
  • New features may appear
  • Concept-drift
  • As concepts drift, new features may appear
  • Concept-evolution
  • A new class normally brings a new set of
    features

Different chunks may have different feature sets
11
Dynamic Features
The ith chunk, the (i+1)st chunk, and the models may each have a different feature set

[Figure: feature sets evolve across the stream, e.g. (runway, climb), (runway, clear, ramp), and (runway, ground, ramp) for the ith chunk and the current model; the pipeline is feature space conversion, then classification / novel class detection, then training a new model]

Existing classification models need a complete, fixed feature set that applies to all the chunks. Global features are difficult to predict. One solution is to use all English words and generate a vector, but the dimension of the vector would be far too high.
12
Outline of the Presentation
  • Introduction
  • Data Stream Classification
  • Novel Class Detection

13
Data Stream Classification (cont.)
  • Single Model Incremental Classification
  • Ensemble model based classification
  • Supervised
  • Semi-supervised
  • Active learning

14
Overview
  • Single Model Incremental Classification
  • Ensemble model based classification
  • Data Selection
  • Semi-supervised
  • Skewed Data

15
Ensemble of Classifiers
[Figure: an unlabeled input (x, ?) is fed to the individual classifiers C1, C2, C3; their individual outputs (e.g. +, +, -) are combined by voting into the ensemble output +]
16
Ensemble Classification of Data Streams
  • Divide the data stream into equal-sized chunks
  • Train a classifier from each data chunk
  • Keep the best L such classifiers in the ensemble
  • Example for L = 3 (see the sketch after this
    slide)

Note: Di may contain data points from different
classes

[Figure: labeled chunks D4, D5, D6 arrive followed by an unlabeled chunk; classifiers C4 and C5 are trained from the newest labeled chunks, and the ensemble keeps the best L of C1 ... C5]

Addresses infinite length and concept-drift
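A hedged sketch of this chunk-based ensemble (hypothetical types; the 1-NN base learner and the ranking criterion are assumptions, since the slide fixes neither):

    import java.util.*;

    class ChunkEnsembleSketch {
        interface Classifier { int predict(double[] x); }

        static final int L = 3;                          // ensemble size, as in the example
        final List<Classifier> ensemble = new ArrayList<>();

        // Stand-in base learner trained on one labeled chunk.
        static Classifier trainOn(double[][] X, int[] y) {
            return x -> {
                int best = -1; double bestDist = Double.MAX_VALUE;
                for (int i = 0; i < X.length; i++) {
                    double d = 0;
                    for (int j = 0; j < x.length; j++) { double t = x[j] - X[i][j]; d += t * t; }
                    if (d < bestDist) { bestDist = d; best = y[i]; }
                }
                return best;
            };
        }

        static double accuracy(Classifier m, double[][] X, int[] y) {
            int ok = 0;
            for (int i = 0; i < X.length; i++) if (m.predict(X[i]) == y[i]) ok++;
            return (double) ok / X.length;
        }

        // One model per chunk; only the best L survive, so storage stays
        // bounded (infinite length) and stale concepts fade (concept-drift).
        void onLabeledChunk(double[][] X, int[] y) {
            ensemble.add(trainOn(X, y));
            ensemble.sort(Comparator.comparingDouble((Classifier m) -> -accuracy(m, X, y)));
            while (ensemble.size() > L) ensemble.remove(ensemble.size() - 1);
        }

        int classify(double[] x) {                       // majority vote of the ensemble
            Map<Integer, Integer> votes = new HashMap<>();
            for (Classifier m : ensemble) votes.merge(m.predict(x), 1, Integer::sum);
            return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
        }
    }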
17
Concept-Evolution Problem
ECSMiner
  • A completely new class of data arrives in the
    stream

[Figure: (a) a decision tree with tests x < x1, y < y1, y < y2 whose true/false branches lead to leaves of class + or -; (b) the feature space partitioned into regions B, C, D, in which a novel class later appears]
(a) A decision tree, (b) corresponding feature
space partitioning
18
ECSMiner Overview
ECSMiner
[Figure: the data stream is split into older (labeled) instances and newer (unlabeled) instances; the last labeled chunk trains a new model, a buffer holds potential outliers, and an ensemble of L models M1, M2, ..., ML classifies the stream]
Overview of ECSMiner algorithm
Based on: Mohammad M. Masud, Jing Gao, Latifur
Khan, Jiawei Han, and Bhavani Thuraisingham.
"Integrating Novel Class Detection with
Classification for Concept-Drifting Data
Streams." In Proc. 2009 European Conf. on
Machine Learning and Principles and Practice of
Knowledge Discovery in Databases (ECML/PKDD 09),
Bled, Slovenia, 7-11 Sept 2009, pp. 79-94
(extended version in IEEE Transactions on
Knowledge and Data Engineering (TKDE)).
19
Algorithm
ECSMiner
20
Novel Class Detection
ECSMiner
  • Non-parametric:
  • does not assume any underlying model of existing
    classes
  • Steps
  • Creating and saving decision boundary during
    training
  • Detecting and filtering outliers
  • Measuring cohesion and separation among test and
    training instances

21
Training: Creating the Decision Boundary
ECSMiner
Raw training data; clusters are created

[Figure: + and - training instances in the (x, y) plane are grouped into clusters A, B, C, D; the saved cluster summaries (centroid and radius) form the decision boundary]

Addresses the infinite length problem
22
Outlier Detection and Filtering
ECSMiner
A test instance inside the decision boundary is not an outlier; a test instance outside the decision boundary is a raw outlier, or Routlier.

[Figure: test instances falling outside the clusters A, B, C, D are marked as Routliers]

If x is a Routlier with respect to ALL the models in the ensemble, then x is a filtered outlier (Foutlier), a potential novel class instance; otherwise x is treated as an existing class instance.
Routliers may appear as a result of novel class,
concept-drift, or noise. Therefore, they are
filtered to reduce noise as much as possible.
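A hedged sketch of this two-level filter (the Model type and its saved hypersphere clusters are assumptions about the representation):

    import java.util.List;

    class OutlierFilterSketch {
        // A model's saved decision boundary: one hypersphere per training cluster.
        static class Model { double[][] centroids; double[] radii; }

        static double dist(double[] a, double[] b) {
            double d = 0;
            for (int i = 0; i < a.length; i++) { double t = a[i] - b[i]; d += t * t; }
            return Math.sqrt(d);
        }

        // Routlier: outside every cluster of this one model.
        static boolean isRoutlier(Model m, double[] x) {
            for (int i = 0; i < m.centroids.length; i++)
                if (dist(m.centroids[i], x) <= m.radii[i]) return false; // inside boundary
            return true;
        }

        // Foutlier: a Routlier with respect to ALL models of the ensemble,
        // i.e. a potential novel class instance.
        static boolean isFoutlier(List<Model> ensemble, double[] x) {
            for (Model m : ensemble)
                if (!isRoutlier(m, x)) return false;  // some model explains x -> existing class
            return true;
        }
    }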
23
Novel Class Detection
ECSMiner
Flowchart:
(Step 1) Is x a Routlier with respect to all the models? If not, x is an existing class instance.
(Step 2) If so, x is a filtered outlier (Foutlier), a potential novel class instance.
(Step 3) Compute q-NSC of the Foutliers with all models and the other Foutliers.
(Step 4) Is q-NSC > 0 for more than q Foutliers with all models? If yes, a novel class is found; if no, treat them as existing class.
24
Computing Cohesion and Separation
ECSMiner

[Figure: an Foutlier x with its q-nearest neighborhoods: λo,q(x) among the other Foutliers, and λ+,q(x), λ-,q(x) among the existing + and - classes, with mean distances a(x), b+(x), b-(x)]

  • a(x) = mean distance from an Foutlier x to the
    instances in λo,q(x)
  • bc(x) = mean distance from x to the instances in
    λc,q(x) of an existing class c
  • bmin(x) = minimum among all bc(x) (e.g. b(x) in
    the figure)
  • q-Neighborhood Silhouette Coefficient:
    q-NSC(x) = (bmin(x) - a(x)) / max(a(x), bmin(x))
    (see the sketch below)
  • If q-NSC(x) is positive, it means x is closer to
    the Foutliers than to any other class.

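A hedged sketch of the q-NSC computation defined above (Euclidean distances; the neighborhood bookkeeping is simplified, and the Foutlier list is assumed not to contain x itself):

    import java.util.List;

    class QnscSketch {
        static double dist(double[] a, double[] b) {
            double d = 0;
            for (int i = 0; i < a.length; i++) { double t = a[i] - b[i]; d += t * t; }
            return Math.sqrt(d);
        }

        // Mean distance from x to its q nearest points in pts.
        static double meanQNearest(double[] x, List<double[]> pts, int q) {
            double[] d = pts.stream().mapToDouble(p -> dist(x, p)).sorted().toArray();
            int n = Math.min(q, d.length);
            double s = 0;
            for (int i = 0; i < n; i++) s += d[i];
            return s / n;
        }

        // q-NSC(x) = (bmin(x) - a(x)) / max(a(x), bmin(x)), in [-1, +1];
        // positive means x is closer to the Foutliers than to any existing class.
        static double qNSC(double[] x, List<double[]> foutliers,
                           List<List<double[]>> existingClasses, int q) {
            double a = meanQNearest(x, foutliers, q);
            double bmin = Double.MAX_VALUE;
            for (List<double[]> c : existingClasses)     // bc(x) for each class c
                bmin = Math.min(bmin, meanQNearest(x, c, q));
            return (bmin - a) / Math.max(a, bmin);
        }
    }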
25
Speeding Up
  • Computing N-NSC for every Foutlier instance x
    takes quadratic time in the number of Foutliers.
  • In order to make the computation faster,
  • we create Ko pseudopoints (Fpseudopoints) from
    the Foutliers using K-means clustering,
  • where Ko = (No / S) * K. Here S is the chunk size
    and No is the number of Foutliers.
  • We then perform the computations on the
    Fpseudopoints (sketched below).
  • Thus, the time complexity
  • to compute the N-NSC of all of the Fpseudopoints
    is O(Ko(Ko + K)),
  • which is constant, since both Ko and K are
    independent of the input size.
  • However, by gaining speed we lose some precision,
    although the loss is negligible (to be analyzed
    shortly)

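The arithmetic of the speed-up in a small helper (the K-means step itself is elided):

    class SpeedupSketch {
        // Ko = (No / S) * K, per the slide: No Foutliers, chunk size S,
        // K pseudopoints per chunk.
        static int numFpseudopoints(int No, int S, int K) {
            return Math.max(1, (int) Math.round((double) No / S * K));
        }
        // Example: S = 2000, K = 50, No = 400 Foutliers -> Ko = 10.
        // Running the N-NSC pass over the Ko Fpseudopoints then costs
        // O(Ko * (Ko + K)) distance evaluations, independent of stream length.
    }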
26
Algorithm To Detect Novel Class
ECSMiner
27
Speedup Penalty
  • As discussed earlier,
  • by speeding up the computation in step 3 we lose
    some precision, since the result deviates from
    the exact result
  • This analysis shows that the deviation is
    negligible

[Figure 6: illustrating the computation of deviation. φi is an Fpseudopoint, i.e., a cluster of Foutliers, and φj is an existing class pseudopoint, i.e., a cluster of existing class instances; the approximation replaces the per-instance distances (x - φi)^2 and (x - φj)^2 with the centroid distance (φi - φj)^2. In this particular example, all instances in φi belong to a novel class.]
28
Speedup Penalty
[Equations comparing the approximate (pseudopoint-based) and exact computations and their deviation]
29
Experiments - Datasets
  • We evaluated our approach on two synthetic and
    two real datasets
  • SynC: synthetic data with only concept-drift,
    generated using a hyperplane equation. 2 classes,
    10 attributes, 250K instances
  • SynCN: synthetic data with concept-drift and
    novel class, generated using Gaussian
    distributions. 20 classes, 40 attributes, 400K
    instances
  • KDD Cup 1999 intrusion detection (10% version),
    real dataset. 23 classes, 34 attributes, 490K
    instances
  • Forest Cover, real dataset. 7 classes, 54
    attributes, 581K instances

30
Experiments - Setup
  • Development
  • Language: Java
  • H/W: Intel P-IV with
  • 2GB memory and
  • 3GHz dual processor CPU
  • Parameter settings:
  • K (number of pseudopoints per chunk) = 50
  • N (minimum number of instances required to
    declare novel class) = 50
  • M (ensemble size) = 6
  • S (chunk size) = 2,000

31
Experiments - Baseline
  • Competing approaches:
  • i) MineClass (MC): our approach
  • ii) WCE-OLINDDA_Parallel (W-OP)
  • iii) WCE-OLINDDA_Single (W-OS), where WCE-OLINDDA
    is a combination of the Weighted Classifier
    Ensemble (WCE) and the novel class detector OLINDDA,
    with default parameter settings for WCE and
    OLINDDA
  • We use this combination since, to the best of our
    knowledge, there is no approach that can classify
    and detect novel classes simultaneously
  • OLINDDA assumes there is only one normal class,
    and all other classes are novel
  • Therefore, we apply two variations:
  • W-OP keeps parallel OLINDDA models, one for each
    class
  • W-OS keeps a single model that absorbs a novel
    class when encountered

32
Experiments - Results
  • Evaluation metrics:
  • Mnew = % of novel class instances misclassified
    as existing class = Fn * 100 / Nc
  • Fnew = % of existing class instances falsely
    identified as novel class = Fp * 100 / (N - Nc)
  • ERR = total misclassification error (%) (including
    Mnew and Fnew) = (Fp + Fn + Fe) * 100 / N
  • where Fn = total novel class instances
    misclassified as existing class,
  • Fp = total existing class instances misclassified
    as novel class,
  • Fe = total existing class instances misclassified
    (other than Fp),
  • Nc = total novel class instances in the stream,
  • N = total instances in the stream.

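The slide's metric definitions translate directly into a small helper (the counter variables are assumed to be accumulated over the stream):

    class StreamMetrics {
        static double mnew(long fn, long nc) { return 100.0 * fn / nc; }
        static double fnew(long fp, long n, long nc) { return 100.0 * fp / (n - nc); }
        static double err(long fp, long fn, long fe, long n) {
            return 100.0 * (fp + fn + fe) / n;
        }
    }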
33
Experiments - Results
[Figures: results on the Forest Cover, KDD Cup, and SynCN datasets]
34
Experiments - Results
35
Experiments - Parameter Sensitivity
36
Experiments - Runtime
37
Dynamic Features
  • Solution:
  • Global Features
  • Local Features
  • Union
  • Mohammad Masud, Qing Chen, Latifur Khan, Jing
    Gao, Jiawei Han, and Bhavani Thuraisingham.
    "Classification and Novel Class Detection of Data
    Streams in a Dynamic Feature Space." In Proc. of
    Machine Learning and Knowledge Discovery in
    Databases, European Conference, ECML PKDD 2010,
    Barcelona, Spain, Sept 2010, Springer, pp.
    337-352.

38
Feature Mapping Across Models and Test Data
Points
  • The feature set varies across chunks; especially
    when a new class appears, new features should be
    selected and added to the feature set.
  • Strategy 1: Lossy fixed (Lossy-F) conversion /
    Global
  • Use the same fixed feature set for the entire stream.
  • We call this a lossy conversion because future
    models and instances may lose important features
    due to this mapping.
  • Strategy 2: Lossy local (Lossy-L) conversion /
    Local
  • We also call this a lossy conversion because it may
    lose feature values during mapping.
  • Strategy 3: Dimension preserving (D-Preserving)
    mapping / Union

39
Feature Space Conversion: Lossy-L Mapping (Local)
  • Assume that each data chunk has a different feature
    vector
  • When a classification model is trained, we save
    the feature vector with the model
  • When an instance is tested, its feature vector is
    mapped (i.e., projected) onto the model's feature
    vector.

40
Feature Space Conversion: Lossy-L Mapping
  • For example:
  • Suppose the model has two features (x, y)
  • The instance has two features (y, z)
  • When testing, the instance is treated as having the
    two features (x, y),
  • where x = 0 and the y value is kept as it is (see
    the sketch below)

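A hedged sketch of the Lossy-L projection (feature vectors are modeled as name-to-value maps, an assumption about the representation):

    import java.util.List;
    import java.util.Map;

    class LossyLSketch {
        // Project the instance onto the feature vector saved with the model:
        // features the model lacks are dropped, features the instance lacks
        // become 0.
        static double[] toModelSpace(List<String> modelFeatures, Map<String, Double> instance) {
            double[] v = new double[modelFeatures.size()];
            for (int i = 0; i < v.length; i++)
                v[i] = instance.getOrDefault(modelFeatures.get(i), 0.0);
            return v;
        }
    }
    // For the example above: model features (x, y), instance features (y, z)
    // -> projected vector (x = 0, y = instance's y); z is dropped.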
41
Conversion Strategy II: Lossy-L Mapping
  • Graphically

42
Conversion Strategy III: D-Preserving Mapping
  • When an instance is tested, both the model's
    feature vector and the instance's feature vector
    are mapped (i.e., projected) onto the union of
    their feature vectors.
  • The feature dimension is increased.
  • In the mapping, the features of both the testing
    instance and the model are preserved. The extra
    features are filled with 0s.

43
Conversion Strategy III: D-Preserving Mapping
  • For example:
  • Suppose the model has three features (a, b, c)
  • The instance has four features (b, c, d, e)
  • When testing, we project both the model's feature
    vector and the instance's feature vector onto
    (a, b, c, d, e)
  • Therefore, in the model, d and e will be
    considered 0s, and in the instance, a will be
    considered 0 (see the sketch below)

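A hedged sketch of the D-Preserving (union) projection, under the same name-to-value representation:

    import java.util.*;

    class DPreservingSketch {
        static List<String> unionFeatures(List<String> modelF, List<String> instF) {
            LinkedHashSet<String> u = new LinkedHashSet<>(modelF);
            u.addAll(instF);                 // (a,b,c) U (b,c,d,e) = (a,b,c,d,e)
            return new ArrayList<>(u);
        }

        // Both the model's and the instance's vectors are projected onto the
        // union; dimensions missing on either side are filled with 0s.
        static double[] project(List<String> unionF, Map<String, Double> values) {
            double[] v = new double[unionF.size()];
            for (int i = 0; i < v.length; i++)
                v[i] = values.getOrDefault(unionF.get(i), 0.0);
            return v;
        }
    }
    // In the example: the model's projected vector has d = e = 0, and the
    // instance's projected vector has a = 0.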
44
Conversion Strategy III: D-Preserving Mapping
  • Previous Example

45
Discussion
  • Local does not favor the novel class; it favors
    the existing classes.
  • Local features are enough to model the existing
    classes.
  • Union favors the novel class.
  • New features may be discriminating for the novel
    class, hence Union works.

46
Comparison
  • Which strategy is better?
  • Assumption: the lossless conversion (Union) preserves
    the properties of a novel class.
  • In other words, if an instance belongs to a novel
    class, it remains outside the decision boundary
    of every model Mi of the ensemble M in the
    converted feature space.
  • Lemma: if a test point x belongs to a novel class,
    it will be misclassified by the ensemble M as an
    existing class instance under certain conditions
    when the Lossy-L conversion is used.

47
Comparison
  • Proof:
  • Let X1, ..., XL, XL+1, ..., XM be the dimensions of the
    model, and
  • let X1, ..., XL, XM+1, ..., XN be the dimensions of the
    test point
  • Suppose the radius of the closest cluster (in the
    higher dimension) is R
  • Also, let the test point be a novel class
    instance.
  • Combined feature space: X1, ..., XL, XL+1, ..., XM,
    XM+1, ..., XN

48
Comparison
  • Proof (continued):
  • Combined feature space: X1, ..., XL, XL+1, ..., XM,
    XM+1, ..., XN
  • Centroid of the cluster (original space):
    X1 = x1, ..., XL = xL, XL+1 = xL+1, ..., XM = xM,
    i.e., (x1, ..., xL, xL+1, ..., xM)
  • Centroid of the cluster (combined space):
    (x1, ..., xL, xL+1, ..., xM, 0, ..., 0)
  • Test point (original space):
    X1 = x'1, ..., XL = x'L, XM+1 = x'M+1, ..., XN = x'N,
    i.e., (x'1, ..., x'L, x'M+1, ..., x'N)
  • Test point (combined space): (x'1, ..., x'L,
    0, ..., 0, x'M+1, ..., x'N)

49
Comparison
  • Proof (continued):
  • Centroid (combined space): (x1, ..., xL, xL+1, ..., xM,
    0, ..., 0)
  • Test point (combined space): (x'1, ..., x'L, 0, ...,
    0, x'M+1, ..., x'N)
  • Since the test point is a novel class instance, it
    lies outside the cluster in the combined space:
    R^2 < ((x1 - x'1)^2 + ... + (xL - x'L)^2 + x_{L+1}^2 + ... + x_M^2)
    + (x'_{M+1}^2 + ... + x'_N^2)
  • i.e., R^2 < a^2 + b^2, where a^2 is the first
    parenthesized sum (the squared distance seen after the
    Lossy-L projection) and b^2 is the second
  • Write R^2 = a^2 + b^2 - e^2 (with e^2 > 0)
  • Then a^2 = R^2 + (e^2 - b^2)
  • So a^2 < R^2 (provided that e^2 < b^2)
  • Therefore, under the Lossy-L conversion the test point
    will not be an outlier: it falls inside the decision
    boundary and is misclassified as an existing class instance
50
Baseline Approaches
  • WCE is the Weighted Classifier Ensemble [1], a
    multi-class ensemble classifier.
  • OLINDDA is a novel class detector [2] that works only
    for binary classes.
  • FAE is an ensemble classifier that addresses
    feature-evolution [3] and concept-drift.
  • ECSMiner is a multi-class ensemble classifier
    that addresses concept-drift and concept-evolution [4].

51
Approaches Comparison
Proposed techniques vs. the four challenges:

Technique | Infinite length | Concept-drift | Concept-evolution | Dynamic features
OLINDDA   |                 |               | yes (binary only) |
WCE       | yes             | yes           |                   |
FAE       | yes             | yes           |                   | yes
ECSMiner  | yes             | yes           | yes               |
DXMiner   | yes             | yes           | yes               | yes
52
Experiments - Datasets
  • We evaluated our approach on different datasets

Data Set     | Concept-Drift | Concept-Evolution | Dynamic Features | # of Instances | # of Classes
KDD          | yes           | yes               |                  | 492K           | 7
Forest Cover | yes           | yes               |                  | 387K           | 7
NASA         | yes           | yes               | yes              | 140K           | 21
Twitter      | yes           | yes               | yes              | 335K           | 21
53
Experiments - Results
  • Evaluation metrics: let
  • Fn = total novel class instances misclassified as
    existing class,
  • Fp = total existing class instances misclassified
    as novel class,
  • Fe = total existing class instances misclassified
    (other than Fp),
  • Nc = total novel class instances in the stream,
  • N = total instances in the stream

54
Experiments - Results
  • We use the following performance metrics to
    evaluate our technique:
  • Mnew = % of novel class instances misclassified
    as existing class, i.e., Mnew = Fn * 100 / Nc
  • Fnew = % of existing class instances falsely
    identified as novel class, i.e., Fnew = Fp * 100 / (N - Nc)
  • ERR = total misclassification error (%) (including
    Mnew and Fnew), i.e., ERR = (Fp + Fn + Fe) * 100 / N

55
Experiments - Setup
  • Development
  • Language: Java
  • H/W: Intel P-IV with
  • 3GB memory and
  • 3GHz dual processor CPU
  • Parameter settings:
  • K (number of pseudopoints per chunk) = 50
  • q (minimum number of instances required to
    declare novel class) = 50
  • L (ensemble size) = 6
  • S (chunk size) = 1,000

56
Experiments - Baseline
  • Competing approaches:
  • i) DXMiner (DXM): our approach, 4 variations:
  • Lossy-F conversion
  • Lossy-L conversion
  • D-Preserving conversion
  • ii) FAE-WCE-OLINDDA_Parallel (W-OP)
  • OLINDDA assumes there is only one normal class, and all
    other classes are novel; W-OP keeps parallel
    OLINDDA models, one for each class
  • We use this combination since, to the best of our
    knowledge, there is no approach that can classify
    and detect novel classes simultaneously with
    feature-evolution.
  • iii) FAE-ECSMiner

57
Twitter Results
58
Twitter Results
Method: D-Preserving | Lossy-Local | Lossy-Global | O-F
AUC:    0.88         | 0.83        | 0.76         | 0.56
59
NASA Dataset
Method: Deviation | Info Gain | O-F
AUC:    0.996     | 0.967     | 0.876
60
Forest Cover Results
61
Forest Cover Results
Method: D-Preserving | O-F
AUC:    0.97         | 0.74
62
KDD Results
63
KDD Results
Method: D-Preserving | FAE-Olindda
AUC:    0.98         | 0.96
64
Summary Results
65
Novel Class Detection Failures
Proposed Methods
  • False positive:
  • an existing class instance is misclassified as a
    novel class instance
  • False negative:
  • a novel class instance is misclassified as an
    existing class instance
  • The novel class detection model is a two-step
    process:
  • Step 1: identify the instances that fall outside
    of the existing model clusters and buffer them as
    outliers
  • Step 2: analyze the buffered outlier instances,
    calculate each outlier's cohesion and separation with
    respect to the existing model clusters, and then let the
    model clusters vote for the novel class decision.

Failures may occur in both steps
66
Novel Class Detection Failures
Proposed Methods
  • Proposed solutions:
  • In step 1, an existing class instance may fall
    outside the model clusters due to data noise.
  • We need to select a proper model cluster range
    to reduce the number of existing class instances among
    the outliers: a dynamic OUTTH.
  • In step 2, outliers may not be novel class
    instances.
  • Data noise and concept-drift may cause existing
    class instances to become outliers.
  • We build a statistical model to filter noisy and
    concept-drift data out of the outliers.

67
Improved Outlier Detection and Multiple Novel
Class Detection
Proposed Methods
  • Challenges:
  • High false positive (FP) rates (existing classes
    detected as novel) and false negative (FN) rates
    (missed novel classes)
  • Two or more novel classes arriving at a time
  • Solutions [1]:
  • Dynamic decision boundary, based on previous
    mistakes:
  • inflate the decision boundary if FP is high; deflate
    it if FN is high
  • Build a statistical model to filter noise data
    and concept-drift out of the outliers.
  • Multiple novel classes are detected by:
  • constructing a graph in which each outlier cluster is a
    vertex,
  • merging the vertices based on the silhouette
    coefficient, and
  • counting the number of connected components in
    the resultant (i.e., merged) graph

[1] Mohammad M. Masud, Qing Chen, Jing Gao, Latifur
Khan, Charu Aggarwal, Jiawei Han, and Bhavani
Thuraisingham. "Addressing Concept-Evolution in
Concept-Drifting Data Streams." In Proc. ICDM '10,
Sydney, Australia, Dec 14-17, 2010.
68
Outlier Threshold (OUTTH)
Proposed Methods
  • To declare a test instance an outlier, using the
    cluster radius r alone is not enough, because of
    the data noise
  • So, beyond the radius r, a threshold (OUTTH) is
    set up, so that most noisy data around the model
    clusters is classified immediately

[Figure: an instance x just outside a cluster's radius, with its neighborhood distances a(x) and b(x)]
69
Outlier Threshold (OUTTH)
Proposed Methods
  • Every instance outside the cluster range has a
    weight
  • If wt(x) > OUTTH, the instance is considered an
    existing class instance.
  • If wt(x) < OUTTH, the instance is an
    outlier.
  • Pros:
  • Noisy data is classified immediately
  • Cons:
  • OUTTH is hard to determine
  • Noisy data and novel class instances may occur
    simultaneously
  • Different datasets may need different OUTTH values
    (see the sketch below)

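A hedged sketch of the weight test; the exponential decay is an assumption for illustration, not necessarily the exact weight function used here:

    class OutthSketch {
        // 1 on the cluster surface, < 1 and decaying beyond the radius.
        // (Assumed form; the slide only says the weight decays with distance.)
        static double weight(double radius, double distToCentroid) {
            return Math.exp(radius - distToCentroid);
        }
        // wt >= OUTTH -> classify immediately as existing class; else outlier.
        static boolean isOutlier(double wt, double outth) {
            return wt < outth;
        }
    }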
70
Outlier Threshold (OUTTH)
Proposed Methods
[Figure: the weight of an instance decays with its distance beyond the cluster; where should OUTTH be set?]

  • If the threshold is too high, noisy data may become
    outliers:
  • the FP rate will go up
  • If the threshold is too low, novel class instances
    will be labeled as existing class:
  • the FN rate will go up

We need to balance these two
71
  • Introduction
  • Data Stream Classification
  • Clustering
  • Novel Class Detection
  • Finer Grain Novel Class Detection
  • Dynamic Novel Class Detection
  • Multiple Novel Class Detection

72
Dynamic threshold setting
Proposed Methods
[Figure: instances whose weights fall just below OUTTH are marginal FPs; instances whose weights fall just above OUTTH are marginal FNs]

  • Defer approach:
  • After a test chunk has been labeled, update OUTTH
    based on the marginal FP and FN rates of that test
    chunk, and then apply the new OUTTH to the next
    test chunk
  • Eager approach (see the sketch below):
  • A marginal FP or marginal FN is an instance that
    falls just on the wrong side of OUTTH
  • Once a marginal FP or marginal FN instance is
    detected, update OUTTH with a step function, and
    apply the updated OUTTH to the next test
    instance

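A hedged sketch of the eager step-function update (the fixed step size is an assumption):

    class EagerOutthSketch {
        // A marginal FP (existing instance just below OUTTH) means the threshold
        // is too high: lower it (inflate the boundary). A marginal FN (novel
        // instance just above OUTTH) means it is too low: raise it (deflate).
        static double update(double outth, boolean marginalFP, double step) {
            return marginalFP ? outth - step : outth + step;
        }
    }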
73
Dynamic threshold setting
Proposed Methods
74
Defer approach and Eager approach comparison
Proposed Methods
  • In the Defer approach, OUTTH is updated after a data
    chunk is labeled
  • Too late: many marginal FPs or FNs may occur within
    the test chunk due to an improper OUTTH
    threshold
  • Overreact: if there are many marginal FP or FN
    instances in the labeled test chunk, the OUTTH
    update may overreact for the next test chunk
  • In the Eager approach, OUTTH is updated aggressively
    whenever a marginal FP or FN happens:
  • The model is more tolerant of noisy data and
    concept-drift.
  • The model is more sensitive to novel class
    instances.

75
Outliers Statistics
Proposed Methods
  • For each outlier instance, we calculate the
    novelty probability Pnov
  • A large Pnov (close to 1) indicates that the
    outlier has a high probability of being a novel
    class instance.
  • Pnov has two parts:
  • The first part measures how far the outlier lies
    from the model clusters
  • The second part, Psc, is the silhouette
    coefficient; it measures the cohesion and
    separation of the outlier's q-neighbors with respect
    to the model clusters

76
Outliers Statistics
Proposed Methods
  • Noise Data
  • Concept Drift
  • Novel Class

Three scenarios may occur simultaneously
77
Outlier Statistics Gini Analysis
Proposed Methods
  • The Gini coefficient is a measure of statistical
    inequality; the discrete Gini coefficient is
    computed from binned values (see the sketch below)
  • If we divide [0, 1] into n equal-sized bins and put
    each outlier's Pnov into its corresponding bin, we
    obtain the cumulative distribution yi
  • If all Pnov are very low, in the extreme the cdf
    becomes yi = 1 for every i
  • If all Pnov are very high, in the extreme the cdf
    becomes yi = 0 for every i except yn = 1

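A hedged sketch of the Gini computation over binned Pnov values; this discrete formula is one common variant and an assumption here, not necessarily the exact expression on the slide:

    class GiniSketch {
        static double gini(double[] pnov, int n) {
            int[] count = new int[n];                      // n equal-width bins over [0, 1]
            for (double p : pnov) count[Math.min((int) (p * n), n - 1)]++;
            double running = 0, sum = 0;
            for (int i = 0; i < n; i++) {                  // accumulate cdf values yi
                running += count[i];
                sum += running / pnov.length;
            }
            // all Pnov low  -> yi = 1 for all i        -> result near -1
            // all Pnov high -> yi = 0 except yn = 1    -> result near +1
            // evenly spread -> yi = i / n              -> result 0
            return (n + 1 - 2 * sum) / n;
        }
    }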
78
Outlier Statistics Gini Analysis
Proposed Methods
  • If the outliers' Pnov values are distributed evenly, yi = i/n

79
Outlier Statistics Gini Analysis Limitation
Proposed Methods
  • In the extreme, it is impossible to differentiate
    concept-drift from concept-evolution using the Gini
    coefficient, namely when the concept-drift looks
    just like concept-evolution.

80
  • Introduction
  • Data Stream Classification
  • Clustering
  • Novel Class Detection
  • Finer Grain Novel Class Detection
  • Dynamic Novel Class Detection
  • Multiple Novel Class Detection

81
Multi Novel Class Detection
Proposed Methods
[Figure: a data stream of positive and negative instances in the (x, y) feature space; two groups of novel instances appear in different regions, novel class A and novel class B]

If we always assume that novel instances belong to a
single novel type, one of the two types of novel instances,
either A or B, will be misclassified.
82
Multi Novel Class Detection
Proposed Methods
  • The main idea in detecting multiple novel classes
    is to construct a graph and identify the
    connected components in it.
  • The number of connected components determines the
    number of novel classes.

83
Multi Novel Class Detection
Proposed Methods
  • Two phases:
  • Building the connected graph:
  • Build a directed nearest-neighbor graph: from each
    vertex (an outlier cluster), add an edge from the
    vertex to its nearest neighbor.
  • If the silhouette coefficient from the vertex to its
    nearest neighbor is larger than some threshold,
    the edge is removed.
  • Problem: linkage circles
  • Component merging phase:
  • Gaussian distribution centric decision (see the
    sketch below)

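A hedged union-find sketch of the graph phase (nn[i] and silToNN[i] are assumed precomputed: the nearest neighbor of outlier cluster i and the silhouette coefficient to it):

    import java.util.*;

    class NovelClassCountSketch {
        static int find(int[] p, int i) { return p[i] == i ? i : (p[i] = find(p, p[i])); }
        static void union(int[] p, int a, int b) { p[find(p, a)] = find(p, b); }

        // Keep the edge i -> nn[i] only if the two clusters are NOT well
        // separated; the number of connected components that remain is the
        // estimated number of novel classes.
        static int countNovelClasses(int[] nn, double[] silToNN, double threshold) {
            int n = nn.length;
            int[] parent = new int[n];
            for (int i = 0; i < n; i++) parent[i] = i;
            for (int i = 0; i < n; i++)
                if (silToNN[i] <= threshold) union(parent, i, nn[i]);
            Set<Integer> roots = new HashSet<>();
            for (int i = 0; i < n; i++) roots.add(find(parent, i));
            return roots.size();
        }
    }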
84
Multi Novel Class Detection
Proposed Methods
  • Component merging phase:
  • In probability theory, the normal (or Gaussian)
    distribution is a continuous probability
    distribution that is often used as a first
    approximation to describe real-valued random
    variables that tend to cluster around a single
    mean value [1]
  • If two Gaussian variables (g1, g2) can be
    separated, a condition relating the difference of
    their means to their standard deviations must hold
  • Since µ is proportional to σ, if the condition holds
    the two variables (components) remain separated;
    otherwise, the two components are merged.
  1. Shunichi Amari and Hiroshi Nagaoka. Methods of
    Information Geometry. Oxford University Press.
    ISBN 0-8218-0531-2, 2000.

85
Experiments - Datasets
Experiment Results
  • We evaluated our approach on different datasets

Data Set     | Concept-Drift | Concept-Evolution | Dynamic Features | # of Instances | # of Classes
KDD          | yes           | yes               |                  | 492K           | 7
Forest Cover | yes           | yes               |                  | 387K           | 7
NASA         | yes           | yes               | yes              | 140K           | 21
Twitter      | yes           | yes               | yes              | 335K           | 21
SynED        | yes           | yes               | yes              | 400K           | 20
86
Experiments - Setup
Experiment Results
  • Development
  • Language: Java
  • H/W: Intel P-IV with
  • 3GB memory and
  • 3GHz dual processor CPU
  • Parameter settings:
  • K (number of pseudopoints per chunk) = 50
  • q (minimum number of instances required to
    declare novel class) = 50
  • L (ensemble size) = 6
  • S (chunk size) = 1,000

87
Experiments - Baseline
  • Competing approaches:
  • i) DEMminer: our approach, 5 variations:
  • Lossy-F conversion
  • Lossy-L conversion
  • Lossless conversion: DEMminer
  • Dynamic OUTTH + lossless conversion:
    DEMminer-Ex (without Gini)
  • Dynamic OUTTH + Gini + lossless conversion:
    DEMminer-Ex
  • ii) WCE-OLINDDA (O-W)
  • iii) FAE-WCE-OLINDDA_Parallel (O-F)
  • We use this combination since, to the best of our
    knowledge, there is no approach that can classify
    and detect novel classes simultaneously with
    feature-evolution.

88
Experiments - Results
Experiment Results
  • Evaluation metrics:
  • Fn = total novel class instances misclassified as
    existing class,
  • Fp = total existing class instances misclassified
    as novel class,
  • Fe = total existing class instances misclassified
    (other than Fp),
  • Nc = total novel class instances in the stream,
  • N = total instances in the stream

89
Twitter Results
Experiment Results
90
Twitter Results
Experiment Results
Method: DEMminer | Lossy-L | Lossy-F | O-F
AUC:    0.88     | 0.83    | 0.76    | 0.56
91
Twitter Results
Experiment Results
92
Twitter Results
Experiment Results
Method: DEMminer-Ex | DEMminer | OW
AUC:    0.94        | 0.88     | 0.56
93
Forest Cover Results
Experiment Results
94
Forest Cover Results
Experiment Results
Method: DEMminer | DEMminer-Ex (without Gini) | DEMminer-Ex | OW
AUC:    0.97     | 0.99                       | 0.97        | 0.74
95
NASA Dataset
Experiment Results
96
NASA Dataset
Experiment Results
Method: Deviation | Info Gain | FAE
AUC:    0.996     | 0.967     | 0.876
97
KDD Results
Experiment Results
98
KDD Results
Experiment Results
Method: DEMminer | O-F
AUC:    0.98     | 0.96
99
Result Summary
Experiment Results
Dataset      | Method               | ERR  | Mnew | Fnew | AUC   | FP   | FN
Twitter      | DEMminer             | 4.2  | 30.5 | 0.8  | 0.877 | -    | -
Twitter      | Lossy-F              | 32.5 | 0.0  | 32.6 | 0.834 | -    | -
Twitter      | Lossy-L              | 1.6  | 82.0 | 0.0  | 0.764 | -    | -
Twitter      | O-F                  | 3.4  | 96.7 | 1.6  | 0.557 | -    | -
ASRS         | DEMminer             | 0.02 | -    | -    | 0.996 | 0.00 | 0.1
ASRS         | DEMminer (info-gain) | 1.4  | -    | -    | 0.967 | 0.04 | 10.3
ASRS         | O-F                  | 3.4  | -    | -    | 0.876 | 0.00 | 24.7
Forest Cover | DEMminer             | 3.6  | 8.4  | 1.3  | 0.973 | -    | -
Forest Cover | O-F                  | 5.9  | 20.6 | 1.1  | 0.743 | -    | -
KDD          | DEMminer             | 1.2  | 5.9  | 0.9  | 0.986 | -    | -
KDD          | O-F                  | 4.7  | 9.6  | 4.4  | 0.967 | -    | -
100
Result Summary
Experiment Results
Dataset      | Method      | ERR | Mnew | Fnew | AUC
Twitter      | DEMminer    | 4.2 | 30.5 | 0.8  | 0.877
Twitter      | DEMminer-Ex | 1.8 | 0.7  | 0.6  | 0.944
Twitter      | OW          | 3.4 | 96.7 | 1.6  | 0.557
Forest Cover | DEMminer    | 3.6 | 8.4  | 1.3  | 0.974
Forest Cover | DEMminer-Ex | 3.1 | 4.0  | 0.68 | 0.990
Forest Cover | OW          | 5.9 | 20.6 | 1.1  | 0.743
101
Running Time Comparison
Experiment Results
             | Time (sec per 1K points)  | Points/sec                | Speed gain
Dataset      | DEMminer | Lossy-F | O-F  | DEMminer | Lossy-F | O-F  | DEMminer over O-F
Twitter      | 23       | 3.5     | 66.7 | 43       | 289     | 15   | 2.9
ASRS         | 21       | 4.3     | 38.5 | 47       | 233     | 26   | 1.8
Forest Cover | 1.0      | 1.0     | 4.7  | 967      | 1003    | 212  | 4.7
KDD          | 1.2      | 1.2     | 3.3  | 858      | 812     | 334  | 2.5
102
Multi Novel Detection Results
Experiment Results
103
Multi Novel Detection Results
Experiment Results
104
Conclusion
Experiment Results
  • Our data stream classification technique
    addresses:
  • Infinite length
  • Concept-drift
  • Concept-evolution
  • Feature-evolution
  • Existing approaches address only the first two issues
  • Applicable to many domains, such as:
  • Intrusion/malware detection
  • Text categorization
  • Fault detection, etc.

105
References
  • J. Gehrke, V. Ganti, R. Ramakrishnan, and W. Loh.
    "BOAT - Optimistic Decision Tree Construction." In
    Proc. SIGMOD, 1999.
  • P. Domingos and G. Hulten. "Mining high-speed
    data streams." In Proc. SIGKDD, pages 71-80,
    2000.
  • B. Wenerstrom and C. Giraud-Carrier. "Temporal
    data mining in dynamic feature spaces." In
    Perner, P. (ed.) ICDM 2006, LNCS (LNAI), vol.
    4065, pp. 1141-1145. Springer, Heidelberg (2006).
  • E. J. Spinosa, A. P. de Leon F. de Carvalho, and
    J. Gama. "Cluster-based novel concept detection
    in data streams applied to intrusion detection in
    computer networks." In Proc. 2008 ACM Symposium
    on Applied Computing, pages 976-980 (2008).
  • M. Scholz and R. Klinkenberg. "An ensemble
    classifier for drifting concepts." In Proc.
    ICML/PKDD Workshop in Knowledge Discovery in Data
    Streams, 2005.

106
References (contd.)
  • J. Brutlag (2000). "Aberrant behavior detection
    in time series for network monitoring." In Proc.
    Usenix Fourteenth System Admin. Conf. (LISA XIV),
    New Orleans, LA (Dec 2000).
  • E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and
    S. Stolfo. "A geometric framework for
    unsupervised anomaly detection: Detecting
    intrusions in unlabeled data." Applications of
    Data Mining in Computer Security, Kluwer (2002).
  • W. Fan. "Systematic data selection to mine
    concept-drifting data streams." In Proc. KDD '04.
  • J. Gao, W. Fan, and J. Han (2007a). "On
    Appropriate Assumptions to Mine Data Streams."
  • J. Gao, W. Fan, J. Han, and P. S. Yu (2007b). "A
    General Framework for Mining Concept-Drifting Data
    Streams with Skewed Distributions." SDM 2007.
  • J. Goebel and T. Holz. "Rishi: Identify bot
    contaminated hosts by IRC nickname evaluation." In
    Usenix/Hotbots '07 Workshop, 2007.
  • J. B. Grizzard, V. Sharma, C. Nunnery, B. B.
    Kang, and D. Dagon (2007). "Peer-to-peer botnets:
    Overview and case study." In Usenix/Hotbots '07
    Workshop.

107
References (contd.)
  • E. J. Keogh and M. J. Pazzani (2000). "Scaling
    up dynamic time warping for data mining
    applications." In ACM SIGKDD (2000).
  • R. Lemos (2006). "Bot software looks to improve
    peerage." SecurityFocus. http://www.securityfocus.com/news/11390
    (2006).
  • C. Livadas, B. Walsh, D. Lapsley, and T. Strayer
    (2006). "Using machine learning techniques to
    identify botnet traffic." In 2nd IEEE LCN
    Workshop on Network Security (WoNS 2006),
    November 2006.
  • LURHQ Threat Intelligence Group (2004). "Sinit P2P
    trojan analysis." http://www.lurhq.com/sinit.html
    (2004).
  • M. A. Rajab, J. Zarfoss, F. Monrose, and A.
    Terzis (2006). "A multifaceted approach to
    understanding the botnet phenomenon." In Proc. of
    the 6th ACM SIGCOMM Internet Measurement
    Conference (IMC), 2006.
  • Kagan Tumer and Joydeep Ghosh (1996). "Error
    correlation and error reduction in ensemble
    classifiers." Connection Science, 8(3-4):385-403.

108
References (contd.)
  • Mohammad Masud, Jing Gao, Latifur Khan, Jiawei
    Han, and Bhavani Thuraisingham. "A
    Multi-Partition Multi-Chunk Ensemble Technique to
    Classify Concept-Drifting Data Streams." In
    Proc. of 13th Pacific-Asia Conference on
    Knowledge Discovery and Data Mining (PAKDD-09),
    pp. 363-375, Bangkok, Thailand, April 2009.
  • Mohammad Masud, Jing Gao, Latifur Khan, Jiawei
    Han, and Bhavani Thuraisingham. "A Practical
    Approach to Classify Evolving Data Streams:
    Training with Limited Amount of Labeled Data." In
    Proc. of 2008 IEEE International Conference on
    Data Mining (ICDM 2008), Pisa, Italy, pp.
    929-934, December 2008.
  • Clay Woolam, Mohammed Masud, and Latifur Khan.
    "Lacking Labels in the Stream: Classifying
    Evolving Stream Data with Few Labels." In Proc.
    of 18th International Symposium on Methodologies
    for Intelligent Systems (ISMIS), pp. 552-562,
    September 2009, Prague, Czech Republic.

109
References (contd.)
  • Mohammad Masud, Qing Chen, Latifur Khan, Charu
    Aggarwal, Jing Gao, Jiawei Han, and Bhavani
    Thuraisingham. "Addressing Concept-Evolution in
    Concept-Drifting Data Streams." In Proc. of 2010
    10th IEEE International Conference on Data Mining
    (ICDM 2010), Sydney, Australia, Dec 2010.
  • Mohammad M. Masud, Qing Chen, Jing Gao, Latifur
    Khan, Jiawei Han, and Bhavani Thuraisingham.
    "Classification and Novel Class Detection of Data
    Streams in a Dynamic Feature Space." In Proc. of
    European Conference on Machine Learning and
    Knowledge Discovery in Databases, ECML PKDD 2010,
    Barcelona, Spain, September 20-24, 2010,
    Springer 2010, ISBN 978-3-642-15882-7, pp.
    337-352.
  • Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei
    Han, and Bhavani Thuraisingham. "Classification
    and Novel Class Detection in Data Streams with
    Active Mining." In Proc. of 14th Pacific-Asia
    Conference on Knowledge Discovery and Data
    Mining, 21-24 June 2010, pp. 311-324,
    Hyderabad, India.

110
References (contd.)
  • Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei
    Han, and Bhavani Thuraisingham. "Classification
    and Novel Class Detection in Concept-Drifting
    Data Streams under Time Constraints." IEEE
    Transactions on Knowledge and Data Engineering
    (TKDE), IEEE Computer Society, June 2011,
    Vol. 23, No. 6, pp. 859-874.
  • Charu C. Aggarwal, Jiawei Han, Jianyong Wang, and
    Philip S. Yu. "A Framework for Clustering
    Evolving Data Streams." In Proceedings of
    VLDB '03, the 29th International
    Conference on Very Large Data Bases, Volume 29.
  • H. Wang, W. Fan, P. S. Yu, and J. Han. "Mining
    concept-drifting data streams using ensemble
    classifiers." In Proc. Ninth ACM SIGKDD
    International Conference on Knowledge Discovery
    and Data Mining, pages 226-235, Washington, DC,
    USA, Aug 2003. ACM.
  • Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei
    Han, and Bhavani Thuraisingham. "Integrating
    Novel Class Detection with Classification for
    Concept-Drifting Data Streams." In Proceedings of
    2009 European Conf. on Machine Learning and
    Principles and Practice of Knowledge Discovery in
    Databases (ECML/PKDD 09), Bled, Slovenia, 7-11
    Sept 2009.

111
  • Questions