Title: Data Stream Classification and Novel Class Detection
1Data Stream Classification and Novel Class Detection
- Mehedy Masud, Latifur Khan, Qing Chen, and Bhavani Thuraisingham
- Department of Computer Science, University of Texas at Dallas
- Jing Gao, Jiawei Han
- Department of Computer Science, University of Illinois at Urbana-Champaign
- Charu Aggarwal
- IBM T. J. Watson
This work was funded in part by
2Outline of The Presentation
- Data Stream Classification
3Introduction
- Characteristics of Data streams are
Network traffic
Sensor data
Call center records
4Data Stream Classification
- Uses past labeled data to build a classification model
- Predicts the labels of future instances using the model
- Helps decision making
5Data Stream Classification (cont..)
- What are the applications?
- Security Monitoring
- Network monitoring and traffic engineering.
- Business credit card transaction flows.
- Telecommunication calling records.
- Web logs and web page click streams.
6Challenges
- Infinite length
- Concept-drift
- Concept-evolution
- Feature Evolution
7Infinite Length
- Impractical to store and use all historical data
- Requires infinite storage
- And running time
8Concept-Drift
[Figure: a data chunk with positive and negative instances; some instances are victims of concept-drift]
9Concept-Evolution
[Figure: feature space with axes x and y, thresholds x1, y1, y2, and regions A, B, C, D]
Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = -
Existing classification models misclassify novel class instances
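A minimal sketch of why rule-based models fail here, assuming the two existing classes are labeled + and - and that the thresholds x1, y1, y2 take the illustrative values below (the slides give no numbers). The rules can only answer with a known class, so an instance from an unseen class is silently given an existing label.

```python
# Illustrative thresholds; the slides do not give numeric values.
X1, Y1, Y2 = 5.0, 6.0, 3.0

def classify(x: float, y: float) -> str:
    """Apply rules R1 and R2; the answer is always one of the existing classes."""
    if (x > X1 and y < Y2) or (x < X1 and y < Y1):
        return "+"  # rule R1
    return "-"      # rule R2 (every remaining point)

# A point from a hypothetical novel class still receives an existing label.
novel_point = (9.0, 9.5)
print(classify(*novel_point))
```

The classifier has no way to say "none of the above", which is the gap that novel class detection is meant to close.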
10Dynamic Features
- Why do new features evolve?
- Infinite data stream
- Normally, the global feature set is unknown
- New features may appear
- Concept drift
- As the concept drifts, new features may appear
- Concept evolution
- A new class normally brings a new set of features
Different chunks may have different feature sets
11Dynamic Features
The i-th chunk, the (i+1)-st chunk, and the models have different feature sets
[Figure: example feature sets {runway, climb}, {runway, clear, ramp}, and {runway, ground, ramp} shown for the chunks and the current model; processing steps: Feature Space Conversion, Classification and Novel Class Detection, Training New Model]
Existing classification models need a complete, fixed feature set that applies to all chunks. Global features are difficult to predict. One solution is to use all English words to generate the feature vector, but the dimension of that vector would be too high.
12Outline of The Presentation
- Data Stream Classification
13DataStream Classification (cont..)
- Single Model Incremental Classification
- Ensemble model based classification
- Supervised
- Semi-supervised
- Active learning
14Overview
- Single Model Incremental Classification
- Ensemble model based classification
- Data Selection
- Semi-supervised
- Skewed Data
15Ensemble of Classifiers
[Figure: an input (x, ?) is passed to the classifiers C1, C2, C3; their individual outputs are combined by voting into the ensemble output]
16Ensemble Classification of Data Streams
- Divide the data stream into equal sized chunks
- Train a classifier from each data chunk
- Keep the best L such classifiers as the ensemble
- Example for L = 3
Note: Di may contain data points from different classes
[Figure: data chunks D4, D5, D6 (labeled and unlabeled), classifiers C4 and C5 trained from the chunks, and the ensemble selected from C1 to C5]
Addresses infinite length and concept-drift
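The chunk-based scheme above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the base learner is a toy nearest-centroid classifier, "best" is taken to mean highest accuracy on the latest labeled chunk, and all function names are hypothetical.

```python
from collections import Counter

def train_centroid_model(chunk):
    """Train a toy nearest-centroid classifier from one labeled chunk.
    chunk is a list of (features, label) pairs."""
    sums, counts = {}, Counter()
    for x, label in chunk:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(model, x):
    """Return the label of the closest class centroid."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def accuracy(model, chunk):
    return sum(predict(model, x) == y for x, y in chunk) / len(chunk)

def update_ensemble(ensemble, labeled_chunk, L=3):
    """Train a model on the newest chunk and keep the best L models."""
    candidates = ensemble + [train_centroid_model(labeled_chunk)]
    candidates.sort(key=lambda m: accuracy(m, labeled_chunk), reverse=True)
    return candidates[:L]

def ensemble_predict(ensemble, x):
    """Majority vote over the individual model outputs."""
    return Counter(predict(m, x) for m in ensemble).most_common(1)[0][0]

chunks = [
    [((0.0, 0.0), "-"), ((1.0, 1.0), "+")],
    [((0.1, 0.2), "-"), ((0.9, 1.1), "+")],
    [((0.2, 0.0), "-"), ((1.2, 0.8), "+")],
    [((0.0, 0.3), "-"), ((1.1, 1.0), "+")],
]
ensemble = []
for chunk in chunks:
    ensemble = update_ensemble(ensemble, chunk, L=3)
print(len(ensemble), ensemble_predict(ensemble, (0.9, 1.0)))
```

Because only L models are kept, memory stays bounded regardless of stream length, and re-ranking on the latest chunk lets outdated models drop out as the concept drifts.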
17Concept-Evolution Problem
ECSMiner
- A completely new class of data arrives in the
stream
[Figure: (a) A decision tree with the tests x < x1, y < y2, and y < y1; (b) the corresponding feature space partitioning]
18ECSMiner Overview
ECSMiner
[Figure: Overview of the ECSMiner algorithm. The data stream consists of older (labeled) and newer (unlabeled) instances; the last labeled chunk and the ensemble of L models M1, M2, ..., ML are shown, together with a buffer]
Based on: Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams. In Proceedings of 2009 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD '09), Bled, Slovenia, 7-11 Sept, 2009, pp. 79-94 (extended version appeared in IEEE Transactions on Knowledge and Data Engineering (TKDE)).
19Algorithm
ECSMiner
20Novel Class Detection
ECSMiner
- Non-parametric
- Does not assume any underlying model of existing classes
- Steps
- Creating and saving the decision boundary during training
- Detecting and filtering outliers
- Measuring cohesion and separation among test and training instances
21Training Creating Decision Boundary
ECSMiner
[Figure: raw training data in the (x, y) feature space with regions A, B, C, D; clusters are created from the raw training data]
Addresses Infinite length problem
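A minimal sketch of how clusters can stand in for raw training data. It assumes each cluster is summarised by a centroid and a radius (the distance to its farthest member); the dictionary layout, the function names, and the deterministic K-means initialisation are illustrative choices, not taken from the slides.

```python
import math

def kmeans(points, k, iters=20):
    """Plain K-means with deterministic initialisation (first k points)."""
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

def build_pseudopoints(points, k):
    """Summarise each cluster as a centroid plus radius; raw points can then be discarded."""
    centroids, clusters = kmeans(points, k)
    return [
        {"centroid": c, "radius": max(math.dist(c, p) for p in members)}
        for c, members in zip(centroids, clusters)
        if members
    ]

training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
pseudopoints = build_pseudopoints(training, k=2)
print(pseudopoints)
```

Only the summaries are stored with the model, so the space needed per chunk depends on the number of clusters rather than on the number of instances.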
22Outlier Detection and Filtering
ECSMiner
Test instance inside the decision boundary: not an outlier
Test instance outside the decision boundary: raw outlier (Routlier)
[Figure: the Routlier decisions of the ensemble models are combined with AND. If true, x is a filtered outlier (Foutlier), a potential novel class instance; if false, x is an existing class instance]
Routliers may appear as a result of novel class,
concept-drift, or noise. Therefore, they are
filtered to reduce noise as much as possible.
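The filtering step can be sketched as below, assuming each model is a list of cluster summaries with a centroid and a radius (a hypothetical layout) and that an instance is kept as an Foutlier only when every model in the ensemble flags it as an Routlier, as the AND in the figure suggests.

```python
import math

def is_routlier(x, pseudopoints):
    """x is a raw outlier for one model if it lies outside every cluster sphere."""
    return all(math.dist(x, p["centroid"]) > p["radius"] for p in pseudopoints)

def is_foutlier(x, ensemble):
    """x is a filtered outlier only if all models call it a raw outlier."""
    return all(is_routlier(x, model) for model in ensemble)

model_1 = [{"centroid": (0.0, 0.0), "radius": 1.0}]
model_2 = [{"centroid": (0.5, 0.0), "radius": 1.0}]
ensemble = [model_1, model_2]

print(is_foutlier((1.2, 0.0), ensemble))  # outside model_1, inside model_2
print(is_foutlier((5.0, 5.0), ensemble))  # outside both models
```

Requiring agreement across models is what removes many outliers caused by noise or drift: a point that any one model still covers is treated as an existing class instance.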
23Novel Class Detection
ECSMiner
[Flowchart: novel class detection]
(Step 1) Each model decides whether x is an Routlier
(Step 2) The decisions are combined with AND. False: x is an existing class instance. True: x is a filtered outlier (Foutlier), a potential novel class instance
(Step 3) Compute q-NSC with all models and other Foutliers
(Step 4) Is q-NSC > 0 for q' (> q) Foutliers with all models? Y: novel class found. N: treat as existing class
24Computing Cohesion Separation
ECSMiner
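[Figure: an Foutlier x with its q-nearest-neighbor sets for q = 5: $\lambda_{o,5}(x)$ among the Foutliers, and $\lambda_{+,5}(x)$ and $\lambda_{-,5}(x)$ in the existing classes + and -, with the distances $a(x)$, $b_{+}(x)$, and $b_{-}(x)$]
- $a(x)$: mean distance from an Foutlier x to the instances in $\lambda_{o,q}(x)$
- $b_{\min}(x)$: minimum among all $b_c(x)$ (e.g. $b_{+}(x)$ in the figure), where $b_c(x)$ is the mean distance from x to the instances in $\lambda_{c,q}(x)$
- q-Neighborhood Silhouette Coefficient (q-NSC): $q\text{-NSC}(x) = \dfrac{b_{\min}(x) - a(x)}{\max(b_{\min}(x),\, a(x))}$
- If q-NSC(x) is positive, x is closer to the Foutliers than to any existing class.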
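A small sketch of the cohesion and separation measure, assuming the usual silhouette-style form q-NSC(x) = (b_min(x) - a(x)) / max(b_min(x), a(x)), with a(x) the mean distance to the q nearest other Foutliers and b_min(x) the smallest mean distance to the q nearest instances of any existing class. Function and argument names are illustrative.

```python
import math

def mean_dist_to_q_nearest(x, points, q):
    """Mean distance from x to its q nearest neighbours among points."""
    nearest = sorted(math.dist(x, p) for p in points)[:q]
    return sum(nearest) / len(nearest)

def q_nsc(x, other_foutliers, existing_by_class, q):
    """Silhouette-style score in [-1, 1]; positive means x sits with the Foutliers."""
    a = mean_dist_to_q_nearest(x, other_foutliers, q)
    b_min = min(
        mean_dist_to_q_nearest(x, pts, q) for pts in existing_by_class.values()
    )
    return (b_min - a) / max(b_min, a)

existing = {"+": [(5.0, 5.0), (6.0, 6.0)], "-": [(-7.0, 0.0), (-8.0, 0.0)]}
score = q_nsc((0.0, 0.0), [(0.0, 1.0), (1.0, 0.0)], existing, q=2)
print(score > 0)
```

A positive score means the instance is more cohesive with the other Foutliers than with any existing class, which is the evidence used to declare a novel class.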
25Speeding Up
- Computing q-NSC for every Foutlier instance x takes quadratic time in the number of Foutliers.
- In order to make the computation faster,
- we create Ko pseudopoints (Fpseudopoints) from the Foutliers using K-means clustering,
- where $K_o = (N_o / S) \cdot K$. Here S is the chunk size and No is the number of Foutliers.
- We perform the computations on the Fpseudopoints.
- Thus, the time complexity
- to compute the q-NSC of all of the Fpseudopoints is $O(K_o (K_o + K))$,
- which is constant, since both Ko and K are independent of the input size.
- However, by gaining speed we lose some precision, although the loss is negligible (to be analyzed shortly).
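A worked example of the sizes involved, assuming Ko = (No / S) * K and a cost proportional to Ko * (Ko + K). The rounding to an integer and the lower bound of 1 are assumptions made only for this sketch.

```python
def num_fpseudopoints(n_foutliers: int, chunk_size: int, k: int) -> int:
    """Ko = (No / S) * K, rounded and kept at 1 or more (rounding is an assumption)."""
    return max(1, round(n_foutliers / chunk_size * k))

def pairwise_cost(ko: int, k: int) -> int:
    """Number of distance computations implied by O(Ko * (Ko + K))."""
    return ko * (ko + k)

ko = num_fpseudopoints(n_foutliers=400, chunk_size=2000, k=50)
print(ko, pairwise_cost(ko, 50))
```

With 400 Foutliers in a chunk of 2,000 instances and K = 50, the Foutliers are summarised by 10 Fpseudopoints, so the comparison works on 10 summaries instead of 400 raw instances.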
26Algorithm To Detect Novel Class
ECSMiner
27Speedup Penalty
- As discussed earlier,
- by speeding up the computation in step 3, we lose some precision, since the result deviates from the exact result.
- This analysis shows that the deviation is negligible.
Figure 6. Illustrating the computation of deviation. $\phi_i$ is an Fpseudopoint, i.e., a cluster of Foutliers, and $\phi_j$ is an existing class pseudopoint, i.e., a cluster of existing class instances. In this particular example, all instances in $\phi_i$ belong to a novel class. The figure marks the squared distances $(x-\mu_i)^2$, $(x-\mu_j)^2$, and $(\mu_i-\mu_j)^2$.
28Speedup Penalty
Approximate
Exact
Deviation
29Experiments - Datasets
- We evaluated our approach on two synthetic and two real datasets
- SynC: synthetic data with only concept-drift. Generated using a hyperplane equation. 2 classes, 10 attributes, 250K instances
- SynCN: synthetic data with concept-drift and novel class. Generated using Gaussian distribution. 20 classes, 40 attributes, 400K instances
- KDD Cup 1999 intrusion detection (10% version): real dataset. 23 classes, 34 attributes, 490K instances
- Forest cover: real dataset. 7 classes, 54 attributes, 581K instances
30Experiments - Setup
- Development
- Language Java
- H/W
- Intel P-IV with
- 2GB memory and
- 3GHz dual processor CPU.
- Parameter settings
- K (number of pseudopoints per chunk) = 50
- N (minimum number of instances required to declare novel class) = 50
- M (ensemble size) = 6
- S (chunk size) = 2,000
31Experiments - Baseline
- Competing approaches
- i) MineClass (MC): our approach
- ii) WCE-OLINDDA_Parallel (W-OP)
- iii) WCE-OLINDDA_Single (W-OS)
- Here WCE-OLINDDA is a combination of the Weighted Classifier Ensemble (WCE) and the novel class detector OLINDDA, with default parameter settings for WCE and OLINDDA
- We use this combination since, to the best of our knowledge, there is no approach that can classify and detect novel classes simultaneously
- OLINDDA assumes there is only one normal class, and all other classes are novel
- Therefore, we apply two variations
- W-OP keeps parallel OLINDDA models, one for each class
- W-OS keeps a single model that absorbs a novel class when encountered
32Experiments - Results
- Evaluation metrics
- Mnew = % of novel class instances Misclassified as existing class = Fn * 100 / Nc
- Fnew = % of existing class instances Falsely identified as novel class = Fp * 100 / (N - Nc)
- ERR = Total misclassification error (%) (including Mnew and Fnew) = (Fp + Fn + Fe) * 100 / N
- where Fn = total novel class instances misclassified as existing class,
- Fp = total existing class instances misclassified as novel class,
- Fe = total existing class instances misclassified (other than Fp),
- Nc = total novel class instances in the stream,
- N = total instances in the stream.
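The three metrics can be computed directly from the five counts. A minimal sketch, reading the formulas as Mnew = Fn * 100 / Nc, Fnew = Fp * 100 / (N - Nc), and ERR = (Fp + Fn + Fe) * 100 / N; the function name and the sample counts are made up for illustration.

```python
def stream_metrics(fn: int, fp: int, fe: int, nc: int, n: int):
    """Return (Mnew, Fnew, ERR) as percentages."""
    m_new = 100.0 * fn / nc            # novel instances missed
    f_new = 100.0 * fp / (n - nc)      # existing instances flagged as novel
    err = 100.0 * (fp + fn + fe) / n   # all misclassifications
    return m_new, f_new, err

m_new, f_new, err = stream_metrics(fn=10, fp=20, fe=30, nc=100, n=1000)
print(m_new, round(f_new, 2), err)
```

Note that Mnew and Fnew use different denominators (novel versus existing instances), so a low ERR can coexist with a high Mnew when novel instances are rare.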
33Experiments - Results
Forest Cover
KDD cup
SynCN
34Experiments - Results
35Experiments Parameter Sensitivity
36Experiments Runtime
37Dynamic Features
- Solution
- Global Features
- Local Features
- Union
- Mohammad Masud, Qing Chen, Latifur Khan, Jing
Gao, Jiawei Han, and Bhavani Thuraisingham,
Classification and Novel Class Detection of Data
Streams in A Dynamic Feature Space, in Proc. of
Machine Learning and Knowledge Discovery in
Databases, European Conference, ECML PKDD 2010,
Barcelona, Spain, Sept 2010, Springer, Page
337-352
38Feature Mapping Across Models and Test Data
Points
- The feature set varies across chunks. In particular, when a new class appears, new features should be selected and added to the feature set.
- Strategy 1: Lossy fixed (Lossy-F) conversion / Global
- Use the same fixed feature set in the entire stream.
- We call this a lossy conversion because future models and instances may lose important features due to this mapping.
- Strategy 2: Lossy local (Lossy-L) conversion / Local
- We call this a lossy conversion because it may lose feature values during mapping.
- Strategy 3: Dimension preserving (D-Preserving) Mapping / Union
39Feature Space Conversion Lossy-L Mapping (Local)
- Assume that each data chunk has a different feature vector
- When a classification model is trained, we save the feature vector with the model
- When an instance is tested, its feature vector is mapped (i.e., projected) to the model's feature vector.
40Feature Space Conversion Lossy-L Mapping
- For example,
- Suppose the model has two features (x, y)
- The instance has two features (y, z)
- When testing, the instance is treated as having the two features (x, y)
- where x = 0, and the y value is kept as it is (the z value is dropped)
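The projection in this example can be sketched in a few lines, assuming instances are stored as feature-name to value dictionaries (an illustrative representation; the function name is hypothetical).

```python
def lossy_l_map(instance: dict, model_features: list) -> list:
    """Project an instance onto the model's features.
    Features the instance lacks become 0; features the model lacks are dropped."""
    return [instance.get(f, 0.0) for f in model_features]

model_features = ["x", "y"]
instance = {"y": 0.7, "z": 0.3}
print(lossy_l_map(instance, model_features))
```

The value of z never reaches the model, which is exactly the information loss that gives the strategy its name.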
41Conversion Strategy II Lossy-L Mapping
42Conversion Strategy III D-Preserving Mapping
- When an instance is tested, both the model's feature vector and the instance's feature vector are mapped (i.e., projected) to the union of their feature vectors.
- The feature dimension is increased.
- In the mapping, the features of both the test instance and the model are preserved. The extra features are filled with 0s.
43Conversion Strategy III D-Preserving Mapping
- For example,
- Suppose the model has three features (a, b, c)
- The instance has four features (b, c, d, e)
- When testing, we project both the model's feature vector and the instance's feature vector to (a, b, c, d, e)
- Therefore, in the model, d and e are considered 0, and in the instance, a is considered 0
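A minimal sketch of the union mapping for this example, again assuming feature-name to value dictionaries; the sample values and the function name are illustrative.

```python
def d_preserving_map(model_vec: dict, instance: dict):
    """Project both vectors onto the union of their features, filling gaps with 0."""
    union = sorted(set(model_vec) | set(instance))
    model_out = [model_vec.get(f, 0.0) for f in union]
    instance_out = [instance.get(f, 0.0) for f in union]
    return union, model_out, instance_out

model_vec = {"a": 1.0, "b": 2.0, "c": 3.0}            # e.g. a cluster centroid
instance = {"b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}
union, model_out, instance_out = d_preserving_map(model_vec, instance)
print(union, model_out, instance_out)
```

No feature value is discarded on either side, at the cost of a higher-dimensional comparison.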
44Conversion Strategy III D-Preserving Mapping
45Discussion
- Local does not favor novel classes; it favors existing classes.
- Local features will be enough to model existing classes.
- Union favors novel classes.
- New features may be discriminating for a novel class, hence Union works.
46Comparison
- Which strategy is better?
- Assumption: lossless conversion (union) preserves the properties of a novel class.
- In other words, if an instance belongs to a novel class, it remains outside the decision boundary of any model Mi of the ensemble M in the converted feature space.
- Lemma: if a test point x belongs to a novel class, it will be misclassified by the ensemble M as an existing class instance under certain conditions when the Lossy-L conversion is used.
47Comparison
- Proof
- Let $X_1,\dots,X_L,X_{L+1},\dots,X_M$ be the dimensions of the model, and
- let $X_1,\dots,X_L,X_{M+1},\dots,X_N$ be the dimensions of the test point
- Suppose the radius of the closest cluster (in the higher dimension) is R
- Also, let the test point be a novel class instance.
- Combined feature space: $X_1,\dots,X_L,X_{L+1},\dots,X_M,X_{M+1},\dots,X_N$
48Comparison
- Proof (continued)
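- Combined feature space: $X_1,\dots,X_L,X_{L+1},\dots,X_M,X_{M+1},\dots,X_N$
- Centroid of the cluster (original space): $X_1=x_1,\dots,X_L=x_L,X_{L+1}=x_{L+1},\dots,X_M=x_M$, i.e., $(x_1,\dots,x_L,x_{L+1},\dots,x_M)$
- Centroid of the cluster (combined space): $(x_1,\dots,x_L,x_{L+1},\dots,x_M,0,\dots,0)$
- Test point (original space): $X_1=x'_1,\dots,X_L=x'_L,X_{M+1}=x'_{M+1},\dots,X_N=x'_N$, i.e., $(x'_1,\dots,x'_L,x'_{M+1},\dots,x'_N)$ (primes mark the test point's values)
- Test point (combined space): $(x'_1,\dots,x'_L,0,\dots,0,x'_{M+1},\dots,x'_N)$
49Comparison
- Proof (continued)
- Centroid (combined space): $(x_1,\dots,x_L,x_{L+1},\dots,x_M,0,\dots,0)$
- Test point (combined space): $(x'_1,\dots,x'_L,0,\dots,0,x'_{M+1},\dots,x'_N)$
- The test point is an outlier in the combined space, so $R^2 < \big((x_1-x'_1)^2+\dots+(x_L-x'_L)^2+x_{L+1}^2+\dots+x_M^2\big) + \big(x'^2_{M+1}+\dots+x'^2_N\big)$
- $R^2 < a^2 + b^2$, where $a^2$ is the first bracket (the squared distance in the model's feature space, which is what Lossy-L measures) and $b^2$ is the second
- $R^2 = a^2 + b^2 - e^2$ $(e^2 > 0)$
- $a^2 = R^2 + (e^2 - b^2)$
- $a^2 < R^2$ (provided that $e^2 < b^2$)
- Therefore, in Lossy-L conversion, the test point will not be an outlier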
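A numeric illustration of the lemma, with made-up values: the model knows features X1 and X2, the test point carries X1 and a new feature X3, and the cluster radius is R = 1.5. All names and numbers are assumptions chosen only to make the inequality visible.

```python
import math

model_features = ["X1", "X2"]        # dimensions known to the model
centroid = {"X1": 1.0, "X2": 1.0}    # centroid of the closest cluster
R = 1.5                               # cluster radius
test_point = {"X1": 1.0, "X3": 2.0}  # novel class instance with a new feature

def distance(u: dict, v: dict, features) -> float:
    """Euclidean distance over the given features; missing values count as 0."""
    return math.sqrt(sum((u.get(f, 0.0) - v.get(f, 0.0)) ** 2 for f in features))

union = sorted(set(centroid) | set(test_point))
d_union = distance(centroid, test_point, union)             # sqrt(a^2 + b^2)
d_lossy_l = distance(centroid, test_point, model_features)  # sqrt(a^2)

print(d_union > R)    # outlier under the union (D-Preserving) mapping
print(d_lossy_l > R)  # not an outlier under Lossy-L
```

Here a^2 = 1 and b^2 = 4, so the point lies outside the cluster in the combined space but inside it once the new feature is dropped, and the novel instance would be absorbed by an existing class.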
50Baseline Approaches
- WCE is the Weighted Classifier Ensemble [1], a multi-class ensemble classifier.
- OLINDDA is a novel class detector [2] that works only for binary classes.
- FAE is an ensemble classifier that addresses feature evolution [3] and concept drift.
- ECSMiner is a multi-class ensemble classifier that addresses concept drift and concept evolution [4].
51Approaches Comparison
Proposed techniques | Challenges: Infinite length | Concept-drift | Concept-evolution | Dynamic Features
OLINDDA
WCE
FAE
ECSMiner
DXMiner
52Experiments Datasets
- We evaluated our approach on different datasets
Data Set | Concept Drift | Concept Evolution | Dynamic Feature | # of Instances | # of Classes
KDD | | | | 492K | 7
Forest Cover | | | | 387K | 7
NASA | | | | 140K | 21
Twitter | | | | 335K | 21
53Experiments Results
- Evaluation metrics: let
- Fn = total novel class instances misclassified as existing class,
- Fp = total existing class instances misclassified as novel class,
- Fe = total existing class instances misclassified (other than Fp),
- Nc = total novel class instances in the stream,
- N = total instances in the stream
54Experiments Results
- We use the following performance metrics to evaluate our technique
- Mnew = % of novel class instances Misclassified as existing class, i.e., Fn * 100 / Nc
- Fnew = % of existing class instances Falsely identified as novel class, i.e., Fp * 100 / (N - Nc)
- ERR = Total misclassification error (%) (including Mnew and Fnew), i.e., (Fp + Fn + Fe) * 100 / N
55Experiments Setup
- Development
- Language Java
- H/W
- Intel P-IV with
- 3GB memory and
- 3GHz dual processor CPU.
- Parameter settings
- K (number of pseudopoints per chunk) = 50
- q (minimum number of instances required to declare novel class) = 50
- L (ensemble size) = 6
- S (chunk size) = 1,000
56Experiments Baseline
- Competing approaches
- i) DXMiner (DXM), our approach: 4 variations
- Lossy-F conversion
- Lossy-L conversion
- D-Preserving conversion
- ii) FAE-WCE-OLINDDA_Parallel (W-OP)
- Assumes there is only one normal class, and all other classes are novel. W-OP keeps parallel OLINDDA models, one for each class
- We use this combination since, to the best of our knowledge, there is no approach that can classify and detect novel classes simultaneously with feature evolution.
- iii) FAE-ECSMiner
57Twitter Results
58Twitter Results
D-preserving Lossy -Local Lossy-Global O-F
AUC 0.88 0.83 0.76 0.56
59NASA Dataset
Deviation Info Gain O-F
AUC 0.996 0.967 0.876
60Forest Cover Results
61Forest Cover Results
D-preserving O-F
AUC 0.97 0.74
62KDD Results
63KDD Results
D-preserving FAE-Olindda
AUC 0.98 0.96
64Summary Results
65 Novel Class Detection Failures
Proposed Methods
- False Positive
- An existing class instance is misclassified as a novel class instance
- False Negative
- A novel class instance is misclassified as an existing class instance
- Novel class detection is a two-step process
- Step 1: Identify the instances that are outside of the existing model clusters and buffer them as outliers
- Step 2: Analyze the buffered outlier instances, calculate the outliers' cohesion and their separation from the existing model clusters, and then use the model clusters to vote on the novel class decision.
Failures may occur in both steps
66Novel Class Detection Failures
Proposed Methods
- Proposed solutions
- In step 1, an existing class instance may fall outside of a model cluster due to data noise.
- We need to select a proper model cluster range to reduce the number of existing class instances among the outliers: dynamic OUTTH.
- In step 2, outliers may not be novel class instances.
- Data noise and concept drift may cause existing class instances to become outliers
- Build a statistical model to filter noisy and concept-drift data out of the outliers.
67Improved Outlier Detection and Multiple Novel
Class Detection
Proposed Methods
- Challenges
- High false positive (FP) (existing classes detected as novel) and false negative (FN) (missed novel classes) rates
- Two or more novel classes arrive at a time
- Solutions [1]
- Dynamic decision boundary based on previous mistakes
- Inflate the decision boundary if FP is high, deflate if FN is high
- Build a statistical model to filter out noise and concept drift from the outliers.
- Multiple novel classes are detected by
- Constructing a graph where each outlier cluster is a vertex
- Merging the vertices based on silhouette coefficient
- Counting the number of connected components in the resultant (i.e., merged) graph
[1] Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Charu Aggarwal, Jiawei Han, and Bhavani Thuraisingham, Addressing Concept-Evolution in Concept-Drifting Data Streams, In Proc. ICDM '10, Sydney, Australia, Dec 14-17, 2010.
68Outlier Threshold (OUTTH)
Proposed Methods
- To declare a test instance an outlier, using the cluster radius r alone is not enough because of data noise
- So, beyond the radius r, a threshold (OUTTH) is set up, so that most noisy data around a model cluster are classified immediately
[Figure: an instance x with the neighborhoods $\lambda_{o,5}(x)$ and $\lambda_{+,5}(x)$ and the distances a(x) and b(x)]
69Outlier Threshold (OUTTH)
Proposed Methods
- Every instance outside the cluster range has a weight
- If wt(x) > OUTTH, the instance is considered an existing class instance.
- If wt(x) < OUTTH, the instance is an outlier.
- Pros
- Noisy data are classified immediately
- Cons
- OUTTH is hard to determine
- Noisy data and novel class instances may occur simultaneously
- Different datasets may need different OUTTH values
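A sketch of the threshold test. The slides do not define wt(x); the exponential form below, exp(r - d) for an instance at distance d from the centroid of a cluster with radius r, is a hypothetical choice that equals 1 on the boundary and decays with distance. The OUTTH value 0.7 is likewise only an example.

```python
import math

def weight(distance: float, radius: float) -> float:
    """Hypothetical weight: 1.0 on the cluster boundary, decaying towards 0 farther out."""
    return math.exp(radius - distance)

def is_outlier(distance: float, radius: float, outth: float) -> bool:
    """Inside the radius: never an outlier. Outside: outlier when wt(x) < OUTTH."""
    if distance <= radius:
        return False
    return weight(distance, radius) < outth

OUTTH = 0.7  # example value only
print(is_outlier(1.1, 1.0, OUTTH))  # just outside the cluster: treated as existing class
print(is_outlier(2.0, 1.0, OUTTH))  # far outside: outlier
```

Instances slightly beyond the radius are classified right away instead of being buffered, which is how the threshold absorbs noise around the cluster.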
70Outlier Threshold (OUTTH)
Proposed Methods
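[Figure: an instance x with the neighborhoods $\lambda_{o,5}(x)$ and $\lambda_{+,5}(x)$, the distances a(x) and b(x), and the open question of where to place OUTTH]
- If the threshold is too high, noisy data may become outliers
- FP rate will go up
- If the threshold is too low, novel class instances will be labeled as existing class
- FN rate will go up
We need to balance these two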
71- Data Stream Classification
- Finer Grain Novel Class Detection
- Dynamic Novel Class Detection
- Multiple Novel Class Detection
72Dynamic threshold setting
Proposed Methods
[Figure: an instance x at distance a(x), with the marginal FN and marginal FP regions]
- Defer approach
- After a testing chunk has been labeled, update OUTTH based on the marginal FP and FN rates of this testing chunk, and then apply the new OUTTH to the next testing chunk
- Eager approach
- What is a marginal FP or marginal FN?
- Once a marginal FP or marginal FN instance is detected, update OUTTH with a step function, and apply the updated OUTTH to the next testing instance
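The eager update can be sketched as a small step function. The step size, the event names, and the clamping to [0, 1] are assumptions; the direction follows the rule that an instance is an outlier when its weight falls below OUTTH, so a marginal FP lowers the threshold and a marginal FN raises it.

```python
def update_outth(outth: float, event: str, step: float = 0.01) -> float:
    """Adjust OUTTH after a single labeled instance (eager approach)."""
    if event == "marginal_fp":
        outth -= step  # existing instance was flagged as outlier: inflate the boundary
    elif event == "marginal_fn":
        outth += step  # novel instance was missed: deflate the boundary
    return min(1.0, max(0.0, outth))

outth = 0.7
for event in ["marginal_fp", "marginal_fp", "marginal_fn"]:
    outth = update_outth(outth, event)
print(round(outth, 2))
```

The deferred variant would apply the same kind of correction once per chunk, using the marginal FP and FN rates of the whole chunk instead of single instances.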
73Dynamic threshold setting
Proposed Methods
74Defer approach and Eager approach comparison
Proposed Methods
- In the Defer approach, OUTTH is updated after a data chunk is labeled
- Too late: in the testing chunk, many marginal FP or FN instances may occur due to an improper OUTTH threshold
- Overreaction: if there are many marginal FP or FN instances in the labeled testing chunk, the OUTTH update may overreact for the next testing chunk
- In the Eager approach, OUTTH is updated aggressively whenever a marginal FP or FN happens.
- The model is more tolerant of noisy data and concept drift.
- The model is more sensitive to novel class instances.
75Outliers Statistics
Proposed Methods
- For each outlier instance, we calculate the novelty probability Pnov
- If Pnov is large (close to 1), the outlier has a high probability of being a novel class instance.
- Pnov contains two parts
- The first part measures how far the outlier is from the model cluster
- The second part, Psc, is the Silhouette Coefficient, which measures the cohesion of the outlier's q-neighbors and their separation from the model cluster
76Outliers Statistics
Proposed Methods
Three scenarios may occur simultaneously
77Outlier Statistics Gini Analysis
Proposed Methods
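- The Gini coefficient is a measure of statistical inequality. The discrete Gini coefficient is $G = \dfrac{1}{n}\left(n + 1 - 2\,\dfrac{\sum_{i=1}^{n}(n+1-i)\,y_i}{\sum_{i=1}^{n} y_i}\right)$
- If we divide [0, 1] into n equal-size bins and put every outlier's Pnov into the corresponding bin, we get the cdf values $y_i$
- If all Pnov are very low, in the extreme case the cdf is $y_i = 1$ for all i
- If all Pnov are very high, in the extreme case the cdf is $y_i = 0$ except $y_n = 1$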
78Outlier Statistics Gini Analysis
Proposed Methods
- If the outliers' Pnov values are distributed evenly, $y_i = i/n$
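The three cases can be checked numerically. This sketch assumes the standard discrete Gini form G = (1/n) * (n + 1 - 2 * sum((n + 1 - i) * y_i) / sum(y_i)) applied to the binned cdf values y_i; the binning helper and the sample Pnov values are illustrative.

```python
def cdf_bins(pnov_values, n):
    """Bin Pnov values from [0, 1] into n equal bins and return the cdf y_1..y_n."""
    counts = [0] * n
    for p in pnov_values:
        counts[min(int(p * n), n - 1)] += 1
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / len(pnov_values))
    return cdf

def discrete_gini(y):
    """Standard discrete Gini coefficient of a non-decreasing sequence y."""
    n = len(y)
    weighted = sum((n + 1 - i) * yi for i, yi in enumerate(y, start=1))
    return (n + 1 - 2 * weighted / sum(y)) / n

n = 10
g_low = discrete_gini(cdf_bins([0.05] * 20, n))       # all Pnov very low
g_high = discrete_gini(cdf_bins([0.95] * 20, n))      # all Pnov very high
g_even = discrete_gini([i / n for i in range(1, n + 1)])  # evenly spread
print(g_low, g_high, round(g_even, 3))
```

Under this form the coefficient is 0 when every Pnov is low, (n - 1)/n when every Pnov is high, and (n - 1)/(3n) when the values are spread evenly, so the value separates the three situations.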
79Outlier Statistics Gini Analysis Limitation
Proposed Methods
- In the extreme case, it is impossible to differentiate concept drift from concept evolution by the Gini coefficient, when the concept drift looks just like concept evolution.
80- Data Stream Classification
- Finer Grain Novel Class Detection
- Dynamic Novel Class Detection
- Multiple Novel Class Detection
81Multi Novel Class Detection
Proposed Methods
[Figure: a data stream in the (x, y) feature space with positive instances, negative instances, and novel instances from two novel classes, A and B]
If we always assume novel instances belong to one
novel type, one type of novel instances, either A
or B, will be misclassified.
82Multi Novel Class Detection
Proposed Methods
- The main idea in detecting multiple novel classes is to construct a graph and identify the connected components in the graph.
- The number of connected components determines the number of novel classes.
83Multi Novel Class Detection
Proposed Methods
- Two Phases
- Building the connected graph
- Build a directed nearest neighbor graph: from each vertex (outlier cluster), add an edge from this vertex to its nearest neighbor.
- If the silhouette coefficient from the vertex to its nearest neighbor is larger than some threshold, the edge is removed.
- Problem: Linkage Circle
- Component merging phase
- Gaussian distribution centric decision
Proposed Methods
- Component merging phase
- In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that is often used as a first approximation to describe real-valued random variables that tend to cluster around a single mean value [1]
- If two Gaussian-distributed variables (g1, g2) can be separated, the following condition will hold
- Since µ is proportional to σ, if the condition holds the two variables (components) remain separated; otherwise, the two components are merged.
[1] Amari Shunichi, Nagaoka Hiroshi. Methods of information geometry. Oxford University Press. ISBN 0-8218-0531-2, 2000.
85Experiments Datasets
Experiment Results
- We evaluated our approach on different datasets
Data Set | Concept Drift | Concept Evolution | Dynamic Feature | # of Instances | # of Classes
KDD | | | | 492K | 7
Forest Cover | | | | 387K | 7
NASA | | | | 140K | 21
Twitter | | | | 335K | 21
SynED | | | | 400K | 20
86Experiments Setup
Experiment Results
- Development
- Language Java
- H/W
- Intel P-IV with
- 3GB memory and
- 3GHz dual processor CPU.
- Parameter settings
- K (number of pseudopoints per chunk) = 50
- q (minimum number of instances required to declare novel class) = 50
- L (ensemble size) = 6
- S (chunk size) = 1,000
87Experiments Baseline
- Competing approaches
- i) DEMminer, our approach: 5 variations
- Lossy-F conversion
- Lossy-L conversion
- Lossless conversion: DEMminer
- Dynamic OUTTH + Lossless conversion: DEMminer-Ex (without Gini)
- Dynamic OUTTH + Gini + Lossless conversion: DEMminer-Ex
- ii) WCE-OLINDDA (O-W)
- iii) FAE-WCE-OLINDDA_Parallel (O-F)
- We use this combination since, to the best of our knowledge, there is no approach that can classify and detect novel classes simultaneously with feature evolution.
88Experiments Results
Experiment Results
- Evaluation metrics
- Fn = total novel class instances misclassified as existing class,
- Fp = total existing class instances misclassified as novel class,
- Fe = total existing class instances misclassified (other than Fp),
- Nc = total novel class instances in the stream,
- N = total instances in the stream
89Twitter Results
Experiment Results
90Twitter Results
Experiment Results
DEMminer Lossy -L Lossy-F O-F
AUC 0.88 0.83 0.76 0.56
91Twitter Results
Experiment Results
92Twitter Results
Experiment Results
DEMminer-Ex DEMminer OW
AUC 0.94 0.88 0.56
93Forest Cover Results
Experiment Results
94Forest Cover Results
Experiment Results
DEMminer DEMminer-Ex (without Gini) DEMminer-Ex OW
AUC 0.97 0.99 0.97 0.74
95NASA Dataset
Experiment Results
96NASA Dataset
Experiment Results
Deviation Info Gain FAE
AUC 0.996 0.967 0.876
97KDD Results
Experiment Results
98KDD Results
Experiment Results
DEMminer O-F
AUC 0.98 0.96
99Result Summary
Experiment Results
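Dataset | Method | ERR | Mnew | Fnew | AUC | FP | FN
Twitter | DEMminer | 4.2 | 30.5 | 0.8 | 0.877 | - | -
Twitter | Lossy-F | 32.5 | 0.0 | 32.6 | 0.834 | - | -
Twitter | Lossy-L | 1.6 | 82.0 | 0.0 | 0.764 | - | -
Twitter | O-F | 3.4 | 96.7 | 1.6 | 0.557 | - | -
ASRS | DEMminer | 0.02 | - | - | 0.996 | 0.00 | 0.1
ASRS | DEMminer (info-gain) | 1.4 | - | - | 0.967 | 0.04 | 10.3
ASRS | O-F | 3.4 | - | - | 0.876 | 0.00 | 24.7
Forest Cover | DEMminer | 3.6 | 8.4 | 1.3 | 0.973 | - | -
Forest Cover | O-F | 5.9 | 20.6 | 1.1 | 0.743 | - | -
KDD | DEMminer | 1.2 | 5.9 | 0.9 | 0.986 | - | -
KDD | O-F | 4.7 | 9.6 | 4.4 | 0.967 | - | -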
100Result Summary
Experiment Results
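Dataset | Method | ERR | Mnew | Fnew | AUC
Twitter | DEMminer | 4.2 | 30.5 | 0.8 | 0.877
Twitter | DEMminer-Ex | 1.8 | 0.7 | 0.6 | 0.944
Twitter | OW | 3.4 | 96.7 | 1.6 | 0.557
Forest Cover | DEMminer | 3.6 | 8.4 | 1.3 | 0.974
Forest Cover | DEMminer-Ex | 3.1 | 4.0 | 0.68 | 0.990
Forest Cover | OW | 5.9 | 20.6 | 1.1 | 0.743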
101Running Time Comparison
Experiment Results
Dataset | Time (sec)/1K: DEMminer | Lossy-F | O-F | Points/sec: DEMminer | Lossy-F | O-F | Speed gain (DEMminer over O-F)
Twitter | 23 | 3.5 | 66.7 | 43 | 289 | 15 | 2.9
ASRS | 21 | 4.3 | 38.5 | 47 | 233 | 26 | 1.8
Forest Cover | 1.0 | 1.0 | 4.7 | 967 | 1003 | 212 | 4.7
KDD | 1.2 | 1.2 | 3.3 | 858 | 812 | 334 | 2.5
102Multi Novel Detection Results
Experiment Results
103Multi Novel Detection Results
Experiment Results
104Conclusion
Experiment Results
- Our data stream classification technique addresses
- Infinite length
- Concept-drift
- Concept-evolution
- Feature-evolution
- Existing approaches only address the first two issues
- Applicable to many domains such as
- Intrusion/malware detection
- Text categorization
- Fault detection etc.
105References
- J. Gehrke, V. Ganti, R. Ramakrishnan, and W. Loh. BOAT-Optimistic Decision Tree Construction. In Proc. SIGMOD, 1999.
- P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. SIGKDD, pages 71-80, 2000.
- B. Wenerstrom and C. Giraud-Carrier. Temporal data mining in dynamic feature spaces. In Perner, P. (ed.) ICDM 2006. LNCS (LNAI), vol. 4065, pp. 1141-1145. Springer, Heidelberg (2006).
- E. J. Spinosa, A. P. de Leon F. de Carvalho, and J. Gama. Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In Proc. 2008 ACM Symposium on Applied Computing, pages 976-980, 2008.
- M. Scholz and R. Klinkenberg. An ensemble classifier for drifting concepts. In Proc. ICML/PKDD Workshop in Knowledge Discovery in Data Streams, 2005.
106References (contd.)
- Brutlag, J. (2000). Aberrant behavior detection in time series for network monitoring. In Proc. Usenix Fourteenth System Admin. Conf. LISA XIV, New Orleans, LA (Dec 2000).
- Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. Applications of Data Mining in Computer Security, Kluwer (2002).
- Fan, W. Systematic data selection to mine concept-drifting data streams. In Proc. KDD 04.
- Gao, J., Wei Fan, and Jiawei Han (2007a). On Appropriate Assumptions to Mine Data Streams.
- Gao, J., Wei Fan, Jiawei Han, Philip S. Yu (2007b). A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions. SDM 2007.
- Goebel, J. and T. Holz. Rishi: Identify bot contaminated hosts by IRC nickname evaluation. In Usenix/Hotbots 07 Workshop, 2007.
- Grizzard, J. B., V. Sharma, C. Nunnery, B. B. Kang, and D. Dagon (2007). Peer-to-peer botnets: Overview and case study. In Usenix/Hotbots 07 Workshop.
107References (contd.)
- Keogh, E. J., and Pazzani, M. J. (2000). Scaling up dynamic time warping for data mining applications. In ACM SIGKDD (2000).
- Lemos, R. (2006). Bot software looks to improve peerage. SecurityFocus. http://www.securityfocus.com/news/11390 (2006).
- Livadas, C., B. Walsh, D. Lapsley, and T. Strayer (2006). Using machine learning techniques to identify botnet traffic. In 2nd IEEE LCN Workshop on Network Security (WoNS2006), November 2006.
- LURHQ Threat Intelligence Group (2004). Sinit p2p trojan analysis. http://www.lurhq.com/sinit.html (2004).
- Rajab, M. A., J. Zarfoss, F. Monrose, and A. Terzis (2006). A multifaceted approach to understanding the botnet phenomenon. In Proceedings of the 6th ACM SIGCOMM on Internet Measurement Conference (IMC), 2006.
- Kagan Tumer and Joydeep Ghosh (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3-4):385-403.
108References (contd.)
- Mohammad Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. A Multi-Partition Multi-Chunk Ensemble Technique to Classify Concept-Drifting Data Streams. In Proc. of 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-09), pages 363-375, Bangkok, Thailand, April 2009.
- Mohammad Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data. In Proc. of 2008 IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, pages 929-934, December 2008.
- Clay Woolam, Mohammad Masud, and Latifur Khan. Lacking Labels In The Stream: Classifying Evolving Stream Data With Few Labels. In Proc. of 18th International Symposium on Methodologies for Intelligent Systems (ISMIS), pages 552-562, September 2009, Prague, Czech Republic.
109References (contd.)
- Mohammad Masud, Qing Chen, Latifur Khan, Charu Aggarwal, Jing Gao, Jiawei Han, and Bhavani Thuraisingham. Addressing Concept-Evolution in Concept-Drifting Data Streams. In Proc. of 2010 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, Dec 2010.
- Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space. In Proc. of European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Springer 2010, ISBN 978-3-642-15882-7, pages 337-352.
- Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection in Data Streams with Active Mining. In Proc. of 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 21-24 June, 2010, pages 311-324, Hyderabad, India.
110References (contd.)
- Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints. IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Computer Society, June 2011, Vol. 23, No. 6, pages 859-874.
- Charu C. Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu. A Framework for Clustering Evolving Data Streams. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 03), Volume 29.
- H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 226-235, Washington, DC, USA, Aug 2003. ACM.
- Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham. Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams. In Proceedings of 2009 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD '09), Bled, Slovenia, 7-11 Sept, 2009.