1
CSE 881: Data Mining
  • Lecture 22: Anomaly Detection

2
Anomaly/Outlier Detection
  • What are anomalies/outliers?
  • Data points whose characteristics are
    considerably different from the remainder of the
    data
  • Applications
  • Credit card fraud detection
  • Telecommunication fraud detection
  • Network intrusion detection
  • Fault detection

3
Examples of Anomalies
  • Data from different classes
  • An object may be different from other objects
    because it is of a different type or class
  • Natural (random) variation in data
  • Many data sets can be modeled by statistical
    distributions (e.g., Gaussian distribution)
  • Probability of an object decreases rapidly as its
    distance from the center of the distribution
    increases
  • Chebyshev inequality
  • Data measurement or collection errors

4
Importance of Anomaly Detection
  • Ozone Depletion History
  • In 1985, three researchers (Farman, Gardiner, and
    Shanklin) were puzzled by data gathered by the
    British Antarctic Survey showing that ozone
    levels for Antarctica had dropped 10% below
    normal levels
  • Why did the Nimbus 7 satellite, which had
    instruments aboard for recording ozone levels,
    not record similarly low ozone concentrations?
  • The ozone concentrations recorded by the
    satellite were so low they were being treated as
    outliers by a computer program and discarded!

Sources: http://exploringdata.cqu.edu.au/ozone.html
http://www.epa.gov/ozone/science/hole/size.html
5
Anomalies
  • General characteristics
  • Rare occurrence
  • Deviant behavior compared to the majority of the
    data
  • Distribution
  • Natural (random) variation
  • anomalies tend to be spread uniformly
  • Data from different classes
  • anomalies may form clusters

6
Anomaly Detection
  • Challenges
  • Method is (mostly) unsupervised
  • Validation can be quite challenging (just like
    for clustering)
  • Small number of anomalies
  • Finding a needle in a haystack

7
Anomaly Detection Schemes
  • General Steps
  • Build a profile of the normal behavior
  • Profile can be patterns or summary statistics for
    the normal population
  • Use the normal profile to detect anomalies
  • Anomalies are observations whose characteristics
    differ significantly from the normal profile
  • Types of anomaly detection schemes
  • Graphical & Statistical-based
  • Distance-based

8
Graphical Approaches
  • Boxplot (1-D), Scatter plot (2-D), Spin plot
    (3-D)
  • Limitations
  • Time consuming
  • Subjective

9
Convex Hull Method
  • Extreme points are assumed to be outliers
  • Use the convex hull method to detect extreme
    values (see the sketch below)
  • What if the outlier occurs in the middle of the
    data?
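A minimal sketch of this idea, assuming SciPy is available; the 2-D data set is illustrative. Points that are vertices of the convex hull are flagged as extreme-value candidates.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))           # hypothetical 2-D data set

hull = ConvexHull(X)
extreme = np.zeros(len(X), dtype=bool)
extreme[hull.vertices] = True           # hull vertices = extreme points

print("candidate outliers:", np.flatnonzero(extreme))
```

As the slide warns, a point in the middle of the data is never a hull vertex, so this method misses interior outliers.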

10
Statistical Approaches
  • Assume a parametric model describing the
    distribution of the data (e.g., normal
    distribution)
  • Apply a statistical test that depends on
  • Data distribution
  • Parameters of the distribution (e.g., mean, variance)
  • Number of expected outliers (confidence limit)

11
Grubbs' Test
  • Detects outliers in univariate data
  • Assumes the data come from a normal distribution
  • Detects one outlier at a time; remove the
    outlier, then repeat
  • H0: There is no outlier in the data
  • HA: There is at least one outlier
  • Grubbs' test statistic:
    G = max_i |X_i − X̄| / s
  • Reject H0 if
    G > ((N − 1)/√N) · √( t²(α/(2N), N−2) / (N − 2 + t²(α/(2N), N−2)) )
    where t(α/(2N), N−2) is the critical value of the
    t-distribution with N − 2 degrees of freedom
    (see the sketch below)
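A minimal sketch of one round of the two-sided test, assuming SciPy; the data values are illustrative.

```python
import numpy as np
from scipy import stats

def grubbs_outlier(x, alpha=0.05):
    """Return the index of the most extreme point if Grubbs' test rejects H0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.abs(x - x.mean()) / x.std(ddof=1)       # studentized deviations
    g, idx = z.max(), z.argmax()                   # Grubbs statistic G
    # Critical value from the t distribution with N-2 dof at alpha/(2N)
    t2 = stats.t.ppf(1 - alpha / (2 * n), n - 2) ** 2
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t2 / (n - 2 + t2))
    return idx if g > g_crit else None

print(grubbs_outlier([5.1, 4.9, 5.0, 5.2, 9.7]))   # prints 4, the index of 9.7
```

To detect multiple outliers, call this repeatedly, removing the flagged point each time, as the slide describes.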

12
Statistical-based Likelihood Approach
  • Assume the data set D consists of samples from a
    mixture of two probability distributions
  • M (majority distribution)
  • A (anomalous distribution)
  • General Approach
  • Initially, assume all the data points belong to M
  • Let L_t(D) be the log likelihood of D at time t
  • Choose a point x_t that belongs to M and move it
    to A
  • Let L_{t+1}(D) be the new log likelihood
  • Compute the difference, Δ = L_t(D) − L_{t+1}(D)
  • If Δ > c (some threshold), then x_t is declared
    an anomaly and is moved permanently from M to A

13
Statistical-based Likelihood Approach
  • Data distribution: D = (1 − λ) M + λ A
  • M is a probability distribution estimated from
    the data
  • Can be based on any modeling method (naïve Bayes,
    maximum entropy, etc.)
  • A is often assumed to be a uniform distribution
  • Likelihood at time t:
    L_t(D) = (1 − λ)^|M_t| ∏_{x∈M_t} P_M(x) × λ^|A_t| ∏_{x∈A_t} P_A(x)
    (see the sketch below)
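A minimal sketch of the scheme above, assuming a Gaussian majority distribution M (re-estimated without the candidate point) and a uniform anomalous distribution A; λ, the threshold c, and the data are illustrative. A point is flagged when moving it to A raises the total log likelihood by more than c, i.e., the log-likelihood difference exceeds the threshold.

```python
import numpy as np
from scipy import stats

def likelihood_anomalies(x, lam=0.05, c=2.0):
    x = np.asarray(x, dtype=float)
    p_A = 1.0 / (x.max() - x.min())          # uniform density for A
    flagged = []
    for i in range(len(x)):
        rest = np.delete(x, i)               # M re-estimated without x_i
        mu, sd = rest.mean(), rest.std(ddof=1)
        ll_in_M = np.log(1 - lam) + stats.norm.logpdf(x[i], mu, sd)
        ll_in_A = np.log(lam) + np.log(p_A)
        if ll_in_A - ll_in_M > c:            # the move raises the log likelihood
            flagged.append(i)
    return flagged

data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 12.0]
print(likelihood_anomalies(data))            # flags index 6 (the value 12.0)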

14
Limitations of Statistical Approaches
  • Most of the tests are for a single attribute
  • In many cases, the data distribution may not be
    known
  • For high dimensional data, it may be difficult to
    estimate the true distribution

15
Distance-based Approaches
  • Data is represented as a vector of features
  • Three approaches
  • Nearest-neighbor based
  • Density based
  • Clustering based

16
Nearest-Neighbor Based Approach
  • Approach
  • Compute the distance between every pair of data
    points
  • There are various ways to define outliers
  • Data points with fewer than p points within a
    neighborhood of radius D
  • Data points whose distance to the k-th nearest
    neighbor is among the highest (see the sketch below)
  • Data points whose average distance to the k
    nearest neighbors is among the highest
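A minimal sketch of the k-th-nearest-neighbor distance score, assuming scikit-learn; k and the data are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),      # normal points
               [[6.0, 6.0]]])                   # one injected outlier

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X) # +1: each point is its own neighbor
dist, _ = nn.kneighbors(X)
score = dist[:, -1]                             # distance to the k-th neighbor
print("top outlier:", np.argmax(score))         # prints 100, the injected point
```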

17
Outliers in Lower Dimensional Projection
  • In high-dimensional space, data is sparse and
    notion of proximity becomes meaningless
  • Every point is an almost equally good outlier
    from the perspective of proximity-based
    definitions
  • Lower-dimensional projection methods
  • A point is an outlier if in some lower
    dimensional projection, it is present in a local
    region of abnormally low density

18
Outliers in Lower Dimensional Projection
  • Divide each attribute into φ equal-depth
    intervals
  • Each interval contains a fraction f = 1/φ of the
    records
  • Consider a k-dimensional cube created by picking
    grid ranges from k different dimensions
  • If attributes are independent, we expect the region
    to contain a fraction f^k of the records
  • If there are N points, we can measure the sparsity
    of a cube D as
    S(D) = (n(D) − N·f^k) / √(N·f^k·(1 − f^k))
    where n(D) is the number of points in the cube
  • Negative sparsity indicates the cube contains fewer
    points than expected
19
Example
  • N = 100, φ = 5, f = 1/5 = 0.2, expected count N·f² = 4
  • A 2-D cube containing no points then has sparsity
    S = (0 − 4) / √(100 × 0.04 × 0.96) ≈ −2.04

20
Density-based LOF approach
  • For each point, compute the density of its local
    neighborhood
  • Compute the local outlier factor (LOF) of a sample p
    as the average ratio of the density of its nearest
    neighbors to the density of p itself
  • Outliers are points with the largest LOF values

In the NN approach, p2 is not considered an
outlier, while the LOF approach finds both p1 and
p2 to be outliers (see the sketch below)
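A minimal sketch using scikit-learn's LocalOutlierFactor, mirroring the p1/p2 example: p1 is globally far away, p2 is just outside a dense cluster; the data and parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
dense = rng.normal((0, 0), 0.1, (100, 2))      # tight cluster
sparse = rng.normal((5, 5), 1.0, (100, 2))     # loose cluster
X = np.vstack([dense, sparse,
               [[10.0, 0.0],                   # p1: far from everything
                [0.5, 0.5]]])                  # p2: near, but outside, the tight cluster

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_         # higher = more outlying
print("top-2 LOF points:", np.argsort(scores)[-2:])  # expect 200 and 201 (order may vary)
```

Both injected points should rank highest, showing how LOF catches p2 even though a global NN-distance score would not.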
21
Clustering-Based
  • Basic idea
  • Cluster the data into groups of different density
  • Choose points in small clusters as candidate
    outliers
  • Compute the distance between candidate points and
    non-candidate clusters
  • If candidate points are far from all other
    non-candidate points, they are outliers
    (see the sketch below)
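A minimal sketch of this idea, assuming scikit-learn's KMeans; the cluster count, the "small cluster" cutoff, and the data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2)),
               [[10.0, 10.0], [10.5, 9.5]]])     # tiny far-away group

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_)
small = np.flatnonzero(sizes < 5)                # "small cluster" candidates
candidates = np.flatnonzero(np.isin(km.labels_, small))

# Distance from each candidate to the nearest large-cluster centroid
large_centers = km.cluster_centers_[sizes >= 5]
d = np.linalg.norm(X[candidates, None, :] - large_centers[None, :, :], axis=2).min(axis=1)
print(dict(zip(candidates.tolist(), d.round(2))))  # large distances => outliers
```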

22
One-Class SVM
  • Based on support vector clustering
  • An extension of the SVM approach to clustering
  • Two key ideas in SVM
  • It uses the maximal margin principle to find the
    linear separating hyperplane
  • For nonlinearly separable data, it uses a kernel
    function to project the data into higher
    dimensional space

23
Support Vector Machine (Idea 1)
  • Maximal margin principle

24
Support Vector Machine (Idea 2)
Original Space
High-dimensional Feature Space
25
Support Vector Clustering
What is the corresponding maximum margin
principle?
Original Space
26
Support Vector Clustering
  • In SVM
  • Start with the simplest case first, then make the
    problem more complex
  • Simplest case: linearly separable data
  • Apply same idea to clustering
  • What is the simplest case?
  • All the points belong to a single cluster
  • The cluster is globular (spherical)

27
Support Vector Clustering
  • SVM: choose the hyperplane with the largest margin
  • SVC: choose the sphere with the smallest radius
28
Support Vector Clustering
  • Let R be the radius of the sphere
  • Goal is to minimize R²
  • subject to ||x_i − a||² ≤ R²  for all i
  • where a is the center of the sphere
29
Support Vector Clustering
  • Objective function (Lagrangian):
    L = R² − Σ_i β_i (R² − ||x_i − a||²)
  • where the β_i's are the Lagrange multipliers
  • Subject to
  • β_i ≥ 0

30
Support Vector Clustering
  • Objective function (dual form):
    W = Σ_j β_j (x_j · x_j) − Σ_{i,j} β_i β_j (x_i · x_j)
  • Find the β_i's that maximize the expression s.t.
    Σ_i β_i = 1 and β_i ≥ 0

31
Support Vector Clustering
  • Since β_i (R² − ||x_i − a||²) = 0 at the optimum
    (complementary slackness):
  • If x_i is located in the interior of the sphere,
    then β_i = 0
  • If x_i is located on the surface of the sphere,
    then β_i > 0
  • Support vectors are the data points located on
    the cluster boundary

32
Outliers
  • Outliers are considered data points located
    outside the sphere
  • Let ξ_i ≥ 0 be the error (slack) for x_i
  • Goal is to minimize R² + C Σ_i ξ_i
  • subject to ||x_i − a||² ≤ R² + ξ_i,  ξ_i ≥ 0
33
Outliers
  • Lagrangian:
    L = R² + C Σ_i ξ_i − Σ_i β_i (R² + ξ_i − ||x_i − a||²) − Σ_i μ_i ξ_i
  • Subject to β_i ≥ 0, μ_i ≥ 0

34
Outliers
  • Dual form:
    W = Σ_j β_j (x_j · x_j) − Σ_{i,j} β_i β_j (x_i · x_j)
  • Same as the previous (no outlier) case, except the
    constraint becomes 0 ≤ β_i ≤ C, Σ_i β_i = 1

35
Outliers
  • Since C − β_i − μ_i = 0 and μ_i ξ_i = 0 at the
    optimum (KKT conditions):
  • If x_i is located in the interior of the sphere,
    then β_i = 0
  • If x_i is located on the surface of the sphere,
    then 0 < β_i < C
  • Such points are called the support vectors
  • If x_i is located outside of the sphere, then
    ξ_i > 0 and β_i = C
  • Such points are called the bounded support
    vectors

36
Irregular Shaped Clusters
  • What if the cluster has an irregular shape in the
    original space?
  • Instead of using a very large sphere, or a sphere
    with large errors (Σ_i ξ_i), project the data into
    a higher-dimensional space via a mapping Φ
    (kernel trick)
37
Irregular Shaped Clusters
  • Objective function (dual form):
    W = Σ_j β_j K(x_j, x_j) − Σ_{i,j} β_i β_j K(x_i, x_j)
  • Kernel trick
  • Use a kernel function K(x_i, x_j) in place of
    Φ(x_i) · Φ(x_j)
  • Typical kernel function
  • Gaussian: K(x_i, x_j) = exp(−q ||x_i − x_j||²)
    (see the sketch below)
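For practical use, a minimal sketch with scikit-learn's OneClassSVM; note this is Schölkopf's ν-formulation, which with a Gaussian (RBF) kernel is equivalent to the minimal-enclosing-sphere formulation above. The values of nu and gamma are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (200, 2)),
               [[6.0, 6.0], [-5.0, 7.0]]])       # two injected outliers

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X)
labels = clf.predict(X)                          # +1 = inside, -1 = outside
print("flagged:", np.flatnonzero(labels == -1))  # ~5% of points, incl. 200 and 201
```

Here nu upper-bounds the fraction of bounded support vectors, i.e., the fraction of points allowed outside the boundary.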

38
References
  • "Support Vector Clustering" by Ben-Hur, Horn,
    Siegelmann, and Vapnik (Journal of Machine
    Learning Research, 2001)
  • http://citeseer.ist.psu.edu/hur01support.html
  • "Cone Cluster Labeling for Support Vector
    Clustering" by Lee and Daniels (in Proc. of SIAM
    Int'l Conf. on Data Mining, 2006)
  • http://www.siam.org/meetings/sdm06/proceedings/046lees.pdf

39
Graph-based Method
  • Represent the data as a graph
  • Objects ? nodes
  • Similarity ? edges
  • Apply graph-based method to determine outliers

40
Graph-based Method
Find the most outlying node in the graph →
the opposite of finding the most central node
41
Graph-based Method
  • Many measures of node centrality (see the sketch
    below)
  • Degree: the number of edges incident on the node
  • Closeness:
    C_c(u) = (N − 1) / Σ_n d(u, n)
  • where d(u,n) is the geodesic distance between u
    and n
  • Geodesic distance is the shortest-path distance
  • Betweenness:
    C_b(n) = Σ_{j<k} g_jk(n) / g_jk
  • where g_jk(n) is the number of geodesic paths
    from j to k that pass through n, and g_jk is the
    total number of geodesic paths from j to k
  • Random walk method
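A minimal sketch of the first three measures, assuming the networkx library; the toy graph is illustrative.

```python
import networkx as nx

# Toy graph: nodes 1-3 form a triangle, node 5 hangs off a chain
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])

print(nx.degree_centrality(G))       # degree
print(nx.closeness_centrality(G))    # closeness (geodesic-based)
print(nx.betweenness_centrality(G))  # betweenness
# The node scoring lowest on these measures (node 5) is the most "outlying"
```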

42
Random Walk Method
  • Random walk model
  • Randomly pick a starting node, s
  • Randomly choose a neighboring node linked to s.
    Set current node s to be the neighboring node.
  • Repeat step 2
  • Compute the probability that you will reach a
    particular node in the graph
  • The higher the probability, the more central
    the node is.

43
Random Walk Method
  • Goal: find the stationary distribution c
  • Vector c represents a probability value for each
    object
  • Initially, set c(i) = 1/N (for all i = 1, …, N)
  • Let S be the adjacency matrix of the graph
  • Normalize the rows so that S(i,j) becomes a
    transition probability
  • Iteratively compute c ← Sᵀ c
  • until c converges to a stationary distribution
  • To ensure convergence, use a damping (restart)
    factor d:  c ← (d/N) 1 + (1 − d) Sᵀ c
    (see the sketch below)
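A minimal power-iteration sketch, assuming the restart convention c ← (d/N)·1 + (1 − d)·Sᵀc given above; the adjacency matrix is illustrative.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # last node hangs off a triangle

S = A / A.sum(axis=1, keepdims=True)           # row-normalized transition matrix
N, d = len(A), 0.1                             # d = damping (restart) factor
c = np.full(N, 1.0 / N)                        # start from the uniform vector

for _ in range(200):                           # iterate until convergence
    c_new = d / N + (1 - d) * S.T @ c
    if np.allclose(c_new, c, atol=1e-10):
        break
    c = c_new

print(c)   # the pendant node gets the lowest stationary probability
```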

44
Random Walk Method
  • Applications
  • Web search (PageRank algorithm used by Google)
  • Text summarization
  • Keyword extraction

45
Random Walk for Anomaly Detection
  • Assess the centrality or importance of individual
    objects

[Figure: for closely related data (e.g., documents
returned by PageRank), high-probability nodes are the
highly relevant pages; for data containing anomalies,
low-probability nodes are the anomalies]
46
Example
  • Sample dataset

    Object   Connectivity   Rank
       1        0.0835        2
       2        0.0764        1
       3        0.0930        5
       4        0.0922        4
       5        0.0914        3
       6        0.0940        9
       7        0.0936        7
       8        0.0930        6
       9        0.0942       10
      10        0.0942       11
      11        0.0939        8
    (lower connectivity = less central = more
    anomalous; rank 1 is the strongest anomaly)
  • Model parameter tuning
  • damping factor d = 0.1
  • Converged after 112 steps