Network Tomography and Anomaly Detection - PowerPoint PPT Presentation

Slides: 51
Provided by: markc48

Transcript and Presenter's Notes



1
Network Tomography and Anomaly Detection
  • Mark Coates
  • Tarem Ahmed

Network map from www.opte.org
2
Brain mapping (opening it up can disturb the
system)
3
Internet Boom
  • too complex to measure everywhere, all the time
  • traffic measurements expensive (hardware,
    bandwidth)

4
Brain Tomography
(Figure: counting projection, MRF model, Poisson model.)
5
Link-level Network Tomography
6
Link-level Network Tomography
Solely from edge-based traffic measurements,
infer internal
  • topology / connectivity
  • link-level loss probabilities and delay distributions
7
Application: Topology Discovery
  • Challenges
  • 12% never respond, 15% have multiple interfaces
    (Barford et al., 2000)
  • detect layer-2 topology invisible to the IP layer
    (e.g., switches)

8
Application: Overlay Voice-over-IP
  • Multiple paths to choose from
  • select paths with minimal delay or delay variance
  • Send a small number of critical packets (vocal
    transitions) along multiple paths
  • Use these packets to estimate the path delays
    (and the extent of path diversity)

(Figure: access networks, overlay links, service gateway, autonomous system(s).)
9
Network Monitoring
  • Challenges
  • Restricted measurement
  • High volumes and high rates of data (sampling of
    traffic on Gb/s routers)
  • High dimensional data (source/destination IP
    addresses, port numbers)
  • Goals
  • Supply networking protocols with relevant
    performance information.
  • Identify anomalous behaviour and operational
    transitions.
  • Provide network administrators with appropriate
    notification or visualization.

10
Outline
  • Inference about network performance based on
    passive measurements or active probing
  • Two components to the talk
  • Network tomography
  • Network anomaly detection
  • Focus on online, sequential approaches
  • Account for non-stationary behaviour
  • Don't repeat work that has already been done

11
Network Tomography: Likelihood Formulation
  • $A$: routing matrix (graph)
  • $\theta$: packet loss probabilities or queuing
    delays for each link
  • $y$: packet losses or delays measured at the edge
  • $\varepsilon$: randomness inherent in traffic
    measurements

Statistical likelihood function: $p(y \mid A, \theta)$
12
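Classical Problem
Solve the linear system $y = A\theta + \varepsilon$
Interesting if $A$, $\theta$, or $\varepsilon$ have special structures
Maximize the likelihood function $\hat{\theta} = \arg\max_{\theta} p(y \mid A, \theta)$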
or
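The following minimal sketch shows why the structure of $A$ matters. It assumes a hypothetical three-link tree, with one shared link and two receiver links, and noise-free mean delays. There are only two end-to-end paths for three unknown link delays, so $A$ is rank-deficient, and an exact least-squares fit of $y = A\theta$ need not recover $\theta$. The link values are illustrative only.

```python
import numpy as np

# Hypothetical tree (assumption): link 0 is shared (sender -> branch point),
# link 1 leads to receiver 1, link 2 leads to receiver 2.
# Row i of the routing matrix A marks the links on the path sender -> receiver i.
A = np.array([
    [1, 1, 0],  # path to receiver 1 uses links 0 and 1
    [1, 0, 1],  # path to receiver 2 uses links 0 and 2
])

theta_true = np.array([4.0, 2.0, 3.0])  # mean delay per link (illustrative values)
y = A @ theta_true                      # noise-free end-to-end mean delays

rank = np.linalg.matrix_rank(A)
# Minimum-norm least-squares solution: it reproduces y exactly, but A has a
# nontrivial null space, so it is generally not theta_true.
theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print("rank(A) =", rank, "unknowns =", A.shape[1])
print("reproduces y:", np.allclose(A @ theta_ls, y))
print("recovers theta:", np.allclose(theta_ls, theta_true))
```

Extra measurement structure, such as the packet pairs on the following slides, is one way to resolve this ambiguity.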
13
Network Tomography: The Basic Idea
(Figure: sender and receivers.)
14
Network Tomography: The Basic Idea
(Figure: sender and receivers.)
15
Packet-pair measurements
(Figure: measurement packet pair, cross-traffic, delay.)
Packet(1) and packet(2) experience (nearly) identical losses and/or delays
on shared links.
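A small simulation illustrates the packet-pair idea. Its assumptions do not come from the talk: link delays are independent and exponential, on the same hypothetical three-link tree, with packet(1) sent to receiver 1 and packet(2) to receiver 2. Both packets see the same shared-link delay, so the covariance of their end-to-end delays estimates the variance of the shared link's delay.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 200_000

# Assumed link-delay model: independent exponential delays per link.
shared = rng.exponential(scale=2.0, size=n_pairs)  # shared link, variance 4
leaf1 = rng.exponential(scale=1.0, size=n_pairs)   # link to receiver 1, variance 1
leaf2 = rng.exponential(scale=1.0, size=n_pairs)   # link to receiver 2, variance 1

# Packet(1) reaches receiver 1 and packet(2) reaches receiver 2; both
# experience the same delay on the shared link because they travel back to back.
y1 = shared + leaf1
y2 = shared + leaf2

# With independent links, Cov(y1, y2) = Var(shared).
cov_est = np.cov(y1, y2)[0, 1]
print(f"estimated shared-link variance: {cov_est:.3f} (true value 4.0)")
```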
16
Modelling time-variations
(Figure: cross-traffic.)
  • Nonstationary cross-traffic induces
    time-variation
  • Directly model the dynamics (but not the
    traffic!)
  • Goal is to perform online tracking and prediction
    of network link characteristics

17
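Non-stationary behaviour
Introduce time-dependence in the parameters: $\theta \rightarrow \theta_t$
Filtering exercise (track $\theta_t$)
(1) Describe the dynamic behaviour of $\theta_t$
(2) Form the estimate $\hat{\theta}_t = E[\theta_t \mid y_1, \ldots, y_t]$
(MMSE)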

18
Particle Filtering
19
Delay Distribution Tracking
  • Time-varying delay distribution, estimated over a
    window of size R at time m
  • In each window, R probe measurements.
  • Form estimates of average delay and jitter over
    short time intervals
(Figure: delay distribution, in delay units, over time.)
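The following minimal sketch summarizes one window of R probe measurements, assuming delays are already quantized to integer delay units on $[0, \text{max\_del}]$. It computes the empirical delay distribution, the average delay, and a jitter estimate. The talk does not define jitter, so the function `window_delay_stats` (a hypothetical name) takes jitter to be the standard deviation of the window's delays.

```python
import numpy as np

def window_delay_stats(delays, max_del):
    """Summarize one window of R probe delays (integer delay units).

    Returns the empirical distribution over {0, ..., max_del}, the average
    delay, and a jitter estimate. Jitter is assumed here to be the standard
    deviation of the delays; other definitions exist.
    """
    d = np.clip(np.asarray(delays, dtype=int), 0, max_del)  # R = d.size probes
    hist = np.bincount(d, minlength=max_del + 1) / d.size
    return hist, d.mean(), d.std()

hist, mean, jitter = window_delay_stats([1, 2, 2, 3], max_del=4)
print(mean)  # 2.0
```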
20
Dynamic Model
  • Queue/traffic model
  • reflected random walk on
    [0, max_del]
(Figure: probability vs. delay units.)
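The following minimal sketch simulates the dynamic model. It assumes integer delay units, unit steps up or down, and reflection at the boundaries, so a step to $-1$ becomes $1$ and a step to $\text{max\_del}+1$ becomes $\text{max\_del}-1$. The talk's exact step distribution is not given here, and `p_up` is a hypothetical parameter.

```python
import numpy as np

def reflected_random_walk(n_steps, max_del, start=0, p_up=0.5, seed=0):
    """Simulate a delay (integer delay units) as a reflected random walk on [0, max_del].

    Each step moves the delay up or down by one unit (p_up is an assumed
    parameter); a move that would leave [0, max_del] is reflected back inside.
    """
    if max_del < 1:
        raise ValueError("max_del must be at least 1")
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps, dtype=int)
    x = start
    for t in range(n_steps):
        x += 1 if rng.random() < p_up else -1
        if x < 0:
            x = -x                # reflect at the lower boundary: -1 -> 1
        elif x > max_del:
            x = 2 * max_del - x   # reflect at the upper boundary: max_del+1 -> max_del-1
        path[t] = x
    return path

path = reflected_random_walk(1000, max_del=10)
```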
21
Observations
  • Measurements
  • Observe

22
Estimation of Delay Distributions
  • Sequential Monte Carlo approximation to the
    posterior mean estimate
(Figure: message-passing algorithm, particle weights.)
  • Estimate of time-varying delay distribution
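As an illustration of the sequential Monte Carlo idea, the following sketch is a generic bootstrap particle filter for a single link's delay $\theta_t$. It makes two simplifying assumptions that are not the talk's method. First, $\theta_t$ moves by $-1$, $0$ or $+1$ delay units and is reflected on $[0, \text{max\_del}]$. Second, each observation is $y_t = \theta_t$ plus Gaussian noise. The talk's multi-link edge observations and message-passing weight computation are not reproduced. The filter returns the posterior-mean (MMSE) estimate and the weighted delay distribution at each step.

```python
import numpy as np

def reflect(x, max_del):
    """Reflect integer delays that stepped one unit outside [0, max_del] back inside."""
    x = np.where(x < 0, -x, x)
    return np.where(x > max_del, 2 * max_del - x, x)

def bootstrap_pf(ys, max_del, n_particles=500, sigma=1.0, seed=0):
    """Bootstrap particle filter for one link's delay theta_t (assumed model:
    reflected +/-1/0 random walk; y_t = theta_t + N(0, sigma^2))."""
    rng = np.random.default_rng(seed)
    particles = rng.integers(0, max_del + 1, size=n_particles)  # uniform prior
    means, dists = [], []
    for y in ys:
        # 1) Propagate each particle through the dynamic model.
        particles = reflect(particles + rng.integers(-1, 2, size=n_particles), max_del)
        # 2) Weight by the observation likelihood (log domain for stability).
        logw = -0.5 * ((y - particles) / sigma) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # 3) Posterior-mean estimate and weighted delay distribution.
        means.append(np.dot(w, particles))
        dists.append(np.bincount(particles, weights=w, minlength=max_del + 1))
        # 4) Multinomial resampling.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means), np.array(dists)

# Illustrative data: a step change in the link delay, observed with unit noise.
rng = np.random.default_rng(1)
truth = np.r_[np.full(100, 3), np.full(100, 8)]
ys = truth + rng.normal(0.0, 1.0, size=truth.size)
means, dists = bootstrap_pf(ys, max_del=15)
```

The per-step cost grows with the number of particles and the number of delay units, which is one reason for the complexity concerns on the next slide.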

23
Analysis
  • Complexity per measurement depends on the number
    of particles, the average number of unique links,
    and the max. delay units per link.
  • Convergence analysis of [Crisan, Doucet 01; Le
    Gland, Oudjane 02] applies.
  • The approximation to the posterior mean estimate
    converges to the true estimate as $N \to \infty$

24
Simulation Results (ns-2)
(Figure: true vs. tracked delay distributions; mean delay over time.)
25
Comments
  • Dynamic models allow us to account for
    non-stationarity
  • but realistic models are hard to derive and
    incorporate
  • Particle filtering is only appropriate when
    analytical techniques fail
  • non-Gaussian or non-linear dynamics or
    observations
  • Sequential structure allows on-line
    implementation
  • Care must be taken to reduce computation at each
    step

26
Network Anomaly Detection
  • In tomography, a primary challenge is the
    restriction on available measurements.
  • In anomaly detection, a primary challenge is the
    abundance of measurements.
  • How can we process data at a sufficient rate?
  • How should we extract relevant information?

27
NetFlow Data
  • Records of flows.
  • A flow is defined by (source IP, dest. IP,
    source port, dest. port)
  • Packets are sampled at configurable rates.
  • Exported at 1-minute or 5-minute intervals.

28
Dataset: Abilene Network
(Figure: Abilene Weathermap, Indiana University.)
Thanks to Rick Summerhill and Mark Fullmer at
Abilene for providing access to the data.
29
Principal Component Analysis (PCA)
  • Goal: identify a low-dimensional subspace that
    captures the key components of the feature set
  • Idea: if (most of) a measurement does not lie in
    this subspace, then it is anomalous
  • PCA
  • Conduct a linear transformation to choose a new
    coordinate system
  • Projection onto the first principal component has
    greater variance than any other projection
    (maximum energy).
  • Subsequent principal components capture the
    greatest remaining energy
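The following minimal sketch computes PCA through the SVD of the centred data. The right singular vectors are the principal directions. The singular values come back in decreasing order, so the first component has maximum variance and each later one captures the greatest remaining energy. The synthetic data, 3-D points varying mostly along one direction, are an assumption for illustration.

```python
import numpy as np

def pca(X):
    """PCA of the rows of X (n samples x p features) via SVD of the centred data.

    Returns principal directions (columns), per-component variance in
    decreasing order, and each component's fraction of the total energy.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)
    return Vt.T, var, var / var.sum()

rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = t @ np.array([[3.0, 2.0, 1.0]]) + 0.1 * rng.normal(size=(500, 3))
V, var, frac = pca(X)
print(np.round(frac, 3))
```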

30
PCA (2)
  • Reduce dimensionality by eliminating principal
    components that do not contribute significantly
    to variance in the dataset (small singular value)
  • Not optimized for class separability (unlike
    linear discriminant analysis)
  • Minimizes reconstruction error under L2 norm.

31
Eigenflow Analysis
  • Lakhina et al. (2004a, 2004b).
  • PCA analysis of Origin-Destination (OD) Flows
  • Eigenflow: set of flows mapped onto a single
    principal component
  • Intrinsic dimensionality: empirical studies of the
    Sprint and Abilene networks indicated that 5-10
    principal components sufficed to capture most of
    the energy.

32
PCA-based Anomaly Detection
  • Perform PCA on a block of OD flow measurements
  • Project each measurement onto the primary principal
    components
  • Test whether the residual energy exceeds a
    threshold.
  • Squared prediction error (SPE, or Q-statistic)
    used to test for unusual flow types.
  • Prone to Type-I errors (false positives) when
    applied to transient operations.
  • In these cases, the assumption that the source
    data are normally distributed is violated.
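The following minimal sketch computes the SPE for block-based PCA detection. The top-$k$ principal components span the "normal" subspace, and each measurement's SPE is the energy of its residual after projection onto that subspace. The synthetic rank-2 data and the injected anomaly are illustrative. The crude threshold of 10× the median SPE is a hypothetical stand-in for the analytic Q-statistic threshold.

```python
import numpy as np

def spe_scores(X, k):
    """Squared prediction error (Q-statistic) of each row of X w.r.t. the top-k PCs."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                    # p x k basis of the normal subspace
    residual = Xc - Xc @ P @ P.T    # part of each row outside the normal subspace
    return np.sum(residual**2, axis=1)

rng = np.random.default_rng(0)
n, p, k = 300, 10, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))
X[150, 4] += 5.0                            # injected anomaly (illustrative)

scores = spe_scores(X, k)
threshold = 10 * np.median(scores)          # crude empirical threshold (assumption)
flagged = np.flatnonzero(scores > threshold)
print(flagged)
```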

33
Online Method
  • Don't need to relearn from scratch when new data
    arrive
  • Computational cost per time step should be
    bounded by a constant independent of time
  • Block-based PCA unattractive
  • Alternative method: Kernel Recursive Least
    Squares (KRLS)

34
KRLS
  • Represent the function as
    $f(x) = \sum_{i=1}^{t} \alpha_i k(x_i, x)$
  • where $x_i$ are training points
  • Desire a sparse solution (storage and time
    savings, better generalization ability)
  • Effective dimensionality of manifold spanned by
    training feature vectors may be much smaller than
    feature space dimension
  • Identify linearly independent feature vectors
    that approximately span this manifold.

35
KRLS
  • Sequentially sample a stream of input/output
    pairs
  • At time step t, assume we have collected a
    dictionary of samples $\mathcal{D}_{t-1} = \{\tilde{x}_j\}_{j=1}^{m_{t-1}}$
  • where by construction $\{\phi(\tilde{x}_j)\}_{j=1}^{m_{t-1}}$ are linearly
    independent feature vectors

36
KRLS
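  • We encounter a new sample $x_t$.
  • Test whether $\phi(x_t)$ is approximately linearly
    dependent on the dictionary feature vectors.
  • If not, add it to the dictionary.

$\delta_t = \min_{a} \left\| \sum_{j=1}^{m_{t-1}} a_j \phi(\tilde{x}_j) - \phi(x_t) \right\|^2 \le \nu$
($\nu$: threshold; $\sum_{j} a_j \phi(\tilde{x}_j)$: dictionary approximation)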
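The following minimal sketch builds the dictionary sequentially with the approximate linear dependence test. The minimizing coefficients are $a = K^{-1}k_t$, where $K$ is the dictionary kernel matrix and $k_t$ holds the kernel values between the dictionary and $x_t$. This gives $\delta_t = k(x_t, x_t) - k_t^\top K^{-1} k_t$. The code solves for $a$ from scratch at each step for clarity; a recursive update of $K^{-1}$ would be needed for the bounded per-step cost. The linear kernel and the function name `build_dictionary` are assumptions.

```python
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))

def build_dictionary(samples, nu, kernel=linear_kernel):
    """Sequential dictionary construction with the ALD test.

    delta_t = k(x_t, x_t) - k_t^T K^{-1} k_t is the squared distance from
    phi(x_t) to the span of the dictionary feature vectors; x_t is added
    only if delta_t > nu.
    """
    dictionary, deltas = [], []
    for x in samples:
        ktt = kernel(x, x)
        if dictionary:
            K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
            k_t = np.array([kernel(a, x) for a in dictionary])
            coeffs = np.linalg.solve(K, k_t)  # optimal a in the ALD minimization
            delta = ktt - k_t @ coeffs
        else:
            delta = ktt                       # empty dictionary: distance to the origin
        deltas.append(delta)
        if delta > nu:
            dictionary.append(x)
    return dictionary, deltas

samples = [np.array([1.0, 0.0]), np.array([2.0, 0.0]),
           np.array([0.0, 1.0]), np.array([1.0, 1.0])]
D, deltas = build_dictionary(samples, nu=1e-6)
print(len(D))  # 2: [2, 0] and [1, 1] lie in the span of the stored vectors
```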
37
KRLS Properties
  • Provided the input set X is compact, the number
    of dictionary elements is finite.
  • Approximate version of kernel PCA
  • eigenvectors with eigenvalues significantly
    larger than $\nu$ are projected almost entirely
    onto the dictionary set.
  • $O(m^2)$ memory and $O(tm^2)$ time
  • Compare exact kernel PCA: $O(t^2)$ memory and
    $O(t^2 p)$ time.

38
Application in Networks
  • Data set is the Origin-Destination Flows (an 11×11
    matrix, i.e., a 121-dimensional vector per
    measurement interval).
  • Normalized, these comprise the features.
  • We use the total traffic per measurement interval
    as the associated value $y$
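The following minimal sketch turns one interval's 11×11 OD matrix into a (feature, value) pair. The normalization used in the talk is not specified, so scaling the 121-dimensional vector to unit L2 norm is an assumption. The value $y$ is the interval's total traffic. `od_features` is a hypothetical name.

```python
import numpy as np

def od_features(od_matrix):
    """Flatten an 11x11 OD matrix into a 121-dim feature vector plus a value y.

    The feature is scaled to unit L2 norm (assumed normalization); y is the
    total traffic in the interval.
    """
    v = np.asarray(od_matrix, dtype=float).reshape(-1)
    total = v.sum()
    norm = np.linalg.norm(v)
    x = v / norm if norm > 0 else v
    return x, total

od = np.arange(121).reshape(11, 11)  # illustrative counts, not Abilene data
x, y = od_features(od)
```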

39
Total traffic
(Figure: no. of packets vs. measurement interval.)
0000 hrs on Aug 10, 2005 to 2359 hrs on Aug 21, 2005
at the Chicago router. This gives 3456 5-minute
intervals over the 12-day period.
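The interval count follows directly from the 5-minute export period:

```latex
12~\text{days} \times 24~\tfrac{\text{h}}{\text{day}} \times 12~\tfrac{\text{intervals}}{\text{h}} = 3456~\text{intervals}
```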
40
Origin-Destination Flows
(Figure: OD flows at t = 1, 100, 1300 and 3000.)
41
Building the Dictionary
(Figure: δ and dictionary elements vs. measurement interval; Gaussian kernel
with ν = 0.2, linear kernel with ν = 0.1.)
42
Dictionary Components
(Figure: dictionary elements 5, 6, 20 and 22.)
43
KRLS Anomaly Detection Algorithm
  1. Based on $x_t$, evaluate $\delta_t$.
  2. If $\delta_t < \nu_1$, green-light traffic.
  3. If $\delta_t > \nu_2$, raise red alarm.
  4. If $\nu_1 < \delta_t < \nu_2$, raise orange alarm.
  5. Test usefulness of $x_t$ (does $\phi(x_t)$ provide good
    support for ensuing vectors?).
  6. If yes, add $x_t$ to the dictionary.
  7. If no, raise red alarm.
  8. Remove any obsolete dictionary elements.
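The following minimal sketch covers steps 2 to 7. The thresholds $\nu_1 < \nu_2$ are user-chosen. The usefulness test in step 5 is a hypothetical rule: $x_t$ counts as useful when its average kernel value with the ensuing feature vectors exceeds `eps`. The averaging rule, `eps`, and the Gaussian kernel are assumptions, not the talk's exact criterion.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # Assumed kernel: exp(-||x - z||^2 / (2 sigma^2)).
    d = np.asarray(x, float) - np.asarray(z, float)
    return float(np.exp(-np.dot(d, d) / (2 * sigma**2)))

def classify(delta, nu1, nu2):
    """Steps 2-4: map the ALD distance delta_t to an alarm level (nu1 < nu2)."""
    if delta < nu1:
        return "green"
    if delta > nu2:
        return "red"
    return "orange"

def resolve_orange(x_t, ensuing, kernel=gaussian_kernel, eps=0.5):
    """Steps 5-7 (hypothetical usefulness rule): add x_t to the dictionary if its
    mean kernel value with the ensuing vectors exceeds eps, else raise red alarm."""
    support = np.mean([kernel(x_t, z) for z in ensuing])
    return "add_to_dictionary" if support > eps else "red"

print(classify(0.3, nu1=0.1, nu2=0.5))                     # orange
print(resolve_orange([0.0], [[0.1], [0.0]]))               # add_to_dictionary
```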

44
Evaluating Usefulness
(Figure: kernel value vs. timestep for normal, obsolete and anomalous vectors.)
45
Anomaly Detection
(Figure: detection statistics vs. timestep for KRLS, PCA (magnitude of residual)
and OCNM (Euclidean distance).)
46
PCA versus KRLS: Anomaly 1
(Figure: no. of IP flows vs. timestep.)
47
PCA versus KRLS: Anomaly 2
(Figure: no. of IP flows and magnitude of projection vs. timestep.)
48
Summary and Challenges
  • Network monitoring presents challenges on
    different fronts
  • Constraints on available measurements
    (reconstruction based on partial views)
  • High-rate, high-dimensional, distributed data
  • (Some of the many) open questions
  • Tomography: network models, spatial and temporal
    correlations, optimal sampling, multiple sources.
  • Anomaly detection: thresholds, dictionary
    control, feature space, datasets

49
Fig 3
(Figure: detection rate (%) vs. false alarm rate (%).)
50
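Particle filtering
Objective: estimate expectations $E_{\pi_t}[f] = \int f(x_{0:t})\, \pi_t(x_{0:t})\, dx_{0:t}$
with respect to a sequence of distributions $\{\pi_t\}$ known
up to a normalizing constant, i.e. $\pi_t(x_{0:t}) = \gamma_t(x_{0:t}) / Z_t$
Monte Carlo: obtain N weighted samples $\{x_{0:t}^{(i)}, w_t^{(i)}\}_{i=1}^{N}$
where $w_t^{(i)} \ge 0$ and $\sum_{i=1}^{N} w_t^{(i)} = 1$,
such that $\sum_{i=1}^{N} w_t^{(i)} f(x_{0:t}^{(i)}) \to E_{\pi_t}[f]$ as $N \to \infty$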
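The following minimal sketch shows the weighted-sample idea for a single distribution known only up to its normalizing constant, using self-normalized importance sampling. The function name `snis_expectation` and the Gaussian example are assumptions for illustration. The target is $\mathcal{N}(0, 1)$ given only as $\gamma(x) = e^{-x^2/2}$, and the proposal is $\mathcal{N}(0, 2^2)$. Both normalizing constants cancel when the weights are normalized.

```python
import numpy as np

def snis_expectation(f, log_gamma, proposal_sample, log_proposal, n, seed=0):
    """Self-normalized importance sampling estimate of E_pi[f], pi = gamma / Z.

    Draw x^(i) ~ q, set w^(i) proportional to gamma(x^(i)) / q(x^(i)),
    normalize the weights to sum to 1, and return sum_i w^(i) f(x^(i)).
    """
    rng = np.random.default_rng(seed)
    x = proposal_sample(rng, n)
    logw = log_gamma(x) - log_proposal(x)
    w = np.exp(logw - logw.max())   # subtract the max for numerical stability
    w /= w.sum()
    return np.sum(w * f(x))

est = snis_expectation(
    f=lambda x: x**2,                                   # E_pi[x^2] = 1 for N(0, 1)
    log_gamma=lambda x: -0.5 * x**2,                    # unnormalized target
    proposal_sample=lambda rng, n: rng.normal(0.0, 2.0, size=n),
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,       # q's constant also cancels
    n=200_000,
)
print(est)
```

A particle filter applies the same weighting sequentially, propagating and resampling the samples as $t$ advances.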