Network Traffic Modeling - PowerPoint PPT Presentation

About This Presentation
Title: Network Traffic Modeling
Description: Network Traffic Modeling. Mark Crovella, Boston University Computer Science. Concerned with questions such as capacity planning, traffic engineering, anomaly detection.
Slides: 90
Provided by: Lip6
Learn more at: http://kom.aau.dk

Transcript and Presenter's Notes

Title: Network Traffic Modeling


1
Network Traffic Modeling
  • Mark Crovella
  • Boston University Computer Science

2
Outline of Day
  • 9:00 - 10:45 Lecture 1, including break
  • 10:45 - 12:00 Exercise Set 1
  • 12:00 - 13:30 Lunch
  • 13:30 - 15:15 Lecture 2, including break
  • 15:15 - 17:00 Exercise Set 2

3
The Big Picture
  • There are two main uses for Traffic Modeling
  • Performance Analysis
  • Concerned with questions such as delay,
    throughput, packet loss.
  • Network Engineering and Management
  • Concerned with questions such as capacity
    planning, traffic engineering, anomaly detection.
  • The principal differences are those of timescale and stationarity.

4
Relevant Timescales
  • Network Engineering effects happen on long timescales: from an hour to months
  • Performance effects happen on short timescales: from nanoseconds up to an hour
[Figure: timescale axis marked at 1 usec, 1 sec, 1 hour, 1 day, 1 week]
5
Stationarity, informally
  • A stationary process has the property that the
    mean, variance and autocorrelation structure do
    not change over time.
  • Informally, we mean a flat-looking series: without trend, with constant variance over time, with a constant autocorrelation structure over time, and with no periodic fluctuations (seasonality).

NIST/SEMATECH e-Handbook of Statistical Methods
http://www.itl.nist.gov/div898/handbook/
6
The 1-Hour / Stationarity Connection
  • Nonstationarity in traffic is primarily a result
    of varying human behavior over time
  • The biggest trend is diurnal
  • This trend can usually be ignored up to
    timescales of about an hour, especially in the
    busy hour

7
Outline
  • Morning: Performance Evaluation
  • Part 0: Stationarity assumption
  • Part 1: Models of fine-timescale behavior
  • Part 2: Traffic patterns seen in practice
  • Afternoon: Network Engineering
  • Models of long-timescale behavior
  • Part 1: Single Link
  • Part 2: Multiple Links

8
Morning Part 1: Traffic Models for Performance Evaluation
  • Goal: Develop models useful for
  • Queueing analysis
  • e.g., G/G/1 queues
  • Other analysis
  • e.g., traffic shaping
  • Simulation
  • e.g., router or network simulation

9
A Reasonable Approach
  • Fully characterizing a stochastic process can be
    impossible
  • Potentially infinite set of properties to capture
  • Some properties can be very hard to estimate
  • A reasonable approach is to concentrate on two
    particular properties
  • marginal distribution and autocorrelation

10
Marginals and Autocorrelation
  • Characterizing a process in terms of these two
    properties gives you
  • a good approximate understanding of the process,
  • without involving a lot of work,
  • or requiring complicated models,
  • or requiring estimation of too many parameters.
  • Hopefully!

11
Marginals
  • Given a stochastic process X = {Xi}, we are interested in the distribution of any Xi
  • i.e., f(x) = P(Xi = x)
  • Since we assume X is stationary, it doesn't matter which Xi we pick.
  • Estimated using a histogram

12
Histograms and CDFs
  • A histogram is often a poor estimate of the pdf f(x) because it involves binning the data
  • The CDF F(x) = P[Xi < x] will have a point for each distinct data value, so it can be much more accurate
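As a sketch of the point above (in Python with numpy, on illustrative synthetic data), the empirical CDF keeps one point per sample value, while a histogram would lose detail to binning:

```python
# Sketch: estimating the marginal distribution without binning.
# The empirical CDF places one point per sorted sample value.
import numpy as np

def empirical_cdf(samples):
    """Return (x, F) where F[i] = fraction of samples <= x[i]."""
    x = np.sort(np.asarray(samples))
    F = np.arange(1, len(x) + 1) / len(x)
    return x, F

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=10_000)  # illustrative data
x, F = empirical_cdf(data)
# F is nondecreasing, ends at 1, and tracks the true CDF 1 - exp(-x)
```

The `empirical_cdf` helper is a hypothetical name for illustration; any statistics package provides an equivalent.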

13
Modeling the Marginals
  • We can form a compact summary of a pdf f(x) if we find that it is well described by a standard distribution, e.g.:
  • Gaussian (Normal)
  • Exponential
  • Poisson
  • Pareto
  • etc.
  • Statistical methods exist for
  • asking whether a dataset is well described by a
    particular distribution
  • Estimating the relevant parameters

14
Distributional Tails
  • A particularly important part of a distribution
    is the (upper) tail
  • P[X > x]
  • Large values dominate statistics and performance
  • Shape of tail critically important

15
Light Tails, Heavy Tails
  • Light: exponential or faster decline
  • Heavy: slower than any exponential

f1(x) = 2 exp(-2(x-1))   (light)
f2(x) = x^(-2)   (heavy)
16
Examining Tails
  • Best done using log-log complementary CDFs
  • Plot log(1 - F(x)) vs. log(x)

[Figure: log-log CCDF curves 1 - F1(x) (light) and 1 - F2(x) (heavy)]
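A minimal sketch of this comparison (numpy, synthetic data): on the complementary CDF, a power tail stays far above an exponential tail for large x. The `ccdf_points` helper and sample sizes are illustrative assumptions, not from the slides:

```python
# Sketch: empirical CCDF P[X > x], the quantity plotted on
# log-log axes. A power tail is asymptotically linear there;
# an exponential tail bends sharply downward.
import numpy as np

def ccdf_points(samples):
    """Sorted values and their empirical CCDF P[X > x]."""
    x = np.sort(np.asarray(samples))
    ccdf = 1.0 - np.arange(1, len(x) + 1) / len(x)
    return x, ccdf

rng = np.random.default_rng(1)
light = rng.exponential(1.0, 100_000)
heavy = rng.pareto(1.2, 100_000) + 1.0   # classical Pareto, alpha = 1.2

# Far in the tail, the heavy-tailed CCDF dominates:
x0 = 20.0
p_light = np.mean(light > x0)   # about e^-20, essentially zero
p_heavy = np.mean(heavy > x0)   # about x0^-1.2, clearly positive
```

Note that numpy's `pareto` samples the Lomax form, so adding 1 shifts it to the classical Pareto with P[X > x] = x^(-alpha).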
17
Heavy Tails Arrive
  • pre-1985: Scattered measurements note high variability in computer systems workloads
  • 1985-1992: Detailed measurements note long distributional tails
  • File sizes
  • Process lifetimes
  • 1993-1998: Attention focuses specifically on (approximately) polynomial tail shape: heavy tails
  • post-1998: Heavy tails used in standard models

18
Power Tails, Mathematically
We say that a random variable X is power tailed if

  P[X > x] ~ c x^(-a)  as x → ∞

where a(x) ~ b(x) means lim(x→∞) a(x)/b(x) = 1.
Focusing on polynomial shape allows:
  • Parsimonious description
  • Capture of variability in a parameter (a)
19
A Fundamental Shift in Viewpoint
  • Traditional modeling methods have focused on
    distributions with light tails
  • Tails that decline exponentially fast (or faster)
  • Arbitrarily large observations are vanishingly
    rare
  • Heavy tailed models behave quite differently
  • Arbitrarily large observations have
    non-negligible probability
  • Large observations, although rare, can dominate a system's performance characteristics

20
Heavy Tails are Surprisingly Common
  • Sizes of data objects in computer systems
  • Files stored on Web servers
  • Data objects/flow lengths traveling through the
    Internet
  • Files stored in general-purpose Unix filesystems
  • I/O traces of filesystem, disk, and tape activity
  • Process/Job lifetimes
  • Node degree in certain graphs
  • Inter-domain and router structure of the Internet
  • Connectivity of WWW pages
  • Zipf's Law

21
Evidence Web File Sizes
Barford et al., World Wide Web, 1999
22
Evidence Process Lifetimes
  • Harchol-Balter and Downey,
  • ACM TOCS,
  • 1997

23
The Bad News
  • Workload metrics following heavy-tailed distributions are extremely variable
  • For example, for power tails:
  • When a ≤ 2, the distribution has infinite variance
  • When a ≤ 1, the distribution has infinite mean
  • In practice, empirical moments are slow to converge, or nonconvergent
  • To characterize system performance, either:
  • Attention must shift to the distribution itself, or
  • Attention must be paid to the timescale of analysis

24
Heavy Tails in Practice
Power tails with a = 0.8
Large observations dominate statistics (e.g., the sample mean)
25
Autocorrelation
  • Once we have characterized the marginals, we know
    a lot about the process.
  • In fact, if the process consisted of i.i.d.
    samples, we would be done.
  • However, most traffic has the property that its
    measurements are not independent.
  • Lack of independence usually results in
    autocorrelation
  • Autocorrelation is the tendency for two
    measurements to both be greater than, or less
    than, the mean at the same time.

26
Autocorrelation
27
Measuring Autocorrelation
  • Autocorrelation Function (ACF) (assumes stationarity):
  • R(k) = Cov(Xn, Xn+k) = E[Xn Xn+k] - E^2[X0]
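A minimal estimator matching this definition, as a sketch in Python (the normalization by the sample variance is a common convention; the function name is illustrative):

```python
# Sketch: estimating R(k) from a finite stationary series,
# with the lag-k covariance normalized by the variance.
import numpy as np

def acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean
    var = np.dot(x, x) / len(x)           # sample variance
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
iid = rng.normal(size=50_000)
r = acf(iid, 10)
# r[0] = 1 by construction; r[k] is near 0 for i.i.d. samples
```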

28
ACF of i.i.d. samples
29
How Does Autocorrelation Arise?
Network traffic is the superposition of flows
[Figure: a client clicks, sending a Request over the Internet (TCP/IP) to a Server, which returns a Response]
30
Why Flows? Sources appear to be ON/OFF
[Figure: packet trains P1, P2, P3 sent in ON periods, separated by OFF periods]
31
Superposition of ON/OFF sources → Autocorrelation
[Figure: aggregate of sources P1, P2, P3]
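The slide's picture can be sketched numerically (numpy, illustrative parameters): superposing independent ON/OFF sources yields an aggregate rate with clear positive autocorrelation.

```python
# Sketch: superposing ON/OFF sources induces autocorrelation
# in the aggregate traffic rate.
import numpy as np

rng = np.random.default_rng(4)

def on_off_source(n, mean_period=50):
    """0/1 rate series: alternating exponential ON and OFF periods."""
    out = np.empty(n)
    i, state = 0, 1
    while i < n:
        length = max(1, int(rng.exponential(mean_period)))
        out[i:i + length] = state
        i += length
        state = 1 - state
    return out

agg = sum(on_off_source(100_000) for _ in range(10))  # 10 flows

def acf1(x):
    """Lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rho = acf1(agg)   # clearly positive: nearby bins move together
```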
32
Morning Part 2 Traffic Patterns Seen in Practice
  • Traffic patterns on a link are strongly affected
    by two factors
  • amount of multiplexing on the link
  • Essentially, how many flows are sharing the link?
  • where flows are bottlenecked
  • Is each flow's bottleneck on, or off, the link?
  • Do all bottlenecks have similar rate?

33
Low Multiplexed Traffic
  • Marginals: highly variable
  • Autocorrelation: low

34
Highly Multiplexed Traffic
35
Highly Multiplexed, Bottlenecked Traffic
  • Marginals: tending to Gaussian
  • Autocorrelation: high

36
Highly Multiplexed, Mixed-Bottlenecks
dec-pkt-1 (Internet Traffic Archive)
37
Alpha and Beta Traffic
  • ON/OFF model revisited
  • High variability in connection rates (RTTs)
  • Low rate flows → beta traffic → fractional Gaussian noise
  • High rate flows → alpha traffic → stable Levy noise

Rice U., SPIN Group
38
Long Range Dependence
R(k) ~ k^(-a), 0 < a < 1  (long range dependence)
R(k) ~ a^(-k), a > 1  (short range dependence)
H = 1 - a/2
39
Correlation and Scaling
  • Long range dependence affects how variability
    scales with timescale
  • Take a traffic timeseries Xn, sum it over blocks
    of size m
  • This is equivalent to observing the original
    process on a longer timescale
  • How do the mean and std dev change?
  • Mean will always grow in proportion to m
  • For i.i.d. data, the std dev will grow in
    proportion to sqrt(m)
  • So, for i.i.d. data, the process is smoother at
    longer timescale
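The aggregation argument above can be checked directly (a sketch in numpy, on i.i.d. data): summing over blocks of size m, the block-sum standard deviation grows like sqrt(m), so the ratio of standard deviations across a 100x change in m is about 10.

```python
# Sketch: aggregate an i.i.d. series over blocks of size m and
# verify that the block-sum std dev grows in proportion to sqrt(m).
import numpy as np

def block_sums(x, m):
    """Sum x over non-overlapping blocks of size m."""
    n = (len(x) // m) * m
    return x[:n].reshape(-1, m).sum(axis=1)

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 1_000_000)

s10 = block_sums(x, 10).std()       # about sqrt(10)
s1000 = block_sums(x, 1000).std()   # about sqrt(1000)
ratio = s1000 / s10                 # about sqrt(100) = 10 for i.i.d. data
```

The mean of the block sums grows like m, so relative to its mean the aggregated process is smoother: this is exactly the "i.i.d. data smooths out at longer timescales" claim.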

40
Self-similarity unusual scaling of variability
  • Exact self-similarity of a zero-mean, stationary process Xn:
  • m^(1-H) X^(m) =d X, where X^(m) is the process averaged over blocks of size m and =d denotes equality in distribution
  • H = Hurst parameter, 1/2 < H < 1
  • H = 1/2 for i.i.d. Xn
  • LRD leads to (at least) asymptotic self-similarity

41
Self Similarity in Practice
[Figure: traffic traces at 10 ms, 1 s, and 100 s timescales for H = 0.95 vs. H = 0.50]
42
The Great Wave (Hokusai)
43
How Does Self-Similarity Arise?
Flows → Autocorrelation → Self-similarity
A power-law tail in the distribution of flow lengths
→ autocorrelation that declines like a power law
→ self-similarity
44
Power-Tailed ON/OFF sources → Self-Similarity
[Figure: sources P1, P2, P3 with power-tailed ON and OFF periods]
45
Measuring Scaling Properties
  • In principle, one can simply aggregate Xn over
    varying sizes of m, and plot resulting variance
    as a function of m
  • Linear behavior on a log-log plot gives an estimate of H (or a).
  • Slope > -1 indicates LRD

WARNING: this method is very sensitive to violation of assumptions!
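The variance-time method described above can be sketched as follows (numpy; the function name is illustrative, and per the slide's warning this estimator is fragile in practice). The slope beta of log(variance of block means) vs. log(m) gives H = 1 + beta/2, so i.i.d. data (beta = -1) yields H = 1/2:

```python
# Sketch: variance-time estimate of the Hurst parameter H.
# Aggregate over several block sizes m, fit log(var of block
# mean) against log(m); slope beta gives H = 1 + beta/2.
import numpy as np

def hurst_variance_time(x, block_sizes):
    logs_m, logs_v = [], []
    for m in block_sizes:
        n = (len(x) // m) * m
        means = x[:n].reshape(-1, m).mean(axis=1)
        logs_m.append(np.log(m))
        logs_v.append(np.log(means.var()))
    slope = np.polyfit(logs_m, logs_v, 1)[0]
    return 1.0 + slope / 2.0

rng = np.random.default_rng(6)
iid = rng.normal(size=500_000)
H = hurst_variance_time(iid, [10, 20, 50, 100, 200, 500])
# expect H close to 0.5 for i.i.d. data
```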
46
Better: Wavelet-based Estimation
Veitch and Abry
47
Optional Material: Performance Implications of Self-Similarity
48
Performance implications of S.S.
  • Asymptotic queue length distribution (G/G/1):
  • For SRD traffic: P[Q > x] ~ exp(-c x) (exponential decay)
  • For LRD traffic: P[Q > x] ~ exp(-c x^(2-2H)) (Weibull tail, much heavier)
  • Severe, but is it realistic?

49
Evaluating Self-Similarity
  • Queueing models like these are open systems
  • delay does not feed back to the source
  • TCP dynamics are not being considered
  • packet losses cause TCP to slow down
  • Better approach:
  • Closed network, detailed modeling of TCP dynamics
  • self-similar traffic generated naturally

50
Simulating Self-Similar Traffic
  • Simulated network with multiple clients, servers
  • Clients alternate between requests and idle times
  • Files drawn from a heavy-tailed distribution
  • Vary a to vary self-similarity of traffic
  • Each request is simulated at packet level, including detailed TCP flow control
  • Compare with UDP (no flow control) as an example of an open system

51
Traffic Characteristics
  • Self-similarity varies smoothly as a function of a

52
Performance Effects: Packet Loss
Open Loop UDP
Closed Loop TCP
53
Performance Effects: TCP Throughput
54
Performance Effects: Buffer Queue Lengths
Open Loop UDP
Closed Loop TCP
55
Severity of Packet Delay
56
Performance Implications
  • Self-similarity is present in Web traffic
  • The Internet's most popular application
  • For the Web, the causes of s.s. can be traced to the heavy-tailed distribution of Web file sizes
  • Caching doesn't seem to affect things much
  • Multimedia tends to increase the tail weight of Web files
  • But even text files alone appear to be heavy-tailed

57
Performance Implications (continued)
  • Knowing how s.s. arises allows us to recreate it
    naturally in simulation
  • Effects of s.s. in simulated TCP networks
  • Packet loss not as high as open-loop models might
    suggest
  • Throughput not a problem
  • Packet delays are the big problem

58
Morning Lab Exercises
  • For each dataset, explore its marginals
  • Histograms
  • CDFs, CCDFs,
  • Log-log CCDFs to look at tails
  • For each dataset, explore its correlations
  • ACFs
  • Logscale diagrams
  • Compare to scrambled dataset
  • Study performance
  • Simple queueing
  • Compare to scrambled dataset

59
Afternoon Network Engineering
  • Moving from the stationary domain to the nonstationary
  • Goal: Traffic models that are useful for
  • capacity planning
  • traffic engineering
  • anomaly / attack detection
  • Two main variants
  • Looking at traffic on a single link at a time
  • Looking at traffic on multiple or all links in a
    network simultaneously

60
Part 1 Single Link Analysis
  • The general modeling framework that is most often used is:
  • traffic = signal + noise
  • Sometimes we are interested in the signal, sometimes the noise
  • So, signal processing techniques are common:
  • Frequency Domain / Spectral Analysis
  • Generally based on FFT
  • Time-Frequency Analysis
  • Generally based on wavelets

61
A Typical Trace
Notable features: periodicity, noisiness, spikes

62
Capacity Planning
  • Here, mainly interested in the signal
  • Want to predict long-term trends
  • What do we need to remove?
  • Noise
  • Periodicity

K. Papagiannaki et al., Infocom 2003
63
Periodicity Spectral Analysis
64
Denoising with Wavelets
65
Capacity Planning via Forecasting
66
Anomaly detection
  • Goal: models that are useful in detecting
  • Network equipment failures
  • Network misconfigurations
  • Flash crowds
  • Attacks
  • Network Abuse

67
Misconfiguration detection via Wavelets
Traffic
High Freq.
Med Freq.
Low Freq.
P. Barford et al., Internet Measurement Workshop
2002
68
Flash Crowd Detection
Long Term Change in Mean Traffic (8 weeks)
69
Detecting DoS attacks
70
Afternoon Part 2 Multiple Links
  • How to analyze traffic from multiple links?
  • Clearly, we could treat it as a collection of single links, and proceed as before
  • But we want more: to detect trends and patterns across multiple links
  • Observation: multiple links share common underlying patterns
  • Diurnal variation should be similar across links
  • Many anomalies will span multiple links
  • The problem is one of pattern extraction in high dimension
  • Dimension = number of links

71
Example Link Traces from a Single Network
Some have visible structure, some less so
72
High Dimensionality A General Strategy
  • Look for a low-dimensional representation
    preserving the most important features of data
  • Often, a high-dimensional structure may be explainable in terms of a small number of independent variables
  • Commonly used tool Principal Component Analysis
    (PCA)

73
Principal Component Analysis
For any given dataset, PCA finds a new coordinate system that maps maximum variability in the data to a minimum number of coordinates. The new axes are called Principal Axes or Components.
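A sketch of this idea (numpy, on illustrative synthetic data): with rows as time bins and columns as links, the SVD X = U S V^T gives the principal axes, and the squared singular values show how much variability each component captures.

```python
# Sketch: PCA via the singular value decomposition X = U S V^T.
# Rows of X are time bins, columns are links; columns of U are
# the "eigenlink" time series, and the singular values measure
# the energy each component captures.
import numpy as np

rng = np.random.default_rng(7)
t, n_links = 200, 20
# synthetic ensemble: every link is a noisy mix of 2 shared patterns
patterns = rng.normal(size=(t, 2))
mix = rng.normal(size=(2, n_links))
X = patterns @ mix + 0.05 * rng.normal(size=(t, n_links))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
energy = s**2 / np.sum(s**2)
# sharp "elbow": the top 2 components capture almost all the energy
```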
74
Correlation in Space, not Time
[Figure: links × time traffic matrix; traditional frequency-domain analysis correlates along the time axis]
75
PCA on Link Traffic
[Figure: links × time traffic matrix X]
X V S^(-1) = U, where X = U S V^T
76
Singular Value Decomposition
  • X = U S V^T is the singular value decomposition of X

77
PCA on Link Traffic (2)
Singular values indicate the energy attributable to each principal component.
Each link is a weighted sum of all eigenlinks.
78
Low Intrinsic Dimensionality of Link Data
A plot of the singular values reveals how much energy is captured by each PC. A sharp elbow indicates that most of the energy is captured by 5-10 singular values, for all datasets.
79
Implications of Low Intrinsic Dimensionality
  • Apparently, we can reconstruct X with high accuracy, keeping only a few columns of U
  • A form of lossy data compression
  • Even more, a way of extracting the most significant part of X, automatically
  • signal + noise?

X = U S V^T
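The reconstruction claim can be sketched directly (numpy, synthetic data with 3 shared patterns; the `low_rank` helper is an illustrative name): keeping only the top few components of X = U S V^T reproduces X with small relative error.

```python
# Sketch: rank-k reconstruction from the SVD, keeping only the
# top k columns of U (and rows of V^T).
import numpy as np

def low_rank(X, k):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(8)
patterns = rng.normal(size=(300, 3))       # 3 shared daily patterns
mix = rng.normal(size=(3, 25))             # 25 links
X = patterns @ mix + 0.1 * rng.normal(size=(300, 25))

X5 = low_rank(X, 5)
rel_err = np.linalg.norm(X - X5) / np.linalg.norm(X)
# small relative error: 5 components reconstruct X with high accuracy
```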
80
Approximating With Top 5 Eigenlinks
81
Approximating With Top 5 Eigenlinks
82
Approximating With Top 5 Eigenlinks
83
A Link, Reconstructed
[Figure: link traffic decomposed into components 1-5, components 6-10, and all the rest]
84
Anomaly Detection
Single-Link approach: Use wavelets to detrend each flow in isolation. [Barford, IMW '02]
Multi-Link approach: Detrend all links simultaneously by choosing only certain principal components.
85
PCA based anomaly detection
[Figure: L2 norm of the entire traffic vector X vs. L2 norm of the residual vector]
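The residual-based detection idea can be sketched as follows (numpy, synthetic data with an injected spike; subspace size and anomaly magnitude are illustrative assumptions): project traffic onto the top principal components ("normal" subspace) and flag time bins whose residual L2 norm is large.

```python
# Sketch: PCA-based anomaly detection. Traffic in the top principal
# subspace is "normal"; a large residual norm flags an anomaly.
import numpy as np

rng = np.random.default_rng(9)
t, n_links = 400, 15
patterns = rng.normal(size=(t, 2))
X = patterns @ rng.normal(size=(2, n_links)) \
    + 0.05 * rng.normal(size=(t, n_links))
X[250] += 10.0 * rng.normal(size=n_links)   # inject an anomaly at bin 250

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T @ Vt[:2]                # projector onto the normal subspace
residual = X - X @ P                 # the residual traffic
scores = np.linalg.norm(residual, axis=1)   # residual L2 norm per bin
detected = int(np.argmax(scores))    # the bin with the injected anomaly
```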
86
Traffic Forecasting
Single-Link approach: Treat each flow timeseries independently. Use wavelets to extract trends. Build timeseries forecasting models on the trends. [Papagiannaki, INFOCOM '03]
Multi-Link approach: Build forecasting models on the most significant eigenlinks as trends. Allows simultaneous examination and forecasting for the entire ensemble of links.
X = U S V^T
87
Conclusions
  • Whew!
  • Bibliography on handout
  • Traffic analysis methods vary considerably
    depending on
  • Question being asked
  • Timescale
  • Stationarity

88
Conclusions
  • Performance Evaluation
  • Marginals
  • Watch out for heavy tails
  • Correlation (in time)
  • Watch out for LRD / Self Similarity
  • Network Engineering
  • Signal Noise
  • Single-link Frequency domain analysis
  • Multi-link Exploit Spatial correlation

89
Afternoon Lab Exercises
  • For each dataset,
  • Perform PCA
  • Assess the variance in each component
  • Reconstruct using small number of components
  • Time / Interest permitting,
  • Analyze some of the single link timeseries using
    wavelets (matlab wavelet toolbox)