Network Traffic Modeling - PowerPoint PPT Presentation

1
Network Traffic Modeling
• Mark Crovella
• Boston University Computer Science

2
Outline of Day
• 9:00 – 10:45 Lecture 1, including break
• 10:45 – 12:00 Exercise Set 1
• 12:00 – 13:30 Lunch
• 13:30 – 15:15 Lecture 2, including break
• 15:15 – 17:00 Exercise Set 2

3
The Big Picture
• There are two main uses for Traffic Modeling
• Performance Analysis
• Concerned with questions such as delay,
throughput, packet loss.
• Network Engineering and Management
• Concerned with questions such as capacity
planning, traffic engineering, anomaly detection.
• The principal differences are those of timescale
and stationarity.

4
Relevant Timescales
Network Engineering effects happen on long
timescales: from an hour to months.
Performance effects happen on short
timescales: from nanoseconds up to an hour.
(timescale axis: 1 usec … 1 sec … 1 hour … 1 day … 1 week)
5
Stationarity, informally
• A stationary process has the property that the
mean, variance and autocorrelation structure do
not change over time.
• Informally, we mean a flat-looking series,
without trend, with constant variance over time, a
constant autocorrelation structure over time, and
no periodic fluctuations (seasonality).

NIST/SEMATECH e-Handbook of Statistical Methods,
http://www.itl.nist.gov/div898/handbook/
6
The 1-Hour / Stationarity Connection
• Nonstationarity in traffic is primarily a result
of varying human behavior over time
• The biggest trend is diurnal
• This trend can usually be ignored up to
timescales of about an hour, especially in the
busy hour

7
Outline
• Morning: Performance Evaluation
• Part 0: The stationarity assumption
• Part 1: Models of fine-timescale behavior
• Part 2: Traffic patterns seen in practice
• Afternoon: Network Engineering
• Models of long-timescale behavior

8
Morning Part 1: Traffic Models for Performance
Evaluation
• Goal: Develop models useful for
• Queueing analysis
• e.g., G/G/1 queues
• Other analysis
• e.g., traffic shaping
• Simulation
• e.g., router or network simulation

9
A Reasonable Approach
• Fully characterizing a stochastic process can be
impossible
• Potentially infinite set of properties to capture
• Some properties can be very hard to estimate
• A reasonable approach is to concentrate on two
particular properties
• marginal distribution and autocorrelation

10
Marginals and Autocorrelation
• Characterizing a process in terms of these two
properties gives you
• a good approximate understanding of the process,
• without involving a lot of work,
• or requiring complicated models,
• or requiring estimation of too many parameters.
• Hopefully!

11
Marginals
• Given a stochastic process X = {Xi}, we are
interested in the distribution of any Xi
• i.e., f(x) = P(Xi = x)
• Since we assume X is stationary, it doesn't
matter which Xi we pick.
• Estimated using a histogram

12
Histograms and CDFs
• A histogram is often a poor estimate of the pdf
f(x) because it involves binning the data
• The CDF F(x) = P[Xi < x] will have a point for
each distinct data value, so it can be much more
accurate
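The point can be sketched in code (a stdlib-only illustration, not from the slides): the empirical CDF keeps one point per distinct sample, while the histogram's resolution is limited by its bin count.

```python
import random

def ecdf(data):
    """Empirical CDF: one (x, F(x)) point per sorted sample value."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def histogram(data, bins=10):
    """Crude density estimate: counts in equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    n = len(data)
    return [(lo + (i + 0.5) * width, c / (n * width)) for i, c in enumerate(counts)]

random.seed(1)
sample = [random.expovariate(1.0) for _ in range(1000)]
cdf_points = ecdf(sample)        # 1000 points: no information lost to binning
hist_points = histogram(sample)  # only 10 points: resolution lost to bins
```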

13
Modeling the Marginals
• We can form a compact summary of a pdf f(x) if we
find that it is well described by a standard
distribution, e.g.:
• Gaussian (Normal)
• Exponential
• Poisson
• Pareto
• Etc.
• Statistical methods exist for
• asking whether a dataset is well described by a
particular distribution
• estimating the relevant parameters

14
Distributional Tails
• A particularly important part of a distribution
is the (upper) tail
• P[X > x]
• Large values dominate statistics and performance
• The shape of the tail is critically important

15
Light Tails, Heavy Tails
• Light: exponential or faster decline
• Heavy: slower than any exponential

f1(x) = 2 exp(-2(x-1))
f2(x) = x^-2
16
Examining Tails
• Best done using log-log complementary CDFs
• Plot log(1 - F(x)) vs. log(x)

(figure: log-log CCDFs of 1 - F1(x) and 1 - F2(x))
17
Heavy Tails Arrive
• pre-1985: Scattered measurements note high variability
• 1985 – 1992: Detailed measurements note long
distributional tails
• File sizes
• 1993 – 1998: Attention focuses specifically on
(approximately) polynomial tail shape: heavy
tails
• post-1998: Heavy tails used in standard models

18
Power Tails, Mathematically
We say that a random variable X is power tailed if
P[X > x] ~ x^-a (a > 0),
where f ~ g means lim_{x→∞} f(x) / g(x) = 1.
Focusing on polynomial shape allows: a parsimonious
description, and capture of the variability in a single parameter (a)
19
A Fundamental Shift in Viewpoint
• Traditional modeling methods have focused on
distributions with light tails
• Tails that decline exponentially fast (or faster)
• Arbitrarily large observations are vanishingly
rare
• Heavy tailed models behave quite differently
• Arbitrarily large observations have
non-negligible probability
• Large observations, although rare, can dominate a
system's performance characteristics

20
Heavy Tails are Surprisingly Common
• Sizes of data objects in computer systems
• Files stored on Web servers
• Data objects/flow lengths traveling through the
Internet
• Files stored in general-purpose Unix filesystems
• I/O traces of filesystem, disk, and tape activity
• Node degree in certain graphs
• Inter-domain and router structure of the Internet
• Connectivity of WWW pages
• Zipf's Law

21
Evidence Web File Sizes
Barford et al., World Wide Web, 1999
22
• Harchol-Balter and Downey, ACM TOCS, 1997

23
• Workload metrics following heavy-tailed
distributions are extremely variable
• For example, for power tails:
• When a ≤ 2, the distribution has infinite variance
• When a ≤ 1, the distribution has infinite mean
• In practice, empirical moments are slow to
converge, or nonconvergent
• To characterize system performance, either:
• Attention must shift to the distribution itself, or
• Attention must be paid to the timescale of analysis
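A quick numerical illustration of that extreme variability (a stdlib sketch; the tail index 1.1 and sample size are arbitrary choices, not from the lecture): with a power tail, the single largest observation carries a visible share of the total, so the sample mean is dominated by rare large values.

```python
import random

def pareto(alpha, rng):
    """Pareto(alpha) sample with x_m = 1, via inverse transform: P[X > x] = x^-alpha."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)

rng = random.Random(42)
n = 100_000
heavy = [pareto(1.1, rng) for _ in range(n)]      # power tail: infinite variance
light = [rng.expovariate(1.0) for _ in range(n)]  # light (exponential) tail

# Share of the total carried by the single largest observation
heavy_share = max(heavy) / sum(heavy)
light_share = max(light) / sum(light)
# heavy_share is orders of magnitude larger: one sample dominates the sum
```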

24
Heavy Tails in Practice
Power tails with a = 0.8
Large observations dominate statistics (e.g.,
sample mean)
25
Autocorrelation
• Once we have characterized the marginals, we know
a lot about the process.
• In fact, if the process consisted of i.i.d.
samples, we would be done.
• However, most traffic has the property that its
measurements are not independent.
• Lack of independence usually results in
autocorrelation
• Autocorrelation is the tendency for two
measurements to both be greater than, or less
than, the mean at the same time.

26
Autocorrelation
27
Measuring Autocorrelation
• Autocorrelation Function (ACF) (assumes
stationarity)
• R(k) = Cov(Xn, Xn+k)
• = E[Xn Xn+k] - E²[X0]
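The estimator implied by this definition can be written directly (a stdlib sketch, normalized by the sample variance so that R(0) = 1; the moving-average demo series is illustrative):

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation R(k) = Cov(X_n, X_{n+k}) / Var(X), assuming stationarity."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    out = []
    for k in range(max_lag + 1):
        cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
        out.append(cov / var)
    return out

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Smoothing induces dependence: adjacent values share 9 of 10 noise terms
smoothed = [sum(noise[i:i + 10]) / 10 for i in range(len(noise) - 10)]

r_iid = acf(noise, 5)     # near zero beyond lag 0
r_ma = acf(smoothed, 5)   # strong positive correlation at small lags
```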

28
ACF of i.i.d. samples
29
How Does Autocorrelation Arise?
Network traffic is the superposition of flows
(diagram: a client's click sends a Request across the Internet (TCP/IP) to a Server; the Response flows back)
30
Why Flows? Sources appear to be ON/OFF
(figure: packet traces P1, P2, P3, each alternating ON and OFF periods)
31
Superposition of ON/OFF sources → Autocorrelation
(figure: sources P1, P2, P3 and their sum)
32
Morning Part 2: Traffic Patterns Seen in Practice
• Traffic patterns on a link are strongly affected
by two factors:
• The amount of multiplexing on the link
• Essentially, how many flows are sharing the link
• Where flows are bottlenecked
• Is each flow's bottleneck on, or off, the link?
• Do all bottlenecks have similar rate?

33
Low Multiplexed Traffic
• Marginals highly variable
• Autocorrelation low

34
Highly Multiplexed Traffic
35
Highly Multiplexed, Bottlenecked Traffic
• Marginals tending to Gaussian
• Autocorrelation high

36
Highly Multiplexed, Mixed-Bottleneck Traffic
dec-pkt-1 (Internet Traffic Archive)
37
Alpha and Beta Traffic
• ON/OFF model revisited
• High variability in connection rates (RTTs)
• Low rate → beta traffic → fractional Gaussian noise
• High rate → alpha traffic → stable Lévy noise
Rice U., SPIN Group
38
Long Range Dependence
R(k) ~ k^-a, 0 < a < 1 (long range dependent)
R(k) ~ a^-k, a > 1 (short range dependent)
H = 1 - a/2
39
Correlation and Scaling
• Long range dependence affects how variability
scales with timescale
• Take a traffic timeseries Xn, sum it over blocks
of size m
• This is equivalent to observing the original
process on a longer timescale
• How do the mean and std dev change?
• Mean will always grow in proportion to m
• For i.i.d. data, the std dev will grow in
proportion to sqrt(m)
• So, for i.i.d. data, the process is smoother at
longer timescales

40
Self-similarity: unusual scaling of variability
• Exact self-similarity of a zero-mean, stationary
process Xn:
• m^-H (X1 + X2 + … + Xm) =d X1
• H = Hurst parameter, 1/2 < H < 1
• H = 1/2 for i.i.d. Xn
• LRD leads to (at least) asymptotic self-similarity

41
Self Similarity in Practice
(figure: traffic with H = 0.95 vs. H = 0.50, viewed at 10 ms, 1 s, and 100 s timescales)
42
The Great Wave (Hokusai)
43
How Does Self-Similarity Arise?
Self-similarity ← Autocorrelation ← Flows
Autocorrelation declines like a power law
← Distribution of flow lengths has a power law tail
44
Power-Tailed ON/OFF sources → Self-Similarity
(figure: sources P1, P2, P3 with power-tailed ON periods, and their sum)
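The mechanism on this slide can be simulated directly (a stdlib sketch with illustrative parameters: each source emits one packet per tick while ON, with power-tailed ON periods and exponential OFF periods):

```python
import random

def on_off_source(alpha, mean_off, ticks, rng):
    """One source: Pareto(alpha) ON durations, exponential OFF durations."""
    out = []
    while len(out) < ticks:
        on = int((1.0 - rng.random()) ** (-1.0 / alpha)) + 1  # power-tailed ON period
        out.extend([1] * on)
        off = int(rng.expovariate(1.0 / mean_off)) + 1
        out.extend([0] * off)
    return out[:ticks]

rng = random.Random(7)
ticks = 20_000
sources = [on_off_source(1.5, 10.0, ticks, rng) for _ in range(20)]
traffic = [sum(s[t] for s in sources) for t in range(ticks)]  # superposed load per tick
```

Summing many such sources yields an aggregate whose burstiness persists across timescales, which is the route to self-similarity sketched above.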
45
Measuring Scaling Properties
• In principle, one can simply aggregate Xn over
varying sizes of m, and plot resulting variance
as a function of m
• Linear behavior on a log-log plot gives an
estimate of H (or a)
• Slope > -1 indicates LRD

WARNING: this method is very sensitive to
violations of its assumptions!
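A stdlib sketch of the aggregation method just described, using block means so that the slope of log variance vs. log m is 2H - 2 (i.i.d. data should give a slope near -1, i.e. H near 1/2):

```python
import math
import random

def variance_time(x, block_sizes):
    """(log m, log Var) of the block-mean process X^(m) for each aggregation level m."""
    points = []
    for m in block_sizes:
        blocks = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
        mu = sum(blocks) / len(blocks)
        var = sum((b - mu) ** 2 for b in blocks) / len(blocks)
        points.append((math.log(m), math.log(var)))
    return points

def slope(points):
    """Least-squares slope of the log-log variance-time plot."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    num = sum((p[0] - mx) * (p[1] - my) for p in points)
    den = sum((p[0] - mx) ** 2 for p in points)
    return num / den

rng = random.Random(3)
iid = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
beta = slope(variance_time(iid, [1, 2, 4, 8, 16, 32, 64]))
H = 1.0 + beta / 2.0  # expect H near 0.5 for i.i.d. data
```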
46
Better: Wavelet-based estimation
Veitch and Abry
47
Optional Material: Performance Implications of
Self-Similarity
48
Performance implications of S.S.
• Asymptotic queue length distribution (G/G/1)
• For SRD traffic: P[Q > x] decays exponentially in x
• For LRD traffic: P[Q > x] decays only as a
Weibull, exp(-θ x^(2-2H)), which is much heavier
• Severe. But is it realistic?
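These effects can be explored with Lindley's recursion for the G/G/1 queue, W_{n+1} = max(0, W_n + S_n - A_{n+1}). A stdlib sketch (the load of 0.8 and the Pareto tail index 1.5 are illustrative choices, not from the slides):

```python
import random

def waiting_times(service, interarrival):
    """Lindley recursion for G/G/1 FIFO waiting times (one entry per customer)."""
    w, out = 0.0, []
    for s, a in zip(service, interarrival):
        out.append(w)
        w = max(0.0, w + s - a)
    return out

rng = random.Random(11)
n = 50_000
arrivals = [rng.expovariate(1.0) for _ in range(n)]             # Poisson arrivals, rate 1
light_service = [rng.expovariate(1.0 / 0.8) for _ in range(n)]  # exponential, mean 0.8

# Pareto service, alpha = 1.5, scaled to the same mean 0.8 (mean = alpha * xm / (alpha - 1))
xm = 0.8 * (1.5 - 1.0) / 1.5
heavy_service = [xm * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(n)]

light_wait = waiting_times(light_service, arrivals)
heavy_wait = waiting_times(heavy_service, arrivals)
# Heavy-tailed service at the same load produces far more extreme delays
```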

49
Evaluating Self-Similarity
• Queueing Models like these are open systems
• delay does not feed back to source
• TCP dynamics are not being considered
• packet losses cause TCP to slow down
• Better approach:
• Closed network, detailed modeling of TCP dynamics
• Self-similar traffic generated naturally

50
Simulating Self-Similar Traffic
• Simulated network with multiple clients, servers
• Clients alternate between requests and idle times
• Files drawn from heavy-tailed distribution
• Vary a to vary self-similarity of traffic
• Each request is simulated at packet level,
including detailed TCP flow control
• Compare with UDP (no flow control) as an example
of an open system

51
Traffic Characteristics
• Self-similarity varies smoothly as a function of a

52
Performance Effects: Packet Loss
Open Loop UDP
Closed Loop TCP
53
Performance Effects: TCP Throughput
54
Performance Effects: Buffer Queue Lengths
Open Loop UDP
Closed Loop TCP
55
Severity of Packet Delay
56
Performance Implications
• Self-similarity is present in Web traffic
• The Internet's most popular application
• For the Web, the causes of s.s. can be traced to
the heavy-tailed distribution of Web file sizes
• Caching doesn't seem to affect things much
• Multimedia tends to increase the tail weight of Web
files
• But even text files alone appear to be
heavy-tailed

57
Performance Implications (continued)
• Knowing how s.s. arises allows us to recreate it
naturally in simulation
• Effects of s.s. in simulated TCP networks
• Packet loss not as high as open-loop models might
suggest
• Throughput not a problem
• Packet delays are the big problem

58
Morning Lab Exercises
• For each dataset, explore its marginals
• Histograms
• CDFs, CCDFs
• Log-log CCDFs to look at tails
• For each dataset, explore its correlations
• ACFs
• Logscale diagrams
• Compare to scrambled dataset
• Study performance
• Simple queueing
• Compare to scrambled dataset

59
Afternoon: Network Engineering
• Moving from the stationary domain to the nonstationary
• Goal: Traffic models that are useful for
• capacity planning
• traffic engineering
• anomaly / attack detection
• Two main variants
• Looking at traffic on a single link at a time
• Looking at traffic on multiple or all links in a
network simultaneously

60
• The general modeling framework that is most often
used is:
• traffic = signal + noise
• Sometimes we are interested in the signal, sometimes the
noise
• So, signal processing techniques are common
• Frequency Domain / Spectral Analysis
• Generally based on FFT
• Time-Frequency Analysis
• Generally based on wavelets

61
A Typical Trace
Notable features: periodicity, noisiness, spikes

62
Capacity Planning
• Here, mainly interested in the signal
• Want to predict long-term trends
• What do we need to remove?
• Noise
• Periodicity

K. Papagiannaki et al., Infocom 2003
63
Periodicity: Spectral Analysis
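A sketch of the idea (a naive O(N²) DFT in stdlib Python; real analyses would use an FFT): on a synthetic two-week hourly series, the periodogram peak lands exactly on the diurnal frequency.

```python
import math

def periodogram(x):
    """Naive DFT power at each frequency bin k (O(N^2); fine for short series)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the DC level so the peak is the periodic part
    power = []
    for k in range(1, n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((k, re * re + im * im))
    return power

# Two weeks of hourly "traffic" with a diurnal cycle (synthetic, illustrative)
n = 14 * 24
series = [10.0 + 5.0 * math.sin(2 * math.pi * t / 24.0) for t in range(n)]
peak_k = max(periodogram(series), key=lambda p: p[1])[0]
period_hours = n / peak_k  # recovers the 24-hour period
```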
64
Denoising with Wavelets
65
Capacity Planning via Forecasting
66
Anomaly detection
• Goal: models that are useful in detecting
• Network equipment failures
• Network misconfigurations
• Flash crowds
• Attacks
• Network Abuse

67
Misconfiguration detection via Wavelets
Traffic
High Freq.
Med Freq.
Low Freq.
P. Barford et al., Internet Measurement Workshop
2002
68
Flash Crowd Detection
Long Term Change in Mean Traffic (8 weeks)
69
Detecting DoS attacks
70
• How to analyze traffic from multiple links?
• Clearly, we could treat them as a collection of single
links
• But we want more: to detect trends and patterns
that span links
• Observation: multiple links share common
underlying patterns
• Diurnal variation should be similar across links
• Many anomalies will span multiple links
• The problem is one of pattern extraction in high
dimension
• Dimension = number of links

71
Example: Link Traces from a Single Network
Some have visible structure, some less so
72
High Dimensionality: A General Strategy
• Look for a low-dimensional representation
preserving the most important features of the data
• Often, a high-dimensional structure may be
explainable in terms of a small number of
independent variables
• Commonly used tool: Principal Component Analysis
(PCA)

73
Principal Component Analysis
For any given dataset, PCA finds a new coordinate
system that maps the maximum variability in the data
to a minimum number of coordinates. The new axes are
called Principal Axes or Principal Components.
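A stdlib sketch of this on a time-by-links matrix (power iteration for the dominant component; the synthetic data, in which every link scales a shared diurnal pattern, is illustrative):

```python
import math
import random

def first_pc(rows, iters=200):
    """Dominant eigenvector of the sample covariance via power iteration.
    rows: one observation per time step (a list of per-link values)."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    x = [[r[j] - means[j] for j in range(dim)] for r in rows]  # mean-center
    v = [1.0] * dim
    for _ in range(iters):
        # w = (X^T X) v, computed as X^T (X v)
        proj = [sum(xi[j] * v[j] for j in range(dim)) for xi in x]
        w = [sum(proj[i] * x[i][j] for i in range(len(x))) for j in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # fraction of total variance captured along v
    captured = sum(sum(xi[j] * v[j] for j in range(dim)) ** 2 for xi in x)
    total = sum(sum(c * c for c in xi) for xi in x)
    return v, captured / total

rng = random.Random(5)
hours, links = 168, 5
pattern = [math.sin(2 * math.pi * t / 24.0) for t in range(hours)]
gains = [1.0 + 0.5 * j for j in range(links)]  # each link scales the shared pattern
rows = [[gains[j] * pattern[t] + 0.05 * rng.gauss(0.0, 1.0) for j in range(links)]
        for t in range(hours)]
v, frac = first_pc(rows)  # one component explains nearly all the variance
```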
74
Correlation in Space, not Time
75
X V S^-1 = U, i.e., X = U S V^T
76
Singular Value Decomposition
• X = U S V^T is the
singular value decomposition of X

77
Singular values indicate the energy attributable
to each principal component
78
A plot of the singular values reveals how much
energy is captured by each PC. A sharp elbow
indicates that most of the energy is captured by
5-10 singular values, for all datasets.
79
Implications of Low Intrinsic Dimensionality
• Apparently, we can reconstruct X with high
accuracy, keeping only a few columns of U
• A form of lossy data compression
• Even more, a way of extracting the most
significant part of X, automatically
• signal + noise?

X = U S V^T
80
81
82
83
Components 1-5
Components 6-10
All the rest
84
Anomaly Detection
Single-link approach: Use wavelets to detrend
each flow in isolation. [Barford, IMW '02]
Multi-link approach: Separate normal from anomalous
traffic by choosing only certain principal components.
X = X_normal + X_anomalous
85
PCA-based anomaly detection
L2 norm of the entire traffic vector X
L2 norm of the residual vector
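A stdlib sketch of the subspace idea (illustrative only: the "normal" direction is hard-coded as all links moving together, and the anomaly is injected by hand): project each time step's link-traffic vector onto the normal subspace, and flag time steps where the residual norm spikes.

```python
import math

def residual_norms(rows, v):
    """||x - (x·v)v|| per time step: energy outside the normal subspace span{v}."""
    out = []
    for x in rows:
        dot = sum(a * b for a, b in zip(x, v))
        out.append(math.sqrt(sum((a - dot * b) ** 2 for a, b in zip(x, v))))
    return out

links = 4
v = [1.0 / math.sqrt(links)] * links  # assumed normal direction: links move together
rows = [[5.0 + math.sin(2 * math.pi * t / 24.0)] * links for t in range(100)]
# inject an anomaly: a spike on link 2 at t = 60 only
rows[60] = [r + (8.0 if j == 2 else 0.0) for j, r in enumerate(rows[60])]

norms = residual_norms(rows, v)
suspect = max(range(len(norms)), key=lambda t: norms[t])  # the injected time step
```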
86
Traffic Forecasting
Single-link approach: Model each link's
timeseries independently. Use wavelets to
extract trends; build timeseries forecasting
models on the trends. [Papagiannaki, INFOCOM '03]
Multi-link approach: Build forecasting models on the
most significant eigenlinks as trends. Allows
simultaneous examination and forecasting for all links.
X = U S V^T
87
Conclusions
• Whew!
• Bibliography on handout
• Traffic analysis methods vary considerably
depending on
• Timescale
• Stationarity

88
Conclusions
• Performance Evaluation
• Marginals
• Watch out for heavy tails
• Correlation (in time)
• Watch out for LRD / Self Similarity
• Network Engineering
• Signal + Noise