Network Traffic Modeling

- Mark Crovella
- Boston University Computer Science

Outline of Day

- 900 1045 Lecture 1, including break
- 1045 1200 Exercise Set 1
- 1200 1330 Lunch
- 1330 1515 Lecture 2, including break
- 1515 1700 Exercise Set 2

The Big Picture

- There are two main uses for Traffic Modeling
- Performance Analysis
- Concerned with questions such as delay,

throughput, packet loss. - Network Engineering and Management
- Concerned with questions such as capacity

planning, traffic engineering, anomaly detection.

- Some principal differences are that of timescale

and stationarity.

Relevant Timescales

Network Engineering effects happen on long

timescales from an hour to months

Performance effects happen on short

timescales from nanoseconds up to an hour

1 hour

1 day

1 week

1 usec

1 sec

Stationarity, informally

- A stationary process has the property that the

mean, variance and autocorrelation structure do

not change over time. - Informally we mean a flat looking series,

without trend, constant variance over time, a

constant autocorrelation structure over time and

no periodic fluctuations (seasonality).

NIST/SEMATECH e-Handbook of Statistical Methods

http//www.itl.nist.gov/div898/handbook/

The 1-Hour / Stationarity Connection

- Nonstationarity in traffic is primarily a result

of varying human behavior over time - The biggest trend is diurnal
- This trend can usually be ignored up to

timescales of about an hour, especially in the

busy hour

Outline

- Morning Performance Evaluation
- Part 0 Stationary assumption
- Part 1 Models of fine-timescale behavior
- Part 2 Traffic patterns seen in practice
- Afternoon Network Engineering
- Models of long-timescale behavior
- Part 1 Single Link
- Part 2 Multiple Links

Morning Part 1Traffic Models for Performance

Evaluation

- Goal Develop models useful for
- Queueing analysis
- eg, G/G/1 queues
- Other analysis
- eg, traffic shaping
- Simulation
- eg, router or network simulation

A Reasonable Approach

- Fully characterizing a stochastic process can be

impossible - Potentially infinite set of properties to capture
- Some properties can be very hard to estimate
- A reasonable approach is to concentrate on two

particular properties - marginal distribution and autocorrelation

Marginals and Autocorrelation

- Characterizing a process in terms of these two

properties gives you - a good approximate understanding of the process,
- without involving a lot of work,
- or requiring complicated models,
- or requiring estimation of too many parameters.
- Hopefully!

Marginals

- Given a stochastic process XXi, we are

interested in the distribution of any Xi - i.e., f(x) P(Xix)
- Since we assume X is stationary, it doesnt

matter which Xi we pick. - Estimated using a histogram

Histograms and CDFs

- A Histogram is often a poor estimate of the pdf

f(x) because it involves binning the data - The CDF F(x) PXi lt x will have a point for

each distinct data value can be much more

accurate

Modeling the Marginals

- We can form a compact summary of a pdf f(x) if we

find that it is well described by a standard

distribution eg - Gaussian (Normal)
- Exponential
- Poisson
- Pareto
- Etc
- Statistical methods exist for
- asking whether a dataset is well described by a

particular distribution - Estimating the relevant parameters

Distributional Tails

- A particularly important part of a distribution

is the (upper) tail - PXgtx
- Large values dominate statistics and performance
- Shape of tail critically important

Light Tails, Heavy Tails

- Light Exponential or faster decline
- Heavy Slower than any exponential

f1(x) 2 exp(-2(x-1))

f2(x) x-2

Examining Tails

- Best done using log-log complementary CDFs
- Plot log(1-F(x)) vs log(x)

1-F2(x)

1-F1(x)

Heavy Tails Arrive

- pre-1985 Scattered measurements note high

variability in computer systems workloads - 1985 1992 Detailed measurements note long

distributional tails - File sizes
- Process lifetimes
- 1993 1998 Attention focuses specifically on

(approximately) polynomial tail shape heavy

tails - post-1998 Heavy tails used in standard models

Power Tails, Mathematically

We say that a random variable X is power tailed

if

where a b means

Focusing on polynomial shape allows Parsimonious

description Capture of variability in a parameter

A Fundamental Shift in Viewpoint

- Traditional modeling methods have focused on

distributions with light tails - Tails that decline exponentially fast (or faster)
- Arbitrarily large observations are vanishingly

rare - Heavy tailed models behave quite differently
- Arbitrarily large observations have

non-negligible probability - Large observations, although rare, can dominate a

systems performance characteristics

Heavy Tails are Surprisingly Common

- Sizes of data objects in computer systems
- Files stored on Web servers
- Data objects/flow lengths traveling through the

Internet - Files stored in general-purpose Unix filesystems
- I/O traces of filesystem, disk, and tape activity
- Process/Job lifetimes
- Node degree in certain graphs
- Inter-domain and router structure of the Internet
- Connectivity of WWW pages
- Zipfs Law

Evidence Web File Sizes

Barford et al., World Wide Web, 1999

Evidence Process Lifetimes

- Harchol-Balter and Downey,
- ACM TOCS,
- 1997

The Bad News

- Workload metrics following heavy tailed

distributions are extremely variable - For example, for power tails
- When a ? 2, distribution has infinite variance
- When a ? 1, distribution has infinite mean
- In practice, empirical moments are slow to

converge or nonconvergent - To characterize system performance, either
- Attention must shift to distribution itself, or
- Attention must be paid to timescale of analysis

Heavy Tails in Practice

Power tailswith a0.8

Large observations dominate statistics (e.g.,

sample mean)

Autocorrelation

- Once we have characterized the marginals, we know

a lot about the process. - In fact, if the process consisted of i.i.d.

samples, we would be done. - However, most traffic has the property that its

measurements are not independent. - Lack of independence usually results in

autocorrelation - Autocorrelation is the tendency for two

measurements to both be greater than, or less

than, the mean at the same time.

Autocorrelation

Measuring Autocorrelation

- Autocorrelation Function (ACF) (assumes

stationarity) - R(k) Cov(Xn,Xnk)
- EXn Xnk E2X0

ACF of i.i.d. samples

How Does Autocorrelation Arise?

Network traffic is the superposition of flows

Request

Server

Client

Internet (TCP/IP)

click

Response

Why Flows? Sources appear to be ON/OFF

ON

OFF

P1

P2

P3

Superposition of ON/OFF sources ? Autocorrelation

P1

P2

P3

Morning Part 2 Traffic Patterns Seen in Practice

- Traffic patterns on a link are strongly affected

by two factors - amount of multiplexing on the link
- Essentially how many flows are sharing the

link? - Where flows are bottlenecked
- Is each flows bottleneck on, or off the link?
- Do all bottlenecks have similar rate?

Low Multiplexed Traffic

- Marginals highly variable
- Autocorrelation low

Highly MultiplexedTraffic

High Multiplexed, Bottlenecked Traffic

- Marginals tending to Gaussian
- Autocorrelation high

Highly Mutiplexed, Mixed-Bottlenecks

dec-pkt-1 (Internet Traffic Archive)

Alpha and Beta Traffic

- ON/OFF model revisited
- High variability in connection rates (RTTs)

Low rate beta

High rate alpha

stable Levy noise

fractional Gaussian noise

Rice U., SPIN Group

Long Range Dependence

Rk k-a 0 lt a lt 1

Rk a-k a gt 1

H1-a/2

Correlation and Scaling

- Long range dependence affects how variability

scales with timescale - Take a traffic timeseries Xn, sum it over blocks

of size m - This is equivalent to observing the original

process on a longer timescale - How do the mean and std dev change?
- Mean will always grow in proportion to m
- For i.i.d. data, the std dev will grow in

proportion to sqrt(m) - So, for i.i.d. data, the process is smoother at

longer timescale

Self-similarity unusual scaling of variability

- Exact self-similarity of a zero-mean, stationary

process Xn - H Hurst parameter 1/2 lt H lt 1
- H 1/2 for i.i.d. Xn
- LRD leads to (at least) asymptotic s.s.

Self Similarity in Practice

H0.95

H0.50

10ms

1s

100s

The Great Wave (Hokusai)

How Does Self-Similarity Arise?

Self-similarity ? Autocorrelation ? Flows

Autocorrelation declines like a power law

? Distribution of flow lengths has power law

tail

?

Power Tailed ON/OFF sources ? Self-Similarity

ON

OFF

P1

P2

P3

Measuring Scaling Properties

- In principle, one can simply aggregate Xn over

varying sizes of m, and plot resulting variance

as a function of m - Linear behavior on a log-log plot gives an

estimate of H (or a). - Slope gt -1 indicates LRD

WARNING this method is very sensitive to

violation of assumptions!

Better Wavelet-based estimation

Veitch and Abry

Optional Material PerformanceImplications of

Self-Similarity

Performance implications of S.S.

- Asymptotic queue length distribution (G/G/1)
- For SRD Traffic
- For LRD Traffic
- Severe - but, realistic?

Evaluating Self-Similarity

- Queueing Models like these are open systems
- delay does not feed back to source
- TCP dynamics are not being considered
- packet losses cause TCP to slow down
- Better approach
- Closed network, detailed modeling of TCP dynamics
- self-similarity traffic generated naturally

Simulating Self-Similar Traffic

- Simulated network with multiple clients, servers
- Clients alternative between requests, idle times
- Files drawn from heavy-tailed distribution
- Vary a to vary self-similarity of traffic
- Each request is simulated at packet level,

including detailed TCP flow control - Compare with UDP (no flow control) as an example

of an open system

Traffic Characteristics

- Self-similarity varies smoothly as function of a

Performance EffectsPacket Loss

Open Loop UDP

Closed Loop TCP

Performance EffectsTCP Throughput

Performance EffectsBuffer Queue Lengths

Open Loop UDP

Closed Loop TCP

Severity of Packet Delay

Performance Implications

- Self-similarity is present in Web traffic
- The internets most popular application
- For the Web, the causes of s.s. can be traced to

the heavy-tailed distribution of Web file sizes - Caching doesnt seem to affect things much
- Multimedia tends to increase tail weight of Web

files - But, even text files alone appear to be

heavy-tailed

Performance Implications (continued)

- Knowing how s.s. arises allows us to recreate it

naturally in simulation - Effects of s.s. in simulated TCP networks
- Packet loss not as high as open-loop models might

suggest - Throughput not a problem
- Packet delays are the big problem

Morning Lab Exercises

- For each dataset, explore its marginals
- Histograms
- CDFs, CCDFs,
- Log-log CCDFs to look at tails
- For each dataset, explore its correlations
- ACFs
- Logscale diagrams
- Compare to scrambled dataset
- Study performance
- Simple queueing
- Compare to scrambled dataset

Afternoon Network Engineering

- Moving from stationary domain to nonstationary
- Goal Traffic models that are useful for
- capacity planning
- traffic engineering
- anomaly / attack detection
- Two main variants
- Looking at traffic on a single link at a time
- Looking at traffic on multiple or all links in a

network simultaneously

Part 1 Single Link Analysis

- The general modeling framework that is most often

used is - signal noise
- Sometimes interested in the signal, sometimes the

noise - So, signal processing techniques are common
- Frequency Domain / Spectral Analysis
- Generally based on FFT
- Time-Frequency Analysis
- Generally based on wavelets

A Typical Trace

Notable features periodicity noisiness spikes

Capacity Planning

- Here, mainly interested in the signal
- Want to predict long-term trends
- What do we need to remove?
- Noise
- Periodicity

K. Papagiannaki et al., Infocom 2003

Periodicity Spectral Analysis

Denoising with Wavelets

Capacity Planning via Forecasting

Anomaly detection

- Goal models that are useful in detecting
- Network equipment failures
- Network misconfigurations
- Flash crowds
- Attacks
- Network Abuse

Misconfiguration detection via Wavelets

Traffic

High Freq.

Med Freq.

Low Freq.

P. Barford et al., Internet Measurement Workshop

2002

Flash Crowd Detection

Long Term Change in Mean Traffic (8 weeks)

Detecting DoS attacks

Afternoon Part 2 Multiple Links

- How to analyze traffic from multiple links?
- Clearly, could treat as a collection of single

links, and proceed as before - But, want more to detect trends and patterns

across multiple links - Observation multiple links share common

underlying patterns - Diurnal variation should be similar across links
- Many anomalies will span multiple links
- Problem is one of pattern extraction in high

dimension - Dimension is number of links

Example Link Traces from a Single Network

Some have visible structure, some less so

High Dimensionality A General Strategy

- Look for a low-dimensional representation

preserving the most important features of data - Often, a high-dimensional structure may

explainable in terms of a small number of

independent variables - Commonly used tool Principal Component Analysis

(PCA)

Principal Component Analysis

For any given dataset, PCA finds a new coordinate

system that maps maximum variability in the data

to a minimum number of coordinates New axes are

called Principal Axes or Components

Correlation in Space, not Time

Links

time

Traditional Frequency-Domain Analysis

PCA on Link Traffic

Links

time

XVS-1U XUSVT

Singular Value Decomposition

- XUSVT is the
- singular value decomposition of X

PCA on Link Traffic (2)

Singular values indicate the energy attributable

to a principal component

Each link is weighted sum of all eigenlinks

Low Intrinsic Dimensionalityof Link Data

A plot of the singular values reveals how much

energy is captured by each PC Sharp elbow

indicates that most of the energy captured by

5-10 singular values, for all datasets

Implications of Low Instrinsic Dimensionality

- Apparently, we can reconstruct X with high

accuracy, keeping only a few columns of U - A form of lossy data compression
- Even more, a way of extracting the most

significant part of X, automatically - signal noise?

XUSVT

Approximating With Top 5 Eigenlinks

Approximating With Top 5 Eigenlinks

Approximating With Top 5 Eigenlinks

A Link, Reconstructed

Link traffic

Components 1-5

Components 6-10

All the 100 rest

Anomaly Detection

Single Link Approach Use wavelets to detrend

each flow in isolation. BarfordIMW02 Multi

Link approach Detrend all links simultaneously

by choosing only certain principal components.

X X X

PCA based anomaly detection

L2 norm of entire traffic vector X

L2 norm of residualvector X

Traffic Forecasting

Single-Link approach Treat each flow

timeseries independently. Use wavelets to

extract trends. Build timeseries forecasting

models on trends. PapagiannakiINFOCOM03 M

ulti-Link approach Build forecasting models on

most significant eigenlinks as trends. Allows

simultaneous examination and forecasting for

entire ensemble of links.

XUSVT

Conclusions

- Whew!
- Bibliography on handout
- Traffic analysis methods vary considerably

depending on - Question being asked
- Timescale
- Stationarity

Conclusions

- Performance Evaluation
- Marginals
- Watch out for heavy tails
- Correlation (in time)
- Watch out for LRD / Self Similarity

- Network Engineering
- Signal Noise
- Single-link Frequency domain analysis
- Multi-link Exploit Spatial correlation

Afternoon Lab Exercises

- For each dataset,
- Perform PCA
- Assess the variance in each component
- Reconstruct using small number of components
- Time / Interest permitting,
- Analyze some of the single link timeseries using

wavelets (matlab wavelet toolbox)