Time Series from their Observed Sums: Network Tomography

About This Presentation
Title:

Time Series from their Observed Sums: Network Tomography

Description:

The distinct OD flows, components of λ(t), are assumed to be independent. Use EM algorithm ... Gamma and log-Normal OD flows (Xt) ...

Slides: 54
Provided by: edoardo
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Time Series from their Observed Sums: Network Tomography
  • Edoardo M. Airoldi
  • School of Computer Science
  • Carnegie Mellon University
  • (joint work with Christos Faloutsos)

SIGKDD, Seattle, WA, August 23, 2004
2
Acknowledgements
  • Srinivasan Seshan, CSD, CMU
  • Russel Yount and Frank Kietzke, Network
    Development, CMU
  • Stephen Fienberg, Statistics, CMU
  • Jin Cao, Bell Labs
  • Claudia Tebaldi, NCAR
  • Yin Zhang, AT&T Labs

3
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Conclusions

4
Application Domains
  • Communication Networks
  • Goal: who is sending to whom
  • Refs: Cao et al. (2001), Liang & Yu (2003),
    Zhang et al. (2004)
  • Transportation Networks
  • Goal: who is going where
  • Network Probing (Rish et al., IBM)
  • Goal: which server is down
  • Refs: Rish et al. (2002, 2004)

5
Communication Networks
  • A large ISP network has 100s of nodes, 1000s of
    links, 10,000s of routes, and over 1 petabyte
    ($10^{15}$ bytes) of traffic per day

OD flows
  • Reliability analysis
  • Predict link loads under unexpected/planned
    router/link failures
  • Traffic engineering
  • Optimize routes to minimize congestion
  • Capacity planning
  • Forecast future capacity requirements

link loads
6
Mathematical Formulation
[Figure: situation at time t, showing OD flows X = (X1, X2, X3, X4), a LINK, and the observed load Y.]
One constraint: total $\sum_i Y_i = 0$


7
Problem Definition
Given: topology, fixed routing scheme $A_{n \times m}$, and
traffic on the links of the network $Y(t) = (Y_1(t), \dots, Y_n(t))$
over time $t = 1, \dots, T$.
Find: non-observable traffic between origin-destination (OD) pairs
$X(t) = (X_1(t), \dots, X_m(t))$ over time $t = 1, \dots, T$.
$Y(t) = A X(t)$
Under-constrained
8
A Glance at the Data
Find: OD flows X(t) (unknown)
X1(t1) X2(t1) X3(t1) X4(t1)
X1(t2) X2(t2) X3(t2) X4(t2)
X1(t3) X2(t3) X3(t3) X4(t3)
X1(t4) X2(t4) X3(t4) X4(t4)
Measure: link flows Y(t)
Y1(t1) Y2(t1) Y3(t1)
Y1(t2) Y2(t2) Y3(t2)
Y1(t3) Y2(t3) Y3(t3)
Y1(t4) Y2(t4) Y3(t4)
[Figure: traffic in Kb over time (hour of the day).]
9
Our Problem: No Traffic Matrix
  • Traffic matrix
  • Gives traffic volumes between origin and
    destination
  • Very difficult to directly measure
  • Direct measurement (Feldmann et al., 2000)
  • Collect flow-level data around the whole edge of
    the network
  • Combine with routing data
  • Semi-standard router feature: NetFlow
  • Cisco, Juniper, etc.
  • Not always well supported
  • Potential performance impact on routers
  • Huge amount of data (500GB/day)
  • Widely available SNMP data gives only link loads
  • Even this data is not perfect (glitches, loss, ...)

10
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Conclusions

11
Infinite Exact Solutions
  • Measurements ($Y_t$) and routing scheme $A_{3 \times 4}$ allow
    for many feasible OD flows ($X_t$)
  • For example

The problem is under-constrained and we need some
assumptions
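
A minimal sketch of this non-uniqueness, assuming a hypothetical 3x4 routing matrix and made-up OD flows (neither is taken from the slides): two different non-negative OD vectors reproduce exactly the same link loads.

```python
import numpy as np

# Hypothetical 3x4 routing matrix (an assumption for illustration only).
# Columns: OD flows a->a, a->b, b->a, b->b. Rows: three measured links.
A = np.array([
    [1, 1, 0, 0],  # everything leaving origin a
    [0, 0, 1, 1],  # everything leaving origin b
    [1, 0, 1, 0],  # everything arriving at destination a
], dtype=float)

x_true = np.array([4.0, 1.0, 2.0, 3.0])  # made-up OD flows
y = A @ x_true                           # observed link loads

# With full_matrices=True (the default), the last row of vt spans the
# null space of A when A has rank 3.
_, _, vt = np.linalg.svd(A)
null_dir = vt[-1]

# Moving along the null space changes X but leaves Y untouched.
x_alt = x_true + 1.0 * null_dir

print("rank(A) =", np.linalg.matrix_rank(A))
print("x_true  =", x_true, "-> y =", A @ x_true)
print("x_alt   =", x_alt, "-> y =", A @ x_alt)
```

Any multiple of the null-space direction that keeps all entries non-negative gives yet another feasible solution, which is why extra assumptions are needed.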
12
Related Work
$y = Ax$
  • Solutions in the past
  • Direct solution: SVD
  • Scoring criterion: GLS, maximum likelihood,
    entropy, Bayesian methods, ...
  • Regularization: assume independent OD flows
  • Estimate OD flows $x_t$ using the observations in a window around $y_t$
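
As a rough illustration of the "direct solution" route, the sketch below uses the SVD-based pseudo-inverse to pick the minimum-norm solution of $y = Ax$. The routing matrix and link loads are hypothetical, chosen only to make the example concrete.

```python
import numpy as np

# Hypothetical routing matrix and link loads (assumptions, not from the slides).
A = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
], dtype=float)
y = np.array([5.0, 5.0, 6.0])

# np.linalg.pinv computes the Moore-Penrose pseudo-inverse via the SVD.
# For an under-determined system it returns the solution of smallest
# Euclidean norm among all x with A x = y.
x_min_norm = np.linalg.pinv(A) @ y

print(x_min_norm)      # approximately [3. 2. 3. 2.]
print(A @ x_min_norm)  # reproduces y
```

The minimum-norm answer is feasible, but nothing in the data says it is the true set of OD flows; it is only one point on the line of exact solutions.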

13
Pitfalls of Past Approaches
  • Unrealistic Models
  • Gaussian or Poisson OD traffic flows. But we
    observe bursty, log-Normal traffic flows.
  • Time Dependence across Epochs
  • Never explicitly addressed: $x_t$ is typically
    assumed independent over time. But we observe
    time dependence of single OD flows.

14
Empirical Laws: log-Normality
  • Aggregate OD flows look log-log Normal

[Figure: two histograms of counts, one against log bytes and one against log-log bytes. 12321 OD time series, CMU validation data.]
15
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Method
  • 1st Stage - Linear Dynamical Systems
  • 2nd Stage - Bayesian Dynamical Systems
  • Results
  • Conclusions

16
The Model
  • A smooth average process $\{\lambda_t : t > 0\}$
  • A possibly bursty process $\{x_t : t > 0\}$ to model
    the OD traffic flows

17
Parameter Estimation
  • Estimate parameters underlying the average
    process $\{\lambda_t : t > 0\}$
  • Calibrate priors for the parameters driving the
    dynamics of the OD flows process $\{x_t : t > 0\}$
  • Estimate the OD flows using a Particle Filter
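
The slides do not show the filter itself. As a rough sketch of the idea, here is a generic bootstrap particle filter on a scalar toy model. The multiplicative log-Normal state step, the Gaussian observation noise, and all parameter values are assumptions for illustration, not the model from the talk.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles=500, sigma=0.2, obs_sd=1.0, seed=0):
    """Filtered means for a toy model (all modelling choices are assumptions):
    x_t = x_{t-1} * exp(sigma * eps_t),  y_t = x_t + obs_sd * eta_t."""
    rng = np.random.default_rng(seed)
    # Initial particle cloud: log-Normal, centred near the first observation.
    start = np.log(max(ys[0], 1e-6))
    particles = rng.lognormal(mean=start, sigma=1.0, size=n_particles)
    estimates = []
    for y in ys:
        # 1. Propagate each particle through the multiplicative dynamics.
        particles = particles * rng.lognormal(mean=0.0, sigma=sigma, size=n_particles)
        # 2. Weight by the Gaussian observation likelihood (in the log domain
        #    to avoid underflow), then normalise.
        logw = -0.5 * ((y - particles) / obs_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # 3. Filtered estimate: weighted mean of the particles.
        estimates.append(np.sum(w * particles))
        # 4. Resample to concentrate particles where the weight is.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)

# Made-up observations with one burst.
ys = np.array([10.0, 11.0, 9.0, 30.0, 12.0])
est = bootstrap_particle_filter(ys)
print(est)
```

Because the state step is multiplicative and log-Normal, particles stay positive and can follow a sudden burst, which a Gaussian random walk with small variance would smooth away.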

18
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Method
  • 1st Stage - Linear Dynamical Systems
  • 2nd Stage - Bayesian Dynamical Systems
  • Results
  • Conclusions

19
Introducing Time Dependence
  • We introduce explicit time dependence
  • $\lambda(t) = F_{n \times n} \cdot \lambda(t-1) + \epsilon(t)$
  • The distinct OD flows, components of $\lambda(t)$, are
    assumed to be independent
  • Use EM algorithm
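
A minimal sketch of the forward (Kalman filter) pass for a linear dynamical system of this form. The observation equation $y(t) = A\lambda(t) + \text{noise}$, the Gaussian noise, and every numerical value below are assumptions; the slides give only the state equation. In an EM scheme, moments such as these (after smoothing) would feed the E-step, and the M-step, not shown, would re-estimate the parameters.

```python
import numpy as np

def kalman_filter(ys, A, F, Q, R, m0, P0):
    """Filtered means for lam(t) = F lam(t-1) + e(t), y(t) = A lam(t) + eps(t),
    with e ~ N(0, Q) and eps ~ N(0, R). The observation model is an assumption."""
    m, P = m0, P0
    means = []
    for y in ys:
        # Predict the state one step ahead.
        m_pred = F @ m
        P_pred = F @ P @ F.T + Q
        # Update with the new link loads.
        S = A @ P_pred @ A.T + R             # innovation covariance
        K = P_pred @ A.T @ np.linalg.inv(S)  # Kalman gain
        m = m_pred + K @ (y - A @ m_pred)
        P = P_pred - K @ A @ P_pred
        means.append(m)
    return np.array(means)

# Toy usage with a hypothetical 3x4 routing matrix and simulated data.
rng = np.random.default_rng(0)
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]], dtype=float)
n = A.shape[1]
F = 0.9 * np.eye(n)        # diagonal F: independent OD components
Q = np.eye(n)
R = 1e-6 * np.eye(A.shape[0])

lam = np.full(n, 10.0)
ys = []
for _ in range(20):
    lam = F @ lam + rng.normal(size=n)
    ys.append(A @ lam)
ys = np.array(ys)

means = kalman_filter(ys, A, F, Q, R, m0=np.full(n, 10.0), P0=np.eye(n))
print(means[-1], "->", A @ means[-1], "vs", ys[-1])
```

With almost no observation noise, the filtered means reproduce the link loads nearly exactly; the dynamics then decide which of the many feasible solutions is preferred.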

20
Introducing Time Dependence
  • Our Linear Dynamical System contains the models
    by Cao et al. as a special case

21
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Method
  • 1st Stage - Linear Dynamical Systems
  • 2nd Stage - Bayesian Dynamical Systems
  • Results
  • Conclusions

22
Bayesian Dynamical System
  • Gamma and log-Normal OD flows (Xt)
  • Use preliminary estimates of $\{\lambda_t : t > 0\}$, the
    average OD flows, to softly constrain the
    dynamical behavior of the OD flows to identify
    the correct solution for $X_t$

23
Non-Deterministic Dynamics
  • Introduce explicit non-deterministic dynamics (F)
    on the average OD flows
  • $\lambda(t+1) = F_{n \times n}\,\lambda(t)$
  • Diagonal matrix $F_{n \times n}$ with $F_{i,i} \sim$ log-Normal
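
A short simulation sketch of these dynamics, assuming made-up starting values and a made-up log-Normal scale for the diagonal of F. It shows that multiplying by log-Normal factors is the same as a Gaussian random walk on the log scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 50

lam = np.empty((T, n))
lam[0] = [10.0, 20.0, 5.0, 40.0]  # made-up starting averages
F_diags = np.empty((T - 1, n))

for t in range(T - 1):
    # Draw the diagonal of F: independent log-Normal factors
    # (mean and sigma on the log scale are assumptions).
    F_diags[t] = rng.lognormal(mean=0.0, sigma=0.1, size=n)
    # lambda(t+1) = F lambda(t); for diagonal F this is elementwise.
    lam[t + 1] = F_diags[t] * lam[t]

# On the log scale the increments are exactly log F, i.e. Gaussian.
log_increments = np.diff(np.log(lam), axis=0)
print(np.allclose(log_increments, np.log(F_diags)))  # True
```

A consequence of the multiplicative form is that the simulated averages can never become negative.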

24
Learning Latent Dynamics
  • We want a preliminary estimate for $F_t$ in
  • $\lambda_{t+1} = F_{t+1} \cdot \lambda_t$

[Figure: given $P(\lambda_{246} \mid Y_{246})$ and $P(\lambda_{247} \mid Y_{247})$, solve for $F_{247}$.]
25
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Datasets
  • Importance of Time Dependence
  • Importance of non-Gaussianity
  • Informative Priors for non-Gaussian BDS
  • Conclusions

26
Validation Data sets
  • Consider star network topologies
  • 4 OD flows, 9 OD flows and 16 OD flows
  • Carnegie Mellon: 12321 time series
  • Lucent Technologies: 32 time series

[Figure: situation at time t, showing OD flows X = (X1, X2, X3, X4), a LINK, and the observed load Y.]
27
Log-Normal OD Traffic Flows
  • The validation OD traffic flows are skewed on
    both data sets

28
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Datasets
  • Importance of Time Dependence
  • Importance of non-Gaussianity
  • Informative Priors for non-Gaussian BDS
  • Conclusions

29
Reduce Variability
  • Narrower range of possible values for the OD
    traffic flows: those which receive positive
    posterior probability

30
Robust Estimates
  • Capture sharp changes in the distribution of the
    OD traffic flows

31
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Datasets
  • Importance of Time Dependence
  • Importance of non-Gaussianity
  • Informative Priors for non-Gaussian BDS
  • Conclusions

32
Capture Several Bursts
[Figure: traffic in Kb over time.]
33
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Datasets
  • Importance of Time Dependence
  • Importance of non-Gaussianity
  • Informative Priors for non-Gaussian BDS
  • Conclusions

34
Priors and Bayesian inference
  • Informative priors on $\{\lambda_t : t > 0\}$ lead to
    uni-modal posteriors

35
Speed and Scalability
  • The computing time is about 3 minutes
  • 4 OD - 3 Links using R on Mac G4 667
  • Linear in the number of OD flows for each time point
  • 1 day's worth of data in 45 minutes

36
Model Comparison
37
Numerical Comparison
$\ell_2$
38
Outline
  • Introduction / Motivation
  • Survey
  • Proposed Methods
  • Results
  • Conclusions

39
Past Approaches
  • Unreasonable Models
  • Gaussian or Poisson arrivals
  • Time Dependence
  • never explicitly addressed

40
Conclusions
  • Log-Normal models account for skewed and bursty,
    non-observable OD flows
  • Novel BDS captures time dependence of data thus
    reducing the variability of the estimates
  • Informative priors serve as soft constraints to
    overcome the under-determinacy of the problem

41
Future Work
  • More tests on bigger networks
  • from 2-star (4-D) to 4-star (16-D)
  • Fit non-parametric seasonal components for the
    non-observable OD flows

42
  • BACK - UP

43
Network Engineering
  • State of the art: guess and tweak
  • Guess based on experience and intuition
  • Manually tweak things, and hope for the best
  • Disadvantages
  • Manual process: time consuming, error prone
  • Not very reliable: intuition may be wrong,
    unexpected side effects
  • Suboptimal performance: wastes resources/time
  • Need to repeat the exercise when traffic pattern
    changes

44
A More Scientific Approach?
A: "Well, we don't know the topology, we don't
know the traffic matrix, the routers don't
automatically adapt the routes to the traffic,
and we don't know how to optimize the routing
configuration. But, other than that, we're all
set!" (Rexford 2000, Kurose 2003)
45
Contributions
  • Realistic models: Gamma and log-Normal
  • $P(\text{OD Flows}(t) \mid \lambda(t))$
  • Explicit time dependence
  • $E(\text{OD Flows}(t) \mid y(t), \dots, y(1))$

46
Contributions
  • Informative priors in a Bayesian Dynamical System
    for an under-constrained problem
  • Drive our inferences to the correct solution
  • Get high quality particles
  • Easy solution for Sparse Traffic

47
Exploring the OD space
  • Gibbs sampler with Metropolis steps is able to
    explore $P(X_t \mid Y_t)$
  • We prove irreducibility of the chains
  • Gamma, log-Normal

[Figure: two regions of the OD space where $P(X_t \mid Y_t) > 0$, separated by a region where $P(X_t \mid Y_t) = 0$.]
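
A toy sketch of exploring the feasible OD space, using a plain random-walk Metropolis sampler on the one free coordinate rather than the authors' Gibbs sampler with Metropolis steps. The routing matrix, link loads, log-Normal prior parameters, and step size are all assumptions. Proposals that make any OD flow non-positive have zero posterior probability and are always rejected.

```python
import numpy as np

# Hypothetical routing matrix and link loads (assumptions).
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]], dtype=float)
y = np.array([5.0, 5.0, 6.0])
A1, A2 = A[:, :3], A[:, 3:]  # A1 is square and invertible here

def complete(x2):
    """Full OD vector implied by the free coordinate x2 and A x = y."""
    x1 = np.linalg.solve(A1, y - A2 @ x2)
    return np.concatenate([x1, x2])

def log_prior(x, mu=1.0, sigma=1.0):
    """Independent log-Normal log-density (up to a constant); -inf if any x <= 0."""
    if np.any(x <= 0):
        return -np.inf
    return float(np.sum(-np.log(x) - 0.5 * ((np.log(x) - mu) / sigma) ** 2))

def metropolis(n_steps=2000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x2 = np.array([3.0])  # feasible starting point
    x = complete(x2)
    lp = log_prior(x)
    samples = []
    for _ in range(n_steps):
        prop2 = x2 + step * rng.normal(size=x2.shape)
        prop = complete(prop2)
        lp_prop = log_prior(prop)
        # Symmetric proposal: accept with probability min(1, ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x2, x, lp = prop2, prop, lp_prop
        samples.append(x)
    return np.array(samples)

samples = metropolis()
print(samples.mean(axis=0))
```

Every retained sample satisfies the link-load constraint exactly and is strictly positive, so the chain only ever visits the region with positive posterior probability.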
48
Non-Deterministic Dynamics
  • Introduce explicit non-deterministic dynamics (F)
    on the average OD flows
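  • $\lambda(t+1) = F_{n \times n}\,\lambda(t)$
  • Diagonal matrix $F_{n \times n}$ with $F_{i,i} \sim$ log-Normal
    leads to
  • $\lambda(t+1) = F\lambda(t) \iff e^{\log\lambda(t+1)} = e^{\log F} e^{\log\lambda(t)} \iff \log\lambda(t+1) = \log F + \log\lambda(t)$ (componentwise)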

49
Better OD Flows in 4 Steps
[Figure: the four steps, numbered 1 to 4.]
50
Immanuel Kant o(1)
  • In making inferences on non-observable quantities
    we find the model we look for!
  • Assume a model that reasonably approximates real
    OD flows, and of course it does not hurt to have
    a prior opinion about it

51
Learning OD Flows
  • Typical solutions are based on
  • Generalized Least Squares
  • Maximum Likelihood
  • Bayesian methods
  • Entropy
  • These methods generate one set of OD flows X from
    multiple observations Y1,..,YT. In general

$\max_X \; p\,D_1(X, X_{obs}) + q\,D_2(Y, Y_{obs})$ s.t.
$Y = AX$, $X \geq 0$, $p, q \in [0,1]$ fixed
Random
52
Intrinsic Dimensionality
  • The routing matrix A has m rows < n columns, and
    its m rows are linearly independent
  • The space $\mathbb{R}^n$, where the OD flows live, can be
    decomposed into a sub-space $\mathbb{R}^{n-m}$ with an open
    interior, and a degenerate sub-space $\mathbb{R}^m$

It is possible to rearrange $A = [A_1, A_2]$ and
$X = [X_1, X_2]$ accordingly, so that given $X_2 \in \mathbb{R}^{n-m}$,
$X_1 = A_1^{-1}(Y - A_2 X_2) \in \mathbb{R}^m$
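
A small numerical sketch of this decomposition, assuming a hypothetical 3x4 routing matrix whose first three columns happen to be invertible (so no column rearrangement is needed) and made-up link loads. Choosing the free part $X_2$ pins down $X_1$.

```python
import numpy as np

# Hypothetical routing matrix with m = 3 rows and n = 4 columns (an assumption).
A = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]], dtype=float)
y = np.array([5.0, 5.0, 6.0])  # made-up link loads

m, n = A.shape
A1, A2 = A[:, :m], A[:, m:]  # A = [A1, A2], with A1 square (m x m)

def solve_x1(x2):
    """X1 = A1^{-1} (Y - A2 X2), computed without forming the inverse."""
    return np.linalg.solve(A1, y - A2 @ x2)

# Two choices of the free (n - m)-dimensional part give two exact solutions.
for x2 in (np.array([3.0]), np.array([2.5])):
    x = np.concatenate([solve_x1(x2), x2])
    print("X2 =", x2, "-> X =", x, "-> A X =", A @ x)
```

Only the $n - m = 1$ free coordinate needs to be explored; the remaining coordinates follow deterministically from the link loads.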
53
Doubly Stochastic BDS