1
Toward Validation and Control of Network Models
  • Michael Mitzenmacher
  • Harvard University

2
Internet Mathematics Articles Related to This Talk
  • The Future of Power Law Research
  • A Brief History of Generative Models for Power
    Law and Lognormal Distributions
3
Motivation: General
  • Network Science and Engineering is emerging as
    its own (sub)field.
  • NSF cross-cutting area starting this year.
  • Courses: Cornell (Easley/Kleinberg), U Penn
    (Kearns), many others.
  • For undergrads, not just grads!
  • In popular culture: books like Linked by
    Barabási or Six Degrees by Watts.
  • Other sciences: economics, biology, physics,
    ecology, linguistics, etc.
  • What has been, and what should be, the research
    agenda?

4
My (Biased) View
  • The 5 stages of networking research:
  • Observe: Gather data to demonstrate a behavior
    in a system. (Example: power law behavior.)
  • Interpret: Explain the importance of this
    observation in the system context.
  • Model: Propose an underlying model for the
    observed behavior of the system.
  • Validate: Find data to validate (and if
    necessary specialize or modify) the model.
  • Control: Design ways to control and modify the
    underlying behavior of the system based on the
    model.

5
My (Biased) View
  • In networks, we have spent a lot of time
    observing and interpreting behaviors.
  • We are currently very active in modeling.
  • Many, many possible models.
  • Perhaps easiest to write papers about.
  • We now need to put much more focus on
    validation and control.
  • We have been moving in this direction.
  • And these are specific areas where computer
    science has much to contribute!

6
Models
  • After observation, the natural step is to
    explain/model the behavior.
  • Outcome: lots of modeling papers.
  • And many models rediscovered.
  • Example: power laws.
  • Lots of history.

7
History
  • In the 1990s, the abundance of observed power
    laws in networks surprised the community.
  • Perhaps the community shouldn't have been
    surprised: power laws appear frequently
    throughout the sciences.
  • Pareto: income distribution, 1897.
  • Zipf-Auerbach: city sizes, 1913/1940s.
  • Zipf-Estoup: word frequency, 1916/1940s.
  • Lotka: bibliometrics, 1926.
  • Yule: species and genera, 1924.
  • Mandelbrot: economics/information theory, 1950s.
  • Observation/interpretation were/are key to
    initial understanding.
  • My claim: the mere existence of power laws
    should no longer be surprising, or necessarily
    even noteworthy.
  • My (biased) opinion: the bar should now be very
    high for observation/interpretation.

8
So Many Models
  • Preferential Attachment (sketched below)
  • Optimization (HOT)
  • Monkeys typing randomly (scaling)
  • Multiplicative processes
  • Kronecker graphs
  • Forest fire model (densification)
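
As an illustration of the first model on this list, here is a minimal Python
sketch (my own illustration, not from the talk) of preferential attachment:
each new node links to m existing nodes chosen with probability proportional
to current degree, and the resulting degree distribution falls off roughly as
a power law.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=1):
    """Grow an n-node graph; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    degree = Counter()
    repeated = []                  # each node appears once per unit of degree
    targets = list(range(m))       # the first new node attaches to nodes 0..m-1
    for new in range(m, n):
        for t in set(targets):     # collapse duplicate choices, for simplicity
            degree[new] += 1
            degree[t] += 1
            repeated += [new, t]
        # Sampling uniformly from `repeated` samples proportionally to degree.
        targets = [rng.choice(repeated) for _ in range(m)]
    return degree

deg = preferential_attachment(50000)
counts = Counter(deg.values())
for d in sorted(counts)[:8]:
    print(d, counts[d])            # frequencies fall off roughly as a power of d
```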

9
What Makes a Good Model?
  • New variations coming up all of the time.
  • Question: What makes a new network model
    sufficiently interesting to merit attention
    and/or publication?
  • Strong connection to an observed process.
  • Many models claim this, but few demonstrate it
    convincingly.
  • Theory perspective: significant new mathematical
    insight or sophistication.
  • A matter of taste?
  • My (biased) opinion: the bar should start being
    raised on model papers.

10
Validation: The Current Stage
  • We now have so many models.
  • It is important to know the right model, to
    extrapolate and control future behavior.
  • Given a proposed underlying model, we need tools
    to help us validate it.
  • We appear to be entering the validation stage of
    research. BUT the first steps have focused on
    invalidation rather than validation.

11
Examples: Invalidation
  • Lakhina, Byers, Crovella, Xie:
  • Show that the observed power law in Internet
    topology might be an artifact of biases in
    traceroute sampling.
  • Pedarsani, Figueiredo, Grossglauser:
  • Show that densification may also arise from
    sampling effects, and is not necessarily
    intrinsic to the network.
  • Chen, Chang, Govindan, Jamin, Shenker, Willinger:
  • Show that Internet topology has characteristics
    that do not match preferential-attachment graphs.
  • Suggest an alternative mechanism.
  • But does this alternative match all
    characteristics, or are we still missing some?

12
My (Biased) View
  • Invalidation is an important part of the process!
    BUT it is inherently different from validating a
    model.
  • Validating seems much harder.
  • Indeed, it is arguable what constitutes a
    validation.
  • Question: what should it mean to say,
    "This model is consistent with observed data"?

13
An Alternative View
  • There is no right model.
  • A model is the best available until some other
    model comes along and proves better.
  • Greedy refinement via invalidation in model
    space.
  • Statistical techniques: compare likelihood ratios
    for various models (see the sketch below).
  • My (biased) opinion: this is one useful
    approach, but not the end of the question.
  • Need methods other than comparison for confirming
    the validity of a model.
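
As one concrete version of the likelihood-ratio comparison mentioned above,
here is a minimal sketch (my illustration, not from the talk) that fits a
power law and a lognormal to the same data by maximum likelihood and compares
their log-likelihoods. The `degrees` list is placeholder data standing in for
real observations, and a careful comparison would truncate both fits at the
same xmin.

```python
import math

def loglik_powerlaw(xs, xmin):
    """MLE fit of a continuous power law p(x) ~ x^(-alpha) for x >= xmin."""
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1.0 + n / s
    ll = sum(math.log(alpha - 1) - math.log(xmin) - alpha * math.log(x / xmin)
             for x in xs)
    return ll, alpha

def loglik_lognormal(xs):
    """MLE fit of a lognormal distribution (not truncated at xmin)."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    ll = sum(-v - math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (v - mu) ** 2 / (2 * sigma ** 2) for v in logs)
    return ll, (mu, sigma)

# Placeholder data; in practice these would be observed degrees, file sizes, etc.
degrees = [2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9, 13, 21]
xmin = 2
ll_pl, alpha = loglik_powerlaw(degrees, xmin)
ll_ln, _ = loglik_lognormal(degrees)
print("power law alpha =", round(alpha, 2))
print("log-likelihood ratio (power law - lognormal) =", round(ll_pl - ll_ln, 2))
```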

14
Time-Series/Trace Analysis
  • Many models posit some sort of actions:
  • New pages linking to pages on the Web.
  • New routers joining the network.
  • New files appearing in a file system.
  • A validation approach: gather traces and see if
    the traces suitably match the model.
  • Trace gathering can be a challenging systems
    problem.
  • Checking the model match requires appropriate
    statistical techniques and tests (see the sketch
    below).
  • May lead to new, improved, better-justified
    models.
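
For example, if a model posits memoryless (Poisson) arrivals of new links or
files, the inter-arrival times in a trace should look exponential, and a
standard goodness-of-fit test can check this. A minimal sketch, assuming SciPy
is available and using synthetic data in place of a real trace (with estimated
parameters, the p-value is only approximate):

```python
import random
from scipy import stats

# Synthetic stand-in for a trace: inter-arrival times of some action, in seconds.
trace = [random.expovariate(0.2) for _ in range(500)]

# If the model posits memoryless (Poisson) arrivals, inter-arrival times
# should be exponential; a Kolmogorov-Smirnov test compares trace vs. model.
scale = sum(trace) / len(trace)              # MLE of the exponential mean
stat, pvalue = stats.kstest(trace, "expon", args=(0, scale))
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A very small p-value would be evidence against the posited arrival process.
```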

15
Sampling and Trace Analysis
  • Often, we cannot record all actions.
  • The Internet is too big!
  • Sampling:
  • Global: snapshots of the entire system at
    various times.
  • Local: record the actions of a sample of agents
    in the system.
  • Examples:
  • Snapshots of file systems: full systems vs.
    actions of individual users.
  • Router topology: Internet maps vs. changes at a
    subset of routers.
  • Question: how much/what kind of sampling is
    sufficient to validate a model appropriately?
  • Does this differ among models?

16
To Control
  • In many systems, intervention can impact the
    outcome.
  • Maybe not for earthquakes, but for computer
    networks!
  • Typical setting: individual agents acting in
    their own selfish interest. Agents can be given
    incentives to change behavior.
  • General problem: given a good model, determine
    how to change system behavior to optimize a
    global performance function.
  • Distributed algorithmic mechanism design.
  • Mix of economics/game theory and computer science.

17
Possible Control Approaches
  • Adding constraints, local or global:
  • Example: total space in a file system.
  • Example: preferential attachment, but with links
    limited by an underlying metric.
  • Adding incentives or costs:
  • Example: charges for exceeding soft disk quotas.
  • Example: payments for certain AS-level
    connections.
  • Limiting information:
  • Impact decisions by not letting everyone have a
    true view of the system.

18
My Related Work: Hash Algorithms
  • On the Internet, we need a measurement and
    monitoring infrastructure, for validation and
    control.
  • Approximate is fine; speed is key.
  • Must be general, multi-purpose.
  • Must allow data aggregation.
  • Solution: a hash-based architecture.
  • Eventual goal: every router has a programmable
    hash engine.

19
Vision
  • Three-pronged research:
  • Low: efficient hardware implementations of
    relevant algorithms and data structures.
  • Medium: new and improved data structures and
    algorithms for old and new applications.
  • High: a distributed infrastructure supporting
    monitoring and measurement schemes.

20
The High-Level Pitch
  • Lots of hash-based schemes being designed for
    approximate measurement/monitoring tasks.
  • But not built into the system to begin with.
  • Want a flexible router architecture that allows:
  • New methods to be easily added.
  • Distributed cooperation using such schemes.

21
What We Need
  [Architecture diagram listing the needed components: on-chip memory,
  off-chip memory, and CAM(s); a hashing computation unit and a unit for
  other computation, with a programming language; a control system; and a
  communication architecture.]
22
Lots of Design Questions
  • How much space for various memory levels? How to
    dynamically divide memory among competing
    applications?
  • What hash functions should be included? Openness
    to new hash functions?
  • What programming language and functionality?
  • What communication infrastructure?
  • Security?
  • And so on.

23
Which Hash Functions?
  • Theorists:
  • Want analyzable hash functions.
  • Dislike standard assumption of perfectly random
    hash functions.
  • Hard to prove things about actual performance.
  • Practitioners:
  • Want easy implementation, speed, small space.
  • Want simple analysis (back-of-the-envelope).
  • Will accept simulated results under right
    settings.

24
Why Do Weak Hash Functions Work So Well?
  • In reality, assuming perfectly random hash
    functions seems to be the right thing to do.
  • Easier to analyze.
  • Real systems almost always work that way, even
    with weak hash functions!
  • Can theory explain the strong performance of
    weak hash functions?

25
Recent Work
  • A new explanation (joint work with Salil Vadhan):
  • Choosing a hash function from a pairwise
    independent family is enough if the data has
    sufficient entropy (see the sketch below).
  • The randomness of the hash function and the data
    combine.
  • Behavior matches a truly random hash function
    with high probability.
  • Techniques based on the theory of randomness
    extraction.
  • Extensions of the Leftover Hash Lemma.
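
For concreteness, here is a textbook pairwise-independent family,
h(x) = ((a*x + b) mod p) mod m with random a, b and a large prime p (my
illustration, not code from the work above). The result says that, on data
with enough entropy, a function chosen this way behaves like a truly random
hash function with high probability.

```python
import random

# Classic construction: pick a prime p larger than the key universe; the family
# {x -> (a*x + b) mod p} is pairwise independent, and the final reduction mod m
# is close to uniform when p is much larger than m.
P = (1 << 61) - 1                      # a Mersenne prime, comfortably large

def make_hash(m, rng=random):
    """Draw one hash function, mapping integer keys into m buckets."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

h = make_hash(1 << 16)                 # a table with 65536 buckets
print(h(12345), h(67890))
```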

26
What Functionality?
  • Hash tables should be a basic primitive.
  • Best hash tables: cuckoo hashing.
  • Worst-case constant lookup time.
  • Simple to build, design.
  • How can we make them even better?
  • Move cuckoo hashing from theory to practice!

27
Cuckoo Hashing [Pagh, Rodler]
  • Basic scheme: each element gets two possible
    locations.
  • To insert x, check both locations for x. If one
    is empty, insert.
  • If both are full, x kicks out an old element y.
    Then y moves to its other location.
  • If that location is full, y kicks out z, and so
    on, until an empty slot is found (see the sketch
    below).
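
A minimal Python sketch of the scheme just described: two tables, two hash
functions, and displacement on collision. This is my illustration, not the
authors' implementation; the MAX_KICKS cutoff stands in for the failure
handling discussed on the following slides.

```python
import random

class CuckooHash:
    """Sketch of cuckoo hashing: two tables, two hash functions; every stored
    key sits in one of its two candidate slots, so lookups probe two places."""
    MAX_KICKS = 500                          # displacement limit before giving up

    def __init__(self, size):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self.seeds = (random.random(), random.random())
        self.leftover = None                 # key left homeless by a failed insert

    def _slot(self, i, key):
        return hash((self.seeds[i], key)) % self.size

    def lookup(self, key):
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return True
        i = 0
        for _ in range(self.MAX_KICKS):
            slot = self._slot(i, key)
            # Place the key here; whoever occupied the slot is displaced.
            key, self.tables[i][slot] = self.tables[i][slot], key
            if key is None:                  # the slot was empty: done
                return True
            i = 1 - i                        # displaced key retries in its other table
        self.leftover = key                  # failure: rehash everything (in theory),
        return False                         # or stash this key (later slides)

table = CuckooHash(16)
for k in "ABCDEFG":
    table.insert(k)
print(table.lookup("C"), table.lookup("Z"))  # True False
```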

28
Cuckoo Hashing Examples
  [Figure sequence (slides 28-33): snapshots of a cuckoo hash table as
  elements A through G are inserted, showing elements kicked to their
  alternate locations when both candidate slots are occupied.]
34
Cuckoo Hashing Failures
  • Bad case 1: the inserted element runs into a
    cycle.
  • Bad case 2: the inserted element has a very long
    path before insertion completes.
  • Could be on a long cycle.
  • Bad cases occur with small probability when the
    load is sufficiently low, but not low enough.
  • Theoretical solution: re-hash everything if a
    failure occurs.
  • For 2 choices, load less than 50%, and n
    elements, the failure rate is Θ(1/n); the
    maximum insert time is O(log n).
  • Better space utilization and failure rate with
    more choices or more elements per bucket.

35
Recent Work: A CAM-Stash
  • Use a CAM (Content Addressable Memory) to stash
    away elements that would cause a failure.
  • Joint work with Kirsch/Wieder.
  • Intuition: if failures were independent, the
    probability that s elements cause failures would
    go to Θ(1/n^s).
  • Failures are not independent, but nearly so.
  • A stash holding a constant number of elements
    greatly reduces the failure probability.
  • Implemented as a CAM in hardware, or a cache line
    in hardware/software.
  • Lookup also requires checking the stash (see the
    sketch below).
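
Continuing the cuckoo-hashing sketch from earlier (again my illustration, not
the paper's code), the stash idea can be added as a small constant-size side
list: a failed insert parks the leftover key there, and lookups also scan it.

```python
class CuckooHashWithStash(CuckooHash):
    """Extends the CuckooHash sketch above with a small stash: the key left
    homeless by a failed insert is parked in a constant-size side store
    (a CAM in hardware, a cache line in software); lookups also scan it."""
    STASH_SIZE = 4                           # a small constant suffices w.h.p.

    def __init__(self, size):
        super().__init__(size)
        self.stash = []

    def insert(self, key):
        if super().insert(key):
            return True
        if len(self.stash) < self.STASH_SIZE:
            self.stash.append(self.leftover) # absorb the rare insertion failure
            return True
        return False                         # stash full: a genuine failure

    def lookup(self, key):
        return super().lookup(key) or key in self.stash
```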

36
Modeling Economic Principles
  • Joint work with Corbo, Jain, Parkes.
  • Exploration what models make sense for AS
    connectivity.
  • Extending approach of Chang, Jamin, Mao,
    Willinger.
  • Entering nodes link according to business model,
    utility function.
  • Nodes revise their links based on new entrants.
  • Like the forest fire model.
  • Future considerations how to validate such
    models.

37
Conclusion: My (Biased) View
  • There are 5 stages of networking research.
  • Observe: Gather data to demonstrate power law
    behavior in a system.
  • Interpret: Explain the import of this
    observation in the system context.
  • Model: Propose an underlying model for the
    observed behavior of the system.
  • Validate: Find data to validate (and if
    necessary specialize or modify) the model.
  • Control: Design ways to control and modify the
    underlying behavior of the system based on the
    model.
  • We need to focus on validation and control.
  • Lots of open research problems.

38
A Chance for Collaboration
  • The observe/interpret stages of research are
    dominated by systems; modeling is dominated by
    theory.
  • And we need new insights from statistics, control
    theory, economics!
  • Validation and control require a strong
    theoretical foundation.
  • Need universal ideas and methods that span
    different types of systems.
  • Need an understanding of the underlying
    mathematical models.
  • But also a large systems buy-in.
  • Getting/analyzing/understanding data.
  • Finding avenues for real impact.
  • A good area for future systems/theory/other
    collaboration and interaction.

39
More About Me
  • Website: www.eecs.harvard.edu/~michaelm
  • Links to papers
  • Link to book
  • Link to blog: mybiasedcoin
  • mybiasedcoin.blogspot.com