1
Toward Validation and Control of Network Models
  • Michael Mitzenmacher
  • Harvard University

2
Internet Mathematics Articles Related to This Talk
  • The Future of Power Law Research
  • A Brief History of Generative Models for Power
    Law and Lognormal Distributions
3
Motivation: General
  • Network Science and Engineering is emerging as
    its own (sub)field.
  • NSF cross-cutting area starting this year.
  • Courses: Cornell (Easley/Kleinberg), U Penn
    (Kearns), many others.
  • For undergrads, not just grads!
  • In popular culture: books like Linked by
    Barabási or Six Degrees by Watts.
  • Other sciences: economics, biology, physics,
    ecology, linguistics, etc.
  • What has been, and what should be, the research
    agenda?

4
My (Biased) View
  • The 5 stages of networking research:
  • Observe: Gather data to demonstrate a behavior
    in a system. (Example: power law behavior.)
  • Interpret: Explain the importance of this
    observation in the system context.
  • Model: Propose an underlying model for the
    observed behavior of the system.
  • Validate: Find data to validate (and if
    necessary specialize or modify) the model.
  • Control: Design ways to control and modify the
    underlying behavior of the system based on the
    model.

5
My (Biased) View
  • In networks, we have spent a lot of time
    observing and interpreting behaviors.
  • We are currently very active in modeling.
  • Many, many possible models.
  • Perhaps easiest to write papers about.
  • We now need to put much more focus on
    validation and control.
  • We have been moving in this direction.
  • And these are specific areas where computer
    science has much to contribute!

6
Models
  • After observation, the natural step is to
    explain/model the behavior.
  • Outcome: lots of modeling papers.
  • And many models rediscovered.
  • Example: power laws.
  • Lots of history.

7
History
  • In the 1990s, the abundance of observed power
    laws in networks surprised the community.
  • Perhaps the community shouldn't have been
    surprised: power laws appear frequently
    throughout the sciences.
  • Pareto: income distribution, 1897.
  • Zipf-Auerbach: city sizes, 1913/1940s.
  • Zipf-Estoup: word frequency, 1916/1940s.
  • Lotka: bibliometrics, 1926.
  • Yule: species and genera, 1924.
  • Mandelbrot: economics/information theory, 1950s.
  • Observation/interpretation were/are key to
    initial understanding.
  • My claim: the mere existence of power laws
    should no longer be surprising, or necessarily
    even noteworthy.
  • My (biased) opinion: the bar should now be very
    high for observation/interpretation.

8
So Many Models
  • Preferential Attachment (sketched below)
  • Optimization (HOT)
  • Monkeys typing randomly (scaling)
  • Multiplicative processes
  • Kronecker graphs
  • Forest fire model (densification)
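
As an illustration of the first model on this list, here is a minimal Python
sketch (my own illustration, not from the talk) of preferential attachment:
each new node links to m existing nodes chosen with probability proportional
to current degree, and the resulting degree distribution falls off roughly as
a power law.

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=1):
    """Grow an n-node graph; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    degree = Counter()
    repeated = []                  # each node appears once per unit of degree
    targets = list(range(m))       # the first new node attaches to nodes 0..m-1
    for new in range(m, n):
        for t in set(targets):     # collapse duplicate choices, for simplicity
            degree[new] += 1
            degree[t] += 1
            repeated += [new, t]
        # Sampling uniformly from `repeated` samples proportionally to degree.
        targets = [rng.choice(repeated) for _ in range(m)]
    return degree

deg = preferential_attachment(50000)
counts = Counter(deg.values())
for d in sorted(counts)[:8]:
    print(d, counts[d])            # frequencies fall off roughly as a power of d
```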

9
What Makes a Good Model?
  • New variations coming up all of the time.
  • Question: What makes a new network model
    sufficiently interesting to merit attention
    and/or publication?
  • Strong connection to an observed process.
  • Many models claim this, but few demonstrate it
    convincingly.
  • Theory perspective: significant new mathematical
    insight or sophistication.
  • A matter of taste?
  • My (biased) opinion: the bar should start being
    raised on model papers.

10
Validation: The Current Stage
  • We now have so many models.
  • It is important to know the right model, to
    extrapolate and control future behavior.
  • Given a proposed underlying model, we need tools
    to help us validate it.
  • We appear to be entering the validation stage of
    research. BUT the first steps have focused on
    invalidation rather than validation.

11
Examples: Invalidation
  • Lakhina, Byers, Crovella, Xie:
  • Show that the observed power law in Internet
    topology might be an artifact of biases in
    traceroute sampling.
  • Pedarsani, Figueiredo, Grossglauser:
  • Show that densification may also arise from
    sampling effects, and is not necessarily
    intrinsic to the network.
  • Chen, Chang, Govindan, Jamin, Shenker, Willinger:
  • Show that Internet topology has characteristics
    that do not match preferential-attachment graphs.
  • Suggest an alternative mechanism.
  • But does this alternative match all
    characteristics, or are we still missing some?

12
My (Biased) View
  • Invalidation is an important part of the process!
    BUT it is inherently different from validating a
    model.
  • Validating seems much harder.
  • Indeed, it is arguable what constitutes a
    validation.
  • Question: what should it mean to say,
    "This model is consistent with observed data"?

13
An Alternative View
  • There is no right model.
  • A model is the best available until some other
    model comes along and proves better.
  • Greedy refinement via invalidation in model
    space.
  • Statistical techniques: compare likelihood ratios
    for various models (see the sketch below).
  • My (biased) opinion: this is one useful
    approach, but not the end of the question.
  • Need methods other than comparison for confirming
    the validity of a model.
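
As one concrete version of the likelihood-ratio comparison mentioned above,
here is a minimal sketch (my illustration, not from the talk) that fits a
power law and a lognormal to the same data by maximum likelihood and compares
their log-likelihoods. The `degrees` list is placeholder data standing in for
real observations, and a careful comparison would truncate both fits at the
same xmin.

```python
import math

def loglik_powerlaw(xs, xmin):
    """MLE fit of a continuous power law p(x) ~ x^(-alpha) for x >= xmin."""
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1.0 + n / s
    ll = sum(math.log(alpha - 1) - math.log(xmin) - alpha * math.log(x / xmin)
             for x in xs)
    return ll, alpha

def loglik_lognormal(xs):
    """MLE fit of a lognormal distribution (not truncated at xmin)."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    ll = sum(-v - math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - (v - mu) ** 2 / (2 * sigma ** 2) for v in logs)
    return ll, (mu, sigma)

# Placeholder data; in practice these would be observed degrees, file sizes, etc.
degrees = [2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9, 13, 21]
xmin = 2
ll_pl, alpha = loglik_powerlaw(degrees, xmin)
ll_ln, _ = loglik_lognormal(degrees)
print("power law alpha =", round(alpha, 2))
print("log-likelihood ratio (power law - lognormal) =", round(ll_pl - ll_ln, 2))
```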

14
Time-Series/Trace Analysis
  • Many models posit some sort of actions:
  • New pages linking to pages on the Web.
  • New routers joining the network.
  • New files appearing in a file system.
  • A validation approach: gather traces and see if
    the traces suitably match the model.
  • Trace gathering can be a challenging systems
    problem.
  • Checking the model match requires appropriate
    statistical techniques and tests (see the sketch
    below).
  • May lead to new, improved, better-justified
    models.
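
For example, if a model posits memoryless (Poisson) arrivals of new links or
files, the inter-arrival times in a trace should look exponential, and a
standard goodness-of-fit test can check this. A minimal sketch, assuming SciPy
is available and using synthetic data in place of a real trace (with estimated
parameters, the p-value is only approximate):

```python
import random
from scipy import stats

# Synthetic stand-in for a trace: inter-arrival times of some action, in seconds.
trace = [random.expovariate(0.2) for _ in range(500)]

# If the model posits memoryless (Poisson) arrivals, inter-arrival times
# should be exponential; a Kolmogorov-Smirnov test compares trace vs. model.
scale = sum(trace) / len(trace)              # MLE of the exponential mean
stat, pvalue = stats.kstest(trace, "expon", args=(0, scale))
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A very small p-value would be evidence against the posited arrival process.
```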

15
Sampling and Trace Analysis
  • Often, we cannot record all actions.
  • The Internet is too big!
  • Sampling:
  • Global: snapshots of the entire system at
    various times.
  • Local: record the actions of a sample of agents
    in the system.
  • Examples:
  • Snapshots of file systems: full systems vs.
    actions of individual users.
  • Router topology: Internet maps vs. changes at a
    subset of routers.
  • Question: how much/what kind of sampling is
    sufficient to validate a model appropriately?
  • Does this differ among models?

16
To Control
  • In many systems, intervention can impact the
    outcome.
  • Maybe not for earthquakes, but for computer
    networks!
  • Typical setting: individual agents acting in
    their own selfish interest. Agents can be given
    incentives to change behavior.
  • General problem: given a good model, determine
    how to change system behavior to optimize a
    global performance function.
  • Distributed algorithmic mechanism design.
  • Mix of economics/game theory and computer science.

17
Possible Control Approaches
  • Adding constraints, local or global:
  • Example: total space in a file system.
  • Example: preferential attachment, but with links
    limited by an underlying metric.
  • Adding incentives or costs:
  • Example: charges for exceeding soft disk quotas.
  • Example: payments for certain AS-level
    connections.
  • Limiting information:
  • Impact decisions by not letting everyone have a
    true view of the system.

18
My Related Work: Hash Algorithms
  • On the Internet, we need a measurement and
    monitoring infrastructure, for validation and
    control.
  • Approximate is fine; speed is key.
  • Must be general, multi-purpose.
  • Must allow data aggregation.
  • Solution: a hash-based architecture.
  • Eventual goal: every router has a programmable
    hash engine.

19
Vision
  • Three-pronged research:
  • Low: efficient hardware implementations of
    relevant algorithms and data structures.
  • Medium: new and improved data structures and
    algorithms for old and new applications.
  • High: a distributed infrastructure supporting
    monitoring and measurement schemes.

20
The High-Level Pitch
  • Lots of hash-based schemes being designed for
    approximate measurement/monitoring tasks.
  • But not built into the system to begin with.
  • Want a flexible router architecture that allows:
  • New methods to be easily added.
  • Distributed cooperation using such schemes.

21
What We Need
  [Architecture diagram listing the needed components: on-chip memory,
  off-chip memory, and CAM(s); a hashing computation unit and a unit for
  other computation, with a programming language; a control system; and a
  communication architecture.]
22
Lots of Design Questions
  • How much space for various memory levels? How to
    dynamically divide memory among competing
    applications?
  • What hash functions should be included? Openness
    to new hash functions?
  • What programming language and functionality?
  • What communication infrastructure?
  • Security?
  • And so on.

23
Which Hash Functions?
  • Theorists:
  • Want analyzable hash functions.
  • Dislike standard assumption of perfectly random
    hash functions.
  • Hard to prove things about actual performance.
  • Practitioners:
  • Want easy implementation, speed, small space.
  • Want simple analysis (back-of-the-envelope).
  • Will accept simulated results under right
    settings.

24
Why Do Weak Hash Functions Work So Well?
  • In reality, assuming perfectly random hash
    functions seems to be the right thing to do.
  • Easier to analyze.
  • Real systems almost always work that way, even
    with weak hash functions!
  • Can theory explain the strong performance of
    weak hash functions?

25
Recent Work
  • A new explanation (joint work with Salil Vadhan):
  • Choosing a hash function from a pairwise
    independent family is enough if the data has
    sufficient entropy (see the sketch below).
  • The randomness of the hash function and the data
    combine.
  • Behavior matches a truly random hash function
    with high probability.
  • Techniques based on the theory of randomness
    extraction.
  • Extensions of the Leftover Hash Lemma.
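
For concreteness, here is a textbook pairwise-independent family,
h(x) = ((a*x + b) mod p) mod m with random a, b and a large prime p (my
illustration, not code from the work above). The result says that, on data
with enough entropy, a function chosen this way behaves like a truly random
hash function with high probability.

```python
import random

# Classic construction: pick a prime p larger than the key universe; the family
# {x -> (a*x + b) mod p} is pairwise independent, and the final reduction mod m
# is close to uniform when p is much larger than m.
P = (1 << 61) - 1                      # a Mersenne prime, comfortably large

def make_hash(m, rng=random):
    """Draw one hash function, mapping integer keys into m buckets."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

h = make_hash(1 << 16)                 # a table with 65536 buckets
print(h(12345), h(67890))
```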

26
What Functionality?
  • Hash tables should be a basic primitive.
  • Best hash tables: cuckoo hashing.
  • Worst-case constant lookup time.
  • Simple to build, design.
  • How can we make them even better?
  • Move cuckoo hashing from theory to practice!

27
Cuckoo Hashing [Pagh, Rodler]
  • Basic scheme: each element gets two possible
    locations.
  • To insert x, check both locations for x. If one
    is empty, insert.
  • If both are full, x kicks out an old element y.
    Then y moves to its other location.
  • If that location is full, y kicks out z, and so
    on, until an empty slot is found (see the sketch
    below).
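
A minimal Python sketch of the scheme just described: two tables, two hash
functions, and displacement on collision. This is my illustration, not the
authors' implementation; the MAX_KICKS cutoff stands in for the failure
handling discussed on the following slides.

```python
import random

class CuckooHash:
    """Sketch of cuckoo hashing: two tables, two hash functions; every stored
    key sits in one of its two candidate slots, so lookups probe two places."""
    MAX_KICKS = 500                          # displacement limit before giving up

    def __init__(self, size):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self.seeds = (random.random(), random.random())
        self.leftover = None                 # key left homeless by a failed insert

    def _slot(self, i, key):
        return hash((self.seeds[i], key)) % self.size

    def lookup(self, key):
        return any(self.tables[i][self._slot(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return True
        i = 0
        for _ in range(self.MAX_KICKS):
            slot = self._slot(i, key)
            # Place the key here; whoever occupied the slot is displaced.
            key, self.tables[i][slot] = self.tables[i][slot], key
            if key is None:                  # the slot was empty: done
                return True
            i = 1 - i                        # displaced key retries in its other table
        self.leftover = key                  # failure: rehash everything (in theory),
        return False                         # or stash this key (later slides)

table = CuckooHash(16)
for k in "ABCDEFG":
    table.insert(k)
print(table.lookup("C"), table.lookup("Z"))  # True False
```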

28
Cuckoo Hashing Examples
  [Figure sequence (slides 28-33): snapshots of a cuckoo hash table as
  elements A through G are inserted, showing elements kicked to their
  alternate locations when both candidate slots are occupied.]
34
Cuckoo Hashing Failures
  • Bad case 1: the inserted element runs into a
    cycle.
  • Bad case 2: the inserted element has a very long
    path before insertion completes.
  • Could be on a long cycle.
  • Bad cases occur with small probability when the
    load is sufficiently low, but not low enough.
  • Theoretical solution: re-hash everything if a
    failure occurs.
  • For 2 choices, load less than 50%, and n
    elements, the failure rate is Θ(1/n); the
    maximum insert time is O(log n).
  • Better space utilization and failure rate with
    more choices or more elements per bucket.

35
Recent Work: A CAM-Stash
  • Use a CAM (Content Addressable Memory) to stash
    away elements that would cause a failure.
  • Joint work with Kirsch/Wieder.
  • Intuition: if failures were independent, the
    probability that s elements cause failures would
    go to Θ(1/n^s).
  • Failures are not independent, but nearly so.
  • A stash holding a constant number of elements
    greatly reduces the failure probability.
  • Implemented as a CAM in hardware, or a cache line
    in hardware/software.
  • Lookup also requires checking the stash (see the
    sketch below).
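
Continuing the cuckoo-hashing sketch from earlier (again my illustration, not
the paper's code), the stash idea can be added as a small constant-size side
list: a failed insert parks the leftover key there, and lookups also scan it.

```python
class CuckooHashWithStash(CuckooHash):
    """Extends the CuckooHash sketch above with a small stash: the key left
    homeless by a failed insert is parked in a constant-size side store
    (a CAM in hardware, a cache line in software); lookups also scan it."""
    STASH_SIZE = 4                           # a small constant suffices w.h.p.

    def __init__(self, size):
        super().__init__(size)
        self.stash = []

    def insert(self, key):
        if super().insert(key):
            return True
        if len(self.stash) < self.STASH_SIZE:
            self.stash.append(self.leftover) # absorb the rare insertion failure
            return True
        return False                         # stash full: a genuine failure

    def lookup(self, key):
        return super().lookup(key) or key in self.stash
```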

36
Modeling Economic Principles
  • Joint work with Corbo, Jain, Parkes.
  • Exploration what models make sense for AS
    connectivity.
  • Extending approach of Chang, Jamin, Mao,
    Willinger.
  • Entering nodes link according to business model,
    utility function.
  • Nodes revise their links based on new entrants.
  • Like the forest fire model.
  • Future considerations how to validate such
    models.

37
Conclusion: My (Biased) View
  • There are 5 stages of networking research.
  • Observe: Gather data to demonstrate power law
    behavior in a system.
  • Interpret: Explain the import of this
    observation in the system context.
  • Model: Propose an underlying model for the
    observed behavior of the system.
  • Validate: Find data to validate (and if
    necessary specialize or modify) the model.
  • Control: Design ways to control and modify the
    underlying behavior of the system based on the
    model.
  • We need to focus on validation and control.
  • Lots of open research problems.

38
A Chance for Collaboration
  • The observe/interpret stages of research are
    dominated by systems; modeling is dominated by
    theory.
  • And we need new insights from statistics, control
    theory, economics!
  • Validation and control require a strong
    theoretical foundation.
  • Need universal ideas and methods that span
    different types of systems.
  • Need an understanding of the underlying
    mathematical models.
  • But also a large systems buy-in.
  • Getting/analyzing/understanding data.
  • Finding avenues for real impact.
  • A good area for future systems/theory/other
    collaboration and interaction.

39
More About Me
  • Website: www.eecs.harvard.edu/~michaelm
  • Links to papers
  • Link to book
  • Link to blog: mybiasedcoin
  • mybiasedcoin.blogspot.com