Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement

1
Space-Code Bloom Filter for Efficient Per-Flow
Traffic Measurement
  • Abhishek Kumar, Jun Xu, Jia Wang, Oliver
    Spatschek, Li Li
  • Presented by guanghui He

2
Outline
  • Objective
  • A novel data structure: the Space-Code Bloom Filter
    (SCBF)
  • Maximum likelihood estimation
  • Performance evaluation
  • Conclusion

3
Objective
  • Per-flow traffic measurement has been done in two
    ways:
  • Random sampling, which is inaccurate
  • Identifying elephants (large flows), which provides no
    information about the mice (small flows) that make up
    the majority of network flows
  • This paper aims to investigate highly efficient
    algorithms and data structures to facilitate
    per-flow measurement on very high-speed links.

4
Solution proposed
  • A naïve solution is to maintain per-flow counters
    that are updated upon every packet arrival, but
    this solution won't scale well.
  • The proposed method, the SCBF, keeps track of the
    approximate number of packets in each flow
    regardless of its size.

5
Space-Code Bloom Filter (SCBF)
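  • A traditional Bloom filter represents
  • A set $S = \{x_1, x_2, \ldots, x_n\}$
  • Described by an array $A$ of $m$ bits
  • Whose values are determined by $k$ independent hash
    functions $h_1, h_2, \ldots, h_k$ with range $\{1, \ldots, m\}$
  • Given $x$ to be inserted in $S$, the bits
    $A[h_i(x)]$, $1 \le i \le k$, are set to 1.
  • To query for an element $y$, check the bits
    $A[h_i(y)]$, $1 \le i \le k$; if all these bits
    are 1, return yes.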
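
The insert and query rules above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the class name `BloomFilter`, the salted SHA-256 hash family, and the 0-based index range are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: an m-bit array A and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, kept simple for readability

    def _index(self, i: int, x: str) -> int:
        # Hypothetical hash family: SHA-256 salted with the function index i.
        digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        # Set the k bits A[h_i(x)] to 1.
        for i in range(self.k):
            self.bits[self._index(i, x)] = 1

    def query(self, y: str) -> bool:
        # Answer yes only if all k bits A[h_i(y)] are 1.
        return all(self.bits[self._index(i, y)] for i in range(self.k))
```

A query can return a false positive when other elements happen to have set all k bits, but it never returns a false negative for an inserted element.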

6
SCBF (cont.)
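  • In SCBF, there are $l$ groups of hash functions
  • $\{h^1_1, \ldots, h^1_k\}$, $\{h^2_1, \ldots, h^2_k\}$, ..., $\{h^l_1, \ldots, h^l_k\}$
  • During insertion, a group of hash functions
    $\{h^i_1, \ldots, h^i_k\}$ is randomly
    chosen, and the bits $A[h^i_1(x)], \ldots, A[h^i_k(x)]$
    are set to 1.
  • When querying for the number of occurrences of
    an element $y$, count the number of groups that $y$
    matches; $y$ matches group $\{h^i_1, \ldots, h^i_k\}$
    if all the bits $A[h^i_1(y)], \ldots, A[h^i_k(y)]$
    are 1.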
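
A minimal sketch of the SCBF insert and query operations described above, assuming a salted SHA-256 hash family and a seeded pseudo-random group choice. The class name `SCBF` and the method `matched_groups` are hypothetical names chosen for this example.

```python
import hashlib
import random


class SCBF:
    """Space-code Bloom filter sketch: l groups of k hash functions over m bits."""

    def __init__(self, m: int, k: int, l: int, seed: int = 0):
        self.m, self.k, self.l = m, k, l
        self.bits = bytearray(m)
        self.rng = random.Random(seed)  # seeded only to make runs repeatable

    def _index(self, group: int, i: int, x: str) -> int:
        # Hypothetical hash family, salted by group number and function index.
        digest = hashlib.sha256(f"{group}:{i}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        # Pick one group uniformly at random and set its k bits.
        group = self.rng.randrange(self.l)
        for i in range(self.k):
            self.bits[self._index(group, i, x)] = 1

    def matched_groups(self, y: str) -> int:
        # Count the groups whose k bits are all 1 for element y.
        return sum(
            all(self.bits[self._index(g, i, y)] for i in range(self.k))
            for g in range(self.l)
        )
```

The count returned by `matched_groups` is the raw observation; turning it into a multiplicity estimate is the job of the estimators discussed later in the talk.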

7
SCBF (cont.)
  • For a flow y, based on the number of groups that
    y matches, we can estimate the
    multiplicity of y in the multiset.

8
Multi-Resolution Space-Code Bloom Filter
  • Problems with SCBF
  • The distribution of Internet flow-size is
    heavy-tailed
  • After about $l \ln l$ copies of x are inserted, all l
    groups are used at least once with high
    probability (the coupon collector's problem)
  • An SCBF with l groups therefore cannot distinguish
    between multiplicities that are larger than about $l \ln l$
  • Making l large won't solve the problem, so the
    Multi-Resolution SCBF (MRSCBF) comes into play.
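
The saturation problem is an instance of the coupon collector's problem. Assuming each insertion picks one of the l groups uniformly at random, the expected number of insertions before every group has been used is l·H_l (H_l being the l-th harmonic number), which grows like l ln l. The short sketch below compares the two quantities; the function name is hypothetical.

```python
import math


def expected_insertions_to_use_all_groups(l: int) -> float:
    """Coupon-collector expectation l * H_l, with H_l the l-th harmonic number."""
    return l * sum(1.0 / j for j in range(1, l + 1))


for l in (8, 32, 128):
    # Columns: l, l * ln(l), and the exact expectation l * H_l.
    print(l, round(l * math.log(l), 1),
          round(expected_insertions_to_use_all_groups(l), 1))
```

Both columns grow only slightly faster than linearly in l, which is why enlarging l alone does not extend the measurable range enough for heavy-tailed flow sizes.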

9
MRSCBF (cont.)
  • An MRSCBF uses multiple SCBFs operating at
    different resolutions so as to cover the entire
    range of multiplicities.
  • Inserting x results in an insertion
    into each SCBF $i$ with a probability $p_i$.
  • With r SCBFs, assume $p_1 > p_2 > \cdots > p_r$;
    a higher $p_i$ corresponds to a higher
    resolution, and vice versa.
  • In a query, count the number of groups that x
    matches in filters 1, 2, ..., r.
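
The sampled insertion and per-filter query can be sketched as follows. This is an illustration under stated assumptions: each filter gets its own bit array, the hash family is salted SHA-256, and the class name `MRSCBF`, its methods, and the example probabilities are hypothetical.

```python
import hashlib
import random


class MRSCBF:
    """r space-code filters; filter j samples each packet with probability probs[j]."""

    def __init__(self, m, k, l, probs, seed=0):
        self.m, self.k, self.l = m, k, l
        self.probs = list(probs)                      # assumed p_1 > p_2 > ... > p_r
        self.filters = [bytearray(m) for _ in self.probs]
        self.rng = random.Random(seed)

    def _index(self, j, group, i, x):
        # Hypothetical hash family, salted per filter, group and function index.
        digest = hashlib.sha256(f"{j}:{group}:{i}:{x}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        for j, p in enumerate(self.probs):
            if self.rng.random() < p:                 # sample the packet into filter j
                group = self.rng.randrange(self.l)    # then pick one group at random
                for i in range(self.k):
                    self.filters[j][self._index(j, group, i, x)] = 1

    def matched_groups(self, x):
        """Return the number of matched groups in each of the r filters."""
        return [
            sum(
                all(bits[self._index(j, g, i, x)] for i in range(self.k))
                for g in range(self.l)
            )
            for j, bits in enumerate(self.filters)
        ]


# Illustrative sampling probabilities only: 1, 1/4, 1/16, 1/64.
example_probs = [0.25 ** j for j in range(4)]
```

The vector returned by `matched_groups` is what the estimators consume: high-probability filters saturate for long flows, while low-probability filters stay nearly empty for short ones.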

10
MRSCBF (cont.)
11
MRSCBF (cont.)
  • In setting the sampling probabilities $p_i$, a geometric progression is
    adopted, i.e., $p_i = c \, p_{i-1}$ for a constant $0 < c < 1$

12
MRSCBF (cont.)
  • Intuitively, short flows can be inserted into
    filters with higher resolution, and long flows
    can be inserted into filters with lower
    resolution, so that for each possible flow size,
    one of the filters will have a resolution that
    measures its count with reasonable accuracy.

13
Estimation
  • Two estimation mechanisms
  • Maximum Likelihood Estimation (MLE)
  • Mean Value Estimation (MVE)
  • Both estimators can be implemented as a simple
    lookup into an estimate table precomputed for all
    possible values of the observation (the numbers of
    matched groups)

14
Maximum Likelihood Estimation
  • MLE with observation from one SCBF
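  • Let $\Theta$ be the set of groups matched by
    an element x in SCBF i. We would like to find
    the $f$ that maximizes $\Pr[F = f \mid \Theta]$
  • However, an a priori distribution of F is necessary
    to compute $\Pr[F = f \mid \Theta]$
  • Given a uniform a priori distribution, we have
    $\arg\max_f \Pr[F = f \mid \Theta] = \arg\max_f \Pr[\Theta \mid F = f]$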

15
MLE (cont.)
  • So we have
  • With , p be the sampling probability
    and
  • be the fraction of bits that are set to 1 in
    the MRSCBF, equals

16
MLE (cont.)
  • MLE with observations from multiple SCBFs in
    MRSCBF
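  • Let $\Theta_1, \Theta_2, \ldots, \Theta_r$ be the sets of
    groups matched by the element x in SCBFs
    1, 2, ..., r. Due to the independence of the hash
    functions used in the SCBFs,
    $\Pr[\Theta_1, \ldots, \Theta_r \mid F = f] = \prod_{i=1}^{r} \Pr[\Theta_i \mid F = f]$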
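
As a worked form of the multi-filter estimate, the independence argument lets the joint likelihood factor over the filters, so the maximization can be carried out on a sum of log-likelihoods. The symbols $\Theta_i$ (set of matched groups in SCBF $i$) and $\hat{f}$ (the estimate) are notation assumed here, and a uniform a priori distribution of $F$ is assumed as in the single-filter case.

```latex
\hat{f}
  = \arg\max_{f} \Pr[\Theta_1, \ldots, \Theta_r \mid F = f]
  = \arg\max_{f} \prod_{i=1}^{r} \Pr[\Theta_i \mid F = f]
  = \arg\max_{f} \sum_{i=1}^{r} \log \Pr[\Theta_i \mid F = f]
```

The last step holds because the logarithm is strictly increasing, so it does not change which $f$ attains the maximum.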

17
Mean Value Estimation
  • MVE for a single SCBF: let f be the multiplicity
    of an element x; then the number of positives
    (number of matched groups) we observe from the
    SCBF is a random variable that is a
    function of f (and, with good hash functions, not
    of x).
  • Denote this random variable as $\theta_f$, and let
    $g(f) = E[\theta_f]$
  • $g^{-1}$ exists since g(f) is monotonically
    increasing

18
MVE (cont.)
  • MVE works this way: given the observation $\theta$
    (the number of matched groups),
  • the estimate is the value of f that on
    average produces $\theta$ positives, i.e., $\hat{f} = g^{-1}(\theta)$
  • The function g(f) can be proved to be
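
A minimal sketch of MVE as a precomputed lookup table for one SCBF. It assumes a simple model: each of the f packets is sampled independently with probability p and then picks one of the l groups uniformly, and an unused group still matches as a false positive with probability alpha^k, where alpha is the fraction of bits set to 1. Under that model a group fails to match only if no packet selected it and it is not a false positive, which gives g(f) = l·(1 − (1 − alpha^k)·(1 − p/l)^f). This expression is derived here for illustration and may differ in form from the one on the slide; the function names are hypothetical.

```python
def g(f, l, k, p, alpha):
    """Expected number of matched groups for multiplicity f (assumed model)."""
    never_selected = (1.0 - p / l) ** f          # no packet of the flow chose this group
    return l * (1.0 - (1.0 - alpha ** k) * never_selected)


def build_mve_table(l, k, p, alpha, f_max):
    """For each observation theta in 0..l, the f in 0..f_max whose g(f) is closest."""
    expected = [g(f, l, k, p, alpha) for f in range(f_max + 1)]
    return [
        min(range(f_max + 1), key=lambda f: abs(expected[f] - theta))
        for theta in range(l + 1)
    ]


# Illustrative parameters only; an estimate is then a single table lookup.
table = build_mve_table(l=32, k=3, p=1.0, alpha=0.1, f_max=300)
print(table[10])
```

Because g(f) is increasing in f, inverting it by nearest-value search over a bounded range is enough, and the table has only l + 1 entries per filter.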

19
Performance Evaluation: theoretical accuracy
20
Performance evaluation (cont.)
21
Performance evaluation (cont.)
22
Performance Evaluation (cont.)
23
Conclusion
  • SCBF performs well in estimating the flow size
    without maintaining per-flow state.
  • Computational complexity of the estimation
    procedures seems to be a drawback.