Timeout Bloom Filter: A New Sampling Method for Recording More Flows

1
Time-out Bloom Filter: A New Sampling Method for
Recording More Flows
  • GWN2005
  • Shijin Kong,
  • Tao He, Xiaoxin Shao, Xing Li
  • Dept. of Electronic Engineering, Tsinghua Univ.,
    P.R. China
  • China Education and Research Network (CERNET)

2
Motivation
  • Sampling for network measurement
  • Select part of the traffic to infer the
    characteristics of the whole traffic
  • Reduce memory consumption and the number of flow
    look-ups
  • Save bandwidth for collection
  • Random sampling is widely deployed: 1 out of
    every N packets is sampled at random
  • Random sampling causes a great loss of flows
  • A flow is lost if none of its packets is sampled
  • A flow with fewer than N packets is more likely
    to be lost than a longer flow
  • Flow information is very important for traffic
    identification, attack detection, and network
    deployment characterization
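
The bias against short flows can be made concrete with a small sketch. It assumes an idealized sampler in which each packet is kept independently with probability 1/N (periodic 1-in-N sampling differs in detail), and the function name is illustrative rather than taken from the slides. Under that assumption a flow of r packets is lost with probability (1 - 1/N)^r.

```python
def flow_loss_probability(r: int, n: int) -> float:
    """Probability that none of a flow's r packets is sampled.

    Assumption: each packet is sampled independently with probability 1/n.
    """
    if r < 1 or n < 1:
        raise ValueError("r and n must be positive integers")
    return (1.0 - 1.0 / n) ** r


if __name__ == "__main__":
    n = 100  # sampling period N
    for r in (1, 10, 100, 1000):
        # Short flows are lost far more often than long ones.
        print(f"r={r:5d}  P(lost)={flow_loss_probability(r, n):.4f}")
```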

3
Problem Definition
  • How to record more flows while keeping a
    specified volume of sampled data
  • Develop a new sampling method to alleviate
    flow loss
  • A flow is a set of packets that share the same
    flow key, i.e. the same values in certain
    packet-header fields
  • The information of a flow is sampled/recorded if
    any of its packets is sampled

4
Basic Thoughts of Solution
  • To record more flows than random sampling
  • In random sampling, a shorter flow is more
    likely to be lost
  • Increase the probability that shorter flows are
    sampled
  • The number of sampled packets of shorter flows
    increases
  • To keep the same volume of sampled data as random
    sampling
  • Decrease the number of sampled packets of longer
    flows
  • Less biased sampling towards long flows

5
Related Work
  • Sampling methods
  • Cisco NetFlow: implementation of random sampling
    with a fixed sampling period N
  • Duffield et al.'s work (2001): smart sampling,
    long flows have more packets to be sampled
  • Estan et al.'s work (2004): Adaptive NetFlow,
    dynamically changes the sampling period N
  • Hash-based filters
  • Bloom's study (1970): basic Bloom filter, queries
    with tolerable errors
  • Estan et al.'s work (2001): multistage filter,
    detects heavy hitters
  • Kumar et al.'s work (2003): Space-Code Bloom
    Filter, flow size estimation
  • Kompella et al.'s work (2004): Partial Completion
    Filter, attack detection

6
Solution Overview
  • Time-out Bloom Filter (TBF): a data structure in
    memory with m counters (buckets)
  • Each counter i stores a timestamp t_i
  • d perfectly random hash functions h_1, ..., h_d
  • A time-out value t_0

7
Per-packet Pipeline Summary
  • At the start time t_s, all m buckets are set to
    t_s - t_0
  • When a packet with flow key c arrives at time t:
  • if any of the d timestamps t_{h_i(c)}, 1 ≤ i ≤ d,
    satisfies
  • t - t_{h_i(c)} > t_0 (counter h_i(c) has timed out)
  • sample the packet
  • endif
  • set all the d timestamps to t
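
The per-packet pipeline can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The class and method names are hypothetical, and salted BLAKE2b hashes stand in for the d perfectly random hash functions. The sketch also assumes that a packet is sampled when at least one of its d timestamps has timed out. That is the reading under which a repeat packet arriving within t_0 is never sampled while the first packet of a flow is sampled with high probability.

```python
import hashlib


class TimeoutBloomFilter:
    """Sketch of a Time-out Bloom Filter: m timestamp buckets, d hashes."""

    def __init__(self, m: int, d: int, t0: float, ts: float = 0.0):
        self.m, self.d, self.t0 = m, d, t0
        # Every bucket starts at ts - t0, as described on the slide.
        self.buckets = [ts - t0] * m

    def _indexes(self, key: bytes):
        # Assumption: salted BLAKE2b approximates d independent hash functions.
        for i in range(self.d):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.m

    def process(self, key: bytes, t: float) -> bool:
        """Return True if the packet is sampled; always refresh its d buckets."""
        idx = list(self._indexes(key))
        # Assumption: one timed-out bucket is enough to sample the packet.
        sampled = any(t - self.buckets[j] > self.t0 for j in idx)
        for j in idx:
            self.buckets[j] = t
        return sampled


if __name__ == "__main__":
    tbf = TimeoutBloomFilter(m=65536, d=3, t0=1.0)
    flow = b"10.0.0.1:1234->10.0.0.2:80/tcp"  # hypothetical flow key
    print(tbf.process(flow, t=0.5))  # True: first packet of the flow
    print(tbf.process(flow, t=0.9))  # False: interval 0.4 s does not exceed t0
    print(tbf.process(flow, t=2.5))  # True: interval 1.6 s exceeds t0
```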

8
Why TBF? (Reasons)
  • A flow F with r packets P_i (1 ≤ i ≤ r)
  • First packet: P_1
  • Remaining packets: P_i (2 ≤ i ≤ r)
  • The inter-packet interval (INI) of P_i (2 ≤ i ≤ r)
    is the interval between the arrival time of
    P_{i-1} and that of P_i
  • P_1 does not have an INI
  • Two reasons
  • For each flow, TBF gives P_1 a high probability
    of being sampled
  • The probability that P_i (2 ≤ i ≤ r) is sampled
    is smaller when r is greater

9
Sample First Packets
  • The first packet P_1 arrives at time t
  • L: number of flows in the previous t_0 interval
  • Probabilities
  • A bucket has timed out: p_0 = (1 - 1/m)^{Ld}
  • A bucket has not timed out: 1 - p_0
  • All d buckets have not timed out (first packet
    not sampled): (1 - p_0)^d = (1 - (1 - 1/m)^{Ld})^d
  • First packet sampled: p_s = 1 - (1 - p_0)^d
  • High p_s: adjust m and d
  • Larger m, greater p_s
  • m is constrained by memory; find an optimal d
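
These probabilities can be explored numerically with a short sketch. It assumes that a first packet is sampled when at least one of its d buckets has timed out, so that p_s = 1 - (1 - p_0)^d with p_0 = (1 - 1/m)^{Ld}. The function names and the flow count used in the usage lines are illustrative, not values from the slides.

```python
def bucket_timeout_probability(m: int, d: int, flows: int) -> float:
    """p_0: a given bucket was touched by none of `flows` flows (d hashes each)."""
    return (1.0 - 1.0 / m) ** (flows * d)


def first_packet_sample_probability(m: int, d: int, flows: int) -> float:
    """p_s under the assumption that one timed-out bucket triggers sampling."""
    p0 = bucket_timeout_probability(m, d, flows)
    return 1.0 - (1.0 - p0) ** d


def best_d(m: int, flows: int, d_max: int = 16) -> int:
    """Hash count in 1..d_max that maximizes p_s for a fixed m and flow count."""
    return max(range(1, d_max + 1),
               key=lambda d: first_packet_sample_probability(m, d, flows))


if __name__ == "__main__":
    flows = 20_000  # hypothetical L, chosen only for illustration
    for m in (4_096, 16_384, 65_536):
        for d in (1, 2, 3, 4):
            ps = first_packet_sample_probability(m, d, flows)
            print(f"m={m:6d} d={d}  p_s={ps:.4f}")
    print("best d for m=65536:", best_d(65_536, flows))
```

For a fixed d, a larger m always raises p_s in this model, while d has an interior optimum because more hash functions both add chances to find a timed-out bucket and refresh more buckets per flow.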

10
Sample Remaining Packets
  • Case 1: INI of P_i (2 ≤ i ≤ r) > t_0
  • Same as P_1
  • Sampled with probability p_s
  • Case 2: INI of P_i (2 ≤ i ≤ r) ≤ t_0
  • All d counters were updated by P_{i-1} within t_0
    (not timed out)
  • Definitely not sampled
  • The proportion of remaining packets with INI > t_0
    ≈ the proportion of sampled remaining packets
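
The two cases can be illustrated for a single flow with a short sketch. It is an idealization that ignores collisions with other flows, so the first packet and every later packet whose INI exceeds t_0 are treated as sampled with certainty rather than with probability p_s. The function name and the arrival times are made up for illustration.

```python
from typing import List, Sequence


def sampled_packet_indexes(arrivals: Sequence[float], t0: float) -> List[int]:
    """0-based indexes of the packets a collision-free TBF would sample.

    `arrivals` holds the arrival times of one flow's packets in order.
    """
    sampled = []
    previous = None
    for i, t in enumerate(arrivals):
        # First packet has no INI; later packets need an INI greater than t0.
        if previous is None or t - previous > t0:
            sampled.append(i)
        previous = t
    return sampled


if __name__ == "__main__":
    arrivals = [0.0, 0.2, 0.3, 2.0, 2.1, 5.0]  # hypothetical arrival times (s)
    print(sampled_packet_indexes(arrivals, t0=1.0))  # [0, 3, 5]
```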

11
Correlation by Test
  • Correlation between flow length and INIs
  • The shorter the flow, the greater its average
    packet interval
  • For a fixed t_0: the longer the flow, the smaller
    the proportion of its remaining packets that are
    sampled
  • Less biased sampling towards long flows

12
Comparison
  • 1 hour, 681,268,937 packets captured from a
    gigabit link, one of the outlets of THUNET
  • Random sampling: N = 1, 2, 5, 10, 20, 50, 100,
    200, 500, 1000
  • TBF sampling
  • t_0 = 0.01 s, 0.02 s, 0.05 s, 0.1 s, 0.2 s, 0.5 s,
    1.0 s, 2.0 s, 5.0 s, 10 s
  • Fixed m = 65,536; d = 1, 2, 3, 4
  • Fixed d = 3; m = 4,096, 16,384, 65,536

13
Experimental Results
  • Same volume of sampled data (sampling rate)
  • The number of sampled flows: TBF > random sampling
  • Sensitivity to parameters: m > d

14
Memory and Speed
  • Although TBF sampling is a little more
    complicated, it is practical to implement
  • Memory
  • TBF consumes little memory, e.g. m = 64K, d = 3,
    2 bytes per timestamp: only 65,536 × 3 × 2 bytes
    = 384 KB
  • Can be put into on-chip systems (FPGA, etc.)
  • Speed
  • Hash functions based on AND and OR operations
    are fast to execute
  • Placed in fast memory (SRAM), the per-packet
    pipeline is fast
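
The memory figure can be checked with a few lines of arithmetic. The helper name is hypothetical, and it simply follows the slide's own product of m, d, and the timestamp size.

```python
def tbf_memory_bytes(m: int, d: int, timestamp_bytes: int) -> int:
    """Memory footprint following the slide's arithmetic: m * d * timestamp size."""
    return m * d * timestamp_bytes


if __name__ == "__main__":
    total = tbf_memory_bytes(m=65_536, d=3, timestamp_bytes=2)
    print(total, "bytes =", total // 1024, "KB")  # 393216 bytes = 384 KB
```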

15
Summary
  • Conclusions
  • Each flow in TBF sampling has a high probability
    of being sampled
  • TBF sampling is less biased towards longer flows
    than random sampling
  • TBF sampling records information on several
    times more flows than random sampling
  • Future work
  • Integrate with a measurement system based on
    Intel IXP series network processors

16
Questions and Comments?
  • Thank you!