1
The Bloom Paradox
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel

2
Problem Definition
  • Requirement: a data structure at the user that quickly
    answers whether $x \in S$, i.e. whether element $x$ is in
    the local cache $S$
  • Solutions:
  • O(n): searching in a list
  • O(log(n)): searching in a sorted list
  • O(1): but with false positives / negatives

[Figure: the user queries elements such as x and y. The local cache S (access cost 1) holds some of the elements; the central memory M (access cost 10) holds all elements.]
3
Two Possible Errors
  • False positive: $x \notin S$, but the data
    structure answers $x \in S$.
  • Results in a redundant access to the local cache.
  • Additional cost of 1.
  • False negative: $x \in S$, but the data structure
    answers $x \notin S$.
  • Results in an expensive access to the central
    memory instead of the local cache.
  • Additional cost of 10 - 1 = 9.

4
Bloom Filters (Bloom, 1970)
  • Initialization: an array of zero bits.
  • Insertion: each element is hashed several times,
    and the corresponding bits are set.
  • Query: hash the element and check that all the
    corresponding bits are set.
  • False positives occur with a nonzero probability
    (the false positive rate).
  • No false negatives.
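
The three operations above can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 as the hash family are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical names and hash choice)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # initialization: an array of zero bits

    def _positions(self, item: str):
        # Assumption: one hash function per salt value, built from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item: str) -> None:
        # Set every bit the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # True if all hashed bits are set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=64, num_hashes=3)
bf.insert("x")
bf.insert("y")
print(bf.query("x"), bf.query("y"))  # inserted elements are always reported
```

A query for an element that was never inserted may still return `True` if other insertions happened to set all of its bits, which is exactly the false positive case discussed in these slides.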

[Figure: the Bloom filter bit array, shown initially all zeros, then during the insertion of x and y, then during queries for z, x, and w.]
5
Bloom Filters are Widely Used
  • Cache/Memory Framework
  • Packet Classification
  • Intrusion Detection
  • Routing
  • Accounting
  • Beyond networking: spell checking, DNA
    classification
  • Can be found in:
  • Google's web browser Chrome
  • Google's database system BigTable
  • Facebook's distributed storage system Cassandra
  • Mellanox's IB Switch System

6
Outline
  • Introduction to Bloom Filters
  • The Bloom Paradox
  • The Variable-Increment Counting Bloom Filter

7
The Bloom Paradox
Sometimes, it is better to disregard the Bloom
filter results, and in fact not to even query it,
thus making the Bloom filter useless.
8
Example
  • Parameters
  • Extreme case without locality: all elements have
    an equal probability of belonging to the cache.
  • Toy example

9
The Bloom Paradox
  • Parameters
  • Let $B$ be the set of elements that the Bloom
    filter indicates are in $S$.
  • In particular, there are no false negatives, so
    $S \subseteq B$.
  • Intuition

[Figure: the user, the Bloom filter B, the local cache S (cost 1), and the central memory M with all elements (cost 10).]
10
The Bloom Paradox
  • Parameters
  • Let $B$ be the set of elements that the Bloom
    filter indicates are in $S$.
  • In particular, there are no false negatives, so
    $S \subseteq B$.
  • Surprise

[Figure: the Bloom filter B, the local cache S (cost 1), and the central memory M with all elements (cost 10).]
11
The Bloom Paradox
  • Parameters
  • Let $B$ be the set of elements that the Bloom
    filter indicates are in $S$.
  • In particular, there are no false negatives, so
    $S \subseteq B$.
  • Surprise
  • The Bloom filter indicates the membership of many
    elements, but only a small fraction of them are
    indeed in $S$.

12
The Bloom Paradox
  • When the Bloom filter states that $x \in S$,
    it is wrong with high probability.
  • Compare the average cost if we listen to the Bloom
    filter with the average cost if we don't.
  • The Bloom filter is useless!

Don't listen to the Bloom filter.
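
The cost comparison can be made concrete with a short calculation. The sketch below applies Bayes' rule to a positive Bloom filter answer, using the access costs from the slides (1 for the local cache, 10 for the central memory). The a priori membership probability and false positive rate in the usage lines are hypothetical values chosen for illustration, not the parameters of the presenters' toy example, and the function names are assumptions.

```python
def posterior_in_cache(prior: float, fpr: float) -> float:
    """P(x in S | filter says yes), assuming no false negatives."""
    return prior / (prior + (1.0 - prior) * fpr)


def cost_if_listen(prior: float, fpr: float,
                   cache_cost: float = 1.0, memory_cost: float = 10.0) -> float:
    """Expected cost after a positive answer when we trust the filter:
    access the cache first, then the central memory if the cache misses."""
    p = posterior_in_cache(prior, fpr)
    return cache_cost + (1.0 - p) * memory_cost


def cost_if_ignore(memory_cost: float = 10.0) -> float:
    """Cost when we skip the filter and go straight to the central memory."""
    return memory_cost


# Hypothetical parameters: a very low prior makes positives mostly false.
low_prior = cost_if_listen(prior=1e-6, fpr=1e-3)
high_prior = cost_if_listen(prior=0.5, fpr=1e-3)
print(low_prior > cost_if_ignore())   # listening costs more than ignoring
print(high_prior < cost_if_ignore())  # listening pays off
```

With these costs, listening to a positive answer is worthwhile only when the posterior membership probability exceeds 1/10, which is why the a priori membership probability matters.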


13
Outline
  • Introduction to Bloom Filters
  • The Bloom Paradox
  • The Variable-Increment Counting Bloom Filter

14
Counting Bloom Filters (CBFs)
  • Bloom filters do not support deletions of
    elements. Simply resetting bits might cause false
    negatives.
  • The solution: counting Bloom filters, which store
    an array of counters instead of bits.
  • Insertion: increment the counters by one.
  • Deletion: decrement the counters by one.
  • Query: check that the counters are positive.
  • The same false positive probability as the Bloom
    filter.
  • But they require too much memory, e.g. 57 bits per
    element in the configuration considered.
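
A counting Bloom filter differs from the plain Bloom filter only in storing counters and supporting deletion. The following is a minimal sketch under assumptions: the class name `CountingBloomFilter`, its parameters, and the salted SHA-256 hash family are illustrative choices, and counter overflow is ignored.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter sketch (hypothetical names)."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters  # counters instead of bits

    def _positions(self, item: str):
        # Assumption: one hash function per salt value, built from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for elements that were inserted earlier.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(num_counters=64, num_hashes=3)
cbf.insert("x")
cbf.insert("y")
cbf.delete("x")
print(cbf.query("y"))  # y is still reported after x is deleted
```

Deleting an element that was never inserted would decrement counters owned by other elements and could introduce false negatives, so the sketch assumes callers only delete what they inserted.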

[Figure: counting Bloom filter example with elements x and y; each entry stores a counter instead of a bit.]
15
Intuition for Variable Increments
  • Upon query, we should consider the exact values
    of the counters and not just their positiveness.
  • Can we design a deterministic scheme that
    exploits the exact values of the counters?
  • Idea: use variable increments to encode the
    element identity.

[Figure: an array of counters holding values larger than one, with elements z and y.]
16
Architecture
  • Each hash entry contains a pair of counters:
  • $c_1$, fixed increments: the number of elements in
    the entry (as in the CBF)
  • $c_2$, variable increments: a weighted sum of the
    elements
  • The weights come from a pre-determined set.
  • We use two sets of hash functions:
  • The first set uses hash functions whose range is
    the set of entry indices, i.e. it points to the
    set of entries.
  • The second set uses hash functions whose range
    indexes the weights, i.e. it points to the
    pre-determined set of weights.

[Figure: an array of hash entries, each holding a pair of counters $c_1$ and $c_2$.]
17
Insertion
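  • Insertion
  • At each entry to which the element is hashed, the
    two counters are updated as follows: $c_1$ is
    incremented by one, and $c_2$ is incremented by a
    weight from the pre-determined set.
  • Example 1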

[Figure: Example 1. Entries before and after insertion, with $c_1$ values 3 → 4, 0 → 1, 3 → 4, 4 → 5 and $c_2$ values 25 → 29, 30 → 43, 30 → 34, 0 → 8; variable increments 4, 8, 4, 13; elements x and z.]
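
The insertion rule can be sketched as follows. This is an illustration under assumptions, not the presenters' implementation: the class name `VariableIncrementCBF`, the weight set `WEIGHTS`, and the salted SHA-256 hash families are hypothetical. The increments 4, 8, and 13 appear in the slide's example, but the full weight set is not recoverable from the transcript.

```python
import hashlib

# Hypothetical weight set; only 4, 8 and 13 are visible in the slide example.
WEIGHTS = (1, 2, 4, 8, 13)


def _hash(label: str, item: str, modulus: int) -> int:
    # Assumption: hash families built from SHA-256 with a label as salt.
    digest = hashlib.sha256(f"{label}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class VariableIncrementCBF:
    """Sketch of a counting Bloom filter with a pair of counters per entry."""

    def __init__(self, num_entries: int, num_hashes: int, weights=WEIGHTS) -> None:
        self.num_entries = num_entries
        self.num_hashes = num_hashes
        self.weights = tuple(weights)
        self.c1 = [0] * num_entries  # fixed increments: elements per entry
        self.c2 = [0] * num_entries  # variable increments: weighted sum

    def _targets(self, item: str):
        """Yield (entry index, weight) pairs selected for the item."""
        for j in range(self.num_hashes):
            entry = _hash(f"h{j}", item, self.num_entries)       # first hash set
            index = _hash(f"g{j}", item, len(self.weights))      # second hash set
            yield entry, self.weights[index]

    def insert(self, item: str) -> None:
        for entry, weight in self._targets(item):
            self.c1[entry] += 1        # as in a regular CBF
            self.c2[entry] += weight   # encodes part of the element identity


vi = VariableIncrementCBF(num_entries=16, num_hashes=3)
vi.insert("x")
print(sum(vi.c1))  # one fixed increment per hash function
```

Each inserted element therefore leaves both a count and a weight in every entry it touches, which is what the query step later exploits.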
18
Query
  • Query (element $y$, hashed to entries with variable
    increments 4 and 8)
  • We ask whether
  • 17 can be a sum of 2 elements from the set,
    including 4
  • 30 can be a sum of 3 elements from the set,
    including 8
  • No
  • How should we pick the set of variable
    increments?
  • We should use $B_h$ sequences!
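
The per-entry question asked during a query ("can $c_2$ be a sum of $c_1$ weights that includes the queried element's weight?") can be checked by brute force. The sketch below is an assumption-laden illustration: the function name `entry_may_contain` and the weight set `(1, 2, 4, 8, 13)` are hypothetical, since the slide's actual set is not recoverable. Repetition is allowed because different elements may be assigned the same weight.

```python
from itertools import combinations_with_replacement


def entry_may_contain(c1: int, c2: int, weight: int, weights) -> bool:
    """Return True if c2 can be written as a sum of exactly c1 weights
    (repetition allowed) where one of them is the given weight."""
    if c1 < 1:
        return False  # an empty entry cannot contain the element
    remainder = c2 - weight
    # The other c1 - 1 weights must add up to the remainder.
    return any(
        sum(combo) == remainder
        for combo in combinations_with_replacement(weights, c1 - 1)
    )


weights = (1, 2, 4, 8, 13)  # hypothetical weight set
print(entry_may_contain(2, 17, 4, weights))  # 17 = 4 + 13
print(entry_may_contain(3, 30, 8, weights))  # no pair of weights sums to 22
```

With this hypothetical set the first entry is consistent with the queried element and the second is not, so the element can be ruled out. The exhaustive search is exponential in `c1` and is meant only to make the question precise; it is not how an efficient implementation would decide it.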
19
Bh Sequences
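  • Definition 1
  • Let $D = \{v_1, v_2, \ldots, v_\ell\}$ be a
    sequence of positive integers.
  • Then, $D$ is a $B_h$ sequence iff all the sums
    $v_{i_1} + v_{i_2} + \cdots + v_{i_h}$
  • with $1 \le i_1 \le i_2 \le \cdots \le i_h \le \ell$ are
    distinct.
  • Example 2
  • All the sums of $h$ elements of the example set are
    distinct.
  • Therefore, the set is a $B_h$ sequence.
  • $B_h$ sequences are widely used in
    error-correcting codes.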
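
The definition can be tested directly by enumerating all sums of $h$ elements with repetition. This is a brute-force sketch for small sets; the function name `is_bh_sequence` and the example sets are assumptions made for illustration and are not the set used in the slide's Example 2.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(values, h: int) -> bool:
    """Return True if all sums of h elements (repetition allowed,
    order ignored) are distinct."""
    seen = set()
    for combo in combinations_with_replacement(sorted(values), h):
        total = sum(combo)
        if total in seen:
            return False  # two different multisets give the same sum
        seen.add(total)
    return True


print(is_bh_sequence((1, 2, 4, 8, 13), 2))  # all 15 pairwise sums differ
print(is_bh_sequence((1, 2, 3, 4), 2))      # fails: 1 + 4 == 2 + 3
print(is_bh_sequence((1, 2, 4, 8, 13), 3))  # fails: 1 + 1 + 4 == 2 + 2 + 2
```

Distinct sums mean that a sum of $h$ weights identifies the weights that produced it, which is the property the query relies on.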

20
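The Bh-CBF Scheme: Query
  • Example 3: the set of variable increments is a
    $B_h$ sequence.
  • Since the counter values of the entry rule out the
    queried element, the Bh-CBF can determine that the
    element is not in the set.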

21
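The Bh-CBF Scheme Operations: Query
  • Example 3: the set of variable increments is a
    $B_h$ sequence.
  • Here, the counter values determine which weights
    were added to the entry.
  • Since the weight of the queried element is not
    among them, the Bh-CBF can determine that the
    element is not in the set.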

22
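The Bh-CBF Scheme Operations: Query
  • Example 3: the set of variable increments is a
    $B_h$ sequence.
  • Since the counter values of the entry do not rule
    out the queried element, the Bh-CBF cannot exclude
    that the element is in the set.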

23
Experimental Results
  • Internet trace (equinix-chicago) with real hash
    functions.
  • Results are shown for the Bh-CBF.

24
Experimental Results
  • Internet trace (equinix-chicago) with real hash
    functions.
  • Results are shown for the Bh-CBF and for the
    VI-CBF.

25
Concluding Remarks
  • The Bloom Paradox
  • Discovery of the Bloom paradox
  • Importance of the a priori membership probability
  • The Variable-Increment Counting Bloom Filter
  • Can extend many variants of the counting Bloom
    filter
  • First time $B_h$ sequences are presented in
    networking applications

26
Thank You