Parallel and Distributed Programming

1
Parallel and Distributed Programming
  • Tran Vinh Cuong
  • Department of Computer Science

2
Bloom filter
  • A Bloom filter, conceived by Burton H. Bloom in
    1970, is a space-efficient probabilistic data
    structure used to test whether an element is a
    member of a set.
  • False positives are possible, but false negatives
    are not.
  • Elements can be added to the set, but not
    removed.
  • The more elements are added to the set, the
    larger the probability of false positives.

3
Bloom filter algorithm
  • An empty Bloom filter is a bit array of m bits,
    all set to 0.
  • There are k different hash functions.
  • Add element: feed it to the k hash functions to
    get k array positions. Set these positions
    to 1.
  • Check existence: feed it to the k hash functions
    to get k array positions. If any of these
    positions is 0, the element is not in the set.
    If all are 1, the element is either in the set
    or a false positive.
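The add and check steps can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the class name `BloomFilter`, the method names, and the choice to simulate k hash functions by salting SHA-256 with an index are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(m)

    def _positions(self, item: str) -> list:
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the index i. Any k independent hash functions would do.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Set every position selected by the k hash functions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 means "definitely not in the set"; all 1s means "possibly in the set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("apple")
bf.add("banana")
print(bf.might_contain("apple"))   # True: added elements are always found
print(bf.might_contain("cherry"))  # usually False, but a false positive is possible
```

There is no `remove` method: clearing a bit could also erase evidence of another element that hashed to the same position.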

4
Probability of false positive
Assume each hash function selects every array
position with equal probability. The probability
that a certain bit is not set to 1 by one hash
function during the insertion of an element is
    $1 - \frac{1}{m}$
The probability that the bit is not set by any of
the k hash functions is
    $\left(1 - \frac{1}{m}\right)^{k}$
After inserting r elements, the probability that
it is still 0 is
    $\left(1 - \frac{1}{m}\right)^{kr}$
The probability that it is 1 is therefore
    $1 - \left(1 - \frac{1}{m}\right)^{kr}$
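The probability that a given bit is 1 after r insertions, 1 - (1 - 1/m)^(kr), can be checked against a small simulation. The sketch below assumes idealised hashing, modelled by drawing each of the k·r positions uniformly at random. The function names and the values of m, k and r are illustrative only.

```python
import random


def expected_fill(m: int, k: int, r: int) -> float:
    """Probability that a given bit is 1 after inserting r elements."""
    return 1.0 - (1.0 - 1.0 / m) ** (k * r)


def simulated_fill(m: int, k: int, r: int, seed: int = 0) -> float:
    """Fraction of bits set when each of the k*r hash results is a uniform random position."""
    rng = random.Random(seed)
    bits = bytearray(m)
    for _ in range(k * r):
        bits[rng.randrange(m)] = 1
    return sum(bits) / m


m, k, r = 10_000, 3, 1_000  # illustrative values, not from the slides
print(expected_fill(m, k, r))   # analytic value
print(simulated_fill(m, k, r))  # should be close to the analytic value
```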
5
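Test existence of an element that is not in the
set
The probability that all k positions are 1
(a false positive) is
    $\left(1 - \left(1 - \frac{1}{m}\right)^{kr}\right)^{k} \approx \left(1 - e^{-kr/m}\right)^{k}$
This is an upper bound on the probability of a hash
collision for the first r elements (states)
entered. It is minimal when
    $k = \frac{m}{r} \ln 2$
For $m = 10^{9}$ and $r = 10^{7}$: $k \approx 69.315$
and P(hash collision) $\approx 10^{-21}$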
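The numbers for this example can be reproduced with a few lines of Python. The sketch uses the standard approximation (1 - e^(-kr/m))^k for the false-positive probability and k = (m/r) ln 2 for the minimising number of hash functions. The function names are hypothetical, chosen for this example.

```python
import math


def false_positive_rate(m: int, r: int, k: float) -> float:
    """Approximate false-positive probability (1 - e^(-k*r/m))^k."""
    return (1.0 - math.exp(-k * r / m)) ** k


def optimal_k(m: int, r: int) -> float:
    """Number of hash functions that minimises the false-positive probability."""
    return (m / r) * math.log(2)


m, r = 10**9, 10**7
k = optimal_k(m, r)
print(round(k, 3))                   # 69.315
print(false_positive_rate(m, r, k))  # on the order of 1e-21
```

In practice k must be an integer, so the nearest whole number of hash functions is used and the achieved probability is slightly above the minimum.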
7
Trade-off
  • In SPIN, $k = 2$ and the probability is
    $4 \cdot 10^{-4}$; the average search coverage
    is reduced from 100% to 99%.