Improve sketching of Hamming Distance with Error Correcting - PowerPoint PPT Presentation

About This Presentation
Title:

Improve sketching of Hamming Distance with Error Correcting

Description:

Due to the birthday principle: the probability that 2 errors will ... Backward compatibility. Even if you don't have the whole file you can mix functionals. ...

Slides: 35
Provided by: Goog186

Transcript and Presenter's Notes

Title: Improve sketching of Hamming Distance with Error Correcting


1
Improve sketching of Hamming Distance with Error
Correcting
Ely Porat Bar-Ilan University Google Inc
Ohad Lipsky Bar-Ilan University Check Point Inc
December 2003
2
Problem Definition (1)
Alice holds T_A and Bob holds T_B, each of length n.
Goal: compute hamm(T_A, T_B).
Given: k, a bound on the number of mismatches.
3
Problem Definition (2)
T_A and T_B, each of length n, are compressed by the same sketching function S into the sketches S_A and S_B.
  • Calculate hamm(T_A, T_B) given only S_A and S_B.
  • Find the mistakes.
Given: k, a bound on the number of mismatches.
4
Motivations
  • Databases
  • Internet
  • Error correcting

[Figure: a network of four routers, A, B, C and D]
5
Outline
  • Simple Solution
  • Error Correcting
  • Improved Solution
  • Improve more
  • Recursion
  • File sharing

6
Simplest Solution - O(k^2 log(1/δ))
  • Binary alphabet.
  • Allocate k^2 cells.
  • Take the input array and hash each bit to one of the cells.
  • In each cell, store the XOR of all the values hashed to it.

[Figure: example cell contents 0, 1, 1, 0]
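The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`position_hash`, `xor_sketch`, `differing_cells`) are hypothetical, the hash is a seeded random table rather than a compact hash family, and the slides do not say how the two parties agree on the hash, so a shared seed is assumed.

```python
import random

def position_hash(n, num_cells, seed):
    """Map each of the n positions to a cell; both parties derive it from a shared seed (assumption)."""
    rng = random.Random(seed)
    return [rng.randrange(num_cells) for _ in range(n)]

def xor_sketch(bits, cell_of, num_cells):
    """Each cell stores the XOR of all input bits hashed to it."""
    cells = [0] * num_cells
    for i, bit in enumerate(bits):
        cells[cell_of[i]] ^= bit
    return cells

def differing_cells(sketch_a, sketch_b):
    """Count differing cells; this equals the Hamming distance when no two mismatches collide."""
    return sum(x != y for x, y in zip(sketch_a, sketch_b))

k, n = 3, 64
num_cells = k * k
cell_of = position_hash(n, num_cells, seed=2003)

data_rng = random.Random(1)
t_a = [data_rng.randrange(2) for _ in range(n)]
t_b = list(t_a)
for pos in (5, 20, 41):  # plant k = 3 mismatches
    t_b[pos] ^= 1

estimate = differing_cells(xor_sketch(t_a, cell_of, num_cells),
                           xor_sketch(t_b, cell_of, num_cells))
print(estimate)
```

Two mismatches that land in the same cell cancel each other, so the count can only underestimate the true distance, and it always has the same parity as the true distance.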
7
Simplest Solution - O(k^2 log(1/δ))
[Figure: example bit values 0, 1, 0, 0, 1, 1, 0, 0]
8
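Simplest Solution - O(k^2 log(1/δ))
  • By the birthday paradox, the probability that two errors fall into the same cell is less than 1/2: there are fewer than k^2/2 pairs of errors, and each pair collides with probability 1/k^2.
  • Repeating log(1/δ) times brings the failure probability down to δ.

[Figure: example cell contents 0, 1, 1, 0]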
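The birthday argument can be checked numerically. The sketch below assumes a uniformly random hash and independent repetitions, each failing with probability at most 1/2; both function names are hypothetical.

```python
import math

def no_collision_probability(k, num_cells):
    """Probability that k errors, hashed uniformly at random, land in k distinct cells."""
    p = 1.0
    for i in range(k):
        p *= 1.0 - i / num_cells
    return p

def repetitions_for(delta):
    """Independent runs needed so that all of them fail with probability at most delta,
    assuming each run fails with probability at most 1/2."""
    return math.ceil(math.log2(1.0 / delta))

for k in (2, 5, 10, 50):
    print(k, round(no_collision_probability(k, k * k), 3))
print(repetitions_for(0.01))
```

With k^2 cells the no-collision probability stays above 1/2 for every k tried here, which is what the slide's claim needs.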
9
Alphabet
  • Denote by S the size of the alphabet.
  • We can encode each letter by its unary representation.
  • The only effect is that each mistake will be counted twice.

0 -> 1000000...0, 1 -> 0100000...0, ..., S-1 -> 0000000...1
Example: 0 -> 1000000...0 and 5 -> 0000010...0 differ in exactly two positions.
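A minimal sketch of this reduction, assuming letters are the integers 0..S-1 (the function names are illustrative, not from the slides):

```python
def unary_encode(text, alphabet_size):
    """Encode each letter as a block of alphabet_size bits with a single 1 at the letter's index."""
    bits = []
    for letter in text:
        block = [0] * alphabet_size
        block[letter] = 1
        bits.extend(block)
    return bits

def hamming(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

t_a = [0, 5, 2, 2]
t_b = [0, 3, 2, 1]
print(hamming(t_a, t_b))                                    # 2
print(hamming(unary_encode(t_a, 7), unary_encode(t_b, 7)))  # 4
```

Each mismatching letter turns one 1 into a 0 and one 0 into a 1, so the binary distance is exactly twice the original distance.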
10
Error correcting - O(k^2 log(NS))
  • Here we allocate two kinds of cells: k^2 cells of log S bits and k^2 cells of log(NS) bits.

C1[h(A_i)] += A_i
[Figure: example C1 cells 5, 8, 3, 2]
C2[h(A_i)] += i·A_i
[Figure: example C2 cells 15, 6, 7, 8]
11
Error correcting - O(k^2 log(NS))
  • As before, with probability at least 1/2 no two errors fall into the same cell.

C1[h(A_i)] += A_i
[Figure: example C1 cells 5, 8, 3, 2]
C2[h(A_i)] += i·A_i
[Figure: example C2 cells 15, 6, 7, 8]
12
Error correcting - O(k^2 log(NS))
  • We get from the red cells:

8 - 6 = 5 - 3
[Figure: red cells C1[h(A_i)] += A_i; Alice (letter 5): 5, 8, 3, 2; Bob (letter 3): 5, 6, 3, 2]
13
Error correcting - O(k^2 log(NS))
  • We get from the blue cells:

11 - 9 = 2 and (5 - 3)·i = 2
[Figure: blue cells C2[h(A_i)] += i·A_i; Alice: 15, 11, 7, 5; Bob: 15, 9, 7, 5]
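The red-cell and blue-cell steps can be combined into a small recovery routine. This is a sketch under assumptions, not the authors' implementation: positions (rather than letters) are hashed, the hash `cell_of` is a fixed hypothetical function chosen so the example is reproducible, and positions are counted from 0.

```python
def cell_of(i, num_cells):
    """Hypothetical fixed hash of a position; a real sketch would draw it at random."""
    return ((7 * i + 3) % 31) % num_cells

def sketch(text, num_cells):
    c1 = [0] * num_cells  # "red" cells: sums of letters
    c2 = [0] * num_cells  # "blue" cells: sums of position * letter
    for i, letter in enumerate(text):
        cell = cell_of(i, num_cells)
        c1[cell] += letter
        c2[cell] += i * letter
    return c1, c2

def find_mistakes(sk_a, sk_b, n, num_cells):
    """Return (position, a_i - b_i) for every cell that isolates a single mismatch."""
    (c1a, c2a), (c1b, c2b) = sk_a, sk_b
    found = []
    for cell in range(num_cells):
        d1 = c1a[cell] - c1b[cell]  # equals a_i - b_i if the cell holds one mismatch
        d2 = c2a[cell] - c2b[cell]  # equals i * (a_i - b_i) in that case
        if d1 == 0 or d2 % d1 != 0:
            continue
        i = d2 // d1
        if 0 <= i < n and cell_of(i, num_cells) == cell:
            found.append((i, d1))
    return sorted(found)

n, num_cells = 16, 9
t_a = [(3 * i) % 10 for i in range(n)]
t_b = list(t_a)
t_b[3] -= 2
t_b[11] += 4
mistakes = find_mistakes(sketch(t_a, num_cells), sketch(t_b, num_cells), n, num_cells)
print(mistakes)  # [(3, 2), (11, -4)]
```

The final check that the recovered position hashes back to the same cell is an added sanity test; it discards some, but not all, wrong answers caused by collisions.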
14
Error correcting - O(k2logNS)
  • The probability of success is about 1/2.
  • To lower the failure probability we run it 3 times.
  • Each run yields a list of possible mistakes.
  • Output all the mistakes that appear in at least 2 of the 3 runs.
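The voting step is easy to express in code. The sketch below assumes each run reports its candidate mistakes as (position, difference) pairs; that representation and the name `majority_mistakes` are illustrative choices.

```python
from collections import Counter

def majority_mistakes(runs, min_votes=2):
    """Keep every candidate mistake reported by at least min_votes of the runs."""
    votes = Counter()
    for run in runs:
        votes.update(set(run))  # a run votes at most once per candidate
    return sorted(m for m, count in votes.items() if count >= min_votes)

runs = [
    [(3, 2), (11, -4)],
    [(3, 2), (5, 1)],   # this run suffered a collision and reports a bogus mistake
    [(11, -4), (3, 2)],
]
print(majority_mistakes(runs))  # [(3, 2), (11, -4)]
```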
15
O(k log^2 k) - Solution
  • The idea is two-stage hashing.

[Figure: the input is first hashed into k/log k buckets; w.h.p. each bucket receives O(log k) mismatches]
Bar-Yossef, Jayram, Kumar, Sivakumar 03
16
O(k log^2 k) - Solution
  • In each cell keep the accumulated XOR.
  • The probability of failure is less than 1/2.
  • Run it 2 log k times and take the max; the failure probability is then less than 1/k^2.
  • Space: O(log^3 k).

[Figure labels: O(log k), O(log^2 k)]
Bar-Yossef, Jayram, Kumar, Sivakumar 03
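The "run it several times and take the max" step for a single bucket might look as follows. This is a sketch under assumptions: the run index doubles as a shared hash seed, the cell and run counts in the example are arbitrary, and the function names are hypothetical.

```python
import random

def xor_cells(bits, num_cells, seed):
    """One run: hash every position to a cell (seeded, so both parties agree) and XOR the bits."""
    rng = random.Random(seed)
    cells = [0] * num_cells
    for bit in bits:
        cells[rng.randrange(num_cells)] ^= bit
    return cells

def sketch(bits, num_cells, runs):
    """Independent runs, using the run index as the shared seed (assumption)."""
    return [xor_cells(bits, num_cells, seed) for seed in range(runs)]

def estimate(sketch_a, sketch_b):
    """Collisions can only hide mismatches, so the largest count over all runs is the best guess."""
    return max(sum(x != y for x, y in zip(ca, cb))
               for ca, cb in zip(sketch_a, sketch_b))

n, num_cells, runs = 200, 16, 8
data_rng = random.Random(7)
t_a = [data_rng.randrange(2) for _ in range(n)]
t_b = list(t_a)
for pos in (10, 60, 110, 160):  # 4 mismatches in this bucket
    t_b[pos] ^= 1

est = estimate(sketch(t_a, num_cells, runs), sketch(t_b, num_cells, runs))
print(est)
```

Taking the maximum is safe because a single run never overestimates: a cell differs only if it received an odd number of mismatches.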
17
O(k log^2 k) - Solution
[Figure: k/log k buckets, each using O(log^3 k) space, for O(k log^2 k) in total]
P(failure) ≤ (k/log k) · (1/k^2)
Bar-Yossef, Jayram, Kumar, Sivakumar 03
18
O(k2logklogk) -Idea (recursion)
k/logk
Pr(F)logk/loglogk
logk/loglogk runs, take max
19
Error Correcting O(k log(NS))
[Figure: Alice holds T_A and Bob holds T_B, each of length n]
r0r1r2
p?(N3S)
Constant Probability
20
Error Correcting O(k log(NS))
[Figure: Alice holds T_A and Bob holds T_B, each of length n]
If we are wrong, w.h.p. jn
21
Error Correcting O(k log(NS))
[Figure: Alice holds T_A and Bob holds T_B, each of length n]
rj , aj - bj
22
Error Correcting O(k log(NS))
[Figure: Alice holds T_A and Bob holds T_B, each of length n]
O(k ln k)
23
Recursion
[Figure: Alice holds T_A and Bob holds T_B, each of length n; label: ck; T_A and T_B of length n are shown a second time]
24
Recursion
[Figure: Alice holds T_A and Bob holds T_B, each of length n; labels: ck, O(k log(NS))]
25
Complexity
[Figure: T_A and T_B, each of length n, are mapped by S to the sketches S_A and S_B]
  • Size: O(k log(NS))
  • Computing the sketch: O(n log k)
  • Comparing sketches: O(k log k)
26
O(k log k) - Solution
  • We can just encode in unary, hash the input to k^3 cells, and then run the O(k log(NS)) = O(k log k) algorithm.
27
Reed-Solomon Codes
We managed to develop a deterministic algorithm based on Reed-Solomon codes, but its encoding and decoding are slower.
  • Amir, Farach 95
  • Feigenbaum, Ishai, Malkin, Nissim, Strauss, Wright 01
  • Bar-Yossef, Jayram, Kumar, Sivakumar 03
  • Efremenko, Porat, Rothschild 06
  • Efremenko, Porat 07
28
File Sharing
Napster
[Figure: a source holding a file of n blocks]
  • The source needs to stay until someone has the whole file (and is willing to stay).
  • There is a bottleneck at the end.
29
File Sharing
eMule/Kazaa/torrent
[Figure: a source holding a file of n blocks]
  • The source has to send n ln n blocks before disconnecting.
  • Sometimes there are still bottlenecks.
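One way to see where n ln n comes from is the coupon-collector calculation. The sketch below assumes a simplified model that the slides do not spell out: every block sent is chosen uniformly at random from the n blocks, and we count how many are sent until each block has gone out at least once. The function name is hypothetical.

```python
import math
from fractions import Fraction

def expected_blocks(n):
    """Expected number of uniformly random blocks sent until all n distinct blocks have appeared:
    n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(Fraction(1, j) for j in range(1, n + 1))

for n in (10, 100, 1000):
    exact = float(expected_blocks(n))
    print(n, round(exact, 1), round(n * math.log(n), 1))
```

The exact expectation is n times the n-th harmonic number, which grows like n ln n.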
30
Improved File Sharing - Ver 1
[Figure: the source holds the n blocks a_0, a_1, a_2, ..., a_(n-1); label: n^6]
31
Improved File Sharing - Ver 1
[Figure label: n^6]
  • Each client that has received n points can recreate the file.
  • The n ln n overhead is gone.
  • Almost no bottlenecks.
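The slides only say that any n "points" suffice. One standard way to get that property, assumed here rather than confirmed by the source, is to treat the blocks as coefficients of a polynomial over a prime field, send evaluations of it, and recover the blocks by Lagrange interpolation. The prime P and all function names are illustrative choices.

```python
P = 2**31 - 1  # a prime; the field size is an arbitrary choice for this sketch

def evaluate(blocks, x):
    """Evaluate a_0 + a_1 x + ... + a_(n-1) x^(n-1) mod P with Horner's rule."""
    y = 0
    for coeff in reversed(blocks):
        y = (y * x + coeff) % P
    return y

def mul_linear(poly, root):
    """Multiply a polynomial (lowest degree first) by (x - root) mod P."""
    out = [0] * (len(poly) + 1)
    for i, coeff in enumerate(poly):
        out[i] = (out[i] - root * coeff) % P
        out[i + 1] = (out[i + 1] + coeff) % P
    return out

def recover(points):
    """Lagrange interpolation: rebuild the n coefficients from n points with distinct x."""
    n = len(points)
    blocks = [0] * n
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                basis = mul_linear(basis, xm)
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P  # modular inverse via Fermat's little theorem
        for i, coeff in enumerate(basis):
            blocks[i] = (blocks[i] + scale * coeff) % P
    return blocks

file_blocks = [5, 17, 42, 7]
sent = [(x, evaluate(file_blocks, x)) for x in (10, 3, 99, 12345, 8, 77)]
print(recover(sent[:4]) == file_blocks)  # True
print(recover(sent[2:]) == file_blocks)  # True
```

Any four of the six points rebuild the four blocks, so a client never has to wait for one specific missing block.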
32
Improved File Sharing - Ver 2
[Figure: the source holds the n blocks a_0, a_1, a_2, ..., a_(n-1)]
  • Send linear equations on the file.
33
Improved File Sharing - Ver 2
[Figure: the source holds the n blocks a_0, a_1, a_2, ..., a_(n-1)]
Problems
  • 1. Heavy to encode: for each packet we need to go over the whole file.
  • 2. Very heavy to decode: O(n^2) block operations and O(n^3) field operations.
Facts
  • 1. If you get n^(1/2-ε) random combinations of two blocks, you won't have dependencies w.h.p.
  • 2. If you have d pair combinations, you can easily reduce your system to n-d variables.

Solution: use sparse functionals.
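To make "linear equations on the file" concrete, here is a minimal sketch over GF(2), where a packet is the XOR of a chosen subset of blocks and decoding is Gaussian elimination. The choice of GF(2), the bitmask representation, and the function names are assumptions for illustration; the slides do not fix the field.

```python
def encode(blocks, mask):
    """Packet = (mask, XOR of the blocks selected by the bits of mask)."""
    payload = 0
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload ^= block
    return mask, payload

def decode(packets, n):
    """Gaussian elimination over GF(2); returns the n blocks, or None if the packets are dependent."""
    basis = {}  # highest set bit -> (mask, payload)
    for mask, payload in packets:
        while mask:
            high = mask.bit_length() - 1
            if high not in basis:
                basis[high] = (mask, payload)
                break
            mask ^= basis[high][0]
            payload ^= basis[high][1]
    if len(basis) < n:
        return None
    solved = [0] * n
    for high in range(n):  # back-substitute from the lowest pivot upward
        mask, payload = basis[high]
        for i in range(high):
            if (mask >> i) & 1:
                payload ^= solved[i]
        solved[high] = payload
    return solved

blocks = [0x11, 0x22, 0x33, 0x44]
pairs_plus_one = [encode(blocks, m) for m in (0b0011, 0b0110, 0b1100, 0b0001)]
cycle_of_pairs = [encode(blocks, m) for m in (0b0011, 0b0110, 0b1100, 0b1001)]
print(decode(pairs_plus_one, 4) == blocks)  # True
print(decode(cycle_of_pairs, 4))            # None
```

The second packet set shows the kind of dependency the Facts refer to: four pair combinations that form a cycle XOR to zero, so they carry only three independent equations.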
34
Improved File Sharing - Ver 2
[Figure: the source holds the n blocks a_0, a_1, a_2, ..., a_(n-1)]
Features
  • Backward compatibility.
  • Even if you don't have the whole file you can mix functionals.