Noise Tolerant Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Noise Tolerant Learning


1
Noise Tolerant Learning
  • Presented by Aviad Maizels

Based on:
  • "Noise-tolerant learning, the parity problem, and the statistical query model" \ Avrim Blum, Adam Kalai and Hal Wasserman
  • "A Generalized Birthday Problem" \ David Wagner
  • "Hard-core predicates for any one-way function" \ O. Goldreich and L. A. Levin
  • "Simulated annealing and Boltzmann machines" \ Emile Aarts and Jan Korst
2
void Agenda()
  • do
  • A few sentences about Codes
  • The opposite problem
  • Learning with noise
  • The k-sum problem
  • Can we do it faster?
  • Annealing
  • while (!understandable)


3
void fast_introduction_to_LECC()
  • The communication channel may disrupt the original data.
  • Proposed solution: encode messages to give some protection against errors.
4
void fast_introduction_to_LECC()(Continue
terminology)
[Diagram: Source → Encoder → Channel (noise added); msg u = u1 u2 … uk → codeword x = x1 x2 … xn]
  • Linear codes
  • Fixed-size block code
  • Additive closure (the sum of two codewords is again a codeword)
  • A code is tagged using two parameters (n,k)
  • k = data size
  • n = encoded word size
5
void fast_introduction_to_LECC()(Continue
terminology)
  • Systematic code: the original data appears directly inside the codeword.
  • Generating matrix (G): a matrix s.t. multiplying a message by it outputs the encoded word (see the sketch below).
  • Number of rows = space dimension (k)
  • Every codeword can be represented as a linear combination of G's rows.

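To make the (n,k) and generating-matrix terminology concrete, here is a minimal Python sketch (my own illustration, not from the slides); the (7,4) systematic matrix G below is a hypothetical example.

    import numpy as np

    # Hypothetical systematic (n=7, k=4) generator matrix G = [I_4 | P] over GF(2).
    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1, 0, 1],
                  [0, 0, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

    def encode(msg, G):
        """Codeword = msg * G (mod 2); msg has k bits, the codeword has n bits."""
        return (np.asarray(msg, dtype=np.uint8) @ G) % 2

    print(encode([1, 0, 1, 1], G))   # first k bits repeat the message (systematic code)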
6
void fast_introduction_to_LECC()(Continue
terminology)
  • Hamming distance: the number of positions in which two vectors differ
  • Denoted by dist(x,y)
  • Hamming weight: the number of non-zero positions in a vector
  • Denoted by wt(x)
  • Minimum distance of a linear code = the minimum weight of any non-zero codeword (small helpers sketched below)

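A few assumed Python helpers (not from the slides) for the Hamming definitions; min_distance brute-forces all non-zero messages, so it is only meant for tiny k.

    import numpy as np
    from itertools import product

    def wt(x):                        # Hamming weight: number of non-zero positions
        return int(np.count_nonzero(x))

    def dist(x, y):                   # Hamming distance: positions where x and y differ
        return wt(np.bitwise_xor(np.asarray(x), np.asarray(y)))

    def min_distance(G):              # minimum weight over all non-zero codewords of a linear code
        k = G.shape[0]
        return min(wt(np.array(m, dtype=np.uint8) @ G % 2)
                   for m in product([0, 1], repeat=k) if any(m))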
7
void fast_introduction_to_LECC()(Continue
terminology)
[Diagram: Channel → Decoder → Target; received word = x + e, error vector e = e1 e2 … en, msg = ??]
  • Perfect code (t): every vector of length n is within Hamming distance t of exactly one codeword.

8
void fast_introduction_to_LECC()(Continue
terminology)
  • Complete decoding: the acceptance regions around the codewords together contain all the vectors of length n.


9
void the_opposite_problem()
  • Decoding linear (n,k) codes in the presence of random noise, in poly(n) time
  • The case k = O(log n) is trivial
  • In !(coding-theory) terms:
  • Given a finite set of codewords (examples) of length n, their labels, and a new codeword, find/learn its label, in the presence of random noise, in poly(n) time.

10
void the_opposite_problem()(Continue Main idea)
  • Without noise:
  • Any vector can be written as a linear combination of previously seen examples.
  • Deducing the vector's label can be done in the same way.
  • So all we need is to find a basis in order to deduce the label of any new example (see the sketch below).
  • Q: Is it the same in the presence of noise?

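A rough sketch of the noise-free case (my own Python, assuming the example vectors span the space): Gaussian elimination over GF(2) recovers the parity vector, and hence the label of any new example.

    import numpy as np

    def solve_parity_gf2(X, y):
        """Given noise-free examples X (rows) with labels y = X.v mod 2, recover v.
        Assumes the rows of X span GF(2)^n."""
        A = np.concatenate([np.asarray(X, dtype=np.uint8) % 2,
                            np.asarray(y, dtype=np.uint8).reshape(-1, 1)], axis=1)
        n = A.shape[1] - 1
        row = 0
        for col in range(n):                            # Gauss-Jordan elimination mod 2
            pivot = next((r for r in range(row, A.shape[0]) if A[r, col]), None)
            if pivot is None:
                continue
            A[[row, pivot]] = A[[pivot, row]]           # swap pivot row into place
            for r in range(A.shape[0]):                 # clear the column everywhere else
                if r != row and A[r, col]:
                    A[r] ^= A[row]
            row += 1
        # with spanning examples the first n rows now form the identity,
        # so v is simply the last (label) column of those rows
        return A[:n, -1]

    # example: v = (1,0,1)
    X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
    y = [1, 0, 1, 0]
    print(solve_parity_gf2(X, y))    # -> [1 0 1]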
11
void the_opposite_problem()(Continue Main idea)
  • Well... no.
  • Summing examples actually boosts the noise:
  • Given s examples and a noise rate of η, the sum of the s examples has a noise rate of ½ − ½(1−2η)^s (see the numeric illustration below).
  • Idea: write basis vectors as a sum of a small number of examples, and the new sample as a linear combination of the above.


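A quick numeric illustration of that formula (the noise rate η = 0.1 is an assumed value):

    def noise_of_sum(eta, s):
        """Noise rate of the XOR of s examples, each mislabeled independently w.p. eta."""
        return 0.5 - 0.5 * (1 - 2 * eta) ** s

    for s in (1, 2, 4, 8, 16):
        print(s, round(noise_of_sum(0.1, s), 4))   # 0.1, 0.18, 0.2952, ... -> drifts toward 1/2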
12
void learning_with_noise()
  • Concept: a boolean function over the input space
  • Concept class: a set of concepts
  • World model:
  • There is a fixed noise rate η
  • There is a fixed probability distribution D over the input space
  • The algorithm may ask for a labeled example (x, l)
  • There is an unknown concept c

13
void learning_with_noise()
  • Goal: find an ε-approximation of c
  • i.e. a function h s.t. Pr_{x~D}[h(x) = c(x)] ≥ 1 − ε
  • Parity function: defined by a corresponding vector v ∈ {0,1}^n. The function is then given by the rule c_v(x) = ⟨v, x⟩ mod 2 (a noisy example oracle for it is sketched below).

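A small sketch of this world model for a parity concept (hypothetical Python helper; v and eta below are illustrative assumptions):

    import random

    def parity_oracle(v, eta):
        """Returns a draw() function producing labeled examples (x, l),
        where l = <v, x> mod 2, flipped with probability eta."""
        n = len(v)
        def draw():
            x = [random.randint(0, 1) for _ in range(n)]
            label = sum(vi * xi for vi, xi in zip(v, x)) % 2
            if random.random() < eta:        # random classification noise
                label ^= 1
            return x, label
        return draw

    draw = parity_oracle(v=[1, 0, 1, 1, 0], eta=0.1)
    x, l = draw()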
14
void learning_with_noise()(Continue
Preliminaries)
  • Efficiently learnable: a concept class C is E.L. in the presence of random classification noise under distribution D if
  • ∃ an algorithm A s.t. ∀ ε > 0, δ > 0, noise rate η < 1/2 and ∀ concept c ∈ C:
  • A produces an ε-approximation of c with probability at least 1 − δ when given access to D-random examples.
  • A must run in time polynomial in n, 1/ε, 1/δ and in 1/(1/2 − η).

15
void learning_with_noise()(Continue Goal)
  • We'll show that the length-k parity problem, for noise rate η < 1/2, can be solved using time and a total number of examples of 2^{O(k/log k)}.
  • Observe the behavior of the noise when we're adding up examples:
16
void learning_with_noise()(Continue Noise
behavior)
[Illustration: two example vectors, 1010111 and 1111011]
p1 = appearing frequency of the noisy bit, q1 = appearing frequency of the correct bit (first example); p2, q2 likewise for the second example.
  • p_i + q_i = 1
  • Denote the bias s_i = q_i − p_i = 1 − 2p_i = 2q_i − 1, so s_i ∈ [−1, 1]
  • ⇒ p3 = p1·q2 + p2·q1, q3 = p1·p2 + q1·q2 (the XOR of two labels is wrong iff exactly one of them is)
  • ⇒ s3 = q3 − p3 = s1·s2
  • ⇒ XORing s examples of noise rate η gives bias (1−2η)^s, i.e. noise rate ½ − ½(1−2η)^s (checked empirically below)

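A quick empirical check (assumed noise rates p1, p2) that the bias s = q − p multiplies when two independently noisy labels are XORed:

    import random

    def bias(p):                          # s = q - p = 1 - 2p
        return 1 - 2 * p

    p1, p2, trials = 0.2, 0.3, 200_000
    noisy_xor = 0
    for _ in range(trials):
        e1 = random.random() < p1         # label 1 flipped?
        e2 = random.random() < p2         # label 2 flipped?
        noisy_xor += (e1 != e2)           # the XOR is wrong iff exactly one label is flipped
    p3 = noisy_xor / trials
    print(p3, bias(p3), bias(p1) * bias(p2))   # bias(p3) is close to 0.6 * 0.4 = 0.24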
17
void learning_with_noise()(Continue Idea)
  • Main idea: draw many more examples than needed, in order to write basis vectors as sums of relatively small numbers of examples.
  • If the resulting bias is still polynomially large (i.e. the summed labels are not indistinguishable from random), we can repeat the process to boost reliability.

18
void learning_with_noise()(Continue
Definitions)
  • A few more definitions:
  • k = a·b
  • V_i = the subspace of {0,1}^{ab} consisting of vectors whose last i blocks (of b bits each) are zeroed
  • i-sample = a set of independent vectors uniformly distributed over V_i

19
void learning_with_noise()(Continue Main
construction)
  • Construction: given an i-sample of size s, we construct an (i+1)-sample of size at least s − 2^b in time O(s).
  • Behold (and see the code sketch below):
  • i-sample: x1, …, xs
  • Partition the x's based on block (a−i) (we get at most 2^b partitions).
  • For each non-empty partition, pick a random vector, add it to the other vectors in its partition, and then discard it.
  • Result: vectors z1, …, zm, m ≥ s − 2^b, where
  • block (a−i−1) is zeroed out, and
  • the z_j are independent and uniformly distributed over V_{i+1}

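A rough Python sketch of one construction step (names and representation are mine, not the paper's): an example is a (vector, label) pair, and a vector is a tuple of a blocks, each block a b-bit integer.

    import random
    from collections import defaultdict

    def xor(u, v):
        return tuple(ui ^ vi for ui, vi in zip(u, v))

    def construction_step(sample, block):
        """Turn an i-sample into an (i+1)-sample: partition by the value of the
        given b-bit block, XOR a random representative into the rest of its
        partition, and discard the representative.
        Keeps at least len(sample) - 2^b pairs."""
        parts = defaultdict(list)
        for x, label in sample:
            parts[x[block]].append((x, label))
        out = []
        for group in parts.values():
            rep, rep_label = group.pop(random.randrange(len(group)))
            for x, label in group:
                out.append((xor(x, rep), label ^ rep_label))   # this block is now zeroed
        return out

Applying the step (a−1) times, once per block, leaves vectors supported on the first block only, as used in the algorithm on the next slide.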
20
void learning_with_noise()(Continue Algorithm)
  • Algorithm (finding the 1st bit):
  • Ask for a·2^b labeled examples
  • Apply the construction (a−1) times to get an (a−1)-sample
  • There is a 1 − 1/e chance that the vector (1,0,…,0) will be a member of the (a−1)-sample. If it's not there, we'll do it again with new labeled examples (the expected number of repetitions is constant).
  • Note: we've written (1,0,…,0) as a sum of 2^{a−1} examples, boosting the noise rate to ½ − ½(1−2η)^{2^{a−1}}.

21
void learning_with_noise()(Continue
Observations)
  • Observations:
  • We found the first bit of our new sample using a number of examples and a computation time polynomial in 2^b and in (1/(1−2η))^{2^a}.
  • We can shift all examples to determine the remaining bits.
  • Fixing a = (1/2)·log k and b = 2k/log k gives the desired 2^{O(k/log k)} bound
  • for a constant noise rate η.


22
void the_k_sum_problem()
  • The key to improving the above algorithm is to find a better way to solve a problem similar to k-sum.
  • Problem: given k lists L1, …, Lk of elements drawn uniformly and independently from {0,1}^n, find x1 ∈ L1, …, xk ∈ Lk s.t. x1 ⊕ x2 ⊕ … ⊕ xk = 0.
  • Note: a solution to the k-sum problem exists with good probability if |L1|·|L2|···|Lk| ≥ 2^n (similar to the birthday paradox).

23
void the_k_sum_problem()(Continue Wagners
Algorithm - Definitions)
  • Preliminary definitions and observations:
  • low_l(x) = the l least-significant bits of x
  • L1 ⋈_l L2 contains all pairs from L1 × L2 that agree on the l least-significant bits.
  • If low_l(x1⊕x2) = 0 and low_l(x3⊕x4) = 0 then low_l(x1⊕x2⊕x3⊕x4) = 0, and Pr[x1⊕x2⊕x3⊕x4 = 0] = 2^l/2^n.
  • Join (⋈_l) operation:
  • Hash join: stores one list and scans through the other; O(|L1| + |L2|) steps, O(|L1| + |L2|) storage
  • Merge join: sorts, then scans the two sorted lists; O(max(|L1|,|L2|)·log(max(|L1|,|L2|))) time

24
void the_k_sum_problem()(Continue Wagners
Algorithm Simple case)
  • The 4-list case:
  • Extend the lists until each contains 2^l elements
  • Generate a new list L12 of values x1⊕x2 s.t. low_l(x1⊕x2) = 0, and a new list L34 in the same way
  • Search for matches between L12 and L34 (see the sketch below)

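A compact sketch of the 4-list case (Python; n, l and the list sizes are toy assumptions), using a hash join on the low l bits:

    import random
    from collections import defaultdict

    n, l = 24, 8                    # toy parameters; l is roughly n/3
    mask = (1 << l) - 1

    def rand_list(size):
        return [random.getrandbits(n) for _ in range(size)]

    def join_low(A, B):
        """All pairs (a, b) with low_l(a ^ b) == 0, returned as a ^ b (hash join)."""
        index = defaultdict(list)
        for a in A:
            index[a & mask].append(a)
        return [a ^ b for b in B for a in index[b & mask]]

    L1, L2, L3, L4 = (rand_list(1 << l) for _ in range(4))
    L12, L34 = join_low(L1, L2), join_low(L3, L4)
    solutions = set(L12) & set(L34)   # x1^x2 == x3^x4  =>  x1^x2^x3^x4 == 0
    print(len(solutions))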
25
void the_k_sum_problem()(Continue Wagners
Algorithm)
  • Observations:
  • Pr[low_l(x_i⊕x_j) = 0] = 1/2^l when 1 ≤ i < j ≤ 4 and x_i, x_j are chosen uniformly at random
  • E[|L_ij|] = |L_i|·|L_j|/2^l = 2^{2l}/2^l = 2^l
  • The expected number of elements common to L12 and L34, which yield the desired solutions, is |L12|·|L34|/2^{n−l} (l ≥ n/3 gives us at least 1)
  • Complexity:
  • O(2^{n/3}) time and space

26
void the_k_sum_problem()(Continue Wagners
Algorithm)
  • Improvements:
  • We don't need the low l bits to be zero; we can fix them to any value α (i.e. low_l(x1⊕x2) = α).
  • The value 0 in x1⊕…⊕xk = 0 can be replaced with a constant c of our choice (by replacing Lk with Lk⊕c = {x⊕c : x ∈ Lk}).
  • If k' > k, the complexity of the k'-sum problem can be no larger than the complexity of the k-sum problem (just pick arbitrary x_{k+1}, …, x_{k'}, define c = x_{k+1}⊕…⊕x_{k'}, and use the k-sum algorithm to find a solution to x1⊕…⊕xk = c) ⇒
  • we can solve the k-sum problem with complexity at most O(2^{n/3}) for all k ≥ 4

27
void the_k_sum_problem()(Continue Wagners
Algorithm)
  • Extending the 4-list case:
  • Create a complete binary tree of depth log k.
  • At depth h we'll use lists of size 2^{n/(1+log k)}, joining on the next n/(1+log k) low bits.
  • So we'll get an algorithm that requires O(k · 2^{n/(1+log k)}) time and space.
  • Note: if k is not a power of 2, we'll take k' to be the largest power of 2 less than k, and afterwards use the list-elimination trick.


28
void can_we_do_it_better_?()
  • But maybe there's a problem with the approach?
  • How many samples do we really need to get a solution with good probability?
  • Do we even need a basis?
  • Can we do it without scanning the whole space?
  • Do we need the best solution?
  • Yes
  • Yes
  • k·log k − log(−ln(1−ε))
  • Yes / no
  • Yes
  • No

29
void can_we_do_it_better_?()(Continue Sampling
space)
  • To have a solution we need k linearly independent vectors in our sampling space S. So:
  • We'll want the probability of this to be at least 1 − ε, where ε ∈ (0,1)
  • ⇒ |sampling space| = O(k·log k·f(ε))


30
void annealing()
  • The physical process of heating up a solid until it melts, followed by cooling it down into a perfect-lattice state.
  • Problem: finding, among a potentially very large number of solutions, a solution with minimal cost.
  • Note: we don't even need the minimal-cost solution - just one whose noise rate is below our threshold.

31
void annealing()(Continue Combinatorial
optimization)
  • Some definitions:
  • The set of solutions to the combinatorial problem is taken as the set of states S
  • Note: in our case ...
  • The cost function is the energy E: S → R that we minimize
  • The transition probability between neighboring states depends on their energy difference and on an external temperature T

32
void annealing()(Continue Pseudo code
algorithm)
  • Set T to a high temperature
  • Choose an arbitrary initial state c
  • Loop:
  • Select a neighbor c' of c; set ΔE = E(c') − E(c)
  • If ΔE ≤ 0 accept c'; otherwise accept it with probability exp(−ΔE/T)
  • Do the two steps above several more times
  • Decrease T
  • Wait long enough and cross fingers (preferably more than 2)
  • (A runnable sketch of this loop follows below.)

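A generic, runnable version of the loop above (Python sketch; the energy function, neighbor move, and cooling parameters are placeholders, not the ones used for our problem):

    import math
    import random

    def anneal(initial, energy, neighbor, t0=10.0, alpha=0.95, steps_per_t=100, t_min=1e-3):
        """Generic simulated annealing: accept uphill moves with probability exp(-dE/T)."""
        state, e = initial, energy(initial)
        t = t0
        while t > t_min:
            for _ in range(steps_per_t):
                cand = neighbor(state)
                de = energy(cand) - e
                if de <= 0 or random.random() < math.exp(-de / t):
                    state, e = cand, e + de
            t *= alpha                          # geometric cooling schedule
        return state, e

    # toy usage: minimize the Hamming weight of a bit string by single-bit flips
    def flip_one_bit(s):
        i = random.randrange(len(s))
        return s[:i] + [1 - s[i]] + s[i + 1:]

    best, cost = anneal([random.randint(0, 1) for _ in range(20)],
                        energy=sum, neighbor=flip_one_bit)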
33
void annealing()(Continue Problems)
  • Problems:
  • Not all states can yield our new sample (only the ones containing at least one vector from S \ basis).
  • The probability that a capable state will yield the zero vector is 1/2^k.
  • The probability that any 1 ≤ j ≤ k vectors from S will yield a solution is ...
  • Note: when |S| is close to k, the expression above approaches zero.

34
void annealing()(Continue Reduction)
  • Idea:
  • Sample a little more than is needed: |S| = O(c·k)
  • Assign each vector its Hamming weight and sort S by it.
  • Reduction:
  • Spawning the next generation: all the states which include a vector whose Hamming weight is ≤ 2·wt(…)

35
void annealing()(Continue Convergence
Complexity ??)
  • Complexity: O(τ · L · ln|S|)
  • where L denotes the number of steps needed to reach quasi-equilibrium in each phase, and τ denotes the computation time of a transition
  • ln|S| denotes the number of phases needed to reach an accepted solution, using a polynomial-time cooling schedule

36
Game Over
"I don't even see the code anymore. All I can see now are blondes, brunettes, redheads..." - Cipher ("The Matrix")
37
void appendix()(GL)
  • Theorem (Goldreich-Levin): suppose we have oracle access to a random process b_x: {0,1}^n → {0,1}, so that Pr[b_x(r) = b(x,r)] ≥ ½ + ε,
  • where the probability is taken uniformly over the internal coin tosses of b_x and all possible choices of r, and b(x,r) denotes the inner product mod 2 of x and r.
  • Then we can, in time polynomial in n/ε, output a list of strings that contains x with probability at least ½.

38
void appendix()(Continue GL highway)
  • How ??
  • 1st way (to extract x_i):
  • Suppose s(x) = Pr[b_x(r) = b(x,r)] ≥ 3/4 + ε (hmmm??)
  • The probability that both b_x(r) = b(x,r) and b_x(r⊕e_i) = b(x,r⊕e_i) hold is at least ½ + 2ε, and then b_x(r) ⊕ b_x(r⊕e_i) = b(x,e_i) = x_i
  • but the theorem only guarantees success ½ + ε, not 3/4 + ε.

39
void appendix()(Continue GL better way)
  • 2nd way:
  • Idea: guess b(x,r) ourselves.
  • Problem: we need to guess for polynomially many r's.
  • Solution: generate polynomially many r's so that they are sufficiently random, but we can still guess all of them with non-negligible probability.

40
void appendix()(Continue GL better way)
  • Construction (sketched in code below):
  • Select l strings uniformly in {0,1}^n and denote them s^1, …, s^l.
  • Guess σ^i = b(x, s^i) for i = 1, …, l.
  • The probability that all guesses are correct is 2^{-l}.
  • Assign to each r^J a different non-empty subset J ⊆ {1,…,l}, s.t. r^J = ⊕_{i∈J} s^i.
  • Note that b(x, r^J) = ⊕_{i∈J} b(x, s^i) = ⊕_{i∈J} σ^i.
  • Try all possibilities for σ^1, …, σ^l and output a list of 2^l candidates z ∈ {0,1}^n.
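A small sketch of this construction (Python; the names are mine): from l random strings and l guessed bits, every non-empty subset J yields an r^J and a derived guess for b(x, r^J).

    import random
    from itertools import combinations

    def xor_vec(vs, n):
        out = [0] * n
        for v in vs:
            out = [a ^ b for a, b in zip(out, v)]
        return out

    n, l = 16, 4
    s = [[random.randint(0, 1) for _ in range(n)] for _ in range(l)]   # s^1 .. s^l
    sigma = [random.randint(0, 1) for _ in range(l)]                   # one guess per b(x, s^i)

    rs, guesses = [], []
    for size in range(1, l + 1):
        for J in combinations(range(l), size):            # every non-empty subset J
            rs.append(xor_vec([s[i] for i in J], n))       # r^J = XOR of the chosen s^i
            guesses.append(sum(sigma[i] for i in J) % 2)   # derived guess for b(x, r^J)

    # 2^l - 1 pairwise-independent r's; all guesses are simultaneously correct w.p. 2^-l
    print(len(rs))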