Title: Why Not Store Everything in Main Memory? Why use disks?
1 First: 3NN using horizontal data to classify an unclassified sample, a = (0 0 0 0 0 0).
Key a5 a6 a10=C a11 a12 a13 a14 distance
t12  0  0     1   0   1   1   0        2
t13  0  0     1   0   1   0   0        1
t53  0  0     0   0   1   0   0        1
t15  0  0     1   0   1   0   1        2
Key a1 a2 a3 a4 a5 a6 a7 a8 a9 a10=C a11 a12 a13 a14 a15 a16 a17 a18 a19 a20
t12  1  0  1  0  0  0  1  1  0     1   0   1   1   0   1   1   0   0   0   1
t13  1  0  1  0  0  0  1  1  0     1   0   1   0   0   1   0   0   0   1   1
t15  1  0  1  0  0  0  1  1  0     1   0   1   0   1   0   0   1   1   0   0
t16  1  0  1  0  0  0  1  1  0     1   1   0   1   0   1   0   0   0   1   0
t21  0  1  1  0  1  1  0  0  0     1   1   0   1   0   0   0   1   1   0   1
t27  0  1  1  0  1  1  0  0  0     1   0   0   1   1   0   0   1   1   0   0
t31  0  1  0  0  1  0  0  0  1     1   1   0   1   0   0   0   1   1   0   1
t32  0  1  0  0  1  0  0  0  1     1   0   1   1   0   1   1   0   0   0   1
t33  0  1  0  0  1  0  0  0  1     1   0   1   0   0   1   0   0   0   1   1
t35  0  1  0  0  1  0  0  0  1     1   0   1   0   1   0   0   1   1   0   0
t51  0  1  0  1  0  0  1  1  0     0   1   0   1   0   0   0   1   1   0   1
t53  0  1  0  1  0  0  1  1  0     0   0   1   0   0   1   0   0   0   1   1
t55  0  1  0  1  0  0  1  1  0     0   0   1   0   1   0   0   1   1   0   0
t57  0  1  0  1  0  0  1  1  0     0   0   0   1   1   0   0   1   1   0   0
t61  1  0  1  0  1  0  0  0  1     0   1   0   1   0   0   0   1   1   0   1
t72  0  0  1  1  0  0  1  1  0     0   0   1   1   0   1   1   0   0   0   1
t75  0  0  1  1  0  0  1  1  0     0   0   1   0   1   0   0   1   1   0   0
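The scan over the horizontal table can be sketched roughly as follows. This is an illustration only, not the slides' own code. It assumes that a is compared with the training rows on the six attributes a5, a6, a11, a12, a13, a14 (the index set used later in the slides), that distance is Hamming distance, that a10 is the class label C, and that ties are broken by scan order. The names `TRAIN`, `hamming` and `three_nn` are hypothetical.

```python
from collections import Counter

# (a5, a6, a11, a12, a13, a14) as a bit string, plus class a10=C,
# copied from the training table above.
TRAIN = {
    "t12": ("000110", 1), "t13": ("000100", 1), "t15": ("000101", 1),
    "t16": ("001010", 1), "t21": ("111010", 1), "t27": ("110011", 1),
    "t31": ("101010", 1), "t32": ("100110", 1), "t33": ("100100", 1),
    "t35": ("100101", 1), "t51": ("001010", 0), "t53": ("000100", 0),
    "t55": ("000101", 0), "t57": ("000011", 0), "t61": ("101010", 0),
    "t72": ("000110", 0), "t75": ("000101", 0),
}
A = "000000"  # the unclassified sample on the same six attributes


def hamming(x, y):
    """Number of positions where two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(x, y))


def three_nn(train, sample, k=3):
    # sorted() is stable, so rows at equal distance keep scan order
    # (an assumed tie-break; the slides do not state one).
    ranked = sorted(train.items(), key=lambda kv: hamming(kv[1][0], sample))
    nbrs = ranked[:k]
    votes = Counter(label for _, (_, label) in nbrs)
    return [key for key, _ in nbrs], votes


nbrs, votes = three_nn(TRAIN, A)
print(nbrs, dict(votes))
```

Under these assumptions the three neighbors are two rows at distance 1 and one row at distance 2, and class 1 leads 2 to 1. Which distance-2 row fills the third slot depends entirely on the tie-break, which is the weakness the closed variant addresses.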
2 Next: C3NN using horizontal data (a second pass is necessary to find all other voters that are at distance ≤ 2 from a).
Vote after 1st scan.
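C0 wins now! Within distance 2 of a there are six voters with C = 0 (t51, t53, t55, t57, t72, t75) and five with C = 1 (t12, t13, t15, t16, t33).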
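A minimal sketch of the two-pass closed 3NN on horizontal data is given below, under the same assumptions as the slides' index set (attributes a5, a6, a11, a12, a13, a14, Hamming distance, a10 as class). The first pass finds the distance of the third-nearest row; the second pass lets every row within that distance vote. The names `TRAIN`, `hamming` and `closed_knn` are hypothetical.

```python
from collections import Counter

# (a5, a6, a11, a12, a13, a14) as a bit string, plus class a10=C.
TRAIN = {
    "t12": ("000110", 1), "t13": ("000100", 1), "t15": ("000101", 1),
    "t16": ("001010", 1), "t21": ("111010", 1), "t27": ("110011", 1),
    "t31": ("101010", 1), "t32": ("100110", 1), "t33": ("100100", 1),
    "t35": ("100101", 1), "t51": ("001010", 0), "t53": ("000100", 0),
    "t55": ("000101", 0), "t57": ("000011", 0), "t61": ("101010", 0),
    "t72": ("000110", 0), "t75": ("000101", 0),
}
A = "000000"


def hamming(x, y):
    return sum(p != q for p, q in zip(x, y))


def closed_knn(train, sample, k=3):
    dist = {key: hamming(bits, sample) for key, (bits, _) in train.items()}
    # Pass 1: distance of the k-th nearest training row.
    radius = sorted(dist.values())[k - 1]
    # Pass 2: every row at distance <= radius votes, not just k of them.
    voters = [key for key, d in dist.items() if d <= radius]
    votes = Counter(train[key][1] for key in voters)
    return radius, voters, votes


radius, voters, votes = closed_knn(TRAIN, A)
print(radius, voters, dict(votes))
```

The design point is that the voter set no longer depends on scan order: all rows tied at the boundary distance are included.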
3 PINE: a closed 3NN method using pTrees (vertical data structures).
The first pTree-based C3NN goes as follows. Let all training points at distance 0 vote, then distance 1, then distance 2, ... until ≥ 3 votes are cast. For distance 0 (exact matches), construct the pTree Ps, then AND it with PC and PC' to compute the vote.
key     t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75
a1        1   1   1   1   0   0   0   0   0   0   0   0   0   0   1   0   0
a2        0   0   0   0   1   1   1   1   1   1   1   1   1   1   0   0   0
a3        1   1   1   1   1   1   0   0   0   0   0   0   0   0   1   1   1
a4        0   0   0   0   0   0   0   0   0   0   1   1   1   1   0   1   1
a5        0   0   0   0   1   1   1   1   1   1   0   0   0   0   1   0   0
a6        0   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0
a7        1   1   1   1   0   0   0   0   0   0   1   1   1   1   0   1   1
a8        1   1   1   1   0   0   0   0   0   0   1   1   1   1   0   1   1
a9        0   0   0   0   0   0   1   1   1   1   0   0   0   0   1   0   0
a11       0   0   0   1   1   0   1   0   0   0   1   0   0   0   1   0   0
a12       1   1   1   0   0   0   0   1   1   1   0   1   1   0   0   1   1
a13       1   0   0   1   1   1   1   1   0   0   1   0   0   1   1   1   0
a14       0   0   1   0   0   1   0   0   0   1   0   0   1   1   0   0   1
a15       1   1   0   1   0   0   0   1   1   0   0   1   0   0   0   1   0
a16       1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   1   0
a17       0   0   1   0   1   1   1   0   0   1   1   0   1   1   1   0   1
a18       0   0   1   0   1   1   1   0   0   1   1   0   1   1   1   0   1
a19       0   1   0   1   0   0   0   0   1   0   0   1   0   0   0   0   0
a20       1   1   0   0   1   0   1   1   1   0   1   1   0   0   1   1   0

pTrees ANDed for distance 0 (' marks a complemented pTree, i.e. attribute value 0, as a has on each of these attributes):
key     t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75
a5'       1   1   1   1   0   0   0   0   0   0   1   1   1   1   0   1   1
a6'       1   1   1   1   0   0   1   1   1   1   1   1   1   1   1   1   1
a11'      1   1   1   0   0   1   0   1   1   1   0   1   1   1   0   1   1
a12'      0   0   0   1   1   1   1   0   0   0   1   0   0   1   1   0   0
a13'      0   1   1   0   0   0   0   0   1   1   0   1   1   0   0   0   1
a14'      1   1   0   1   1   0   1   1   1   0   1   1   0   0   1   1   0
Ps        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
C         1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
C'        0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1

No neighbors at distance 0: Ps is all zeros, so no votes are cast in this round.
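The distance-0 step can be sketched with plain Python integers standing in for pTrees. Real pTrees are compressed tree structures; uncompressed 17-bit vectors are an assumption made here only to show the AND logic. The bit strings are the raw a5, a6, a11, a12, a13, a14 and C columns from the vertical data above, and the names `ptree`, `comp`, `count`, `P` and `PC` are hypothetical.

```python
KEYS = "t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75".split()
N = len(KEYS)
MASK = (1 << N) - 1  # 17 one-bits


def ptree(bits):
    """Pack a 17-character bit string into an int (leftmost bit = t12)."""
    return int(bits, 2)


def comp(p):
    """Complement of a bit vector, restricted to the 17 rows."""
    return ~p & MASK


def count(p):
    """Number of 1-bits, i.e. number of rows selected."""
    return bin(p).count("1")


# Raw (uncomplemented) attribute columns, keyed by attribute index.
P = {
    5: ptree("00001111110000100"),
    6: ptree("00001100000000000"),
    11: ptree("00011010001000100"),
    12: ptree("11100001110110011"),
    13: ptree("10011111001001110"),
    14: ptree("00100100010011001"),
}
PC = ptree("11111111110000000")  # class column a10=C

# a is 0 on every relevant attribute, so an exact match must be 1
# in every complemented column.
Ps = MASK
for p in P.values():
    Ps &= comp(p)

votes_c1 = count(Ps & PC)        # exact matches with C = 1
votes_c0 = count(Ps & comp(PC))  # exact matches with C = 0
print(format(Ps, "017b"), votes_c1, votes_c0)
```

No row is scanned individually here: the whole round is six complements, six ANDs and two counts over the column vectors.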
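4 pTree-based C3NN: find all distance-1 nbrs
Construct the pTree
$$P_{S(s,1)} = \bigvee_{i} P_i, \qquad P_i = P_{|s_i - t_i| = 1;\ |s_j - t_j| = 0,\ j \neq i} = P_{S(s_i,1)} \wedge \bigwedge_{j \neq i} P_{S(s_j,0)}$$
where i ∈ {5, 6, 11, 12, 13, 14} and j ∈ {5, 6, 11, 12, 13, 14} − {i}.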
key     t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75
a5        0   0   0   0   1   1   1   1   1   1   0   0   0   0   1   0   0
a6'       1   1   1   1   0   0   1   1   1   1   1   1   1   1   1   1   1
a11'      1   1   1   0   0   1   0   1   1   1   0   1   1   1   0   1   1
a12'      0   0   0   1   1   1   1   0   0   0   1   0   0   1   1   0   0
a13'      0   1   1   0   0   0   0   0   1   1   0   1   1   0   0   0   1
a14'      1   1   0   1   1   0   1   1   1   0   1   1   0   0   1   1   0
a10=C     1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
C'        0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1
PD(s,1)   0   1   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
a20       1   1   0   0   1   0   1   1   1   0   1   1   0   0   1   1   0
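The distance-1 construction can be sketched as an OR over six AND terms: for each attribute i, take the raw column i (the row differs from a there) and AND it with the complements of the other five columns (the row agrees with a there). Plain integers again stand in for pTrees, which is an assumption of this sketch, and the names `ptree`, `comp`, `distance1` and `P` are hypothetical.

```python
KEYS = "t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75".split()
MASK = (1 << len(KEYS)) - 1


def ptree(bits):
    """Pack a 17-character bit string into an int (leftmost bit = t12)."""
    return int(bits, 2)


def comp(p):
    return ~p & MASK


# Raw attribute columns from the vertical data, keyed by attribute index.
P = {
    5: ptree("00001111110000100"),
    6: ptree("00001100000000000"),
    11: ptree("00011010001000100"),
    12: ptree("11100001110110011"),
    13: ptree("10011111001001110"),
    14: ptree("00100100010011001"),
}
PC = ptree("11111111110000000")


def distance1(cols):
    """OR over i of (column i AND complement of every other column)."""
    result = 0
    for i in cols:
        term = cols[i]              # differs from a on attribute i
        for j in cols:
            if j != i:
                term &= comp(cols[j])  # matches a on attribute j
        result |= term
    return result


PD1 = distance1(P)
bits = format(PD1, "017b")
nbrs = [key for key, b in zip(KEYS, bits) if b == "1"]
votes_c1 = bin(PD1 & PC).count("1")
votes_c0 = bin(PD1 & comp(PC)).count("1")
print(bits, nbrs, votes_c1, votes_c0)
```

The resulting vector agrees with the PD(s,1) row shown on the slide: two neighbors at distance 1, one from each class, so fewer than three votes have been cast and the procedure moves on to distance 2.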
5 pTree-based C3NN: distance-2 nbrs
PINE = CkNN in which all training samples vote, weighted by their nearness to a (Olympic podiums).
We now have 3 nearest nbrs. We could quit and declare C1 the winner?
We now have the C3NN set and we can declare C0 the winner!
Distance-2 pTrees, one per attribute pair:
P5,6 P5,11 P5,12 P5,13 P5,14 P6,11 P6,12 P6,13 P6,14 P11,12 P11,13 P11,14 P12,13 P12,14 P13,14
key     t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75
a10=C     1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
a5        0   0   0   0   1   1   1   1   1   1   0   0   0   0   1   0   0
a6        0   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0
a11       0   0   0   1   1   0   1   0   0   0   1   0   0   0   1   0   0
a12       1   1   1   0   0   0   0   1   1   1   0   1   1   0   0   1   1
a13       1   0   0   1   1   1   1   1   0   0   1   0   0   1   1   1   0
a14       0   0   1   0   0   1   0   0   0   1   0   0   1   1   0   0   1
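The whole closed 3NN procedure on vertical data can be sketched as a loop over distance rings. The ring at distance d is taken here to be the OR, over all d-element attribute subsets, of (raw columns in the subset AND complemented columns outside it), which for d = 2 gives the fifteen pair pTrees listed above. This is an illustrative reading of the slides with integers standing in for pTrees, and it uses unweighted votes; the names `ptree`, `comp`, `count`, `ring` and `P` are hypothetical.

```python
from itertools import combinations

KEYS = "t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75".split()
MASK = (1 << len(KEYS)) - 1


def ptree(bits):
    """Pack a 17-character bit string into an int (leftmost bit = t12)."""
    return int(bits, 2)


def comp(p):
    return ~p & MASK


def count(p):
    return bin(p).count("1")


P = {
    5: ptree("00001111110000100"),
    6: ptree("00001100000000000"),
    11: ptree("00011010001000100"),
    12: ptree("11100001110110011"),
    13: ptree("10011111001001110"),
    14: ptree("00100100010011001"),
}
PC = ptree("11111111110000000")


def ring(cols, d):
    """Rows that differ from a (all zeros) on exactly d attributes."""
    attrs = list(cols)
    result = 0
    for subset in combinations(attrs, d):
        term = MASK
        for j in attrs:
            term &= cols[j] if j in subset else comp(cols[j])
        result |= term
    return result


# Let distance 0 vote, then 1, then 2, ... until at least 3 votes are cast.
voters = 0
for d in range(len(P) + 1):
    voters |= ring(P, d)
    if count(voters) >= 3:
        break

votes_c1 = count(voters & PC)
votes_c0 = count(voters & comp(PC))
print(d, format(voters, "017b"), votes_c1, votes_c0)
```

The loop stops at distance 2 with the full closed neighbor set, and the class counts come from two ANDs with the class column and its complement rather than from a second scan of the rows.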