OCR - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: OCR


1
?????? ?????? ???????? II 1015-1215 ??????????
??????? ????? 1015-1155 ????????? ?? ??
2
  • ????????
  • ??
  • ??????
  • ??
  • ??

3
??????
??????
??????????
  • ?????
  • ????????
  • OCR

?????????
?????????? ????????? ??DB??? ??????? OLAP
?????? (??????? Data Warehouse)
  • ???????????
  • ?????????

????????? (????????)
??????????
  • ????????????
  • ???????????
  • ????
  • ?????
  • ?????
  • ????
  • ?????
  • ??????
  • ????

4
Association Rules
5
????? X ? Y ??????No ? ??????????Yes
???? Pr(X??Y) ? 5 ??? Pr(YX) ?
32 ????????????? interesting ????
Interesting Rules ??????
?? B ? C ? interesting Pr(BC) ????? Pr(B) ?
Pr(C) ?????
6
???
  • ?????????????? ?? ???????????
  • ?????????????? ??????????????

7
????????????????(???????)??? ?????????????? ????
???
????A,B,C ? ABC ??????
8
ABCD
????????????????(???????)??? ?????????????? ????
??? ??? Pr(AB) < ?? ? Pr(ABC) < ?? ???
B ? C ???? Pr(C|B) = Pr(BC) / Pr(B) ??????????
ABC
ABD
ACD
BCD
AB
AC
BC
AD
BD
CD
A
B
C
D
f
A Pr(A)??? AB Pr(AB) < ??
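The quantities on this slide, the frequency Pr(AB) of an itemset and the ratio Pr(BC)/Pr(B) for a rule from B to C, can be sketched in a few lines. This is only an illustration: the names `support` and `confidence` and the sample `baskets` are assumptions, not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Pr(consequent | antecedent) = Pr(both) / Pr(antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical baskets, used only to exercise the functions.
baskets = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "C", "D"}]
print(support(baskets, {"A", "B"}))
print(confidence(baskets, {"B"}, {"C"}))
```

Because every transaction containing ABC also contains AB, the support of ABC can never exceed the support of AB, which is why an infrequent AB rules out ABC.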
9
??????????
?????????????
?????????????????????????
ACDE
10
??????????
?????????????
?????????????????????????
ACDE
Hash table
11
??????????
?????????????
?????????????????????????
ACDE
Hash table
12
??????????
?????????????
?????????????????????????
ABDE
Hash table
13
????????????
???????????????????
? ????????5???
14
????????????
ABCD
ABC
ABD
ACD
BCD
AB
AC
BC
AD
BD
CD
A
B
C
D
f
???1? ????? ?????
15
ABCD
ABC
ABD
ACD
BCD
AB
AC
BC
AD
BD
CD
???2 ???
???
A
B
C
D
f
???1? ????? ?????
16
ABCD
ABC
ABD
ACD
BCD
???
???3 ???
AB
AC
BC
AD
BD
CD
???2 ???
A
B
C
D
f
17
ABCD
???1? ?????? ??
???
ABC
ABD
ACD
BCD
???3 ???
AB
AC
BC
AD
BD
CD
???2 ???
A
B
C
D
f
18
???1? ?????? ??
ABCD
? 1 ? ? ? ?
ABC
ABD
ACD
BCD
???3 ???
AB
AC
BC
AD
BD
CD
???2? ????
? ? ?
A
B
C
D
f
???1? ????? ?????? ???
19
A priori ??? 20??4???????????????
ABCD
???1? ?????? ??
? 1 ? ? ? ?
ABC
ABD
ACD
BCD
???3? ????
? ? ?
AB
AC
BC
AD
BD
CD
???2? ????
A
B
C
D
f
???1? ????? ?????? ???
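The level-by-level walk up the lattice of itemsets over A, B, C, D shown on these slides can be sketched as a small Apriori-style search. This is a minimal sketch under assumptions: the function name `apriori`, the `min_support` parameter, and the sample transactions are illustrative, and candidates are counted with a plain scan rather than the hash table the slides mention.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate with one pass over the transactions.
        found = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                found[c] = s
        return found

    # Level 1: single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical transactions, used only to exercise the function.
data = [{"A", "C", "D", "E"}, {"A", "B", "D", "E"}, {"A", "C", "D"}, {"B", "C"}]
found = apriori(data, min_support=0.5)
print(sorted("".join(sorted(s)) for s in found))
```

The pruning step is what keeps the search from visiting the whole lattice: a candidate is counted only if all of its subsets one level down survived.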
20
?????R ? ????????Yes
????
Pr(?????R )?10??????
???????????
21
?????R ? ????????Yes
????
Pr(?????R )?10??????
???????????
???80??? Pr(?????R )??
22
?????R ? ????????Yes ?? Pr(?????R) ???
?? ??????????? R
???? X ? ( Pr(?????X) , Pr(?????X,????????
Yes)
23
?????R ? ????????Yes ?? Pr(?????R) ???
?? ??????????? R
???? X ? ( Pr(?????X) , Pr(?????X,????????
Yes)
O(M log M), where M is the number of records
24
(No Transcript)
25
Clockwise Search
26
(No Transcript)
27
Counter Clockwise Search
Clockwise, Counter Clockwise ?????????1????? ??
28
(??,????)?S ? ????????Yes
29
(??,????)?S ? ????????Yes
30
(??,????)?S ? ????????Yes
31
(??,????)?S ? ????????Yes
32
???
????
X????
?????
p( (??,????)?S ) ????S?????? ???????
????????????????????????S ????????
????????????????????????S
33
(??,????)? S ? ????????Yes
???? M, ????? n
??????? ?????????????? $O(n^{1.5})$ ?????
????
???X???????????? ?????????????? X???$O(nM)$?????$O(n^{1.5} M)$ ?????? n ? log M
?????????????? P NP ?????????
??
???????
????????
34
(??,????)? S ? ????????Yes
S
p( ??,????)?S, ????????Yes )
p((??,????)?S)
35
(??,????)? S ? ????????Yes
S
p( ??,????)?S, ????????Yes )
p((??,????)?S)
36
Hand Probing ???????
1?? hand probing ???? X???? $O(n)$ ????? $O(n^{1.5})$

hand probing ????O(log M)
37
y ?x a
?? a ????
  • ????????????????
  • ???????????????

38
?????? - ????????????
????????????? ?????????????? ????????????????
10-fold Cross Validation
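The 10-fold cross validation named on this slide splits the records into ten parts and holds each part out once for testing. The following is a minimal sketch; the helper name `k_fold_indices` and the interleaved assignment of records to folds are assumptions, and a real evaluation would usually shuffle the records first.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross validation over n records."""
    # Record i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
print(len(splits), splits[0][1])
```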
39
Classification
40
???
?????? ????????????????
?? ??? ???? ??? GPT GOT
????
41
???
?????? ????????????????
?? < 125
Yes
No
Yes
No
????
Yes
?????????? ???? ??????????? ?? ????????????????
No
42
??? ??????????
?????
?????
43
??? ??????????
Quinlan??????? ???
?????
?????
n
$\mathrm{Ent}_1 = -(p \log p + q \log q)$
Ent2
n1
n2
p
q
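The entropy of a node with class fractions p and q, and the entropy of a split into parts of sizes n1 and n2, can be computed as below. This is a sketch under stated assumptions: the size-weighted average of Ent1 and Ent2 is the usual convention in Quinlan-style decision trees but is not spelled out in the transcript, the logarithm is taken base 2, and the function names are illustrative.

```python
import math


def entropy(p):
    """Binary entropy -(p log p + q log q) with q = 1 - p, using log base 2."""
    q = 1.0 - p
    # Terms with probability 0 contribute nothing (0 log 0 is taken as 0).
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)


def split_entropy(n1, pos1, n2, pos2):
    """Size-weighted entropy of a split (assumed convention).

    n1, n2: number of records on each side (both must be positive).
    pos1, pos2: number of positive records on each side.
    """
    n = n1 + n2
    return (n1 / n) * entropy(pos1 / n1) + (n2 / n) * entropy(pos2 / n2)


print(entropy(0.5))
print(split_entropy(4, 4, 4, 0))
```

A lower split entropy means the two sides are purer, so a tree builder would prefer the test that minimizes it.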

44
S
???????????? ???????????? ????????? Hand
Probing ??? ?????????? (????????? ????????????)
S????????
S??????
45
Ent(???XYZ??????) ? min(Ent(X),Ent(Y),Ent(Z))
?? Ent(Z)? ???????????? ?????? Branch and Bound
Search
???????O(logM)?Hand Probing
46
???????
UC Irvine, Repository of Machine Learning Databases
http://www.ics.uci.edu/~mlearn/MLRepository.html
10-fold Cross Validation
47
??? (Regression Tree)
BPS       GDM       YEN       TB3M  TB30Y  SP500   GOLD
1.443530  0.407460  0.004980  7.02  9.31   210.88  326.00
1.446120  0.408050  0.004950  7.04  9.28   205.96  339.45


48
(No Transcript)
49
Yes
No
No
Yes
50
?
D2
D1
???
µ1
µ2
??????????????
51
A
µ
?
D2
D1
???
µ2
µ1
??????? ???

D1?D2
$|D_1| (\mu - \mu_1)^2 + |D_2| (\mu - \mu_2)^2$
??????????
D1?D2
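Assuming the expression on this slide is the interclass variance of a split of the data into D1 and D2, with means µ1 and µ2 and overall mean µ, a regression-tree split on one numeric attribute can be sketched as follows. The function names and the midpoint threshold are illustrative assumptions, not the presenter's method.

```python
def interclass_variance(d1, d2):
    """|D1| (mu - mu1)^2 + |D2| (mu - mu2)^2 for two non-empty lists of targets."""
    mu1 = sum(d1) / len(d1)
    mu2 = sum(d2) / len(d2)
    mu = (sum(d1) + sum(d2)) / (len(d1) + len(d2))
    return len(d1) * (mu - mu1) ** 2 + len(d2) * (mu - mu2) ** 2


def best_threshold(xs, ys):
    """Try every cut between distinct x values; return (score, threshold)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = interclass_variance(left, right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or score > best[0]:
            best = (score, threshold)
    return best


# Hypothetical data: the target jumps from 1 to 3 between x = 2 and x = 3.
print(best_threshold([1, 2, 3, 4], [1, 1, 3, 3]))
```

Maximizing this quantity is equivalent to minimizing the squared error left inside the two parts, since the total squared deviation from µ is fixed.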
52
S
???????????? ???????????? ????????? Hand
Probing ??? ?????????? Branch and Bound
Search ?????O(log M)
S???? ????? ????
S??????
53
???????
http://www.cs.utoronto.ca/~delve/data/datasets.html
10-fold Cross Validation
??????(???????) ?????? ?????? ??? X?? ??? ?? ???
add10            9792  10  0.141   0.123   0.156   0.185
abalone          4177   8  0.521   0.515   0.534   0.539
kin-8fh          8192   8  0.447   0.433   0.459   0.479
kin-8fm          8192   8  0.225   0.197   0.257   0.249
kin-8nh          8192   8  0.649   0.618   0.619   0.655
kin-8nm          8192   8  0.494   0.449   0.478   0.541
pumadyn-kin-8fh  8192   8  0.412   0.402   0.409   0.410
pumadyn-kin-8fh  8192   8  0.0604  0.0595  0.0653  0.0632
pumadyn-kin-8fh  8192   8  0.347   0.337   0.353   0.355
pumadyn-kin-8fh  8192   8  0.0530  0.0496  0.0550  0.0535
54
(No Transcript)
55
??? ???, ??,
???? (3102?) ????????

?? 102 103 ?
56
(No Transcript)
57
(No Transcript)
58
??? ???, ??, ??????, ????, ???, ...
???? (102107?) ??????, SNP, ...

?? 102 104 ?
59
Clustering
60
Expression Patterns of Genes in Various Tissues
Brain in embryo
Five brain tissues of adult mouse
61
Clustering genes via expression patterns is
promising.
  • A set of genes is expected to share common
    roles in cellular processes.
  • Genes in the same group would be observed in
    the same tissue at the same time.
  • Their expression patterns would be similar.
  • Clustering genes by expression patterns would
    provide substantial insight into real groups of
    genes.

62
Graphical Representation of Expression Patterns
63
Cluster of genes coding ribosomal proteins
64
Tightness of a cluster C of points
diameter: $\max \{\, \|x - y\| : x, y \in C \,\}$
  • intra-class variance: $\frac{1}{|C|} \sum_{x \in C} \|x - c(C)\|^2$
  • $|C|$: number of points in C
  • $c(C)$: centroid (mean) of C, $\frac{1}{|C|} \sum_{x \in C} x$
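The two tightness measures on this slide, the diameter and the intra-class variance around the centroid, can be computed directly for a small cluster. This is a minimal sketch assuming Euclidean distance and points given as coordinate tuples; the function names are illustrative.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def diameter(points):
    """Largest distance between any two points in the cluster."""
    return max((math.dist(x, y) for x, y in combinations(points, 2)), default=0.0)


def intra_class_variance(points):
    """Mean squared distance from each point to the centroid."""
    c = centroid(points)
    return sum(math.dist(x, c) ** 2 for x in points) / len(points)


# Hypothetical cluster: the corners of a 2 x 2 square.
square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(centroid(square), diameter(square), intra_class_variance(square))
```

The diameter depends only on the two most distant points, while the variance reflects how all points spread around the centroid, so two clusters with equal diameter can differ widely in variance.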

65
(No Transcript)
66
Diameter Problem
  • NP-hard if k is treated as a variable
  • Approximation within a factor a of the optimal
    diameter is NP-hard for a < 2.
  • An approximation factor of 2 is achieved by the
    furthest-point heuristic in O(n k) time.
  • (n = number of points)
  • O(n log k)-time version
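The furthest-point heuristic mentioned in this list picks k centers greedily, each time taking the point furthest from the centers chosen so far, and then assigns every point to its nearest center. The following is a minimal sketch of the O(n k) version, assuming Euclidean distance and an arbitrary first center; the function names and sample points are illustrative.

```python
import math


def furthest_point_centers(points, k):
    """Greedily choose k centers; each new center is furthest from the current ones."""
    centers = [points[0]]  # arbitrary starting point (assumption)
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


# Hypothetical points forming three visible groups.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 10)]
centers = furthest_point_centers(pts, 3)
print(centers, assign(pts, centers))
```

Each round scans all n points once and there are k rounds, which gives the O(n k) running time quoted above.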

Diameter1 Diameter2
Intra-class variance1 >> Intra-class variance2
67
Intra-class Variance Problem
  • $O(n^{(d+2)k+1})$-time algorithm (d = number of
    dimensions)
  • $O(n (1/\varepsilon)^d)$-time ε-approximate 2-clustering
    algorithm

Problems of k-clustering
  • It is hard to guess an appropriate value for k,
    beforehand.
  • It is not easy to avoid generating a
    false-positive cluster of large intra-class
    variance that may contain genes of different
    functions.

Our Approach
  • Perform hierarchical clustering by ε-approximate
    2-clustering.
  • Stop dividing a cluster if its intra-class
    variance is no more than a given threshold.
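The approach in this list, splitting a cluster in two repeatedly and stopping once its intra-class variance falls to a threshold, can be sketched as a short recursion. Note the substitution: the slides use an approximate 2-clustering algorithm, whereas this sketch swaps in a plain 2-means iteration as a stand-in. All function names and the sample points are assumptions.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def variance(points):
    """Intra-class variance: mean squared distance to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)


def two_split(points, iters=20):
    """Stand-in 2-clustering: 2-means seeded with two far-apart points."""
    a = max(points, key=lambda p: math.dist(p, points[0]))
    b = max(points, key=lambda p: math.dist(p, a))
    for _ in range(iters):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not left or not right:
            break
        a, b = centroid(left), centroid(right)
    return left, right


def divide(points, threshold):
    """Recursively split until every cluster has variance <= threshold."""
    if len(points) < 2 or variance(points) <= threshold:
        return [points]
    left, right = two_split(points)
    if not left or not right:
        return [points]
    return divide(left, threshold) + divide(right, threshold)


# Hypothetical points forming two tight groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = divide(pts, threshold=1.0)
print(clusters)
```

With this stopping rule the number of clusters is not fixed in advance; it follows from the variance threshold, which addresses the difficulty of guessing k noted above.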

68
Cluster of genes coding ribosomal proteins: intra-class variance 209
Clusters of genes coding myelin: intra-class variance 128
69
?????
70
  • ??????????
  • Apriori
  • Dynamic Itemset Counting
  • ????
  • ????
  • Correlation
  • ???????
  • 2?????
  • ????? ?????
  • ?????
  • NP?? NP??
  • ?????
  • ????

71
  • ???? / ??? / ???
  • C4.5
  • CART
  • ??????
  • NP-hardness / Parallel Search
  • Optimized Ranges / Regions
  • Boosting / Bagging / Weighted Majority
  • ???????
  • NP??
  • ?????
  • ???

72
  • ??????
  • ???????
  • ???????? Google / Clever
  • ?????????
  • Clustering / Nearest Neighborhood
  • k-means / k-clustering
  • ???????
  • ????????
  • ?????????