Title: Indexing and Searching
1Indexing and Searching
2Introduction
- In what ways can we search for information in a collection of documents?
- The simplest and most widely used approach is to scan all the documents of the collection sequentially.
- Another approach is to build special data structures (index structures) that speed up the search.
3Introduction
- Indexes are widely used in database systems (e.g. Oracle, MySQL, SQL Server).
- An index can discard a large part of the data that does not contribute to the answer.
- Examples of indexes: B-trees, hashing.
4Introduction
5Binary Search Trees
6B-trees
7Hashing
Hash function: h(key) = key mod 10
Bucket 0: 10, 20, 30, 40, 50, 60
Bucket 1: (empty)
Bucket 2: 12, 22, 42
Buckets 3 to 8: (empty)
Bucket 9: 9, 19, 79
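The hashing figure can be reproduced with a few lines of Python. This is only a sketch: the key list is read off the figure, and the names `h` and `buckets` are chosen here for illustration.

```python
from collections import defaultdict

def h(key):
    """Hash function from the slide: h(key) = key mod 10."""
    return key % 10

# Keys as they appear in the figure (an assumption about how it is read).
keys = [10, 20, 30, 40, 50, 60, 12, 22, 42, 9, 19, 79]

# Each bucket collects the keys that hash to the same value.
buckets = defaultdict(list)
for key in keys:
    buckets[h(key)].append(key)
```

Looking up a key then only inspects the bucket `buckets[h(key)]`, which is what lets an index skip most of the data.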
8Indexes for Text
- For text, the indexing mechanisms differ from those used for numbers.
- Indexes for text:
- Inverted files
- Suffix trees, suffix arrays
- Signature files
9Inverted Files
- n: size of the text
- m: length of the pattern
- v: size of the vocabulary
- M: size of the available memory
10Inverted Files
- An inverted file is a word-based indexing mechanism used to make searching more efficient.
- Structure of an inverted file:
- Vocabulary
- Occurrence lists
11Example
Text (word start positions: 1 6 12 16 18 25 29 36 40 45 54 58 66 70):
That house has a garden. The garden has many flowers. The flowers are beautiful
Inverted file:
Vocabulary   Occurrences
beautiful    70
flowers      45, 58
garden       18, 29
house        6
12Inverted Files
- The space needed to store the vocabulary is fairly small.
- According to Heaps' law, the vocabulary size grows as O(n^β), where β is a constant between 0 and 1. In practice β takes values between 0.4 and 0.6.
- For example, for 1 GB of text from the TREC-2 collection the vocabulary occupies only 5 MB.
13Inverted Files
- The occurrences take up much more space.
- Since every word appears at least once in the text, the extra space required is O(n).
- Even after removing stopwords, the space overhead is between 30% and 40% of the text size.
14Inverted Files
- Block addressing is used to reduce the space required.
- The text is divided into blocks, and the occurrences point to blocks rather than to character positions.
- The classical methods, which use pointers to character positions, are called full inverted indices.
15Inverted Files
- With block addressing the pointers are smaller, because there are far fewer blocks than characters in the text.
- In addition, all occurrences of a word inside the same block are recorded as a single reference.
- The space overhead of this technique is typically about 5% of the text size.
16Example
Text, divided into Block 1, Block 2, Block 3 and Block 4:
That house has a garden. The garden has many flowers. The flowers are beautiful
Inverted file:
Vocabulary   Occurrences (block)
beautiful    4
flowers      3
garden       2
house        1
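A small sketch of block addressing on the example sentence. The block boundaries are not given in the transcript; the code assumes blocks of four words each, which happens to reproduce the block numbers listed above. The names `VOCABULARY`, `WORDS_PER_BLOCK` and `build_block_index` are illustrative.

```python
text = ("That house has a garden. The garden has many flowers. "
        "The flowers are beautiful")

# Indexed words from the example; every other word is treated as a stopword.
VOCABULARY = {"beautiful", "flowers", "garden", "house"}
WORDS_PER_BLOCK = 4  # assumption: the example does not state the block size

def build_block_index(text):
    """Map each indexed word to the list of blocks in which it occurs."""
    index = {}
    for i, token in enumerate(text.split()):
        word = token.strip(".").lower()
        if word not in VOCABULARY:
            continue
        block = i // WORDS_PER_BLOCK + 1  # blocks are numbered from 1
        blocks = index.setdefault(word, [])
        # Repeated occurrences inside one block collapse into one reference.
        if not blocks or blocks[-1] != block:
            blocks.append(block)
    return index

block_index = build_block_index(text)
```

Both occurrences of "garden" fall in block 2 and both occurrences of "flowers" in block 3, so each word keeps a single reference.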
17Comparison
18Searching Inverted Files
- A typical search on an inverted file follows these steps:
- Vocabulary search: the words of the query are isolated and looked up in the vocabulary.
- Retrieval of occurrences: the occurrence lists of each word are retrieved.
- Processing of occurrences: the occurrences are processed to resolve phrases, proximity or boolean operators. If block addressing is used, a direct search in the text may be required.
19Searching Inverted Files
- Since searching starts with the vocabulary, it is good practice to store it in a separate file.
- Even for large document collections, the vocabulary is likely to fit in main memory.
- Otherwise, part of the vocabulary is kept in main memory and the rest in secondary storage (disk, CD-ROM).
20Searching Inverted Files
- Single-word queries can be answered with any convenient data structure that allows the query to be processed quickly.
- Hashing, tries, B-trees.
- Search time is O(m) for the first two methods and O(m log n) for B-trees.
21Searching Inverted Files
- Hashing is not suitable for answering range queries.
- For range queries we can use binary search trees, tries or B-trees.
22Example
- Find the documents that contain words lying lexicographically between the word cluster and the word damage.
23Example
Vocabulary: age, basket, cat, cube, cluster, creature, creative, damage
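A range query of this kind can be sketched with binary search over a sorted vocabulary. A sorted list stands in here for the binary search tree, trie or B-tree named on the slides, and the bounds are assumed to be inclusive; `range_query` is an illustrative name.

```python
import bisect

# Vocabulary from the example, kept in lexicographic order.
vocabulary = sorted(["age", "basket", "cat", "cube", "cluster",
                     "creature", "creative", "damage"])

def range_query(low, high):
    """Return the vocabulary words w with low <= w <= high."""
    start = bisect.bisect_left(vocabulary, low)    # first word >= low
    end = bisect.bisect_right(vocabulary, high)    # one past last word <= high
    return vocabulary[start:end]

matches = range_query("cluster", "damage")
```

The documents in the answer are then the union of the occurrence lists of the matching words. A hash table cannot support this, because it does not keep the words in order.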
24Searching Inverted Files
- If the query consists of individual words, the search ends once the occurrences of those words in the documents have been determined.
- If more than one query word has been found, the occurrence lists are then merged (union).
25Searching Inverted Files
- Searching for a whole phrase (rather than individual words) or answering proximity queries is harder.
- An occurrence list is built for each word. The lists are then processed to determine the final answer to the query.
26Example
- Suppose we search for the phrase
- modern information retrieval
- Suppose the vocabulary search has produced the following lists:
- modern: 10, 50, 80
- information: 17, 57, 120
- retrieval: 29, 90, 400
What is the answer to the query? Does the phrase occur in the text or not?
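The question can be answered by checking whether the lists contain consecutive positions. The sketch below assumes the numbers are character positions of word starts and that consecutive words are separated by exactly one character; `phrase_positions` is an illustrative name.

```python
def phrase_positions(words, occurrences):
    """Return the start positions at which the words occur consecutively."""
    # Offset of each word from the start of the phrase (one separator assumed).
    offsets = [0]
    for word in words[:-1]:
        offsets.append(offsets[-1] + len(word) + 1)

    position_sets = [set(occurrences[word]) for word in words]
    hits = []
    for start in occurrences[words[0]]:
        # Every later word must start at the expected offset.
        if all(start + off in positions
               for off, positions in zip(offsets[1:], position_sets[1:])):
            hits.append(start)
    return hits

occurrences = {
    "modern": [10, 50, 80],
    "information": [17, 57, 120],
    "retrieval": [29, 90, 400],
}
hits = phrase_positions(["modern", "information", "retrieval"], occurrences)
```

Under these assumptions the phrase occurs once, starting at position 10: "information" follows at 10 + 7 = 17 and "retrieval" at 17 + 12 = 29. The candidate at 50 fails because "retrieval" would have to start at 69.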
27Building Inverted Files
- Building and updating an inverted file is a relatively easy process.
- An inverted file for a text of n characters can be built in O(n) time.
28Building Inverted Files
- The vocabulary is organized with the help of a convenient data structure (e.g. a trie).
- Each word of the text is read and looked up in the vocabulary.
- If the word is not found in the vocabulary, it is inserted and the occurrence list for that word is updated.
- If the word is already in the vocabulary, only its occurrence list needs to be updated.
29Building Inverted Files
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters
Vocabulary trie (figure) with occurrence lists:
letters 60
made 50
many 28
text 11, 19
words 33, 40
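The construction can be sketched in a few lines. A Python dict replaces the trie here, and the stopword list is an assumption chosen so that the result matches the vocabulary shown in the example; `build_inverted_index` is an illustrative name.

```python
import re

# Assumed stopwords: the words of the sample text that the example leaves out.
STOPWORDS = {"this", "is", "a", "has", "are", "from"}

def build_inverted_index(text):
    """Map each indexed word to the 1-based positions where it starts."""
    index = {}
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in STOPWORDS:
            continue
        # A new word is inserted; a known word only gets its list extended.
        index.setdefault(word, []).append(match.start() + 1)
    return index

text = "This is a text. A text has many words. Words are made from letters"
index = build_inverted_index(text)
```

Each character is read once and each list update is an append, which is where the O(n) construction time comes from.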
30Building Inverted Files
- Since processing each character of the text takes O(1) time and updating an occurrence list takes O(1) time, the total complexity of the method is O(n).
- If the structure does not fit in main memory the method runs into trouble: it needs many disk accesses, so the construction time increases dramatically.
31Building Inverted Files
- Alternative method
- The previous procedure continues until main memory is full.
- This produces a partial index Ii, which is written to disk.
- Repeating the same procedure produces a set of partial indexes Ii stored on disk.
- The partial indexes are then merged in successive passes to obtain the complete structure.
32Building Inverted Files
33Building Inverted Files
- Complexity of the alternative method
- Building the partial indexes Ii takes O(n) time.
- Number of partial indexes: O(n/M).
- Each merge phase takes O(n) time.
- Merging the O(n/M) partial indexes takes log(n/M) merge phases.
- The total is therefore O(n log(n/M)).
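The merge phases can be sketched as follows. In-memory dicts stand in for the partial indexes that would really live on disk, and the partial indexes are assumed to be listed in text order so that concatenating occurrence lists keeps them sorted. The names `merge_two` and `merge_all` are illustrative.

```python
def merge_two(first, second):
    """Merge two partial indexes; `first` covers the earlier part of the text."""
    merged = {word: list(positions) for word, positions in first.items()}
    for word, positions in second.items():
        merged.setdefault(word, []).extend(positions)
    return merged

def merge_all(partials):
    """Merge partial indexes pairwise; returns (index, number of phases)."""
    phases = 0
    while len(partials) > 1:
        merged = [merge_two(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:          # an odd one out moves up unchanged
            merged.append(partials[-1])
        partials = merged
        phases += 1
    return partials[0], phases

partials = [{"text": [11]}, {"text": [19], "many": [28]}, {"words": [33, 40]}]
full_index, phases = merge_all(partials)
```

Each phase halves the number of partial indexes, which gives the logarithmic number of phases stated above.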
34Drawbacks of Inverted Files
- The inverted file method assumes that the text can be treated as a sequence of words.
- This considerably restricts the kinds of queries the system can process.
- Queries such as phrase searches are expensive to process.
- Moreover, in many applications the notion of a word does not exist (e.g. genetic databases).
35Suffix Trees and Arrays
- Suffix arrays are an efficient implementation of suffix trees.
- They allow more complex queries to be processed.
- The method views the text as one long string of characters.
- Every position in the text is considered a suffix.
- The positions that are indexed are called index points. Not every position of the text has to be indexed.
36Suffix Trees and Arrays
- A suffix tree is essentially a trie built over the suffixes of the text.
- The pointers to the suffixes are stored in the leaves of the structure.
- To improve space utilization, the paths of the structure are compressed (Patricia trees).
- This allows the structure to be stored in O(n) space.
37Suffix Trees and Arrays
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters
Figure: Suffix Trie. The leaves, read from left to right, point to the suffixes starting at 60, 50, 28, 19, 11, 40, 33.
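38Suffix Trees and Arrays
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters
Figure: Suffix Tree (the suffix trie with its paths compressed). The leaves, read from left to right, point to the suffixes starting at 60, 50, 28, 19, 11, 40, 33.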
39Suffix Trees and Arrays
- The problem is that storing the structure takes a lot of space.
- It is estimated that even when only the first character of each word is indexed, the space overhead is 120% to 240% of the total text size.
- Each node of the structure needs 12 or 24 bytes of storage.
40Suffix Trees and Arrays
- Suffix arrays offer the same functionality but need much less space to store the structure.
- If we traverse the leaves of the suffix tree from left to right, we obtain all the suffixes of the text in lexicographic order.
- A suffix array contains the pointers to the suffixes in lexicographic order.
41Suffix Trees and Arrays
Text (word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60):
This is a text. A text has many words. Words are made from letters
Suffix Array: 60 50 28 19 11 40 33
The extra space required is about 40% of the text size.
42Searching with Suffix Trees and Suffix Arrays
- Searches for words, phrases and prefixes can be carried out in O(log n) time.
- For the pattern being searched we derive two subpatterns P1 and P2 and look for the suffixes S such that, lexicographically, P1 ≤ S < P2.
- Example: to search for the word text we take P1 = text and P2 = texu. The structure returns the occurrences 19 and 11.
- P1 and P2 are located by binary search.
- Since each binary search takes log n steps in the worst case, the total is O(log n).
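A sketch of this search on the sample text. It assumes that only the seven index points from the figure are indexed and that comparisons ignore case, which reproduces the suffix array 60 50 28 19 11 40 33. For brevity the suffixes are materialized as strings; a real implementation would compare against the text in place. All names are illustrative.

```python
import bisect

text = "This is a text. A text has many words. Words are made from letters"
index_points = [11, 19, 28, 33, 40, 50, 60]   # 1-based word starts

folded = text.lower()                          # case-insensitive ordering
# Suffix array: index points sorted by the suffix that starts there.
suffix_array = sorted(index_points, key=lambda p: folded[p - 1:])
suffixes = [folded[p - 1:] for p in suffix_array]

def search(pattern):
    """Return the index points whose suffix S satisfies P1 <= S < P2."""
    p1 = pattern
    # P2: the pattern with its last character advanced by one (text -> texu).
    p2 = pattern[:-1] + chr(ord(pattern[-1]) + 1)
    low = bisect.bisect_left(suffixes, p1)     # first suffix >= P1
    high = bisect.bisect_left(suffixes, p2)    # first suffix >= P2
    return suffix_array[low:high]

result = search("text")
```

The two binary searches delimit a contiguous slice of the array, so every occurrence is reported without touching the rest of the index.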
43Signature Files
- Signature files
- They are word-based and rely on hashing.
- Their space overhead is relatively small (about 10% to 20% of the text size).
- According to experimental measurements, inverted files perform better than signature files.
44Signature Files
- A signature file uses a hash function that represents each word by a mask of B bits.
- The text is divided into blocks of b words each.
- Each block of b words is assigned a mask of B bits. The mask is obtained by applying the OR operator to the bit masks of the words in the block.
45Signature Files
- If a word is present in a text block, then every bit that is 1 in the word's signature is also 1 in the block's mask.
- However, all those bits may be 1 even when the word is not in the block. This is called a false drop.
- The most interesting part of designing a signature file is minimizing the probability of a false drop.
46Example
Text:
This is a text. A text has many words. Words are made from letters
Block signatures:
Block 1: 000101
Block 2: 110101
Block 3: 100100
Block 4: 101101
Word signatures:
H(text) = 000101
H(many) = 110000
H(words) = 100100
H(made) = 001100
H(letters) = 100001
The hash function is chosen so that at least ℓ bits are set in the signature of every word.
47Sequential Searching
- In some cases no auxiliary data structure is available for searching.
- Given a pattern P of m characters and a text T of n characters, we must find the occurrences of P in T.
- Many methods have been proposed for this problem. We examine some of them next.
48Sequential Searching
- The obvious method (brute force)
- The method of Knuth, Morris and Pratt (KMP)
- The Boyer-Moore method
- The Shift-Or method
- The Suffix Automaton method
49Brute-Force
- This is the simplest search method.
- Every position of the text is tried in turn, checking whether the pattern matches the text characters at that position.
- The process continues until the end of the text T is reached.
50Brute-Force
Figure: brute-force search for the pattern abracadabra in the text abracabracadabra, trying each text position in turn.
51Brute-Force
- There are O(n) positions in the text and O(m) positions in the pattern.
- Since every possible position of the pattern is examined, the worst-case complexity of the method is O(nm).
- The average-case complexity, however, is O(n), because on random text a mismatch is found after O(1) character comparisons.
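A minimal sketch of the brute-force method, using the strings from the figure. Positions are 0-based here, and `brute_force_search` is an illustrative name.

```python
def brute_force_search(text, pattern):
    """Return every 0-based position where pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for pos in range(n - m + 1):           # try each text position in turn
        j = 0
        while j < m and text[pos + j] == pattern[j]:
            j += 1                          # extend the match character by character
        if j == m:                          # the whole pattern matched
            hits.append(pos)
    return hits

hits = brute_force_search("abracabracadabra", "abracadabra")
```

The two nested loops make the O(nm) worst case visible; on typical text the inner loop stops after the first character or two.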
52Knuth-Morris-Pratt
- It was the first algorithm proposed with linear worst-case complexity, O(n).
- On average, however, it performs similarly to brute force.
- The basic idea is to avoid examining positions where the pattern certainly cannot occur.
- As a result, not every possible position is examined.
53Knuth-Morris-Pratt
- The pattern must be preprocessed.
- A table next is built, which states how many positions we can advance.
- Entry j of the table gives the length of the longest proper prefix of P[1..j-1] that is also a suffix of it and is followed by a different character.
- We can therefore safely skip j - next[j] - 1 characters.
54Knuth-Morris-Pratt
Figure: Knuth-Morris-Pratt search for the pattern abracadabra in the text abracabracadabra.
55Knuth-Morris-Pratt
- The method uses a window that sits at one position of the text at each step.
- There is a pointer inside the window.
- Each time a pattern character matches, the pointer moves one position forward.
- Each time there is a mismatch, the window moves while the pointer stays where it is.
- Since each step moves either the window or the pointer by one position, the method performs at most 2n comparisons.
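A sketch of KMP in Python. Note the substitution: it uses the common textbook failure table (longest proper prefix that is also a suffix), not the slightly stronger next table with the "different following character" condition described on the slides. A non-empty pattern is assumed and the function names are illustrative.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]                 # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return every 0-based position where pattern occurs in text."""
    fail = failure_table(pattern)
    hits = []
    k = 0                                   # pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]                 # shift the window, keep the text pointer
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]                 # continue searching for overlaps
    return hits

hits = kmp_search("abracabracadabra", "abracadabra")
```

The text pointer `i` never moves backwards, which is the reason the search stays linear in n.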
56Boyer-Moore
- The pattern is compared with the text characters from the end of the pattern towards its beginning.
- Like KMP, it uses the match heuristic.
- Besides the match heuristic it also uses the occurrence heuristic: after the window is shifted, the text character that caused the mismatch must be aligned with a matching character of the pattern.
57Boyer-Moore
Figure: Boyer-Moore search for the pattern abracadabra.
- Match heuristic: shift of 7 positions.
- Occurrence heuristic: shift of 5 positions.
- The larger shift is chosen.
58Boyer-Moore
- Pattern preprocessing cost: O(m + σ), where σ is the size of the alphabet.
- Average-case search cost: O(n log m / m).
- Worst-case search cost: O(mn).
- Variants: simplified BM, BM-Horspool, BM-Sunday, Commentz-Walter (an extension for searching multiple patterns).
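As an illustration of right-to-left comparison with an occurrence-style shift, here is a sketch of the BM-Horspool variant in its usual textbook form. It is not the full Boyer-Moore algorithm: the match heuristic is left out. A non-empty pattern is assumed and `horspool_search` is an illustrative name.

```python
def horspool_search(text, pattern):
    """Return every 0-based position where pattern occurs in text."""
    n, m = len(text), len(pattern)
    # Shift table: distance from the last occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1                           # compare from the end backwards
        if j < 0:
            hits.append(pos)
        # Align the text character under the window's last position;
        # characters absent from the pattern allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits

hits = horspool_search("abracabracadabra", "abracadabra")
```

On this input the window is examined at only three positions (0, 3 and 5) before the match is found, which shows how the shifts skip text positions.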
59Shift-OR
- It is based on bit-parallelism.
- It operates on the bits of a processor word, which consists of w bits.
- Current processors are based on 32-bit or 64-bit architectures.
- The improvement the method brings to pattern search time is quite good.
60Shift-OR
- The method simulates the operation of a non-deterministic automaton that searches for the pattern in the text.
- The automaton can be simulated in O(nm) time.
- The worst-case time complexity is O(nm/w) (optimal speedup).
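61Shift-OR
Bit masks for the pattern abracadabra (a 0 marks the positions where the character occurs):
Pattern   a b r a c a d a b r a
B[a]      0 1 1 0 1 0 1 0 1 1 0
B[b]      1 0 1 1 1 1 1 1 0 1 1
B[c]      1 1 1 1 0 1 1 1 1 1 1
B[d]      1 1 1 1 1 1 0 1 1 1 1
B[r]      1 1 0 1 1 1 1 1 1 0 1
B[other]  1 1 1 1 1 1 1 1 1 1 1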
62Shift-OR
- The state of the search is kept in a machine word D = dm...d1, where bit di = 0 when state i of the automaton is active.
- We therefore have a match when dm = 0.
- bOR: bitwise OR
- bAND: bitwise AND
63Shift-OR
- Initially all the bits of the word D are 1.
- For each new text character Tj that is examined, the word D is updated as follows: D ← (D << 1) bOR B[Tj].
- The symbol << means that the bits move one position to the left (shift-left) and the rightmost bit becomes 0, so that a new match can start at every text position.
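The update rule can be sketched directly with Python integers. Python integers are unbounded, so the limit of w bits per machine word does not appear here; the state is simply masked to m bits. The names `shift_or_search`, `masks` and `state` are illustrative.

```python
def shift_or_search(text, pattern):
    """Return every 0-based position where pattern occurs in text."""
    m = len(pattern)
    all_ones = (1 << m) - 1

    # masks[c] has bit i equal to 0 exactly when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)

    state = all_ones                         # no state is active at the start
    hits = []
    for j, ch in enumerate(text):
        # Shift left (a 0 enters on the right, since 0 means "active"),
        # then OR with the mask of the current text character.
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:      # d_m = 0: the whole pattern matched
            hits.append(j - m + 1)
    return hits

hits = shift_or_search("abracabracadabra", "abracadabra")
```

One shift and one OR per text character update all m automaton states at once, which is the source of the speedup by a factor of w.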
66Complex Patterns
- In many cases we search for patterns more complex than plain words. We examine the following:
- Searching allowing errors (approximate matching)
- Searching for extended patterns
67Approximate Matching
- We are given a pattern P of length m, a text T of length n, and an integer k that states the maximum number of errors allowed in a match.
- The problem is more recent than exact matching.
- There are several solutions. Here we discuss two:
- Dynamic programming
- Automata
68Dynamic Programming
- Most algorithms related to speech and language processing belong to the dynamic programming family.
- Among them is the algorithm based on the minimum edit distance.
- Dynamic programming rests on the principle that the original problem can be solved by suitably combining the solutions of smaller subproblems.
69Dynamic Programming
- Let C[0..m, 0..n] be a matrix.
- The entry C[i,j] is the minimum number of errors when matching P[1..i] against some suffix of T[1..j].
- The computation proceeds as follows:
C[0,j] = 0
C[i,0] = i
C[i,j] = C[i-1,j-1] if P[i] = T[j], otherwise 1 + min(C[i-1,j], C[i,j-1], C[i-1,j-1])
We have a match whenever, for some position j, C[m,j] ≤ k.
70Dynamic Programming
- Time complexity: O(mn).
- Space complexity: O(m).
- Preprocessing time complexity: O(m).
- Algorithms have recently been proposed that achieve O(kn) time complexity.
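A sketch of the dynamic programming search, keeping only one column of the matrix so that the space stays O(m). It uses the standard edit-distance recurrence with the first row fixed at 0, so that a match may start at any text position; this is assumed to be the recurrence the slides intend. `approximate_search` is an illustrative name.

```python
def approximate_search(text, pattern, k):
    """Return the 1-based text positions j where some substring ending
    at j matches the pattern with at most k errors."""
    m = len(pattern)
    column = list(range(m + 1))              # C[i,0] = i
    ends = []
    for j, ch in enumerate(text, start=1):
        diagonal = column[0]                  # C[0,j-1] = 0
        for i in range(1, m + 1):
            previous = column[i]              # C[i,j-1], before overwriting
            if pattern[i - 1] == ch:
                column[i] = diagonal          # C[i-1,j-1]
            else:
                # 1 + min(insertion, deletion, substitution)
                column[i] = 1 + min(previous, column[i - 1], diagonal)
            diagonal = previous
        if column[m] <= k:                    # C[m,j] <= k: report a match
            ends.append(j)
    return ends

exact = approximate_search("xabcx", "abc", 0)
one_error = approximate_search("xabcx", "abc", 1)
```

With k = 0 only the exact occurrence ending at position 4 is reported. With k = 1 positions 3 and 5 are reported too, since "ab" and "abcx" are each one edit away from the pattern.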
71Dynamic Programming
72Automata