Indexing and Searching

Slides: 73
Provided by: auth3163
Transcript and Presenter's Notes

1
Indexing and Searching
2
Introduction
  • In what ways can we search for information in a
    collection of texts?
  • The simplest and most easily implemented way is to
    scan all the texts of the collection sequentially.
  • Another way is to build special data structures
    (index structures) that speed up the search
    process.

3
Introduction
  • Indexes are widely used in database systems
    (e.g. Oracle, MySQL, SQL Server).
  • Indexes can discard a large part of the data that
    does not take part in the answer.
  • Examples of indexes: B-trees, hashing.

4
Introduction
5
Binary Search Trees
6
B-Trees
7
Hashing
Hash function: h(key) = key mod 10
Bucket   Keys
0        10, 20, 30, 40, 50, 60
1        (empty)
2        12, 22, 42
3        (empty)
4        (empty)
5        (empty)
6        (empty)
7        (empty)
8        (empty)
9        9, 19, 79
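The bucket layout above can be reproduced with a few lines of Python. This is a minimal sketch, assuming chaining with plain lists; the function names `build_table` and `contains` are illustrative and do not come from the slides.

```python
def h(key: int) -> int:
    """Hash function from the slide: h(key) = key mod 10."""
    return key % 10


def build_table(keys):
    """Distribute keys over 10 buckets; colliding keys share a bucket list."""
    table = [[] for _ in range(10)]
    for key in keys:
        table[h(key)].append(key)
    return table


def contains(table, key):
    """Only the single bucket h(key) has to be scanned."""
    return key in table[h(key)]


keys = [10, 20, 30, 40, 50, 60, 12, 22, 42, 9, 19, 79]
table = build_table(keys)
print(table[2])            # [12, 22, 42]
print(contains(table, 7))  # False
```

A lookup touches one bucket only, which is why hashing suits exact-match queries but not range queries.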
8
Indexes for Text
  • For text, the indexing mechanisms differ from the
    corresponding ones used for numbers.
  • Indexes for text:
  • Inverted Files
  • Suffix Trees, Suffix Arrays
  • Signature Files

9
Inverted Files
  • n: size of the text
  • m: length of the pattern
  • v: size of the vocabulary
  • M: size of the available memory

10
Inverted Files
  • A word-based indexing mechanism that is used to
    make searching more efficient.
  • Structure of an inverted file:
  • Vocabulary
  • Occurrence lists

11
Example
Text: That house has a garden. The garden has many
flowers. The flowers are beautiful
Word start positions: 1, 6, 12, 16, 18, 25, 29, 36, 40,
45, 54, 58, 66, 70
Inverted File
Vocabulary   Occurrences
beautiful    70
flowers      45, 58
garden       18, 29
house        6
12
Inverted Files
  • The space needed to store the vocabulary is
    fairly small.
  • According to Heaps' law, the vocabulary size grows
    as O(n^β), where β is a constant between 0 and 1.
    In practice β takes values between 0.4 and 0.6.
  • For example, for 1 GB of text from the TREC-2
    collection the vocabulary occupies only 5 MB.

13
Inverted Files
  • The occurrences part takes up much more space.
  • Since every word occurrence in the text is
    referenced in the lists, the extra space required
    is O(n).
  • Even after stopwords are removed, the space
    overhead ranges between 30% and 40% of the text
    size.

14
Inverted Files
  • To reduce the space required, the block addressing
    technique is used.
  • The text is divided into blocks, and the
    occurrences point to the corresponding blocks
    rather than to characters.
  • The classical methods, which use pointers to
    character positions, are called full inverted
    indices.

15
Inverted Files
  • With block addressing the pointers are smaller,
    because there are far fewer blocks than
    characters in the text.
  • In addition, occurrences of a word inside the same
    block collapse into a single reference.
  • The space overhead of this technique is typically
    about 5% of the text size.

16
Example
Text, divided into Block 1, Block 2, Block 3 and Block 4:
That house has a garden. The garden has many
flowers. The flowers are beautiful
Inverted File
Vocabulary   Occurrences (block)
beautiful    4
flowers      3
garden       2
house        1
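The block-addressed index of this example can be sketched in Python. The transcript does not preserve where the four blocks begin and end, so the boundaries below are an assumption chosen to reproduce the block numbers listed above; the stopword list and the name `block_index` are likewise illustrative.

```python
import re


def block_index(blocks, stopwords=frozenset()):
    """Map each word to the numbers of the blocks that contain it."""
    index = {}
    for number, block in enumerate(blocks, start=1):
        for word in re.findall(r"[a-z]+", block.lower()):
            if word in stopwords:
                continue
            refs = index.setdefault(word, [])
            # Repeated occurrences inside one block share a single reference.
            if not refs or refs[-1] != number:
                refs.append(number)
    return index


# Assumed block boundaries (not preserved in the transcript).
blocks = [
    "That house has a",
    "garden. The garden has",
    "many flowers. The flowers",
    "are beautiful",
]
stop = frozenset({"that", "has", "a", "the", "many", "are"})
index = block_index(blocks, stop)
print(index["garden"])  # [2]
```

Both occurrences of "garden" fall in block 2 and are stored once, which is the space saving described above.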
17
Comparison
18
Searching Inverted Files
  • A typical search on an inverted file follows these
    steps:
  • Vocabulary search: the words specified in the
    query are isolated and looked up in the
    vocabulary.
  • Retrieval of occurrences: the occurrences of each
    word are determined.
  • Processing of occurrences: the occurrences are
    processed to resolve phrases, proximity, or
    Boolean operators. If block addressing is used, a
    direct search in the text may also be required.

19
Searching Inverted Files
  • Since the search starts with the vocabulary, it is
    good practice to store it in a separate file.
  • Even for large text collections, the vocabulary is
    likely to fit in main memory.
  • Otherwise, part of the vocabulary resides in main
    memory and the rest in secondary storage (disk,
    CD-ROM).

20
Searching Inverted Files
  • Single-word queries can be answered using an
    auxiliary data structure that makes query
    processing fast.
  • Hashing, tries, B-trees.
  • Search cost: O(m) for the first two methods,
    O(m log n) for B-trees.

21
Searching Inverted Files
  • Hashing is not suitable for answering range
    queries.
  • For this case we can use binary search trees,
    tries, or B-trees.

22
Example
  • Find the texts that contain words lying
    lexicographically between the word cluster and
    the word damage.

23
?a??de??µa
Age basket cat cube cluster creature creative
damage
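The range query of this example can be answered on any ordered vocabulary. The sketch below assumes a sorted Python list searched with the standard `bisect` module as a stand-in for the binary search tree, trie, or B-tree named on the slides; `range_query` is an illustrative name.

```python
from bisect import bisect_left, bisect_right

# Vocabulary words from the example, kept in lexicographic order.
vocabulary = sorted(
    ["age", "basket", "cat", "cube", "cluster", "creature", "creative", "damage"]
)


def range_query(sorted_words, low, high):
    """Return all words w with low <= w <= high using two binary searches."""
    start = bisect_left(sorted_words, low)
    end = bisect_right(sorted_words, high)
    return sorted_words[start:end]


print(range_query(vocabulary, "cluster", "damage"))
# ['cluster', 'creative', 'creature', 'cube', 'damage']
```

A hash table gives no such ordering, which is why it cannot serve this query.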
24
Searching Inverted Files
  • When the query consists of isolated words, the
    search stops once the occurrences of each word in
    the texts have been determined.
  • When more than one query word has been found, the
    occurrence lists are then combined by union.

25
Searching Inverted Files
  • Searching for a whole phrase (rather than isolated
    words) or answering proximity queries requires
    harder processing.
  • An occurrence list is produced for each word. The
    lists are then processed to determine the final
    answer to the query.

26
Example
  • Suppose the following phrase is searched for:
  • modern information retrieval
  • Suppose that the vocabulary search has produced
    the following lists:
  • modern: 10, 50, 80
  • information: 17, 57, 120
  • retrieval: 29, 90, 400

What is the answer to the query? Does the phrase occur
in the text or not?
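One way to work through this example is to check whether the lists contain positions at the right distances from each other. The sketch below assumes that the lists hold character positions of word starts and that consecutive words are separated by exactly one character; neither assumption is stated on the slide, and `phrase_starts` is an illustrative name.

```python
def phrase_starts(words, postings):
    """Return start positions of the first word where the whole phrase occurs."""
    # Expected offset of each word relative to the first word of the phrase.
    offsets = []
    offset = 0
    for word in words:
        offsets.append(offset)
        offset += len(word) + 1  # assumed: one separator character
    later = [set(postings[w]) for w in words[1:]]
    return [
        p
        for p in postings[words[0]]
        if all(p + off in s for off, s in zip(offsets[1:], later))
    ]


postings = {
    "modern": [10, 50, 80],
    "information": [17, 57, 120],
    "retrieval": [29, 90, 400],
}
print(phrase_starts(["modern", "information", "retrieval"], postings))  # [10]
```

Under these assumptions the phrase occurs once, starting at position 10 (10, 17 = 10 + 7, 29 = 17 + 12); the candidate at 50 fails because no occurrence of "retrieval" starts at 69.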
27
Building an Inverted File
  • Building and updating an inverted file is a
    relatively easy process.
  • An inverted file for a text of n characters can be
    built in O(n) time.

28
Building an Inverted File
  • The vocabulary is organized with the help of an
    auxiliary data structure (e.g. a trie).
  • Each word of the text is read and looked up in the
    vocabulary.
  • If the word is not found in the vocabulary, it is
    inserted and its occurrence list is updated.
  • If the word already exists in the vocabulary, only
    its occurrence list needs to be updated.

29
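Building an Inverted File
Text: This is a text. A text has many words. Words are
made from letters
Word start positions: 1, 6, 9, 11, 17, 19, 24, 28, 33,
40, 46, 50, 55, 60
Vocabulary trie (figure), with one occurrence list per leaf:
  l:        letters  60
  m, a, d:  made     50
  m, a, n:  many     28
  t:        text     11, 19
  w:        words    33, 40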
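The construction procedure can be sketched as follows. A Python dict stands in for the vocabulary trie here, which is a substitution rather than the structure shown on the slide; the stopword list is assumed so that the result contains only the five words of the figure.

```python
import re


def build_inverted_index(text, stopwords=frozenset()):
    """Single pass over the text; each word updates its occurrence list."""
    index = {}
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in stopwords:
            continue
        # setdefault inserts a new vocabulary entry on first sight;
        # afterwards only the occurrence list grows.
        index.setdefault(word, []).append(match.start() + 1)  # 1-based position
    return index


text = "This is a text. A text has many words. Words are made from letters"
stop = frozenset({"this", "is", "a", "has", "are", "from"})
index = build_inverted_index(text, stop)
print(index["text"])   # [11, 19]
print(index["words"])  # [33, 40]
```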
30
Building an Inverted File
  • Since processing each character of the text takes
    O(1) time and updating an occurrence list takes
    O(1) time, the overall complexity of the previous
    method is O(n).
  • If the structure does not fit in main memory, the
    method runs into problems: many disk accesses are
    needed, and construction time increases
    dramatically.

31
Building an Inverted File
  • Alternative method:
  • The previous procedure continues until main memory
    is full.
  • A partial index Ii is formed and written to disk.
  • Repeating the same procedure yields a set of
    partial indexes Ii stored on disk.
  • Successive merges then produce the complete
    structure.

32
Building an Inverted File
33
Building an Inverted File
  • Complexity of the alternative method:
  • Building the partial indexes Ii takes O(n) time.
  • Number of partial indexes: O(n/M).
  • Each merge phase takes O(n) time.
  • Merging the O(n/M) partial indexes requires
    log(n/M) merge phases.
  • Total: O(n log(n/M)).
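The merge phases can be illustrated with in-memory dictionaries standing in for the partial indexes on disk. This is a sketch under that simplification; `merge_two` and `merge_all` are hypothetical names, and the list of partial indexes is assumed to be non-empty and ordered by text position.

```python
def merge_two(a, b):
    """Merge two partial indexes; b covers text that comes after a."""
    merged = {}
    for word in sorted(set(a) | set(b)):
        merged[word] = a.get(word, []) + b.get(word, [])
    return merged


def merge_all(partials):
    """Merge neighbours pairwise; each loop iteration is one merge phase."""
    phases = 0
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            if i + 1 < len(partials):
                merged.append(merge_two(partials[i], partials[i + 1]))
            else:
                merged.append(partials[i])  # odd one out waits for the next phase
        partials = merged
        phases += 1
    return partials[0], phases


partials = [
    {"text": [11]},
    {"text": [19], "many": [28]},
    {"words": [33, 40]},
    {"made": [50], "letters": [60]},
]
index, phases = merge_all(partials)
print(phases)         # 2
print(index["text"])  # [11, 19]
```

Four partial indexes need two phases, in line with the log(n/M) phase count above.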

34
Disadvantages of Inverted Files
  • The inverted file method assumes that the text can
    be viewed as a sequence of words.
  • This considerably restricts the kinds of queries
    the system can process.
  • Queries such as phrase searches are expensive to
    process.
  • Moreover, in many applications the notion of a
    word does not exist (e.g. genetic databases).

35
Suffix Trees & Arrays
  • Suffix arrays are a space-efficient implementation
    of suffix trees.
  • They allow more complex queries to be processed.
  • This method views the text as one long string of
    characters.
  • Every position in the text is considered a suffix.
  • The positions that are indexed are called index
    points. Not every text position has to be
    indexed.

36
Suffix Trees & Arrays
  • A suffix tree is essentially a trie built over the
    suffixes of the text.
  • The pointers to the suffixes are stored in the
    leaves of the structure.
  • To improve space utilization, the paths of the
    structure are compressed (Patricia trees).
  • This allows the structure to be stored in O(n)
    space.

37
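Suffix Trees & Arrays
Text: This is a text. A text has many words. Words are
made from letters
Word start positions: 1, 6, 9, 11, 17, 19, 24, 28, 33,
40, 46, 50, 55, 60
Suffix Trie (figure), paths from the root to the leaves:
  l                      -> 60
  m, a, d                -> 50
  m, a, n                -> 28
  t, e, x, t, (space)    -> 19
  t, e, x, t, .          -> 11
  w, o, r, d, s, (space) -> 40
  w, o, r, d, s, .       -> 33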
38
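Suffix Trees & Arrays
Text: This is a text. A text has many words. Words are
made from letters
Word start positions: 1, 6, 9, 11, 17, 19, 24, 28, 33,
40, 46, 50, 55, 60
Suffix Tree (figure), with compressed paths:
  l           -> 60
  m, d        -> 50
  m, n        -> 28
  t, (space)  -> 19
  t, .        -> 11
  w, (space)  -> 40
  w, .        -> 33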
39
Suffix Trees & Arrays
  • The problem is that storing the structure takes a
    lot of space.
  • It is estimated that even when only the first
    characters of each word are indexed, the space
    overhead is 120% to 240% of the total text size.
  • Each node of the structure needs 12 to 24 bytes of
    storage.

40
Suffix Trees & Arrays
  • Suffix arrays offer the same functionality while
    needing much less space to store the structure.
  • If we traverse the leaves of the suffix tree from
    left to right, all the suffixes of the text are
    obtained in lexicographic order.
  • A suffix array contains the pointers to the
    suffixes in lexicographic order.

41
Suffix Trees & Arrays
Text: This is a text. A text has many words. Words are
made from letters
Word start positions: 1, 6, 9, 11, 17, 19, 24, 28, 33,
40, 46, 50, 55, 60
Suffix Array: 60, 50, 28, 19, 11, 40, 33
The space overhead is about 40% of the text size.
42
Searching with Suffix Trees & Suffix Arrays
  • Searches for words, phrases and prefixes can be
    carried out in O(log n) time.
  • For the pattern being searched we derive two
    subpatterns P1 and P2 and look for the suffixes S
    such that, lexicographically, P1 ≤ S < P2.
  • Example: to search for the word text we take
    P1 = text and P2 = texu. The structure returns
    the occurrences 19 and 11.
  • P1 and P2 are located by binary search.
  • Since each binary search costs log n steps in the
    worst case, the total is O(log n).
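The search described above can be sketched on the slides' own example. The code assumes that only the seven index points of the figure are indexed and that the text is lowercased before sorting (otherwise "Words" would sort before "letters"); the helper names are illustrative, and the pattern is assumed to be non-empty.

```python
def build_suffix_array(text, index_points):
    """Sort the 1-based index points by the suffix that starts at each one."""
    return sorted(index_points, key=lambda p: text[p - 1:])


def lower_bound(text, sa, pattern):
    """First array slot whose suffix is >= pattern (binary search)."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        # Comparing only the first len(pattern) characters is sufficient.
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo


def search(text, sa, pattern):
    """Return the suffixes S with P1 <= S < P2, where P1 is the pattern."""
    p2 = pattern[:-1] + chr(ord(pattern[-1]) + 1)  # "text" -> "texu"
    return sa[lower_bound(text, sa, pattern):lower_bound(text, sa, p2)]


text = "This is a text. A text has many words. Words are made from letters".lower()
sa = build_suffix_array(text, [11, 19, 28, 33, 40, 50, 60])
print(sa)                        # [60, 50, 28, 19, 11, 40, 33]
print(search(text, sa, "text"))  # [19, 11]
```

The two binary searches delimit a contiguous slice of the array, so the answer is read off without inspecting any other suffix.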

43
Signature Files
  • Signature files
  • They are word-based and rely on hashing.
  • Their space overhead is relatively small (about
    10% to 20% of the text size).
  • According to experimental measurements, inverted
    files perform better than signature files.

44
Signature Files
  • A signature file uses a hash function that
    represents each word by a mask of B bits.
  • The text is divided into blocks of b words each.
  • Each block of b words is assigned a mask of B
    bits. The mask is obtained by applying the OR
    operator to the binary representations of the
    words in the block.

45
Signature Files
  • If a word is present in a text block, then all the
    bits that are 1 in the word's signature are also
    1 in the block's mask.
  • However, the bits may all be 1 even when the word
    is not in the block. This is called a false drop.
  • The most delicate part of signature files is to
    keep the probability of a false drop to a
    minimum.

46
Example
Text, one block per line, with its signature:
  This is a text.      000101
  A text has many      110101
  words. Words are     100100
  made from letters    101101
Word signatures:
  H(text)    = 000101
  H(many)    = 110000
  H(words)   = 100100
  H(made)    = 001100
  H(letters) = 100001
The hash function is chosen so that at least ℓ bits are
set in the signature of every word.
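The block signatures of this example can be recomputed from the word signatures. The sketch below takes the five H values from the slide as given; words without a listed signature are assumed to be stopwords that contribute no bits, and the function names are illustrative.

```python
# Word signatures copied from the example.
H = {
    "text": 0b000101,
    "many": 0b110000,
    "words": 0b100100,
    "made": 0b001100,
    "letters": 0b100001,
}


def block_signature(words):
    """OR together the signatures of the words in one block."""
    signature = 0
    for word in words:
        signature |= H.get(word, 0)  # assumed: unlisted words are stopwords
    return signature


def candidate_blocks(signatures, word):
    """Blocks whose mask contains every 1-bit of the word's signature."""
    mask = H[word]
    return [i for i, s in enumerate(signatures, start=1) if (s & mask) == mask]


blocks = [
    ["this", "is", "a", "text"],
    ["a", "text", "has", "many"],
    ["words", "words", "are"],
    ["made", "from", "letters"],
]
signatures = [block_signature(b) for b in blocks]
print([format(s, "06b") for s in signatures])
# ['000101', '110101', '100100', '101101']
print(candidate_blocks(signatures, "text"))  # [1, 2, 4]
```

Block 4 is reported for "text" although the word does not occur there: the bits of "made" and "letters" together cover 000101. This is a false drop, so every candidate block still has to be verified against the text.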
47
Sequential Searching
  • In some cases no auxiliary data structures are
    available for searching.
  • Given a pattern P of m characters and a text T of
    n characters, find the occurrences of P in T.
  • Many methods have been proposed for this problem.
    We examine some of them next.

48
Sequential Searching
  • Obvious method (brute-force)
  • Knuth, Morris and Pratt method (KMP)
  • Boyer-Moore method
  • Shift-Or method
  • Suffix Automaton method

49
Brute-Force
  • This is the simplest search method.
  • All positions of the text are tried sequentially,
    checking whether the pattern matches the text
    characters at each one.
  • The procedure continues until the end of the text
    T is reached.

50
Brute-Force
Figure: brute-force search for the pattern abracadabra
in the text abracabracadabra. The pattern is tried at
each successive text position until it matches.
51
Brute-Force
  • There are O(n) positions in the text and O(m)
    positions in the pattern.
  • Since all possible positions of the pattern are
    examined, the worst-case complexity of the method
    is O(nm).
  • However, the average-case complexity is O(n),
    because on random text a mismatch occurs after
    O(1) character comparisons.
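A minimal sketch of the brute-force method, run on the text and pattern of the figure. Positions are reported 1-based to match the slides; the function name is illustrative.

```python
def brute_force(text, pattern):
    """Try every text position and compare the pattern character by character."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                 # all m characters matched
            matches.append(i + 1)  # 1-based start position
    return matches


print(brute_force("abracabracadabra", "abracadabra"))  # [6]
```

The inner loop usually stops after one or two comparisons, which is the reason for the O(n) average case; inputs such as a long run of identical characters force the O(nm) worst case.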

52
Knuth-Morris-Pratt
  • It was the first algorithm proposed with linear
    worst-case complexity (O(n)).
  • On average, however, it performs similarly to
    brute-force.
  • The basic idea is to avoid examining positions
    where the pattern certainly cannot occur.
  • Thus, not all possible positions are examined.

53
Knuth-Morris-Pratt
  • The pattern must be preprocessed.
  • A table next is built, which states how many
    positions we can advance.
  • Entry j of the table gives the longest proper
    prefix of P[1..j-1] that is also a suffix of it,
    such that the characters following the prefix and
    the suffix are different.
  • Therefore, we can safely skip j - next[j] - 1
    characters.

54
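Knuth-Morris-Pratt
Figure: KMP search for the pattern abracadabra in the
text abracabracadabra. After the mismatch at the
seventh pattern character (d), the pattern is shifted
without re-examining the text characters already
matched.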
55
Knuth-Morris-Pratt
  • The method uses a window that is placed at some
    position of the text at every step.
  • There is a pointer inside the window.
  • Each time a pattern character matches, the pointer
    moves one position forward.
  • Each time there is no match, the window is shifted
    while the pointer stays fixed.
  • Since each time either the window or the pointer
    moves by one position, the method performs at
    most 2n comparisons.
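The window-and-pointer behaviour can be sketched with the widely used failure-function formulation of KMP. Note that this table is a simplification: unlike the next table described on the slides, it does not require the characters after the prefix and the suffix to differ. The pattern is assumed to be non-empty.

```python
def failure_table(pattern):
    """fail[j] = length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern):
    """Scan the text once; k is the pointer inside the current window."""
    fail = failure_table(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]             # shift the window, keep the text position
        if ch == pattern[k]:
            k += 1                      # advance the pointer
        if k == len(pattern):
            matches.append(i - k + 2)   # 1-based start position
            k = fail[k - 1]
    return matches


print(failure_table("abracadabra"))                   # [0, 0, 0, 1, 0, 1, 0, 1, 2, 3, 4]
print(kmp_search("abracabracadabra", "abracadabra"))  # [6]
```

The text index `i` never moves backwards, which is where the linear worst-case bound comes from.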

56
Boyer-Moore
  • The pattern is compared with the text characters
    from the end of the pattern towards its
    beginning.
  • Like KMP, it uses the match heuristic.
  • Besides the match heuristic, it also uses the
    occurrence heuristic: the text character that
    caused the mismatch must be aligned with the
    pattern after the window is shifted.

57
Boyer-Moore
Figure: Boyer-Moore search for the pattern abracadabra.
  match heuristic: shift of 7 positions
  occurrence heuristic: shift of 5 positions
THE LARGER SHIFT IS CHOSEN
58
Boyer-Moore
  • Pattern preprocessing cost: O(m + σ), where σ is
    the alphabet size.
  • Average-case search cost: O(n log m / m).
  • Worst-case search cost: O(mn).
  • Variants: simplified BM, BM-Horspool, BM-Sunday,
    Commentz-Walter (an extension for searching
    several patterns).
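Of the variants listed, BM-Horspool is the shortest to write down, so it is used here as an illustration. It keeps the right-to-left comparison but shifts using only the text character under the last window position; it is not the full Boyer-Moore algorithm with both heuristics. The function name is illustrative.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, shift by a per-character table."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    # Distance from the last occurrence of each character (excluding the final
    # pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    matches, pos = [], 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos + 1)  # 1-based start position
        # Characters absent from the table allow a shift of the full length m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(horspool_search("abracabracadabra", "abracadabra"))  # [6]
```

On this input the window is placed at only three positions (1, 4 and 6), compared with six for brute-force.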

59
Shift-OR
  • It is based on the bit-parallelism technique.
  • Operations act on the bits of a processor word,
    which consists of w bits.
  • Current processors are based on 32-bit or 64-bit
    architectures.
  • The improvement the method brings to pattern
    search time is quite good.

60
Shift-OR
  • The method simulates a non-deterministic automaton
    that searches for the pattern in the text.
  • A direct simulation of the automaton takes O(nm)
    time.
  • With bit-parallelism, the worst-case time
    complexity is O(nm/w) (optimal speedup).

61
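Shift-OR
Bit masks for the pattern abracadabra (figure); a 0 marks
a pattern position that holds the character:
        a b r a c a d a b r a
B[a]    0 1 1 0 1 0 1 0 1 1 0
B[b]    1 0 1 1 1 1 1 1 0 1 1
B[c]    1 1 1 1 0 1 1 1 1 1 1
B[d]    1 1 1 1 1 1 0 1 1 1 1
B[r]    1 1 0 1 1 1 1 1 1 0 1
B[*]    1 1 1 1 1 1 1 1 1 1 1   (any other character)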
62
Shift-OR
  • The search state is kept in a machine word
    D = d_m ... d_1, where bit d_i = 0 when state i
    of the automaton is active.
  • Therefore we have a match when d_m = 0.
  • bOR: bitwise OR
  • bAND: bitwise AND

63
Shift-OR
  • Initially all the bits of the word D are 1.
  • For each new text character T_j that is examined,
    the word D is updated as follows:
    D = (D << 1) bOR B[T_j].
  • The symbol << means that the bits move one
    position to the left (shift-left) and the
    rightmost bit becomes 0.
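The update rule can be sketched directly with Python integers. Since Python integers are unbounded, the sketch masks D to m bits explicitly; on real hardware D would be a w-bit machine word. Bit i-1 of each mask stands for pattern position i, and a 0 bit means "matches" or "state active". The function name is illustrative and the pattern is assumed to be non-empty.

```python
def shift_or_search(text, pattern):
    """Shift-Or search; returns 1-based start positions of all matches."""
    m = len(pattern)
    all_ones = (1 << m) - 1
    # B[c] has a 0 bit at every pattern position that holds character c.
    B = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, all_ones) & ~(1 << i)
    D = all_ones                      # initially no state is active
    matches = []
    for j, ch in enumerate(text):
        # The left shift brings in a 0, so state 1 can always be entered.
        D = ((D << 1) | B.get(ch, all_ones)) & all_ones
        if (D & (1 << (m - 1))) == 0:  # d_m = 0: the whole pattern matched
            matches.append(j - m + 2)
    return matches


print(shift_or_search("abracabracadabra", "abracadabra"))  # [6]
```

Each text character costs one shift and one OR, independent of m, as long as the pattern fits in a machine word.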

64
(No Transcript)
65
(No Transcript)
66
Complex Patterns
  • In many cases we search for patterns more complex
    than single words. We examine the following:
  • Searching with errors (approximate matching)
  • Searching for extended patterns

67
Approximate Matching
  • Given a pattern P of length m, a text T of length
    n, and an integer k that states the maximum
    number of errors allowed in a match.
  • The problem is more recent than exact searching.
  • There are several solutions. Here we discuss two:
  • Dynamic programming
  • Automata

68
Dynamic Programming
  • Most algorithms related to speech and language
    processing belong to the Dynamic Programming
    family.
  • Among them is the algorithm based on the minimum
    edit distance.
  • Dynamic Programming rests on the principle that
    the original problem can be solved by suitably
    combining the solutions of smaller subproblems.

69
Dynamic Programming
  • Let C[0..m, 0..n] be a matrix.
  • The entry C[i,j] is the minimum number of errors
    when matching P[1..i] against some suffix of
    T[1..j].
  • The computation proceeds as follows:
    [formula not preserved in the transcript]

We have a match whenever C[m,j] ≤ k holds for some
position j.
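The slide's formula is missing from the transcript. The sketch below assumes the standard recurrence for this problem: C[0,j] = 0, C[i,0] = i, C[i,j] = C[i-1,j-1] when P[i] = T[j], and otherwise 1 + min(C[i-1,j], C[i,j-1], C[i-1,j-1]). Only one column is kept at a time, so the space is O(m). The function name and the test strings are illustrative.

```python
def approximate_search(text, pattern, k):
    """Return the 1-based text positions j where C[m, j] <= k."""
    m = len(pattern)
    column = list(range(m + 1))        # C[i, 0] = i
    ends = []
    for j, tj in enumerate(text, start=1):
        previous = column
        column = [0] * (m + 1)         # C[0, j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            if pattern[i - 1] == tj:
                column[i] = previous[i - 1]
            else:
                column[i] = 1 + min(
                    previous[i],       # C[i, j-1]
                    column[i - 1],     # C[i-1, j]
                    previous[i - 1],   # C[i-1, j-1]
                )
        if column[m] <= k:
            ends.append(j)
    return ends


print(approximate_search("surgery", "survey", 2))  # [5, 6, 7]
```

The function reports the positions where an approximate occurrence ends; recovering the start of each occurrence would require keeping more of the matrix.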
70
Dynamic Programming
  • Time complexity: O(mn).
  • Space complexity: O(m).
  • Preprocessing time complexity: O(m).
  • Algorithms achieving O(kn) time complexity have
    been proposed recently.

71
Dynamic Programming
72
Automata