Title: A Fast String Searching Algorithm

1
A Fast String Searching Algorithm

Robert S. Boyer
and J Strother Moore
Communications of the ACM,
vol. 20, no. 10, Oct. 1977

2
Outline

Introduction
The Knuth-Morris-Pratt algorithm
The Boyer-Moore algorithm
Bad Character heuristic
Good Suffix heuristic
Matching Algorithm
Experimental Result
Conclusion

3
Introduction

String Matching
Searching for a pattern in a text (a longer string).
If the pattern exists in the string, return the
position of the first character of the substring
that matches the pattern.

4
Introduction (cont.)

Some definitions
m: the length of the pattern.
n: the length of the string (or text).
s (shift): the distance between the first
character of the matched substring and the first
character of the string.
$w \sqsubset x$: string w is a prefix of string x.
$w \sqsupset x$: string w is a suffix of string x.

5
Introduction (cont.)

The naive string-matching algorithm
Time Complexity
$\Theta((n-m+1)m)$ in the worst case,
which is $\Theta(n^2)$ if $m = \lfloor n/2 \rfloor$.

for s ← 0 to n - m do
    if pattern[1..m] = string[s+1..s+m] then
        print "Pattern occurs with shift" s
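
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the name `naive_matcher` is hypothetical, shifts are 0-based like s above, and valid shifts are collected in a list instead of printed.

```python
def naive_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):            # s = 0, 1, ..., n - m (empty if m > n)
        # Building the slice and comparing it costs O(m) per shift,
        # which gives the (n - m + 1) * m worst case.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_matcher("ACBBACABCA", "ABC"))  # [6]
```

Overlapping occurrences are all reported: `naive_matcher("AAAA", "AA")` yields `[0, 1, 2]`.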
6
Knuth-Morris-Pratt Algorithm
7
Knuth-Morris-Pratt Algorithm (cont.)

Prefix Function
$f(j)$ = the largest $i < j$ such that $P[1..i] = P[j-i+1..j]$,
i.e., $P_i \sqsupset P_j$ (where $P_i$ denotes $P[1..i]$);
$f(j) = 0$ if no such $i$ exists.

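Example: with $P_q$ = ABABA ($q = 5$) and $P_k$ = ABA ($k = 3$),
$P_k \sqsupset P_q$, and no longer proper prefix of ABABA is also
a suffix of it, so $f(5) = 3$.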
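
The definition of f can be checked directly with a brute-force Python sketch. The name `prefix_function_by_definition` is hypothetical; the function tries every candidate i, so it takes roughly O(m³) time and is meant only as a reference for small patterns. It returns f(1)..f(m) as a 0-indexed list.

```python
def prefix_function_by_definition(P: str) -> list[int]:
    """f(j) for j = 1..m, computed straight from the definition."""
    m = len(P)
    f = []
    for j in range(1, m + 1):
        best = 0                          # f(j) = 0 if no valid i exists
        for i in range(1, j):             # candidates i < j
            # P[1..i] == P[j-i+1..j] in 1-based terms
            if P[:i] == P[j - i:j]:
                best = i                  # keep the largest valid i
        f.append(best)
    return f


print(prefix_function_by_definition("ABABA"))  # [0, 0, 1, 2, 3]
```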
8
Knuth-Morris-Pratt Algorithm (cont.)

Prefix Function Algorithm

f[1] ← 0
k ← 0
for q ← 2 to m do
    while k > 0 and P[k+1] ≠ P[q] do
        k ← f[k]
    if P[k+1] = P[q] then
        k ← k + 1
    f[q] ← k
return f[1..m]
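
A line-by-line Python translation of the prefix-function pseudocode follows, as a sketch. Assumptions: the name `prefix_function` is illustrative, the pattern is a Python `str`, and f[1..m] is returned as a 0-indexed list, so `f[q - 1]` holds f(q).

```python
def prefix_function(P: str) -> list[int]:
    """Translate the prefix-function pseudocode; f[q - 1] holds f(q)."""
    m = len(P)
    f = [0] * m                            # f[1] <- 0 (other entries are overwritten)
    k = 0
    for q in range(2, m + 1):              # 1-based q, exactly as in the pseudocode
        while k > 0 and P[k] != P[q - 1]:  # P[k+1] != P[q] in 1-based indexing
            k = f[k - 1]                   # k <- f[k]
        if P[k] == P[q - 1]:               # P[k+1] = P[q]
            k += 1
        f[q - 1] = k                       # f[q] <- k
    return f


print(prefix_function("ABABA"))  # [0, 0, 1, 2, 3]
```

The amortized O(m) bound comes from the `while` loop: k grows by at most 1 per iteration of the `for` loop and every `while` step strictly decreases it, so there are at most m - 1 `while` steps in total.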
9
Knuth-Morris-Pratt Algorithm (cont.)

Example: [figure: prefix-function values]

Time Complexity
Prefix function: $O(m)$ by amortized analysis
Matching function: $O(n)$
Total: $O(m + n)$, i.e., linear complexity
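
Putting the prefix function and the matching loop together gives the following Python sketch of KMP. The names `prefix_function` and `kmp_matcher` are hypothetical, indexing is 0-based throughout, and the pattern is assumed to be non-empty. The text is scanned once from left to right and the state q only falls back through f, which is where the O(m + n) bound comes from.

```python
def prefix_function(P: str) -> list[int]:
    """f[q] = length of the longest proper prefix of P[:q+1] that is also its suffix."""
    f = [0] * len(P)
    k = 0
    for q in range(1, len(P)):
        while k > 0 and P[k] != P[q]:
            k = f[k - 1]
        if P[k] == P[q]:
            k += 1
        f[q] = k
    return f


def kmp_matcher(T: str, P: str) -> list[int]:
    """Return every 0-based shift at which P occurs in T."""
    if not P:
        raise ValueError("pattern must be non-empty")
    f = prefix_function(P)
    shifts = []
    q = 0                                 # number of characters of P matched so far
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:        # fall back through the prefix function
            q = f[q - 1]
        if P[q] == c:
            q += 1
        if q == len(P):                   # a full match ends at T[i]
            shifts.append(i - len(P) + 1)
            q = f[q - 1]                  # keep going to find overlapping matches
    return shifts


print(kmp_matcher("abababa", "aba"))  # [0, 2, 4]
```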

10
The Boyer-Moore Algorithm

Symbols used
$\Sigma$: the alphabet (the set of characters)
patlen: the length of the pattern
m: the number of characters at the end of the pattern already matched
char: the mismatched character of the string

[Figure: string and pattern aligned, labeled char and m]
11
Characteristics

Matches the pattern against the string from the
rightmost character of the pattern to the leftmost
character of the pattern.
When the pattern is relatively long and $\Sigma$ is
reasonably large, this algorithm is likely to be the
most efficient string-matching algorithm.
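
The right-to-left comparison at a single alignment can be sketched in Python as follows. The helper name `rightmost_mismatch` is hypothetical; it uses 1-based pattern positions j, as the slides do, together with a 0-based shift s, and assumes the alignment lies inside the string.

```python
def rightmost_mismatch(string: str, pat: str, s: int) -> int:
    """Compare pat against string[s : s + len(pat)] from right to left.

    Returns the 1-based pattern position j of the first mismatch found,
    or 0 if every character matched (pat occurs at shift s).
    Assumes 0 <= s <= len(string) - len(pat).
    """
    j = len(pat)                              # start at the rightmost pattern character
    while j > 0 and string[s + j - 1] == pat[j - 1]:
        j -= 1                                # move one position to the left
    return j


print(rightmost_mismatch("ACBBACABCA", "ABC", 0))  # 3
print(rightmost_mismatch("ACBBACABCA", "ABC", 6))  # 0
```

At shift 0 the very first comparison (pat(3) = C against string character B) already fails, which is what lets the heuristics below skip ahead without looking at the rest of the window.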

12
Bad Character heuristic

Observation 1
If char does not occur in pat:
Pattern shift: j characters
String pointer shift: patlen characters
Example

A D C A B C A B A
13
Bad Character heuristic (cont.)

Observation 2
char occurs in pat:
Let d1[char] be the position of the rightmost
occurrence of char in pat, and let j be the
current position of the pointer in the pattern.
If j < d1[char], shift the pattern right by 1.
If j > d1[char], shift the pattern right by
j - d1[char].
d1 is an array whose size is the size of $\Sigma$.

14
Bad Character heuristic (cont.)

Example

string:  A C B B A C A B C A
pattern: A B C
j = 3 and d1[B] = 2, so the pattern shifts right by
j - d1[B] = 1 and the string pointer shifts by 1
(m + pattern shift, with m = 0).
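
Both bad-character observations can be captured in a short Python sketch. Assumptions: a `dict` replaces the array of size |Σ|, characters absent from it default to position 0, j is 1-based, and the function names are hypothetical. Treating an absent character as position 0 makes Observation 1 a special case of Observation 2, because the pattern shift becomes j - 0 = j and the string pointer shift becomes m + j = patlen.

```python
def build_d1(pat: str) -> dict[str, int]:
    """d1[char] = 1-based position of the rightmost occurrence of char in pat."""
    d1 = {}
    for pos, char in enumerate(pat, start=1):
        d1[char] = pos                    # later (further right) positions overwrite earlier ones
    return d1


def bad_character_shifts(pat: str, d1: dict[str, int], char: str, j: int) -> tuple[int, int]:
    """(pattern shift, string pointer shift) after string char mismatches pat(j)."""
    m = len(pat) - j                      # characters already matched at the end of pat
    pattern_shift = max(1, j - d1.get(char, 0))   # 0 stands for "char not in pat"
    return pattern_shift, m + pattern_shift


d1 = build_d1("ABC")
print(bad_character_shifts("ABC", d1, "B", 3))  # (1, 1)
print(bad_character_shifts("ABC", d1, "D", 3))  # (3, 3)
```

The first call reproduces the example above; the second shows Observation 1, where the string pointer advances by patlen.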
15
Good Suffix heuristic

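Two sequences $c_1 \ldots c_n$ and $d_1 \ldots d_n$ unify if,
for every $i$ from 1 to $n$, either $c_i = d_i$, $c_i = \$$,
or $d_i = \$$, where $\$$ is a character that does not occur
in pat; positions $k \le 0$, to the left of pat, are treated
as holding $\$$.
The position of the rightmost plausible reoccurrence,
rpr(j), is the largest $k \le patlen$ such that
$pat(j+1) \ldots pat(patlen)$ and $pat(k) \ldots pat(k + patlen - j - 1)$
unify, and either $k \le 1$ or $pat(k-1) \ne pat(j)$.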

16
Good Suffix heuristic (cont.)

Example
Pattern shift: $j + 1 - rpr(j)$
String pointer shift: $m + j + 1 - rpr(j)$
$= (patlen - j) + j + 1 - rpr(j)$
$= patlen + 1 - rpr(j) = d2(j)$

j          1   2   3   4   5   6   7   8   9
pat(j)     A   B   X   Y   C   D   E   X   Y
rpr(j)    -7  -6  -5  -4  -3  -2   3   0   9
17
Good Suffix heuristic (cont.)

Algorithm
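
The definitions of unify, rpr(j) and d2(j) translate directly into a brute-force Python sketch. This is not the efficient preprocessing algorithm from the paper: it simply tries every candidate k from right to left, so it costs roughly O(patlen³) time. It assumes that positions k ≤ 0 hold the wildcard $ (represented by `None`), which is what allows rpr(j) to be negative; all names are illustrative.

```python
WILDCARD = None  # plays the role of $: the "character" at every position k <= 0


def pat_at(pat: str, p: int):
    """pat(p) with 1-based p; positions left of the pattern hold the wildcard."""
    return pat[p - 1] if p > 0 else WILDCARD


def unify(cs: list, ds: list) -> bool:
    """Equal-length sequences unify if each pair is equal or contains $."""
    return all(c == d or c is WILDCARD or d is WILDCARD for c, d in zip(cs, ds))


def rpr(pat: str, j: int) -> int:
    """Rightmost plausible reoccurrence of pat(j+1)..pat(patlen), 1 <= j <= patlen."""
    patlen = len(pat)
    suffix = [pat_at(pat, p) for p in range(j + 1, patlen + 1)]
    # k = j + 1 is the suffix itself and always fails pat(k-1) != pat(j);
    # larger k would run past the end of pat, so scan down from k = j.
    # At k = j + 1 - patlen every compared position is <= 0 (all $) and
    # k <= 1, so the loop always returns.
    for k in range(j, j - patlen, -1):
        window = [pat_at(pat, p) for p in range(k, k + len(suffix))]
        if unify(suffix, window) and (k <= 1 or pat_at(pat, k - 1) != pat_at(pat, j)):
            return k
    raise AssertionError("unreachable")


def d2_table(pat: str) -> list[int]:
    """d2(j) = patlen + 1 - rpr(j) for j = 1..patlen (returned 0-indexed)."""
    patlen = len(pat)
    return [patlen + 1 - rpr(pat, j) for j in range(1, patlen + 1)]


print([rpr("ABXYCDEXY", j) for j in range(1, 10)])  # [-7, -6, -5, -4, -3, -2, 3, 0, 9]
print(d2_table("ABXYCDEXY"))                        # [17, 16, 15, 14, 13, 12, 7, 10, 1]
```

Note that d2(patlen) = 1 here: when the very last pattern character mismatches, no suffix has been matched and the good suffix heuristic alone only guarantees a one-position move.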

18
Boyer-Moore Matching Algorithm
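i ← patlen
if n < patlen then return false
j ← patlen
while j > 0 do
    if string(i) = pat(j) then
        j ← j - 1
        i ← i - 1
    else
        i ← i + max(d1(string(i)), d2(j))
        if i > n then return false
        j ← patlen
return i + 1    (position of the first character of the match)
Here d1(char) is the string pointer shift from the bad character
heuristic: patlen minus the position of the rightmost occurrence
of char in pat, or patlen if char does not occur in pat.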
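
A runnable Python sketch of the matching loop follows, under stated assumptions: d1 is stored in a `dict` as the string pointer shift patlen - (rightmost position of char), with patlen used for characters absent from pat; d2 is computed by brute force from the rpr definition (positions ≤ 0 act as the wildcard $) rather than by the paper's efficient preprocessing; and the function returns a 0-based start index, or `None` where the pseudocode returns false. All names are illustrative.

```python
from typing import Optional


def build_d1(pat: str) -> dict[str, int]:
    """String pointer shift patlen - (rightmost 1-based position of char in pat)."""
    patlen = len(pat)
    return {c: patlen - pos for pos, c in enumerate(pat, start=1)}  # later pos wins


def build_d2(pat: str) -> list[int]:
    """d2[j] = patlen + 1 - rpr(j) for j = 1..patlen (index 0 unused)."""
    patlen = len(pat)

    def rpr(j: int) -> int:
        for k in range(j, j - patlen, -1):
            # pat(k + t) must unify with pat(j + 1 + t); positions <= 0 are $
            if all(k + t <= 0 or pat[k + t - 1] == pat[j + t] for t in range(patlen - j)):
                if k <= 1 or pat[k - 2] != pat[j - 1]:   # pat(k-1) != pat(j)
                    return k
        raise AssertionError("unreachable: k = j + 1 - patlen always qualifies")

    return [0] + [patlen + 1 - rpr(j) for j in range(1, patlen + 1)]


def boyer_moore(string: str, pat: str) -> Optional[int]:
    """0-based index of the first occurrence of pat in string, or None."""
    n, patlen = len(string), len(pat)
    if patlen == 0:
        raise ValueError("pattern must be non-empty")
    d1, d2 = build_d1(pat), build_d2(pat)
    i = patlen                        # 1-based string position under pat(patlen)
    if n < patlen:
        return None                   # "return false"
    j = patlen
    while j > 0:
        char = string[i - 1]          # string(i)
        if char == pat[j - 1]:        # string(i) = pat(j): keep matching leftward
            j -= 1
            i -= 1
        else:
            i += max(d1.get(char, patlen), d2[j])
            if i > n:
                return None
            j = patlen                # restart at the right end of the pattern
    return i                          # 1-based start is i + 1, so 0-based start is i


print(boyer_moore("ACBBACABCA", "ABC"))  # 6
```

Taking the maximum of the two shifts is always safe because each heuristic on its own never skips a valid alignment, and d2(j) ≥ patlen - j + 1 guarantees that the pattern moves forward even when d1 alone would move it backward.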
19
Boyer-Moore Matching Algorithm

Time Complexity
Bad Character heuristic (building d1): $O(patlen + |\Sigma|)$
Good Suffix heuristic (building d2): $O(patlen)$
Matching (first occurrence): $O(n)$
Total: $O(n + patlen + |\Sigma|)$

20
Experimental Result
21
Conclusion

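The Boyer-Moore algorithm can run in sublinear time:
in the best case it examines only about n/m characters
of the string, i.e., $O(n/m)$.
Boyer-Moore is likely to be the most efficient string-matching
algorithm when the pattern is long and the alphabet $\Sigma$ is
reasonably large.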
