A Fast String Searching Algorithm - PowerPoint PPT Presentation

1
A Fast String Searching Algorithm
  • Robert S. Boyer,
  • and J Strother Moore.
  • Communications of the ACM,
  • vol. 20, no. 10, Oct. 1977

2
Outline
  • Introduction
  • The Knuth-Morris-Pratt algorithm
  • The Boyer-Moore algorithm
  • Bad Character heuristic
  • Good Suffix heuristic
  • Matching Algorithm
  • Experimental Result
  • Conclusion

3
Introduction
  • String Matching
  • Searching for a pattern in a text or a longer
    string.
  • If the pattern exists in the string, return the
    position of the first character of the substring
    that matches the pattern.

4
Introduction (cont.)
  • Some definitions
  • m: the length of the pattern.
  • n: the length of the string (or text).
  • s (shift): the distance between the first
    character of the matched substring and the first
    character of the string.
  • w ⊏ x: string w is a prefix of string x.
  • w ⊐ x: string w is a suffix of string x.

5
Introduction (cont.)
  • The naive string-matching algorithm
  • Time Complexity
  • Θ((n − m + 1)m) in the worst case.
  • Θ(n²) if, e.g., m = ⌊n/2⌋

for s ← 0 to n − m do
    if pattern[1..m] = string[s+1..s+m] then
        print "Pattern occurs with shift" s
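The naive loop above can be sketched in Python as follows. This is a minimal sketch, and the name `naive_matcher` is hypothetical. Python slicing is 0-based, so a match at Python index s is the same shift s as on the slide, where pattern[1..m] = string[s+1..s+m].

```python
def naive_matcher(text, pattern):
    """Return every shift s (0-based start index) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try every possible shift s = 0, 1, ..., n - m.
    for s in range(n - m + 1):
        # Compare pattern with text[s..s+m-1]; each comparison costs O(m).
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_matcher("acaabc", "aab"))  # [2]
```

Each of the n − m + 1 shifts can cost up to m character comparisons, which gives the Θ((n − m + 1)m) worst case.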
6
Knuth-Morris-Pratt Algorithm
[Figure: pattern alignments at shift s (q characters matched) and shift s′ (k characters matched)]
7
Knuth-Morris-Pratt Algorithm(cont.)
  • Prefix Function
  • f(j): the largest i < j such that P[1..i] = P[j−i+1..j],
    i.e. P[1..i] is a proper suffix of P[1..j]
  • 0 if no such i exists.

[Figure: Pq = ABABA and Pk = ABA, with Pk ⊐ Pq]
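For checking small cases by hand, the prefix function can be computed directly from its definition. This is a minimal brute-force Python sketch (roughly cubic in m). It is not the efficient algorithm on the next slide, and the name `prefix_function_naive` is hypothetical. It stores f(j) at list index j − 1.

```python
def prefix_function_naive(p):
    """f(j) straight from the definition: the largest i < j such that
    P[1..i] = P[j-i+1..j] (0 if no such i exists).
    Returns a list f where f[j - 1] holds f(j) for j = 1..m."""
    m = len(p)
    f = []
    for j in range(1, m + 1):
        best = 0
        for i in range(j - 1, 0, -1):      # try the longest candidate first
            if p[:i] == p[j - i:j]:        # P[1..i] = P[j-i+1..j] in 1-based terms
                best = i
                break
        f.append(best)
    return f


print(prefix_function_naive("ABABA"))  # [0, 0, 1, 2, 3]
```

For the figure's Pq = ABABA, the last entry is 3, which is the length of Pk = ABA.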
8
Knuth-Morris-Pratt Algorithm(cont.)
  • Prefix Function Algorithm

f[1] ← 0
k ← 0
for q ← 2 to m do
    while k > 0 and P[k+1] ≠ P[q] do
        k ← f[k]
    if P[k+1] = P[q] then
        k ← k + 1
    f[q] ← k
return f[1..m]
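Here is a line-by-line Python translation of the prefix-function pseudocode above. It is a sketch that shifts the slide's 1-based indices to 0-based, so P[k+1] becomes p[k], P[q] becomes p[q], and f[k] becomes f[k - 1]. The function name is hypothetical.

```python
def prefix_function(p):
    """Returns f with f[q - 1] = f(q) for q = 1..m (the slide's f[1..m])."""
    m = len(p)
    f = [0] * m                      # f[0] is the slide's f[1] = 0
    k = 0                            # length of the currently matched prefix
    for q in range(1, m):            # the slide's q = 2..m, shifted to 0-based
        # while k > 0 and P[k+1] != P[q] do k <- f[k]
        while k > 0 and p[k] != p[q]:
            k = f[k - 1]
        # if P[k+1] = P[q] then k <- k + 1
        if p[k] == p[q]:
            k += 1
        f[q] = k                     # f[q] <- k
    return f


print(prefix_function("ABABA"))  # [0, 0, 1, 2, 3]
```

Each step of the while loop decreases k, and k grows by at most 1 per iteration of q. This is the amortized argument behind the O(m) bound.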
9
Knuth-Morris-Pratt Algorithm(cont.)
  • Example [figure: prefix-function values]
  • Time Complexity
  • Prefix function: O(m), by amortized analysis
  • Matching function: O(n)
  • Total: O(m + n), i.e. linear complexity
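The slides list the O(n) matching phase but do not spell it out. Below is a minimal Python sketch of the usual KMP scan, assuming a non-empty pattern. The name `kmp_search` is hypothetical. To keep the block self-contained, it recomputes the prefix function inline. The scan reports all 0-based shifts, including overlapping ones.

```python
def kmp_search(text, pattern):
    """Return all 0-based shifts at which pattern occurs in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    # Preprocessing: prefix function, O(m) amortized.
    f = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = f[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        f[q] = k
    # Matching: one left-to-right pass over text, O(n) amortized.
    shifts = []
    q = 0                                  # number of pattern characters matched
    for i, c in enumerate(text):
        while q > 0 and pattern[q] != c:
            q = f[q - 1]                   # fall back to the next-longest border
        if pattern[q] == c:
            q += 1
        if q == m:                         # a full match ends at text[i]
            shifts.append(i - m + 1)
            q = f[q - 1]                   # continue, allowing overlapping matches
    return shifts


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```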

10
The Boyer-Moore Algorithm
  • Symbols used
  • Σ: the alphabet (the set of characters)
  • patlen: the length of the pattern
  • m: the number of characters at the end of the
    pattern that have already been matched
  • char: the mismatched character of the string

[Figure: the string and the pattern aligned, with the last m pattern characters matched and the mismatch at char]
11
Characteristic
  • Compares the pattern with the string from the
    rightmost character of the pattern to the leftmost
    character of the pattern.
  • When the pattern is relatively long and Σ is
    reasonably large, this algorithm is likely to be the
    most efficient string-matching algorithm.
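To make the right-to-left order concrete, here is a minimal Python sketch that compares the pattern against one alignment of the string. The helper name `compare_from_right` is hypothetical. It reports the 1-based mismatch position j and the number m = patlen − j of matched characters, and it assumes the alignment at 0-based shift s fits inside the string.

```python
def compare_from_right(string, pat, s):
    """Compare pat with string[s:s + len(pat)], starting at the right end of pat.
    Returns (j, m): j is the 1-based pattern position of the first mismatch
    (0 if the whole pattern matched) and m = patlen - j characters matched."""
    patlen = len(pat)
    j = patlen
    while j > 0 and string[s + j - 1] == pat[j - 1]:
        j -= 1                       # matched pat(j); move one position left
    return j, patlen - j


print(compare_from_right("ACBBACABCA", "ABC", 0))  # (3, 0)
```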

12
Bad Character heuristic
  • Observation 1
  • If char does not occur in pat:
  • Pattern shift: j characters
  • String pointer shift: patlen characters
  • Example

A D C A B C A B A
13
Bad Character heuristic (cont.)
  • Observation 2
  • char occurs in the pattern
  • The rightmost occurrence of char in the pattern is
    at position d1[char], and the pattern pointer is at j
  • If j < d1[char], shift the pattern right by 1
  • If j > d1[char], shift the pattern right by
    j − d1[char]
  • d1 is an array indexed by character, of size |Σ|
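The two observations can be combined into one rule if absent characters are given d1 value 0. The pattern shift is then max(1, j − d1[char]), and Observation 1 falls out as a shift of j. This is a minimal Python sketch with hypothetical function names. It uses a dict instead of a |Σ|-sized array, so characters missing from the dict count as 0. The string pointer advance is m + (pattern shift), with m = patlen − j.

```python
def build_d1(pattern):
    """d1[c] = 1-based position of the rightmost occurrence of c in pattern."""
    d1 = {}
    for pos, c in enumerate(pattern, start=1):
        d1[c] = pos            # later (further right) occurrences overwrite earlier ones
    return d1


def bad_char_pattern_shift(d1, j, char):
    """Pattern shift after string character `char` mismatches pat(j).
    Absent char -> d1 value 0 -> shift j (Observation 1);
    otherwise j - d1[char], or 1 if that is not positive (Observation 2)."""
    return max(1, j - d1.get(char, 0))


def bad_char_pointer_shift(d1, patlen, j, char):
    """String pointer advance: m + pattern shift, where m = patlen - j."""
    return (patlen - j) + bad_char_pattern_shift(d1, j, char)


d1 = build_d1("ABC")
print(bad_char_pattern_shift(d1, 3, "B"))   # 1
print(bad_char_pattern_shift(d1, 3, "D"))   # 3
```

With an absent character, the pointer advance is (patlen − j) + j = patlen, which matches Observation 1.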

14
Bad Character heuristic (cont.)
  • Example

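String:  A C B B A C A B C A
Pattern: A B C
j = 3 and d1[B] = 2: the pattern shifts by j − d1[B] = 1,
and the string pointer shifts by m + (pattern shift) = 0 + 1 = 1
(no characters were matched, so m = 0).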
15
Good Suffix heuristic
  • Two sequences c1..cn and d1..dn unify if, for
    every i from 1 to n, either ci = di, or ci = $, or
    di = $, where $ is a character that does not occur
    in pat (positions ≤ 0 of pat are treated as $).
  • The rightmost plausible reoccurrence rpr(j) is the
    largest k such that pat(j+1)..pat(patlen) and
    pat(k)..pat(k+patlen−j−1) unify, and either k ≤ 1
    or pat(k−1) ≠ pat(j).
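The definition of rpr(j) can be checked mechanically with a brute-force Python sketch. The function name is hypothetical, and positions k ≤ 0 are modelled as None, playing the role of $. The search runs k from j down to j + 1 − patlen. That last value always qualifies, because its window lies entirely in the $ region and k ≤ 1. For the example pattern ABXYCDEXY, it gives rpr(9) = 9 for the empty-suffix case, so d2[9] = patlen + 1 − rpr(9) = 1.

```python
def rpr(pattern, j):
    """Rightmost plausible reoccurrence rpr(j) (1-based), from the definition."""
    patlen = len(pattern)

    def pat(k):
        return pattern[k - 1] if k >= 1 else None   # None models the '$' wildcard

    def unify(xs, ys):
        return all(x is None or y is None or x == y for x, y in zip(xs, ys))

    suffix = [pat(i) for i in range(j + 1, patlen + 1)]       # pat(j+1)..pat(patlen)
    for k in range(j, j - patlen, -1):                        # rightmost k first
        window = [pat(i) for i in range(k, k + patlen - j)]   # pat(k)..pat(k+patlen-j-1)
        if unify(suffix, window) and (k <= 1 or pat(k - 1) != pat(j)):
            return k
    raise AssertionError("unreachable: k = j + 1 - patlen always qualifies")


print([rpr("ABXYCDEXY", j) for j in range(1, 10)])
# [-7, -6, -5, -4, -3, -2, 3, 0, 9]
```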

16
Good Suffix heuristic (cont.)
  • Example
  • Pattern shift: j + 1 − rpr(j)
  • String pointer shift: m + j + 1 − rpr(j)
    = (patlen − j) + j + 1 − rpr(j)
    = patlen + 1 − rpr(j) = d2[j]

j:        1   2   3   4   5   6   7   8   9
pat(j):   A   B   X   Y   C   D   E   X   Y
rpr(j):  -7  -6  -5  -4  -3  -2   3   0   9
(positions -7..0 to the left of pat hold $)
17
Good Suffix heuristic (cont.)
  • Algorithm

18
Boyer-Moore Matching Algorithm
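i ← patlen
if n < patlen then return false
j ← patlen
while j > 0 do
    if string(i) = pat(j) then
        j ← j − 1
        i ← i − 1
    else
        i ← i + max(patlen − d1[string(i)], d2[j])   ▹ d1[c] = 0 if c does not occur in pat
        if i > n then return false
        j ← patlen
return i + 1   ▹ position of the first character of the match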
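Here is a minimal Python sketch of the complete search loop on this slide, assuming a non-empty pattern. The name `boyer_moore` is hypothetical. d1 holds the rightmost 1-based position of each character (0 if absent), as on the bad-character slides, so the string pointer advance is patlen − d1[c]. This agrees with the original paper's delta1. d2 is computed by brute force from the rpr definition, which is roughly cubic in patlen. That makes it a simple illustration, not the O(patlen) preprocessing assumed on the complexity slide. The sketch returns the 1-based position of the first match, or None in place of false.

```python
def boyer_moore(string, pat):
    """1-based position of the first occurrence of pat in string, or None."""
    patlen, n = len(pat), len(string)
    if patlen == 0:
        raise ValueError("pattern must be non-empty")

    # d1[c]: rightmost 1-based position of c in pat; later positions overwrite earlier.
    d1 = {c: pos for pos, c in enumerate(pat, start=1)}

    def p(k):
        # pat(k) for k >= 1; positions k <= 0 hold the '$' wildcard (None).
        return pat[k - 1] if k >= 1 else None

    def rpr(j):
        # Brute force from the definition; k = j + 1 - patlen always qualifies.
        suffix = [p(x) for x in range(j + 1, patlen + 1)]
        for k in range(j, j - patlen, -1):
            window = [p(x) for x in range(k, k + patlen - j)]
            unify = all(a is None or b is None or a == b
                        for a, b in zip(suffix, window))
            if unify and (k <= 1 or p(k - 1) != p(j)):
                return k

    # d2[j] = patlen + 1 - rpr(j): string-pointer advance from the good suffix rule.
    d2 = {j: patlen + 1 - rpr(j) for j in range(1, patlen + 1)}

    # Matching loop; string(i) is string[i - 1] in 0-based Python.
    i = patlen
    if n < patlen:
        return None
    j = patlen
    while j > 0:
        if string[i - 1] == pat[j - 1]:
            j -= 1
            i -= 1
        else:
            i += max(patlen - d1.get(string[i - 1], 0), d2[j])
            if i > n:
                return None
            j = patlen          # restart at the right end of the pattern
    return i + 1                # i now sits just left of the match


print(boyer_moore("abcabd", "abd"))  # 4
```

The j ← patlen reset after each shift matters. Without it, the loop would resume comparing from the middle of the pattern at the new alignment.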
19
Boyer-Moore Matching Algorithm
  • Time Complexity
  • Bad Character heuristic (d1): O(patlen + |Σ|)
  • Good Suffix heuristic (d2): O(patlen)
  • Matching: O(n)
  • Total: O(n + patlen + |Σ|)

20
Experimental Result
21
Conclusion
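  • The Boyer-Moore algorithm usually runs in sublinear
    time (it typically inspects fewer than n characters
    of the string), with total complexity O(n + m)
  • Boyer-Moore is the most efficient string-matching
    algorithm when the pattern is long and the alphabet
    is reasonably large.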