Boyer-Moore String Searching Algorithm - PowerPoint PPT Presentation

1
Boyer-Moore String Searching Algorithm
  • By Matthew Brown

2
String-Searching Algorithms
  • The goal of any string-searching algorithm is to
    determine whether or not a match of a particular
    string exists within another (typically much
    longer) string.
  • Many such algorithms exist, with varying
    efficiencies.
  • String-searching algorithms are important to a
    number of fields, including computational
    biology, computer science, and mathematics.

3
The Boyer-Moore String Search Algorithm
  • Developed in 1977, the B-M string search
    algorithm is particularly efficient, and it has
    served as a standard benchmark for string search
    algorithms ever since.
  • This algorithm's execution time can be
    sub-linear, as not every character of the string
    to be searched needs to be checked.
  • Generally speaking, the algorithm gets faster as
    the target string becomes longer, because longer
    target strings allow longer skips.

4
How does it work?
  • The B-M algorithm takes a backward approach:
    the target string is aligned with the start of
    the check string, and the last character of the
    target string is checked against the
    corresponding character in the check string.
  • In the case of a match, the second-to-last
    character of the target string is compared to the
    corresponding check string character, and so on
    toward the front. (No gain in efficiency over the
    brute-force method.)
  • In the case of a mismatch, the algorithm computes
    a new alignment for the target string based on
    the mismatch. This is where the algorithm gains
    considerable efficiency.
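
The backward comparison and the skip on a mismatch can be sketched as follows. This is a minimal sketch rather than the full B-M algorithm: it assumes a single shift table keyed on the check-string character under the end of the target string (the simplification usually called Boyer-Moore-Horspool), and the function name is hypothetical.

```python
def search_right_to_left(target, check):
    """Return the index of the first occurrence of target in check, or -1."""
    m, n = len(target), len(check)
    if m == 0:
        return 0
    # Shift for each character: distance from its right-most occurrence
    # (ignoring the final position) to the end of the target string.
    shift = {ch: m - 1 - i for i, ch in enumerate(target[:-1])}
    pos = 0  # current alignment of the target within the check string
    while pos + m <= n:
        j = m - 1
        # Compare backward, starting from the last character of the target.
        while j >= 0 and check[pos + j] == target[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Characters absent from the table allow a full-length skip.
        pos += shift.get(check[pos + m - 1], m)
    return -1
```

A call such as `search_right_to_left("rockstar", "a rockstar is")` returns the same index as Python's built-in `str.find`.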

5
An example
  • Target string: rockstar
  • Check string: -------x-----
  • Aligning the start of each string pairs the
    final r of rockstar with x.
  • Since x is not a character in rockstar, no
    alignment that overlaps x can be a match, so the
    B-M algorithm skips all such alignments and
    resumes just past x.
  • This eliminates several (7, in this case)
    alignments that would otherwise be checked, at
    the cost of a single comparison of two characters
    (r and x).
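
To make the saving concrete, the sketch below counts character comparisons for a brute-force left-to-right scan and for a backward scan with skipping. It is an illustration under assumptions: the function name is hypothetical, and the skip uses one shift table keyed on the character under the end of the target string, not the full two-table B-M method.

```python
def count_comparisons(target, check):
    """Return (brute_force, backward) comparison counts over the whole check string."""
    m, n = len(target), len(check)
    shift = {ch: m - 1 - i for i, ch in enumerate(target[:-1])}

    # Brute force: try every alignment, comparing left to right.
    brute_force = 0
    for pos in range(n - m + 1):
        for j in range(m):
            brute_force += 1
            if check[pos + j] != target[j]:
                break

    # Backward scan: compare right to left, then skip ahead.
    backward = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:
            backward += 1
            if check[pos + j] != target[j]:
                break
            j -= 1
        pos += shift.get(check[pos + m - 1], m)
    return brute_force, backward
```

For the strings on this slide, `count_comparisons("rockstar", "-------x-----")` returns `(6, 1)`: brute force tries all six alignments that fit in the 13-character check string, while the backward scan stops after comparing r with x.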

6
Efficiency of the B-M Algorithm
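  • The average-case performance of the B-M
    algorithm, for a target string of length M and
    check string of length N, is roughly N/M
    comparisons.
  • In the best case, only one in every M characters
    of the check string needs to be checked.
  • In the worst case, about 3N comparisons are
    needed when the target string does not occur in
    the check string, leading to a complexity of
    O(N). When matches do occur, finding all of them
    can take O(NM) unless a refinement such as the
    Galil rule is added.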

7
Pre-processing Tables
  • The B-M algorithm computes two preprocessing
    tables to determine the next suitable alignment
    after each failed verification.
  • The first table gives how many positions ahead
    of the current position to start the next search,
    based on the character that caused the failed
    verification.
  • The second table makes a similar calculation
    based on how many characters were matched
    successfully before the failed verification.
  • These tables are often referred to as jump
    tables, though this leads to some ambiguity with
    the more common meaning of the term in computer
    science, which refers to an efficient way of
    transferring control from one part of a program
    to another.

8
Calculation of Preprocessing Tables
  • Table 1
  • Starting at the second-to-last character of the
    target string, move left toward the first
    character. At each character, if the character is
    not already in the table, add it to the table.
  • This character's shift value is equal to its
    distance from the right-most character in the
    string.
  • All other characters receive a shift value equal
    to the total length of the string.
  • Example: peterpan would produce the following
    table (character, shift): (A, 1), (P, 2), (R,
    3), (E, 4), (T, 5), (all other characters, 8)
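
The construction of Table 1 can be sketched in a few lines. This is a minimal sketch: the function name is hypothetical, the scan assumes the final character is skipped (so that it falls under "all other characters", as in the peterpan example), and the default shift is applied at lookup time, not stored.

```python
def first_table(target):
    """Map each character to its distance from the right-most character."""
    m = len(target)
    table = {}
    # Walk left from the second-to-last character; keep the first
    # (right-most) distance seen for each character.
    for i in range(m - 2, -1, -1):
        table.setdefault(target[i], m - 1 - i)
    return table


def lookup_shift(table, target, ch):
    """Characters not in the table shift by the full target length."""
    return table.get(ch, len(target))
```

For `first_table("peterpan")` the result is `{'a': 1, 'p': 2, 'r': 3, 'e': 4, 't': 5}`, and `lookup_shift` returns 8 for any other character, including the final n.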

9
Calculation of Preprocessing Tables
  • Table 2
  • First, for each value of i less than the length
    of the target string, form the partial pattern
    made of the last i characters of the target
    string, preceded by a mismatch for the character
    before them.
  • Then, determine the least number of positions
    the partial pattern must be shifted left before
    it matches the target string again.
  • Example: for ANPANMAN, the table would be as
    follows, where (-X) means "not X":

      i   pattern        shift
      0   (-N)           1
      1   (-A)N          8
      2   (-M)AN         3
      3   (-N)MAN        6
      4   (-A)NMAN       6
      5   (-P)ANMAN      6
      6   (-N)PANMAN     6
      7   (-A)NPANMAN    6
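
The definition of Table 2 can be checked with a brute-force sketch that tries each shift in turn. It follows the slide's description directly and is not the linear-time preprocessing used in practice. The function name is hypothetical, and the sketch assumes that positions shifted past the left end of the target string match anything.

```python
def second_table(target):
    """For each count i of matched characters, return the smallest valid shift."""
    m = len(target)
    table = []
    for i in range(m):
        mismatch_pos = m - 1 - i  # position of the character that failed
        for shift in range(1, m + 1):
            # The i matched characters must line up with equal characters.
            suffix_ok = all(
                j - shift < 0 or target[j - shift] == target[j]
                for j in range(mismatch_pos + 1, m)
            )
            # The character before them must differ from the one that failed.
            q = mismatch_pos - shift
            mismatch_ok = q < 0 or target[q] != target[mismatch_pos]
            if suffix_ok and mismatch_ok:
                table.append(shift)
                break
    return table
```

For ANPANMAN this reproduces the shifts listed above: `[1, 8, 3, 6, 6, 6, 6, 6]`. A shift equal to the full length always satisfies both conditions, so the inner loop always finds a value.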

10
Comparison of String Searching Algorithm Complexities
  • Boyer-Moore: O(n)
  • Naïve string search algorithm: O((n-m+1)m)
  • Bitap algorithm: O(mn)
  • Rabin-Karp string search algorithm: average
    O(n+m)
  • (n = length of search string, m = length of
    target string)

11
About the Creators
  • Robert Boyer is a Professor Emeritus of the
    University of Texas at Austin Computer Science
    Department. He received his BA and PhD in
    mathematics at UT Austin, and has authored and
    co-authored several books concerning automatic
    theorem-proving.
  • J. Strother Moore holds the Admiral B.R. Inman
    Centennial Chair in Computer Theory of the
    Department of Computer Sciences at UT Austin. He
    received his BS in mathematics from MIT in 1970,
    and his PhD in computational logic from the
    University of Edinburgh in 1973. He has authored
    and co-authored several books concerning
    automatic theorem-proving, some of them in
    cooperation with Robert Boyer.

12
References
  • Wikipedia.org
  • http://www-igm.univ-mlv.fr/~lecroq/string/
  • Epp, Susanna S. Discrete Mathematics with
    Applications. 3rd ed., Brooks/Cole, 2004.