Design and Analysis of Computer Algorithm Lecture 8 - PowerPoint PPT Presentation

About This Presentation
Title:

Design and Analysis of Computer Algorithm Lecture 8

Description:

This lecture note has been modified from lecture note by Prof. Somchai ... Finite automaton for P = AABC' Design and Analysis of Computer Algorithm. 15. KMP Flow chart ... – PowerPoint PPT presentation

Slides: 18
Provided by: pradondet

Transcript and Presenter's Notes

Title: Design and Analysis of Computer Algorithm Lecture 8


1
Design and Analysis of Computer Algorithm Lecture 8
  • Pradondet Nilagupta
  • Department of Computer Engineering

This lecture note has been modified from lecture
note by Prof. Somchai Prasitjutrakul and Prof.
Dimitris Papadias
2
String Matching
3
Notation
  • P: The pattern being searched for
  • T: The text in which P is sought
  • m: The length of P
  • n: The length of T
  • p_i, t_i: The ith characters in P and T, denoted
    with lower-case letters and subscripts
  • j: Current position within T
  • k: Current position within P

4
string matching
  • Naive string matching
  •   for (i = 0; T[i] != '\0'; i++) {
  •       for (j = 0; T[i+j] != '\0' && P[j] != '\0' && T[i+j] == P[j]; j++) ;
  •       if (P[j] == '\0') found a match
  •   }
  • There are two nested loops: the inner one takes
    O(m) iterations and the outer one takes O(n)
    iterations, so the total time is the product,
    O(mn). This is slow; we'd like to speed it up.
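The naive loop above can be written out as a complete C function. This is a minimal sketch, not the lecture's own code: the name `naive_find` and the convention of returning the first match position (or -1) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first occurrence of P
   in T, or -1 if there is none. Both strings must be NUL-terminated. */
long naive_find(const char *T, const char *P)
{
    assert(T != NULL && P != NULL);
    for (size_t i = 0; T[i] != '\0'; i++) {          /* outer loop: O(n) */
        size_t j = 0;
        /* inner loop: extend the match while characters agree, O(m) */
        while (T[i + j] != '\0' && P[j] != '\0' && T[i + j] == P[j])
            j++;
        if (P[j] == '\0')
            return (long)i;                          /* found a match */
    }
    return -1;
}
```

For example, `naive_find("banananobano", "nano")` returns 4, the position where "nano" starts in the text.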

5
Example
  • Suppose we're looking for pattern "nano" in text
    "banananobano".
  • Each row represents an iteration of the outer
    loop, with each character in the row representing
    the result of a comparison (X if the comparison
    was unequal).

6
Example
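The rows of such a comparison trace can be regenerated with a small helper. The sketch below assumes the row format described on the previous slide (the matched character for each successful comparison, X for the unequal one); the name `trace_row` and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes into out the result of each comparison
   made in outer-loop iteration i of the naive matcher. A successful
   comparison is recorded as the matched character, the first unequal
   one as 'X'. Requires i <= strlen(T) and room in out for
   strlen(P) + 1 bytes. Returns the number of comparisons made. */
size_t trace_row(const char *T, const char *P, size_t i, char *out)
{
    assert(T != NULL && P != NULL && out != NULL);
    size_t j = 0, k = 0;
    while (T[i + j] != '\0' && P[j] != '\0') {
        if (T[i + j] != P[j]) {
            out[k++] = 'X';        /* mismatch ends this iteration */
            break;
        }
        out[k++] = P[j];           /* characters agree */
        j++;
    }
    out[k] = '\0';
    return k;
}
```

With T = "banananobano" and P = "nano", iteration i=2 yields "nanX", iteration i=3 yields "X", and iteration i=4 yields "nano".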
7
Note
  • Some of these comparisons are wasted work!
  • For instance, after iteration i=2, we know from
    the comparisons we've done that T[3]="a", so
    there is no point comparing it to "n" in
    iteration i=3.
  • And we also know that T[4]="n", so there is no
    point making the same comparison in iteration
    i=4.

8
Skipping outer iterations
  • Try overlapping the partial match you've found
    with the new match you want to find:
  • i=2: n a n
  • i=3:   n a n o
  • We know from the i=2 iteration that T[3] and T[4]
    are "a" and "n", so they can't be the "n" and "a"
    that the i=3 iteration is looking for. We can
    keep skipping positions until we find one that
    doesn't conflict:
  • i=2: n a n
  • i=4:     n a n o

9
String matching with skipped iterations
  • i = 0;
  • while (i < n) {
  •     for (j = 0; T[i+j] != '\0' && P[j] != '\0' && T[i+j] == P[j]; j++) ;
  •     if (P[j] == '\0') found a match;
  •     i = i + max(1, j - overlap(P[0..j-1], P[0..m]));
  • }
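The loop with skipped outer iterations can be sketched in C as follows. The slides do not define overlap precisely; this sketch assumes it means the length of the longest proper suffix of P[0..j-1] that is also a prefix of P, and computes it by brute force. The names `overlap_len` and `skip_find` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed meaning of overlap(P[0..j-1], P[0..m]): length of the longest
   proper suffix of P[0..j-1] that is also a prefix of P (brute force). */
size_t overlap_len(const char *P, size_t j)
{
    for (size_t len = (j > 0 ? j - 1 : 0); len > 0; len--)
        if (memcmp(P, P + (j - len), len) == 0)
            return len;
    return 0;
}

/* Returns the index of the first occurrence of P in T, or -1. */
long skip_find(const char *T, const char *P)
{
    assert(T != NULL && P != NULL);
    size_t n = strlen(T);
    size_t i = 0;
    while (i < n) {
        size_t j = 0;
        while (T[i + j] != '\0' && P[j] != '\0' && T[i + j] == P[j])
            j++;
        if (P[j] == '\0')
            return (long)i;                     /* found a match */
        size_t shift = j - overlap_len(P, j);   /* 0 when j == 0 */
        i += (shift > 1 ? shift : 1);           /* i = i + max(1, j - overlap) */
    }
    return -1;
}
```

On the running example, the partial match "nan" at i=2 has overlap 1, so the search jumps directly from i=2 to i=4, as in the diagram above.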

10
Skipping inner iterations
  • The other optimization that can be done is to
    skip some iterations in the inner loop. Let's
    look at the same example, in which we skipped
    from i=2 to i=4:
  • i=2: n a n
  • i=4:     n a n o
  • The "n" that overlaps has already been tested by
    the i=2 iteration. There's no need to test it
    again in the i=4 iteration. In general, if we
    have a nontrivial overlap with the last partial
    match, we can avoid testing a number of
    characters equal to the length of the overlap.

11
KMP, version 1
  • i = 0;
  • o = 0;
  • while (i < n) {
  •     for (j = o; T[i+j] != '\0' && P[j] != '\0' && T[i+j] == P[j]; j++) ;
  •     if (P[j] == '\0') found a match;
  •     o = overlap(P[0..j-1], P[0..m]);
  •     i = i + max(1, j - o);
  • }

The only remaining detail is how to compute the
overlap function. This is a function only of j,
and not of the characters in T, so we can
compute it once in a preprocessing stage.
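One way to do the preprocessing is the standard prefix-function construction, which fills a table of overlap values in O(m) time. The sketch below combines it with the version 1 loop. It assumes overlap means the longest proper suffix of P[0..j-1] that is also a prefix of P; the names `build_overlap` and `kmp_find` are hypothetical, and the search stops at the first match for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fills ov[0..m]: ov[j] is the length of the longest proper suffix of
   P[0..j-1] that is also a prefix of P. ov must hold m + 1 entries. */
void build_overlap(const char *P, size_t m, size_t *ov)
{
    assert(P != NULL && ov != NULL);
    size_t k = 0;
    ov[0] = 0;
    if (m >= 1)
        ov[1] = 0;
    for (size_t j = 2; j <= m; j++) {
        while (k > 0 && P[j - 1] != P[k])
            k = ov[k];                 /* fall back to a shorter overlap */
        if (P[j - 1] == P[k])
            k++;
        ov[j] = k;
    }
}

/* Returns the index of the first occurrence of P in T, or -1. */
long kmp_find(const char *T, const char *P)
{
    assert(T != NULL && P != NULL);
    size_t n = strlen(T), m = strlen(P);
    size_t *ov = malloc((m + 1) * sizeof *ov);
    if (ov == NULL)
        return -1;      /* sketch only: allocation failure reads as "not found" */
    build_overlap(P, m, ov);

    long result = -1;
    size_t i = 0, o = 0;
    while (i < n) {
        size_t j = o;   /* the first o characters are already known to match */
        while (T[i + j] != '\0' && P[j] != '\0' && T[i + j] == P[j])
            j++;
        if (P[j] == '\0') {
            result = (long)i;          /* found a match */
            break;
        }
        o = ov[j];
        i += (j - o > 1 ? j - o : 1);  /* i = i + max(1, j - o) */
    }
    free(ov);
    return result;
}
```

For P = "nano" the table is {0, 0, 0, 1, 0}, so a partial match of length 3 ("nan") carries an overlap of 1 into the next iteration.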
12
KMP time analysis (1/2)
  • We still have an outer loop and an inner loop, so
    it looks like the time might still be O(mn). But
    we can count it a different way to see that it's
    actually always less than that.
  • We split the comparisons into two groups:
    those that return true, and those that return
    false.
  • If a comparison returns true, we've determined
    the value of T[i+j]. Then in future iterations,
    as long as there is a nontrivial overlap
    involving T[i+j], we'll skip past that overlap
    and not make a comparison with that position
    again.

13
KMP time analysis (2/2)
  • So each position of T is only involved in one
    true comparison, and there can be n such
    comparisons total.
  • On the other hand, there is at most one false
    comparison per iteration of the outer loop, so
    there can also only be n of those. As a result we
    see that this part of the KMP algorithm makes at
    most 2n comparisons and takes time O(n).
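The 2n bound can be checked empirically by instrumenting the loop. The sketch below is an illustration rather than a proof: it counts every comparison between a character of T and a character of P, continues past matches so the whole text is scanned, and uses the same assumed definition of overlap (longest proper suffix of P[0..j-1] that is a prefix of P). The name `kmp_count` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts all (possibly overlapping) occurrences of P in T with the
   KMP version 1 loop and stores in *comparisons how many times a
   character of T was compared with a character of P. */
size_t kmp_count(const char *T, const char *P, size_t *comparisons)
{
    assert(T != NULL && P != NULL && comparisons != NULL);
    size_t n = strlen(T), m = strlen(P);
    size_t matches = 0, i = 0, o = 0, k = 0;
    *comparisons = 0;

    size_t *ov = malloc((m + 1) * sizeof *ov);
    if (ov == NULL)
        return 0;                      /* sketch only: no real error path */

    /* preprocessing: ov[j] is the overlap after j matched characters */
    ov[0] = 0;
    if (m >= 1)
        ov[1] = 0;
    for (size_t j = 2; j <= m; j++) {
        while (k > 0 && P[j - 1] != P[k])
            k = ov[k];
        if (P[j - 1] == P[k])
            k++;
        ov[j] = k;
    }

    while (i < n) {
        size_t j = o;
        while (T[i + j] != '\0' && P[j] != '\0') {
            (*comparisons)++;          /* one T-versus-P comparison */
            if (T[i + j] != P[j])
                break;                 /* the single false comparison */
            j++;
        }
        if (P[j] == '\0')
            matches++;
        o = ov[j];
        i += (j - o > 1 ? j - o : 1);
    }
    free(ov);
    return matches;
}
```

Running it on inputs such as T = "aaaaaaaaaa", P = "aaab", where the naive algorithm repeats many comparisons, the reported count stays within 2n.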

14
Finite State Machine
Finite automaton for P = AABC
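A string-matching automaton for a pattern such as AABC can be built as a transition table in which state q means "q pattern characters matched so far". The sketch below uses a brute-force construction; the names `fa_build` and `fa_find` and the limits `FA_MAX_M` and `FA_SIGMA` are assumptions for illustration, not taken from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FA_MAX_M 16    /* hypothetical cap on pattern length for this sketch */
#define FA_SIGMA 256   /* one column per possible byte value */

/* delta[q][c] = length of the longest prefix of P that is a suffix of
   P[0..q-1] followed by character c. delta needs m + 1 rows. */
void fa_build(const char *P, size_t m, unsigned char delta[][FA_SIGMA])
{
    assert(P != NULL && m <= FA_MAX_M);
    for (size_t q = 0; q <= m; q++) {
        for (int c = 0; c < FA_SIGMA; c++) {
            size_t k = (q + 1 < m) ? q + 1 : m;    /* longest candidate */
            while (k > 0 &&
                   !((unsigned char)P[k - 1] == c &&
                     memcmp(P, P + q - (k - 1), k - 1) == 0))
                k--;
            delta[q][c] = (unsigned char)k;
        }
    }
}

/* Runs the automaton over T; returns the first match position or -1. */
long fa_find(const char *T, const char *P)
{
    assert(T != NULL && P != NULL);
    size_t m = strlen(P);
    unsigned char delta[FA_MAX_M + 1][FA_SIGMA];
    if (m == 0)
        return 0;
    fa_build(P, m, delta);
    size_t q = 0;
    for (size_t i = 0; T[i] != '\0'; i++) {
        q = delta[q][(unsigned char)T[i]];   /* one table lookup per character */
        if (q == m)
            return (long)(i + 1 - m);
    }
    return -1;
}
```

For P = AABC the table sends state 2 back to state 2 on "A" (the text "AAA" still ends in "AA") and state 3 to state 1 on "A"; every character of T is read exactly once.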
15
KMP Flow chart
16
KMP Flow chart
17
Example Action of KMP flowchart
P = ABABCB, T = ACABAABABA