1
  • Joint Advanced Student School 2004: Complexity
    Analysis of String Algorithms
  • Sequential Pattern Matching: Analysis of
    Knuth-Morris-Pratt type algorithms using the
    Subadditive Ergodic Theorem

2
Overview
  • Pattern Matching
    • Sequential Algorithms
    • Knuth-Morris-Pratt-Algorithm
  • Probabilistic tools
    • Subadditive Ergodic Theorem
    • Martingales and Azuma's Inequality
  • Analysis of KMP-Algorithms
    • Properties of KMP
    • Establishing subadditivity
    • Analysis

3
Pattern Matching
[Figure] Pattern p = abcde aligned against text t = xxxxxabxxxabcxxxabcde;
a pattern-text comparison M(l,k) = 1 and the resulting alignment position
AP are marked.
  • Text t = t_1 t_2 ... t_n, pattern p = p_1 p_2 ... p_m.
  • Comparison M(l,k): the comparison between text symbol t_l and pattern
    symbol p_k.
  • Alignment position: a text position AP = l - k + 1 for some k, i.e. a
    position at which the pattern is currently aligned with the text.

4
Sequential Algorithms - Definition
  1. Semi-sequential: (i) the alignment positions are non-decreasing.
  2. Strongly semi-sequential: (i), and (ii) the text positions l of the
     comparisons M(l,k) are non-decreasing.
  3. Sequential: (i), and (iii) a text symbol is compared only if it
     follows a prefix of the pattern, i.e. t_AP ... t_(l-1) = p_1 ... p_(k-1)
     when M(l,k) is made.
  4. Strongly sequential: (i), (ii) and (iii).

[Figure] Example: pattern abcde against text xxxxxabxxxabcxxxabcde; a text
symbol is compared only if it follows a prefix of the pattern.
5
Example: Naive / brute-force algorithm

[Figure] The pattern abcde is shifted along the text xxxxxabxxxabcxxxabcde
by one position at a time.
  • Every text position is an alignment position.
  • The text is scanned until either
  • the pattern is found - then we are done, or
  • a mismatch occurs - then the pattern is shifted by one and we retry.
  • Sequential algorithm (see the sketch below).
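A minimal Python sketch of the brute-force matcher (not part of the
original slides; the function name and the comparison counter are
illustrative):

```python
def naive_match(text, pattern):
    """Brute force: every text position is an alignment position.

    Returns (index of the first occurrence or -1, number of
    pattern-text comparisons that were made)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for ap in range(n - m + 1):          # every position is an alignment position
        k = 0
        while k < m:
            comparisons += 1             # compare t[ap + k] with p[k]
            if text[ap + k] != pattern[k]:
                break                    # mismatch: shift by one and retry
            k += 1
        if k == m:
            return ap, comparisons       # pattern found
    return -1, comparisons

# Example from the slides: pattern "abcde" in text "xxxxxabxxxabcxxxabcde".
print(naive_match("xxxxxabxxxabcxxxabcde", "abcde"))
```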

6
Knuth-Morris-Pratt type algorithms (1)
[Figure] A shift S of the pattern ababcde along the text
xxxxxabxxxabcxxxabcde.
  • Idea (Morris-Pratt): disregard alignment positions already known
    not to be followed by a prefix of p.
  • Knowledge used: the part of the pattern already matched.
  • Requires pre-processing of p.
  • Strongly sequential algorithm.

7
Knuth-Morris-Pratt type algorithms (2)
  • Morris-Pratt: on a mismatch, shift the pattern to the longest border of
    the prefix matched so far.
  • Knuth-Morris-Pratt: in addition, skip shifts at which the same
    (mismatching) letter would be compared again (sketched below).

[Figure] Shifts of the pattern ababcde along the text
xxxxxabxxxabcxxxabcde; KMP also skips mismatching letters.
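A Python sketch of the pre-processing and the matching loop, assuming the
standard Morris-Pratt border table and Knuth's "strong border" refinement
(function names and the comparison counter are illustrative, not from the
slides):

```python
def mp_border(p):
    """Morris-Pratt: border[k] = length of the longest proper border of p[:k]."""
    border = [-1] + [0] * len(p)
    k = -1
    for i, c in enumerate(p):
        while k >= 0 and p[k] != c:
            k = border[k]
        k += 1
        border[i + 1] = k
    return border

def kmp_border(p):
    """Knuth's refinement: if the same letter would be compared again after
    the shift, it would mismatch again, so jump further."""
    border = mp_border(p)
    strong = border[:]
    for i in range(1, len(p)):
        if p[i] == p[border[i]]:
            strong[i] = strong[border[i]]
    return strong

def kmp_match(text, pattern, border_fn=kmp_border):
    """Strongly sequential scan: text positions are never revisited.

    Returns (index of the first occurrence or -1, number of comparisons)."""
    border = border_fn(pattern)
    comparisons, k = 0, 0
    for l, c in enumerate(text):
        while k >= 0:
            comparisons += 1                 # compare t[l] with p[k]
            if pattern[k] == c:
                break
            k = border[k]                    # shift to the next promising AP
        k += 1
        if k == len(pattern):
            return l - len(pattern) + 1, comparisons
    return -1, comparisons

text, pattern = "xxxxxabxxxabcxxxabcde", "ababcde"
print(kmp_match(text, pattern, mp_border))   # Morris-Pratt borders
print(kmp_match(text, pattern, kmp_border))  # Knuth-Morris-Pratt borders
```
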
8
Pattern Matching - Complexity
  • Overall complexity: c_n, the total number of pattern-text comparisons
    made while processing a text of length n.
  • Pattern and/or text is a realization of a random sequence.
  • Question: what is the complexity of KMP?

9
Subadditivity: Deterministic Sequences
  • Fekete (1923): if a sequence (x_n) is subadditive, then lim x_n / n
    exists and equals inf_n x_n / n (see the sketch below).
  • Subadditivity: x_(n+m) <= x_n + x_m.
  • Superadditivity: x_(n+m) >= x_n + x_m; then lim x_n / n = sup_n x_n / n.
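A tiny numeric illustration (my own example, not from the slides) of
Fekete's lemma with the subadditive sequence x_n = 2n + sqrt(n):

```python
import math

def x(n):
    """A subadditive sequence: sqrt is subadditive, the linear part is additive."""
    return 2 * n + math.sqrt(n)

# x(n)/n decreases towards its infimum (here 2), as Fekete's lemma predicts.
for n in (1, 10, 100, 10000, 1000000):
    print(n, x(n) / n)
```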

10
Example: Longest Common Subsequence

[Figure] The first string ababcafbcdabcde is split into ababcafb and
cdabcde; the second string abcdeabcdfabcab is split into abcdeabc and
dfabcab.
LCS of the full strings: "abcabcdabc" (length 10).
LCS of the two halves: "abcab" (5) and "dabc" (4).
  • The LCS length is superadditive: L_(n+m) >= L_n + L_m
    (here 10 >= 5 + 4; checked in the sketch below).
  • Hence lim E[L_n] / n exists; its exact value is unknown
    (a value was conjectured by Steele in 1982).
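A small Python check of the superadditivity inequality on the slide's
example (lcs_len is an illustrative textbook dynamic-programming helper,
not part of the slides):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of a and b (textbook DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[-1]))
        prev = cur
    return prev[-1]

t1, t2 = "ababcafb", "cdabcde"          # the two halves of the first string
s1, s2 = "abcdeabc", "dfabcab"          # the two halves of the second string

whole = lcs_len(t1 + t2, s1 + s2)       # LCS length of the full strings
parts = lcs_len(t1, s1) + lcs_len(t2, s2)
print(whole, parts, whole >= parts)     # superadditive: whole >= sum of parts
```
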
11
Subadditivity "Almost subadditive"
  • DeBruijn and Erdös (1952)
  • positive and non-decreasing sequence
  • "Almost subadditive"

12
Subadditive Ergodic Theorem
  • Kingman (1976), Liggett (1985)
  • Let X_(m,n), 0 <= m < n, satisfy
  • (i) X_(0,n) <= X_(0,m) + X_(m,n),
  • (ii) {X_((k-1)m, km) : k >= 1} is a stationary sequence for every m,
  • (iii) the distribution of {X_(m,m+k) : k >= 1} does not depend on m,
  • (iv) E[max(X_(0,1), 0)] < infinity and E[X_(0,n)] >= -c n for some c.
  • Then lim X_(0,n) / n exists almost surely (and in mean).

13
Almost Subadditive Ergodic Theorem
  • Derriennic (1983)
  • Subadditivity can be relaxed to X_(0,n) <= X_(0,m) + X_(m,n) + A_n
    with E[A_n] / n -> 0.
  • Then, too, lim X_(0,n) / n exists almost surely.

14
Martingales
  • A sequence (M_n) is a martingale with respect to the filtration (F_n)
    if for all n: M_n is F_n-measurable, E[|M_n|] < infinity, and
    E[M_(n+1) | F_n] = M_n.
  • E[X | F_n] defines a random variable depending on the knowledge
    contained in F_n (see the sketch below).
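As a concrete illustrative example (not from the slides), the Doob
martingale of a sum of independent coin flips, M_i = E[S_n | X_1, ..., X_i]:

```python
import random

def doob_martingale_path(n, p=0.5):
    """Doob martingale M_i = E[S_n | X_1..X_i] for S_n = X_1 + ... + X_n,
    X_j i.i.d. Bernoulli(p).  Closed form: M_i = (X_1 + ... + X_i) + (n - i)*p,
    hence E[M_(i+1) | F_i] = partial_i + p + (n - i - 1)*p = M_i
    (the martingale property)."""
    xs = [random.random() < p for _ in range(n)]
    partial, path = 0, [n * p]           # M_0 = E[S_n] = n*p
    for i, x in enumerate(xs, 1):
        partial += x
        path.append(partial + (n - i) * p)
    return path                          # path[-1] equals S_n itself

random.seed(1)
print(doob_martingale_path(10))
```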

15
Martingale Differences
  • The martingale difference is defined as D_n = M_n - M_(n-1), so that
    M_n = M_0 + D_1 + ... + D_n.
  • Observe: E[D_n | F_(n-1)] = 0.

16
Azuma's Inequality (1)
  • Let M_i = E[f(X_1, ..., X_n) | F_i] be a (Doob) martingale.
  • Define the martingale difference as D_i = E[f | F_i] - E[f | F_(i-1)]
    (the mean of the same quantity, but conditioned on different knowledge).
  • Observe: M_n - E[M_n] = D_1 + ... + D_n (the deviation from the mean).

17
Hoeffding's Inequality
  • Let (M_n) be a martingale.
  • Let there exist constants c_i such that |M_i - M_(i-1)| <= c_i.
  • Then P(|M_n - M_0| >= t) <= 2 exp(-t^2 / (2 (c_1^2 + ... + c_n^2)))
    (illustrated numerically in the sketch below).
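An illustrative numerical check (not from the slides): a fair +/-1 random
walk is a martingale with bounded differences c_i = 1, so its tail should
stay below the Hoeffding-Azuma bound.

```python
import math
import random

def azuma_check(n=200, t=30, trials=20000, seed=42):
    """Compare the empirical tail P(|M_n - M_0| >= t) of a +/-1 random walk
    (a martingale with |M_i - M_(i-1)| <= 1) with the bound 2*exp(-t^2/(2n))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        walk = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(walk) >= t:
            hits += 1
    bound = 2 * math.exp(-t * t / (2 * n))
    return hits / trials, bound

print(azuma_check())   # the empirical tail stays (well) below the bound
```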

18
Azuma's Inequality (2)
  • Summary: if the martingale difference D_i is bounded, we know how to
    assess the deviation from the mean.
  • So now we need a bound on D_i.
  • Trick: let X_i' be an independent copy of X_i.
  • Then D_i = E[ f(X_1, ..., X_i, ..., X_n)
    - f(X_1, ..., X_i', ..., X_n) | F_i ].

19
Azuma's Inequality (3)
  • Hence |D_i| <= E[ |f(X_1, ..., X_i, ..., X_n)
    - f(X_1, ..., X_i', ..., X_n)| | F_i ].
  • And we can postulate a uniform bound on this difference.

20
Azuma's Inequality (4)
  • Let M_i = E[f(X_1, ..., X_n) | F_i] be a martingale.
  • If there exists a constant c such that
    |f(X_1, ..., X_i, ..., X_n) - f(X_1, ..., X_i', ..., X_n)| <= c
    for all i, where X_i' is an independent copy of X_i,
  • then P(|f - E[f]| >= t) <= 2 exp(-t^2 / (2 n c^2)).

21
KMP: Unavoidable alignment positions
  • A position in the text is called an unavoidable AP if, for any r and l
    enclosing it, it is an alignment position when the algorithm is run on
    the fragment t_r ... t_l.
  • All KMP-like algorithms have the same set of unavoidable alignment
    positions.
  • Example:

[Figure] Unavoidable alignment positions for the pattern abcde in the text
xxxxxabxxxabcxxxabcde.
22
Pattern Matching: l-convergence
  • An algorithm is l-convergent if there exists an increasing sequence of
    unavoidable alignment positions whose consecutive gaps are at most l.
  • l-convergence bounds the maximum size of the "jumps" the algorithm
    can make.

23
KMP: Establishing m-convergence
  • Let AP be an alignment position.
  • One can exhibit an unavoidable alignment position within distance m
    (the pattern length) after AP.
  • Hence consecutive unavoidable APs are at most m apart, and so KMP-like
    algorithms are m-convergent.

24
KMP: Establishing subadditivity (1)
  • If c_n (the number of comparisons) is subadditive, we can prove linear
    complexity of KMP-like algorithms.
  • We have to show: c_n is (almost) subadditive.
  • Approach: show that an l-convergent sequential algorithm satisfies such
    an (almost) subadditive relation.

25
KMP: Establishing subadditivity (2)
  • Proof:
  • Let r' be the smallest unavoidable AP greater than r.
  • We split the comparisons according to their alignment positions,
    relative to r and r'.

26
KMP: Establishing subadditivity (3)
  • S1: comparisons done after r but with alignment position before r.
  • S2: comparisons with alignment position between r and r'.
  • No more than m comparisons can be saved at r'.

[Figure] The comparison groups S1 and S2 and the counts (for the whole text
and for its two parts) they contribute to.
27
KMP: Establishing subadditivity (4)
  • S3: comparisons with alignment position between r and r'.
  • No more than m comparisons can be saved at r'.

[Figure] The comparison group S3 and the counts it contributes to.
28
KMP: Establishing subadditivity (5)
  • So we are able to bound the defect of subadditivity.
  • We have shown: c_n is (almost) subadditive.
  • Now we are able to apply the Subadditive Ergodic Theorem.

29
KMP: Different Modeling Assumptions
  • Deterministic Model: text and pattern are non-random.
  • Semi-Random Model: the text is a realization of a stationary and ergodic
    sequence, the pattern is given.
  • Stationary Model: both text and pattern are realizations of stationary
    and ergodic sequences.

30
KMP: Applying the Subadditive Ergodic Theorem
  • We have shown: c_n is (almost) subadditive.
  • Deterministic Model: by Fekete's lemma, lim c_n / n exists.
  • Semi-Random Model: by the Subadditive Ergodic Theorem, c_n / n converges
    almost surely to a constant (depending on the pattern).
  • Stationary Model: c_n / n converges almost surely to a constant
    (illustrated in the sketch below).
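An illustrative simulation, assuming a memoryless binary text and a fixed
short pattern (a special case of the models above); kmp_comparisons is a
self-contained, comparison-counting variant of the earlier KMP sketch that
scans the whole text instead of stopping at the first occurrence.

```python
import random

def kmp_comparisons(text, pattern):
    """Count all pattern-text comparisons made by a KMP-type scan of the
    whole text (occurrences are not returned; we only count comparisons)."""
    # Morris-Pratt border table (sufficient for this counting experiment).
    border = [-1] + [0] * len(pattern)
    k = -1
    for i, c in enumerate(pattern):
        while k >= 0 and pattern[k] != c:
            k = border[k]
        k += 1
        border[i + 1] = k
    comparisons, k = 0, 0
    for c in text:
        while k >= 0:
            comparisons += 1
            if pattern[k] == c:
                break
            k = border[k]
        k += 1
        if k == len(pattern):
            k = border[k]                # occurrence found: keep scanning
    return comparisons

random.seed(0)
pattern = "abab"
for n in (1000, 10000, 100000):
    text = "".join(random.choice("ab") for _ in range(n))
    print(n, kmp_comparisons(text, pattern) / n)   # c_n / n settles near a constant
```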

31
KMP: Applying Azuma's Inequality
  • c_n satisfies a bounded-difference condition: changing a single text
    symbol t_i to an independent copy t_i' changes c_n by at most a constant
    (independent of n).
  • So, using Azuma's Inequality, the probability that c_n deviates from its
    mean by more than t decays exponentially in t^2 / n.
  • c_n is concentrated around its mean (see the sketch below).
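A sketch of the concentration claim: run the comparison count on many
independent random texts of the same length and look at the spread around
the mean (this reuses the illustrative kmp_comparisons helper defined in
the previous sketch).

```python
import random
import statistics

random.seed(1)
pattern, n, runs = "abab", 5000, 200
counts = [kmp_comparisons("".join(random.choice("ab") for _ in range(n)),
                          pattern)
          for _ in range(runs)]
mean = statistics.mean(counts)
sd = statistics.stdev(counts)
print(mean / n, sd / n)        # the relative spread sd/n is small
within = sum(abs(c - mean) <= 3 * sd for c in counts) / runs
print(within)                  # nearly all runs lie within 3 sd of the mean
```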

32
Conclusion
  • Using the Subadditive Ergodic Theorem we can show that a linearity
    constant exists for the worst case and the average case, respectively:
    KMP has linear complexity.
  • The Subadditive Ergodic Theorem proves the existence of this constant
    but says nothing about how to compute it.
  • Using Azuma's Inequality we can show that the number of comparisons is
    well concentrated around its mean.