Title: Joint Advanced Student School 2004 Complexity Analysis of String Algorithms
1 Joint Advanced Student School 2004
Complexity Analysis of String Algorithms - Sequential Pattern Matching
Analysis of Knuth-Morris-Pratt type algorithms using the Subadditive Ergodic Theorem
26 May 2015
2 Overview
- Pattern Matching
  - Sequential Algorithms
  - Knuth-Morris-Pratt-Algorithm
- Probabilistic tools
  - Subadditive Ergodic Theorem
  - Martingales and Azuma's Inequality
- Analysis of KMP-Algorithms
  - Properties of KMP
  - Establishing subadditivity
  - Analysis
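3 Pattern Matching
- Text $t = t_1^n$ (length n), pattern $p = p_1^m$ (length m)
- Comparison: $M(l,k) = 1$ if text symbol $t_l$ is compared with pattern symbol $p_k$
- Alignment position: a text position AP with $M(AP + (k-1), k) = 1$ for some k.
Example: pattern p = abcde, text t = xxxxxabxxxabcxxxabcde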
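4 Sequential Algorithms - Definition
- (i) Semi-sequential: APs are non-decreasing.
- (ii) Strongly semi-sequential: (i) holds and the comparisons $M(l,k)$ define non-decreasing text positions $l$.
- (iii) Sequential: (i) holds and $M(l,k) = 1 \Rightarrow t_{l-(k-1)}^{l-1} = p_1^{k-1}$ (with $t_a^b$ denoting $t_a \dots t_b$), i.e. text is compared only if following a prefix of the pattern.
- Strongly sequential: (i), (ii) and (iii).
Example: pattern abcde, text xxxxxabxxxabcxxxabcde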
5 Example: Naive / brute force algorithm
[Figure: pattern abcde shifted one position at a time along the text xxxxxabxxxabcxxxabcde]
- Every text position is an alignment position.
- Text is scanned until...
  - the pattern is found: then done.
  - a mismatch occurs: then shift by one and retry.
- Sequential algorithm.
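The brute-force scan described above can be sketched as follows. This is a minimal Python sketch, not code from the talk: the function name `naive_search`, the 0-based indexing, and the comparison counter are assumptions added for illustration, so that the cost can be compared with the KMP-type algorithms.

```python
def naive_search(text, pattern):
    """Brute-force search; returns (first match position or -1, comparisons).

    Hypothetical helper for illustration; positions are 0-based.
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    for ap in range(n - m + 1):        # every text position is an alignment position
        k = 0
        while k < m:
            comparisons += 1           # one text-pattern comparison
            if text[ap + k] != pattern[k]:
                break                  # mismatch: shift by one and retry
            k += 1
        if k == m:
            return ap, comparisons     # pattern found: done
    return -1, comparisons


print(naive_search("xxxxxabxxxabcxxxabcde", "abcde"))
```

On the example text the sketch finds the pattern at 0-based position 16 after 26 comparisons.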
6 Knuth-Morris-Pratt type algorithms (1)
[Figure: pattern ababcde shifted by S along the text xxxxxabxxxabcxxxabcde]
- Idea (Morris-Pratt): Disregard APs already known not to be followed by a prefix of p.
- Knowledge:
  - Already processed pattern
  - Pre-processing of p.
- Strongly sequential algorithm.
7 Knuth-Morris-Pratt type algorithms (2)
- Morris-Pratt
[Figure: shifts of pattern ababcde along the text xxxxxabxxxabcxxxabcde]
- Knuth-Morris-Pratt (KMP also skips mismatching letters)
[Figure: shifts of pattern ababcde along the text xxxxxabxxxabcxxxabcde]
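The difference between the two variants can be made concrete with a small sketch. The code below is an illustrative Python version under stated assumptions, not the talk's own code: `failure_table` and `search` are hypothetical names, positions are 0-based, and the `strict` flag switches between the plain Morris-Pratt border table and the Knuth-Morris-Pratt table that also skips borders followed by the same (and therefore again mismatching) letter.

```python
def failure_table(p, strict):
    """fail[k] = pattern index to resume at after a mismatch at index k
    (-1 means: move on to the next text letter).

    strict=False: Morris-Pratt table; strict=True: Knuth-Morris-Pratt table.
    """
    m = len(p)
    fail = [-1] * (m + 1)
    j = -1
    for i in range(m):
        while j > -1 and p[i] != p[j]:
            j = fail[j]
        j += 1                         # j = length of the longest border of p[:i+1]
        if strict and i + 1 < m and p[i + 1] == p[j]:
            fail[i + 1] = fail[j]      # same next letter would mismatch again: skip it
        else:
            fail[i + 1] = j
    return fail


def search(text, p, strict):
    """Return (list of match positions, number of text-pattern comparisons)."""
    fail = failure_table(p, strict)
    m = len(p)
    matches, comparisons = [], 0
    k = 0                              # number of pattern letters currently matched
    for l, ch in enumerate(text):
        while k > -1:
            comparisons += 1
            if p[k] == ch:
                break
            k = fail[k]                # shift the pattern using the preprocessed table
        k += 1
        if k == m:
            matches.append(l - m + 1)
            k = fail[m]
    return matches, comparisons


text = "xxxxxabxxxabcxxxabcde"
print(search(text, "ababcde", strict=False))   # Morris-Pratt
print(search(text, "ababcde", strict=True))    # Knuth-Morris-Pratt
```

On this text the pattern ababcde does not occur; the sketch counts 24 comparisons for Morris-Pratt and 21 for Knuth-Morris-Pratt, the three saved comparisons being exactly the repeated mismatches against the letter a.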
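8 Pattern Matching - Complexity
- Overall complexity: $c_n = \sum_{l=1}^{n} \sum_{k=1}^{m} M(l,k)$, the total number of comparisons for a text of length n and a pattern of length m
- Pattern or text is a realization of a random sequence
- Question: complexity of KMP?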
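9 Subadditivity: Deterministic Sequence
- Fekete (1923): let $(x_n)$ be a deterministic real sequence
- Subadditivity: $x_{m+n} \le x_m + x_n \;\Rightarrow\; \lim_{n\to\infty} x_n/n = \inf_{m\ge 1} x_m/m$
- Superadditivity: $x_{m+n} \ge x_m + x_n \;\Rightarrow\; \lim_{n\to\infty} x_n/n = \sup_{m\ge 1} x_m/m$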
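Fekete's statement for subadditive sequences, $\lim x_n/n = \inf_m x_m/m$, can be checked numerically on a toy sequence. The sequence $x_n = 2n + \sqrt{n}$ below is an assumption chosen only for illustration (it is subadditive because the square root is); it does not come from the talk.

```python
import math


def x(n):
    # Hypothetical subadditive sequence used only for illustration.
    return 2 * n + math.sqrt(n)


# Subadditivity x(a+b) <= x(a) + x(b), checked on a finite range.
subadditive = all(
    x(a + b) <= x(a) + x(b) + 1e-9
    for a in range(1, 200)
    for b in range(1, 200)
)

# The ratios x(n)/n decrease towards the infimum, which is 2 here.
ratios = [x(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]
print(subadditive, ratios)
```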
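10 Example: Longest Common Subsequence
- ababcafbcdabcde = ababcafb + cdabcde
- abcdeabcdfabcab = abcdeabc + dfabcab
- LCS of the whole strings: "abcabcdabc" (10)
- LCS of the parts: "abcab" (5), "dabc" (4)
- The LCS length is superadditive (10 >= 5 + 4), so for random strings the limit of (expected LCS length)/n exists; its value is not known (a value was conjectured by Steele in 1982)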
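The superadditivity in this example can be reproduced with the textbook dynamic program for the LCS length. The sketch below is illustrative only; `lcs_length` is a hypothetical helper name, and the split after the eighth letter mirrors the split shown in the example.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (standard DP, O(|a||b|))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s, t = "ababcafbcdabcde", "abcdeabcdfabcab"
whole = lcs_length(s, t)
parts = lcs_length(s[:8], t[:8]) + lcs_length(s[8:], t[8:])
print(whole, parts)   # superadditivity: whole >= parts
```

Any common subsequence of the first halves followed by one of the second halves is a common subsequence of the whole strings, which is why the inequality always holds.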
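11 Subadditivity: "Almost subadditive"
- DeBruijn and Erdős (1952)
- $a_n$: positive and non-decreasing sequence with $\sum_{k\ge 1} a_k/k^2 < \infty$
- "Almost subadditive": $x_{m+n} \le x_m + x_n + a_{m+n} \;\Rightarrow\; \lim_{n\to\infty} x_n/n$ exists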
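12 Subadditive Ergodic Theorem
- Kingman (1976), Liggett (1985): let $X_{m,n}$ ($m < n$) be nonnegative random variables such that
- $X_{0,n} \le X_{0,m} + X_{m,n}$ (subadditivity)
- $\{X_{nk,(n+1)k},\ n \ge 1\}$ is a stationary sequence for every k
- the distribution of $\{X_{m,m+k},\ k \ge 1\}$ does not depend on m
- $E[X_{0,1}] < \infty$
- Then $\lim_{n\to\infty} E[X_{0,n}]/n = \gamma$ and $\lim_{n\to\infty} X_{0,n}/n = X$ almost surely, with $E[X] = \gamma$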
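13 Almost Subadditive Ergodic Theorem
- Derriennic (1983)
- Subadditivity can be relaxed to $X_{0,n} \le X_{0,m} + X_{m,n} + A_n$ with $\lim_{n\to\infty} E[A_n]/n = 0$
- Then, too, $\lim_{n\to\infty} E[X_{0,n}]/n = \gamma$ and $X_{0,n}/n$ converges to a limit X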
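14 Martingales
- A sequence $Y_n$ is a martingale with respect to the filtration $\mathcal{F}_n$ if for all $n \ge 0$:
- $E|Y_n| < \infty$
- $E[Y_{n+1} \mid \mathcal{F}_n] = Y_n$
- $E[Y_{n+1} \mid \mathcal{F}_n]$ defines a random variable depending on the knowledge contained in $\mathcal{F}_n$.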
15 Martingale Differences
- The martingale difference is defined as $D_n = Y_n - Y_{n-1}$, so that $Y_n = Y_0 + \sum_{i=1}^{n} D_i$
- Observe: $E[D_{n+1} \mid \mathcal{F}_n] = 0$
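16 Azuma's Inequality (1)
- Let $Y_i = E[Y_n \mid \mathcal{F}_i]$, $i = 0, \dots, n$, be a martingale (with $\mathcal{F}_0$ trivial, so $Y_0 = E[Y_n]$)
- Define the martingale difference as $D_i = E[Y_n \mid \mathcal{F}_i] - E[Y_n \mid \mathcal{F}_{i-1}]$ (the mean of the same element but depending on different knowledge)
- Observe: $\sum_{i=1}^{n} D_i = Y_n - E[Y_n]$ (deviation from the mean)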
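17 Hoeffding's Inequality
- Let $Y_n$ be a martingale
- Let there exist constants $c_i$ such that $|Y_i - Y_{i-1}| = |D_i| \le c_i$
- Then $P(|Y_n - Y_0| \ge x) \le 2 \exp\left(-\frac{x^2}{2 \sum_{i=1}^{n} c_i^2}\right)$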
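To get a feeling for how tight such a bound is, it can be compared with an exactly computable case. The sketch below assumes the standard Azuma-Hoeffding form $P(|Y_n - Y_0| \ge x) \le 2\exp(-x^2 / (2\sum c_i^2))$ and uses a fair ±1 random walk, a martingale with $|D_i| \le 1$; the walk and the helper names `azuma_bound` and `walk_tail` are illustrative assumptions, not part of the talk.

```python
import math


def azuma_bound(x, c):
    """Bound 2*exp(-x^2 / (2*sum(c_i^2))) on P(|Y_n - Y_0| >= x)."""
    return 2.0 * math.exp(-x * x / (2.0 * sum(ci * ci for ci in c)))


def walk_tail(n, x):
    """Exact P(|S_n| >= x) for a fair +-1 random walk S_n of n steps."""
    favourable = sum(
        math.comb(n, ups)              # paths with `ups` steps of +1
        for ups in range(n + 1)
        if abs(2 * ups - n) >= x       # S_n = ups - (n - ups)
    )
    return favourable / 2 ** n


n = 100
for x in (10, 20, 30):
    print(x, walk_tail(n, x), azuma_bound(x, [1] * n))
```

The exact tail stays below the bound for every x, and both decay quickly once x exceeds a few multiples of $\sqrt{n}$.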
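18 Azuma's Inequality (2)
- Summary: If $D_i$ is bounded, we know how to assess the deviation from the mean.
- So now we need a bound on $D_i$.
- Trick: Let $Y_n = f(X_1, \dots, X_n)$ with independent $X_i$ and $\mathcal{F}_i$ generated by $X_1, \dots, X_i$, and let $\hat{X}_i$ be an independent copy of $X_i$.
- Then $E[f(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_{i-1}] = E[f(X_1, \dots, \hat{X}_i, \dots, X_n) \mid \mathcal{F}_i]$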
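19 Azuma's Inequality (3)
- Hence $D_i = E[f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat{X}_i, \dots, X_n) \mid \mathcal{F}_i]$
- And we can postulate $|f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat{X}_i, \dots, X_n)| \le c_i$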
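20 Azuma's Inequality (4)
- Let $Y_n = f(X_1, \dots, X_n)$ be a martingale
- If there exist constants $c_i$ such that $|f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat{X}_i, \dots, X_n)| \le c_i$, where $\hat{X}_i$ is an independent copy of $X_i$
- Then $P(|Y_n - E[Y_n]| \ge x) \le 2 \exp\left(-\frac{x^2}{2 \sum_{i=1}^{n} c_i^2}\right)$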
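21 KMP: Unavoidable alignment positions
- A position i in the text is called an unavoidable AP if, for any r, l with $r \le i$ and $l \ge i + m$, it's an AP when the algorithm is run on $t_r^l$.
- KMP-like algorithms have the same set of unavoidable alignment positions $U = \bigcup_{l=1}^{n} \{U_l\}$, where $U_l = \min\{\min_{1 \le k \le l}\{k : t_k^l \preceq p\},\ l+1\}$ and $t_k^l \preceq p$ means that $t_k^l$ is a prefix of p
- Example: pattern abcde, text xxxxxabxxxabcxxxabcde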
22 Pattern Matching: l-convergence
- An algorithm is l-convergent if there exists an increasing sequence of unavoidable alignment positions $U_i$ satisfying $U_{i+1} - U_i \le l$
- l-convergence indicates the maximum size of "jumps" for an algorithm.
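23 KMP: Establishing m-convergence
- Let AP be an alignment position
- Define $l = AP + m$
- Since $|p| = m$, the unavoidable position $U_l$ lies at most m positions after AP
- Hence $U_l - AP \le m$, and so KMP-like algorithms are m-convergent.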
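24 KMP: Establishing subadditivity (1)
- If $c_n$ (number of comparisons) is subadditive we can prove linear complexity of KMP-like algorithms.
- We have to show $c_n$ is (almost) subadditive: $c_{1,n} \le c_{1,r} + c_{r,n} + a$, where $c_{r,s}$ counts the comparisons made on $t_r^s$ and a is a constant
- Approach: An l-convergent sequential algorithm satisfies $|c_{1,n} - (c_{1,r} + c_{r,n})| \le m^2 + lm$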
25 KMP: Establishing subadditivity (2)
- Proof
- $U_r$: the smallest unavoidable AP greater than r.
- We split $c_{1,n} - (c_{1,r} + c_{r,n})$ into $c_{1,n} - (c_{1,r} + c_{U_r,n})$ and $c_{r,n} - c_{U_r,n}$.
26 KMP: Establishing subadditivity (3)
- Comparisons done after r with AP before r
- Comparisons with AP between r and $U_r$, the smallest unavoidable AP greater than r
- No more than m comparisons can be saved at $U_r$
[Figure: comparison sets S1 and S2, each labelled with the comparison counts it contributes to]
27 KMP: Establishing subadditivity (4)
- Comparisons with AP between r and the next unavoidable AP $U_r$
- No more than m comparisons can be saved at $U_r$
[Figure: comparison set S3, labelled with the comparison counts it contributes to]
28 KMP: Establishing subadditivity (5)
- So we are able to bound $|c_{1,n} - (c_{1,r} + c_{r,n})| \le m^2 + lm$
- We have shown $c_n$ is (almost) subadditive
- Now we are able to apply the Subadditive Ergodic Theorem.
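The almost-subadditivity of the comparison count can also be observed empirically. The sketch below is an illustration under stated assumptions, not the talk's analysis: it uses a plain Morris-Pratt scan (`mp_comparisons` is a hypothetical helper), a random text over {a, b} with an arbitrary fixed seed, an arbitrary pattern, and it checks the gap $|c_{1,n} - (c_{1,r} + c_{r,n})|$ against $2m^2$, i.e. a bound of the form $m^2 + lm$ with $l = m$ assumed for KMP-type algorithms.

```python
import random


def mp_comparisons(text, p):
    """Number of text-pattern comparisons of a Morris-Pratt scan over text."""
    m = len(p)
    fail = [-1] * (m + 1)              # border table of the pattern
    j = -1
    for i in range(m):
        while j > -1 and p[i] != p[j]:
            j = fail[j]
        j += 1
        fail[i + 1] = j
    comparisons, k = 0, 0
    for ch in text:
        while k > -1:
            comparisons += 1
            if p[k] == ch:
                break
            k = fail[k]
        k += 1
        if k == m:                     # occurrence found: continue with the border
            k = fail[m]
    return comparisons


rng = random.Random(2004)              # arbitrary fixed seed for reproducibility
p = "abaab"                            # arbitrary illustrative pattern
m, n = len(p), 20000
text = "".join(rng.choice("ab") for _ in range(n))

c_1n = mp_comparisons(text, p)
worst_gap = max(
    # c_{1,r} is run on t_1..t_r, c_{r,n} on t_r..t_n (1-based positions)
    abs(c_1n - mp_comparisons(text[:r], p) - mp_comparisons(text[r - 1:], p))
    for r in range(1000, n, 1000)
)
print(c_1n / n, worst_gap, 2 * m * m)
```

The ratio `c_1n / n` is an empirical estimate of the linearity constant for this particular pattern and text model; the gap stays bounded independently of n, which is what makes the Subadditive Ergodic Theorem applicable.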
29 KMP: Different Modeling Assumptions
- Deterministic Model: Text and pattern are non-random.
- Semi-Random Model: Text is a realization of a stationary and ergodic sequence, pattern is given.
- Stationary Model: Both text and pattern are realizations of a stationary and ergodic sequence.
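30 KMP: Applying the Subadditive Ergodic Theorem
- We have shown $c_n$ is (almost) subadditive
- Deterministic Model: the worst-case ratio $\max_t c_n/n$ converges to a constant depending on p
- Semi-Random Model: $c_n/n$ converges almost surely to a constant depending on p
- Stationary Model: $c_n/n$ converges almost surely to a constant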
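31 KMP: Applying Azuma's Inequality
- $c_n$ satisfies $|c_n(t_1, \dots, t_i, \dots, t_n) - c_n(t_1, \dots, \hat{t}_i, \dots, t_n)| \le C$, where $\hat{t}_i$ is an independent copy of $t_i$ and the constant C depends on m but not on n.
- So, using Azuma's Inequality, $P(|c_n - E[c_n]| \ge \varepsilon n) \le 2 \exp\left(-\frac{\varepsilon^2 n}{2 C^2}\right)$
- $c_n$ is concentrated around its mean $E[c_n]$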
32 Conclusion
- Using the Subadditive Ergodic Theorem we can show there exists a linearity constant for the worst and the average case, respectively: KMP has linear complexity.
- The Subadditive Ergodic Theorem proves the existence of this constant but says nothing about how to compute it.
- Using Azuma's Inequality we can show that the number of comparisons is well concentrated around its mean.