Title: CS 3343: Analysis of Algorithms
1CS 3343 Analysis of Algorithms
- Lecture 14: Order Statistics
2Order statistics
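- The ith order statistic in a set of n elements is the ith smallest element
- The minimum is thus the 1st order statistic
- The maximum is the nth order statistic
- The median is (roughly) the n/2 order statistic
  - If n is odd, the median is the ((n+1)/2)th order statistic
  - If n is even, there are 2 medians: the (n/2)th and the (n/2 + 1)th order statistics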
- How can we calculate order statistics?
- What is the running time?
3Order statistics selection problem
- Select the ith smallest of n elements
- Naive algorithm: sort, then return the ith element.
  - Worst-case running time Θ(n log n)
  - using merge sort or heapsort (not quicksort).
- We will show:
  - A practical randomized algorithm with Θ(n) expected running time
  - A cool algorithm of theoretical interest only, with Θ(n) worst-case running time
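As a baseline for the algorithms that follow, the naive approach can be sketched in a few lines of Python. The function name `select_by_sorting` and the sample data are illustrative choices, not part of the lecture. Python's built-in `sorted` also has an O(n log n) worst case, so it plays the role of merge sort or heapsort here.

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (1-based) by sorting a copy."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    return sorted(items)[i - 1]


data = [7, 10, 5, 8, 11, 3, 2, 13]
minimum = select_by_sorting(data, 1)          # 1st order statistic
maximum = select_by_sorting(data, len(data))  # nth order statistic
sixth = select_by_sorting(data, 6)            # 6th order statistic
```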
4Recall Quicksort
- The function Partition gives us the rank k of the pivot
- If we are lucky, k = i: done!
- If not, we at least get a smaller subarray to work with
  - k > i: the ith smallest is in the left subarray
  - k < i: the ith smallest is in the right subarray
- Divide and conquer
  - If we are lucky, k is close to n/2, or the desired element is in the smaller subarray
  - If unlucky, the desired element is in the larger subarray (possible size n-1)
5Randomized divide-and-conquer algorithm
RAND-SELECT(A, p, q, i)        // ith smallest of A[p..q]
    if p = q and i > 1 then error!
    r ← RAND-PARTITION(A, p, q)
    k ← r - p + 1              // k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r - 1, i)
        else return RAND-SELECT(A, r + 1, q, i - k)
6Randomized Partition
- Randomly choose an element as pivot
- Each time a partition is needed, throw a die to decide which element to use as the pivot
- Each element has probability 1/n of being selected
Rand-Partition(A, p, q)
    d = random()                      // draw a random number d with 0 <= d < 1
    index = p + floor((q-p+1) * d)    // p <= index <= q
    swap(A[p], A[index])
    return Partition(A, p, q)         // now use A[p] as pivot
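The two routines above can be put together in Python as a rough sketch. The names `rand_partition` and `rand_select` mirror the pseudocode. The partition loop is one common scheme (pivot first, scan left to right) and is an assumption, since the slides do not spell out Partition. `random.randint(p, q)` picks an index uniformly from p..q inclusive, so it replaces the `floor` computation.

```python
import random


def rand_partition(a, p, q):
    """Partition a[p..q] (inclusive) around a random pivot; return its final index."""
    index = random.randint(p, q)       # each element chosen with probability 1/n
    a[p], a[index] = a[index], a[p]    # move the pivot to the front
    pivot, store = a[p], p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            store += 1
            a[store], a[j] = a[j], a[store]
    a[p], a[store] = a[store], a[p]    # put the pivot in its final position
    return store


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-based) element of a[p..q]; reorders a in place."""
    if not 1 <= i <= q - p + 1:
        raise ValueError("i is out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1                      # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [7, 10, 5, 8, 11, 3, 2, 13]
sixth = rand_select(list(data), 0, len(data) - 1, 6)
```

The call passes a copy (`list(data)`) because the sketch rearranges its input, just as the pseudocode does.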
7Example
Select the i = 6th smallest:
7 10 5 8 11 3 2 13
(Figure: the array above, with the pivot element marked.)
8Complete example select the 6th smallest element.
7 10 5 8 11 3 2 13        i = 6
Note: here we always used the first element as the pivot to do the partition (instead of Rand-Partition).
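The example can be replayed with a short Python sketch that always uses the first element as the pivot and records each step. The helper names `partition_first` and `select_with_trace` are hypothetical, and the partition scheme is an assumption, so the intermediate subarrays may be ordered differently from those in the original figure.

```python
def partition_first(a, p, q):
    """Partition a[p..q] (inclusive) around a[p]; return the pivot's final index."""
    pivot, store = a[p], p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            store += 1
            a[store], a[j] = a[j], a[store]
    a[p], a[store] = a[store], a[p]
    return store


def select_with_trace(items, i):
    """Return (i-th smallest, trace); each trace entry is (pivot, k, i)."""
    a = list(items)
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(items)")
    p, q, trace = 0, len(a) - 1, []
    while True:
        r = partition_first(a, p, q)
        k = r - p + 1               # rank of the pivot within a[p..q]
        trace.append((a[r], k, i))
        if i == k:
            return a[r], trace
        if i < k:
            q = r - 1               # continue in the left subarray
        else:
            p, i = r + 1, i - k     # continue in the right subarray


value, steps = select_with_trace([7, 10, 5, 8, 11, 3, 2, 13], 6)
```

With this partition scheme the pivots visited are 7, 11, 8 and 10, and the 6th smallest element is 10.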
9Intuition for analysis
(All our analyses today assume that all elements are distinct.)
Lucky: T(n) = T(9n/10) + Θ(n) = Θ(n)    (master theorem, CASE 3)
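The lucky case can also be checked without the master theorem by unrolling the recurrence. The following is a sketch that writes the Θ(n) term as at most cn for some constant c (an assumption made for the illustration):

```latex
\begin{aligned}
T(n) &\le cn + T\!\left(\tfrac{9}{10}n\right)
      \le cn + c\,\tfrac{9}{10}n + T\!\left(\left(\tfrac{9}{10}\right)^{2} n\right)
      \le \dots \\
     &\le cn \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j}
      = cn \cdot \frac{1}{1 - 9/10}
      = 10\,cn = O(n).
\end{aligned}
```

Since the first partition alone already costs Θ(n), the bound is tight and T(n) = Θ(n).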
10Running time of randomized selection
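T(n) =
    T(max(0, n-1)) + n    if 0 : n-1 split,
    T(max(1, n-2)) + n    if 1 : n-2 split,
    ...
    T(max(n-1, 0)) + n    if n-1 : 0 split.
- For an upper bound, assume the ith element always falls in the larger side of the partition
- The expected running time is an average over all cases (each split has probability 1/n)
Expectation: T(n) = (1/n) Σ_{k=0}^{n-1} T(max(k, n-k-1)) + n ≤ (2/n) Σ_{k=⌊n/2⌋}^{n-1} T(k) + n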
11Substitution method
Want to show T(n) = O(n). So we need to prove T(n) ≤ cn for n > n0.
Assume T(k) ≤ ck for all k < n.
Then T(n) ≤ cn if c ≥ 4.
Therefore, T(n) = O(n).
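One way to fill in the intermediate steps of the substitution is sketched below. It assumes the expectation is bounded in the usual way, by summing over the larger side of each split, and it uses the arithmetic-series fact that the sum of k from ⌊n/2⌋ to n-1 is at most 3n²/8.

```latex
\begin{aligned}
T(n) &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + n \\
     &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + n
        && \text{(inductive hypothesis)} \\
     &\le \frac{2c}{n} \cdot \frac{3}{8} n^{2} + n
        && \left(\textstyle\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \tfrac{3}{8} n^{2}\right) \\
     &= \frac{3}{4} cn + n
      = cn - \left(\frac{cn}{4} - n\right) \\
     &\le cn \qquad \text{if } c \ge 4 .
\end{aligned}
```

The last step holds because cn/4 - n ≥ 0 exactly when c ≥ 4, which matches the condition stated on the slide.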
12Summary of randomized selection
- Works fast: linear expected time.
- Excellent algorithm in practice.
- But the worst case is very bad: Θ(n²).
Q. Is there an algorithm that runs in linear time
in the worst case?
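To make the quadratic worst case concrete, the sketch below counts element comparisons when the pivot is always the first element and the input is already sorted. This stands in for an unlucky run of the randomized algorithm; the function name and the partition scheme are assumptions made for the illustration.

```python
def count_comparisons_first_pivot(items, i):
    """Select the i-th smallest with a first-element pivot; return (value, comparisons)."""
    a = list(items)
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(items)")
    p, q, comparisons = 0, len(a) - 1, 0
    while True:
        pivot, store = a[p], p
        for j in range(p + 1, q + 1):
            comparisons += 1            # one comparison against the pivot
            if a[j] <= pivot:
                store += 1
                a[store], a[j] = a[j], a[store]
        a[p], a[store] = a[store], a[p]
        k = store - p + 1               # rank of the pivot within a[p..q]
        if i == k:
            return a[store], comparisons
        if i < k:
            q = store - 1
        else:
            p, i = store + 1, i - k


n = 100
value, comparisons = count_comparisons_first_pivot(range(1, n + 1), n)
```

On sorted input each partition removes only the pivot, so selecting the maximum costs (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, which is 4950 for n = 100.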
13Worst-case linear-time selection
Same as RAND-SELECT, except for how the pivot is chosen.
17Choosing the pivot
- Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
- Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
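A compact Python sketch of selection with this pivot rule follows. It is an illustration under stated assumptions, not the lecture's exact procedure: it partitions by building new lists instead of rearranging in place, it ignores leftover elements beyond the ⌊n/5⌋ full groups when choosing the pivot, it sorts inputs of at most 5 elements directly, and it assumes all elements are distinct. The names `median_of_group` and `select` are hypothetical.

```python
def median_of_group(group):
    """Median of a small group (at most 5 elements), found by sorting it."""
    ordered = sorted(group)
    return ordered[(len(ordered) - 1) // 2]


def select(items, i):
    """Return the i-th smallest (1-based) of items; elements assumed distinct."""
    a = list(items)
    n = len(a)
    if not 1 <= i <= n:
        raise ValueError("i must be between 1 and len(items)")
    if n <= 5:
        return sorted(a)[i - 1]         # base case: constant-size input

    # Step 1: medians of the floor(n/5) full groups of 5.
    medians = [median_of_group(a[j:j + 5]) for j in range(0, n - n % 5, 5)]
    # Step 2: recursively select the median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 3: partition around x and compute its rank k.
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1
    # Step 4: recurse on the side that contains the answer.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


sixth = select([7, 10, 5, 8, 11, 3, 2, 13], 6)
```

The step comments are a guess at how the pivot rule fits around the partition-and-recurse structure of RAND-SELECT.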
18Analysis
(Assume all elements are distinct.)
- At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
- Therefore, at least 3⌊n/10⌋ elements are ≤ x: each such group median, together with the two smaller elements of its group, is ≤ x.
- Similarly, at least 3⌊n/10⌋ elements are ≥ x.
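The counting argument can be spot-checked numerically. The sketch below picks x as the median of the medians of the ⌊n/5⌋ full groups and counts the elements on each side. For brevity it finds the median of the medians by sorting rather than by a recursive SELECT, which is a simplification, and the function names are hypothetical.

```python
def pivot_from_groups_of_five(items):
    """Median of the medians of the floor(n/5) full groups of 5 (needs n >= 5)."""
    a = list(items)
    full = len(a) - len(a) % 5
    medians = sorted(sorted(a[j:j + 5])[2] for j in range(0, full, 5))
    return medians[(len(medians) - 1) // 2]


def satisfies_bound(items):
    """Check that at least 3*floor(n/10) elements are <= x and as many are >= x."""
    a = list(items)
    x = pivot_from_groups_of_five(a)
    bound = 3 * (len(a) // 10)
    at_most_x = sum(1 for v in a if v <= x)
    at_least_x = sum(1 for v in a if v >= x)
    return at_most_x >= bound and at_least_x >= bound


sample = [(37 * j) % 211 for j in range(120)]   # 120 distinct values in scrambled order
ok = satisfies_bound(sample)
```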
21Analysis
For the worst-case running time we need an upper bound ("at most"):
- At least 3⌊n/10⌋ elements are ≤ x ⇒ at most n - 3⌊n/10⌋ elements are > x
- At least 3⌊n/10⌋ elements are ≥ x ⇒ at most n - 3⌊n/10⌋ elements are < x
- The recursive call to SELECT in Step 4 is executed on at most n - 3⌊n/10⌋ elements.
22Analysis
- Use the fact that ⌊a/b⌋ > a/b - 1
- n - 3⌊n/10⌋ < n - 3(n/10 - 1) = 7n/10 + 3
  - ≤ 3n/4 if n ≥ 60
- The recursive call to SELECT in Step 4 is executed on at most 7n/10 + 3 elements.
23Developing the recurrence
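Cost of SELECT on n elements, T(n), step by step:
- Find the group medians: Θ(n)
- Recursively select the median of the group medians: T(n/5)
- Partition around the pivot: Θ(n)
- Recursive call in Step 4: T(7n/10 + 3)
T(n) = T(n/5) + T(7n/10 + 3) + Θ(n)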
24Solving the recurrence
Assumption: T(k) ≤ ck for all k < n.
The bound 7n/10 + 3 ≤ 3n/4 holds if n ≥ 60.
The induction step T(n) ≤ cn goes through if c ≥ 20 and n ≥ 60.
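The missing algebra can be reconstructed roughly as follows. This sketch takes the Θ(n) term to be at most n, which is an assumption but is what the threshold c ≥ 20 suggests, and it uses 7n/10 + 3 ≤ 3n/4, which is equivalent to 3 ≤ n/20, that is, n ≥ 60.

```latex
\begin{aligned}
T(n) &\le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10} + 3\right) + n \\
     &\le \tfrac{1}{5}\,cn + c\left(\tfrac{7n}{10} + 3\right) + n
        && \text{(inductive hypothesis)} \\
     &\le \tfrac{1}{5}\,cn + \tfrac{3}{4}\,cn + n
        && \text{if } n \ge 60 \\
     &= \tfrac{19}{20}\,cn + n
      = cn - \left(\tfrac{1}{20}\,cn - n\right) \\
     &\le cn
        && \text{if } c \ge 20 \text{ and } n \ge 60 .
\end{aligned}
```

For n < 60 the running time is bounded by a constant, so the base cases hold once c is chosen large enough.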
25Conclusions
- Since the work at each level of recursion is at most a constant fraction (19/20) of the work at the level above, the work per level is a geometric series dominated by the linear work at the root.
- In practice, this algorithm runs slowly, because the constant in front of n is large.
- The randomized algorithm is far more practical.
Exercise: Try to divide into groups of 3 or 7.
Exercise: Think about an application in sorting.