#
3.2.2. Selection of the k-th Smallest Element
In
- ref. [A. Aho, J. Hopcroft, and J. Ullman, Design and Analysis of Algorithms, Addison-Wesley, 1974. 3.6]
- Problem
- Given a sequence of S of n elements and an integer k (1 <= k <= n), find the k^{th} smallest element of S.
- Solution 1:
- Choose the smallest element repeatedly k times.
- C = c(n-1)+c(n-2)+c(n-3)+...+c(n-k) = c \cdot k \cdot n - c \cdot \frac {k(k+1)} 2
- if k= \frac n 2 then C = c \cdot \frac {n^2} 2 - c \cdot \frac {n^2 + 2n} 8 = O(n^2)
- Choose the smallest element repeatedly k times.
- Solution 2:
- Build a min-heap, and then extract the smallest element repeatedly k times.
- C = c \cdot n + d \cdot k \cdot log n
- if k= \frac n 2 then C = c \cdot n + d \cdot \frac n 2 \cdot \log n = O(n \log n)
- Build a min-heap, and then extract the smallest element repeatedly k times.
- Can we design an O(n)-time algorithm?
#
1. Observation
At least O(n) time is necessary.
If we use a divide-and-conquer scheme like the merge sort,
What about T(n) = 3T(\frac n 3) + cn?
Can we design an O(n)-time algorithm for this selection problem?
What about T(n) = T(an) + T(bn) + cn with a + b < 1?
cost : cn\{1+(a+b)+(a+b)^2+...\} \leq cn \frac 1 {1-a+b}
- so O(n)
#
2. Algorithm
- Step 1: Divide S into \lfloor \frac {|S|} 5 \rfloor sequence of 5 elements each with up to four leftover elements.
- Step 2: Sort each 5-element sequence.
- Step 3: Let M be the sequence of medians of the 5-element sets. Then, let m be the median of the elements in M.
- Step 4: Let S1, S2, and S3 be the sequences of elements in S less than, equal to, and greater than m, respectively.
- If |S1| >= k, then find the k-th smallest element of S1.
- else if (|S1| + |S2| >= k), then m is the k^{th} smallest element of S.
- else find the (k – |S1| - |S2|)^{th} smallest element of S3.
procedure SELECT(k,s):
if |S|<50 then
begin
sort S;
return kth smallest element in S
end
else
begin
divide S into |S|/5 sequences of 5 elements each with up to four leftover elements;
sort each 5-element sequence;
let M be the sequence of medians of the 5-element sets;
m <- SELECt (M/2, M);
let s1, s2 and s3 be the sequences of elements in S less than, equal to, and greater than m, respectively;
if |s1|>= k
then return m
else
return SELECT (k-|s1|-|s2|, s3)
end
- A divide-and-conquer strategy
- Facts
- At least one-fourth of the elements of S are less than or equal to m.
- At least one-fourth of the elements of S are greater than or equal to m.
- |S_1| <= \frac {3n} 4
- |S_3| <= \frac {3n} 4
- S1: the set of all elements less than m
- S2: the set of all elements equal to m
- S3: the set of all elements greater than m
#
3.3. Time Complexity
- Input size n = |S|
- |M| <= \lceil( \frac n 5)\rceil
- |S_1| <= \frac {3n} 4
- |S_3| <= \frac {3n} 4
procedure SELECT(k,s):
if |S|<50 then
begin
sort S;
return kth smallest element in S
end
else
begin
divide S into |S|/5 sequences of 5 elements each with up to four leftover elements;
sort each 5-element sequence;
let M be the sequence of medians of the 5-element sets;
m <- SELECt (M/2, M);
let s1, s2 and s3 be the sequences of elements in S less than, equal to, and greater than m, respectively;
if |s1|>= k
then return m
else
return SELECT (k-|s1|-|s2|, s3)
end