# 3.2.2. Selection of the k-th Smallest Element

In 
  • ref. [A. Aho, J. Hopcroft, and J. Ullman, Design and Analysis of Algorithms, Addison-Wesley, 1974. 3.6]
  • Problem
    • Given a sequence of S of n elements and an integer k (1 <= k <= n), find the k^{th} smallest element of S.
  • Solution 1:
    • Choose the smallest element repeatedly k times.
      • C = c(n-1)+c(n-2)+c(n-3)+...+c(n-k) = c \cdot k \cdot n - c \cdot \frac {k(k+1)} 2
      • if k= \frac n 2 then C = c \cdot \frac {n^2} 2 - c \cdot \frac {n^2 + 2n} 8 = O(n^2)
  • Solution 2:
    • Build a min-heap, and then extract the smallest element repeatedly k times.
      • C = c \cdot n + d \cdot k \cdot log n
      • if k= \frac n 2 then C = c \cdot n + d \cdot \frac n 2 \cdot \log n = O(n \log n)
  • Can we design an O(n)-time algorithm?

# 1. Observation

  • At least O(n) time is necessary.

  • If we use a divide-and-conquer scheme like the merge sort,

  • What about T(n) = 3T(\frac n 3) + cn?

  • image

  • Can we design an O(n)-time algorithm for this selection problem?

  • image

  • What about T(n) = T(an) + T(bn) + cn with a + b < 1?

  • cost : cn\{1+(a+b)+(a+b)^2+...\} \leq cn \frac 1 {1-a+b}

    • so O(n)

# 2. Algorithm

  • Step 1: Divide S into \lfloor \frac {|S|} 5 \rfloor sequence of 5 elements each with up to four leftover elements.
    • image
  • Step 2: Sort each 5-element sequence.
    • image
  • Step 3: Let M be the sequence of medians of the 5-element sets. Then, let m be the median of the elements in M.
  • Step 4: Let S1, S2, and S3 be the sequences of elements in S less than, equal to, and greater than m, respectively.
    • image
    • If |S1| >= k, then find the k-th smallest element of S1.
    • else if (|S1| + |S2| >= k), then m is the k^{th} smallest element of S.
    • else find the (k – |S1| - |S2|)^{th} smallest element of S3.
procedure SELECT(k,s):
if |S|<50 then	
	begin
		sort S;
		return kth smallest element in S
	end
else
	begin
		divide S into |S|/5 sequences of 5 elements each with up to four leftover elements;
		sort each 5-element sequence;
		let M be the sequence of medians of the 5-element sets;
		m <- SELECt (M/2, M);
		let s1, s2 and s3 be the sequences of elements in S less than, equal to, and greater than m, respectively;
		if |s1|>= k 
			then return m
		else 
			return SELECT (k-|s1|-|s2|, s3)
	end
  • A divide-and-conquer strategy
  • Facts
    • At least one-fourth of the elements of S are less than or equal to m.
    • At least one-fourth of the elements of S are greater than or equal to m.
    • |S_1| <= \frac {3n} 4
    • |S_3| <= \frac {3n} 4
  • S1: the set of all elements less than m
  • S2: the set of all elements equal to m
  • S3: the set of all elements greater than m

image
image

# 3.3. Time Complexity

  • Input size n = |S|
  • |M| <= \lceil( \frac n 5)\rceil
  • |S_1| <= \frac {3n} 4
  • |S_3| <= \frac {3n} 4
procedure SELECT(k,s):
if |S|<50 then	
	begin
		sort S;
		return kth smallest element in S
	end
else
	begin
		divide S into |S|/5 sequences of 5 elements each with up to four leftover elements;
		sort each 5-element sequence;
		let M be the sequence of medians of the 5-element sets;
		m <- SELECt (M/2, M);
		let s1, s2 and s3 be the sequences of elements in S less than, equal to, and greater than m, respectively;
		if |s1|>= k 
			then return m
		else 
			return SELECT (k-|s1|-|s2|, s3)
	end