Kth Largest Element in an Array

[!NOTE] This problem is a classic interview question that tests your understanding of partial sorting and priority structures. While sorting the entire array is an option, it is rarely the most efficient for large datasets when you only need a single value.


1. Problem Definition

Given an integer array nums and an integer k, return the kth largest element in the array.

Note that it is the kth largest element in the sorted order, not the kth distinct element.

Example 1:

  • Input: nums = [3,2,1,5,6,4], k = 2
  • Output: 5

Example 2:

  • Input: nums = [3,2,3,1,2,4,5,5,6], k = 4
  • Output: 4

Constraints:

  • 1 <= k <= nums.length <= 10^5
  • -10^4 <= nums[i] <= 10^4

Edge Cases & Clarifying Questions

  • Q: Can the array contain duplicates?
    • A: Yes, and we need to account for them. The $k^{th}$ largest element is by sorted position, so in [3, 3, 3], the 1st, 2nd, and 3rd largest are all 3.
  • Q: Can $k$ be larger than the array length?
    • A: The constraints specify $1 \le k \le ext{nums.length}$, so we don’t need to handle out-of-bounds $k$.
  • Q: Are we mutating the array?
    • A: Modifying the input array is generally acceptable for this problem (especially for Quickselect) to achieve $O(1)$ space, but in a real-world API, we might want to clarify if we should work on a copy.

2. Interactive Analysis

Choose a strategy to visualize the process.

3. Intuition (The “Genesis”)

How can we find the $k^{th}$ largest element without sorting everything? The problem asks us to find a single element based on its rank, which hints at partial sorting or selection patterns.

The Brute Force Approach

The most obvious solution is to sort the entire array in descending order and return the element at index k - 1.

  • Time Complexity: $O(N \log N)$
  • Space Complexity: $O(1)$ or $O(N)$ depending on the sorting algorithm.

While this works and is trivial to implement, it does redundant work. We are sorting $N$ elements when we only care about the exact position of one element. We need to optimize this by avoiding full array sorting.

Pattern: The “Bucket” Method (Min-Heap)

Imagine you have a bucket that can only hold $K$ items. As you walk through the array:

  1. Put the current item into the bucket.
  2. If the bucket has more than $K$ items, throw away the smallest one.
  3. After looking at every item, the smallest item remaining in your bucket is the $k^{th}$ largest in the whole array. Complexity: $O(N \log K)$ — we iterate $N$ times, and each heap operation takes $\log K$.

The “Narrowing Down” Method (Quickselect)

Based on the Quick Sort partition algorithm. When you pick a pivot and partition the array, that pivot lands in its final sorted position.

  • If that position is exactly the index we want ($N-K$), we’re done.
  • If it’s too high, we only search the left side.
  • If it’s too low, we only search the right side. Complexity: $O(N)$ on average. In the worst case (bad pivots), it becomes $O(N^2)$, but random pivot selection makes this extremely rare.

Common Pitfalls

  • Off-by-One Errors in K: Remembering that the $k^{th}$ largest element corresponds to the index N - K when the array is sorted in ascending order.
  • Choosing a Bad Pivot in Quickselect: Always use a randomized pivot. Consistently picking the first or last element can lead to $O(N^2)$ time complexity on already sorted arrays.
  • Heap Type Confusion: A common mistake is using a Max-Heap instead of a Min-Heap. If you use a Max-Heap of size $K$, you’ll keep the $K$ smallest elements. We want a Min-Heap so we can easily evict the smallest element of the top $K$ seen so far.

4. Solutions

Java
Go
class Solution {
    private Random rand = new Random();
    
    /**
     * Approach 1: Quickselect 
     * Average Case: O(N) | Space: O(1)
     */
    public int findKthLargest(int[] nums, int k) {
        // Kth largest is at index (length - k) in sorted order
        return quickSelect(nums, 0, nums.length - 1, nums.length - k);
    }
    
    private int quickSelect(int[] nums, int left, int right, int target) {
        if (left == right) return nums[left];
        
        int pivot = left + rand.nextInt(right - left + 1);
        swap(nums, pivot, right);
        
        int partition = partition(nums, left, right);
        
        if (partition == target) {
            return nums[target];
        } else if (partition < target) {
            return quickSelect(nums, partition + 1, right, target);
        } else {
            return quickSelect(nums, left, partition - 1, target);
        }
    }
    
    private int partition(int[] nums, int left, int right) {
        int pivotValue = nums[right];
        int fillPointer = left;
        
        for (int i = left; i < right; i++) {
            if (nums[i] < pivotValue) {
                swap(nums, fillPointer, i);
                fillPointer++;
            }
        }
        
        swap(nums, fillPointer, right);
        return fillPointer;
    }
    
    private void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
    
    /**
     * Approach 2: Min-Heap
     * Time: O(N log K) | Space: O(K)
     */
    public int findKthLargestHeap(int[] nums, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int num : nums) {
            heap.add(num);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        return heap.peek();
    }
}
import (
    "container/heap"
    "math/rand"
    "time"
)

/**
 * Approach 1: Quickselect
 * Average Case: O(N) | Space: O(1)
 */
func findKthLargest(nums []int, k int) int {
    rand.Seed(time.Now().UnixNano())
    return quickSelect(nums, 0, len(nums)-1, len(nums)-k)
}

func quickSelect(nums []int, left, right, target int) int {
    if left == right {
        return nums[left]
    }

    pivotIdx := left + rand.Intn(right-left+1)
    nums[pivotIdx], nums[right] = nums[right], nums[pivotIdx]

    p := partition(nums, left, right)

    if p == target {
        return nums[target]
    } else if p < target {
        return quickSelect(nums, p+1, right, target)
    } else {
        return quickSelect(nums, left, p-1, target)
    }
}

func partition(nums []int, left, right int) int {
    pivot := nums[right]
    fill := left
    for i := left; i < right; i++ {
        if nums[i] < pivot {
            nums[i], nums[fill] = nums[fill], nums[i]
            fill++
        }
    }
    nums[fill], nums[right] = nums[right], nums[fill]
    return fill
}

/**
 * Approach 2: Min-Heap
 * Time: O(N log K) | Space: O(K)
 */
type IntHeap []int
func (h IntHeap) Len() int           { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *IntHeap) Pop() interface{} {
    old := *h
    n := len(old)
    x := old[n-1]
    *h = old[0 : n-1]
    return x
}

func findKthLargestHeap(nums []int, k int) int {
    h := &IntHeap{}
    heap.Init(h)
    for _, num := range nums {
        heap.Push(h, num)
        if h.Len() > k {
            heap.Pop(h)
        }
    }
    return (*h)[0]
}

5. Complexity Analysis

Strategy Time Complexity Space Complexity Notes
Full Sorting O(N log N) O(1) or O(N) Redundant work; sorts everything.
Min-Heap O(N log K) O(K) Best for streaming data where $N$ is unknown.
Quickselect $O(N)$ (avg) $O(1)$ Mathematically optimal average time.