Radix Sort
SummaryCovers LSD (least significant digit) radix sort using...
Covers LSD (least significant digit) radix sort using...
Covers LSD (least significant digit) radix sort using counting sort as a stable subroutine, achieving O(d(n + k)) time, and comparison with other sorting approaches.
Radix Sort
Counting sort handles integers in a small range. But what about sorting integers up to 1,000,000 or even 2^32? You can’t allocate a count array of size 2^32 and call it a day. Radix sort solves this by decomposing large numbers into individual digits and sorting one digit at a time.
The result: linear-time sorting for fixed-width integers, with no comparison between elements.
The Core Idea
Instead of sorting by the entire value, radix sort sorts by one digit position per pass. After processing all digits, the array is fully sorted.
There are two variants:
- LSD (Least Significant Digit): process digits from right to left (ones → tens → hundreds). This is the standard version.
- MSD (Most Significant Digit): process digits from left to right. Better for strings. More complex to implement.
LSD radix sort is the workhorse. Let’s master it first.
Why Least Significant Digit First?
This feels backwards — shouldn’t the most significant digit matter most? The trick is that each digit-level sort must be stable. Stability means equal elements retain their relative order from the previous pass.
Here’s why this works:
- After sorting by the ones digit, elements with the same ones digit are grouped together.
- When sorting by the tens digit, stability ensures that within each tens-digit group, the ones-digit ordering from the previous pass is preserved.
- Each subsequent pass refines the ordering without destroying the work of previous passes.
If the subroutine were unstable, sorting by the tens digit would scramble the ones-digit ordering, and the final result would be wrong.
The Algorithm
- Find the maximum number to determine the number of digits d.
- For each digit position (ones, tens, hundreds, …):
- Use counting sort on that digit — base 10 means k = 10.
- After d passes, the array is sorted.
Java 25 Implementation
void radixSort(int[] arr) {
int n = arr.length;
if (n <= 1) return;
// Find the maximum to determine number of digits
int max = arr[0];
for (int i = 1; i < n; i++) {
max = Math.max(max, arr[i]);
}
// Sort by each digit position: exp = 1 (ones), 10 (tens), 100 (hundreds), ...
for (int exp = 1; max / exp > 0; exp *= 10) {
countingSortByDigit(arr, n, exp);
}
}
void countingSortByDigit(int[] arr, int n, int exp) {
int[] output = new int[n];
int[] count = new int[10]; // Digits 0–9
// Count occurrences of each digit at this position
for (int i = 0; i < n; i++) {
int digit = (arr[i] / exp) % 10;
count[digit]++;
}
// Prefix sums
for (int i = 1; i < 10; i++) {
count[i] += count[i - 1];
}
// Build output array (backwards for stability)
for (int i = n - 1; i >= 0; i--) {
int digit = (arr[i] / exp) % 10;
output[count[digit] - 1] = arr[i];
count[digit]--;
}
// Copy output back to arr
System.arraycopy(output, 0, arr, 0, n);
}
The expression (arr[i] / exp) % 10 extracts the digit at position exp:
exp = 1: ones digit.exp = 10: tens digit (divide by 10, take mod 10).exp = 100: hundreds digit, and so on.
Visual Walkthrough
Input: [170, 45, 75, 90, 802, 24, 2, 66]
Maximum value: 802 → 3 digits → 3 passes.
Pass 1 — Sort by Ones Digit
Extract ones: [0, 5, 5, 0, 2, 4, 2, 6]
After stable counting sort by ones digit:
[170, 90, 802, 2, 24, 45, 75, 66]
0 0 2 2 4 5 5 6
Elements with ones digit 0 come first (170, 90 in original order — stability), then digit 2 elements (802, 2), etc.
Pass 2 — Sort by Tens Digit
Extract tens: [7, 9, 0, 0, 2, 4, 7, 6]
After stable counting sort by tens digit:
[802, 2, 24, 45, 66, 170, 75, 90]
0 0 2 4 6 7 7 9
802 and 2 both have tens digit 0. Stability preserves their order from Pass 1 (802 came before 2). Similarly, 170 and 75 both have tens digit 7 — stability keeps 170 before 75.
Pass 3 — Sort by Hundreds Digit
Extract hundreds: [8, 0, 0, 0, 0, 1, 0, 0]
After stable counting sort by hundreds digit:
[2, 24, 45, 66, 75, 90, 170, 802]
0 0 0 0 0 0 1 8
All elements with hundreds digit 0 appear first, in the order established by Pass 2. The array is now fully sorted.
Analysis
Time Complexity: O(d × (n + k))
Each pass runs counting sort in O(n + k) time, where k is the base (number of possible digit values). With d digit positions:
$$T(n) = O(d \times (n + k))$$
For base 10 with d-digit numbers: O(d × (n + 10)) = O(d × n).
For 32-bit integers using base 256 (byte-level processing):
- k = 256 (byte values 0–255)
- d = 4 (a 32-bit integer has 4 bytes)
- Time: O(4 × (n + 256)) = O(n)
That’s constant-time per element — radix sort provides true linear-time sorting for fixed-width integers.
Formal Comparison with Comparison Sorts
For n integers with max value M:
- d = ⌈log_k(M)⌉ digits in base k
- Radix sort: O(d × (n + k)) = O(n × log_k(M) + k × log_k(M))
Choosing k = n: O(n × log_n(M)). When M = n^c (polynomial in n), this becomes O(c × n) = O(n).
Compare with O(n log n) for comparison sorts. Radix sort wins when d is small — many elements with few digits.
Space Complexity: O(n + k)
The counting sort subroutine requires an output array of size n and a count array of size k. These are reused across passes, so total auxiliary space is O(n + k).
Stability: Yes
Radix sort inherits stability from counting sort. It’s crucial that this property holds, because the correctness of LSD radix sort depends on each pass preserving the ordering established by earlier passes.
Base Selection: A Critical Trade-Off
The choice of base k controls the trade-off between number of passes and counting array size:
| Base | Digits (32-bit int) | Count Array Size | Passes |
|---|---|---|---|
| 10 | 10 | 10 | 10 |
| 16 | 8 | 16 | 8 |
| 256 | 4 | 256 | 4 |
| 65,536 | 2 | 65,536 | 2 |
Base 256 (byte-level) is the sweet spot for 32-bit integers:
- 4 passes is few enough to be fast.
- Count array of 256 fits comfortably in L1 cache.
- Digit extraction uses bitwise operations:
(arr[i] >> (8 * pass)) & 0xFF.
Base 65,536 (word-level) cuts passes to 2 but needs a 65K count array. Still practical on modern machines.
Base 10 is the worst choice for performance — 10 passes for a 32-bit integer — but it’s the clearest for teaching and interviews.
Byte-level radix sort in Java:
void radixSort256(int[] arr) {
int n = arr.length;
int[] output = new int[n];
for (int shift = 0; shift < 32; shift += 8) {
int[] count = new int[256];
// Count
for (int val : arr) {
int digit = (val >> shift) & 0xFF;
count[digit]++;
}
// Prefix sums
for (int i = 1; i < 256; i++) {
count[i] += count[i - 1];
}
// Place (backwards for stability)
for (int i = n - 1; i >= 0; i--) {
int digit = (arr[i] >> shift) & 0xFF;
output[count[digit] - 1] = arr[i];
count[digit]--;
}
System.arraycopy(output, 0, arr, 0, n);
}
}
The shift-and-mask approach (val >> shift) & 0xFF is faster than division and modulo, and it naturally processes one byte per pass.
MSD (Most Significant Digit) Radix Sort
MSD radix sort processes digits from left to right and sorts recursively within each bucket:
- Sort all elements by the most significant digit into buckets.
- Recursively sort each bucket by the next digit.
- Concatenate buckets.
void msdRadixSort(int[] arr, int lo, int hi, int exp) {
if (lo >= hi || exp <= 0) return;
int[] count = new int[12]; // 10 digits + 2 for boundaries
int[] output = new int[hi - lo + 1];
// Count occurrences
for (int i = lo; i <= hi; i++) {
int digit = (arr[i] / exp) % 10;
count[digit + 2]++;
}
// Prefix sums
for (int i = 2; i < 12; i++) {
count[i] += count[i - 1];
}
// Distribute
for (int i = lo; i <= hi; i++) {
int digit = (arr[i] / exp) % 10;
output[count[digit + 1]++] = arr[i];
}
System.arraycopy(output, 0, arr, lo, hi - lo + 1);
// Recursively sort each bucket
for (int d = 0; d < 10; d++) {
msdRadixSort(arr, lo + count[d], lo + count[d + 1] - 1, exp / 10);
}
}
MSD advantages:
- Short-circuits: single-element buckets don’t need further sorting.
- Better for strings: lexicographic ordering naturally follows left-to-right digit processing.
- Can stop early if all remaining elements in a bucket share the same prefix.
MSD disadvantages:
- More complex implementation (recursive, bucket management).
- Poor cache performance due to recursive bucket subdivision.
- LSD is simpler and faster for fixed-width integers.
Handling Negative Numbers
Standard radix sort treats numbers as unsigned. To handle negatives:
void radixSortWithNegatives(int[] arr) {
// Separate negatives and positives
int negCount = 0;
for (int val : arr) {
if (val < 0) negCount++;
}
int[] negatives = new int[negCount];
int[] positives = new int[arr.length - negCount];
int ni = 0, pi = 0;
for (int val : arr) {
if (val < 0) negatives[ni++] = -val; // Store absolute values
else positives[pi++] = val;
}
// Sort both halves
if (negatives.length > 0) radixSort(negatives);
if (positives.length > 0) radixSort(positives);
// Merge: reversed negatives, then positives
int idx = 0;
for (int i = negatives.length - 1; i >= 0; i--) {
arr[idx++] = -negatives[i]; // Reverse and negate back
}
for (int val : positives) {
arr[idx++] = val;
}
}
The negatives, when sorted by absolute value, come out in ascending order of magnitude. Reversing them and negating produces descending negative values, which is the correct sorted order.
When Radix Sort Beats Comparison Sorts
Radix sort outperforms comparison sorts in specific scenarios:
Large n, small d: Sorting 10 million 32-bit integers? Radix sort with base 256 does 4 passes of O(n) work = O(4n). Quicksort does O(n log n) ≈ O(23n). Radix sort wins by ~6×.
Fixed-width integers: 32-bit or 64-bit integers have a bounded number of digits. The d factor becomes a constant, making radix sort genuinely O(n).
Predictable performance: No worst-case blowup like quicksort. No data dependency. Every input takes the same amount of time. This predictability matters in systems with real-time constraints.
When it doesn’t win: If n is small (fewer than ~1000 elements), the constant factors in radix sort (multiple passes, counting array setup) outweigh the O(n log n) cost of comparison sorts. Quicksort with its cache-friendly access pattern is faster for small to medium arrays.
Also, radix sort works poorly when values span an extreme range with few elements. Sorting 100 elements where the maximum is 10^18 requires d = 19 passes in base 10 — far worse than quicksort’s O(100 × 7) ≈ 700 comparisons.
Radix Sort in Practice
Java’s standard library doesn’t include radix sort. But it appears in:
- Database engines: sorting integer columns with known bounds.
- Graphics pipelines: sorting depth values for rendering order (fixed-point 24-bit depth → 3 byte-level passes).
- Suffix arrays: string processing algorithms use MSD radix sort for lexicographic ordering.
- Network packet processing: sorting packets by integer fields (IP addresses, port numbers).
Interviewer Tips
- Lead with the insight: radix sort exploits the structure of integers (fixed number of digits) to avoid comparisons entirely. Frame it as “sorting by position, not by comparison.”
- Explain the stability requirement: when asked “why LSD?” answer that each pass must be stable so that digit-level orderings compose correctly. This is the core correctness argument.
- Know the complexity precisely: state O(d × (n + k)), then specialize. For 32-bit integers with base 256: O(4 × (n + 256)) = O(n). Interviewers want to see you manipulate the parameters.
- Draw the walkthrough: the three-pass example with
[170, 45, 75, 90, 802, 24, 2, 66]is compact enough to trace on a whiteboard. Showing each pass builds confidence that you understand the mechanics. - Compare honestly: radix sort isn’t always faster. When asked “is radix sort always better than quicksort?” answer no — it depends on n, d, k, and cache effects. This nuance demonstrates real understanding beyond textbook recitation.
- Mention the base trade-off: bring up base 256 vs. base 10 without being prompted. It shows you’ve thought about practical performance, not only asymptotic behavior.