Sieve of Eratosthenes
SummaryCovers the classic sieve algorithm for finding all...
Covers the classic sieve algorithm for finding all...
Covers the classic sieve algorithm for finding all primes up to n in O(n log log n), space optimization with bitset, segmented sieve for large ranges, and interview applications.
Sieve of Eratosthenes
Prime numbers fuel cryptography, hashing, and a surprising number of interview questions. The naive approach — testing each number individually — is painfully slow. The Sieve of Eratosthenes eliminates composites in bulk, finding all primes up to n in O(n log log n) time. Let’s build it from the ground up.
The Problem
Find all prime numbers from 2 to a given limit n.
Naive Primality Testing
The direct approach: for each number k from 2 to n, check if any integer from 2 to √k divides it evenly.
List<Integer> naivePrimes(int n) {
var primes = new ArrayList<Integer>();
for (int k = 2; k <= n; k++) {
boolean isPrime = true;
for (int d = 2; d * d <= k; d++) {
if (k % d == 0) {
isPrime = false;
break;
}
}
if (isPrime) primes.add(k);
}
return primes;
}
Each primality test costs up to O(√k), giving O(n√n) total. For n = 10 million, that’s roughly 31 billion operations. Far too slow.
The Sieve Algorithm
Instead of testing numbers one at a time, the sieve works by elimination. Start by assuming every number is prime, then systematically cross off multiples of each prime you find.
- Create a boolean array
isPrime[0..n], initialize all entries totrue. - Mark
isPrime[0]andisPrime[1]asfalse. - For each number
pstarting from 2:- If
isPrime[p]istrue, thenpis prime. - Mark all multiples of
pasfalse— they’re composite.
- If
- Stop iterating when
p > √n(all remaining composites have already been marked). - Collect all indices where
isPrime[i]remainstrue.
List<Integer> sieve(int n) {
boolean[] isPrime = new boolean[n + 1];
Arrays.fill(isPrime, true);
isPrime[0] = false;
isPrime[1] = false;
for (int p = 2; p * p <= n; p++) {
if (isPrime[p]) {
for (int multiple = p * p; multiple <= n; multiple += p) {
isPrime[multiple] = false;
}
}
}
var primes = new ArrayList<Integer>();
for (int i = 2; i <= n; i++) {
if (isPrime[i]) primes.add(i);
}
return primes;
}
The p² Optimization
Notice the inner loop starts at p * p, not 2 * p. Why? Every multiple of p smaller than p² has a prime factor less than p — and that smaller prime already eliminated it in an earlier iteration.
Example: when p = 5, the multiples 10, 15, and 20 were already crossed off by primes 2 and 3. The first multiple of 5 that hasn’t been handled is 5 × 5 = 25.
This optimization cuts the work significantly without missing any composites.
Memory-Optimized Sieve with BitSet
A boolean[] uses one byte per element. For large n, a BitSet compresses that to one bit per element — an 8× memory savings.
List<Integer> sieveBitSet(int n) {
var isComposite = new BitSet(n + 1);
isComposite.set(0);
isComposite.set(1);
for (int p = 2; (long) p * p <= n; p++) {
if (!isComposite.get(p)) {
for (int multiple = p * p; multiple <= n; multiple += p) {
isComposite.set(multiple);
}
}
}
var primes = new ArrayList<Integer>();
for (int i = 2; i <= n; i++) {
if (!isComposite.get(i)) primes.add(i);
}
return primes;
}
For n = 100 million, the boolean[] version consumes ~100 MB. The BitSet version needs ~12.5 MB.
Complexity Analysis
Time: O(n log log n). The inner loop for prime p performs n/p operations. Summing over all primes up to √n:
$$\sum_{p \leq \sqrt{n}} \frac{n}{p} = n \sum_{p \leq \sqrt{n}} \frac{1}{p} \approx n \cdot \ln(\ln(n))$$
The sum of reciprocals of primes grows as ln(ln(n)) — a result from analytic number theory (Mertens’ theorem). This makes the sieve extraordinarily efficient: log log of a billion is about 3.
Space: O(n) — the boolean array or bitset.
Segmented Sieve
Problem: find all primes in range [L, R] where R can be as large as 10¹² — far too large for a single array.
Solution: sieve in segments of size √R.
Step 1: Find all primes up to √R using the basic sieve. These “small primes” are sufficient to eliminate all composites in any range.
Step 2: Process the range [L, R] in chunks of size √R. For each chunk, use the small primes to mark composites.
List<Long> segmentedSieve(long low, long high) {
int limit = (int) Math.sqrt(high) + 1;
// Step 1: find small primes
List<Integer> smallPrimes = sieve(limit);
var primes = new ArrayList<Long>();
// Step 2: process segments
int segmentSize = Math.max(limit, 1);
for (long segStart = low; segStart <= high; segStart += segmentSize) {
long segEnd = Math.min(segStart + segmentSize - 1, high);
int size = (int) (segEnd - segStart + 1);
boolean[] isComposite = new boolean[size];
for (int p : smallPrimes) {
// Find first multiple of p in [segStart, segEnd]
long firstMultiple = ((segStart + p - 1) / p) * p;
if (firstMultiple == p) firstMultiple += p; // skip p itself
for (long j = firstMultiple; j <= segEnd; j += p) {
isComposite[(int) (j - segStart)] = true;
}
}
for (int i = 0; i < size; i++) {
long num = segStart + i;
if (num >= 2 && !isComposite[i]) {
primes.add(num);
}
}
}
return primes;
}
The segmented sieve uses only O(√R) memory regardless of the range size — a massive improvement when R is in the billions.
Applications
Count Primes (LeetCode #204)
Count the number of primes strictly less than n. Direct sieve application:
int countPrimes(int n) {
if (n <= 2) return 0;
var isComposite = new BitSet(n);
int count = 0;
for (int p = 2; p < n; p++) {
if (!isComposite.get(p)) {
count++;
if ((long) p * p < n) {
for (int multiple = p * p; multiple < n; multiple += p) {
isComposite.set(multiple);
}
}
}
}
return count;
}
Smallest Prime Factor (SPF) Sieve
Precompute the smallest prime factor for every number up to n. This turns prime factorization into an O(log k) operation for any number k.
int[] buildSPF(int n) {
int[] spf = new int[n + 1];
for (int i = 0; i <= n; i++) spf[i] = i; // initialize to self
for (int p = 2; (long) p * p <= n; p++) {
if (spf[p] == p) { // p is prime
for (int multiple = p * p; multiple <= n; multiple += p) {
if (spf[multiple] == multiple) {
spf[multiple] = p;
}
}
}
}
return spf;
}
List<Integer> primeFactors(int k, int[] spf) {
var factors = new ArrayList<Integer>();
while (k > 1) {
factors.add(spf[k]);
k /= spf[k];
}
return factors;
}
Once the SPF array is built, factorizing any number k ≤ n takes O(log k) — because each division by spf[k] at least halves the value.
Euler’s Totient Function
Euler’s totient φ(n) counts integers from 1 to n that are coprime with n. A sieve-like approach computes φ for all numbers up to n simultaneously:
int[] eulerTotient(int n) {
int[] phi = new int[n + 1];
for (int i = 0; i <= n; i++) phi[i] = i;
for (int p = 2; p <= n; p++) {
if (phi[p] == p) { // p is prime
for (int multiple = p; multiple <= n; multiple += p) {
phi[multiple] -= phi[multiple] / p;
}
}
}
return phi;
}
This exploits the formula φ(n) = n × ∏(1 - 1/p) for each prime factor p of n.
Linear Sieve: O(n)
The standard sieve marks some composites more than once (e.g., 30 gets marked by both 2, 3, and 5). The linear sieve ensures each composite is marked exactly once — by its smallest prime factor. It achieves strict O(n) time, though in practice the constant factor makes it comparable to the standard sieve for most interview-relevant input sizes. Worth knowing it exists, but the O(n log log n) sieve is the one you’ll implement in interviews.
Interviewer Tips
When interviewers ask about primes, they’re typically testing two things:
-
Optimization awareness: Do you jump straight to trial division, or do you recognize the sieve pattern? Moving from O(n√n) to O(n log log n) demonstrates algorithmic maturity.
-
Mathematical reasoning: Understanding why you start marking from p², why you stop at √n, and why the complexity is O(n log log n) shows depth beyond memorized code.
Common interview variations: “Count primes up to n” (direct sieve), “Find the kth prime” (sieve + indexing), “Check if a large number is prime” (Miller-Rabin, not sieve — know when the sieve doesn’t apply), “Find prime factors of n” (SPF sieve or trial division up to √n).
The sieve is a building block. Master it, and prime-related problems become routine.