Sieve of Eratosthenes

Prime numbers fuel cryptography, hashing, and a surprising number of interview questions. The naive approach — testing each number individually — is painfully slow. The Sieve of Eratosthenes eliminates composites in bulk, finding all primes up to n in O(n log log n) time. Let’s build it from the ground up.

The Problem

Find all prime numbers from 2 to a given limit n.

Naive Primality Testing

The direct approach: for each number k from 2 to n, check if any integer from 2 to √k divides it evenly.

List<Integer> naivePrimes(int n) {
    var primes = new ArrayList<Integer>();
    for (int k = 2; k <= n; k++) {
        boolean isPrime = true;
        for (int d = 2; d * d <= k; d++) {
            if (k % d == 0) {
                isPrime = false;
                break;
            }
        }
        if (isPrime) primes.add(k);
    }
    return primes;
}

Each primality test costs up to O(√k), giving O(n√n) total. For n = 10 million, that’s roughly 31 billion operations. Far too slow.

The Sieve Algorithm

Instead of testing numbers one at a time, the sieve works by elimination. Start by assuming every number is prime, then systematically cross off multiples of each prime you find.

Create a boolean array isPrime[0..n], initialize all entries to true.
Mark isPrime[0] and isPrime[1] as false.
For each number p starting from 2:
- If isPrime[p] is true, then p is prime.
- Mark all multiples of p as false — they’re composite.
Stop iterating when p > √n (all remaining composites have already been marked).
Collect all indices where isPrime[i] remains true.

List<Integer> sieve(int n) {
    boolean[] isPrime = new boolean[n + 1];
    Arrays.fill(isPrime, true);
    isPrime[0] = false;
    isPrime[1] = false;

    for (int p = 2; p * p <= n; p++) {
        if (isPrime[p]) {
            for (int multiple = p * p; multiple <= n; multiple += p) {
                isPrime[multiple] = false;
            }
        }
    }

    var primes = new ArrayList<Integer>();
    for (int i = 2; i <= n; i++) {
        if (isPrime[i]) primes.add(i);
    }
    return primes;
}

The p² Optimization

Notice the inner loop starts at p * p, not 2 * p. Why? Every multiple of p smaller than p² has a prime factor less than p — and that smaller prime already eliminated it in an earlier iteration.

Example: when p = 5, the multiples 10, 15, and 20 were already crossed off by primes 2 and 3. The first multiple of 5 that hasn’t been handled is 5 × 5 = 25.

This optimization cuts the work significantly without missing any composites.

Memory-Optimized Sieve with BitSet

A boolean[] uses one byte per element. For large n, a BitSet compresses that to one bit per element — an 8× memory savings.

List<Integer> sieveBitSet(int n) {
    var isComposite = new BitSet(n + 1);
    isComposite.set(0);
    isComposite.set(1);

    for (int p = 2; (long) p * p <= n; p++) {
        if (!isComposite.get(p)) {
            for (int multiple = p * p; multiple <= n; multiple += p) {
                isComposite.set(multiple);
            }
        }
    }

    var primes = new ArrayList<Integer>();
    for (int i = 2; i <= n; i++) {
        if (!isComposite.get(i)) primes.add(i);
    }
    return primes;
}

For n = 100 million, the boolean[] version consumes ~100 MB. The BitSet version needs ~12.5 MB.

Complexity Analysis

Time: O(n log log n). The inner loop for prime p performs n/p operations. Summing over all primes up to √n:

$$\sum_{p \leq \sqrt{n}} \frac{n}{p} = n \sum_{p \leq \sqrt{n}} \frac{1}{p} \approx n \cdot \ln(\ln(n))$$

The sum of reciprocals of primes grows as ln(ln(n)) — a result from analytic number theory (Mertens’ theorem). This makes the sieve extraordinarily efficient: log log of a billion is about 3.

Space: O(n) — the boolean array or bitset.

Segmented Sieve

Problem: find all primes in range [L, R] where R can be as large as 10¹² — far too large for a single array.

Solution: sieve in segments of size √R.

Step 1: Find all primes up to √R using the basic sieve. These “small primes” are sufficient to eliminate all composites in any range.

Step 2: Process the range [L, R] in chunks of size √R. For each chunk, use the small primes to mark composites.

List<Long> segmentedSieve(long low, long high) {
    int limit = (int) Math.sqrt(high) + 1;
    // Step 1: find small primes
    List<Integer> smallPrimes = sieve(limit);

    var primes = new ArrayList<Long>();
    // Step 2: process segments
    int segmentSize = Math.max(limit, 1);

    for (long segStart = low; segStart <= high; segStart += segmentSize) {
        long segEnd = Math.min(segStart + segmentSize - 1, high);
        int size = (int) (segEnd - segStart + 1);
        boolean[] isComposite = new boolean[size];

        for (int p : smallPrimes) {
            // Find first multiple of p in [segStart, segEnd]
            long firstMultiple = ((segStart + p - 1) / p) * p;
            if (firstMultiple == p) firstMultiple += p; // skip p itself

            for (long j = firstMultiple; j <= segEnd; j += p) {
                isComposite[(int) (j - segStart)] = true;
            }
        }

        for (int i = 0; i < size; i++) {
            long num = segStart + i;
            if (num >= 2 && !isComposite[i]) {
                primes.add(num);
            }
        }
    }
    return primes;
}

The segmented sieve uses only O(√R) memory regardless of the range size — a massive improvement when R is in the billions.

Applications

Count Primes (LeetCode #204)

Count the number of primes strictly less than n. Direct sieve application:

int countPrimes(int n) {
    if (n <= 2) return 0;
    var isComposite = new BitSet(n);
    int count = 0;
    for (int p = 2; p < n; p++) {
        if (!isComposite.get(p)) {
            count++;
            if ((long) p * p < n) {
                for (int multiple = p * p; multiple < n; multiple += p) {
                    isComposite.set(multiple);
                }
            }
        }
    }
    return count;
}

Smallest Prime Factor (SPF) Sieve

Precompute the smallest prime factor for every number up to n. This turns prime factorization into an O(log k) operation for any number k.

int[] buildSPF(int n) {
    int[] spf = new int[n + 1];
    for (int i = 0; i <= n; i++) spf[i] = i; // initialize to self

    for (int p = 2; (long) p * p <= n; p++) {
        if (spf[p] == p) { // p is prime
            for (int multiple = p * p; multiple <= n; multiple += p) {
                if (spf[multiple] == multiple) {
                    spf[multiple] = p;
                }
            }
        }
    }
    return spf;
}

List<Integer> primeFactors(int k, int[] spf) {
    var factors = new ArrayList<Integer>();
    while (k > 1) {
        factors.add(spf[k]);
        k /= spf[k];
    }
    return factors;
}

Once the SPF array is built, factorizing any number k ≤ n takes O(log k) — because each division by spf[k] at least halves the value.

Euler’s Totient Function

Euler’s totient φ(n) counts integers from 1 to n that are coprime with n. A sieve-like approach computes φ for all numbers up to n simultaneously:

int[] eulerTotient(int n) {
    int[] phi = new int[n + 1];
    for (int i = 0; i <= n; i++) phi[i] = i;

    for (int p = 2; p <= n; p++) {
        if (phi[p] == p) { // p is prime
            for (int multiple = p; multiple <= n; multiple += p) {
                phi[multiple] -= phi[multiple] / p;
            }
        }
    }
    return phi;
}

This exploits the formula φ(n) = n × ∏(1 - 1/p) for each prime factor p of n.

Linear Sieve: O(n)

The standard sieve marks some composites more than once (e.g., 30 gets marked by both 2, 3, and 5). The linear sieve ensures each composite is marked exactly once — by its smallest prime factor. It achieves strict O(n) time, though in practice the constant factor makes it comparable to the standard sieve for most interview-relevant input sizes. Worth knowing it exists, but the O(n log log n) sieve is the one you’ll implement in interviews.

Interviewer Tips

When interviewers ask about primes, they’re typically testing two things:

Optimization awareness: Do you jump straight to trial division, or do you recognize the sieve pattern? Moving from O(n√n) to O(n log log n) demonstrates algorithmic maturity.
Mathematical reasoning: Understanding why you start marking from p², why you stop at √n, and why the complexity is O(n log log n) shows depth beyond memorized code.

Common interview variations: “Count primes up to n” (direct sieve), “Find the kth prime” (sieve + indexing), “Check if a large number is prime” (Miller-Rabin, not sieve — know when the sieve doesn’t apply), “Find prime factors of n” (SPF sieve or trial division up to √n).

The sieve is a building block. Master it, and prime-related problems become routine.