Minimum Spanning Trees

Imagine you need to connect 50 office buildings with fiber optic cable. Each pair of buildings has a different wiring cost. You want every building to reach every other building — directly or through intermediate connections — while spending as little money as possible. This is the Minimum Spanning Tree problem, and it sits at the intersection of graph theory and greedy algorithm design.

MST problems appear in interviews at companies building infrastructure: networking, logistics, and telecommunications. The two classic algorithms — Kruskal’s and Prim’s — each use a different greedy strategy, and knowing both gives you flexibility to pick the right tool for the problem constraints.

What is a Minimum Spanning Tree?

Given a connected, undirected, weighted graph G = (V, E), a spanning tree is a subgraph that includes every vertex and is a tree (connected with no cycles). A spanning tree has exactly V - 1 edges.

A Minimum Spanning Tree (MST) is a spanning tree whose total edge weight is the smallest among all possible spanning trees of G.

Key properties to internalize:

An MST has exactly V - 1 edges for V vertices.
If all edge weights are distinct, the MST is unique. If some weights are equal, multiple valid MSTs can exist.
Adding any non-tree edge to an MST creates exactly one cycle.
Removing the heaviest edge from that cycle produces a new spanning tree with equal or lower total weight.

The Cut Property

The cut property is the theoretical foundation for both Kruskal’s and Prim’s algorithms. Understanding it turns the algorithms from memorized procedures into logical consequences of a provable principle.

Definition: A cut partitions the vertices of a graph into two non-empty sets S and V - S. An edge crosses the cut if one endpoint is in S and the other is in V - S.

Cut Property: For any cut of the graph, the minimum-weight edge crossing that cut belongs to some MST.

Proof sketch: Assume the minimum-weight crossing edge e does not belong to a particular MST T. Add e to T — this creates a cycle. That cycle must contain another edge f that also crosses the same cut (because the tree connected both sides before adding e). Since weight(e) ≤ weight(f), removing f and keeping e produces a spanning tree with equal or lower total weight. Therefore e belongs to some MST.

Both Kruskal’s and Prim’s algorithms repeatedly apply the cut property to greedily build the MST one edge at a time.

Kruskal’s Algorithm

Kruskal’s takes an edge-centric approach: sort all edges by weight, then add edges from lightest to heaviest, skipping any edge that would create a cycle.

Strategy

Sort all edges by weight in ascending order.
Initialize a Union-Find structure with each vertex in its own set.
Iterate through sorted edges. For each edge (u, v):
- If u and v are in different sets, add the edge to the MST and union the two sets.
- If u and v are already in the same set, skip the edge — adding it would create a cycle.
Stop after adding V - 1 edges.

Union-Find (Disjoint Set Union)

Kruskal’s algorithm needs efficient cycle detection. Union-Find provides near-constant-time operations for determining whether two vertices are connected and for merging components.

public class UnionFind {
    private final int[] parent;
    private final int[] rank;

    public UnionFind(int size) {
        parent = new int[size];
        rank = new int[size];
        for (int i = 0; i < size; i++) {
            parent[i] = i;  // Each vertex is its own root
        }
    }

    public int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);  // Path compression
        }
        return parent[x];
    }

    public boolean union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX == rootY) return false;  // Already connected

        // Union by rank: attach shorter tree under taller tree
        if (rank[rootX] < rank[rootY]) {
            parent[rootX] = rootY;
        } else if (rank[rootX] > rank[rootY]) {
            parent[rootY] = rootX;
        } else {
            parent[rootY] = rootX;
            rank[rootX]++;
        }
        return true;
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

Path compression in find() flattens the tree structure so future queries run faster. Union by rank ensures the shorter tree always attaches under the taller one, keeping the tree balanced. Together, these optimizations give O(α(n)) amortized time per operation, where α is the inverse Ackermann function — effectively constant for all practical input sizes.

Kruskal’s Implementation

public record Edge(int u, int v, int weight) implements Comparable<Edge> {
    @Override
    public int compareTo(Edge other) {
        return Integer.compare(this.weight, other.weight);
    }
}

public List<Edge> kruskal(int vertexCount, List<Edge> edges) {
    Collections.sort(edges);  // Sort by weight ascending
    var uf = new UnionFind(vertexCount);
    var mst = new ArrayList<Edge>();

    for (Edge edge : edges) {
        if (uf.union(edge.u(), edge.v())) {
            mst.add(edge);
            if (mst.size() == vertexCount - 1) break;  // MST complete
        }
    }

    if (mst.size() != vertexCount - 1) {
        throw new IllegalArgumentException("Graph is not connected — no spanning tree exists");
    }
    return mst;
}

Time Complexity: O(E log E) for sorting edges, plus O(E × α(V)) for Union-Find operations. Since α(V) is effectively constant, the total is O(E log E), dominated by the sort. Note that O(E log E) = O(E log V) because E ≤ V², so log E ≤ 2 log V.

Space Complexity: O(V) for the Union-Find structure, plus O(E) for the edge list.

Trace Through an Example

Consider a graph with 4 vertices and these edges:

Edges: (0,1,10), (0,2,6), (0,3,5), (1,3,15), (2,3,4)

After sorting by weight: (2,3,4), (0,3,5), (0,2,6), (0,1,10), (1,3,15)

Step	Edge	Weight	Action	MST Edges
1	(2,3)	4	2 and 3 in different sets → add	[(2,3,4)]
2	(0,3)	5	0 and 3 in different sets → add	[(2,3,4), (0,3,5)]
3	(0,2)	6	0 and 2 already connected (via 0→3→2) → skip	—
4	(0,1)	10	0 and 1 in different sets → add	[(2,3,4), (0,3,5), (0,1,10)]

MST total weight: 4 + 5 + 10 = 19. Three edges for 4 vertices — exactly V - 1.

Prim’s Algorithm

Prim’s takes a vertex-centric approach: start from any vertex and grow the MST by always adding the cheapest edge that connects a tree vertex to a non-tree vertex.

Strategy

Start from an arbitrary vertex. Mark it as part of the MST.
Add all edges from the starting vertex to a min-heap (priority queue).
While the MST has fewer than V - 1 edges:
- Extract the minimum-weight edge from the heap.
- If the destination vertex is already in the MST, discard the edge.
- Otherwise, add the edge to the MST and add all edges from the newly included vertex to the heap.

Prim’s Implementation

public record HeapEntry(int weight, int vertex, int from) implements Comparable<HeapEntry> {
    @Override
    public int compareTo(HeapEntry other) {
        return Integer.compare(this.weight, other.weight);
    }
}

public List<Edge> prim(int vertexCount, Map<Integer, List<Edge>> adjacencyList) {
    var mst = new ArrayList<Edge>();
    var inMST = new boolean[vertexCount];
    var minHeap = new PriorityQueue<HeapEntry>();

    // Start from vertex 0
    inMST[0] = true;
    for (Edge edge : adjacencyList.getOrDefault(0, List.of())) {
        minHeap.add(new HeapEntry(edge.weight(), edge.to(), 0));
    }

    while (!minHeap.isEmpty() && mst.size() < vertexCount - 1) {
        HeapEntry entry = minHeap.poll();

        if (inMST[entry.vertex()]) continue;  // Already in MST — skip

        inMST[entry.vertex()] = true;
        mst.add(new Edge(entry.from(), entry.vertex(), entry.weight()));

        for (Edge edge : adjacencyList.getOrDefault(entry.vertex(), List.of())) {
            if (!inMST[edge.to()]) {
                minHeap.add(new HeapEntry(edge.weight(), edge.to(), entry.vertex()));
            }
        }
    }

    if (mst.size() != vertexCount - 1) {
        throw new IllegalArgumentException("Graph is not connected — no spanning tree exists");
    }
    return mst;
}

Time Complexity: Each edge is inserted into the heap at most once: O(E log E) total. Since E ≤ V², this is equivalent to O(E log V) with a binary heap.

Space Complexity: O(V + E) — the boolean array and the heap.

Trace Through the Same Example

Using the same graph: vertices {0, 1, 2, 3}, edges: (0,1,10), (0,2,6), (0,3,5), (1,3,15), (2,3,4).

Start from vertex 0:

Step	Heap Extract	Weight	Action	MST Edges
Init	—	—	Add edges from 0: (5,3,0), (6,2,0), (10,1,0)	[]
1	(5, vertex=3)	5	3 not in MST → add. Push edges from 3: (4,2,3), (15,1,3)	[(0,3,5)]
2	(4, vertex=2)	4	2 not in MST → add. Push edges from 2: (6,0,2) — but 0 in MST	[(0,3,5), (3,2,4)]
3	(6, vertex=2)	6	2 already in MST → skip	—
4	(10, vertex=1)	10	1 not in MST → add. Push edges from 1: (15,3,1) — but 3 in MST	[(0,3,5), (3,2,4), (0,1,10)]

MST total weight: 5 + 4 + 10 = 19. Same result as Kruskal’s — both algorithms find the same MST (guaranteed when edge weights are distinct).

Kruskal’s vs. Prim’s

Aspect	Kruskal’s	Prim’s
Approach	Edge-centric (sort all edges)	Vertex-centric (grow from a vertex)
Data structure	Union-Find	Priority Queue (min-heap)
Time complexity	O(E log E)	O(E log V) with binary heap
Best for	Sparse graphs (E ≈ V)	Dense graphs (E ≈ V²)
Edge list required?	Yes — needs all edges upfront	No — works with adjacency list
Parallelizable?	Sorting is parallelizable	Harder to parallelize
Handles disconnected graphs	Detects if graph is disconnected	Requires connected graph

When to choose Kruskal’s: The graph is sparse, edges arrive as a list, or you already have a Union-Find structure.

When to choose Prim’s: The graph is dense, you have an adjacency list, or the graph is stored as a matrix where iterating neighbors is O(V).

In interviews, default to Kruskal’s for edge-list inputs and Prim’s for adjacency-list or adjacency-matrix inputs.

Second-Best MST

A follow-up question interviewers ask: “Find the second-best MST.”

Approach: The second-best MST differs from the best MST in exactly one edge swap. For each of the V - 1 edges in the MST:

Remove that edge — the MST splits into two components.
Find the lightest non-MST edge that reconnects the two components.
The total weight of this modified tree is a candidate for the second-best MST.

The answer is the minimum total weight among all V - 1 candidates.

Time: O(V²) with preprocessing to find the maximum edge weight on the MST path between any two vertices. This uses LCA (Lowest Common Ancestor) techniques for efficiency, but the O(V²) brute-force approach is sufficient for interview settings.

MST Properties Worth Memorizing

Uniqueness with distinct weights: If all edge weights in the graph are distinct, the MST is unique. Duplicate weights allow multiple valid MSTs.
Cycle property: The heaviest edge in any cycle does NOT belong to any MST (assuming distinct weights). This is the dual of the cut property.
Adding a non-tree edge creates exactly one cycle. That cycle contains the added edge and the unique path in the MST between the edge’s endpoints.
Removing the heaviest edge from that cycle gives a new MST. If the added edge is lighter than the heaviest tree edge in the cycle, you found a better spanning tree — contradicting the MST’s optimality unless the weights were equal.
Negative weights are fine. MST algorithms work correctly with negative edge weights. Unlike shortest path algorithms, MST algorithms never encounter issues with negative weights because they compare edges, not accumulated path costs.

Applications

Network Design

Connect all servers in a data center with minimum total cable length. Each server is a vertex, each possible cable route is a weighted edge. The MST gives the cheapest wiring plan that maintains full connectivity.

Clustering

Build an MST, then remove the K - 1 heaviest edges to produce K clusters. Each resulting connected component is a cluster. Points within a cluster are more similar (lower edge weight) than points in different clusters. This approach, called single-linkage clustering, is used in hierarchical clustering algorithms.

Approximation Algorithms for TSP

The Traveling Salesman Problem (TSP) is NP-hard, but the MST provides a 2-approximation: the MST’s total weight is at most half the optimal TSP tour. Double the MST edges to form an Eulerian circuit, then shortcut to eliminate repeated vertices — the result is at most twice the optimal tour length.

Classic Interview Problems

Min Cost to Connect All Points

Problem: Given n points on a 2D plane, find the minimum cost to connect all points, where the cost between two points is the Manhattan distance.

Insight: This is a complete graph (every pair of points has an edge). With n up to 1000, there are ~500,000 edges. Kruskal’s works, but Prim’s is more efficient here because the graph is dense:

public int minCostConnectPoints(int[][] points) {
    int n = points.length;
    var inMST = new boolean[n];
    var minDist = new int[n];
    Arrays.fill(minDist, Integer.MAX_VALUE);
    minDist[0] = 0;

    int totalCost = 0;
    for (int added = 0; added < n; added++) {
        // Find the nearest vertex not yet in MST
        int u = -1;
        for (int i = 0; i < n; i++) {
            if (!inMST[i] && (u == -1 || minDist[i] < minDist[u])) {
                u = i;
            }
        }

        inMST[u] = true;
        totalCost += minDist[u];

        // Update distances to remaining vertices
        for (int v = 0; v < n; v++) {
            if (!inMST[v]) {
                int dist = Math.abs(points[u][0] - points[v][0])
                         + Math.abs(points[u][1] - points[v][1]);
                minDist[v] = Math.min(minDist[v], dist);
            }
        }
    }
    return totalCost;
}

This is Prim’s without an explicit priority queue — the O(V²) linear scan replaces the heap. For dense graphs where E = O(V²), this is optimal.

Time: O(V²). Space: O(V).

Connecting Cities With Minimum Cost

Problem: Given n cities numbered 1 to n and a list of connections [city1, city2, cost], find the minimum cost to connect all cities. Return -1 if not all cities can be connected.

Insight: Direct application of Kruskal’s algorithm. Sort connections by cost, use Union-Find to add edges greedily:

public int minimumCost(int n, int[][] connections) {
    Arrays.sort(connections, (a, b) -> Integer.compare(a[2], b[2]));
    var uf = new UnionFind(n + 1);  // 1-indexed cities
    int totalCost = 0;
    int edgesAdded = 0;

    for (int[] conn : connections) {
        if (uf.union(conn[0], conn[1])) {
            totalCost += conn[2];
            edgesAdded++;
            if (edgesAdded == n - 1) return totalCost;
        }
    }
    return -1;  // Not all cities connected
}

Time: O(E log E). Space: O(V).

Edge Cases and Interviewer Tips

Disconnected graph: Neither algorithm produces a spanning tree for a disconnected graph. Detect this by checking if the MST contains exactly V - 1 edges after the algorithm completes. If not, report that no MST exists.
Single vertex: The MST of a single vertex is empty — zero edges, zero cost. Handle this as a base case.
All edges same weight: Every spanning tree is an MST. The algorithm still produces a correct result, but the MST is not unique.
Negative edge weights: Both algorithms handle negative weights correctly. Do not confuse MST with shortest paths — negative weights cause problems for Dijkstra’s algorithm but not for MST algorithms.
Duplicate edges: If multiple edges connect the same pair of vertices, the MST selects the lightest one. Your implementation handles this naturally — Union-Find skips redundant edges, and Prim’s discards edges to already-visited vertices.
State your algorithm choice: Tell the interviewer which algorithm you are using and why. “The input is an edge list and the graph is sparse, so Kruskal’s with Union-Find is the right fit” demonstrates engineering judgment.
Know the cut property: If the interviewer asks “why does this greedy approach work?”, explain the cut property in two sentences. This separates you from candidates who memorize algorithms without understanding proofs.