Scheduling and Node Placement

When you create a Pod, you rarely think about which node it lands on. The Kubernetes scheduler picks a node automatically, balancing resource availability, constraints, and preferences across the cluster. For most workloads, this default behavior is exactly right. But some workloads need more control.

A machine learning Pod needs a node with a GPU. A database Pod should avoid sharing a node with another database instance. An API Pod should land near its cache Pod to minimize network latency. A compliance-sensitive workload must run only in a specific availability zone. Default placement doesn’t guarantee any of these outcomes; they need explicit scheduling rules.

How the Scheduler Decides

The Kubernetes scheduler runs a two-phase algorithm for every unscheduled Pod:

Phase 1: Filtering. The scheduler eliminates nodes that can’t run the Pod. A node is filtered out if its unallocated CPU or memory is smaller than the Pod’s resource requests, if its labels don’t satisfy the Pod’s nodeSelector, if it has a NoSchedule or NoExecute taint the Pod doesn’t tolerate, or if the Pod’s required affinity rules exclude it. After filtering, only feasible nodes remain. If none remain, the Pod stays Pending until a node becomes feasible.

Phase 2: Scoring. Each feasible node receives a score based on multiple factors: how well the node’s resources match the Pod’s requests, whether the Pod’s preferred affinity rules favor it, how balanced the node’s existing workload is, and other scoring plugins. The node with the highest score wins. If multiple nodes tie, the scheduler picks one at random.

The key insight is that filtering is binary (pass or fail) while scoring is a gradient (better or worse). This maps directly to the Kubernetes API: hard rules like requiredDuringSchedulingIgnoredDuringExecution are filtering constraints — they eliminate nodes. Soft rules like preferredDuringSchedulingIgnoredDuringExecution are scoring preferences — they influence the ranking without eliminating options.
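The two phases can be sketched in a few lines of Python. This is a toy model, not the scheduler's real implementation: the Node and Pod classes, the label names, and the single CPU check are assumptions made for illustration, and the real scheduler evaluates many more filters and scores.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int  # unallocated CPU in millicores (hypothetical field)
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    cpu_request_m: int
    required_labels: dict = field(default_factory=dict)   # hard rule
    preferred_labels: dict = field(default_factory=dict)  # (key, value) -> weight


def feasible(pod, node):
    """Filtering: a binary pass/fail check."""
    if node.free_cpu_m < pod.cpu_request_m:
        return False
    return all(node.labels.get(k) == v for k, v in pod.required_labels.items())


def score(pod, node):
    """Scoring: ranks a node but never eliminates it."""
    return sum(
        weight
        for (key, value), weight in pod.preferred_labels.items()
        if node.labels.get(key) == value
    )


def schedule(pod, nodes, rng=random):
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # no feasible node: the Pod would stay Pending
    best = max(score(pod, n) for n in candidates)
    # Ties between top-scoring nodes are broken at random.
    return rng.choice([n for n in candidates if score(pod, n) == best])


nodes = [
    Node("node-a", 500, {"disk": "ssd", "zone": "z1"}),   # too little CPU
    Node("node-b", 4000, {"disk": "hdd", "zone": "z1"}),  # fails the hard rule
    Node("node-c", 4000, {"disk": "ssd", "zone": "z2"}),  # feasible, score 0
]
pod = Pod(1000, required_labels={"disk": "ssd"},
          preferred_labels={("zone", "z1"): 10})
chosen = schedule(pod, nodes)
```

Here node-c is chosen even though it doesn't satisfy the zone preference: the preference only affects ranking, and node-c is the only node that survives filtering.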

The Tools for Controlling Placement

Kubernetes provides several mechanisms for influencing where Pods land, each suited to different use cases:

nodeSelector is the most straightforward tool. It’s an exact key-value label match: schedule this Pod only on nodes that carry every listed label. No complex expressions, no weights; if all the labels match, the node passes the filter.

nodeAffinity is the expressive version of nodeSelector. It supports operators like In, NotIn, Exists, DoesNotExist, and Gt/Lt, plus both hard requirements and soft preferences with weighted scoring.
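To make the operators concrete, here is a rough Python model of how a match expression might be evaluated against a node's labels. The function names and the example.com/gpu-count label are hypothetical. The sketch assumes that expressions within one term are ANDed, that multiple terms are ORed, and that Gt and Lt compare the label value as an integer against a single value.

```python
def matches(expr, labels):
    """Evaluate one match expression against a node's label dict."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        if not present:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False  # non-integer label value or missing bound
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {op}")


def term_matches(term, labels):
    """All expressions in one term must hold (AND)."""
    return all(matches(e, labels) for e in term)


def node_matches(terms, labels):
    """Any one term is enough (OR)."""
    return any(term_matches(t, labels) for t in terms)


labels = {"example.com/gpu-count": "4", "topology.kubernetes.io/zone": "zone-a"}
terms = [
    [{"key": "example.com/gpu-count", "operator": "Gt", "values": ["2"]},
     {"key": "topology.kubernetes.io/zone", "operator": "In",
      "values": ["zone-a", "zone-b"]}],
]
ok = node_matches(terms, labels)
```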

podAffinity and podAntiAffinity control placement relative to other Pods, not nodes. “Schedule this Pod on the same node as Pods labeled app=cache” is pod affinity. “Don’t schedule this Pod on any node that already has a Pod labeled app=db” is pod anti-affinity. Each rule names a topologyKey, so “same” can mean the same node, the same zone, or any other domain defined by a node label.
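The following simplified sketch shows how the topology key changes what "together" and "apart" mean for inter-Pod rules. The data layout and function names are assumptions for illustration, every node is assumed to carry the topology label, and edge cases (such as the first Pod of a group) are ignored.

```python
HOSTNAME = "kubernetes.io/hostname"
ZONE = "topology.kubernetes.io/zone"


def occupied_domains(nodes, pods, selector, topology_key):
    """Topology domains that already host a Pod matching the selector."""
    labels_by_node = {n["name"]: n["labels"] for n in nodes}
    return {
        labels_by_node[p["node"]][topology_key]
        for p in pods
        if all(p["labels"].get(k) == v for k, v in selector.items())
    }


def candidates(nodes, pods, selector, topology_key, anti=False):
    """Nodes allowed by a required affinity (or anti-affinity) rule."""
    occupied = occupied_domains(nodes, pods, selector, topology_key)
    return [
        n["name"]
        for n in nodes
        if (n["labels"][topology_key] in occupied) != anti
    ]


nodes = [
    {"name": "n1", "labels": {HOSTNAME: "n1", ZONE: "zone-a"}},
    {"name": "n2", "labels": {HOSTNAME: "n2", ZONE: "zone-a"}},
    {"name": "n3", "labels": {HOSTNAME: "n3", ZONE: "zone-b"}},
]
pods = [{"node": "n1", "labels": {"app": "cache"}}]
cache = {"app": "cache"}

same_node = candidates(nodes, pods, cache, HOSTNAME)         # affinity, per node
same_zone = candidates(nodes, pods, cache, ZONE)             # affinity, per zone
other_zone = candidates(nodes, pods, cache, ZONE, anti=True) # anti-affinity, per zone
```

With the hostname key, only n1 qualifies for affinity. With the zone key, n1 and n2 both qualify, and the anti-affinity rule leaves only n3.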

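Taints and tolerations work in reverse. Instead of Pods choosing nodes, nodes repel Pods. A taint on a node says “no Pods allowed unless they explicitly tolerate me.” This is how Kubernetes keeps workloads off control plane nodes and reserves specialized hardware for specific teams. A toleration only permits placement; it doesn’t attract the Pod to the tainted node, so reserving hardware usually pairs a taint with a nodeSelector or nodeAffinity rule.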
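As a rough illustration of the matching logic, the sketch below checks whether a Pod's tolerations cover a node's taints. The function names and the example.com/gpu taint are hypothetical. The sketch assumes that an omitted operator means Equal, that an omitted effect or key in a toleration matches any effect or key, and that only NoSchedule and NoExecute taints act as hard filters.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(toleration, taint):
    """True if a single toleration covers a single taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True  # any value is tolerated
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def passes_taint_filter(tolerations, taints):
    """Every hard taint on the node needs at least one matching toleration."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint["effect"] in HARD_EFFECTS
    )


control_plane = [{"key": "node-role.kubernetes.io/control-plane",
                  "effect": "NoSchedule"}]
gpu_node = [{"key": "example.com/gpu", "value": "true", "effect": "NoSchedule"}]

gpu_toleration = [{"key": "example.com/gpu", "operator": "Equal",
                   "value": "true", "effect": "NoSchedule"}]

plain_pod_on_control_plane = passes_taint_filter([], control_plane)
gpu_pod_on_gpu_node = passes_taint_filter(gpu_toleration, gpu_node)
gpu_pod_on_control_plane = passes_taint_filter(gpu_toleration, control_plane)
```

Note that the GPU toleration does nothing to steer the Pod toward the GPU node: an untainted node passes this filter for every Pod.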

Topology spread constraints distribute Pods evenly across failure domains such as nodes, zones, or regions, preventing replicas from concentrating on a single host or in a single zone.
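The "evenly" part is expressed as a maximum allowed skew between domains. The sketch below models that check under stated assumptions: the constraint is a hard one, the incoming Pod counts toward its own selector, and counts holds the number of matching Pods already running in each domain. The function name and zone names are hypothetical.

```python
def allowed_domains(counts, max_skew=1):
    """Domains where one more Pod keeps the skew within max_skew.

    Skew is measured as (Pods in the domain after placement) minus
    (the smallest current Pod count across all domains).
    """
    lowest = min(counts.values())
    return sorted(
        domain
        for domain, count in counts.items()
        if count + 1 - lowest <= max_skew
    )


counts = {"zone-a": 2, "zone-b": 1, "zone-c": 1}
next_placement = allowed_domains(counts, max_skew=1)
```

With these counts, zone-a is excluded because a third Pod there would put it two ahead of the emptiest zone, while zone-b and zone-c remain eligible.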

Chapter Structure

The first section covers node selection and affinity rules: how the scheduler’s two-phase process works, direct node assignment with nodeName, simple matching with nodeSelector, expressive rules with nodeAffinity, and inter-Pod placement with podAffinity and podAntiAffinity.

The second section covers taints, tolerations, and topology spread constraints, and concludes with practice exercises that integrate concepts from Chapters 6 through 9.