Job and Scheduling Solutions
Step-by-step solutions for Exercises 3-5: creating a parallel Job with completions and parallelism, using nodeSelector for Pod placement, and configuring taints with tolerations.
Solution: Exercise 3 — Parallel Job with Completions
This exercise requires a Job that runs 6 total completions with 3 Pods executing in parallel.
Step 1: Write the Job Manifest
cat > parallel-job.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  completions: 6
  parallelism: 3
  backoffLimit: 4
  template:
    metadata:
      labels:
        job: batch-processor
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c"]
        args:
        - |
          echo "Processing batch item on $(hostname) at $(date)"
          sleep 5
          echo "Done"
EOF
Key fields and their effects:
- completions: 6: The Job is considered done once 6 Pods have completed successfully.
- parallelism: 3: Kubernetes runs up to 3 Pods simultaneously. When one completes, a new one starts until all 6 completions are reached.
- backoffLimit: 4: If a Pod fails, the Job controller retries. Once the number of retries reaches 4, the Job is marked as failed.
- restartPolicy: Never: Failed containers are not restarted within the same Pod. Instead, the Job controller creates a new Pod (up to the backoff limit). Jobs accept only Never or OnFailure; Always is not allowed.
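The interaction between completions and parallelism comes down to a ceiling division. The sketch below is a hypothetical helper (job_waves is not a kubectl command) that assumes every Pod succeeds and takes the same time, which is a simplification of what the Job controller actually does.

```shell
# Hypothetical helper (not part of kubectl): estimates how many "waves" of
# Pods a Job needs, assuming every Pod succeeds and all Pods take equally long.
job_waves() {
  completions=$1
  parallelism=$2
  # Ceiling division: (a + b - 1) / b
  echo $(( (completions + parallelism - 1) / parallelism ))
}

waves=$(job_waves 6 3)
echo "waves: $waves"                       # prints "waves: 2"
echo "minimum seconds: $(( waves * 5 ))"   # 5 is the sleep in the manifest; prints "minimum seconds: 10"
```

The real duration is usually longer than this lower bound, because scheduling, image pulls, and container startup add time to each wave.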
Step 2: Apply and Monitor
kubectl apply -f parallel-job.yaml
Watch the Pods as they execute:
kubectl get pods -l job=batch-processor -w
Expected progression:
NAME READY STATUS RESTARTS AGE
batch-processor-abc12 1/1 Running 0 2s
batch-processor-def34 1/1 Running 0 2s
batch-processor-ghi56 1/1 Running 0 2s
batch-processor-abc12 0/1 Completed 0 7s
batch-processor-jkl78 1/1 Running 0 8s
batch-processor-def34 0/1 Completed 0 7s
batch-processor-mno90 1/1 Running 0 8s
batch-processor-ghi56 0/1 Completed 0 7s
batch-processor-pqr12 1/1 Running 0 8s
Three Pods start simultaneously. As each completes, a new one launches until all 6 have run.
Step 3: Verify Completion
kubectl get job batch-processor
Expected output:
NAME COMPLETIONS DURATION AGE
batch-processor 6/6 14s 20s
The 6/6 confirms all completions succeeded. Check individual Pod logs:
kubectl logs -l job=batch-processor --prefix
Each Pod should show its processing message and “Done”. The --prefix flag prepends the log source (Pod name and container name) to each log line, making it clear which Pod produced which output.
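When the completion check needs to run inside a script, the COMPLETIONS column can be parsed instead of read by eye. The following is a minimal sketch; job_complete is a hypothetical helper name, and it assumes the column layout shown in the expected output above.

```shell
# Hypothetical helper: reads one line of `kubectl get job <name> --no-headers`
# output on stdin and succeeds only when the COMPLETIONS column reads N/N.
job_complete() {
  awk 'NR == 1 { n = split($2, c, "/"); ok = (n == 2 && c[1] == c[2]) }
       END     { exit (ok ? 0 : 1) }'
}

# Sample line copied from the expected output above; with a live cluster,
# pipe in `kubectl get job batch-processor --no-headers` instead.
if printf 'batch-processor 6/6 14s 20s\n' | job_complete; then
  echo "job finished"
fi
```

For simply blocking until the Job is done, the built-in kubectl wait --for=condition=complete job/batch-processor --timeout=60s is the more direct option; the parser above is only useful when the table output is already at hand.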
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Completions stuck at 3/6 | Pod failures exhausted backoffLimit | Inspect the failed Pods: kubectl describe pod <name> and kubectl logs <name> |
| Only 1 Pod runs at a time | parallelism not set or set to 1 | Verify parallelism: 3 is in the Job spec |
| Job shows BackoffLimitExceeded | Command exits non-zero | Fix the container command and recreate the Job |
| Pods remain after Job completes | Expected behavior | Jobs retain completed Pods for log inspection; use ttlSecondsAfterFinished to auto-clean |
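The last table row mentions ttlSecondsAfterFinished. As a sketch of where that field goes, the manifest below is a shortened variant of the exercise Job. The Job name batch-processor-ttl, the file name, and the 120-second value are assumptions chosen for illustration, not part of the exercise.

```shell
cat > parallel-job-ttl.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor-ttl        # hypothetical name, avoids clashing with batch-processor
spec:
  completions: 6
  parallelism: 3
  backoffLimit: 4
  ttlSecondsAfterFinished: 120     # assumed value: remove the Job and its Pods 2 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo Processing; sleep 5; echo Done"]
EOF
```

Apply it with kubectl apply -f parallel-job-ttl.yaml. Once the TTL expires, kubectl get job batch-processor-ttl should report that the Job is not found, so collect any logs you need before then.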
Cleanup
kubectl delete job batch-processor
Solution: Exercise 4 — nodeSelector for Pod Placement
This exercise requires labeling a worker node and creating a Pod that schedules exclusively on that node using nodeSelector.
Step 1: List Nodes and Their Roles
kubectl get nodes --show-labels
In a Kind cluster, you’ll see nodes named like kind-worker and kind-worker2. Identify a worker node to target.
Step 2: Label the Target Node
kubectl label node kind-worker disk=ssd
Expected output:
node/kind-worker labeled
Verify the label:
kubectl get node kind-worker --show-labels | grep disk=ssd
Step 3: Create the Pod with nodeSelector
cat > nodeselector-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
  labels:
    app: ssd-workload
spec:
  nodeSelector:
    disk: ssd
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
EOF
The nodeSelector field tells the scheduler to place this Pod only on nodes that have the label disk=ssd. If no node matches, the Pod stays in Pending indefinitely — the scheduler will not compromise on nodeSelector constraints.
kubectl apply -f nodeselector-pod.yaml
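The matching rule behind nodeSelector is a plain logical AND: every key=value pair in the selector must be present among the node's labels. The sketch below imitates that rule in shell. node_matches is a hypothetical helper, not how the scheduler is implemented, and the label list is a shortened, hand-written example.

```shell
# Hypothetical helper that mimics the nodeSelector rule: every key=value
# pair in the selector must appear among the node's labels (logical AND).
# Both arguments are comma-separated lists, the same shape that
# `kubectl get nodes --show-labels` prints in its LABELS column.
node_matches() (
  labels=",$1,"
  set -f          # no filename expansion while splitting the selector
  IFS=,
  for pair in $2; do
    case "$labels" in
      *",$pair,"*) ;;       # this pair is present, keep checking
      *) exit 1 ;;          # one missing pair rejects the node
    esac
  done
  exit 0
)

# Assumed label set for kind-worker after the labeling step.
node_labels="disk=ssd,kubernetes.io/hostname=kind-worker,kubernetes.io/os=linux"
node_matches "$node_labels" "disk=ssd"  && echo "disk=ssd: schedulable"
node_matches "$node_labels" "disk=nvme" || echo "disk=nvme: stays Pending"
```

The function body runs in a subshell, so the IFS and set -f changes do not leak into the calling shell.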
Step 4: Verify Placement
kubectl get pod ssd-pod -o wide
Expected output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ssd-pod 1/1 Running 0 5s 10.244.1.3 kind-worker <none> <none>
The NODE column must show kind-worker — the node you labeled. If the Pod landed on a different node, the nodeSelector was not applied correctly.
Step 5: Test the Constraint
Create a second Pod targeting a label that no node has:
kubectl run no-match --image=nginx:1.25 --dry-run=client -o yaml | \
kubectl patch --local -f - -p '{"spec":{"nodeSelector":{"disk":"nvme"}}}' --type merge -o yaml | \
kubectl apply -f -
Or write the manifest directly:
cat > no-match-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: no-match
spec:
  nodeSelector:
    disk: nvme
  containers:
  - name: app
    image: nginx:1.25
EOF
kubectl apply -f no-match-pod.yaml
Check the Pod status:
kubectl get pod no-match
Expected output:
NAME READY STATUS RESTARTS AGE
no-match 0/1 Pending 0 10s
The Pod remains Pending because no node carries disk=nvme. Run kubectl describe pod no-match and look at the Events section — you’ll see a message like:
Warning FailedScheduling 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector.
Cleanup
kubectl delete pod ssd-pod no-match
kubectl label node kind-worker disk-
The trailing - removes the disk label from the node.
Solution: Exercise 5 — Taints and Tolerations
This exercise requires tainting a node so that no regular Pod can schedule on it, then creating a Pod with a toleration that allows it past the taint.
Step 1: Taint a Worker Node
kubectl taint nodes kind-worker2 dedicated=special:NoSchedule
Expected output:
node/kind-worker2 tainted
This taint has three parts:
- Key: dedicated
- Value: special
- Effect: NoSchedule. Pods without a matching toleration will not be placed on this node. Existing Pods are unaffected (use NoExecute to evict running Pods).
Verify the taint:
kubectl describe node kind-worker2 | grep -A 3 Taints
Expected output:
Taints: dedicated=special:NoSchedule
Step 2: Prove the Taint Works
Create a Pod without any toleration:
kubectl run taint-test --image=nginx:1.25
If kind-worker2 is the only worker node, this Pod will stay Pending. If other worker nodes exist, the Pod schedules on one of them, avoiding kind-worker2:
kubectl get pod taint-test -o wide
The NODE column should not show kind-worker2.
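To see in advance which nodes a Pod without tolerations could land on, the taint effects of every node can be listed and filtered. This is a sketch under assumptions: untainted_nodes is a hypothetical helper, the sample input is written by hand, and the input shape is assumed to match the custom-columns query named in the comment. It also ignores other scheduling constraints such as resource requests.

```shell
# Hypothetical helper: reads "NAME EFFECTS" lines on stdin and prints the
# names of nodes that carry no NoSchedule or NoExecute taint.
# Assumed input shape: the output of
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,EFFECTS:.spec.taints[*].effect'
untainted_nodes() {
  awk '{
    n = split($2, effects, ",")
    blocked = 0
    for (i = 1; i <= n; i++)
      if (effects[i] == "NoSchedule" || effects[i] == "NoExecute") blocked = 1
    if (!blocked) print $1    # PreferNoSchedule is a soft taint and does not block
  }'
}

# Sample input written by hand to resemble a three-node Kind cluster.
printf '%s\n' \
  'kind-control-plane NoSchedule' \
  'kind-worker <none>' \
  'kind-worker2 NoSchedule' | untainted_nodes
# prints "kind-worker"
```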
Step 3: Create a Pod with the Matching Toleration
cat > toleration-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: special-pod
  labels:
    app: special-workload
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special"
    effect: "NoSchedule"
  nodeSelector:
    kubernetes.io/hostname: kind-worker2
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
EOF
Two fields work together here:
- tolerations: Allows the Pod to schedule on nodes with the dedicated=special:NoSchedule taint. The toleration does not force the Pod onto the tainted node; it only permits it.
- nodeSelector: Forces the Pod onto kind-worker2 specifically. Without this, the Pod could schedule on any node (tainted or not), since tolerations are permissive, not directive.
The operator: "Equal" means the toleration matches only when the taint’s key, value, and effect all match exactly. The alternative, operator: "Exists", matches any taint with the specified key regardless of value; the toleration must then omit value, and its effect (if set) must still match the taint’s effect.
kubectl apply -f toleration-pod.yaml
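The difference between Equal and Exists is easier to see when the matching rule is written out. The sketch below is a simplified imitation in shell. tolerates is a hypothetical helper, and it leaves out the special case of a toleration with an empty key.

```shell
# Hypothetical helper that mirrors the toleration matching rules:
#   tolerates TAINT OPERATOR KEY VALUE EFFECT
# TAINT uses the key=value:effect form; the other arguments are the
# toleration's fields. An empty EFFECT matches every effect.
tolerates() (
  taint=$1 op=$2 key=$3 value=$4 effect=$5
  taint_effect=${taint##*:}          # text after the last colon
  taint_kv=${taint%:*}               # key or key=value
  taint_key=${taint_kv%%=*}
  case "$taint_kv" in
    *=*) taint_value=${taint_kv#*=} ;;
    *)   taint_value= ;;
  esac

  [ -z "$effect" ] || [ "$effect" = "$taint_effect" ] || exit 1
  [ "$key" = "$taint_key" ] || exit 1
  case "$op" in
    Exists) exit 0 ;;                          # value is ignored
    Equal)  [ "$value" = "$taint_value" ] ;;   # value must be identical
    *)      exit 1 ;;
  esac
)

taint="dedicated=special:NoSchedule"
tolerates "$taint" Equal  dedicated special NoSchedule && echo "Equal, same value: tolerated"
tolerates "$taint" Equal  dedicated other   NoSchedule || echo "Equal, other value: blocked"
tolerates "$taint" Exists dedicated ""      NoSchedule && echo "Exists: tolerated"
tolerates "$taint" Exists dedicated ""      NoExecute  || echo "Exists, wrong effect: blocked"
```

All four messages print, which matches the rule of thumb: Exists relaxes the value check only, never the key or effect check.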
Step 4: Verify Placement
kubectl get pod special-pod -o wide
Expected output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
special-pod 1/1 Running 0 4s 10.244.2.5 kind-worker2 <none> <none>
The Pod runs on kind-worker2 because it tolerates the taint and the nodeSelector directs it there.
Step 5: Verify the Taint Still Blocks Other Pods
cat > no-toleration.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: blocked-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: kind-worker2
  containers:
  - name: app
    image: nginx:1.25
EOF
kubectl apply -f no-toleration.yaml
kubectl get pod blocked-pod
Expected output:
NAME READY STATUS RESTARTS AGE
blocked-pod 0/1 Pending 0 5s
The scheduler cannot place this Pod on kind-worker2 (tainted, no toleration) and cannot place it on any other node (nodeSelector restricts to kind-worker2). The Pod is stuck in Pending.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Tolerating Pod still Pending | Toleration key/value/effect mismatch | Compare kubectl describe node taint with Pod tolerations field-by-field |
| Pod schedules on tainted node without toleration | Taint was not applied | Re-run kubectl describe node and verify the Taints line |
| NoExecute evicts running Pods | Wrong effect chosen | Use NoSchedule to block new Pods only; NoExecute evicts existing ones |
| Toleration uses Exists but Pod still blocked | Effect mismatch | Exists ignores value but still requires matching effect |
Cleanup
kubectl delete pod taint-test special-pod blocked-pod
kubectl taint nodes kind-worker2 dedicated=special:NoSchedule-
The trailing - removes the taint. Verify:
kubectl describe node kind-worker2 | grep Taints
Expected output:
Taints: <none>