Mastering CKAD Certified Kubernetes Application Developer

Job Completion Modes and Parallelism

11 min read Chapter 23 of 87
Summary

Covers the mechanics of Kubernetes Jobs: run-to-completion semantics vs long-running Deployments, imperative Job creation, completion counts, parallelism, Indexed and NonIndexed completion modes, backoffLimit, activeDeadlineSeconds, restartPolicy constraints, TTL-based cleanup, and Job observation commands.

Jobs vs Deployments: Two Different Contracts

A Deployment promises to keep a set of Pods running indefinitely. If a Pod crashes, the Deployment replaces it. If you scale up, the Deployment creates more replicas. The desired state is a count of running Pods, and the controller works continuously to maintain it.

A Job makes a different promise: it ensures a specified number of Pods complete successfully. Once those Pods exit with a zero exit code, the Job reports success and stops creating new Pods. The desired state is not “keep running” — it is “finish the work.”

This distinction matters because the restart behavior is different by design. A Deployment's Pods have a restartPolicy of Always. A Job's Pods must use either Never or OnFailure. A Job whose Pod template sets restartPolicy: Always is rejected by API validation, because a Pod that restarts indefinitely can never be counted as "completed."

Property              Deployment              Job
Purpose               Long-running services   Batch / run-to-completion
restartPolicy         Always                  Never or OnFailure
Completion condition  N/A (runs forever)      N Pods exit successfully
Scaling model         Replicas (concurrent)   Completions + parallelism

When a container in a Deployment Pod exits with code 0, the kubelet restarts it anyway, because restartPolicy: Always treats every exit as a reason to restart. When a Job Pod exits with code 0, the Job controller treats it as a success and counts it toward the completion total. Same exit code, opposite outcome, driven by the restart policy and the controller type.

Creating a Job Imperatively

The fastest way to create a Job on the exam is the imperative command:

kubectl create job my-job --image=busybox -- echo "hello"

This creates a Job named my-job with a single Pod running the busybox image. The -- separator marks the beginning of the container command. Everything after -- becomes the container’s command array.

Check the Job status:

kubectl get jobs

Expected output:

NAME     COMPLETIONS   DURATION   AGE
my-job   1/1           4s         10s

The COMPLETIONS column shows 1/1 — one Pod completed successfully out of one required. The Job is done.
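
In a script, reading the COMPLETIONS column by eye is not an option. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The helper name wait_for_job is hypothetical; it wraps kubectl wait, which blocks until the Job reports the Complete condition or the timeout expires.

```shell
#!/bin/sh
# wait_for_job JOB_NAME [TIMEOUT]
# Blocks until the Job has the Complete condition, or until TIMEOUT
# (default 60s) expires. wait_for_job is a hypothetical helper name.
wait_for_job() {
  kubectl wait --for=condition=complete --timeout="${2:-60s}" "job/$1"
}

# Usage (against a live cluster):
#   wait_for_job my-job && kubectl logs job/my-job
```

Note that waiting on condition=complete does not return early when a Job fails; in that case the command runs until the timeout and exits non-zero.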

To generate YAML without executing:

kubectl create job my-job --image=busybox --dry-run=client -o yaml -- echo "hello"

This scaffold gives you a starting point to add completion counts, parallelism, and failure policies before applying.

Completions: How Many Pods Must Succeed

The .spec.completions field defines the total number of Pods that must exit successfully for the Job to be considered complete. The default is 1.

Setting completions: 5 means the Job creates Pods until five have exited with code 0. If a Pod fails, it doesn’t count — the Job keeps creating new Pods until either five succeed or the failure limit is reached.

apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  completions: 5
  template:
    spec:
      containers:
        - name: processor
          image: busybox
          command: ["sh", "-c", "echo Processing item && sleep 5"]
      restartPolicy: Never

Apply this manifest and watch the Pods:

kubectl apply -f data-processor.yaml
kubectl get pods -w

You’ll see Pods created one at a time (the default parallelism is 1), each completing and being replaced by the next, until five have succeeded.

Parallelism: How Many Pods Run Concurrently

The .spec.parallelism field controls the maximum number of Pods that the Job controller runs simultaneously. The default is 1, meaning Pods execute sequentially — one after another.

Setting parallelism: 3 with completions: 5 tells the Job controller: “run up to 3 Pods at the same time, and keep going until 5 total have succeeded.”

The controller never runs more Pods than the remaining work requires. It keeps up to parallelism Pods active, but for a Job with a fixed completion count the number of active Pods never exceeds the number of remaining completions. With completions: 5 and parallelism: 3, for example, once four Pods have succeeded only one completion remains, so the controller keeps at most one Pod active rather than three.
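
The cap on concurrent Pods can be modeled with a little shell arithmetic. This is only a sketch of the rule for fixed-completion Jobs (active Pods are limited by both parallelism and the completions still outstanding); want_active is a hypothetical helper, not a kubectl command or a controller API.

```shell
#!/bin/sh
# want_active COMPLETIONS SUCCEEDED PARALLELISM
# Models how many Pods a fixed-completion Job keeps active:
# the smaller of parallelism and the completions still outstanding.
want_active() {
  remaining=$(( $1 - $2 ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  if [ "$remaining" -lt "$3" ]; then
    echo "$remaining"
  else
    echo "$3"
  fi
}

want_active 5 0 3   # start of the Job: prints 3
want_active 5 3 3   # three Pods have succeeded: prints 2
want_active 5 4 3   # one completion left: prints 1
```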

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-processor
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo Working on batch && sleep 10"]
      restartPolicy: Never

Apply and observe:

kubectl apply -f parallel-processor.yaml
kubectl get pods -w

You’ll see two Pods running concurrently. As each completes, the controller launches a replacement until the total of five completions is reached. The sequence looks like this:

parallel-processor-abc12   1/1     Running     0          2s
parallel-processor-def34   1/1     Running     0          2s
parallel-processor-abc12   0/1     Completed   0          12s
parallel-processor-ghi56   1/1     Running     0          1s
parallel-processor-def34   0/1     Completed   0          13s
parallel-processor-jkl78   1/1     Running     0          1s
parallel-processor-ghi56   0/1     Completed   0          12s
parallel-processor-mno90   1/1     Running     0          1s
parallel-processor-jkl78   0/1     Completed   0          12s
parallel-processor-mno90   0/1     Completed   0          12s

At most two Pods run at any given time (only one in the final round), for five total completions. The Job finishes and reports 5/5.

Completion Modes: NonIndexed vs Indexed

Every Job has a .spec.completionMode field that defines how completions are tracked. There are two options.

NonIndexed (Default)

In NonIndexed mode, the Job counts successful Pod completions without distinguishing between them. Pod number three is the same as Pod number one — they’re interchangeable workers. The Job is done when the total count of successful exits reaches .spec.completions.

This is the mode you’ll use most often. It maps to the common pattern of “run this task N times, and I don’t care which instance handles which iteration.”

Indexed

In Indexed mode, each Pod receives a unique index from 0 to completions - 1, exposed through the JOB_COMPLETION_INDEX environment variable. The Job tracks which indices have been completed. It’s not enough for five Pods to succeed — specifically Pods with indices 0, 1, 2, 3, and 4 must each succeed.

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      containers:
        - name: worker
          image: busybox
          command:
            - sh
            - -c
            - echo "Processing index $JOB_COMPLETION_INDEX" && sleep 5
      restartPolicy: Never

After applying, check the Pod logs:

kubectl apply -f indexed-job.yaml
kubectl logs -l job-name=indexed-job

Expected output (order may vary):

Processing index 0
Processing index 1
Processing index 2

Each Pod knows its own index. This allows workloads to partition data — index 0 processes records A through M, index 1 processes N through Z, and so on. The Job won’t mark as complete until every index has a successful Pod.
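
One way a worker might turn its index into a slice of the data is sketched below. The helper index_range is hypothetical, and the values 3 (completions) and 26 (items) are assumptions chosen for illustration; a real workload would supply its own partitioning scheme.

```shell
#!/bin/sh
# index_range INDEX COMPLETIONS TOTAL_ITEMS
# Prints "START END" (END exclusive) for the contiguous slice of items
# that the given completion index should process.
index_range() {
  start=$(( $1 * $3 / $2 ))
  end=$(( ($1 + 1) * $3 / $2 ))
  echo "$start $end"
}

# Inside an Indexed Job Pod, JOB_COMPLETION_INDEX is set for the container.
# Default to 0 here so the sketch also runs outside a cluster.
index="${JOB_COMPLETION_INDEX:-0}"
echo "index $index handles items: $(index_range "$index" 3 26)"
```

Because each slice is derived only from the index, a retried Pod for the same index picks up exactly the same items as the Pod it replaces.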

Indexed mode is less common on the CKAD, but understanding it helps distinguish Jobs from Deployments conceptually: replicas are interchangeable, but indexed completions are unique.

Failure Handling: backoffLimit

Not every Pod succeeds: containers crash and commands return non-zero exit codes. (An image that fails to pull is a different case: the Pod stays Pending rather than failing, so it does not consume retries.) The .spec.backoffLimit field controls how many times the Job retries failed Pods before giving up entirely. The default is 6.

Each retry uses exponential backoff: the first retry waits 10 seconds, the second waits 20, then 40, capping at 6 minutes. This prevents a broken Job from hammering the cluster with rapid-fire Pod creations.
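
The schedule described above can be tabulated with a short loop. This is an illustrative model of the 10s, 20s, 40s progression with a 360-second cap, not the controller's actual code; backoff_delay is a hypothetical helper name.

```shell
#!/bin/sh
# backoff_delay RETRY_NUMBER
# Models the schedule: 10s for the first retry, doubling each time,
# capped at 360s (6 minutes).
backoff_delay() {
  delay=10
  n=1
  while [ "$n" -lt "$1" ] && [ "$delay" -lt 360 ]; do
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
  if [ "$delay" -gt 360 ]; then
    delay=360
  fi
  echo "$delay"
}

for retry in 1 2 3 4 5 6 7; do
  echo "retry $retry: wait $(backoff_delay "$retry")s"
done
```

Under this model the cap is first reached on the seventh retry, since the sixth would wait 320 seconds.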

apiVersion: batch/v1
kind: Job
metadata:
  name: fragile-job
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
        - name: might-fail
          image: busybox
          command: ["sh", "-c", "exit 1"]
      restartPolicy: Never

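This Job will never succeed, because the container always exits with code 1. A Job is marked failed once the number of failures exceeds backoffLimit, so with backoffLimit: 3 the initial attempt and all three retries must fail (four failed Pods in total) before the Job's status changes to Failed: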

kubectl get jobs fragile-job

NAME          COMPLETIONS   DURATION   AGE
fragile-job   0/1           45s        45s

kubectl describe job fragile-job | grep -A 5 Conditions

Conditions:
  Type    Status  Reason
  ----    ------  ------
  Failed  True    BackoffLimitExceeded

The distinction between restartPolicy: Never and restartPolicy: OnFailure affects how backoffLimit counts. With Never, each failure creates a new Pod — you’ll see multiple failed Pods. With OnFailure, the same Pod restarts in place — you’ll see one Pod with an increasing restart count. In both cases, backoffLimit tracks the total number of failures.

activeDeadlineSeconds: The Time Budget

Where backoffLimit caps the number of retries, activeDeadlineSeconds caps the total wall-clock time. If the Job hasn’t completed within the deadline, Kubernetes terminates all running Pods and marks the Job as failed — regardless of how many retries remain.

apiVersion: batch/v1
kind: Job
metadata:
  name: time-limited
spec:
  activeDeadlineSeconds: 30
  backoffLimit: 5
  template:
    spec:
      containers:
        - name: slow-task
          image: busybox
          command: ["sh", "-c", "echo Starting && sleep 60"]
      restartPolicy: Never

This Job’s Pod sleeps for 60 seconds, but the Job has a 30-second deadline. After 30 seconds, the Pod is terminated and the Job fails with reason DeadlineExceeded:

kubectl describe job time-limited | grep -A 5 Conditions
Conditions:
  Type    Status  Reason
  ----    ------  ------
  Failed  True    DeadlineExceeded

In practice, activeDeadlineSeconds serves as a safety net. If a batch process hangs or enters an infinite loop, the deadline ensures it doesn’t consume cluster resources indefinitely.

Both backoffLimit and activeDeadlineSeconds can be set together. Whichever triggers first causes the Job to fail.

restartPolicy Constraints

A Job’s Pod template requires restartPolicy set to either Never or OnFailure. The choice between them affects observable behavior:

  • Never: When a container fails, the Pod enters a Failed state and a new Pod is created for the retry. You’ll see multiple Pods listed.
  • OnFailure: When a container fails, the kubelet restarts the container within the same Pod. You’ll see one Pod with an increasing restart count.

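For the CKAD, Never is the safer default. It is easier to debug because each attempt produces a distinct Pod with its own logs. With OnFailure, only the current container and the one immediately before it have retrievable logs (the latter via kubectl logs --previous), and the Pod itself is terminated once the backoff limit is reached, so output from earlier attempts can be lost.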

# With restartPolicy: Never — multiple Pods visible
kubectl get pods -l job-name=fragile-job
NAME                READY   STATUS   RESTARTS   AGE
fragile-job-abc12   0/1     Error    0          30s
fragile-job-def34   0/1     Error    0          20s
fragile-job-ghi56   0/1     Error    0          10s

TTL After Finished: Automatic Cleanup

Completed and failed Jobs linger in the cluster by default. Their Pods remain in Completed or Error state, consuming space in etcd and cluttering kubectl get pods output. The .spec.ttlSecondsAfterFinished field handles cleanup automatically.

apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
spec:
  ttlSecondsAfterFinished: 120
  template:
    spec:
      containers:
        - name: task
          image: busybox
          command: ["echo", "done"]
      restartPolicy: Never

Two minutes after this Job finishes (success or failure), the TTL controller deletes the Job and its Pods. Setting ttlSecondsAfterFinished: 0 deletes immediately after completion.

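This field is particularly useful for Jobs created in bulk, for example by scripts or pipelines, where dozens of finished Jobs can otherwise accumulate over days. Jobs spawned by CronJobs are additionally pruned by the CronJob's own successfulJobsHistoryLimit and failedJobsHistoryLimit settings.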

Observing Jobs

Several commands give visibility into Job execution:

# List all Jobs in the current namespace
kubectl get jobs

# Watch Job progress in real time
kubectl get jobs -w

# Detailed Job information including events
kubectl describe job my-job

# Logs from the Job's Pod (works for single-completion Jobs)
kubectl logs job/my-job

# Logs from a specific Pod in a multi-completion Job
kubectl logs parallel-processor-abc12

# List Pods belonging to a specific Job
kubectl get pods -l job-name=my-job

The kubectl logs job/my-job shorthand works when the Job has created a single Pod. For multi-completion Jobs, use the label selector job-name=<job-name> to find individual Pods and inspect their logs.
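
For a multi-completion Job, the label selector and kubectl logs can be combined into a small loop. This is a sketch that assumes kubectl is configured for the cluster; job_pod_logs is a hypothetical helper name.

```shell
#!/bin/sh
# job_pod_logs JOB_NAME
# Prints the logs of every Pod that carries the job-name=<JOB_NAME> label,
# with a header line per Pod. Requires kubectl and access to a cluster.
job_pod_logs() {
  for pod in $(kubectl get pods -l "job-name=$1" -o name); do
    echo "=== $pod ==="
    kubectl logs "$pod"
  done
}

# Usage (against a live cluster):
#   job_pod_logs parallel-processor
```

The -o name output has the form pod/<pod-name>, which kubectl logs accepts directly, so no string trimming is needed.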

Putting It Together: A Complete Job

Here’s a Job manifest combining the key fields discussed:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-report
spec:
  completions: 5
  parallelism: 2
  backoffLimit: 4
  activeDeadlineSeconds: 300
  ttlSecondsAfterFinished: 600
  template:
    spec:
      containers:
        - name: reporter
          image: busybox
          command:
            - sh
            - -c
            - |
              echo "Generating report segment $HOSTNAME"
              sleep 15
              echo "Report segment complete"
      restartPolicy: Never

This Job processes five completions with two Pods running concurrently. It allows up to four retries before failing, imposes a five-minute deadline on the entire Job, and automatically deletes itself ten minutes after finishing. Each Pod prints its hostname to distinguish its output.

Apply and monitor:

kubectl apply -f batch-report.yaml
kubectl get jobs batch-report -w
NAME           COMPLETIONS   DURATION   AGE
batch-report   0/5           1s         1s
batch-report   1/5           16s        16s
batch-report   2/5           17s        17s
batch-report   3/5           32s        32s
batch-report   4/5           33s        33s
batch-report   5/5           48s        48s

Two Pods complete roughly every 15 seconds, and the fifth completes in the final round. Total duration: about 48 seconds for five completions with parallelism of two — three rounds of parallel execution.
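
The round count is a ceiling division of completions by parallelism, which can be checked with a line of shell arithmetic. This is a rough sketch that assumes every Pod succeeds on its first attempt and takes the same time; estimate_rounds is a hypothetical helper name.

```shell
#!/bin/sh
# estimate_rounds COMPLETIONS PARALLELISM
# Ceiling division: how many waves of Pods are needed.
estimate_rounds() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

rounds=$(estimate_rounds 5 2)
# 15 is the sleep in the batch-report container. Scheduling and container
# start-up overhead are ignored, so real durations run slightly longer.
echo "rounds: $rounds, minimum duration: $(( rounds * 15 ))s"
```

This prints three rounds and a 45-second floor, which is consistent with the roughly 48 seconds observed once per-Pod start-up overhead is included.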

Understanding these fields gives you complete control over batch workloads. The next section covers CronJobs, which take everything from this section and put it on a recurring schedule.