ship it and sleep

Knative Serving Configuration, Scale-to-Zero, and Cold Start Budget

Chapter 44 of 66

The Failure

The team deployed the report generation service on Knative with default settings. The service used a Java 21 Spring Boot container that took 12 seconds to start. With the defaults, the autoscaler decided to scale to zero once its 60-second stable window saw no traffic, and because scale-to-zero-pod-retention-period was 0s, the last pod was removed within the 30-second scale-to-zero-grace-period after that decision. A vendor would request a report, wait 15 seconds for the cold start, and then request another report after the pod had already been reclaimed, triggering another cold start. The service scaled to zero between every single request.

The fix: increase scale-to-zero-pod-retention-period so that it covers the service’s expected gap between requests, and optimize the container for faster startup.

The Mechanism

Autoscaler Parameters

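| Parameter | Default | Description |
| --- | --- | --- |
| scale-to-zero-grace-period | 30s | Global: upper bound on how long the last pod is kept, after the autoscaler decides to scale to zero, while the activator is placed in the request path |
| scale-to-zero-pod-retention-period | 0s | Per-revision (or global): minimum time the last pod stays up after the autoscaler decides to scale to zero |
| stable-window | 60s | Global, or per-revision via the window annotation: averaging window for scaling decisions; scale-to-zero is decided only after this window sees no traffic |
| target | 100 | Concurrent requests per pod (soft limit) that the autoscaler aims for |
| target-utilization-percentage | 70% | Share of target used as the actual scaling point (with target 100, scale-out starts near 70 concurrent requests per pod) |
| min-scale | 0 | Minimum replicas (0 allows scale-to-zero) |
| max-scale | 0 (unlimited) | Maximum replicas |
| initial-scale | 1 | Replicas a new revision starts with |
| scale-down-delay | 0s | Time load must stay reduced before a scale-down is applied |
| metric | concurrency | Scaling metric for the default KPA autoscaler: concurrency or rps |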
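
As a rough illustration of how target, target-utilization-percentage, min-scale, and max-scale interact, the sketch below assumes the steady-state rule desired pods = ceil(average concurrency / (target × utilization)), clamped to the scale bounds. It ignores panic mode, scale-rate limits, and the activator, accepts only whole-number concurrency, and the function name desired_replicas is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified steady-state model of the Knative Pod Autoscaler (KPA).
# Ignores panic mode, scale-rate limits, and the activator.
# Usage: desired_replicas <concurrency> <target> <util_pct> <min_scale> <max_scale>
# A max_scale of 0 means unlimited, matching the Knative default.
desired_replicas() {
  local concurrency=$1 target=$2 util=$3 min=$4 max=$5
  # Per-pod capacity is target * util / 100; keeping it multiplied by 100
  # lets the whole calculation stay in integer arithmetic.
  local capacity=$(( target * util ))
  if (( capacity <= 0 )); then
    echo "target and utilization must be positive" >&2
    return 1
  fi
  # Integer ceiling of concurrency / (target * util / 100).
  local desired=$(( (concurrency * 100 + capacity - 1) / capacity ))
  if (( desired < min )); then
    desired=$min
  fi
  if (( max > 0 && desired > max )); then
    desired=$max
  fi
  echo "$desired"
}

# Examples (product-import profile: target 10, max-scale 5):
#   desired_replicas 25 10 70 0 5   -> 4
#   desired_replicas 50 10 70 0 5   -> 5 (capped by max-scale)
#   desired_replicas 0 10 70 0 5    -> 0 (scale-to-zero)
```

The effective per-pod target is 7 here (70% of 10), which is why 25 concurrent requests already need four pods rather than three.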

Container Startup Optimization

The cold start budget has four components:

Cold Start = Image Pull + Container Init + App Startup + Readiness Probe

Each can be optimized independently:

| Component | Optimization |
| --- | --- |
| Image pull | Pre-pull images (DaemonSet), use small base images, image caching |
| Container init | Avoid init containers, minimize volume mounts |
| App startup | AOT compilation, lazy loading, fast frameworks (Quarkus, Go) |
| Readiness probe | Short initialDelaySeconds, fast health endpoint |
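
Because the four components add up, it helps to know which one dominates before picking an optimization from the table. A minimal sketch, assuming you already have per-component timings in milliseconds from your own measurements; the function name cold_start_breakdown is hypothetical, and the sample numbers in the example comment are an illustrative split, not measured data.

```bash
#!/usr/bin/env bash
# Sum the four cold start components (in milliseconds) and name the largest.
# Usage: cold_start_breakdown <image_pull_ms> <container_init_ms> <app_startup_ms> <readiness_ms>
cold_start_breakdown() {
  if (( $# != 4 )); then
    echo "usage: cold_start_breakdown <pull_ms> <init_ms> <startup_ms> <readiness_ms>" >&2
    return 2
  fi
  local names=("image-pull" "container-init" "app-startup" "readiness-probe")
  local values=("$@")
  local total=0 largest=0 i
  for i in "${!values[@]}"; do
    total=$(( total + values[i] ))
    # Strictly greater keeps the earliest component on ties.
    if (( values[i] > values[largest] )); then
      largest=$i
    fi
  done
  echo "total=${total}ms largest=${names[largest]}"
}

# Example with a hypothetical split of a 15-second cold start:
#   cold_start_breakdown 2000 300 12000 700
#   -> total=15000ms largest=app-startup
```

For a service shaped like the Java container in The Failure, app startup is the obvious first target; shaving the image pull would barely move the total.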

The Implementation

Service Profile Configurations

# Profile: Low-traffic API (product import, report generation)
# HARDENED: Optimized for infrequent traffic with acceptable cold start
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: product-import
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "5"
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "15m"
        autoscaling.knative.dev/scale-down-delay: "5m"
    spec:
      containerConcurrency: 10
      timeoutSeconds: 300
      containers:
        - image: ghcr.io/acme/product-import:abc123
---
# Profile: Scheduled batch job (nightly reports)
# HARDENED: Scale-to-zero quickly, accept cold start
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: report-generator
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "3"
        autoscaling.knative.dev/target: "1"
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "2m"
    spec:
      containerConcurrency: 1
      timeoutSeconds: 600
      containers:
        - image: ghcr.io/acme/report-generator:abc123
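
To sanity-check a profile against the expected gap between requests, the sketch below estimates the shortest idle time after which the last pod can disappear. It assumes the default 60-second stable window and treats stable window plus retention period as a lower bound; scale-down-delay and the grace period can only add to it. The function names are hypothetical, and only single-unit durations such as 90s, 15m, or 1h are parsed.

```bash
#!/usr/bin/env bash
# Convert a single-unit duration ("90s", "15m", "1h") to seconds.
# Compound forms such as "1m30s" are rejected rather than guessed.
duration_to_seconds() {
  if [[ $1 =~ ^([0-9]+)([smh])$ ]]; then
    local n=$(( 10#${BASH_REMATCH[1]} ))
    case ${BASH_REMATCH[2]} in
      s) echo "$n" ;;
      m) echo $(( n * 60 )) ;;
      h) echo $(( n * 3600 )) ;;
    esac
  else
    echo "unsupported duration: $1" >&2
    return 1
  fi
}

# Lower bound on idle time before the last pod is removed: the stable
# window must drain to zero, then the retention period still applies.
# Usage: idle_floor_seconds <retention> [stable_window]  (window defaults to 60s)
idle_floor_seconds() {
  local retention stable
  retention=$(duration_to_seconds "$1") || return 1
  stable=$(duration_to_seconds "${2:-60s}") || return 1
  echo $(( stable + retention ))
}

# Examples:
#   idle_floor_seconds 15m   -> 960   (product-import)
#   idle_floor_seconds 2m    -> 180   (report-generator)
#   idle_floor_seconds 0s    -> 60    (defaults)
```

A request gap comfortably shorter than this floor should find a warm pod; a longer gap will likely pay the full cold start.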

Measuring Cold Start

# Measure cold start: ensure service is at zero, then time first request
# Step 1: Verify scale-to-zero
kubectl get pods -n production -l serving.knative.dev/service=product-import
# No pods should be listed

# Step 2: Time the first request
time curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
  https://product-import.production.example.com/health

# Step 3: Check pod startup events
kubectl get events -n production --sort-by='.lastTimestamp' \
  --field-selector reason=Started | tail -5
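
Scale-to-zero can take a minute or more after the last request, so a single manual check is easy to get wrong. A sketch that automates steps 1 and 2: it polls until no pods carry the service label, then times one request. The function name measure_cold_start, the 5-second poll interval, and the 600-second default timeout are assumptions.

```bash
#!/usr/bin/env bash
# Wait until a Knative Service has scaled to zero, then time one request.
# Usage: measure_cold_start <service> <namespace> <url> [timeout_seconds]
# Prints "<http_code> <time_total>" from curl; returns 1 if pods never drain.
measure_cold_start() {
  local svc=$1 ns=$2 url=$3 timeout=${4:-600} waited=0
  # With -o name, kubectl writes only pod names to stdout ("No resources
  # found" goes to stderr), so empty output means zero pods remain.
  # Caveat: a kubectl error also yields empty output, so verify cluster
  # access before trusting the result.
  while [ -n "$(kubectl get pods -n "$ns" -l "serving.knative.dev/service=$svc" -o name 2>/dev/null)" ]; do
    if (( waited >= timeout )); then
      echo "pods for $svc still present after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$url"
}

# Example:
#   measure_cold_start product-import production \
#     https://product-import.production.example.com/health
```

Running it several times and keeping the worst result gives a more honest number to compare against the cold start budget than a single sample.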

Multi-Stage Build Optimization

# HARDENED: Multi-stage build for minimal cold start
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server .

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

Revision Management

# List revisions for a Knative Service
kubectl get revisions -n production -l serving.knative.dev/service=product-import

# Pin traffic to a specific revision (rollback)
kubectl patch ksvc product-import -n production --type merge -p '
spec:
  traffic:
    - revisionName: product-import-00005
      percent: 100
'

# Split traffic between revisions (canary)
kubectl patch ksvc product-import -n production --type merge -p '
spec:
  traffic:
    - revisionName: product-import-00005
      percent: 90
    - revisionName: product-import-00006
      percent: 10
'
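
Hand-editing the traffic block for each canary step is error-prone. A sketch of a helper that builds the same merge patch for any split between two named revisions; the function name canary_traffic_patch is hypothetical, and it assumes exactly two revisions share the traffic.

```bash
#!/usr/bin/env bash
# Build a merge-patch body that splits traffic between two named revisions.
# Usage: canary_traffic_patch <stable_revision> <canary_revision> <canary_percent>
canary_traffic_patch() {
  local stable=$1 canary=$2 pct=$3
  if ! [[ $pct =~ ^[0-9]+$ ]] || (( 10#$pct > 100 )); then
    echo "canary percent must be an integer from 0 to 100" >&2
    return 1
  fi
  # Normalize so a value like "010" is read as decimal 10, not octal.
  pct=$(( 10#$pct ))
  printf '{"spec":{"traffic":[{"revisionName":"%s","percent":%d},{"revisionName":"%s","percent":%d}]}}\n' \
    "$stable" $(( 100 - pct )) "$canary" "$pct"
}

# Example: move the canary to 25%.
#   kubectl patch ksvc product-import -n production --type merge \
#     -p "$(canary_traffic_patch product-import-00005 product-import-00006 25)"
```

Because both entries pin revisions by name, a revision created by a later template change receives no traffic until it is added to the list.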

The Gate

Cold start budget is the gate. Define a maximum acceptable cold start time for each service profile. If the cold start exceeds the budget after container optimization, the service should not use scale-to-zero. Set min-scale: 1.

| Service Profile | Cold Start Budget | Action if Exceeded |
| --- | --- | --- |
| Background processor | 30s | Acceptable, no change |
| Internal API | 5s | Optimize container or min-scale: 1 |
| User-facing API | 2s | min-scale: 1 or use standard Deployment |
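
The gate can be checked mechanically once a cold start has been measured. A sketch that mirrors the budget table row for row; the profile keys background, internal, and user-facing and the function name gate_action are hypothetical, and awk is used only because bash arithmetic cannot compare fractional seconds.

```bash
#!/usr/bin/env bash
# Map a measured cold start (seconds, may be fractional) to the gate table.
# Usage: gate_action <background|internal|user-facing> <measured_seconds>
gate_action() {
  local profile=$1 measured=$2 budget action
  case "$profile" in
    background)  budget=30; action="acceptable, no change" ;;
    internal)    budget=5;  action="optimize container or set min-scale: 1" ;;
    user-facing) budget=2;  action="set min-scale: 1 or use a standard Deployment" ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  # awk exits 0 only when measured is strictly greater than the budget.
  if awk -v m="$measured" -v b="$budget" 'BEGIN { exit !(m + 0 > b + 0) }'; then
    echo "over budget (${budget}s): $action"
  else
    echo "within budget (${budget}s)"
  fi
}

# Example:
#   gate_action internal 14.2
#   -> over budget (5s): optimize container or set min-scale: 1
```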

The Recovery

Service keeps scaling to zero between requests: Increase scale-to-zero-pod-retention-period. Set it to 2-3x the expected gap between requests.
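
The 2 to 3 times rule of thumb is simple arithmetic, but writing it down keeps the annotation value consistent across services. A sketch, assuming the expected gap is known in whole seconds; the function name recommend_retention is hypothetical, and it emits the upper end of the range in seconds, a duration format the annotation accepts.

```bash
#!/usr/bin/env bash
# Suggest scale-to-zero-pod-retention-period from the expected request gap,
# using the 2-3x rule of thumb.
# Usage: recommend_retention <gap_seconds>
recommend_retention() {
  local gap=$1
  if ! [[ $gap =~ ^[0-9]+$ ]] || (( 10#$gap == 0 )); then
    echo "gap must be a positive whole number of seconds" >&2
    return 1
  fi
  gap=$(( 10#$gap ))
  echo "retention between $(( gap * 2 ))s and $(( gap * 3 ))s"
  echo "autoscaling.knative.dev/scale-to-zero-pod-retention-period: \"$(( gap * 3 ))s\""
}

# Example: a 5-minute gap.
#   recommend_retention 300
#   -> retention between 600s and 900s
#   -> autoscaling.knative.dev/scale-to-zero-pod-retention-period: "900s"
```

For a 5-minute gap the upper end, 900s, equals the 15m used in the product-import profile.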

Cold start exceeds budget after optimization: Set min-scale: 1. The service keeps one warm pod at all times. You lose scale-to-zero but gain consistent latency.

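Old revisions consume resources: Knative keeps old revisions until its revision garbage collector removes them. Limit retained revisions through the config-gc ConfigMap in the knative-serving namespace (for example, max-non-active-revisions). Alternatively, clean up specific revisions with kubectl delete revision.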
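
Knative's revision garbage collector reads its limits from the config-gc ConfigMap in knative-serving. A sketch of a helper that caps retained non-active revisions; the function name limit_revision_history is hypothetical, and because we assume the controller rejects a minimum above the maximum, it sets min-non-active-revisions and max-non-active-revisions to the same value, so exactly that many recent non-active revisions are kept.

```bash
#!/usr/bin/env bash
# Cap how many non-active revisions Knative's garbage collector keeps.
# Usage: limit_revision_history <count>
limit_revision_history() {
  local n=$1
  if ! [[ $n =~ ^[0-9]+$ ]]; then
    echo "count must be a non-negative integer" >&2
    return 1
  fi
  # Set min and max together (assumption: min must not exceed max).
  kubectl patch configmap config-gc -n knative-serving --type merge \
    -p "{\"data\":{\"min-non-active-revisions\":\"$n\",\"max-non-active-revisions\":\"$n\"}}"
}

# Example:
#   limit_revision_history 5
```

Check the config-gc documentation for your Knative version before applying; the time-based keys in the same ConfigMap also influence when revisions are collected.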