Layer Ordering, Cache Busting, and Build Context Optimization
The Failure
The payments service (Java/Gradle) has a Dockerfile that copies the entire project before running the build:
# FRAGILE: Full project copy before build, cache busted on every change
FROM eclipse-temurin:21-jdk AS build
WORKDIR /app
COPY . .
RUN ./gradlew bootJar --no-daemon
FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /app/build/libs/*.jar app.jar
CMD ["java", "-jar", "app.jar"]
Every source file change, every test file change, every README edit invalidates the COPY . . layer. Gradle re-downloads all dependencies (2 minutes) and recompiles the entire project (3 minutes). The build takes 5 minutes on every push.
With proper layer ordering and Gradle cache mounts, the same build takes 45 seconds when only source code changes.
The Mechanism
Docker processes instructions top to bottom. For each instruction, it checks whether the inputs have changed since the last build. For COPY, “changed” means the checksum of the copied files' contents differs; modification times are not considered. For RUN, “changed” means the command string differs. For every instruction, a rebuilt preceding layer also counts as a change, so one cache miss rebuilds every layer after it.
The optimal ordering principle: instructions with inputs that change least frequently go first. Dependencies change less often than source code. Configuration files change less often than dependencies.
Frequency of change (low → high):
Base image → system packages → dependency files → dependency install → source code → build
Each arrow marks a potential cache boundary. A change at any stage rebuilds that stage and every stage to its right; stages to its left stay cached.
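The cascade can be modeled in a few lines of bash. This is a toy model, not BuildKit's real cache-key format: it assumes each layer's key is a SHA-256 over the parent layer's key, the instruction text, and the content the instruction reads. The helper names `layer_key` and `chain_keys` are hypothetical, and the sketch assumes GNU coreutils `sha256sum`.

```shell
#!/usr/bin/env bash
# Toy model of layer-cache invalidation (illustrative assumption; real
# BuildKit cache keys are computed differently). Requires bash + sha256sum.
set -euo pipefail

# layer_key PARENT_KEY INSTRUCTION INPUT_CONTENT
# Prints a SHA-256 hex digest that depends on all three arguments.
layer_key() {
  printf '%s\n%s\n%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# chain_keys INSTR1 INPUT1 [INSTR2 INPUT2 ...]
# Prints one key per layer; each key is chained to the key before it.
chain_keys() {
  local parent="base-image"
  while [ "$#" -ge 2 ]; do
    parent=$(layer_key "$parent" "$1" "$2")
    echo "$parent"
    shift 2
  done
}

# Build 1 and build 2 differ only in the source code content.
before=$(chain_keys "COPY go.mod go.sum ./" "module v1" \
                    "RUN go mod download"   "" \
                    "COPY . ."              "main.go v1" \
                    "RUN go build"          "")
after=$(chain_keys  "COPY go.mod go.sum ./" "module v1" \
                    "RUN go mod download"   "" \
                    "COPY . ."              "main.go v2" \
                    "RUN go build"          "")

# Compare keys layer by layer: equal keys mean a cache hit.
paste <(echo "$before") <(echo "$after") |
  awk '{ print NR ": " ($1"" == $2"" ? "CACHED" : "rebuild") }'
# 1: CACHED
# 2: CACHED
# 3: rebuild
# 4: rebuild
```

The dependency layers keep their keys because nothing they read changed; the source copy misses, and the build step misses only because its parent key changed, which is exactly why dependency steps belong above the source copy.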
BuildKit Cache Mounts
BuildKit (the default builder in modern Docker) supports cache mounts: persistent directories that survive across builds. A cache mount on the Gradle user home keeps downloaded dependencies and the wrapper distribution between builds, even when the layer itself is rebuilt. The mount's contents are not written into the image.
RUN --mount=type=cache,target=/root/.gradle \
./gradlew bootJar --no-daemon
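The /root/.gradle directory persists across builds, so Gradle finds dependencies already downloaded and skips fetching them. Compiled classes are written to /app/build, which is not part of the mount, so reusing compilation work across builds additionally requires the Gradle build cache (for example, --build-cache), whose entries are stored under /root/.gradle. Incremental builds drop from 5 minutes to under 1 minute.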
The Implementation
Java/Gradle Optimized Dockerfile
# HARDENED: Layer ordering + cache mounts for Java/Gradle
FROM eclipse-temurin:21-jdk@sha256:abc123... AS build
WORKDIR /app
# Layer 1: Gradle wrapper (changes rarely)
COPY gradlew ./
COPY gradle/ gradle/
RUN chmod +x gradlew
# Layer 2: Dependency resolution (changes when build.gradle.kts or settings.gradle.kts changes)
COPY build.gradle.kts settings.gradle.kts ./
RUN --mount=type=cache,target=/root/.gradle \
./gradlew dependencies --no-daemon
# Layer 3: Source code and build (changes on every commit)
COPY src/ src/
RUN --mount=type=cache,target=/root/.gradle \
./gradlew bootJar --no-daemon -x test
# Production stage
FROM eclipse-temurin:21-jre@sha256:def456...
WORKDIR /app
COPY --from=build /app/build/libs/*.jar app.jar
USER 1001
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
Go Optimized Dockerfile
# HARDENED: Go module cache + static binary
FROM golang:1.22@sha256:ghi789... AS build
WORKDIR /app
# Layer 1: Module download (changes when go.mod/go.sum change)
COPY go.mod go.sum ./
RUN go mod download && go mod verify
# Layer 2: Build (changes on source code change)
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /checkout ./cmd/server
# Production stage: distroless for minimal attack surface
FROM gcr.io/distroless/static-debian12@sha256:jkl012...
COPY --from=build /checkout /checkout
USER nonroot:nonroot
EXPOSE 8080
CMD ["/checkout"]
The Go image uses gcr.io/distroless/static-debian12 as the production base. Distroless images contain no shell, no package manager, and no utilities, which keeps the attack surface minimal; the statically linked binary (CGO_ENABLED=0) needs none of them. The checkout service image is 18 MB.
Auditing Build Context Size
# Check what Docker sends as build context
docker build --no-cache --progress=plain . 2>&1 | grep "transferring context"
# Output: transferring context: 2.1MB
# Without .dockerignore, it might be:
# transferring context: 450MB (node_modules, .git, etc.)
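# HARDENED: Build context audit in CI
- name: Audit build context
  run: |
    # Create a temporary Dockerfile that just copies the context
    echo "FROM scratch" > Dockerfile.audit
    echo "COPY . ." >> Dockerfile.audit
    # --progress=plain keeps the "transferring context" lines in CI logs;
    # BuildKit may print several, so keep the last (final) size.
    # [kKMGT] also matches the lowercase "kB" used for small contexts.
    context_size=$(docker build --no-cache --progress=plain -f Dockerfile.audit . 2>&1 | \
      grep "transferring context" | \
      grep -oE '[0-9.]+[kKMGT]?B' | \
      tail -n 1)
    rm Dockerfile.audit
    echo "Build context size: $context_size" >> "$GITHUB_STEP_SUMMARY"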
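The CI step records a human-readable size such as 2.1MB, which is awkward to compare against a threshold. A minimal bash sketch for turning the plain-progress output into a byte count follows. It assumes BuildKit reports sizes in decimal units (kB = 1,000 bytes, MB = 1,000,000 bytes) and that the last "transferring context" line carries the final size; `last_context_size` and `size_to_bytes` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Helpers for turning BuildKit's "transferring context" line into bytes.
# Assumption: sizes use decimal units (kB = 1000 B), as in "2.10MB".
set -euo pipefail

# last_context_size: reads `docker build --progress=plain` output on stdin
# and prints the size from the last "transferring context" line, e.g. 2.10MB.
last_context_size() {
  grep -oE 'transferring context: [0-9.]+[kKMGT]?B' | tail -n 1 | awk '{ print $3 }'
}

# size_to_bytes SIZE: converts "2.1MB", "32.77kB", "512B" to a byte count.
# Exits non-zero for an unrecognized unit.
size_to_bytes() {
  awk -v s="$1" 'BEGIN {
    if (!match(s, /^[0-9.]+/)) exit 1
    num  = substr(s, 1, RLENGTH)
    unit = toupper(substr(s, RLENGTH + 1))
    if      (unit == "B")  mult = 1
    else if (unit == "KB") mult = 1e3
    else if (unit == "MB") mult = 1e6
    else if (unit == "GB") mult = 1e9
    else if (unit == "TB") mult = 1e12
    else exit 1
    printf "%.0f\n", num * mult
  }'
}

# Example CI usage:
#   size=$(docker build --no-cache --progress=plain -f Dockerfile.audit . 2>&1 | last_context_size)
#   bytes=$(size_to_bytes "$size")
```

Rounding with %.0f rather than truncating avoids off-by-one results from floating-point products such as 32.77 × 1000.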
The Gate
Build context size and image size are observability metrics that can become gates. When the context size exceeds a threshold, it usually means .dockerignore is missing an entry. When the image size exceeds a threshold, it usually means dev dependencies leaked into the production stage.
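Turning such a metric into a gate can be a few lines of bash. The sketch below compares a byte count against a limit and returns non-zero when the limit is exceeded, which fails a CI step that runs under `bash -e`. The helper name `enforce_limit`, the image tag, and the limit value are hypothetical and would need tuning per service; `docker image inspect --format '{{.Size}}'` reports the image size in bytes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# enforce_limit LABEL ACTUAL_BYTES MAX_BYTES
# Prints a one-line report; returns 1 when ACTUAL_BYTES exceeds MAX_BYTES.
enforce_limit() {
  local label=$1 actual=$2 max=$3
  if [ "$actual" -gt "$max" ]; then
    echo "FAIL: $label is $actual bytes (limit $max)" >&2
    return 1
  fi
  echo "OK: $label is $actual bytes (limit $max)"
}

# Example CI usage (image tag and limit are hypothetical):
#   image_bytes=$(docker image inspect --format '{{.Size}}' checkout:ci)
#   enforce_limit "checkout image" "$image_bytes" 50000000
```

Starting the limits somewhat above today's measured sizes keeps the gate quiet in normal work while still catching a missing .dockerignore entry or a dev dependency leaking into the production stage.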
The Recovery
When a build is slow due to poor layer ordering:
- Identify which layer is being rebuilt unnecessarily. docker build --progress=plain marks each step served from cache as CACHED; steps that execute print their output followed by a DONE line.
- Reorder to put the expensive, rarely changing layers first.
- Add cache mounts for language-specific caches (Gradle, Maven, pip, npm).
- Verify with two consecutive builds: change a source file and rebuild. Only the source copy and build layers should run.
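The two-build check can be scripted by reading BuildKit's plain progress output. A minimal bash sketch, assuming the plain format in which each step's header looks like `#N [stage i/n] INSTRUCTION`, a cache hit is reported as `#N CACHED`, and a step that runs ends with `#N DONE <seconds>`; this output is meant for humans and may change between BuildKit versions. `summarize_steps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# summarize_steps: reads `docker build --progress=plain` output on stdin and
# prints "CACHED <step>" or "RAN <step>" for each non-internal build step.
summarize_steps() {
  awk '
    # Step header, e.g. "#7 [build 3/7] COPY gradlew ./"; skip [internal] steps.
    /^#[0-9]+ \[/ && !/^#[0-9]+ \[internal\]/ {
      if (!($1 in name)) {
        name[$1] = substr($0, length($1) + 2)
        order[++n] = $1
      }
      next
    }
    # Cache hit for step #N.
    /^#[0-9]+ CACHED$/ { status[$1] = "CACHED"; next }
    # Step #N executed.
    /^#[0-9]+ DONE / { if (status[$1] != "CACHED") status[$1] = "RAN" }
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in status) print status[order[i]], name[order[i]]
    }'
}

# Example: after editing one file under src/, rebuild and list what ran.
#   docker build --progress=plain . 2>&1 | summarize_steps | grep '^RAN'
```

With the optimized Java Dockerfile, the expected result after a source-only change is that only the COPY src/ and bootJar steps report RAN.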