ship it and sleep

Pipeline Architecture: Stages, Jobs, Artifacts, and the Dependency Graph

Chapter 4 of 66

A pipeline is a directed acyclic graph (DAG). Jobs are nodes. Dependencies are edges. The critical path, the chain of dependent jobs with the longest total duration, sets the minimum time the pipeline can take. Any runtime beyond that is parallelism you are leaving on the table.

Most teams start with a pipeline that is a single job with 15 steps, running sequentially. Build, then test, then lint, then scan, then push, then deploy. Total duration: the sum of every step. If the scan takes 4 minutes and the integration tests take 6 minutes, the pipeline takes at least 10 minutes even though those two steps have no dependency on each other.

Figure: Pipeline stage dependency graph showing sequential vs parallel job execution.

The diagram shows two pipeline architectures for the same set of tasks. On the left, a sequential pipeline runs seven steps one after another, totaling 23 minutes. On the right, the same steps are organized as a DAG with parallel branches: unit tests, integration tests, and security scanning run concurrently after the build step, reducing the total duration to 14 minutes. The critical path runs through the build and integration test stages. Every other branch completes while the critical path is still running.

The Failure

The checkout service pipeline runs in a single job. Build the Docker image (3 min), run unit tests (2 min), run integration tests (6 min), run security scan (4 min), push the image (1 min), update the infra repo (1 min). Total: 17 minutes.

A developer pushes a fix for a typo in a log message. They wait 17 minutes for the pipeline to complete. The security scan and integration tests have no dependency on each other, but they run sequentially because the pipeline is a flat list of steps, not a graph.

The team adds a Locust performance test (5 min) and a contract test suite (3 min). The pipeline is now 25 minutes. Developers start batching commits to avoid the wait. Batched commits make rollbacks harder because each deployment contains multiple changes. The pipeline’s structure is causing deployment risk.
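
The arithmetic behind the growing wait can be sketched in a few lines of Python. The dictionary keys are labels chosen for illustration, not real job names; the durations are the ones quoted above. In a single sequential job, the total is simply the sum of every step.

```python
# Durations in minutes, taken from the checkout service example above.
# The keys are illustrative labels, not names from a real workflow file.
original_steps = {
    "build": 3,
    "unit-test": 2,
    "integration-test": 6,
    "scan": 4,
    "push": 1,
    "update-infra": 1,
}
added_steps = {"locust-performance": 5, "contract-test": 3}


def sequential_duration(steps: dict) -> int:
    """A single job runs its steps one after another, so durations add up."""
    return sum(steps.values())


before = sequential_duration(original_steps)
after = sequential_duration({**original_steps, **added_steps})
print(f"before: {before} min, after: {after} min")  # before: 17 min, after: 25 min
```

Every step added to a flat list extends the wait by its full duration, whether or not anything depends on it.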

The Mechanism

GitHub Actions workflows consist of jobs. Each job runs on a separate runner. Jobs can depend on other jobs via the needs: keyword. Jobs with no needs: relationship between them run in parallel by default.
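
To make the scheduling rule concrete, here is a small Python model of how needs: relationships turn into parallel groups. It is a simplification under stated assumptions: the job names are illustrative, and the real scheduler starts each job as soon as its own dependencies finish rather than in lockstep waves.

```python
def parallel_waves(needs: dict) -> list:
    """Group jobs into waves: a job is runnable once everything it needs is done."""
    done = set()
    waves = []
    while len(done) < len(needs):
        ready = sorted(
            job
            for job, deps in needs.items()
            if job not in done and all(dep in done for dep in deps)
        )
        if not ready:
            # Nothing can start: the graph has a cycle or names an unknown job.
            raise ValueError("cycle or unknown dependency in needs")
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical job graph for the checkout service (names are assumptions).
needs = {
    "build": [],
    "unit-test": ["build"],
    "integration-test": ["build"],
    "scan": ["build"],
    "push": ["unit-test", "integration-test", "scan"],
    "update-infra": ["push"],
}

waves = parallel_waves(needs)
print(waves)
# [['build'], ['integration-test', 'scan', 'unit-test'], ['push'], ['update-infra']]
```

The three test-and-scan jobs land in the same wave because none of them needs another; only the declared edges force ordering.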

The key insight: a job is an isolation boundary. Each job gets a fresh runner, a clean filesystem, and no shared state with other jobs unless explicitly passed through artifacts or outputs. This isolation is a feature, not a limitation. It means that test jobs cannot accidentally depend on build artifacts that happen to be in the same working directory. Dependencies must be declared.

                    ┌──────────┐
                    │  build   │
                    └────┬─────┘
                         │
             ┌───────────┼───────────┐
             │           │           │
             v           v           v
        ┌─────────┐ ┌─────────┐ ┌─────────┐
        │  unit   │ │ integr. │ │  scan   │
        │  test   │ │  test   │ │         │
        └────┬────┘ └────┬────┘ └────┬────┘
             │           │           │
             └───────────┼───────────┘
                         │
                         v
                    ┌──────────┐
                    │  push    │
                    └────┬─────┘
                         │
                         v
                   ┌────────────┐
                   │update-infra│
                   └────────────┘

The critical path is: build → integration test → push → update-infra = 3 + 6 + 1 + 1 = 11 minutes. Unit tests (2 min) and scanning (4 min) complete while integration tests are still running. Total pipeline duration dropped from 17 to 11 minutes without removing any work.
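
The same numbers can be checked by computing each job's earliest finish time: a job starts when the slowest job it needs has finished. The sketch below is a minimal model in Python, assuming the durations quoted above and job names matching the diagram; it is not output from a real pipeline run.

```python
# Durations in minutes and dependency edges for the diagram above.
durations = {
    "build": 3,
    "unit-test": 2,
    "integration-test": 6,
    "scan": 4,
    "push": 1,
    "update-infra": 1,
}
needs = {
    "build": [],
    "unit-test": ["build"],
    "integration-test": ["build"],
    "scan": ["build"],
    "push": ["unit-test", "integration-test", "scan"],
    "update-infra": ["push"],
}

_cache = {}


def finish_time(job: str) -> int:
    """Earliest minute at which `job` can finish, given its dependencies."""
    if job not in _cache:
        start = max((finish_time(dep) for dep in needs[job]), default=0)
        _cache[job] = start + durations[job]
    return _cache[job]


finish = {job: finish_time(job) for job in durations}
for job, minute in finish.items():
    print(f"{job:18s} finishes at minute {minute}")
```

In this model unit-test finishes at minute 5 and scan at minute 7, both before integration-test at minute 9, so push cannot start until minute 9 and update-infra finishes at minute 11.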

The Implementation

# FRAGILE: Single job, sequential execution
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t ghcr.io/acme/checkout-service:${{ github.sha }} .
      - name: Unit tests
        run: docker run --rm ghcr.io/acme/checkout-service:${{ github.sha }} ./run-unit-tests.sh
      - name: Integration tests
        run: |
          docker compose -f docker-compose.test.yml up -d
          docker run --rm --network=host ghcr.io/acme/checkout-service:${{ github.sha }} ./run-integration-tests.sh
          docker compose -f docker-compose.test.yml down
      - name: Security scan
        run: trivy image ghcr.io/acme/checkout-service:${{ github.sha }}
      - name: Push image
        run: docker push ghcr.io/acme/checkout-service:${{ github.sha }}
      - name: Update infra repo
        run: |
          git clone https://x-access-token:${{ secrets.INFRA_TOKEN }}@github.com/acme/ecommerce-infra.git
          cd ecommerce-infra
          # ... update and push

# HARDENED: DAG with parallel jobs and explicit artifact passing
name: ci
on:
  push:
    branches: [main]
  pull_request:

env:
  IMAGE: ghcr.io/acme/checkout-service
  REGISTRY: ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ github.sha }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

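  unit-test:
    runs-on: ubuntu-latest
    needs: [build]
    permissions:
      contents: read
      packages: read  # required to pull the image when the GHCR package is private
    steps:
      - uses: actions/checkout@v4
      # Fresh runner: registry credentials from the build job do not carry over.
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Run unit tests
        run: |
          docker run --rm \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-unit-tests.sh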

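  integration-test:
    runs-on: ubuntu-latest
    needs: [build]
    permissions:
      contents: read
      packages: read  # required to pull the image when the GHCR package is private
    steps:
      - uses: actions/checkout@v4
      # Fresh runner: registry credentials from the build job do not carry over.
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Start dependencies
        run: docker compose -f docker-compose.test.yml up -d --wait
      - name: Run integration tests
        run: |
          docker run --rm --network=host \
            ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }} \
            ./run-integration-tests.sh
      - name: Stop dependencies
        if: always()  # tear down even when the tests fail
        run: docker compose -f docker-compose.test.yml down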

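  scan:
    runs-on: ubuntu-latest
    needs: [build]
    permissions:
      packages: read  # required to pull the image when the GHCR package is private
    steps:
      - name: Trivy vulnerability scan
        # Consider pinning to a release tag or commit SHA instead of a mutable branch.
        uses: aquasecurity/trivy-action@master
        env:
          # Registry credentials for Trivy; this job has no docker login of its own.
          TRIVY_USERNAME: ${{ github.actor }}
          TRIVY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        with:
          image-ref: ${{ env.IMAGE }}@${{ needs.build.outputs.image-digest }}
          exit-code: 1
          severity: CRITICAL,HIGH
          format: table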

  push-summary:
    runs-on: ubuntu-latest
    needs: [build, unit-test, integration-test, scan]
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Emit pipeline summary
        run: |
          echo "## Pipeline Complete" >> $GITHUB_STEP_SUMMARY
          echo "All gates passed for \`${{ env.IMAGE }}:${{ github.sha }}\`" >> $GITHUB_STEP_SUMMARY

  update-infra:
    runs-on: ubuntu-latest
    needs: [build, unit-test, integration-test, scan]
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
          path: infra

      - name: Update staging image
        working-directory: infra
        run: |
          cd overlays/staging/checkout
          kustomize edit set image \
            ${{ env.IMAGE }}=${{ env.IMAGE }}:${{ needs.build.outputs.image-tag }}

      - name: Commit and push
        working-directory: infra
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          # A re-run for the same SHA stages no changes; skip the commit instead of failing.
          git diff --cached --quiet || \
            git commit -m "checkout: promote ${{ needs.build.outputs.image-tag }} to staging"
          git push

The Gate

The update-infra job has needs: [build, unit-test, integration-test, scan]. All four jobs must succeed before the infra repo is updated. If any of them fails, GitHub Actions skips update-infra by default. There is no path from a failed scan to a deployed image.
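
The gate behaviour can be modelled in a few lines of Python. This is a simplified sketch of the default rule (a job is skipped when any job it needs did not succeed); it does not model if: conditions such as always() or the branch check on update-infra, and the function name is an assumption made for illustration.

```python
def resolve_statuses(needs: dict, failing: set) -> dict:
    """Return 'success', 'failure', or 'skipped' for every job in the graph."""
    status = {}

    def visit(job: str) -> str:
        if job not in status:
            upstream = [visit(dep) for dep in needs[job]]
            if any(result != "success" for result in upstream):
                status[job] = "skipped"   # a gate did not pass, so never run
            elif job in failing:
                status[job] = "failure"
            else:
                status[job] = "success"
        return status[job]

    for job in needs:
        visit(job)
    return status


# Dependency edges of the hardened workflow above.
needs = {
    "build": [],
    "unit-test": ["build"],
    "integration-test": ["build"],
    "scan": ["build"],
    "push-summary": ["build", "unit-test", "integration-test", "scan"],
    "update-infra": ["build", "unit-test", "integration-test", "scan"],
}

result = resolve_statuses(needs, failing={"scan"})
print(result["scan"], result["update-infra"])  # failure skipped
```

A failure in build propagates further: every downstream job, including the tests, is skipped rather than run against a missing image.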

This is the fundamental value of modeling the pipeline as a DAG: the dependency edges are the gates. Adding a new gate (contract tests, Locust performance tests, SBOM validation) means adding a new job and adding it to the needs: list of the promotion job.
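
The second half of that step is easy to forget: a new job that is not listed in needs: runs, but gates nothing. One way to catch the omission is a small check like the Python sketch below. The dictionary stands in for a parsed workflow file (loading real YAML would need a third-party parser, omitted here), and both the contract-test job and the missing_gates helper are hypothetical.

```python
def missing_gates(workflow: dict, promotion_job: str, gates: set) -> set:
    """Return the gate jobs that the promotion job does not list in its needs."""
    declared = workflow["jobs"][promotion_job].get("needs", [])
    if isinstance(declared, str):  # needs: may be a single job name
        declared = [declared]
    return set(gates) - set(declared)


# Stand-in for the parsed workflow; only the needs edges are modelled.
workflow = {
    "jobs": {
        "build": {},
        "unit-test": {"needs": ["build"]},
        "integration-test": {"needs": ["build"]},
        "scan": {"needs": ["build"]},
        "contract-test": {"needs": ["build"]},  # hypothetical new gate
        "update-infra": {
            "needs": ["build", "unit-test", "integration-test", "scan"],
        },
    }
}
gates = {"unit-test", "integration-test", "scan", "contract-test"}

forgotten = missing_gates(workflow, "update-infra", gates)
print(sorted(forgotten))  # ['contract-test']

# Adding the edge turns the new job into an actual gate.
workflow["jobs"]["update-infra"]["needs"].append("contract-test")
fixed = missing_gates(workflow, "update-infra", gates)
print(sorted(fixed))  # []
```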

The Recovery

When a gate fails on a feature branch, the recovery is fixing the code and pushing again. When a gate fails on main, the recovery is the same, but with more urgency: the main branch has a broken pipeline, and no new code can be promoted until it is fixed.

To prevent this, use branch protection rules that require the CI workflow to pass before merging to main. The pipeline runs on the pull request, all gates pass, the PR is merged, and the pipeline runs again on main. The second run should be identical (reproducibility), but it catches integration issues that only appear when multiple PRs merge close together.

Measuring the Critical Path

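The critical path is the chain of dependent jobs with the largest total duration. To find it, trace every path from the first job to the last and sum the durations. In the hardened workflow the image push happens inside the build job, so the separate push stage from the earlier diagram no longer appears (build is still counted as 3 minutes here):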

Path 1: build (3m) → unit-test (2m) → update-infra (1m) = 6m
Path 2: build (3m) → integration-test (6m) → update-infra (1m) = 10m
Path 3: build (3m) → scan (4m) → update-infra (1m) = 8m

Path 2 is the critical path at 10 minutes. Optimizing unit tests or scanning does not reduce the total pipeline duration. Only shortening a job on the critical path (build, integration-test, or update-infra) does.
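
Tracing paths by hand stops scaling once the graph grows, so the enumeration is worth sketching in code. The Python below walks every path from a job with no dependencies to a job nothing depends on and sums the durations. It assumes the durations listed above and, like the three paths in the text, leaves out the push-summary job.

```python
def enumerate_paths(needs: dict, durations: dict) -> list:
    """Return (path, total_minutes) for every root-to-sink path in the graph."""
    # Invert the needs edges so each job knows which jobs depend on it.
    dependents = {job: [] for job in needs}
    for job, deps in needs.items():
        for dep in deps:
            dependents[dep].append(job)

    paths = []

    def walk(job: str, prefix: list) -> None:
        path = prefix + [job]
        if not dependents[job]:  # sink: nothing waits on this job
            paths.append((path, sum(durations[j] for j in path)))
            return
        for downstream in dependents[job]:
            walk(downstream, path)

    for job, deps in needs.items():
        if not deps:  # root: starts immediately
            walk(job, [])
    return paths


durations = {"build": 3, "unit-test": 2, "integration-test": 6, "scan": 4, "update-infra": 1}
needs = {
    "build": [],
    "unit-test": ["build"],
    "integration-test": ["build"],
    "scan": ["build"],
    "update-infra": ["unit-test", "integration-test", "scan"],
}

paths = enumerate_paths(needs, durations)
for path, total in paths:
    print(" -> ".join(path), f"= {total}m")

critical_path, critical_minutes = max(paths, key=lambda item: item[1])
print("critical:", " -> ".join(critical_path), f"({critical_minutes}m)")
```

Full enumeration is fine for a handful of jobs; in a graph with many branches the number of paths grows quickly, and a longest-path computation over the DAG is the more practical choice.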

This analysis determines where to invest optimization effort. If the team spends a week reducing scan time from 4 minutes to 1 minute, the pipeline still takes 10 minutes. If they spend the same week reducing integration test time from 6 minutes to 3 minutes, the pipeline drops to 8 minutes, not 7: the scan path (3 + 4 + 1) becomes the new critical path.
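
A what-if comparison like this is cheap to automate before committing a week of work. The sketch below recomputes the pipeline duration for each scenario in Python, assuming the durations from the path list above; the scenario names and the pipeline_duration helper are illustrative.

```python
def pipeline_duration(durations: dict, needs: dict) -> int:
    """Length of the critical path: the latest finish time across all jobs."""
    memo = {}

    def finish(job: str) -> int:
        if job not in memo:
            start = max((finish(dep) for dep in needs[job]), default=0)
            memo[job] = start + durations[job]
        return memo[job]

    return max(finish(job) for job in needs)


needs = {
    "build": [],
    "unit-test": ["build"],
    "integration-test": ["build"],
    "scan": ["build"],
    "update-infra": ["unit-test", "integration-test", "scan"],
}
baseline = {"build": 3, "unit-test": 2, "integration-test": 6, "scan": 4, "update-infra": 1}

# Hypothetical optimization scenarios, each changing one job's duration.
scenarios = {
    "baseline": baseline,
    "scan 4m -> 1m": {**baseline, "scan": 1},
    "integration 6m -> 3m": {**baseline, "integration-test": 3},
}

results = {name: pipeline_duration(d, needs) for name, d in scenarios.items()}
for name, minutes in results.items():
    print(f"{name:22s} {minutes} min")
```

The faster scan leaves the total at 10 minutes. Halving the integration tests brings it to 8, at which point the scan path is the longest and becomes the next target.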