ship it and sleep

Environments and Promotion: From Feature Branch to Production Without Guessing

Chapter 19 of 66

A deployment pipeline without defined environments is a script that pushes code somewhere. A deployment pipeline with defined environments is a promotion chain: code moves from dev to staging to production through gates that verify it is safe to promote.

In this chapter, an “environment” is a Kubernetes namespace with its own configuration, its own set of secrets, and a specific version of the application. Dev runs the latest commit on main. Staging runs the version that passed CI and dev validation. Production runs the version that passed staging validation.

[Figure: Environment promotion flow]

The Failure

The payments team deployed to production on a Friday. The deployment succeeded. The service started. Health checks passed. Ten minutes later, the on-call engineer got paged: payments were failing with a connection timeout to the payment processor.

The root cause: the staging environment used a different payment processor endpoint than production. The staging endpoint was a sandbox that accepted any request. The production endpoint required mutual TLS that the team had not configured. The deployment was “tested in staging” but staging did not match production.

Environment parity is not a nice-to-have. It is a prerequisite for trusting your staging validation.

The Mechanism

Environment Hierarchy

| Environment | Purpose | Namespace | Image Source | Config Source | Promotion Trigger |
|---|---|---|---|---|---|
| Dev | Latest code, integration testing | dev | CI build from main | overlays/dev | Automatic on CI pass |
| Staging | Pre-production validation | staging | Promoted from dev | overlays/staging | Automatic on dev validation pass |
| Production | Live traffic | production | Promoted from staging | overlays/production | Manual approval + gate pass |

Each environment is a Kubernetes namespace managed by ArgoCD. The ArgoCD Application for each environment points to the same Kustomize base and applies that environment's overlay.

Promotion Flow

Promotion is not copying files or rebuilding images. Promotion is a Git commit in the infra repo that changes the image tag an environment points to:

  1. CI builds the image and tags it with the commit SHA
  2. CI updates the image tag in overlays/dev/kustomization.yaml → ArgoCD syncs dev
  3. Dev validation passes (smoke tests, health checks)
  4. Pipeline writes the same image tag to overlays/staging/kustomization.yaml → ArgoCD syncs staging
  5. Staging validation passes (contract tests, performance baseline)
  6. Team lead approves production promotion
  7. Pipeline writes the same image tag to overlays/production/kustomization.yaml → ArgoCD syncs production

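The image never changes. The same image, identified by its commit-SHA tag, moves through every environment. Only the configuration changes. Because registry tags can be overwritten, pinning the image by digest gives a stronger guarantee that every environment runs identical bytes.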
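
Steps 1 and 2 of the flow above could look roughly like the following CI workflow in the application repo. This is a minimal sketch, not the pipeline the chapter actually runs: the workflow name, the plain docker commands, and the placeholder bot email are assumptions, and it treats apps/checkout-service/overlays/dev/kustomization.yaml in the infra repo as the file that holds the dev image tag.

```yaml
# Hypothetical CI workflow in the checkout-service application repo
name: build-and-deploy-dev
on:
  push:
    branches: [main]
permissions:
  contents: read
  packages: write   # lets GITHUB_TOKEN push to ghcr.io
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      IMAGE: ghcr.io/acme/checkout-service
    steps:
      - uses: actions/checkout@v4

      - name: Build and push the image, tagged with the commit SHA
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | \
            docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker build -t "$IMAGE:$GITHUB_SHA" .
          docker push "$IMAGE:$GITHUB_SHA"

      # Check out the infra repo next to the application code
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
          path: infra

      - name: Point the dev overlay at the new tag
        working-directory: infra/apps/checkout-service/overlays/dev
        run: |
          kustomize edit set image "checkout-service=$IMAGE:$GITHUB_SHA"
          git config user.name "promotion-bot"
          git config user.email "promotion-bot@example.com"  # placeholder address
          git commit -am "deploy(dev): checkout-service → $GITHUB_SHA"
          git push
```

Once the commit lands on main in the infra repo, ArgoCD picks it up and syncs dev. Nothing in this workflow talks to the cluster directly.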

The Implementation

Infra Repo Structure

ecommerce-infra/
├── apps/
│   ├── checkout-service/
│   │   ├── base/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── kustomization.yaml
│   │   └── overlays/
│   │       ├── dev/
│   │       │   ├── kustomization.yaml
│   │       │   └── patch-replicas.yaml
│   │       ├── staging/
│   │       │   ├── kustomization.yaml
│   │       │   └── patch-replicas.yaml
│   │       └── production/
│   │           ├── kustomization.yaml
│   │           ├── patch-replicas.yaml
│   │           └── patch-resources.yaml
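
The promotion workflow reads `.images[0].newTag` from each overlay's kustomization.yaml, so it helps to see what such a file might contain. The sketch below is an assumption about the staging overlay, not a copy of the real file; the tag value is a made-up commit SHA.

```yaml
# apps/checkout-service/overlays/staging/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
resources:
  - ../../base                  # shared manifests for every environment
patches:
  - path: patch-replicas.yaml   # staging-specific replica count
images:
  - name: checkout-service      # must match the image name used in base/deployment.yaml
    newName: ghcr.io/acme/checkout-service
    newTag: "3f9c2ab"           # hypothetical commit SHA
```

Quoting the tag keeps YAML from reading an all-digit SHA prefix as a number. The `images` entry is the only part of the file that promotion changes.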

Promotion Workflow

# HARDENED: Automated promotion with validation gates
name: promote
on:
  workflow_dispatch:
    inputs:
      service:
        description: "Service to promote"
        required: true
        type: choice
        options:
          [
            checkout-service,
            catalog-service,
            inventory-service,
            payments-service,
            frontend-shell,
          ]
      from:
        description: "Source environment"
        required: true
        type: choice
        options: [dev, staging]
      to:
        description: "Target environment"
        required: true
        type: choice
        options: [staging, production]

jobs:
  validate-promotion:
    runs-on: ubuntu-latest
    outputs:
      image-tag: ${{ steps.get-tag.outputs.tag }}
    steps:
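      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          # Same token as the promote job; needed if the infra repo is private
          token: ${{ secrets.INFRA_REPO_TOKEN }}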

      - name: Get current image tag in source environment
        id: get-tag
        run: |
          TAG=$(yq '.images[0].newTag' \
            apps/${{ inputs.service }}/overlays/${{ inputs.from }}/kustomization.yaml)
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
          echo "Promoting ${{ inputs.service }} image $TAG from ${{ inputs.from }} to ${{ inputs.to }}"

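      # Assumes the runner already has a kubeconfig with one context per
      # environment, and that each namespace is named after its environment.
      - name: Verify source environment is healthy
        run: |
          kubectl --context=${{ inputs.from }} -n ${{ inputs.from }} \
            rollout status deployment/${{ inputs.service }} --timeout=60s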

  approve:
    runs-on: ubuntu-latest
    needs: [validate-promotion]
    if: inputs.to == 'production'
    environment: production
    steps:
      - run: echo "Production promotion approved for ${{ inputs.service }}"

  promote:
    runs-on: ubuntu-latest
    needs: [validate-promotion, approve]
    if: always() && needs.validate-promotion.result == 'success' && (inputs.to != 'production' || needs.approve.result == 'success')
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}

      - name: Update target environment
        run: |
          cd apps/${{ inputs.service }}/overlays/${{ inputs.to }}
          kustomize edit set image \
            ${{ inputs.service }}=ghcr.io/acme/${{ inputs.service }}:${{ needs.validate-promotion.outputs.image-tag }}

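      - name: Commit and push
        run: |
          git config user.name "promotion-bot"
          # Placeholder address; use your bot account's email
          git config user.email "promotion-bot@example.com"
          git add -A
          # Skip the commit when the target already runs this tag
          if git diff --cached --quiet; then
            echo "Target already at ${{ needs.validate-promotion.outputs.image-tag }}; nothing to promote"
            exit 0
          fi
          git commit -m "promote(${{ inputs.to }}): ${{ inputs.service }} → ${{ needs.validate-promotion.outputs.image-tag }}"
          git push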
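
The `from` and `to` inputs are independent choices, so the workflow as written accepts combinations such as dev → production or staging → staging. One way to close that gap is a guard step placed first in the validate-promotion job. The step below is a hypothetical addition, not part of the original workflow.

```yaml
      # Hypothetical guard step; place it first under validate-promotion.steps
      - name: Reject invalid promotion paths
        env:
          FROM: ${{ inputs.from }}
          TO: ${{ inputs.to }}
        run: |
          case "$FROM:$TO" in
            dev:staging|staging:production)
              echo "Promotion path $FROM -> $TO is allowed"
              ;;
            *)
              echo "::error::Unsupported promotion path: $FROM -> $TO"
              exit 1
              ;;
          esac
```

Passing the inputs through `env` rather than interpolating them into the script keeps the shell code independent of the input values.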

ArgoCD Application per Environment

# HARDENED: ArgoCD Application with environment-specific config
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: checkout-service-staging
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: ecommerce
    environment: staging
spec:
  project: ecommerce
  source:
    repoURL: https://github.com/acme/ecommerce-infra.git
    targetRevision: main
    path: apps/checkout-service/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 1m
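
The dev and production Applications would differ from the staging one only in name, path, and namespace. If maintaining three near-identical manifests becomes tedious, an ArgoCD ApplicationSet with a list generator can stamp them out from one template. This is an alternative technique, not what the chapter's setup uses; the sketch assumes the namespace is always named after the environment.

```yaml
# Illustrative alternative: one ApplicationSet instead of three Applications
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: checkout-service
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: dev
          - env: staging
          - env: production
  template:
    metadata:
      name: 'checkout-service-{{env}}'
      labels:
        app.kubernetes.io/part-of: ecommerce
        environment: '{{env}}'
    spec:
      project: ecommerce
      source:
        repoURL: https://github.com/acme/ecommerce-infra.git
        targetRevision: main
        path: 'apps/checkout-service/overlays/{{env}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Automated sync in production does not bypass the approval gate, because the gate sits in front of the Git commit that ArgoCD reacts to.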

The Gate

Promotion from dev to staging is automatic if dev health checks pass. Promotion from staging to production requires:

  1. All staging health checks pass for at least 10 minutes
  2. Staging performance baseline is within 10% of the previous deployment (see Chapter 17)
  3. No open P1/P2 incidents
  4. Manual approval from a team lead via GitHub environment protection rules

The environment: production setting in the approval job activates GitHub’s environment protection rules, which can require specific reviewers, wait timers, and branch restrictions.
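
The first two requirements can be expressed as an extra job in the promote workflow. The sketch below is hypothetical: the job name, the kubeconfig context, and both latency numbers are placeholders, and it reads "within 10%" as "no more than 10% slower than the previous deployment". A real gate would pull both values from the metrics system rather than hard-coding them.

```yaml
  # Hypothetical gate job for the promote workflow; not part of the original pipeline
  staging-gate:
    runs-on: ubuntu-latest
    needs: [validate-promotion]
    if: inputs.to == 'production'
    env:
      SERVICE: ${{ inputs.service }}
    steps:
      - name: Staging stays healthy for 10 minutes
        run: |
          # Assumes a kubeconfig with a "staging" context is available on the runner
          for minute in $(seq 1 10); do
            kubectl --context=staging -n staging \
              rollout status "deployment/$SERVICE" --timeout=30s
            sleep 60
          done

      - name: Performance within 10% of previous deployment
        env:
          BASELINE_P95_MS: "200"  # placeholder: previous deployment's p95 latency
          CURRENT_P95_MS: "212"   # placeholder: current staging p95 latency
        run: |
          LIMIT=$(( BASELINE_P95_MS * 110 / 100 ))
          if [ "$CURRENT_P95_MS" -gt "$LIMIT" ]; then
            echo "::error::p95 ${CURRENT_P95_MS}ms exceeds limit ${LIMIT}ms"
            exit 1
          fi
          echo "p95 ${CURRENT_P95_MS}ms is within limit ${LIMIT}ms"
```

To make the gate binding, the approve job would list staging-gate in its `needs`, so reviewers are only asked to approve a candidate that has already passed.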

The Recovery

Wrong image promoted to production: Revert the infra repo commit. ArgoCD will sync the previous image tag. No code changes needed.
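
A revert can be run by hand with git, or wrapped in a small workflow so that rollback follows the same audited path as promotion. The workflow below is a sketch under assumptions: the workflow name, the `commit` input, and the bot email are hypothetical, and it reuses the INFRA_REPO_TOKEN secret from the promotion workflow.

```yaml
# Hypothetical rollback workflow: reverts one promotion commit in the infra repo
name: rollback
on:
  workflow_dispatch:
    inputs:
      commit:
        description: "SHA of the promotion commit to revert"
        required: true
        type: string

jobs:
  revert:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: acme/ecommerce-infra
          token: ${{ secrets.INFRA_REPO_TOKEN }}
          fetch-depth: 0   # full history, so older commits can be reverted

      - name: Revert the promotion commit
        env:
          COMMIT: ${{ inputs.commit }}
        run: |
          git config user.name "promotion-bot"
          git config user.email "promotion-bot@example.com"  # placeholder address
          git revert --no-edit "$COMMIT"
          git push
```

The revert is itself a new commit, so the history shows both the bad promotion and the rollback.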

Configuration drift detected in production: Drift usually means someone changed a Kubernetes resource by hand. ArgoCD’s selfHeal: true will revert the change automatically. If selfHeal is disabled, the ArgoCD dashboard shows the drift as an OutOfSync resource. Investigate who made the manual change and why, then enable selfHeal so future drift is reverted without intervention.

Staging differs from production in infrastructure: Use the same Kustomize base for all environments. Differences should only be in patches (replicas, resource limits, external endpoints). Review overlays regularly to ensure they only contain intended differences.
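
As a reference point for those reviews, a production overlay that contains only intended differences might hold patches like the two below. The file names come from the infra repo structure; the contents and every number in them are assumptions for illustration.

```yaml
# apps/checkout-service/overlays/production/patch-replicas.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  replicas: 6                      # placeholder value
---
# apps/checkout-service/overlays/production/patch-resources.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  template:
    spec:
      containers:
        - name: checkout-service   # must match the container name in base/deployment.yaml
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              memory: 1Gi
```

Anything in an overlay that is not a sizing value or an external endpoint, such as an extra container or a different probe, is a parity gap worth questioning.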