Performance Regression Testing with Locust

Performance bugs are invisible to unit tests. A developer adds an N+1 query. All tests pass. The code merges. Under load, p99 latency jumps from 200ms to 4 seconds. The next Black Friday, the checkout service falls over.

Locust runs load tests as Python code. The scenarios are version-controlled, reviewed in PRs, and executed in CI. When latency exceeds the budget, the pipeline fails.

Performance test pyramid

The Failure

The team ran manual load tests before each release. The QA engineer would start Locust locally, ramp to 500 users, watch the dashboard for 10 minutes, and declare “looks fine.” No recorded baselines. No automated comparison. When p99 latency increased by 50ms over three releases, nobody noticed because nobody compared to a baseline. By the sixth release, the service was 300ms slower than it had been six months earlier.

Automated performance tests with recorded baselines and pass/fail criteria catch regressions at merge time.

The Mechanism

Performance Budget

A performance budget defines the maximum acceptable values for key metrics:

Metric	Budget	Measurement
p50 latency	100ms	50th percentile response time
p99 latency	500ms	99th percentile response time
Error rate	0.1%	Percentage of non-2xx responses
Throughput	200 RPS	Minimum requests per second at target load

Test Levels

Level	When	Duration	Users	Purpose
Smoke	Every PR	30s	10	Catch obvious regressions
Load	Pre-merge to main	5min	100	Validate performance budgets
Stress	Pre-release	30min	500+	Find breaking points

The Implementation

Locust Scenarios for E-Commerce

# tests/performance/locustfile.py
# HARDENED: E-commerce load test scenarios
from locust import HttpUser, task, between, tag


class ShopperUser(HttpUser):
    wait_time = between(1, 3)
    host = "http://localhost:8080"

    def on_start(self):
        # Login once per user
        response = self.client.post("/api/auth/login", json={
            "email": "[email protected]",
            "password": "loadtest-password"
        })
        self.token = response.json().get("token", "")
        self.client.headers.update({
            "Authorization": f"Bearer {self.token}"
        })

    @tag("browse")
    @task(5)
    def browse_catalog(self):
        self.client.get("/api/catalog/products?page=1&limit=20")

    @tag("browse")
    @task(3)
    def view_product(self):
        self.client.get("/api/catalog/products/SKU-1234")

    @tag("cart")
    @task(2)
    def add_to_cart(self):
        self.client.post("/api/cart/items", json={
            "sku": "SKU-1234",
            "quantity": 1
        })

    @tag("checkout")
    @task(1)
    def checkout(self):
        self.client.post("/api/checkout", json={
            "paymentMethod": "card",
            "shippingAddress": {
                "line1": "123 Test St",
                "city": "Testville",
                "zip": "12345"
            }
        })

CI Integration

# .github/workflows/performance.yml
# HARDENED: Locust as a pipeline gate
name: Performance Test
on:
  pull_request:
    branches: [main]
    paths:
      - "src/**"
      - "tests/performance/**"

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: test
          POSTGRES_PASSWORD: test
    steps:
      - uses: actions/checkout@v4

      - name: Start service
        run: |
          docker compose -f docker-compose.test.yml up -d
          sleep 10

      - name: Install Locust
        run: pip install locust

      - name: Run smoke test
        run: |
          locust -f tests/performance/locustfile.py \
            --headless \
            --users 10 \
            --spawn-rate 5 \
            --run-time 30s \
            --tags browse \
            --csv results/smoke \
            --html results/smoke.html

      - name: Check performance budget
        run: python tests/performance/check_budget.py results/smoke_stats.csv

      - name: Upload results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: performance-results
          path: results/

Budget Checker Script

# tests/performance/check_budget.py
# HARDENED: Fail pipeline if performance budget exceeded
import csv
import sys

BUDGETS = {
    "p50": 100,      # ms
    "p99": 500,      # ms
    "error_rate": 0.1  # percent
}

def check_budget(csv_path):
    failures = []
    with open(csv_path) as f:
        reader = csv.DictReader(f)
        for row in reader:
            if row["Name"] == "Aggregated":
                p50 = float(row["50%"])
                p99 = float(row["99%"])
                total = int(row["Request Count"])
                failures_count = int(row["Failure Count"])
                error_rate = (failures_count / total * 100) if total > 0 else 0

                if p50 > BUDGETS["p50"]:
                    failures.append(f"p50 latency {p50}ms > budget {BUDGETS['p50']}ms")
                if p99 > BUDGETS["p99"]:
                    failures.append(f"p99 latency {p99}ms > budget {BUDGETS['p99']}ms")
                if error_rate > BUDGETS["error_rate"]:
                    failures.append(f"Error rate {error_rate:.2f}% > budget {BUDGETS['error_rate']}%")

    if failures:
        print("Performance budget EXCEEDED:")
        for f in failures:
            print(f"  - {f}")
        sys.exit(1)
    else:
        print("Performance budget OK")

if __name__ == "__main__":
    check_budget(sys.argv[1])

The Gate

The budget checker script is the gate. It reads Locust’s CSV output, compares against defined budgets, and exits with a non-zero code if any budget is exceeded. GitHub branch protection requires this check to pass.

The Recovery

Tests are flaky due to CI resource constraints: Use fixed user counts, not ramp-up. CI runners have variable CPU. If the runner is slow, latency is artificially high. Consider dedicated runners for performance tests.

Baseline latency varies between environments: Do not compare staging numbers to production numbers. Compare PR branch to main branch, both running in the same CI environment.

Locust tests take too long for PRs: Run smoke tests (30s, 10 users) on PRs. Run full load tests on merge to main. Run stress tests pre-release.