Performance Regression Testing with Locust
Performance Regression Testing with Locust
Performance bugs are invisible to unit tests. A developer adds an N+1 query. All tests pass. The code merges. Under load, p99 latency jumps from 200ms to 4 seconds. The next Black Friday, the checkout service falls over.
Locust runs load tests as Python code. The scenarios are version-controlled, reviewed in PRs, and executed in CI. When latency exceeds the budget, the pipeline fails.
The Failure
The team ran manual load tests before each release. The QA engineer would start Locust locally, ramp to 500 users, watch the dashboard for 10 minutes, and declare “looks fine.” No recorded baselines. No automated comparison. When p99 latency increased by 50ms over three releases, nobody noticed because nobody compared to a baseline. By the sixth release, the service was 300ms slower than it had been six months earlier.
Automated performance tests with recorded baselines and pass/fail criteria catch regressions at merge time.
The Mechanism
Performance Budget
A performance budget defines the maximum acceptable values for key metrics:
| Metric | Budget | Measurement |
|---|---|---|
| p50 latency | 100ms | 50th percentile response time |
| p99 latency | 500ms | 99th percentile response time |
| Error rate | 0.1% | Percentage of non-2xx responses |
| Throughput | 200 RPS | Minimum requests per second at target load |
Test Levels
| Level | When | Duration | Users | Purpose |
|---|---|---|---|---|
| Smoke | Every PR | 30s | 10 | Catch obvious regressions |
| Load | Pre-merge to main | 5min | 100 | Validate performance budgets |
| Stress | Pre-release | 30min | 500+ | Find breaking points |
The Implementation
Locust Scenarios for E-Commerce
# tests/performance/locustfile.py
# HARDENED: E-commerce load test scenarios
from locust import HttpUser, task, between, tag
class ShopperUser(HttpUser):
wait_time = between(1, 3)
host = "http://localhost:8080"
def on_start(self):
# Login once per user
response = self.client.post("/api/auth/login", json={
"email": "[email protected]",
"password": "loadtest-password"
})
self.token = response.json().get("token", "")
self.client.headers.update({
"Authorization": f"Bearer {self.token}"
})
@tag("browse")
@task(5)
def browse_catalog(self):
self.client.get("/api/catalog/products?page=1&limit=20")
@tag("browse")
@task(3)
def view_product(self):
self.client.get("/api/catalog/products/SKU-1234")
@tag("cart")
@task(2)
def add_to_cart(self):
self.client.post("/api/cart/items", json={
"sku": "SKU-1234",
"quantity": 1
})
@tag("checkout")
@task(1)
def checkout(self):
self.client.post("/api/checkout", json={
"paymentMethod": "card",
"shippingAddress": {
"line1": "123 Test St",
"city": "Testville",
"zip": "12345"
}
})
CI Integration
# .github/workflows/performance.yml
# HARDENED: Locust as a pipeline gate
name: Performance Test
on:
pull_request:
branches: [main]
paths:
- "src/**"
- "tests/performance/**"
jobs:
smoke-test:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:16
env:
POSTGRES_DB: test
POSTGRES_PASSWORD: test
steps:
- uses: actions/checkout@v4
- name: Start service
run: |
docker compose -f docker-compose.test.yml up -d
sleep 10
- name: Install Locust
run: pip install locust
- name: Run smoke test
run: |
locust -f tests/performance/locustfile.py \
--headless \
--users 10 \
--spawn-rate 5 \
--run-time 30s \
--tags browse \
--csv results/smoke \
--html results/smoke.html
- name: Check performance budget
run: python tests/performance/check_budget.py results/smoke_stats.csv
- name: Upload results
uses: actions/upload-artifact@v4
if: always()
with:
name: performance-results
path: results/
Budget Checker Script
# tests/performance/check_budget.py
# HARDENED: Fail pipeline if performance budget exceeded
import csv
import sys
BUDGETS = {
"p50": 100, # ms
"p99": 500, # ms
"error_rate": 0.1 # percent
}
def check_budget(csv_path):
failures = []
with open(csv_path) as f:
reader = csv.DictReader(f)
for row in reader:
if row["Name"] == "Aggregated":
p50 = float(row["50%"])
p99 = float(row["99%"])
total = int(row["Request Count"])
failures_count = int(row["Failure Count"])
error_rate = (failures_count / total * 100) if total > 0 else 0
if p50 > BUDGETS["p50"]:
failures.append(f"p50 latency {p50}ms > budget {BUDGETS['p50']}ms")
if p99 > BUDGETS["p99"]:
failures.append(f"p99 latency {p99}ms > budget {BUDGETS['p99']}ms")
if error_rate > BUDGETS["error_rate"]:
failures.append(f"Error rate {error_rate:.2f}% > budget {BUDGETS['error_rate']}%")
if failures:
print("Performance budget EXCEEDED:")
for f in failures:
print(f" - {f}")
sys.exit(1)
else:
print("Performance budget OK")
if __name__ == "__main__":
check_budget(sys.argv[1])
The Gate
The budget checker script is the gate. It reads Locust’s CSV output, compares against defined budgets, and exits with a non-zero code if any budget is exceeded. GitHub branch protection requires this check to pass.
The Recovery
Tests are flaky due to CI resource constraints: Use fixed user counts, not ramp-up. CI runners have variable CPU. If the runner is slow, latency is artificially high. Consider dedicated runners for performance tests.
Baseline latency varies between environments: Do not compare staging numbers to production numbers. Compare PR branch to main branch, both running in the same CI environment.
Locust tests take too long for PRs: Run smoke tests (30s, 10 users) on PRs. Run full load tests on merge to main. Run stress tests pre-release.