Threshold Tuning and the Ratchet Pattern
The Failure
The team enabled Trivy with exit-code: 1 on a codebase with 200 existing vulnerabilities. Every PR failed. Developers could not merge bug fixes. The security team said “fix the vulnerabilities first.” The development team said “we need to ship features.” After a week of deadlock, someone removed the exit-code and the scanner went back to advisory mode. The vulnerabilities stayed.
The ratchet pattern resolves this: accept the current baseline, but never allow new vulnerabilities. The count can only go down, never up.
The Mechanism
The Ratchet
- Run a full scan and record the baseline count
- On each PR, run the scan and compare to baseline
- If the count is lower than or equal to baseline → pass
- If the count is higher than baseline → fail
- When a PR fixes vulnerabilities and the count drops, update the baseline to the new lower count
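The baseline file is committed to the repository. It is the ratchet: it can tighten (count goes down) but never loosen (count goes up). Because the gate compares counts rather than individual findings, a PR that fixes one vulnerability and introduces another still passes; the ratchet bounds the total, not the identity of each finding.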
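The steps above reduce to one decision per severity. The following is a minimal sketch; `ratchet_step` is a hypothetical helper, not part of Trivy or any CI tool. It compares one count to its baseline, reports whether to fail, pass, or tighten, and then replays a short series of scans to show that the baseline only moves down.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one ratchet decision for a single severity.
# Prints "<verdict> <baseline-to-keep>" and returns 1 when the PR should fail.
ratchet_step() {
  local current=$1 baseline=$2
  if [ "$current" -gt "$baseline" ]; then
    echo "fail $baseline"       # count rose: block, baseline unchanged
    return 1
  elif [ "$current" -lt "$baseline" ]; then
    echo "tighten $current"     # count fell: the lower count becomes the new baseline
  else
    echo "pass $baseline"       # count unchanged: allow, baseline unchanged
  fi
}

# Replay a sequence of scan results against a starting baseline of 12
baseline=12
for scan in 12 10 11 9; do
  result=$(ratchet_step "$scan" "$baseline" || true)
  verdict=${result%% *}
  baseline=${result##* }
  echo "scan=$scan -> $verdict (baseline now $baseline)"
done
# scan=12 -> pass (baseline now 12)
# scan=10 -> tighten (baseline now 10)
# scan=11 -> fail (baseline now 10)
# scan=9 -> tighten (baseline now 9)
```

The scan of 11 fails even though it is below the original baseline of 12, because the earlier fix already tightened the baseline to 10.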
Ratchet vs Fixed Threshold
| Approach | Existing Code | New Code | Migration Cost |
|---|---|---|---|
| Fixed threshold (0) | Blocks all PRs | Blocks correctly | Must fix all first |
| Ratchet (baseline) | Allows existing | Blocks new | Zero migration cost |
| Advisory only | No blocking | No blocking | Zero, but no protection |
The Implementation
Baseline File
.security-baseline.json (HARDENED: ratchet baseline; counts can only decrease):
```json
{
  "trivy": {
    "critical": 0,
    "high": 12,
    "medium": 45,
    "lastUpdated": "2025-01-15",
    "updatedBy": "security-scan-bot"
  },
  "codeql": {
    "errors": 3,
    "warnings": 28,
    "lastUpdated": "2025-01-15"
  }
}
```
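The first step of the mechanism, recording the initial count, is not covered by the ratchet script. Here is a minimal sketch, assuming jq is installed and the report comes from `trivy fs --format json` with MEDIUM included in `--severity` (otherwise `medium` is always 0). `init_baseline` is a hypothetical name, and it prints only the `trivy` section, so the `codeql` counts still have to be recorded separately.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "trivy" section of an initial baseline from a
# Trivy JSON report. Requires jq. Assumes the report was produced with e.g.
#   trivy fs --format json --output trivy-results.json --severity CRITICAL,HIGH,MEDIUM .
init_baseline() {
  jq --arg today "$(date -u +%Y-%m-%d)" '
    def severity_count($sev): map(select(.Severity == $sev)) | length;
    [.Results[]?.Vulnerabilities[]?]
    | {trivy: {critical: severity_count("CRITICAL"),
               high: severity_count("HIGH"),
               medium: severity_count("MEDIUM"),
               lastUpdated: $today,
               updatedBy: "security-scan-bot"}}
  ' "$1"
}
# Usage: init_baseline trivy-results.json > trivy-baseline.json
```

Targets with no `Vulnerabilities` array contribute nothing to the counts, which matches how the ratchet script counts findings.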
Ratchet Script
```bash
#!/bin/bash
# scripts/security-ratchet.sh
# HARDENED: Fail if vulnerability count increases from baseline
set -euo pipefail

BASELINE_FILE=".security-baseline.json"
SCAN_RESULTS="trivy-results.json"

# Run Trivy and write JSON results; exit code stays 0, the ratchet decides pass/fail
trivy fs --format json --output "$SCAN_RESULTS" --severity CRITICAL,HIGH .

# Count findings of one severity; targets without vulnerabilities count as zero
count_severity() {
  jq --arg sev "$1" '[.Results[]?.Vulnerabilities[]? | select(.Severity == $sev)] | length' "$SCAN_RESULTS"
}

CURRENT_CRITICAL=$(count_severity CRITICAL)
CURRENT_HIGH=$(count_severity HIGH)
BASELINE_CRITICAL=$(jq '.trivy.critical' "$BASELINE_FILE")
BASELINE_HIGH=$(jq '.trivy.high' "$BASELINE_FILE")

echo "Critical: $CURRENT_CRITICAL (baseline: $BASELINE_CRITICAL)"
echo "High: $CURRENT_HIGH (baseline: $BASELINE_HIGH)"

FAILED=0
if [[ "$CURRENT_CRITICAL" -gt "$BASELINE_CRITICAL" ]]; then
  echo "::error::Critical vulnerabilities increased: $CURRENT_CRITICAL > $BASELINE_CRITICAL"
  FAILED=1
fi
if [[ "$CURRENT_HIGH" -gt "$BASELINE_HIGH" ]]; then
  echo "::error::High vulnerabilities increased: $CURRENT_HIGH > $BASELINE_HIGH"
  FAILED=1
fi

# Auto-tighten only when nothing regressed. Otherwise a drop in one severity
# would write a higher count for the other severity and loosen the ratchet.
if [[ "$FAILED" -eq 0 ]] && [[ "$CURRENT_CRITICAL" -lt "$BASELINE_CRITICAL" || "$CURRENT_HIGH" -lt "$BASELINE_HIGH" ]]; then
  echo "Vulnerabilities decreased. Updating baseline."
  TMP_FILE=$(mktemp)
  jq --argjson c "$CURRENT_CRITICAL" --argjson h "$CURRENT_HIGH" \
    '.trivy.critical = $c | .trivy.high = $h | .trivy.lastUpdated = (now | strftime("%Y-%m-%d"))' \
    "$BASELINE_FILE" > "$TMP_FILE" && mv "$TMP_FILE" "$BASELINE_FILE"
  # Signal the CI step that commits the tightened baseline (no-op outside GitHub Actions)
  echo "BASELINE_UPDATED=true" >> "${GITHUB_ENV:-/dev/null}"
fi

exit "$FAILED"
```
CI Integration
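```yaml
# .github/workflows/security.yml (steps excerpt)
- name: Security ratchet check
  run: bash scripts/security-ratchet.sh

# Commit only on pushes to the default branch (assumed to be main here):
# pull_request checkouts are detached merge refs, so a push from a PR job fails.
# The job needs "permissions: contents: write" for the push to succeed.
- name: Commit updated baseline
  if: env.BASELINE_UPDATED == 'true' && github.event_name == 'push' && github.ref == 'refs/heads/main'
  run: |
    git config user.name "security-bot"
    git config user.email "[email protected]"
    git add .security-baseline.json
    git commit -m "chore: tighten security baseline"
    git push origin HEAD:main
```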
Exception Workflow
When a vulnerability cannot be fixed immediately (no patch available, upstream issue):
.security-exceptions.json (HARDENED: tracked exceptions with mandatory expiration):
```json
{
  "exceptions": [
    {
      "cve": "CVE-2024-99999",
      "severity": "HIGH",
      "reason": "No upstream fix available. Mitigated by WAF rule.",
      "trackingIssue": "https://github.com/acme/checkout-service/issues/456",
      "addedBy": "[email protected]",
      "addedDate": "2025-01-15",
      "expiresDate": "2025-04-15",
      "reviewed": true
    }
  ]
}
```
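The ratchet script shown above compares raw counts; to honor exceptions, it must exclude every finding whose CVE has an unexpired exception before comparing against the baseline. Once an exception passes its expiration date, it stops applying and the vulnerability counts again.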
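Deciding which exceptions still apply is the part that needs care. The following is a minimal sketch; `active_exceptions` is a hypothetical helper, and it assumes an exception remains valid through its `expiresDate` (inclusive), a policy choice the exception file itself does not spell out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the CVEs whose exception is still active on a date.
# Input lines: "CVE<TAB>expiresDate", for example from
#   jq -r '.exceptions[] | [.cve, .expiresDate] | @tsv' .security-exceptions.json
# Assumes the exception is valid through its expiresDate (inclusive).
active_exceptions() {
  local today=$1 cve expires
  # The "|| [ -n ... ]" keeps a final line that has no trailing newline
  while IFS=$'\t' read -r cve expires || [ -n "$cve" ]; do
    [ -n "$cve" ] || continue
    # YYYY-MM-DD strings sort chronologically, so a string comparison is enough
    if [[ ! "$expires" < "$today" ]]; then
      printf '%s\n' "$cve"
    fi
  done
}
# Usage:
#   jq -r '.exceptions[] | [.cve, .expiresDate] | @tsv' .security-exceptions.json \
#     | active_exceptions "$(date -u +%Y-%m-%d)"
```

The resulting list could then be used to drop findings whose `VulnerabilityID` matches an active exception before the severity counts are computed; an expired entry simply falls out of the list, so its CVE is counted again without any manual cleanup.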
The Gate
The ratchet is the gate. It combines two properties:
- Never worse: The count can never rise above the baseline, so a PR that adds net-new vulnerabilities is always blocked
- Eventually better: Every fix tightens the baseline permanently
Over time, the baseline converges toward zero without ever blocking existing work.
The Recovery
Baseline drift between branches: The baseline file can conflict when multiple branches fix different vulnerabilities. Use jq to take the minimum of each count during merge conflict resolution.
Auto-tighten creates noisy commits: Move baseline updates to a scheduled job instead of per-PR. Run nightly, compare current scan to baseline, and tighten.
New service starts with high baseline: Set a policy: new services must start with a baseline of zero. The ratchet only applies to legacy services.
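For the baseline-drift case, the jq merge can be sketched as follows, assuming both sides of the conflict have been saved to files. `merge_baselines` is a hypothetical name. It walks every numeric field in "our" copy and keeps the lower of the two values, so a fix made on either branch survives the merge; non-numeric fields such as `lastUpdated` are taken from our side.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge two versions of .security-baseline.json by keeping
# the lower value of every numeric field present in the first ("ours") file.
# Requires jq. During a conflicted merge, save the two sides first:
#   git show :2:.security-baseline.json > ours.json
#   git show :3:.security-baseline.json > theirs.json
merge_baselines() {
  jq -s '
    .[0] as $ours | .[1] as $theirs
    | reduce ($ours | paths(numbers)) as $p ($ours;
        setpath($p; [($ours, $theirs) | getpath($p) | numbers] | min))
  ' "$1" "$2"
}
# Usage: merge_baselines ours.json theirs.json > .security-baseline.json
```

Taking the minimum field by field is what keeps the merge a ratchet: neither branch can reintroduce a count the other branch had already lowered.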