Feature Flags for Degraded Mode Control
The degraded mode controller (Chapter 17) derives the operating mode from circuit breaker states and the error budget. This is automatic: the system detects failure and adjusts. Some scenarios, however, require a manual override.
The first is a planned maintenance window for the fraud detection service. The circuit breaker has not opened because the service is still running, but the service will be shut down in 10 minutes. The operations team wants to enter DEGRADED_FRAUD mode pre-emptively to avoid the transition turbulence (20 failed calls before the circuit breaker opens, a brief error spike, customer impact).
The second is a fraud detection deployment that causes a subtle scoring regression. The circuit breaker does not open (there are no errors or timeouts), but the fraud scores are incorrect. The operations team wants to force DEGRADED_FRAUD mode (fallback scores) while the deployment is rolled back.
Feature Flag-Based Mode Override
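// PRODUCTION - Mode controller with feature flag overrides
// Imports assume Spring, Micrometer, SLF4J, and Resilience4j; the remaining types are application classes
import java.util.Optional;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.micrometer.core.instrument.MeterRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class DegradedModeController {
    private static final Logger log = LoggerFactory.getLogger(DegradedModeController.class);

    private final CircuitBreakerRegistry cbRegistry;
    private final ErrorBudgetCalculator budgetCalculator;
    private final FeatureFlagClient featureFlags;
    private final MeterRegistry meterRegistry;

    public DegradedModeController(CircuitBreakerRegistry cbRegistry,
                                  ErrorBudgetCalculator budgetCalculator,
                                  FeatureFlagClient featureFlags,
                                  MeterRegistry meterRegistry) {
        this.cbRegistry = cbRegistry;
        this.budgetCalculator = budgetCalculator;
        this.featureFlags = featureFlags;
        this.meterRegistry = meterRegistry;
    }

    public OperatingMode currentMode() {
        // Check for manual override first
        Optional<OperatingMode> override = getManualOverride();
        if (override.isPresent()) {
            meterRegistry.counter("payment.mode.source",
                "source", "manual_override").increment();
            return override.get();
        }
        // Automatic mode detection
        OperatingMode autoMode = detectAutomaticMode();
        meterRegistry.counter("payment.mode.source",
            "source", "automatic").increment();
        return autoMode;
    }

    // Returns the forced mode, or empty when the flag is unset, AUTO, or invalid
    private Optional<OperatingMode> getManualOverride() {
        String overrideValue = featureFlags.getString(
            "payment.operating.mode.override");
        if (overrideValue == null || "AUTO".equals(overrideValue)) {
            return Optional.empty();
        }
        try {
            return Optional.of(OperatingMode.valueOf(overrideValue));
        } catch (IllegalArgumentException e) {
            log.warn("Invalid mode override value: {}", overrideValue);
            return Optional.empty();
        }
    }

    // detectAutomaticMode() is the Chapter 17 detection logic and is omitted here
}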
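The feature flag payment.operating.mode.override accepts the values AUTO (use automatic detection), NORMAL, DEGRADED_FRAUD, DEGRADED_BALANCE, and EMERGENCY. Setting it to DEGRADED_FRAUD forces that mode regardless of circuit breaker state. Values are matched case-sensitively. A missing or unrecognized value falls back to automatic detection, and an unrecognized value is also logged as a warning.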
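The parsing rules can be exercised in isolation. The following is a minimal sketch, not the production class: OverrideParsingSketch and its nested Mode enum are hypothetical stand-ins for the controller and the chapter's OperatingMode, and the flag value is passed in directly instead of being read from a feature flag client.

```java
import java.util.Optional;

final class OverrideParsingSketch {
    // Hypothetical stand-in for the chapter's OperatingMode enum.
    enum Mode { NORMAL, DEGRADED_FRAUD, DEGRADED_BALANCE, EMERGENCY }

    private OverrideParsingSketch() {}

    // Returns the forced mode, or empty when automatic detection should apply.
    static Optional<Mode> parse(String overrideValue) {
        if (overrideValue == null || "AUTO".equals(overrideValue)) {
            return Optional.empty();
        }
        try {
            // Enum.valueOf is case-sensitive and throws on unknown names.
            return Optional.of(Mode.valueOf(overrideValue));
        } catch (IllegalArgumentException e) {
            // Unknown value: ignore it and let automatic detection decide.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] samples = {"AUTO", "DEGRADED_FRAUD", "degraded_fraud", "BOGUS", null};
        for (String sample : samples) {
            System.out.println(sample + " -> " + parse(sample));
        }
    }
}
```

Because a typo in the flag value silently falls back to automatic detection, it is worth confirming after setting the flag that the manual_override counter is actually incrementing.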
Gradual Mode Transition
A sudden switch from NORMAL to DEGRADED_FRAUD affects all traffic instantly. A gradual transition lets the operations team verify the degraded mode works before committing all traffic:
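// PRODUCTION - Percentage-based mode transition
// Method of DegradedModeController
public OperatingMode currentMode(String requestId) {
    Optional<OperatingMode> override = getManualOverride();
    if (override.isEmpty()) {
        return detectAutomaticMode();
    }
    OperatingMode targetMode = override.get();
    // Defaults to 100: without a rollout flag, the override applies to all traffic
    int percentage = featureFlags.getInt(
        "payment.mode.rollout.percentage", 100);
    // Deterministic hash bucketing on requestId for sticky assignment.
    // floorMod keeps the bucket in [0, 100) for negative hash codes without
    // folding two remainders into one bucket, as Math.abs(hash % 100) does.
    int bucket = Math.floorMod(requestId.hashCode(), 100);
    if (bucket < percentage) {
        return targetMode;
    }
    return detectAutomaticMode();
}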
Setting payment.mode.rollout.percentage to 10 routes 10% of traffic through the degraded mode. The operations team monitors error rates, latency, and customer impact for the 10% cohort. If metrics are acceptable, they increase to 50%, then 100%.
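Hashing requestId into a fixed bucket ensures that the same request always gets the same mode assignment at a given percentage, provided that retries reuse the original requestId. (This is deterministic bucketing, not consistent hashing in the hash-ring sense.) Without this stickiness, a retry of the same request might get a different mode, leading to inconsistent behavior.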
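The bucketing behavior can be checked on its own. The sketch below assumes the bucket is computed with Math.floorMod over String.hashCode(); RolloutBucketSketch, bucket, and inRollout are hypothetical names, and the request IDs are synthetic.

```java
final class RolloutBucketSketch {
    private RolloutBucketSketch() {}

    // Maps a request ID to a stable bucket in [0, 100).
    // floorMod never returns a negative value for a positive modulus.
    static int bucket(String requestId) {
        return Math.floorMod(requestId.hashCode(), 100);
    }

    // True when the request belongs to the cohort that receives the override.
    static boolean inRollout(String requestId, int percentage) {
        return bucket(requestId) < percentage;
    }

    public static void main(String[] args) {
        int total = 10_000;
        int selected = 0;
        for (int i = 0; i < total; i++) {
            if (inRollout("req-" + i, 10)) {
                selected++;
            }
        }
        System.out.println(selected + " of " + total + " synthetic requests fall in the 10% cohort");

        // The same ID always lands in the same bucket.
        String id = "req-42";
        System.out.println(id + " bucket: " + bucket(id) + ", again: " + bucket(id));
    }
}
```

Two properties follow from the comparison bucket < percentage. First, raising the percentage only adds requests to the cohort: a request that was in the 10% cohort is still in it at 50%. Second, because String.hashCode() is specified by the Java API, every service instance computes the same bucket for the same requestId, so no coordination between instances is needed.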
Audit Trail
Every mode change must be logged for compliance and post-incident review:
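// PRODUCTION - Mode transition audit logging
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import io.micrometer.core.instrument.MeterRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ModeTransitionAuditor {
    private static final Logger log = LoggerFactory.getLogger(ModeTransitionAuditor.class);
    // Atomic swap so two threads cannot both record the same transition
    private final AtomicReference<OperatingMode> previousMode =
        new AtomicReference<>(OperatingMode.NORMAL);
    private final MeterRegistry meterRegistry;
    private final AuditLog auditLog; // application type; name assumed

    public ModeTransitionAuditor(MeterRegistry meterRegistry, AuditLog auditLog) {
        this.meterRegistry = meterRegistry;
        this.auditLog = auditLog;
    }

    public void recordModeCheck(OperatingMode currentMode, String source) {
        OperatingMode previous = previousMode.getAndSet(currentMode);
        if (previous == currentMode) {
            return;
        }
        log.info("Operating mode transition: {} -> {} (source: {})",
            previous, currentMode, source);
        meterRegistry.counter("payment.mode.transition",
            "from", previous.name(),
            "to", currentMode.name(),
            "source", source).increment();
        // For financial services: write to audit log
        auditLog.record(AuditEvent.builder()
            .type("OPERATING_MODE_CHANGE")
            .from(previous.name())
            .to(currentMode.name())
            .source(source)
            .timestamp(Instant.now())
            .build());
    }
}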
The audit trail answers the compliance question: “Why were payments processed without fraud checking between 14:00 and 14:45?” One possible answer: “The system automatically entered DEGRADED_FRAUD mode at 14:02 because the fraud detection circuit breaker opened (source: automatic). It returned to NORMAL mode at 14:43 when the circuit breaker closed (source: automatic).” Another: “An operator manually set DEGRADED_FRAUD mode at 13:55 (source: manual_override) in preparation for planned fraud service maintenance.”
This documentation is not optional for a payment processing system. Regulators expect that every deviation from standard processing is recorded, explained, and reviewed.
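Answering the compliance question from recorded events amounts to pairing each entry into a mode with the next transition out of it. The following is a minimal sketch under stated assumptions: ModeChange is a hypothetical record whose fields mirror the AuditEvent builder calls above, the events are assumed to arrive in time order, the date is illustrative, and records require Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class AuditTimelineSketch {
    // Hypothetical audit record; fields mirror the AuditEvent builder calls.
    record ModeChange(Instant timestamp, String from, String to, String source) {}

    // One continuous period spent in a single mode.
    record Interval(String mode, Instant start, Instant end, String source) {}

    private AuditTimelineSketch() {}

    // Turns time-ordered mode changes into the periods spent in the given mode.
    // A period still open at the end of the list is closed at queryEnd.
    static List<Interval> intervalsIn(String mode, List<ModeChange> changes, Instant queryEnd) {
        List<Interval> result = new ArrayList<>();
        ModeChange open = null;
        for (ModeChange change : changes) {
            if (open != null) {
                // Any later transition ends the currently open period.
                result.add(new Interval(mode, open.timestamp(), change.timestamp(), open.source()));
                open = null;
            }
            if (change.to().equals(mode)) {
                open = change;
            }
        }
        if (open != null) {
            result.add(new Interval(mode, open.timestamp(), queryEnd, open.source()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<ModeChange> changes = List.of(
            new ModeChange(Instant.parse("2024-01-15T14:02:00Z"), "NORMAL", "DEGRADED_FRAUD", "automatic"),
            new ModeChange(Instant.parse("2024-01-15T14:43:00Z"), "DEGRADED_FRAUD", "NORMAL", "automatic"));
        Instant queryEnd = Instant.parse("2024-01-15T14:45:00Z");
        for (Interval interval : intervalsIn("DEGRADED_FRAUD", changes, queryEnd)) {
            long minutes = Duration.between(interval.start(), interval.end()).toMinutes();
            System.out.println(interval.mode() + " from " + interval.start() + " to " + interval.end()
                + " (" + minutes + " min, source: " + interval.source() + ")");
        }
    }
}
```

The source field of the entering transition is carried onto the interval, which is what lets a reviewer distinguish an automatic degradation from an operator decision.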