The Fallacies in Production

The eight fallacies of distributed computing, usually credited to Peter Deutsch, are not a checklist you read once and move on from. They are failure modes you will encounter, repeatedly, for as long as you build software that touches a network. Three of them cause the most damage in production systems, and each one requires a specific engineering response, not just awareness.

Fallacy 1: The Network Is Reliable

Here is the most common way to call an external service in Python:

import requests

response = requests.get("https://inventory-service.internal/api/stock/SKU-4421")
data = response.json()

This code has a silent, catastrophic defect: no timeout. If the inventory service hangs — not crashes, not errors, but simply stops responding — this call blocks forever. Your thread is gone. Your worker is consumed. Do this enough times and your entire application freezes, not because of a bug in your code, but because you trusted the network to respond.

This isn’t hypothetical. In 2017, a major outage at a large SaaS company was traced to exactly this pattern. An internal service started responding slowly under heavy load. Callers had no timeout configured. Each slow response consumed a thread. Within minutes, the calling service exhausted its thread pool and stopped serving all traffic — including traffic that had nothing to do with the slow downstream service.

The fix isn’t complicated. It’s just not the default: set an explicit timeout on every call, and retry only the failures that a retry can plausibly fix:

import random
import time

import requests

def call_with_backoff(url, max_retries=3, base_delay=0.5):
    """
    GET a JSON resource, retrying transient failures with exponential
    backoff and jitter.

    Attempt 1: immediate
    Attempt 2: after 0.5s  (+ up to 50% jitter)
    Attempt 3: after 1.0s  (+ up to 50% jitter)
    Attempt 4: after 2.0s  (+ up to 50% jitter)
    """
    for attempt in range(max_retries + 1):
        try:
            # timeout=(connect_timeout, read_timeout), in seconds
            response = requests.get(url, timeout=(3, 10))
            response.raise_for_status()
            return response.json()
        except (requests.exceptions.ConnectionError,
                requests.exceptions.Timeout):
            # Target unreachable, connection dropped, or no response in time
            if attempt == max_retries:
                raise
        except requests.exceptions.HTTPError as e:
            if e.response.status_code < 500:
                # Client error (4xx): retrying won't help
                raise
            # Server error (5xx): worth retrying
            if attempt == max_retries:
                raise

        # Exponential backoff with jitter before the next attempt
        delay = base_delay * (2 ** attempt)
        jitter = random.uniform(0, delay * 0.5)
        time.sleep(delay + jitter)

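Notice the two-tuple timeout: (3, 10). Three seconds to establish the connection. Ten seconds of waiting for the server to send data; requests applies the read timeout to each wait between bytes, not to the response as a whole, so a server that trickles bytes slowly can hold the call open longer than ten seconds. These values are not arbitrary: they’re based on what’s acceptable for this particular call. A timeout that’s too short creates false failures. A timeout that’s too long ties up resources. There is no universal correct value.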

Exponential backoff helps prevent a failure cascade. If the downstream service is struggling under load and you retry immediately at full speed, you make the load worse. Each retry waits longer than the last, giving the downstream service breathing room. Jitter guards against a thundering herd: without it, every client that failed at the same moment retries after the same delay, at the same instant, recreating the exact spike that caused the problem.
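To see the schedule that call_with_backoff produces, here is the delay computation pulled out into a standalone function. backoff_delays and its injectable rng parameter are hypothetical, added only for illustration; the formula is the same base_delay * 2 ** attempt plus up to 50% jitter used in the code above. Seeding two generators differently stands in for two clients that failed at the same moment.

```python
import random


def backoff_delays(max_retries=3, base_delay=0.5, rng=None):
    """Return the sleep before each retry: base * 2**n plus up to 50% jitter.

    Hypothetical helper mirroring call_with_backoff; rng is injectable so a
    schedule can be reproduced in tests.
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = base_delay * (2 ** attempt)   # 0.5, 1.0, 2.0, ...
        jitter = rng.uniform(0, delay * 0.5)  # spreads clients apart
        delays.append(delay + jitter)
    return delays


# Two clients that failed at the same instant, with different random streams
client_a = backoff_delays(rng=random.Random(1))
client_b = backoff_delays(rng=random.Random(2))
for n, (a, b) in enumerate(zip(client_a, client_b), start=1):
    print(f"retry {n}: client A sleeps {a:.3f}s, client B sleeps {b:.3f}s")
```

Each delay stays within [base * 2**n, 1.5 * base * 2**n], but the two clients no longer retry in lockstep. Some implementations use "full jitter" instead, drawing the entire delay from zero up to the exponential cap; this sketch keeps the additive form from the code above.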

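But retries have a limit. If the downstream service is genuinely down, retrying isn’t resilience, it’s slow failure: with the settings above, a single request can spend close to a minute failing (four attempts that can each wait up to 3 seconds to connect and 10 seconds for data, plus 3.5 to 5.25 seconds of backoff sleep) before it gives up. You need a circuit breaker.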

import time
from enum import Enum

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation: requests flow through
    OPEN = "open"            # Downstream is broken: fail immediately
    HALF_OPEN = "half_open"  # Testing whether downstream has recovered

class CircuitBreaker:
    """Minimal single-threaded circuit breaker (not thread-safe)."""

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # monotonic() is unaffected by wall-clock adjustments
            elapsed = time.monotonic() - self.last_failure_time
            if elapsed >= self.recovery_timeout:
                # Let this request through as a trial
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenError(
                    f"Circuit is open after {self.failure_count} "
                    f"consecutive failures. Retry in "
                    f"{self.recovery_timeout - elapsed:.1f}s."
                )

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed trial re-opens immediately; otherwise open at threshold
            if (self.state == CircuitState.HALF_OPEN
                    or self.failure_count >= self.failure_threshold):
                self.state = CircuitState.OPEN
            raise

        # Success: reset the circuit
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        return result

The circuit breaker implements a state machine. Under normal conditions, the circuit is closed — requests pass through. When failures accumulate past a threshold, the circuit opens — requests fail immediately without even attempting the network call. After a recovery timeout, the circuit goes half-open — it allows one request through to test if the downstream has recovered. If that request succeeds, the circuit closes. If it fails, the circuit opens again.
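The transitions described above fit in a small pure function, which makes the state machine easy to check in isolation. This is a sketch, not part of the CircuitBreaker class: next_state and the event labels "success", "failure", and "timeout_elapsed" are hypothetical names chosen for illustration, and the threshold of 5 matches the class’s default.

```python
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


def next_state(state, event, failures=0, threshold=5):
    """Return the breaker's next state.

    event: one of the illustrative labels "success", "failure",
    "timeout_elapsed". failures: consecutive-failure count after
    this event has been recorded.
    """
    if event == "success":
        # Only observable in CLOSED or HALF_OPEN; OPEN rejects calls
        return CircuitState.CLOSED
    if event == "failure":
        if state == CircuitState.HALF_OPEN:
            return CircuitState.OPEN      # the trial request failed
        if failures >= threshold:
            return CircuitState.OPEN      # too many consecutive failures
        return state
    if event == "timeout_elapsed" and state == CircuitState.OPEN:
        return CircuitState.HALF_OPEN     # let one trial request through
    return state


state = CircuitState.CLOSED
for failures in range(1, 6):              # five consecutive failures
    state = next_state(state, "failure", failures)
after_failures = state
state = next_state(state, "timeout_elapsed")
after_timeout = state
state = next_state(state, "success")
after_trial = state
print(after_failures.value, after_timeout.value, after_trial.value)
# open half_open closed
```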

This can be the difference between a 30-minute outage and a 3-minute one. Without a circuit breaker, every call to the broken downstream holds a worker until it times out, and once enough workers are held, even requests that never touch that downstream stall behind them. With one, requests that depend on the broken service fail fast, and requests that don’t are unaffected.

The network is not reliable. Your code must account for that, explicitly, in every call that crosses a process boundary.

Fallacy 2: Latency Is Zero

A local function call takes 10-100 nanoseconds. A network call to a service in the same data center takes 1-5 milliseconds. That’s a 10,000x to 500,000x difference. And yet, when engineers decompose a monolith into microservices, they convert function calls into network calls and expect the same performance characteristics.

Here’s what a distributed trace looks like for a seemingly simple “place order” request:

[Trace ID: abc-123-def]
├── API Gateway                          0ms ──────────────────── 247ms
│   ├── Auth Service (verify token)      3ms ────── 18ms          (15ms)
│   ├── Order Service (create order)    20ms ────────────────── 230ms
│   │   ├── User Service (get profile)  25ms ──── 38ms           (13ms)
│   │   ├── Inventory Service (check)   40ms ──────── 67ms       (27ms)
│   │   │   └── Cache miss → DB query   45ms ──── 63ms          (18ms)
│   │   ├── Pricing Service (calculate) 69ms ────── 91ms         (22ms)
│   │   │   └── Discount Service        73ms ── 85ms            (12ms)
│   │   ├── Payment Service (charge)    93ms ────────── 198ms   (105ms)
│   │   │   └── Stripe API (external)  100ms ──────── 192ms     (92ms)
│   │   └── Notification Service (email) 200ms ── 225ms          (25ms)
│   └── Response serialization         232ms ── 245ms            (13ms)
Total wall time: 247ms

Read that trace. The actual business logic — validate the order, check inventory, calculate price, charge payment, send confirmation — takes maybe 40ms of CPU time total. The other 207ms is network overhead: DNS lookups, TCP handshakes, TLS negotiation, serialization, deserialization, load balancer routing, and waiting.

Now look at the Payment Service call. It takes 105ms, and 92ms of that is the Stripe API call — an external network hop across the internet. That single call dominates the entire request latency. And you have zero control over it. Stripe could respond in 50ms or 500ms depending on their load, your network path, and the phase of the moon.

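The danger isn’t one slow call. It’s the cumulative effect. Each of the six smaller internal calls in that trace (Auth, User, Inventory, Pricing, Discount, Notification) took 12-27ms end to end, and their spans add up to 114ms, much of it spent crossing the network rather than running business logic. (Discount runs inside Pricing, so that sum counts 12ms of wall time twice.) At roughly 19ms per hop, splitting the same work across 20 services instead of 6 puts you near 380ms of hops, if they run sequentially, before anyone’s code does anything useful.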
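The back-of-envelope numbers can be reproduced directly from the trace. The span durations below are copied from it; the projection assumes, as a simplification, that every additional service costs the same average hop and that hops run one after another.

```python
# Span durations (ms) for the six smaller internal calls in the trace above.
# Discount runs inside Pricing, so this is summed span time, not wall time.
spans_ms = {
    "auth": 15,
    "user": 13,
    "inventory": 27,
    "pricing": 22,
    "discount": 12,
    "notification": 25,
}

total = sum(spans_ms.values())
per_hop = total / len(spans_ms)
print(f"summed span time: {total}ms, average per hop: {per_hop:.1f}ms")

# Naive projection: each extra service adds one average hop, called
# sequentially. Real systems vary; this is only an order-of-magnitude check.
for services in (6, 20):
    print(f"{services:>2} sequential hops ≈ {services * per_hop:.0f}ms")
# summed span time: 114ms, average per hop: 19.0ms
#  6 sequential hops ≈ 114ms
# 20 sequential hops ≈ 380ms
```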

This is why tail latency matters. Your median latency might be 250ms. Your p99 might be 2 seconds. And once every few hundred requests, several of those network hops land in their slow tail on the same request, because slowness tends to be correlated (an overloaded shared database, a congested link, a burst of retries), their delays stack, and a user waits 8 seconds for a page load. You can’t reproduce it locally. It doesn’t appear in your average metrics. It shows up in your support tickets.
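The arithmetic behind fan-out tail latency is worth seeing once. Assuming each hop independently has a 1% chance of landing beyond its own p99, the chance that a request touches at least one slow hop is 1 - 0.99 ** n for n hops. The function name p_any_slow is hypothetical.

```python
def p_any_slow(hops, p_slow=0.01):
    """Chance that at least one of `hops` independent calls is slow, where
    each call is slow with probability p_slow (0.01 = beyond its own p99)."""
    return 1 - (1 - p_slow) ** hops


for hops in (1, 5, 20):
    print(f"{hops:>2} hops: {p_any_slow(hops):.1%} of requests hit a slow call")
# 1.0%, 4.9%, and 18.2% for 1, 5, and 20 hops
```

Independence is the optimistic case for how often several hops are slow together. Correlated causes such as a shared database or a noisy neighbor make multi-hop slowness far more common than independent odds would suggest.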

The mitigation is architectural. Parallelize calls that don’t depend on each other — the User Service and Inventory Service calls above could run concurrently instead of sequentially, saving 13ms. Batch calls where possible. Cache aggressively at service boundaries. And most critically: question whether each network hop is necessary. Every service boundary you add is latency you can never remove.
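A minimal sketch of the first mitigation, assuming blocking clients like requests: submit independent calls to a thread pool and wait for both. get_profile and check_inventory are hypothetical stand-ins that sleep for the User and Inventory span durations from the trace instead of making network calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for blocking service calls; the sleeps reuse the
# span durations from the trace above instead of real network requests.
def get_profile(user_id):
    time.sleep(0.013)  # User Service: 13ms
    return {"user_id": user_id}


def check_inventory(sku):
    time.sleep(0.027)  # Inventory Service: 27ms
    return {"sku": sku, "in_stock": True}


def in_sequence():
    return get_profile("u-1"), check_inventory("SKU-4421")


def in_parallel():
    # Neither call needs the other's result, so submit both and wait;
    # wall time is roughly the slower of the two calls.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile = pool.submit(get_profile, "u-1")
        stock = pool.submit(check_inventory, "SKU-4421")
        return profile.result(), stock.result()


def elapsed_ms(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000


seq_ms = elapsed_ms(in_sequence)
par_ms = elapsed_ms(in_parallel)
print(f"sequential: {seq_ms:.0f}ms, parallel: {par_ms:.0f}ms")
```

With an async HTTP client, asyncio.gather plays the same role. Either way the pattern applies only when neither call depends on the other’s result.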

Fallacy 3: The Network Is Homogeneous

Your development environment runs on localhost. Every service resolves instantly. Every call completes in under a millisecond. Every request succeeds.

Production is a different animal entirely. Your services run across three availability zones. Say Zone A to Zone A is 0.5ms, Zone A to Zone B is 1.2ms, and Zone A to Zone C is 2.1ms. Your load balancer distributes traffic evenly across zones, so for callers in Zone A, one-third of cross-service calls take about 4x longer than another third. Your p99 latency is driven largely by cross-zone calls, and your average hides it.
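A quick way to see how the average hides the slow path, using the zone figures from the text above. The sketch assumes, for simplicity, that each zone pair has a fixed latency and that traffic splits evenly, so each zone gets the same number of calls; ZONE_LATENCY_MS and nearest_rank are hypothetical names.

```python
import statistics

# Figures from the text: calls from Zone A to each zone. Simplifying
# assumption: fixed latency per zone pair, traffic split evenly.
ZONE_LATENCY_MS = {"A->A": 0.5, "A->B": 1.2, "A->C": 2.1}
CALLS_PER_ZONE = 1000


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile using integer math (pct in 1..100)."""
    rank = -(-pct * len(sorted_values) // 100)  # ceil(pct * n / 100)
    return sorted_values[rank - 1]


latencies = sorted(
    ms for ms in ZONE_LATENCY_MS.values() for _ in range(CALLS_PER_ZONE)
)
mean_ms = statistics.fmean(latencies)
p99_ms = nearest_rank(latencies, 99)
slow_share = sum(ms == ZONE_LATENCY_MS["A->C"] for ms in latencies) / len(latencies)

print(f"mean {mean_ms:.2f}ms, p99 {p99_ms}ms, "
      f"{slow_share:.0%} of calls at the A->C latency")
# mean 1.27ms, p99 2.1ms, 33% of calls at the A->C latency
```

The mean lands between the zones and describes no real call, while the p99 is pinned to the cross-zone path.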

Then there’s DNS. Your service discovery uses DNS with a 30-second TTL, but not every client honors it. The JVM keeps its own cache of lookups, governed by the networkaddress.cache.ttl security property, and depending on the Java version and configuration (historically, whenever a security manager was installed) that cache never expires. You deploy a new version of the Inventory Service, the load balancer updates to point to new instances, and part of your fleet keeps sending traffic to the old instances long after the TTL has passed. Some requests succeed (new instances), some fail (old instances being drained), and from the caller’s perspective the pattern is completely non-deterministic.
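To make the mechanism concrete, here is a minimal model of a client-side resolver cache that either honors a TTL or never expires. DnsCache, the injected resolve and clock functions, and the 10.0.0.x addresses are all hypothetical; this is not how the JVM’s cache is implemented, only a model of its effect after a deploy.

```python
import time


class DnsCache:
    """Tiny resolver cache keyed by hostname (illustrative only).

    ttl=None models a cache that never expires; a number models honoring
    the record's TTL. resolve and clock are injected so the behavior can
    be shown without real DNS.
    """

    def __init__(self, resolve, ttl=None, clock=time.monotonic):
        self.resolve = resolve   # hostname -> address
        self.ttl = ttl
        self.clock = clock
        self.entries = {}        # hostname -> (address, fetched_at)

    def lookup(self, host):
        cached = self.entries.get(host)
        if cached is not None:
            address, fetched_at = cached
            if self.ttl is None or self.clock() - fetched_at < self.ttl:
                return address   # still treated as fresh
        address = self.resolve(host)
        self.entries[host] = (address, self.clock())
        return address


# Simulated deploy: the record moves from an old instance to a new one.
records = {"inventory-service.internal": "10.0.0.1"}
now = [0.0]


def fake_clock():
    return now[0]


forever = DnsCache(records.get, ttl=None, clock=fake_clock)
honors_ttl = DnsCache(records.get, ttl=30, clock=fake_clock)
for cache in (forever, honors_ttl):
    cache.lookup("inventory-service.internal")

records["inventory-service.internal"] = "10.0.0.2"  # new instances
now[0] = 600.0                                      # ten minutes later

print(forever.lookup("inventory-service.internal"))     # 10.0.0.1 (stale)
print(honors_ttl.lookup("inventory-service.internal"))  # 10.0.0.2
```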

Cloud regions make this worse. A service in us-east-1 calling a service in eu-west-1 adds 80-120ms of latency per call. If that call is in your hot path, you’ve just added 80ms to every request for every user. If you didn’t realize the service was in another region — maybe it was deployed there by another team, maybe the service discovery returned a cross-region endpoint during a failover — the latency spike appears without any code change on your side.

The asymmetry is the killer. In a homogeneous network, you can reason about performance uniformly. In a real network, the same logical call — Service A calls Service B — has different performance characteristics depending on which physical instance of A calls which physical instance of B, which availability zone they’re in, what the network conditions are at that moment, and whether any intermediate infrastructure is degraded.

You cannot test for this in staging. Staging is typically a single availability zone with minimal traffic. The heterogeneity only manifests at production scale, across production infrastructure, under production load patterns. By then, your architecture decisions are cast in concrete.

The response is defensive engineering: measure everything per availability zone, per region, per network path. Set latency budgets per service call and alert when they’re breached. Design for the worst-case network path, not the best case. And never, ever assume that because a call works from your development machine, it will work the same way from every production instance.
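One way to act on per-zone measurement, sketched under assumptions: latency samples arrive tagged with the caller’s zone, the service call has a hypothetical p99 budget of 5ms, and the samples below are synthetic. What it illustrates is that a fleet-wide p99 can sit comfortably inside the budget while one zone is breaching it.

```python
import statistics
from collections import defaultdict

# Hypothetical p99 latency budget for one service call, in milliseconds
BUDGET_P99_MS = 5.0


def p99(values):
    """Nearest-rank 99th percentile, using integer math for the rank."""
    ordered = sorted(values)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n)
    return ordered[rank - 1]


def report_by_zone(samples, budget_ms=BUDGET_P99_MS):
    """Group (zone, latency_ms) samples and flag zones whose p99 exceeds
    the budget."""
    by_zone = defaultdict(list)
    for zone, latency_ms in samples:
        by_zone[zone].append(latency_ms)
    report = {}
    for zone, values in sorted(by_zone.items()):
        zone_p99 = p99(values)
        report[zone] = {
            "p99": zone_p99,
            "median": statistics.median(values),
            "breach": zone_p99 > budget_ms,
        }
    return report


# Synthetic samples: zone-c takes less traffic and has a slow tail
samples = (
    [("zone-a", 1.0)] * 1000
    + [("zone-b", 1.2)] * 1000
    + [("zone-c", 2.1)] * 95
    + [("zone-c", 8.0)] * 5
)

fleet_p99 = p99([ms for _, ms in samples])
print(f"fleet-wide p99: {fleet_p99}ms (budget {BUDGET_P99_MS}ms)")
for zone, stats in report_by_zone(samples).items():
    flag = "BREACH" if stats["breach"] else "ok"
    print(f"{zone}: p99 {stats['p99']}ms, median {stats['median']}ms, {flag}")
```

In a real system the samples would come from your metrics pipeline, which would usually compute the percentiles too; the part that matters is keying the budget check by zone rather than by fleet.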

The Pattern

All three fallacies share the same underlying mechanism: the abstraction works perfectly in the common case and fails catastrophically in the edge case. The HTTP client abstracts away reliability. The service mesh abstracts away latency. Cloud infrastructure abstracts away network heterogeneity. And engineers, trained to trust their tools, build systems that only work when the abstractions hold.

The fallacies are not things you learn once and then know. They are things you learn, forget because the abstractions are so convincing, and then relearn the hard way in an incident postmortem at 3 AM. The code in this section — the retries, the circuit breaker, the timeout tuples — exists because the network will always be unreliable, will never have zero latency, and will never be homogeneous. These are not bugs to be fixed. They are physics to be respected.