Service Discovery and Load Balancing

When the SaaS backend’s order-service makes an HTTP call to notification-service, it does not use a hardcoded URL. It uses a logical service name. Something resolves that name to an actual host and port. Something picks one instance from a list. And something rewrites the URL before the HTTP call goes out. That something is three things working together: DiscoveryClient, ReactorLoadBalancer, and LoadBalancerInterceptor.

[Figure: LoadBalanced RestTemplate flow showing logical URL resolution through ReactiveLoadBalancer to physical service instance]

The DiscoveryClient Abstraction

DiscoveryClient is a Spring Cloud interface with two methods that matter:

public interface DiscoveryClient {
    List<ServiceInstance> getInstances(String serviceId);
    List<String> getServices();
}

ServiceInstance represents a single running instance of a service. It exposes host, port, URI, metadata, and whether HTTPS is used. Every service discovery backend implements DiscoveryClient. Eureka gives you EurekaDiscoveryClient. Consul gives you ConsulDiscoveryClient. Kubernetes gives you a KubernetesDiscoveryClient backed by the Kubernetes API. The abstraction is clean, but the implementations differ in critical ways.

Eureka uses a pull model. The client fetches the full registry on startup, then fetches deltas every 30 seconds by default. Consul supports both pull and push (blocking queries that long-poll until a change occurs). Kubernetes watches the endpoints API for changes and gets notified in near real-time.

These differences matter because they determine your stale window: the time between a service instance going down and your application learning about it.

ServiceInstance Cache and Refresh

The raw DiscoveryClient.getInstances() call can be expensive. Eureka’s client maintains a local copy of the registry. But Spring Cloud’s load balancer adds another layer: CachingServiceInstanceListSupplier.

The caching chain works as follows:

@Bean
public ServiceInstanceListSupplier discoveryClientServiceInstanceListSupplier(
        ConfigurableApplicationContext context) {
    return ServiceInstanceListSupplier.builder()
        .withDiscoveryClient()
        .withCaching()
        .build(context);
}

CachingServiceInstanceListSupplier wraps the discovery-backed supplier and caches its results for a configurable duration (default: 35 seconds, controlled by spring.cloud.loadbalancer.cache.ttl). The cache is backed by a LoadBalancerCacheManager, which defaults to Caffeine if available, falling back to a simple concurrent map.
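To make the caching behavior concrete, here is a minimal sketch of a TTL cache wrapped around a supplier. It is not the actual CachingServiceInstanceListSupplier: the class name TtlCachedSupplier, the single-entry design, and the injectable clock are all assumptions made for illustration, and plain java.util.function.Supplier stands in for the discovery-backed supplier.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative only: a single-entry TTL cache, not Spring Cloud's implementation.
final class TtlCachedSupplier<T> implements Supplier<List<T>> {

    private final Supplier<List<T>> delegate; // stands in for the discovery-backed supplier
    private final long ttlMillis;
    private final LongSupplier clock;         // injectable so the behavior can be tested
    private List<T> cached;
    private long loadedAt;

    TtlCachedSupplier(Supplier<List<T>> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public synchronized List<T> get() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAt >= ttlMillis) {
            cached = List.copyOf(delegate.get()); // refresh from the delegate
            loadedAt = now;
        }
        return cached; // may be stale for up to ttlMillis
    }
}
```

Until the TTL elapses, every caller sees the list as it was at the last refresh, even if the registry has already changed underneath it.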

The refresh cycle creates a stale window. Consider the timeline:

1. notification-service instance B crashes at T=0.
2. Eureka server notices the missed heartbeats and evicts the instance. This timeline optimistically assumes that takes up to 30 seconds (T=30). With default settings a lease only expires after 90 seconds without a renewal, so this step usually takes longer.
3. order-service’s Eureka client fetches the delta on its next poll (up to 30 seconds later, T=60).
4. The local CachingServiceInstanceListSupplier cache may not have expired yet (up to 35 more seconds, T=95).

In the worst case under these assumptions, requests continue routing to a dead instance for up to 95 seconds, and for longer with Eureka’s default lease expiration. This is not a bug. This is the math of eventually consistent service discovery.
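The arithmetic behind that figure is just the sum of the stage delays, which the sketch below writes out. StaleWindow is a hypothetical helper, not part of Spring Cloud or Eureka, and the three durations are the values used in the timeline above rather than measurements.

```java
import java.time.Duration;

// Hypothetical helper: adds up the delays from the timeline above.
final class StaleWindow {

    private StaleWindow() {
    }

    /** Worst case: every stage waits its full interval before the next one notices. */
    static Duration worstCase(Duration serverDetection,
                              Duration clientFetchInterval,
                              Duration loadBalancerCacheTtl) {
        return serverDetection.plus(clientFetchInterval).plus(loadBalancerCacheTtl);
    }

    public static void main(String[] args) {
        Duration window = worstCase(
                Duration.ofSeconds(30),  // server notices the missed heartbeat
                Duration.ofSeconds(30),  // Eureka client registry fetch interval
                Duration.ofSeconds(35)); // spring.cloud.loadbalancer.cache.ttl default
        System.out.println(window.getSeconds() + " seconds"); // prints "95 seconds"
    }
}
```

Plugging in a 10-second cache TTL instead of 35 brings the same sum down to 70 seconds, which shows how much each layer contributes.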

ReactorLoadBalancer: Instance Selection

Once you have a list of ServiceInstance objects, ReactorLoadBalancer picks one:

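// choose() is declared on the generic ReactorLoadBalancer<T>.
// ReactorServiceInstanceLoadBalancer is its marker sub-interface for ServiceInstance.
public interface ReactorLoadBalancer<T> extends ReactiveLoadBalancer<T> {
    Mono<Response<T>> choose(Request request);
}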

The default implementation is RoundRobinLoadBalancer. It uses an AtomicInteger position counter and selects instances by index modulo the list size:

int pos = this.position.incrementAndGet() & Integer.MAX_VALUE;
ServiceInstance instance = instances.get(pos % instances.size());

The & Integer.MAX_VALUE masks the sign bit, preventing negative indices after integer overflow. Simple and effective, and the only state it keeps is that counter. It does not track response times, error rates, or instance health. It picks the next one in the list.
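The effect of the mask is easiest to see right at the overflow boundary. The following is a simplified model of the selection step, not the RoundRobinLoadBalancer source: RoundRobinPicker and its seed parameter are made up for the demonstration, and the sketch assumes a non-empty instance list.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the selection step; assumes a non-empty list.
final class RoundRobinPicker {

    private final AtomicInteger position;

    RoundRobinPicker(int seed) {
        this.position = new AtomicInteger(seed);
    }

    <T> T pick(List<T> instances) {
        // incrementAndGet() wraps to Integer.MIN_VALUE on overflow;
        // the mask clears the sign bit so the index is never negative.
        int pos = position.incrementAndGet() & Integer.MAX_VALUE;
        return instances.get(pos % instances.size());
    }

    public static void main(String[] args) {
        List<String> instances = List.of("a", "b", "c");
        // Start just below the overflow point to exercise the wrap-around.
        RoundRobinPicker picker = new RoundRobinPicker(Integer.MAX_VALUE - 1);
        for (int i = 0; i < 4; i++) {
            System.out.println(picker.pick(instances));
        }
    }
}
```

With three instances the rotation restarts from index 0 at the wrap, so the order is briefly uneven there, but no index is ever negative.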

Spring Cloud also provides RandomLoadBalancer. You can plug in your own by implementing ReactorServiceInstanceLoadBalancer and registering it as a bean in the load balancer client configuration:

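@Configuration
@LoadBalancerClient(
    name = "order-service",
    configuration = CustomLBConfig.class
)
public class OrderServiceConfig {}

// Deliberately not annotated with @Configuration (see the note below).
public class CustomLBConfig {
    @Bean
    public ReactorLoadBalancer<ServiceInstance> customLoadBalancer(
            Environment environment,
            LoadBalancerClientFactory clientFactory) {
        // Resolves to the name of the client this child context belongs to.
        String name = environment.getProperty(
            LoadBalancerClientFactory.PROPERTY_NAME);
        // WeightedResponseTimeLoadBalancer is your own implementation;
        // Spring Cloud LoadBalancer does not ship a class with this name.
        return new WeightedResponseTimeLoadBalancer(
            clientFactory.getLazyProvider(name,
                ServiceInstanceListSupplier.class),
            name);
    }
}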

Note: CustomLBConfig must not be a @Configuration class in the main component scan. If it is, it applies globally to all load balancer clients, not just order-service. This is the same component scan trap that affects Feign and Ribbon configurations.

The @LoadBalanced Qualifier Trap

Here is the mechanism most developers get wrong. @LoadBalanced is not a feature annotation. It is a @Qualifier.

Look at the source:

@Target({ ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD })
@Retention(RetentionPolicy.RUNTIME)
@Qualifier
public @interface LoadBalanced {
}

It is meta-annotated with @Qualifier. That is essentially the entire definition. The magic happens in LoadBalancerAutoConfiguration. This auto-configuration class collects all RestTemplate beans that are annotated with @LoadBalanced:

@LoadBalanced
@Autowired(required = false)
private List<RestTemplate> restTemplates = Collections.emptyList();

Spring’s injection mechanism uses @LoadBalanced as a qualifier. Only RestTemplate beans that were created with @LoadBalanced on their @Bean method are injected into this list. The auto-configuration then applies every RestTemplateCustomizer to each of them, and one of those customizers is what adds LoadBalancerInterceptor to the template’s interceptor list:

@Bean
public SmartInitializingSingleton loadBalancedRestTemplateInitializerDeprecated(
        ObjectProvider<List<RestTemplateCustomizer>> customizers) {
    return () -> customizers.ifAvailable(list -> {
        for (RestTemplateCustomizer customizer : list) {
            for (RestTemplate restTemplate : this.restTemplates) {
                customizer.customize(restTemplate);
            }
        }
    });
}

LoadBalancerInterceptor intercepts every request made through that RestTemplate. When you call restTemplate.getForObject("http://order-service/api/orders", ...), the interceptor:

1. Extracts the hostname (order-service) as the service ID.
2. Hands the request to the blocking LoadBalancerClient, which asks the ReactorLoadBalancer registered for order-service to choose an instance.
3. Gets a ServiceInstance (e.g., 192.168.1.10:8080).
4. Reconstructs the URL: http://192.168.1.10:8080/api/orders.
5. Executes the actual HTTP request.
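The extraction and rewriting steps can be sketched with nothing more than java.net.URI. Spring Cloud performs this internally, so UriRewriter below is a hypothetical stand-in, and the host, port, and secure parameters simply model the fields of the chosen ServiceInstance.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that mirrors the extract-and-rewrite steps listed above.
final class UriRewriter {

    private UriRewriter() {
    }

    /** The host part of the logical URL doubles as the service ID. */
    static String serviceId(URI logical) {
        return logical.getHost();
    }

    /** Swaps scheme, host, and port; keeps user info, path, query, and fragment. */
    static URI rewrite(URI logical, String host, int port, boolean secure) {
        String scheme = secure ? "https" : "http";
        try {
            return new URI(scheme, logical.getUserInfo(), host, port,
                    logical.getPath(), logical.getQuery(), logical.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rewrite " + logical, e);
        }
    }

    public static void main(String[] args) {
        URI logical = URI.create("http://order-service/api/orders");
        System.out.println(serviceId(logical));
        System.out.println(rewrite(logical, "192.168.1.10", 8080, false));
    }
}
```

Only the authority part of the URL changes. The path and query travel through untouched, which is why callers can write URLs against the logical name.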

If you create a RestTemplate without @LoadBalanced, no interceptor is added. If you call http://order-service/api/orders through it, you get an UnknownHostException, because order-service is a logical service name and not a resolvable DNS name.

The Two-Bean Pattern

In the SaaS backend, the order-service calls both internal services (via discovery) and external payment APIs (via fixed URLs). You need two separate RestTemplate beans:

// BROKEN: Using @LoadBalanced for all HTTP calls
@Bean
@LoadBalanced
public RestTemplate restTemplate() {
    return new RestTemplate();
}

// This call to Stripe's API will fail because the interceptor
// tries to resolve "api.stripe.com" as a service name in the registry
// restTemplate.postForObject("https://api.stripe.com/v1/charges", ...)

// CORRECT: Separate beans for internal and external calls
@Bean
@LoadBalanced
public RestTemplate loadBalancedRestTemplate() {
    return new RestTemplate();
}

@Bean
public RestTemplate externalRestTemplate() {
    return new RestTemplate();
}

Inject them by qualifier:

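@Service
public class PaymentService {

    private final RestTemplate serviceClient;
    private final RestTemplate externalClient;

    // Two RestTemplate beans exist, so the plain one must be selected by bean name;
    // an unqualified RestTemplate parameter would be ambiguous.
    public PaymentService(
            @LoadBalanced RestTemplate serviceClient,
            @Qualifier("externalRestTemplate") RestTemplate externalClient) {
        this.serviceClient = serviceClient;
        this.externalClient = externalClient;
    }
}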

WebClient Integration

For reactive applications, the equivalent mechanism uses ReactorLoadBalancerExchangeFilterFunction:

@Bean
@LoadBalanced
public WebClient.Builder loadBalancedWebClientBuilder() {
    return WebClient.builder();
}

The @LoadBalanced qualifier plays the same role on WebClient.Builder, although the wiring differs. LoadBalancerBeanPostProcessorAutoConfiguration registers a bean post-processor that finds qualified builders and adds a load-balancing exchange filter such as ReactorLoadBalancerExchangeFilterFunction. The filter performs the same service name resolution and URL rewriting, but within the reactive ExchangeFilterFunction chain.

The Stale Registry Window

The stale window is the fundamental tradeoff of client-side service discovery. You cannot eliminate it entirely, but you can minimize its impact.

Strategies:

Reduce cache TTL: Set spring.cloud.loadbalancer.cache.ttl=10s. This increases discovery client calls but reduces staleness.
Health-check-based filtering: Enable it with spring.cloud.loadbalancer.configurations=health-check (or build the supplier with withHealthChecks()), then set spring.cloud.loadbalancer.health-check.path.default=/actuator/health and spring.cloud.loadbalancer.health-check.interval=15s. The path property is a map keyed by service ID, with default as the fallback key. HealthCheckServiceInstanceListSupplier actively pings instances and removes unhealthy ones.
Retry on failure: Use spring.cloud.loadbalancer.retry.enabled=true to retry on a different instance when the first one fails. For a blocking RestTemplate this also requires Spring Retry on the classpath. This does not prevent the stale window, but it prevents the stale window from causing user-visible errors.
Zone-aware selection: Enable spring.cloud.loadbalancer.configurations=zone-preference and set spring.cloud.loadbalancer.zone to prefer instances in the same availability zone, reducing cross-zone latency and isolating blast radius.

The common mistake is treating service discovery as a solved problem. It is not. It is an eventually consistent system with a guaranteed stale window. Design for it.

// BROKEN: No retry, no health checks. Dead instance = failed request.
@Bean
@LoadBalanced
public RestTemplate restTemplate() {
    return new RestTemplate();
}

// CORRECT: Retry with load balancer integration
// (a blocking RestTemplate also needs spring-retry on the classpath)
// application.yml:
// spring.cloud.loadbalancer.retry.enabled: true
// spring.cloud.loadbalancer.retry.max-retries-on-same-service-instance: 0
// spring.cloud.loadbalancer.retry.max-retries-on-next-service-instance: 2
// spring.cloud.loadbalancer.configurations: health-check
// spring.cloud.loadbalancer.health-check.path.default: /actuator/health
// spring.cloud.loadbalancer.health-check.interval: 15s

Retrying on the same instance is pointless if it is down. Set max-retries-on-same-service-instance to 0 and max-retries-on-next-service-instance to a small number. The load balancer picks a different instance for each retry.
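The policy described above, zero retries on the same instance and a few on the next one, amounts to a short failover loop. The sketch below is illustrative only: Spring Cloud LoadBalancer's retry support is driven by the properties shown earlier, not by hand-written code, and NextInstanceRetry, its parameters, and the use of a plain Function for the HTTP call are all assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Illustrative failover loop: never retries the same instance, moves on to the next one.
final class NextInstanceRetry {

    private NextInstanceRetry() {
    }

    /**
     * Tries the call on up to (1 + maxRetriesOnNext) distinct instances
     * and rethrows the last failure if all of them fail.
     */
    static <I, R> R call(List<I> instances, int maxRetriesOnNext, Function<I, R> request) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        int attempts = Math.min(instances.size(), 1 + maxRetriesOnNext);
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return request.apply(instances.get(i));
            } catch (RuntimeException e) {
                last = e; // e.g. connection refused by a dead instance
            }
        }
        throw last;
    }
}
```

With a stale list of one dead and one live instance, a budget of two retries on the next instance turns the failure into a slower success. A budget of zero surfaces the error to the caller.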

The key insight: service discovery gives you a list of instances that were alive at some point in the recent past. Load balancing picks one. Neither guarantees the instance is alive right now. Your HTTP client must be prepared for connection failures. The stale window is not a bug to fix. It is a constraint to design around.