DiscoveryClient and the Service Instance Cache

The SaaS backend runs five services: tenant-service, order-service, notification-service, billing-service, and api-gateway. Each service registers itself with a discovery server on startup. When order-service needs to call notification-service, it asks the local DiscoveryClient for available instances. But DiscoveryClient.getInstances() does not make a network call every time. There are layers of caching between you and the truth.

The DiscoveryClient Contract

DiscoveryClient is a read-only view of the service registry:

public interface DiscoveryClient extends Ordered {
    String description();
    List<ServiceInstance> getInstances(String serviceId);
    List<String> getServices();
}

Every call to getInstances("notification-service") returns a List<ServiceInstance>. Each ServiceInstance carries the data you need to construct an HTTP request:

public interface ServiceInstance {
    String getServiceId();
    String getHost();
    int getPort();
    boolean isSecure();
    URI getUri();
    Map<String, String> getMetadata();
}
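
As a rough illustration of how those fields combine into a request target, the sketch below uses a hypothetical Instance record as a stand-in for ServiceInstance so that it runs without Spring on the classpath. The host, port, and path are invented for the example.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical stand-in for ServiceInstance, so the sketch runs without Spring.
record Instance(String serviceId, String host, int port, boolean secure,
                Map<String, String> metadata) {

    // Derives the base URI from scheme, host, and port, in the spirit of getUri().
    URI uri() {
        String scheme = secure ? "https" : "http";
        return URI.create(scheme + "://" + host + ":" + port);
    }
}

class RequestTargets {
    // Resolves a request path against the instance's base URI.
    static URI target(Instance instance, String path) {
        return instance.uri().resolve(path);
    }

    public static void main(String[] args) {
        Instance b = new Instance("notification-service", "10.0.0.12", 8081,
                false, Map.of());
        System.out.println(target(b, "/notifications"));
    }
}
```

The point of the sketch is that nothing in the instance data says whether the host is still alive. The URI is built from whatever the registry copy last reported.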

In our SaaS backend using Eureka, EurekaDiscoveryClient wraps the Eureka client library’s InstanceInfo objects in EurekaServiceInstance adapters. The Eureka client maintains its own local cache, updated by periodic delta fetches from the Eureka server (every 30 seconds by default, controlled by eureka.client.registryFetchIntervalSeconds).

This is the first cache layer. The Eureka client library maintains a full local copy of the registry. When you call getInstances(), you are reading from this local copy, not from the Eureka server.

CachingServiceInstanceListSupplier

Spring Cloud LoadBalancer adds a second cache layer on top of the DiscoveryClient. The load balancer does not call DiscoveryClient.getInstances() directly for each request. It uses a ServiceInstanceListSupplier chain, and CachingServiceInstanceListSupplier is the default cache in that chain.

The supplier chain is constructed by ServiceInstanceListSupplierBuilder:

ServiceInstanceListSupplier.builder()
    .withDiscoveryClient()    // DiscoveryClientServiceInstanceListSupplier (reactive client;
                              // withBlockingDiscoveryClient() wraps the blocking DiscoveryClient)
    .withCaching()            // CachingServiceInstanceListSupplier
    .build(context);

CachingServiceInstanceListSupplier subscribes to the delegate supplier and caches the emitted List<ServiceInstance>. It uses a LoadBalancerCacheManager that, by default, is backed by Caffeine (if Caffeine is on the classpath) or otherwise by the DefaultLoadBalancerCache that ships with spring-cloud-starter-loadbalancer.

The cache TTL is configured with:

spring:
  cloud:
    loadbalancer:
      cache:
        ttl: 35s # Default: 35 seconds
        capacity: 256 # Initial capacity (default: 256)
        caffeine:
          spec: "initialCapacity=256,expireAfterWrite=35s" # If set, overrides ttl and capacity

When a request comes in and the cache entry is still valid, the cached list is used. When the entry expires, CachingServiceInstanceListSupplier calls its delegate, which calls DiscoveryClient.getInstances(), which reads from Eureka’s local registry copy.
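
The expiry behavior can be sketched in a few lines. This is a simplified model, not the real CachingServiceInstanceListSupplier: the class name TtlCachedList and the injectable clock are assumptions made so the behavior can be exercised without Spring or Reactor.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified TTL cache over an instance-list lookup. It models only the
// "serve cached list until the TTL passes, then ask the delegate" behavior.
class TtlCachedList<T> {
    private final Supplier<List<T>> delegate; // stands in for the discovery lookup
    private final long ttlMillis;
    private final LongSupplier clock;         // injectable so expiry can be tested
    private List<T> cached;
    private long loadedAt;

    TtlCachedList(Supplier<List<T>> delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized List<T> get() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAt >= ttlMillis) {
            // Entry missing or expired: reload from the delegate.
            cached = List.copyOf(delegate.get());
            loadedAt = now;
        }
        return cached;
    }
}
```

With a 35-second TTL, a list loaded just before an instance disappears from the delegate keeps being served for almost the full 35 seconds, which is the second layer of staleness.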

The Two-Layer Stale Window

Now trace the timeline of a failure:

T=0: notification-service instance B (one of three instances) crashes. It stops sending heartbeats to the Eureka server.

T=0 to T=30: The Eureka server expects heartbeats every 30 seconds (eureka.instance.leaseRenewalIntervalInSeconds). It has not noticed yet.

T=30: The first expected heartbeat from instance B does not arrive. The Eureka server does not evict on a single missed heartbeat. It waits for the lease to expire, which takes 90 seconds by default (eureka.instance.leaseExpirationDurationInSeconds). If self-preservation mode is triggered (too many instances missing heartbeats at once), even expired leases are not evicted.

T=90: Lease expires. The Eureka server removes instance B from the registry on its next eviction cycle (runs every 60 seconds by default).

T=90 to T=150: The removal has not reached order-service yet. The server’s eviction cycle has to run first, and then order-service’s Eureka client has to pick up the change on one of its delta fetches (every 30 seconds). Until then, its local registry still contains instance B.

T=150: order-service’s Eureka client fetches the delta and removes instance B from its local cache.

T=150 to T=185: order-service’s CachingServiceInstanceListSupplier has a cached entry that still includes instance B. Cache TTL: 35 seconds.

T=185: Cache entry expires. Next load balancer request triggers a fresh fetch from the Eureka client, which no longer includes instance B.

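Total in this trace: 185 seconds of routing requests to a dead instance. That is over three minutes, and an unlucky alignment of the eviction cycle and the delta fetch can stretch it further.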
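
The arithmetic behind that figure can be written out as a small sketch. The stage names are labels chosen for this example, and the durations are the ones traced in the timeline above, not values read from a running system.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

class StaleWindow {
    // Stage durations as traced in the timeline.
    static Map<String, Duration> stages() {
        Map<String, Duration> stages = new LinkedHashMap<>();
        stages.put("lease expiration (T=0 to T=90)", Duration.ofSeconds(90));
        stages.put("propagation to the client's local registry (T=90 to T=150)",
                Duration.ofSeconds(60));
        stages.put("LoadBalancer cache TTL (T=150 to T=185)", Duration.ofSeconds(35));
        return stages;
    }

    // Sums all stages into the total stale window.
    static Duration total(Map<String, Duration> stages) {
        return stages.values().stream().reduce(Duration.ZERO, Duration::plus);
    }

    public static void main(String[] args) {
        Map<String, Duration> stages = stages();
        stages.forEach((name, d) -> System.out.println(name + ": " + d.getSeconds() + "s"));
        System.out.println("total: " + total(stages).getSeconds() + "s");
    }
}
```

Laying the stages out this way makes it clear which configuration property shortens which part of the window.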

Health Check Integration

To reduce the stale window, Spring Cloud LoadBalancer offers HealthCheckServiceInstanceListSupplier. Instead of trusting the registry blindly, it actively pings instances:

spring:
  cloud:
    loadbalancer:
      health-check:
        path:
          default: /actuator/health # Per-service overrides use the service ID as the key
        interval: 15s
        initial-delay: 5s

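The properties alone do not add the supplier to the chain. You enable it either with spring.cloud.loadbalancer.configurations=health-check or by building a different supplier chain yourself: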

ServiceInstanceListSupplier.builder()
    .withDiscoveryClient()
    .withHealthChecks()
    .withCaching()
    .build(context);

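HealthCheckServiceInstanceListSupplier periodically sends HTTP GET requests to each instance’s health endpoint. If an instance fails the health check, it is removed from the list before it reaches the cache. The health check itself can notice a dead instance within one interval (15 seconds here) instead of waiting 185 seconds for the registry. Any cache layered on top adds its TTL to that. The supplier also replays its last healthy list on its own, so the Spring Cloud documentation notes that you may want to skip withCaching() when health checks are in use.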
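
The filtering step can be sketched as follows. This is a synchronous simplification under stated assumptions: the real supplier probes on a schedule and works reactively, while the hypothetical HealthFilter below takes the probe as a plain predicate so that no network access is needed to try it.

```java
import java.util.List;
import java.util.function.Predicate;

class HealthFilter {
    // Keeps only the instances whose probe succeeds.
    // A probe that throws (for example on a refused connection) counts as a failure.
    static <T> List<T> healthyOnly(List<T> instances, Predicate<T> probe) {
        return instances.stream()
                .filter(instance -> {
                    try {
                        return probe.test(instance);
                    } catch (RuntimeException e) {
                        return false;
                    }
                })
                .toList();
    }
}
```

In a real chain the probe would be an HTTP GET against the configured health path. Here it can be any function, which keeps the sketch testable.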

The tradeoff: your load balancer client now makes periodic HTTP calls to every instance of every service it calls. For a service with 50 instances, that is 50 health check requests every 15 seconds, per client. In a system with 20 client instances, that is 1000 health check requests every 15 seconds hitting notification-service. This is manageable for most systems but becomes a concern at scale.
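
The traffic estimate is simple multiplication, sketched below with the numbers from the paragraph above. The class and method names are made up for this example.

```java
class HealthCheckTraffic {
    // Probes sent per interval across the whole client fleet.
    static long probesPerInterval(int targetInstances, int clientInstances) {
        return (long) targetInstances * clientInstances;
    }

    // Probes each single target instance receives per second.
    static double probesPerSecondPerTarget(int clientInstances, long intervalSeconds) {
        return (double) clientInstances / intervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(probesPerInterval(50, 20) + " probes per interval");
        System.out.printf("%.2f probes/s per target instance%n",
                probesPerSecondPerTarget(20, 15));
    }
}
```

Seen from one notification-service instance, the load depends only on the number of clients and the interval: 20 probes every 15 seconds. The total grows with both fleets, but the per-instance cost grows only with the client count.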

// BROKEN: Default configuration with no health checks, no retry.
// Dead instances serve as request black holes for up to 185 seconds.
@Configuration
public class LoadBalancerConfig {
    // Using defaults: 35s cache TTL, no health checks.
    // Requests to dead instances return connection timeouts.
}

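// CORRECT: Health checks with retry.
// Assumes the health-check supplier is in the chain (via
// spring.cloud.loadbalancer.configurations=health-check or a custom bean).
// application.yml:
// spring:
//   cloud:
//     loadbalancer:
//       health-check:
//         path:
//           default: /actuator/health
//         interval: 15s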
//       cache:
//         ttl: 10s
//       retry:
//         enabled: true
//         max-retries-on-same-service-instance: 0
//         max-retries-on-next-service-instance: 2
//         retry-on-all-exceptions: true
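
To make the retry settings concrete, here is a minimal sketch of "retry on the next instance". It is not Spring's retry implementation. NextInstanceRetry and its parameters are hypothetical, and the instance order is taken as given rather than chosen by a load balancer.

```java
import java.util.List;
import java.util.function.Function;

class NextInstanceRetry {
    // Tries the request on successive instances, moving to the next one after
    // a failure. No instance is retried twice, which mirrors
    // max-retries-on-same-service-instance: 0. With maxRetriesOnNext = 2,
    // at most three instances are attempted.
    static <I, R> R call(List<I> instances, int maxRetriesOnNext, Function<I, R> request) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances available");
        }
        RuntimeException last = null;
        int attempts = Math.min(instances.size(), maxRetriesOnNext + 1);
        for (int i = 0; i < attempts; i++) {
            try {
                return request.apply(instances.get(i));
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next instance
            }
        }
        throw last; // every attempted instance failed
    }
}
```

If the stale list still contains the dead instance B, the first attempt fails and the second lands on a live instance. The caller sees added latency but no error.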

Instance Status and Metadata

ServiceInstance.getMetadata() returns a Map<String, String> populated from the registry. In Eureka, this includes custom metadata set on the instance:

eureka:
  instance:
    metadata-map:
      zone: us-east-1a
      version: 2.3.1
      tenant-affinity: premium

The SaaS backend uses metadata for tenant-aware routing. Premium tenants are routed to dedicated instances with tenant-affinity: premium. The custom load balancer reads this metadata from ServiceInstance.getMetadata() and filters accordingly.

Instance status is tracked separately. EurekaServiceInstance exposes the instance status (UP, DOWN, STARTING, OUT_OF_SERVICE) through its underlying InstanceInfo. The Eureka client filters out non-UP instances by default (eureka.client.filter-only-up-instances), but if you use HealthCheckServiceInstanceListSupplier, it applies its own filtering on top, based on the health check response rather than the registry status.

Programmatic Instance Filtering

For the SaaS backend’s multi-tenant routing, you can build a custom ServiceInstanceListSupplier that filters by tenant:

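import java.util.List;

import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.DelegatingServiceInstanceListSupplier;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import reactor.core.publisher.Flux;

// Wraps another supplier and drops instances reserved for other tenants.
public class TenantAwareServiceInstanceListSupplier
        extends DelegatingServiceInstanceListSupplier {

    private final String tenantId;

    public TenantAwareServiceInstanceListSupplier(
            ServiceInstanceListSupplier delegate,
            String tenantId) {
        super(delegate);
        this.tenantId = tenantId;
    }

    @Override
    public Flux<List<ServiceInstance>> get() {
        return delegate.get().map(instances ->
            instances.stream()
                .filter(instance -> {
                    String affinity = instance.getMetadata()
                        .get("tenant-affinity");
                    // Keep unpinned and shared instances, plus those pinned to this tenant.
                    return affinity == null
                        || affinity.equals(tenantId)
                        || "shared".equals(affinity);
                })
                .toList() // Stream.toList() requires Java 16+
        );
    }
}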

This filter sits in the supplier chain before caching. Instances without a tenant-affinity metadata entry or with shared affinity are available to all tenants. Instances with a specific tenant affinity are only returned for that tenant.

The cache complicates this. If you cache per service (the default), all tenants see the same cached list. You need either per-tenant caching or tenant filtering that runs after the cache. The simpler approach: skip tenant filtering in the supplier chain and apply it in a custom ReactorLoadBalancer.choose() implementation that filters at selection time.
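
A selection-time filter might look roughly like this. The sketch deliberately avoids the Spring types: Candidate is a hypothetical stand-in for ServiceInstance, TenantAwareChooser stands in for a custom load balancer, and the tenant ID is passed in directly instead of being read from a request context.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ServiceInstance: only the fields the filter needs.
record Candidate(String host, Map<String, String> metadata) {}

class TenantAwareChooser {
    private final AtomicInteger position = new AtomicInteger();

    // Filters the shared (cached) list for one tenant, then round-robins
    // over whatever remains. The cached list itself is never modified.
    Optional<Candidate> choose(List<Candidate> cached, String tenantId) {
        List<Candidate> eligible = cached.stream()
                .filter(c -> {
                    String affinity = c.metadata().get("tenant-affinity");
                    return affinity == null
                            || "shared".equals(affinity)
                            || affinity.equals(tenantId);
                })
                .toList();
        if (eligible.isEmpty()) {
            return Optional.empty();
        }
        int index = Math.floorMod(position.getAndIncrement(), eligible.size());
        return Optional.of(eligible.get(index));
    }
}
```

Because the filter runs on every selection, one cached list per service serves all tenants, at the cost of re-filtering the list on each request.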

Configuration Reference

Key properties for tuning the stale window in the SaaS backend:

eureka:
  client:
    registry-fetch-interval-seconds: 15 # Default: 30
  instance:
    lease-renewal-interval-in-seconds: 10 # Default: 30
    lease-expiration-duration-in-seconds: 30 # Default: 90

spring:
  cloud:
    loadbalancer:
      cache:
        ttl: 10s # Default: 35s
      health-check:
        path:
          default: /actuator/health
        interval: 10s

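With these settings, the lease-expiry and cache portions of the stale window drop from 125 seconds (90 + 35) to approximately 40 seconds (30 + 10), and the delta fetch interval halves to 15 seconds. The server’s 60-second eviction cycle is a server-side setting and is not changed by these properties. Faster heartbeats increase network traffic. Shorter cache TTLs increase discovery client calls. Shorter health check intervals increase health check traffic. There is no free lunch. You trade network overhead for freshness.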

The important principle: treat the instance list as a hint, not a guarantee. Every HTTP call to a discovered instance can fail. Your retry logic is not optional. It is the safety net for the inherent staleness of distributed service discovery.