search at depth

ISM Policy Design and Rollover Strategies

4 min read Chapter 35 of 60

The Symptom

The team deploys a single ISM policy for all tenants. Tenant A writes 50,000 documents per day. Tenant B writes 200 documents per day. After 30 days, the age-based rollover triggers for both. Tenant A’s index holds 1.5 million documents in a 75GB shard, well above the target size. Tenant B’s index holds 6,000 documents in a 300MB shard, far below the minimum for efficient search. Both roll over at the same time: one too late, the other too early.

The Internals

ISM re-evaluates each managed index on a fixed schedule (default: every 5 minutes). Within a state, actions execute in order, and the policy waits for each to complete before starting the next. Once every action in the state has finished, ISM checks the state's transition conditions and moves the index to the next state when one is met.

Rollover conditions support three criteria:

  • min_index_age: time since index creation
  • min_doc_count: number of documents in the index
  • min_size: total primary shard size

When multiple conditions are specified, rollover triggers when any condition is met. This is an OR operation, not AND. A large tenant hits the doc count threshold before the age threshold. A small tenant hits the age threshold before the doc count threshold.
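The OR semantics can be illustrated with a small model. The sketch below is not ISM code: the class and method names are hypothetical, it assumes a steady daily write rate, and it steps one day at a time instead of once per job interval. It reports the first day on which any configured condition is met and which conditions those were.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative model of rollover's any-condition-met (OR) semantics. Not ISM code. */
class RolloverSimulator {

    /** Rollover thresholds; a value <= 0 means the condition is not configured. */
    record Conditions(long minDocCount, long minSizeBytes, int minIndexAgeDays) {}

    /** The day rollover would fire and the condition(s) satisfied on that day. */
    record Trigger(int day, List<String> conditionsMet) {}

    /**
     * Steps through the index's life one day at a time, assuming a constant
     * write rate and document size, and stops at the first day on which at
     * least one configured condition is satisfied.
     */
    static Optional<Trigger> firstTrigger(long docsPerDay, long bytesPerDoc,
                                          Conditions c, int maxDays) {
        for (int day = 1; day <= maxDays; day++) {
            long docs = docsPerDay * day;
            long sizeBytes = docs * bytesPerDoc;

            List<String> met = new ArrayList<>();
            if (c.minDocCount() > 0 && docs >= c.minDocCount()) {
                met.add("min_doc_count");
            }
            if (c.minSizeBytes() > 0 && sizeBytes >= c.minSizeBytes()) {
                met.add("min_size");
            }
            if (c.minIndexAgeDays() > 0 && day >= c.minIndexAgeDays()) {
                met.add("min_index_age");
            }

            // OR semantics: a single satisfied condition is enough.
            if (!met.isEmpty()) {
                return Optional.of(new Trigger(day, met));
            }
        }
        return Optional.empty(); // nothing fired within maxDays
    }
}
```

With hypothetical thresholds of 1,000,000 documents or 30 days, the 50,000-documents-per-day tenant from the symptom rolls over on day 20 via min_doc_count, while the 200-documents-per-day tenant waits until day 30 and rolls over via min_index_age.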

The Implementation

ISM Policy Management via REST

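import java.io.IOException;

// Assumes the Apache HttpClient 4-based low-level REST client; clients built on
// HttpClient 5 take EntityUtils from org.apache.hc.core5.http.io.entity instead.
import org.apache.http.util.EntityUtils;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

// ISMException is an application-defined exception type (not shown here).
public class ISMPolicyManager {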

    private final RestClient restClient;

    public ISMPolicyManager(RestClient restClient) {
        this.restClient = restClient;
    }

    public void createPolicy(String policyId, String policyJson)
            throws IOException {
        Request request = new Request("PUT",
            "/_plugins/_ism/policies/" + policyId);
        request.setJsonEntity(policyJson);

        Response response = restClient.performRequest(request);
        if (response.getStatusLine().getStatusCode() != 201) {
            throw new ISMException(
                "Failed to create ISM policy: " + response.getStatusLine());
        }
    }

    public String getPolicyStatus(String indexName) throws IOException {
        Request request = new Request("GET",
            "/_plugins/_ism/explain/" + indexName);
        Response response = restClient.performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }

    public void retryFailedPolicy(String indexName) throws IOException {
        Request request = new Request("POST",
            "/_plugins/_ism/retry/" + indexName);
        request.setJsonEntity("{\"state\": \"hot\"}");
        restClient.performRequest(request);
    }
}

Tenant-Sized Rollover Policies

// HARDENED: Assign rollover policies based on tenant write volume

public String buildPolicyForTier(String tier) {
    return switch (tier) {
        case "high-volume" -> """
            {
              "policy": {
                "description": "High-volume tenant lifecycle",
                "default_state": "hot",
                "states": [
                  {
                    "name": "hot",
                    "actions": [
                      {
                        "rollover": {
                          "min_doc_count": 500000,
                          "min_size": "40gb",
                          "min_index_age": "7d"
                        }
                      }
                    ],
                    "transitions": [
                      {"state_name": "warm", "conditions": {"min_index_age": "14d"}}
                    ]
                  },
                  {
                    "name": "warm",
                    "actions": [],
                    "transitions": []
                  }
                ]
              }
            }
            """;

        case "low-volume" -> """
            {
              "policy": {
                "description": "Low-volume tenant lifecycle",
                "default_state": "hot",
                "states": [
                  {
                    "name": "hot",
                    "actions": [
                      {
                        "rollover": {
                          "min_index_age": "90d"
                        }
                      }
                    ],
                    "transitions": [
                      {"state_name": "warm", "conditions": {"min_index_age": "180d"}}
                    ]
                  },
                  {
                    "name": "warm",
                    "actions": [],
                    "transitions": []
                  }
                ]
              }
            }
            """;

        default -> throw new IllegalArgumentException("Unknown tier: " + tier);
    };
}

ISM Health Monitor

public record ISMStatus(
    String indexName,
    String currentState,
    String failedReason,
    long retryCount
) {}

public List<ISMStatus> getUnhealthyPolicies() throws IOException {
    Request request = new Request("GET", "/_plugins/_ism/explain/*");
    Response response = restClient.performRequest(request);

    // Parse response and filter for failed or stuck policies
    JsonNode root = objectMapper.readTree(
        EntityUtils.toString(response.getEntity()));

    List<ISMStatus> unhealthy = new ArrayList<>();
    root.fields().forEachRemaining(entry -> {
        String indexName = entry.getKey();
        JsonNode status = entry.getValue();

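        // Message casing can vary, so compare case-insensitively and also
        // honor the explicit retry_info.failed flag.
        String message = status.path("info").path("message").asText();
        boolean failed = status.path("retry_info").path("failed").asBoolean()
                || message.toLowerCase().contains("failed");
        if (failed) {
            unhealthy.add(new ISMStatus(
                indexName,
                status.path("state").path("name").asText(),
                message,
                // retry_info.failed is a boolean; the count is consumed_retries
                status.path("retry_info").path("consumed_retries").asLong()
            ));
        }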
    });

    return unhealthy;
}

The Measurement

Shard size distribution after 90 days with uniform vs tenant-tier policies:

Policy Type             p10 Shard Size   p50 Shard Size   p90 Shard Size
Uniform (30d rollover)  180MB            12GB             85GB
Tenant-tiered           2GB              18GB             42GB

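Tenant-tiered policies bring the p50 and p90 shard sizes inside the 5-50GB target range. The p10 shard, at 2GB, is still undersized, but it is roughly ten times larger than the 180MB seen under the uniform policy. Uniform policies create both undersized shards (small tenants rolled over too early) and oversized shards (large tenants rolled over too late).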
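For readers who want to produce the same kind of summary for their own cluster, here is a minimal sketch of the percentile calculation. It assumes shard sizes have already been collected in GB (for example from the _cat/shards API) and uses the nearest-rank definition of a percentile. The class and method names are hypothetical, and other percentile definitions will give slightly different values on small samples.

```java
import java.util.Arrays;

/** Summarizes a sample of shard sizes given in GB. Names are illustrative. */
class ShardSizePercentiles {

    /**
     * Nearest-rank percentile for p in (0, 100]: the smallest sample value
     * such that at least p percent of the samples are less than or equal to it.
     */
    static double percentile(double[] sizesGb, double p) {
        if (sizesGb.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = sizesGb.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Fraction of shards whose size falls inside [lowGb, highGb]. */
    static double fractionInRange(double[] sizesGb, double lowGb, double highGb) {
        if (sizesGb.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long inRange = Arrays.stream(sizesGb)
                .filter(s -> s >= lowGb && s <= highGb)
                .count();
        return (double) inRange / sizesGb.length;
    }
}
```

Tracking fractionInRange(sizes, 5, 50) alongside the percentiles gives a single number to alert on when a policy change pushes shards out of the target range.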

The Decision Rule

Classify tenants into write-volume tiers (high, medium, low) and assign corresponding ISM policies. The classification can be automated based on the trailing 30-day write rate.
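A minimal sketch of that automated classification follows. The cut-off values are hypothetical placeholders, not recommendations, and the class name is illustrative. The "high-volume" and "low-volume" strings match the tier names used by buildPolicyForTier; "medium-volume" is an assumed name for the middle tier.

```java
/** Maps a tenant's trailing 30-day write volume to a policy tier. Illustrative only. */
class TenantTierClassifier {

    // Hypothetical cut-offs in documents per day; tune them against your own
    // document sizes and shard-size targets.
    static final long HIGH_VOLUME_MIN_DOCS_PER_DAY = 10_000;
    static final long LOW_VOLUME_MAX_DOCS_PER_DAY = 1_000;

    static final int WINDOW_DAYS = 30;

    /**
     * @param docsWrittenInWindow documents the tenant wrote over the trailing 30 days
     * @return "high-volume", "medium-volume", or "low-volume"
     */
    static String classify(long docsWrittenInWindow) {
        if (docsWrittenInWindow < 0) {
            throw new IllegalArgumentException("document count must be non-negative");
        }
        double docsPerDay = (double) docsWrittenInWindow / WINDOW_DAYS;

        if (docsPerDay >= HIGH_VOLUME_MIN_DOCS_PER_DAY) {
            return "high-volume";
        }
        if (docsPerDay < LOW_VOLUME_MAX_DOCS_PER_DAY) {
            return "low-volume";
        }
        return "medium-volume";
    }
}
```

Under these placeholder cut-offs, the tenant writing 50,000 documents per day lands in "high-volume" and the tenant writing 200 per day lands in "low-volume". A medium tier also needs its own policy definition: buildPolicyForTier as written above rejects any tier other than the two it defines.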

Monitor ISM policy execution daily. Failed policies leave indices stranded in the wrong tier, consuming hot-tier resources for data that should be on warm hardware. The _plugins/_ism/explain endpoint is the primary diagnostic tool.

Set rollover conditions to achieve shard sizes between 10GB and 40GB. Shards below 5GB carry fixed per-shard overhead that is out of proportion to the data they hold. Shards above 50GB slow recovery and rebalancing operations.
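One way to turn a shard-size target into concrete rollover values is sketched below. It relies on min_size measuring total primary storage, so the per-shard target is multiplied by the number of primary shards. The class name, method name, and the average-document-size input are assumptions; in practice the average would come from measuring an existing index.

```java
/** Derives rollover thresholds from a per-shard size target. Illustrative only. */
class RolloverThresholds {

    /** Values suitable for the rollover action's min_size and min_doc_count. */
    record Thresholds(String minSize, long minDocCount) {}

    /**
     * @param targetShardGb desired size of each primary shard at rollover, in GB
     * @param primaryShards number of primary shards in the index
     * @param avgDocBytes   measured average on-disk size of one document, in bytes
     */
    static Thresholds forTarget(int targetShardGb, int primaryShards, long avgDocBytes) {
        if (targetShardGb <= 0 || primaryShards <= 0 || avgDocBytes <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // min_size covers all primary shards together, so scale the per-shard target.
        long totalGb = (long) targetShardGb * primaryShards;
        long totalBytes = totalGb * 1024L * 1024L * 1024L;

        // The document count that fills the same space at the measured average size.
        long minDocCount = totalBytes / avgDocBytes;

        return new Thresholds(totalGb + "gb", minDocCount);
    }
}
```

Because rollover fires when any condition is met, min_doc_count derived this way acts as a second route to the same size target rather than an additional requirement.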