Multi-Tenant Search: Index-per-Tenant vs Shared Index

The documentation platform serves 50 tenants. Each tenant has a separate index with 2 shards and 1 replica: 50 tenants x 2 shards x 2 copies = 200 shards. Manageable. The product launches successfully and grows to 500 tenants: 2,000 shards. The cluster manager node spends 30% of its CPU on cluster state management. At 1,000 tenants, the cluster state update takes 45 seconds. Search latency for all tenants degrades because every cluster state change propagates to every node.

Two Strategies

Index-per-Tenant

Each tenant gets a dedicated index. Complete data isolation. Independent mapping evolution. Independent scaling.

docs-acme-v1     (2 shards x 2 = 4 shard copies)
docs-globex-v1   (2 shards x 2 = 4 shard copies)
docs-initech-v1  (2 shards x 2 = 4 shard copies)
...
Total shards = tenants × shards_per_index × (1 + replicas)
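
The formula is easy to sanity-check in code. The following is a minimal sketch; the class and method names (`ShardMath`, `totalShards`) are hypothetical and not part of any client library.

```java
/** Shard-count arithmetic for the index-per-tenant layout (illustrative helper). */
final class ShardMath {

    private ShardMath() {}

    /** Total shard copies = tenants x primaries per index x (1 + replicas). */
    static long totalShards(long tenants, int primariesPerIndex, int replicas) {
        if (tenants < 0 || primariesPerIndex < 1 || replicas < 0) {
            throw new IllegalArgumentException(
                "tenants and replicas must be >= 0, primaries must be >= 1");
        }
        return tenants * primariesPerIndex * (1L + replicas);
    }
}
```

With 2 primaries and 1 replica per index, `ShardMath.totalShards(50, 2, 1)` gives 200, `totalShards(500, 2, 1)` gives 2,000, and `totalShards(1000, 2, 1)` gives 4,000, matching the figures quoted in this article.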

Advantages:

Complete data isolation: a mapping change for Tenant A cannot affect Tenant B
Independent ILM: each tenant’s data lifecycle matches their contract
Simple deletion: dropping a tenant is DELETE /docs-tenant-v1
Independent reindexing: reindex one tenant without touching others

Scaling limit: Shard count. Each index adds shards to the cluster state. At 1,000+ tenants with 2 shards and 1 replica, the cluster manages 4,000+ shards. The cluster manager becomes the bottleneck.

Shared Index with Routing

All tenants share a single index. A tenant_id field and custom routing ensure data is partitioned by tenant at the shard level.

docs-shared-v1   (10 shards x 2 = 20 shard copies)
All 500 tenants' data in the same index
Custom routing on tenant_id → each tenant's data on a single shard
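
To see why routing partitions data by tenant, it helps to picture how a shard is chosen: roughly, the engine hashes the routing value and takes the result modulo the number of primary shards, so the same routing value always maps to the same shard. The sketch below illustrates only that idea. `RoutingSketch` and `shardFor` are hypothetical names, and `String.hashCode()` is a stand-in for the engine's real hash function, so the shard numbers it produces will not match a real cluster.

```java
/** Illustrates how a routing value pins a tenant's documents to one shard. */
final class RoutingSketch {

    private RoutingSketch() {}

    /**
     * Stand-in for the engine's routing function.
     * Assumption: String.hashCode() replaces the real hash for illustration only.
     */
    static int shardFor(String routingValue, int numPrimaryShards) {
        if (numPrimaryShards < 1) {
            throw new IllegalArgumentException("need at least one primary shard");
        }
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hashes
        return Math.floorMod(routingValue.hashCode(), numPrimaryShards);
    }
}
```

Because the mapping depends only on the routing value and the shard count, every document indexed with routing `acme` goes to the same shard, and a search routed with `acme` only needs to visit that shard.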

Advantages:

Constant shard count regardless of tenant count
Efficient resource utilization: small tenants share shard space
Simple cluster management: cluster state is tiny

Trade-offs:

No per-tenant mapping changes: all tenants share the same mapping
Noisy neighbor risk: a large tenant’s heavy query load degrades every tenant that shares its shard and nodes
Complex deletion: deleting a tenant requires delete_by_query, not index deletion
Shared ILM: all tenants follow the same lifecycle policy

The Implementation

Shared Index with Filtered Aliases

// HARDENED: Shared index with per-tenant filtered aliases
// Combines the operational simplicity of shared indices
// with the access pattern simplicity of per-tenant indices

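import java.io.IOException;
import java.util.Map;

import org.opensearch.client.json.JsonData;
import org.opensearch.client.opensearch.OpenSearchClient;

public class MultiTenantIndexManager {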

    private final OpenSearchClient client;

    public MultiTenantIndexManager(OpenSearchClient client) {
        this.client = client;
    }

    public void createSharedIndex() throws IOException {
        client.indices().create(c -> c
            .index("docs-shared-v1")
            .settings(s -> s
                .numberOfShards("10")
                .numberOfReplicas("1")
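                // Caps shard copies per node for this index; 20 copies at
                // 3 per node need at least 7 data nodes to allocate fully
                .putAll(Map.of(
                    "index.routing.allocation.total_shards_per_node",
                    JsonData.of(3)
                ))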
            )
            .mappings(m -> m
                .properties("tenant_id", p -> p.keyword(k -> k))
                .properties("title", p -> p.text(t -> t
                    .fields("keyword", f -> f.keyword(k -> k))
                    .analyzer("standard")
                ))
                .properties("body", p -> p.text(t -> t
                    .analyzer("standard")))
                .routing(r -> r.required(true))
            )
        );
    }

    public void onboardTenant(String tenantId) throws IOException {
        // Create a filtered, routed alias for the tenant
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.add(ad -> ad
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
                .filter(q -> q.term(t -> t
                    .field("tenant_id").value(tenantId)))
                .routing(tenantId)
                .searchRouting(tenantId)
            ))
        );
    }

    public void offboardTenant(String tenantId) throws IOException {
        // Remove alias
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.remove(r -> r
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
            ))
        );

        // Delete tenant's documents
        client.deleteByQuery(d -> d
            .index("docs-shared-v1")
            .routing(tenantId)
            .query(q -> q.term(t -> t
                .field("tenant_id").value(tenantId)))
        );
    }
}
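
Because the alias name is built by concatenating `"docs-"` with the tenant ID, the tenant ID deserves validation before it reaches the cluster. A value such as `*` would turn `docs-*` into a wildcard target that matches every tenant's alias. The sketch below is one way to guard against that. `TenantAliases`, `aliasFor`, and the allowed character set are assumptions for illustration, and the pattern is deliberately stricter than what the engine itself accepts.

```java
import java.util.regex.Pattern;

/** Builds per-tenant alias names from validated tenant IDs (illustrative helper). */
final class TenantAliases {

    // Assumption: tenant IDs are lowercase letters, digits, and hyphens,
    // start with a letter or digit, and are at most 63 characters long.
    private static final Pattern SAFE_TENANT_ID =
        Pattern.compile("[a-z0-9][a-z0-9-]{0,62}");

    private TenantAliases() {}

    /** Returns "docs-" + tenantId, or throws if the ID could alter the target. */
    static String aliasFor(String tenantId) {
        if (tenantId == null || !SAFE_TENANT_ID.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("unsafe tenant id: " + tenantId);
        }
        return "docs-" + tenantId;
    }
}
```

`TenantAliases.aliasFor("acme")` returns `docs-acme`, while `aliasFor("*")`, `aliasFor("Acme")`, and `aliasFor("")` all throw `IllegalArgumentException`.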

Hybrid Strategy: Large Tenants Get Dedicated Indices

// HARDENED: Route large tenants to dedicated indices,
// small tenants to the shared index

import java.io.IOException;

import org.opensearch.client.opensearch.OpenSearchClient;

public class HybridTenantRouter {

    private final OpenSearchClient client;
    private static final long LARGE_TENANT_THRESHOLD = 500_000; // docs

    public HybridTenantRouter(OpenSearchClient client) {
        this.client = client;
    }

    public String resolveIndex(String tenantId) {
        // The alias name is the same whether the tenant lives in a
        // dedicated index or in the shared index (filtered and routed),
        // so resolving it needs no round trip to the cluster.
        return "docs-" + tenantId;
    }

    public void promoteToDedicated(String tenantId) throws IOException {
        String dedicatedIndex = "docs-" + tenantId + "-v1";

        // Create dedicated index
        client.indices().create(c -> c
            .index(dedicatedIndex)
            .settings(s -> s
                .numberOfShards("2")
                .numberOfReplicas("1")
            )
        );

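        // Reindex tenant's data from shared to dedicated. Discard the
        // tenant routing value so documents spread across both shards
        // and stay addressable by ID without a routing parameter.
        client.reindex(r -> r
            .source(s -> s
                .index("docs-shared-v1")
                .query(q -> q.term(t -> t
                    .field("tenant_id").value(tenantId)))
            )
            .dest(d -> d
                .index(dedicatedIndex)
                .routing("discard"))
        );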

        // Swap alias to point to dedicated index
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.remove(r -> r
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
            ))
            .actions(a -> a.add(ad -> ad
                .index(dedicatedIndex)
                .alias("docs-" + tenantId)
            ))
        );

        // Delete tenant's data from shared index
        client.deleteByQuery(d -> d
            .index("docs-shared-v1")
            .routing(tenantId)
            .query(q -> q.term(t -> t
                .field("tenant_id").value(tenantId)))
        );
    }
}

Noisy Neighbor Prevention

// HARDENED: Per-tenant query rate limiting to prevent noisy neighbors

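import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed to be Guava's RateLimiter, based on the create/tryAcquire calls
import com.google.common.util.concurrent.RateLimiter;

public class TenantRateLimiter {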

    private final Map<String, RateLimiter> limiters =
        new ConcurrentHashMap<>();

    private static final double DEFAULT_QPS = 50.0;
    private static final double PREMIUM_QPS = 200.0;

    public void checkRateLimit(String tenantId, String tier)
            throws RateLimitExceededException {

        double qps = "premium".equals(tier) ? PREMIUM_QPS : DEFAULT_QPS;

        RateLimiter limiter = limiters.computeIfAbsent(tenantId,
            k -> RateLimiter.create(qps));

        if (!limiter.tryAcquire(Duration.ofMillis(100))) {
            throw new RateLimitExceededException(
                "Tenant " + tenantId + " exceeded " + qps + " QPS");
        }
    }
}
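
For readers who want to see the mechanism without a library dependency, the same idea can be sketched as a per-tenant token bucket using only the JDK. This is an illustration, not a replacement for the limiter above. `TenantTokenBuckets` and its injectable clock are hypothetical, and as in the class above, a tenant's rate is fixed the first time that tenant is seen.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Dependency-free per-tenant token bucket (illustrative sketch). */
final class TenantTokenBuckets {

    private static final class Bucket {
        private final double ratePerSecond;
        private double tokens;
        private long lastRefillNanos;

        Bucket(double ratePerSecond, long nowNanos) {
            this.ratePerSecond = ratePerSecond;
            this.tokens = ratePerSecond; // start full: allows a one-second burst
            this.lastRefillNanos = nowNanos;
        }

        synchronized boolean tryAcquire(long nowNanos) {
            // Refill in proportion to elapsed time, capped at one second's worth
            double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
            tokens = Math.min(ratePerSecond, tokens + elapsedSeconds * ratePerSecond);
            lastRefillNanos = nowNanos;
            if (tokens < 1.0) {
                return false; // over the limit: reject without waiting
            }
            tokens -= 1.0;
            return true;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    /** Pass System::nanoTime in production; a fake clock makes tests deterministic. */
    TenantTokenBuckets(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Returns true if the tenant may run one more query right now. */
    boolean tryAcquire(String tenantId, double qps) {
        Bucket bucket = buckets.computeIfAbsent(
            tenantId, k -> new Bucket(qps, nanoClock.getAsLong()));
        return bucket.tryAcquire(nanoClock.getAsLong());
    }
}
```

Unlike the version above, this sketch rejects immediately instead of waiting up to 100 ms for a permit. Which behavior is preferable depends on whether callers would rather queue briefly or fail fast.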

Figure: Multi-tenant architecture comparison showing index-per-tenant, shared index, and hybrid strategies with shard distribution.

The Measurement

Strategy comparison at different tenant counts:

Metric	Index-per-Tenant (50)	Index-per-Tenant (500)	Shared (500)	Hybrid (500)
Total shards	200	2,000	20	220
Cluster state size	2MB	18MB	0.5MB	3MB
Cluster state update	200ms	4.5s	50ms	250ms
p99 search latency	22ms	45ms	28ms	25ms
Tenant deletion	Instant	Instant	30s (delete_by_query)	Mixed
Mapping isolation	Full	Full	None	Partial

The hybrid strategy places the 10 largest tenants (90% of the data) on dedicated indices and keeps the remaining 490 small tenants (10% of the data) in a shared index. Total shard count stays at 220, against 2,000 for index-per-tenant at the same tenant count, and p99 search latency (25ms) stays close to the 50-tenant baseline (22ms).

The Decision Rule

Use the shared index strategy when tenant count exceeds 100 and most tenants have similar, small data volumes. The shard count advantage dominates at scale.

Use the index-per-tenant strategy when tenants require mapping isolation (different analyzers, different fields), independent ILM policies, or contractual data isolation guarantees.

Use the hybrid strategy when tenant sizes are heterogeneous: a few large tenants with millions of documents alongside hundreds of small tenants with thousands. Promote tenants to dedicated indices when their document count exceeds 500,000 or when they require custom mappings.
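
The promotion rule can be written down as a small predicate. This is a sketch under the thresholds stated above; `PromotionPolicy` and `shouldPromote` are hypothetical names, and how the document count and the custom-mapping flag are obtained is left to the application.

```java
/** Encodes the promotion rule for the hybrid strategy (illustrative helper). */
final class PromotionPolicy {

    /** Document count above which a tenant moves to a dedicated index. */
    static final long LARGE_TENANT_THRESHOLD = 500_000;

    private PromotionPolicy() {}

    /** True when the tenant exceeds the threshold or needs its own mapping. */
    static boolean shouldPromote(long docCount, boolean needsCustomMapping) {
        return docCount > LARGE_TENANT_THRESHOLD || needsCustomMapping;
    }
}
```

A tenant with exactly 500,000 documents and no custom mapping stays in the shared index, since the rule says "exceeds". A tenant with a handful of documents but a custom analyzer is promoted.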

Implement application-layer rate limiting regardless of the indexing strategy. The shared index strategy is especially vulnerable to noisy neighbors, but even index-per-tenant deployments share cluster resources (CPU, memory, network, disk I/O).