Multi-Tenant Search: Index-per-Tenant vs Shared Index
The documentation platform serves 50 tenants. Each tenant has a separate index with 2 shards and 1 replica: 50 tenants x 2 shards x 2 copies = 200 shards. Manageable. The product launches successfully and grows to 500 tenants: 2,000 shards. The cluster manager node spends 30% of its CPU on cluster state management. At 1,000 tenants, the cluster state update takes 45 seconds. Search latency for all tenants degrades because every cluster state change propagates to every node.
Two Strategies
Index-per-Tenant
Each tenant gets a dedicated index. Complete data isolation. Independent mapping evolution. Independent scaling.
docs-acme-v1 (2 shards x 2 = 4 shard copies)
docs-globex-v1 (2 shards x 2 = 4 shard copies)
docs-initech-v1 (2 shards x 2 = 4 shard copies)
...
Total shards = tenants × shards_per_index × (1 + replicas)
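The formula is easy to check against the tenant counts quoted in the introduction. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical, and the inputs (2 shards per index, 1 replica) are the ones used throughout this article.

```java
/** Hypothetical helper: shard-count arithmetic for the index-per-tenant layout. */
final class ShardCount {
    /** Total shard copies = tenants x primaries per index x (1 + replicas). */
    static long perTenant(long tenants, int shardsPerIndex, int replicas) {
        return tenants * shardsPerIndex * (1L + replicas);
    }

    public static void main(String[] args) {
        // Tenant counts taken from the scenario described above
        for (long tenants : new long[] {50, 500, 1_000}) {
            System.out.println(tenants + " tenants -> "
                + perTenant(tenants, 2, 1) + " shard copies");
        }
    }
}
```

The count grows linearly with tenants, which is why the layout that is comfortable at 50 tenants becomes a cluster-state problem at 1,000.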
Advantages:
- Complete data isolation: a mapping change for Tenant A cannot affect Tenant B
- Independent ILM: each tenant’s data lifecycle matches their contract
- Simple deletion: dropping a tenant is DELETE /docs-tenant-v1
- Independent reindexing: reindex one tenant without touching others
Scaling limit: Shard count. Each index adds shards to the cluster state. At 1,000+ tenants with 2 shards and 1 replica, the cluster manages 4,000+ shards. The cluster manager becomes the bottleneck.
Shared Index with Routing
All tenants share a single index. A tenant_id field and custom routing ensure data is partitioned by tenant at the shard level.
docs-shared-v1 (10 shards x 2 = 20 shard copies)
All 500 tenants' data in the same index
Custom routing on tenant_id → each tenant's data on a single shard (the routing value hashes to one shard unless index.routing_partition_size is raised)
Advantages:
- Constant shard count regardless of tenant count
- Efficient resource utilization: small tenants share shard space
- Simple cluster management: cluster state is tiny
Trade-offs:
- No per-tenant mapping changes: all tenants share the same mapping
- Noisy neighbor risk: a large tenant’s heavy query load slows every tenant routed to the same shard and competes with all others for node resources
- Complex deletion: deleting a tenant requires delete_by_query, not index deletion
- Shared ILM: all tenants follow the same lifecycle policy
The Implementation
Shared Index with Filtered Aliases
// HARDENED: Shared index with per-tenant filtered aliases
// Combines the operational simplicity of shared indices
// with the access pattern simplicity of per-tenant indices
import java.io.IOException;
import java.util.Map;

import org.opensearch.client.json.JsonData;
import org.opensearch.client.opensearch.OpenSearchClient;

public class MultiTenantIndexManager {
    private final OpenSearchClient client;

    public MultiTenantIndexManager(OpenSearchClient client) {
        this.client = client;
    }

    public void createSharedIndex() throws IOException {
        client.indices().create(c -> c
            .index("docs-shared-v1")
            .settings(s -> s
                .numberOfShards("10")
                .numberOfReplicas("1")
                // Cap shard copies of this index per node to spread load
                .putAll(Map.of(
                    "index.routing.allocation.total_shards_per_node",
                    JsonData.of(3)
                ))
            )
            .mappings(m -> m
                .properties("tenant_id", p -> p.keyword(k -> k))
                .properties("title", p -> p.text(t -> t
                    .fields("keyword", f -> f.keyword(k -> k))
                    .analyzer("standard")
                ))
                .properties("body", p -> p.text(t -> t
                    .analyzer("standard")))
                // Reject any write that arrives without a routing value
                .routing(r -> r.required(true))
            )
        );
    }

    public void onboardTenant(String tenantId) throws IOException {
        // Create a filtered, routed alias for the tenant
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.add(ad -> ad
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
                .filter(q -> q.term(t -> t
                    .field("tenant_id").value(tenantId)))
                .routing(tenantId)
                .searchRouting(tenantId)
            ))
        );
    }

    public void offboardTenant(String tenantId) throws IOException {
        // Remove alias first so the tenant can no longer read or write
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.remove(r -> r
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
            ))
        );

        // Delete tenant's documents (routed, so only the tenant's shard is scanned)
        client.deleteByQuery(d -> d
            .index("docs-shared-v1")
            .routing(tenantId)
            .query(q -> q.term(t -> t
                .field("tenant_id").value(tenantId)))
        );
    }
}
Hybrid Strategy: Large Tenants Get Dedicated Indices
// HARDENED: Route large tenants to dedicated indices,
// small tenants to the shared index
import java.io.IOException;

import org.opensearch.client.opensearch.OpenSearchClient;

public class HybridTenantRouter {
    private final OpenSearchClient client;
    private static final long LARGE_TENANT_THRESHOLD = 500_000; // docs

    public HybridTenantRouter(OpenSearchClient client) {
        this.client = client;
    }

    public String resolveIndex(String tenantId) throws IOException {
        // The alias name is the same for both placements: it points either at
        // the tenant's dedicated index or, filtered and routed, at the shared
        // index. Callers never need to know which, so no cluster call is made.
        return "docs-" + tenantId;
    }

    public void promoteToDedicated(String tenantId) throws IOException {
        String dedicatedIndex = "docs-" + tenantId + "-v1";

        // Create dedicated index. No mappings are passed here, so fields are
        // mapped dynamically; supply the shared index's mappings to keep
        // tenant_id as a keyword and the same analyzers.
        client.indices().create(c -> c
            .index(dedicatedIndex)
            .settings(s -> s
                .numberOfShards("2")
                .numberOfReplicas("1")
            )
        );

        // Reindex tenant's data from shared to dedicated.
        // Writes that reach the shared index after this copy starts are not
        // carried over; pause the tenant's writes or replay them afterwards.
        client.reindex(r -> r
            .source(s -> s
                .index("docs-shared-v1")
                .query(q -> q.term(t -> t
                    .field("tenant_id").value(tenantId)))
            )
            .dest(d -> d.index(dedicatedIndex))
        );

        // Swap alias to point to dedicated index
        // (both actions in one request are applied atomically)
        client.indices().updateAliases(ua -> ua
            .actions(a -> a.remove(r -> r
                .index("docs-shared-v1")
                .alias("docs-" + tenantId)
            ))
            .actions(a -> a.add(ad -> ad
                .index(dedicatedIndex)
                .alias("docs-" + tenantId)
            ))
        );

        // Delete tenant's data from shared index
        client.deleteByQuery(d -> d
            .index("docs-shared-v1")
            .routing(tenantId)
            .query(q -> q.term(t -> t
                .field("tenant_id").value(tenantId)))
        );
    }
}
Noisy Neighbor Prevention
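// HARDENED: Per-tenant query rate limiting to prevent noisy neighbors
// Assumes Guava's RateLimiter and an application-defined RateLimitExceededException
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.google.common.util.concurrent.RateLimiter;

public class TenantRateLimiter {
    private final Map<String, RateLimiter> limiters =
        new ConcurrentHashMap<>();
    private static final double DEFAULT_QPS = 50.0;
    private static final double PREMIUM_QPS = 200.0;

    public void checkRateLimit(String tenantId, String tier)
            throws RateLimitExceededException {
        double qps = "premium".equals(tier) ? PREMIUM_QPS : DEFAULT_QPS;
        RateLimiter limiter = limiters.computeIfAbsent(tenantId,
            k -> RateLimiter.create(qps));
        // Apply a tier change to a limiter that was cached at the old rate
        if (limiter.getRate() != qps) {
            limiter.setRate(qps);
        }
        // Wait up to 100 ms for a permit before rejecting the query
        if (!limiter.tryAcquire(Duration.ofMillis(100))) {
            throw new RateLimitExceededException(
                "Tenant " + tenantId + " exceeded " + qps + " QPS");
        }
    }
}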
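The class above delegates the actual limiting to a RateLimiter, whose API matches Guava's. To make the mechanism visible, here is a minimal token-bucket sketch using only the standard library. It is an illustration under stated assumptions, not a replacement: the class name TenantTokenBuckets and the injected clock are hypothetical, the burst capacity is assumed equal to one second of permits, and unlike the version above it rejects immediately instead of waiting up to 100 ms.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stdlib-only token bucket, one bucket per tenant. */
final class TenantTokenBuckets {
    private static final class Bucket {
        final double ratePerSec; // refill rate; also the burst capacity (assumption)
        double tokens;
        long lastNanos;

        Bucket(double ratePerSec, long nowNanos) {
            this.ratePerSec = ratePerSec;
            this.tokens = ratePerSec; // start with a full bucket
            this.lastNanos = nowNanos;
        }

        synchronized boolean tryAcquire(long nowNanos) {
            // Refill in proportion to elapsed time, capped at capacity
            double elapsedSec = (nowNanos - lastNanos) / 1e9;
            tokens = Math.min(ratePerSec, tokens + elapsedSec * ratePerSec);
            lastNanos = nowNanos;
            if (tokens >= 1.0) {
                tokens -= 1.0;
                return true;
            }
            return false;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    /** The clock is injected so the behavior can be tested deterministically. */
    TenantTokenBuckets(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Returns true if the tenant may run one more query right now. */
    boolean tryAcquire(String tenantId, double qps) {
        long now = nanoClock.getAsLong();
        // The rate is fixed when the tenant's bucket is first created
        return buckets.computeIfAbsent(tenantId, k -> new Bucket(qps, now))
            .tryAcquire(now);
    }
}
```

In production the clock would be System::nanoTime. Because each tenant has its own bucket, one tenant exhausting its permits does not reduce the permits available to any other tenant.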
The Measurement
Strategy comparison at different tenant counts:
| Metric | Index-per-Tenant (50) | Index-per-Tenant (500) | Shared (500) | Hybrid (500) |
|---|---|---|---|---|
| Total shards | 200 | 2,000 | 20 | 220 |
| Cluster state size | 2MB | 18MB | 0.5MB | 3MB |
| Cluster state update | 200ms | 4.5s | 50ms | 250ms |
| p99 search latency | 22ms | 45ms | 28ms | 25ms |
| Tenant deletion | Instant | Instant | 30s (delete_by_query) | Mixed |
| Mapping isolation | Full | Full | None | Partial |
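The hybrid strategy combines the benefits of both approaches: dedicated indices for the 10 largest tenants (90% of the data) and a shared index for the remaining 490 small tenants (10% of the data). Total shard count stays manageable at 220: 20 shard copies for the shared index plus 200 across the dedicated indices. That figure implies dedicated indices sized well above the 2-shard, 1-replica default in promoteToDedicated, which would add only 40 shard copies for 10 tenants.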
The Decision Rule
Use the shared index strategy when tenant count exceeds 100 and most tenants have similar, small data volumes. The shard count advantage dominates at scale.
Use the index-per-tenant strategy when tenants require mapping isolation (different analyzers, different fields), independent ILM policies, or contractual data isolation guarantees.
Use the hybrid strategy when tenant sizes are heterogeneous: a few large tenants with millions of documents alongside hundreds of small tenants with thousands. Promote tenants to dedicated indices when their document count exceeds 500,000 or when they require custom mappings.
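The promotion rule can be written as a single predicate. This is a sketch only: the class and method names are hypothetical, and how the document count and the custom-mapping requirement are obtained (for example, from a count request and a tenant configuration record) is left to the caller.

```java
/** Hypothetical encoding of the promotion rule for the hybrid strategy. */
final class PromotionRule {
    // Threshold in documents, taken from the rule stated above
    static final long LARGE_TENANT_THRESHOLD = 500_000;

    /** True when the tenant should move from the shared index to a dedicated one. */
    static boolean shouldPromote(long docCount, boolean needsCustomMappings) {
        return docCount > LARGE_TENANT_THRESHOLD || needsCustomMappings;
    }
}
```

A scheduled job could evaluate this for each tenant on the shared index and call promoteToDedicated for those that qualify.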
Implement application-layer rate limiting regardless of the indexing strategy. The shared index strategy is especially vulnerable to noisy neighbors, but even index-per-tenant deployments share cluster resources (CPU, memory, network, disk I/O).