Security, Multi-Tenancy, and Governance Controls
[!NOTE] This module covers security, multi-tenancy, and governance controls, deriving each design choice from first principles and the hardware constraints underneath them.
1. The Multi-Tenancy Dilemma
In a SaaS environment, multiple customers (tenants) share your infrastructure. You face a core architectural trade-off: Isolation vs. Density.
Strategy A: Index-per-Tenant
Every customer gets their own index (e.g., tenant_a_logs, tenant_b_logs).
- Pros: Absolute data isolation. You can delete a customer by dropping an index (instant, O(1)). You can back up individual tenants.
- Cons: The Oversharding Trap. If you have 10,000 customers, you have 10,000+ indices. Remember the shard limit from Chapter 1? Your JVM heap will explode from Lucene segment overhead long before your disk is full. This is disastrous for small tenants.
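A back-of-the-envelope sketch makes the trap concrete. The numbers below are assumptions for illustration: one index per tenant with 1 primary + 1 replica, and the commonly cited rule of thumb (from older Elastic sizing guidance, not from this text) of staying well under ~20 shards per GB of JVM heap.

```go
package main

import "fmt"

func main() {
	// Hypothetical fleet: 10,000 tenants, one index each, 1 primary + 1 replica.
	tenants := 10_000
	shardsPerIndex := 2 // 1 primary + 1 replica
	totalShards := tenants * shardsPerIndex

	// Rule of thumb (assumption): stay well under ~20 shards per GB of JVM heap.
	const shardsPerGBHeap = 20
	heapGB := totalShards / shardsPerGBHeap

	fmt.Printf("total shards: %d\n", totalShards)                  // 20000
	fmt.Printf("heap needed across the cluster: ~%d GB\n", heapGB) // ~1000 GB
}
```

Even with mostly tiny tenants, the cluster needs on the order of a terabyte of heap just to hold shard metadata, which is why small tenants make this strategy fall over long before disk does.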
Strategy B: Shared Index with Custom Routing
All customers share a massive index (e.g., global_logs). Every document includes a tenant_id field.
- Pros: Incredible hardware density. Thousands of tenants live happily in 5 primary shards.
- Cons: Noisy Neighbors. A heavy query from Tenant A degrades performance for Tenant B. Deleting a customer requires a massive _delete_by_query operation, which is slow and causes heavy I/O: Lucene only marks the documents as deleted (tombstones), and the space is not reclaimed until segments merge.
- The Routing Fix: By default, Elasticsearch scatters a search across all shards. You must use Custom Routing. If you set routing=tenant_a, Elasticsearch hashes the tenant ID and writes all of that tenant's data to a single specific shard. At search time, it queries only that shard, avoiding scatter-gather latency.
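The routing formula itself is simple: the shard number is the hash of the routing value modulo the number of primary shards. The sketch below illustrates the idea; Elasticsearch actually uses Murmur3, so the stdlib FNV-1a hash here is a stand-in, and `shardFor` is an illustrative helper, not a client API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor mimics Elasticsearch's routing formula:
//
//	shard = hash(_routing) % number_of_primary_shards
//
// Elasticsearch uses Murmur3; FNV-1a stands in here for illustration.
func shardFor(routing string, numPrimaries uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(routing))
	return h.Sum32() % numPrimaries
}

func main() {
	const primaries = 5
	// The same routing value always hashes to the same shard...
	fmt.Println(shardFor("tenant_a", primaries) == shardFor("tenant_a", primaries)) // true
	// ...so a routed search touches exactly one shard instead of all five.
	for _, t := range []string{"tenant_a", "tenant_b", "tenant_c"} {
		fmt.Printf("%s -> shard %d\n", t, shardFor(t, primaries))
	}
}
```

Because the mapping is deterministic, every write and every routed read for a tenant converges on one shard, which is exactly what makes the single-shard search in the next section possible.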
2. Interactive: Multi-Tenancy Strategies
Compare the physical layout of Index-per-Tenant versus Shared Index with Routing.
3. Implementing Routing at Search Time
If you choose the Shared Index strategy, you must use routing to avoid querying the whole cluster for one tenant.
Java
// Java: Searching with Custom Routing
SearchRequest request = SearchRequest.of(s -> s
    .index("global_logs")
    .routing("tenant_123") // Crucial: Directs the query to ONLY one shard
    .query(q -> q
        .bool(b -> b
            .filter(f -> f
                .term(t -> t.field("tenant_id").value("tenant_123")) // Double-check isolation
            )
            .must(m -> m
                .match(mt -> mt.field("message").query("error"))
            )
        )
    )
);
SearchResponse<LogDoc> response = client.search(request, LogDoc.class);
Go
// Go: Searching with Custom Routing
queryJSON := `{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tenant_id": "tenant_123" } }
      ],
      "must": [
        { "match": { "message": "error" } }
      ]
    }
  }
}`
req := esapi.SearchRequest{
    Index:   []string{"global_logs"},
    Routing: []string{"tenant_123"}, // Only searches the single shard hashing to "tenant_123"
    Body:    strings.NewReader(queryJSON),
}
res, err := req.Do(context.Background(), esClient)
if err != nil {
    log.Fatalf("Error searching with routing: %s", err)
}
defer res.Body.Close()
4. Governance & Document Level Security (DLS)
If multiple tenants share an index, filtering by tenant_id in your application layer is dangerous. A single bug could leak data between companies.
Elasticsearch offers Document Level Security (DLS): you attach a query to an RBAC role, and the cluster enforces it on every read against the protected indices.
When a user authenticates with Role: Tenant_A, Elasticsearch automatically appends that role's filter (e.g., a term query on tenant_id) to every single search they execute. Even if they submit a bare GET /global_logs/_search, they will never see Tenant B's data.
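As a sketch, such a role could be created with the security role API. The role name tenant_a_reader is an assumption for illustration; the `query` key on the index privilege is what triggers DLS:

```json
POST /_security/role/tenant_a_reader
{
  "indices": [
    {
      "names": [ "global_logs" ],
      "privileges": [ "read" ],
      "query": {
        "term": { "tenant_id": "tenant_a" }
      }
    }
  ]
}
```

Any user assigned this role is physically unable to query outside their tenant, regardless of bugs in the application layer.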
[!WARNING] DLS adds compute overhead to every query, because the security filter must be evaluated against the inverted index at search time. Elasticsearch caches the resulting filter bitsets per segment; on large multi-tenant clusters, monitor the size of that DLS bitset cache so it does not pressure the heap.