The Search Platform: Beyond the Cluster
[!NOTE] Architecting a global Search Platform. Gateway Services, Multi-Cluster Routing, and Blue/Green Deployments. Learn the trade-offs for a production system.
1. The Staff Engineer’s View
A Senior Engineer manages a Cluster. A Staff Engineer manages a Platform.
The Platform Components:
- Gateway Service: A proxy (Go/Java) between the App and ES.
- Schema Registry: Git-backed source of truth for mappings.
- Cross-Cluster Replication (CCR): Syncing data between `us-east-1` and `eu-west-1`.
Why a Gateway?
- Protection: Prevent “Kill Queries” (`*.*` regex).
- Abstraction: Let apps query `/search/users` instead of knowing index names (`users-v1`).
- Circuit Breaking: Fail fast if ES is overloaded.
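The three gateway duties above can be sketched in a few lines. This is a minimal illustration, not a production gateway: the route table `ROUTES`, the deny-list `EXPENSIVE_PATTERNS`, and the `CircuitBreaker` thresholds are all hypothetical names and values chosen for the example.

```python
import re
import time

# Hypothetical route table: public search path -> alias the gateway queries.
ROUTES = {"/search/users": "users"}

# Assumed deny-list: patterns that begin with an unbounded match.
EXPENSIVE_PATTERNS = (re.compile(r"^\*"), re.compile(r"^\.\*"))


def is_kill_query(pattern: str) -> bool:
    """Protection: flag wildcard/regex patterns that start with an unbounded match."""
    return any(p.search(pattern) for p in EXPENSIVE_PATTERNS)


def resolve_index(path: str) -> str:
    """Abstraction: map an app-facing path to an alias (KeyError if unknown)."""
    return ROUTES[path]


class CircuitBreaker:
    """Circuit breaking: open after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown      # seconds to stay open
        self.clock = clock            # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        """Return False while the breaker is open, so callers fail fast."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: reset and let traffic through again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        """Record the outcome of one backend call."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A request handler would call `is_kill_query` and `breaker.allow()` before forwarding anything to ES, then call `breaker.record(...)` with the outcome. A real gateway would likely inspect the parsed query body rather than a single pattern string.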
2. Blue/Green Deployments (Zero Downtime)
You need to change a field's mapping (e.g., `text` to `keyword`).
Elasticsearch cannot change the type of an existing field in place, so the data has to move to a new index.
The Dance:
- Green: Current Index (`users-v1`). Alias `users` points here.
- Blue: Create new Index (`users-v2`) with new mapping.
- Reindex: Copy data V1 -> V2.
- Swap: Atomic Alias switch. `users` now points to V2.
- Delete: V1.
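The steps above map onto four Elasticsearch REST calls: create index, `_reindex`, `_aliases`, and delete index. The sketch below only builds the requests as `(method, path, body)` tuples, so it runs without a cluster. The helper names and the `email` field in the example mapping are assumptions for illustration; sending the requests and waiting for the reindex to finish are left to the caller.

```python
import json


def reindex_request(source: str, dest: str):
    """Reindex API request: copy documents from source to dest."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", "/_reindex", body


def alias_swap_request(alias: str, old_index: str, new_index: str):
    """Aliases API request: both actions are applied atomically in one call."""
    body = {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    return "POST", "/_aliases", body


def blue_green_plan(alias, old_index, new_index, new_mapping):
    """Ordered requests for the dance: create, reindex, swap, delete."""
    return [
        ("PUT", f"/{new_index}", {"mappings": new_mapping}),
        reindex_request(old_index, new_index),
        alias_swap_request(alias, old_index, new_index),
        ("DELETE", f"/{old_index}", None),
    ]


if __name__ == "__main__":
    # Hypothetical field, used only to make the example concrete.
    mapping = {"properties": {"email": {"type": "keyword"}}}
    for method, path, body in blue_green_plan("users", "users-v1", "users-v2", mapping):
        print(method, path, json.dumps(body) if body else "")
```

Because the `remove` and `add` actions travel in one `_aliases` request, readers of `users` never see a moment where the alias points at nothing. Writes that land on V1 during the reindex are not covered by this sketch and need their own handling.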
3. Interactive: Multi-Region Routing
Simulate a global outage and failover.
[Interactive diagram: User App -> Gateway -> US-East (Primary, ONLINE), with EU-West (Failover, STANDBY). Routing: US-East (Latency: 20ms)]
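The routing decision in this scenario can be sketched as a small function: send traffic to the primary while it is healthy, and fall back to the standby region when it is not. The `Region` type, the `pick_region` name, and the EU-West latency figure are assumptions made for the example; only the US-East 20ms figure comes from the diagram.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    role: str        # "primary" or "failover"
    healthy: bool
    latency_ms: int


def pick_region(regions):
    """Prefer a healthy primary; otherwise the lowest-latency healthy region."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    primaries = [r for r in healthy if r.role == "primary"]
    candidates = primaries or healthy
    return min(candidates, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    us_east = Region("US-East", "primary", healthy=True, latency_ms=20)
    eu_west = Region("EU-West", "failover", healthy=True, latency_ms=90)  # assumed latency
    print(pick_region([us_east, eu_west]).name)   # normal operation
    us_east.healthy = False                       # simulate the outage
    print(pick_region([us_east, eu_west]).name)   # after failover
```

How `healthy` gets set is the hard part in practice (health checks, timeouts, flap damping) and is deliberately outside this sketch.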
4. Hardware Reality: Cost of Redundancy
- Cross-Cluster Replication: Bandwidth costs ($0.02/GB) apply.
- Storage: 2 Regions = 2x Storage Cost.
- Staff Decision: Is 99.99% uptime worth 2x the bill?
  - Tier 1 (User Search): Yes.
  - Tier 3 (Internal Logs): No. A disaster-recovery (DR) backup to S3 is enough.
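To make the trade-off concrete, the arithmetic can be sketched as follows. The $0.02/GB transfer price is the figure quoted above; the storage price and data volumes in the usage example are hypothetical placeholders, so substitute your own provider's numbers.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def monthly_redundancy_cost(stored_gb, replicated_gb,
                            storage_price_per_gb,
                            transfer_price_per_gb=0.02):
    """Extra monthly cost of a second region.

    One additional full copy of the stored data, plus cross-region
    transfer for the data replicated during the month.
    """
    return stored_gb * storage_price_per_gb + replicated_gb * transfer_price_per_gb


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.9999), 2), "minutes/year at 99.99%")
    # Hypothetical inputs: 1000 GB stored, 500 GB replicated, $0.10/GB-month storage.
    print(monthly_redundancy_cost(1000, 500, storage_price_per_gb=0.10), "USD/month extra")
```

A 99.99% target leaves roughly 52.6 minutes of downtime per year. Whether avoiding the rest is worth the second region's bill is the Tier 1 versus Tier 3 question above.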