The Search Platform: Beyond the Cluster

[!NOTE] This chapter covers architecting a global Search Platform: Gateway Services, Multi-Cluster Routing, and Blue/Green Deployments, along with the trade-offs each one brings in a production system.

1. The Staff Engineer’s View

A Senior Engineer manages a Cluster. A Staff Engineer manages a Platform.

The Platform Components:

  1. Gateway Service: A proxy (Go/Java) that sits between the application and Elasticsearch (ES).
  2. Schema Registry: A Git-backed source of truth for index mappings.
  3. Cross-Cluster Replication (CCR): Replicates data between regions, here us-east-1 and eu-west-1.
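
To make the Schema Registry idea concrete, here is a minimal sketch of a check that could run in CI when a mapping file changes in Git. It flags fields whose type differs between two versions, since those cannot be applied in place and need a new index. The function name `breaking_changes` and the fields `email`, `age`, and `city` are illustrative assumptions, not part of any real registry.

```python
def breaking_changes(old_mapping: dict, new_mapping: dict) -> list:
    """Return the fields whose type differs between two mapping versions.

    New fields can be added to an existing index, so they are not flagged.
    A changed type on an existing field requires a new index and a reindex.
    """
    old_props = old_mapping.get("properties", {})
    new_props = new_mapping.get("properties", {})
    changed = []
    for field, spec in old_props.items():
        if field in new_props and new_props[field].get("type") != spec.get("type"):
            changed.append(field)
    return sorted(changed)


# Hypothetical mapping versions as they might be stored in the registry.
users_v1 = {"properties": {"email": {"type": "text"}, "age": {"type": "integer"}}}
users_v2 = {
    "properties": {
        "email": {"type": "keyword"},   # type change: needs a reindex
        "age": {"type": "integer"},     # unchanged
        "city": {"type": "keyword"},    # new field: safe to add in place
    }
}

print(breaking_changes(users_v1, users_v2))
```

A non-empty result would be the signal to go through the Blue/Green procedure rather than updating the live index.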

Why a Gateway?

  • Protection: Reject “Kill Queries” (for example, unbounded regex or leading-wildcard patterns such as .*) before they reach the cluster.
  • Abstraction: Let apps query /search/users instead of knowing index names (users-v1).
  • Circuit Breaking: Fail fast if ES is overloaded.
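
The three responsibilities above can be sketched in a few lines. This is an illustration only, not a production gateway: the `ROUTES` table, the `QueryRejected` exception, the policy of rejecting every `regexp` clause, and the breaker thresholds are all assumptions chosen for the example.

```python
import time

# Hypothetical routing table: logical path -> index alias.
ROUTES = {"/search/users": "users"}


class QueryRejected(Exception):
    """Raised when the gateway refuses to forward a request."""


def _has_leading_wildcard(clause) -> bool:
    # Handles both {"field": "*x"} and {"field": {"value": "*x"}}.
    if not isinstance(clause, dict):
        return False
    for spec in clause.values():
        pattern = spec.get("value", "") if isinstance(spec, dict) else spec
        if str(pattern).startswith(("*", "?")):
            return True
    return False


def is_kill_query(node) -> bool:
    """Walk the query body looking for regexp or leading-wildcard clauses."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "regexp":
                return True
            if key == "wildcard" and _has_leading_wildcard(value):
                return True
            if is_kill_query(value):
                return True
        return False
    if isinstance(node, list):
        return any(is_kill_query(item) for item in node)
    return False


class CircuitBreaker:
    """Opens after max_failures consecutive failures, retries after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def plan_request(path: str, body: dict, breaker: CircuitBreaker) -> str:
    """Return the backend path to call, or raise QueryRejected."""
    if path not in ROUTES:
        raise QueryRejected(f"unknown route: {path}")
    if not breaker.allow():
        raise QueryRejected("circuit open: failing fast")
    if is_kill_query(body.get("query", {})):
        raise QueryRejected("expensive query pattern rejected")
    return f"/{ROUTES[path]}/_search"


breaker = CircuitBreaker()
print(plan_request("/search/users", {"query": {"match": {"name": "ada"}}}, breaker))
```

The application only ever sees `/search/users`; the alias behind it can change without a client release. The key-name check is deliberately conservative and would also reject a document field that happens to be called `regexp`, which a real gateway would need to handle more carefully.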

2. Blue/Green Deployments (Zero Downtime)

Suppose you need to change a field's mapping (e.g., text to keyword). The type of an existing field cannot be changed in place, so you build a new index and switch traffic to it. The Dance:

  1. Green: Current Index (users-v1). Alias users points here.
  2. Blue: Create new Index (users-v2) with new mapping.
  3. Reindex: Copy data V1 -> V2.
  4. Swap: Atomic Alias switch. users now points to V2.
  5. Delete: Drop V1 once V2 has been verified.
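
The steps above map onto a short sequence of Elasticsearch REST calls. The sketch below only builds and prints the request bodies, so it runs without a cluster; the `email` field in the new mapping is a made-up example, and sending the requests (with whatever HTTP client you use) is left out.

```python
import json


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases.

    Both actions go in one request, so the alias moves atomically and
    readers never see a moment where it points at nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical new mapping for users-v2 (the field name is illustrative).
new_mapping = {"mappings": {"properties": {"email": {"type": "keyword"}}}}

steps = [
    ("PUT", "/users-v2", new_mapping),                                        # 2. create
    ("POST", "/_reindex", reindex_body("users-v1", "users-v2")),              # 3. reindex
    ("POST", "/_aliases", alias_swap_body("users", "users-v1", "users-v2")),  # 4. swap
    ("DELETE", "/users-v1", None),                                            # 5. delete
]

for method, path, body in steps:
    print(method, path, json.dumps(body) if body is not None else "")
```

Writes that arrive at V1 while the reindex is running are not copied automatically, so a real rollout also needs a plan for them, such as pausing writes or replaying them against V2.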

3. Interactive: Multi-Region Routing

Simulate an outage in the primary region and watch the gateway fail over to the standby.

[Interactive diagram: User App -> Gateway -> US-East (Primary, ONLINE) or EU-West (Failover, STANDBY). Initial state: Routing: US-East (Latency: 20ms)]
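
The routing decision in this scenario can be approximated with a priority list and a health flag. This is a sketch under simple assumptions: the `Region` class and `pick_region` function are hypothetical, health is set by hand rather than by real health checks, and the 90 ms latency for EU-West is an invented figure (only the 20 ms for US-East comes from the scenario).

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    role: str
    latency_ms: int
    healthy: bool = True


def pick_region(regions):
    """Return the first healthy region in priority order."""
    for region in regions:
        if region.healthy:
            return region
    raise RuntimeError("no healthy region available")


def status_line(regions) -> str:
    region = pick_region(regions)
    return f"Routing: {region.name} (Latency: {region.latency_ms}ms)"


# Priority order: primary first, failover second.
regions = [
    Region("US-East", "Primary", latency_ms=20),
    Region("EU-West", "Failover", latency_ms=90),  # 90 ms is an assumed figure
]

print(status_line(regions))   # normal operation
regions[0].healthy = False    # simulate the US-East outage
print(status_line(regions))   # traffic moves to the failover region
regions[0].healthy = True     # primary recovers
print(status_line(regions))   # traffic returns to the primary
```

In this sketch traffic snaps back to the primary as soon as it is marked healthy. A real gateway would usually wait for several consecutive successful health checks first, to avoid flapping between regions.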

4. Hardware Reality: Cost of Redundancy

  • Cross-Cluster Replication: Inter-region bandwidth is billed ($0.02/GB).
  • Storage: 2 Regions = 2x Storage Cost.
  • Staff Decision: Is 99.99% uptime worth 2x the bill?
      • Tier 1 (User Search): Yes.
      • Tier 3 (Internal Logs): No. A disaster-recovery (DR) backup to S3 is enough.
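
A quick back-of-the-envelope calculation helps frame that decision. The sketch below uses the $0.02/GB transfer price quoted above; the storage price, data volume, and replication volume are placeholder assumptions, so substitute your own numbers.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


def monthly_redundancy_cost(
    stored_gb: float,
    replicated_gb_per_month: float,
    storage_price_per_gb: float,
    transfer_price_per_gb: float = 0.02,  # figure quoted in the text
) -> float:
    """Extra monthly cost of a second region: duplicate storage plus CCR traffic."""
    extra_storage = stored_gb * storage_price_per_gb
    transfer = replicated_gb_per_month * transfer_price_per_gb
    return extra_storage + transfer


# Assumed workload: 2,000 GB stored, 3,000 GB replicated per month,
# and an assumed storage price of $0.10 per GB-month.
cost = monthly_redundancy_cost(2000, 3000, storage_price_per_gb=0.10)

print(f"99.99% allows {downtime_budget_minutes(0.9999):.2f} minutes of downtime per year")
print(f"Second region adds about ${cost:.2f} per month")
```

For a Tier 1 service, the question is whether avoiding downtime beyond that budget is worth the added monthly cost. For Tier 3 data, the same arithmetic usually favors the cheaper backup to S3.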