RPC and gRPC: High-Speed Microservices
Netflix migrated their entire internal microservice communication layer from REST to gRPC in 2019 — and measured a 60% reduction in CPU usage on their API gateway. Google uses gRPC for virtually all internal communication across services that handle billions of requests per day. But why? And what makes raw HTTP+JSON unsuitable when you control both the client and server?
The answer lies in simple arithmetic: parsing text is expensive. At 1 million RPC calls per second, saving 10μs per call on JSON parsing frees up 10 CPU-seconds of work every second. That’s the gRPC advantage.
[!IMPORTANT] In this lesson, you will master:
- Binary Advantage: Why Protobuf’s Varint and Field-Tag encoding beats JSON parsing at the CPU cache level.
- Streaming Architecture: Leveraging HTTP/2 to build Bi-directional, Server-side, and Client-side streams.
- The LB Trap & Deadlines: Why gRPC breaks standard Load Balancers and how context propagation prevents cascading failures.
1. What is gRPC?
gRPC (gRPC Remote Procedure Call) is Google’s open-source framework for high-performance communication.
- Protocol Buffers (Protobuf): Binary serialization (not JSON).
- HTTP/2: Multiplexing and streaming built-in.
- Strict Contracts: You define the API in `.proto` files first.
2. The Power of Protobuf (vs JSON)
Why is gRPC often up to 10x faster?
- Size: JSON repeats keys (`"name": "Alice"`, `"name": "Bob"`). Protobuf uses numbered tags (`1: "Alice"`, `1: "Bob"`).
- Parsing: Parsing text (JSON) is CPU-expensive. Parsing binary (Protobuf) is close to a straight memory copy.
[!NOTE] Hardware-First Intuition: JSON parsing is a “Branch Predictor Nightmare”. The CPU has to keep checking “Is this a quote? Is this a colon?”. Protobuf is a flat stream of tag-value pairs. Modern CPUs can pre-fetch the next field because the structure is predictable, making it significantly more cache-friendly.
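To make the size gap concrete, here is a rough comparison in plain Python. The 3-byte Protobuf encoding of `{user_id: 150}` is written out by hand for illustration; no protobuf library is involved:

```python
import json

# JSON must ship the key name with every single message.
json_bytes = json.dumps({"user_id": 150}, separators=(",", ":")).encode()

# The equivalent Protobuf message: field key 0x08 (field 1, varint wire type),
# then 150 varint-encoded as 0x96 0x01. Three bytes total.
proto_bytes = bytes([0x08, 0x96, 0x01])

print(len(json_bytes), len(proto_bytes))  # 15 3
```

At 1 million messages per second, that 5x size difference compounds into real network and serialization savings.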
Interactive Demo: Serialization Overhead Race
Parsing text (JSON) requires scanning every character for quotes, colons, and brackets. Protobuf just reads bytes directly into memory structs.
Size Comparator: JSON vs Protobuf
Type a value to see how Protobuf strips away the metadata overhead.
A. Elite Deep Dive: Protobuf Varint Encoding
How does Protobuf save so much space? It uses Varints (Variable-length Integers).
- JSON: The number `150` takes 3 bytes (`'1'`, `'5'`, `'0'`).
- Protobuf: Uses the MSB (Most Significant Bit) as a “continuation bit”. `150` is encoded as `0x96 0x01` (2 bytes). Numbers below 128 take only 1 byte.
- Field Tags: Instead of sending the string `"user_id"`, it sends a numeric tag (e.g., `1`). This is why you never change a field ID in a `.proto` file.
[!TIP] Analogy: The Data Train Imagine JSON as shipping individual, self-contained boxes where each box has a giant label spelling out “DESTINATION: NEW YORK”. Protobuf is like a train where the order and ID of the train cars are known beforehand. The server just says “Put the passenger in Car 1, the luggage in Car 2.” This is why changing a field ID breaks the train—the receiving station puts the passenger in the luggage car!
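The varint logic above fits in a few lines of plain Python. This is an illustrative sketch (`encode_varint` and `encode_field_key` are made-up names, not an official API), but the bytes it produces match the real wire format:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint.

    Each byte carries 7 bits of payload; the MSB is the
    "continuation bit" (1 = more bytes follow).
    """
    out = bytearray()
    while True:
        byte = n & 0x7F              # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # set continuation bit
        else:
            out.append(byte)         # last byte: MSB = 0
            return bytes(out)

def encode_field_key(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)

# 150 -> 0x96 0x01 (2 bytes); anything below 128 fits in one byte.
assert encode_varint(150) == bytes([0x96, 0x01])
assert encode_varint(127) == bytes([0x7F])
# Field 1, wire type 0 (varint) -> the 0x08 byte you see in real payloads.
assert encode_field_key(1, 0) == b"\x08"
```

Note how the field key bakes the tag number into the stream, which is exactly why renumbering a field silently corrupts old readers.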
3. The “Load Balancing” Nightmare
This is the most common gRPC interview trap. “How do you load balance gRPC?”
The Problem: Sticky Connections
- REST (HTTP/1.1): Client opens connection, sends request, gets response, closes connection. The Load Balancer (LB) can easily round-robin requests.
- gRPC (HTTP/2): Client opens One Persistent Connection and keeps it open for days.
- If you put a standard L4 LB (AWS NLB) in the middle, it just forwards that one TCP connection to one server.
- Result: Server A gets 100% of traffic. Server B gets 0%. (See L4 vs L7 Load Balancing).
[!NOTE] Analogy: The Water Hose vs. The Mail Sorter An L4 Load Balancer is a pipe routing water. Once the persistent gRPC connection (the hose) is connected to Server A, all water goes to Server A. Server B stays bone dry. An L7 Proxy (like Envoy) is a Mail Sorter. It terminates the connection, opens the envelopes (the HTTP/2 frames), and hands each individual letter to Server A, Server B, Server A, etc.
The Solutions
- L7 Load Balancing (Proxy): Use a smart proxy (e.g., Envoy, Nginx). It terminates the HTTP/2 connection, inspects individual requests, and distributes them. (Most common).
- Client-Side Balancing (Lookaside): The Client asks a Service Registry (e.g., Consul) for a list of IPs and connects to all of them, doing its own Round Robin. (Complex client logic).
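A minimal sketch of the lookaside pattern, with the registry lookup stubbed out. The names `get_backends` and `RoundRobinPicker` are illustrative, not a real Consul or gRPC API:

```python
import itertools

def get_backends(service_name: str) -> list[str]:
    """Stand-in for a Service Registry lookup (e.g., Consul)."""
    return ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]

class RoundRobinPicker:
    """Client-side balancer: the client holds all backend addresses
    (one sub-channel per backend) and rotates through them itself."""
    def __init__(self, addresses: list[str]):
        self._cycle = itertools.cycle(addresses)

    def pick(self) -> str:
        return next(self._cycle)

picker = RoundRobinPicker(get_backends("user-service"))
print([picker.pick() for _ in range(4)])
# ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051', '10.0.0.1:50051']
```

The complexity the bullet point warns about lives in what this sketch omits: watching the registry for membership changes, health-checking, and draining connections to dead backends.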
3.1 Observability: The Envoy Edge
Using an L7 Proxy (Envoy) doesn’t just solve load balancing; it gives you Distributed Tracing for free.
- Trace Propagation: Envoy can inject `x-request-id` or Zipkin/Jaeger headers into the gRPC metadata.
- Retries & Circuit Breaking: You can configure retries for `UNAVAILABLE` errors at the proxy level without changing a single line of application code.
4. Interactive Demo: L4 vs L7 Load Balancing
Visualize why L4 fails for gRPC.
- Mode L4: All requests follow the Single Connection to Server 1. Server 2 is idle.
- Mode L7: The Proxy opens connections to both. Requests are distributed evenly.
The gRPC Load Balancing Trap
L4 sees one persistent TCP connection and sticks to it.
B. The 4 Types of gRPC
Unlike REST, which is strictly Request-Response, gRPC supports:
- Unary: One request, one response. (Most common).
- Server Streaming: One request (e.g., “Get all logs”), many responses (Live stream of logs).
- Client Streaming: Many requests (e.g., “Upload 1GB file in chunks”), one response (“Success”).
- Bi-directional Streaming: Chat, real-time gaming, or complex orchestration. Both sides send data whenever they want.
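The four shapes map naturally onto functions and generators. This is a conceptual Python sketch of the call signatures only, not real grpc-generated stub code:

```python
from typing import Iterator

# Unary: one request in, one response out.
def get_user(request: dict) -> dict:
    return {"id": request["id"], "name": "Alice"}

# Server streaming: one request, a stream of responses.
def get_logs(request: dict) -> Iterator[str]:
    for line in ["boot ok", "listening on :50051"]:
        yield line

# Client streaming: a stream of requests, one response.
def upload_file(chunks: Iterator[bytes]) -> dict:
    total = sum(len(c) for c in chunks)
    return {"status": "Success", "bytes": total}

# Bi-directional streaming: both sides send whenever they want.
def chat(messages: Iterator[str]) -> Iterator[str]:
    for msg in messages:
        yield f"echo: {msg}"

assert get_user({"id": 150})["name"] == "Alice"
assert list(get_logs({})) == ["boot ok", "listening on :50051"]
assert upload_file(iter([b"ab", b"cd"]))["bytes"] == 4
assert list(chat(iter(["hi"]))) == ["echo: hi"]
```

HTTP/2 makes all four possible on a single connection: each RPC is a stream, and DATA frames can flow in either direction at any time.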
5. Interactive Demo: Schema Evolution (Protobuf)
See why Protobuf is “Backward Compatible”.
- We start with a simple message.
- Click “Add Email Field”.
- Notice the Hex Output grows, but the original bytes (Tag 1 and 2) stay exactly the same. Old clients can still read the name and ID!
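The same behavior can be demonstrated with a tiny hand-rolled decoder (an illustrative sketch, covering only two wire types) that does what every Protobuf runtime does: skip fields whose tag it doesn't recognize:

```python
def read_varint(data: bytes, i: int):
    """Read one varint starting at offset i; return (value, next_offset)."""
    shift, value = 0, 0
    while True:
        byte = data[i]; i += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i
        shift += 7

def decode(data: bytes, known_fields: dict):
    """Decode tag/value pairs, silently skipping unknown tags."""
    out, i = {}, 0
    while i < len(data):
        key, i = read_varint(data, i)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                 # varint
            value, i = read_varint(data, i)
        elif wire_type == 2:               # length-delimited (string/bytes)
            length, i = read_varint(data, i)
            value, i = data[i:i + length], i + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if tag in known_fields:            # old clients just ignore new tags
            out[known_fields[tag]] = value
    return out

# Message from a NEW server: id=150 (tag 1), name (tag 2), email (tag 3).
msg = (bytes([0x08, 0x96, 0x01])           # tag 1, varint 150
       + bytes([0x12, 0x03]) + b"Bob"      # tag 2, length-delimited "Bob"
       + bytes([0x1A, 0x03]) + b"b@x")     # tag 3: unknown to old clients

old_schema = {1: "id", 2: "name"}          # old client never heard of tag 3
print(decode(msg, old_schema))  # {'id': 150, 'name': b'Bob'} — email skipped
```

Because the length of every unknown field is self-describing, the decoder can step over it without understanding it. That is the whole mechanism behind Protobuf's backward compatibility.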
System Walkthrough: The gRPC Call
When you run client.GetUser({id: 150}), what happens?
- Stub: Code generated from `.proto` takes your object.
- Serialization: Converts `{id: 150}` into `08 96 01` (Protobuf).
- Framing (HTTP/2): Wraps it in a DATA frame, adding a 5-byte prefix: `[Compressed Flag (1 byte)] [Length (4 bytes)]`.
- Network: Sends over persistent TCP connection.
- Server: Decodes frame → Deserializes Protobuf → Calls actual Go/Java function.
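The serialization and framing steps above can be reproduced byte-for-byte in a few lines of Python (`frame_grpc_message` is an illustrative helper, not a real library call):

```python
import struct

def frame_grpc_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the 5-byte gRPC message prefix:
    1 byte compressed flag + 4 bytes big-endian payload length."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

# Serialization result for {id: 150}: field key 0x08, then varint 0x96 0x01.
payload = bytes([0x08, 0x96, 0x01])
frame = frame_grpc_message(payload)
print(frame.hex())  # 0000000003089601
```

The entire message, prefix included, is 8 bytes. A JSON equivalent like `{"id":150}` would already be larger than that before HTTP/1.1 headers are even added.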
6. Can I use gRPC in the Browser?
No, not directly.
The Problem
gRPC relies heavily on HTTP/2 Trailers (headers sent after the body) to send the Status Code (e.g., grpc-status: 0).
Browser JavaScript APIs (fetch, XHR) generally do not give you access to HTTP/2 Trailers. If the request fails, the browser hides the specific gRPC error.
The Solution: gRPC-Web
gRPC-Web is a protocol that wraps the gRPC data in a way browsers can understand (often base64 encoded text). You need a “Translation Layer” (Proxy) in the middle.
Browser → gRPC-Web over HTTP/1.1 or 2 → Proxy (re-encoding) → gRPC over HTTP/2 → Backend
7. gRPC vs HTTP Status Codes
gRPC doesn’t use 200/404. It uses its own Enum.
| gRPC Status | HTTP Code | Meaning |
|---|---|---|
| OK (0) | 200 | Success. |
| INVALID_ARGUMENT (3) | 400 | Bad Request (Validation failed). |
| NOT_FOUND (5) | 404 | Resource missing. |
| PERMISSION_DENIED (7) | 403 | Auth failed. |
| UNAUTHENTICATED (16) | 401 | Missing Token. |
| RESOURCE_EXHAUSTED (8) | 429 | Rate limit hit. |
| UNAVAILABLE (14) | 503 | Server down / Maintenance. |
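A gateway or gRPC-Web proxy that translates errors for REST clients needs exactly this table. A minimal sketch (names are illustrative, and the fallback choice is a common convention rather than a rule):

```python
# gRPC status code -> HTTP status, per the table above.
GRPC_TO_HTTP = {
    0: 200,   # OK
    3: 400,   # INVALID_ARGUMENT
    5: 404,   # NOT_FOUND
    7: 403,   # PERMISSION_DENIED
    8: 429,   # RESOURCE_EXHAUSTED
    14: 503,  # UNAVAILABLE
    16: 401,  # UNAUTHENTICATED
}

def to_http_status(grpc_code: int) -> int:
    # Codes this sketch doesn't map are surfaced as a generic 500.
    return GRPC_TO_HTTP.get(grpc_code, 500)

assert to_http_status(5) == 404   # NOT_FOUND -> 404
assert to_http_status(14) == 503  # UNAVAILABLE -> 503
```

Note the mapping is lossy in reverse: both `PERMISSION_DENIED`-style and quota-style failures collapse into a handful of HTTP codes, which is one reason gRPC keeps its own enum on the wire.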
7.1 Beyond Codes: The Rich Error Model
Standard gRPC status codes (0-16) are often not enough. What if validation failed and you need to tell the client which field was wrong?
- The Fix: `google.rpc.Status`.
- This is a special Protobuf message that includes the code, a message, and a list of Details (which are also Protobuf messages).
- Example: A `BadRequest` detail can contain a list of `FieldViolation` objects. This is much cleaner than parsing custom JSON error strings in REST.
7.2 The Silent Killer: No Deadlines (Timeouts)
In microservices, if Service A calls B, and B calls C, and C hangs… the whole chain hangs. gRPC solves this with Deadlines (Context Propagation).
- Service A: “I need this done in 100ms.” (Sends the request to B with `grpc-timeout: 100m`, i.e., 100 milliseconds).
- Service B: Takes 20ms to process, then calls Service C, forwarding only the remaining time (80ms).
- Service C: Takes 90ms.
- Result: At the 80ms mark, Service B cancels the request to C and returns `DEADLINE_EXCEEDED` to A. The system fails fast instead of hanging.
[!TIP] Always set Deadlines. The default is “Infinite”, which is a production outage waiting to happen.
[!WARNING] War Story: The 3AM Cascading Failure A major streaming service experienced a massive outage when a tiny, non-critical logging microservice database locked up. Because the core video microservices used gRPC with infinite default timeouts, they waited forever for the logging service to reply. The thread pools exhausted, memory spiked, and the entire system died. A simple 50ms deadline on the logging call would have isolated the failure to lost logs, rather than lost customers.
8. Summary: REST vs gRPC
| Feature | REST (Open API) | gRPC (Internal) |
|---|---|---|
| Payload | JSON (Text) | Protobuf (Binary) |
| Contract | Loose (Swagger) | Strict (.proto) |
| Streaming | Request/Response only | Bi-directional Streaming |
| Best For | Mobile apps, Public APIs | Microservices, High throughput |
[!IMPORTANT] gRPC-Web: Browsers cannot speak raw gRPC because they don’t have access to HTTP/2 trailers. You need a proxy like Envoy to translate between the browser and the gRPC backend.
Staff Engineer Tip: Always Set gRPC Deadlines. The default gRPC timeout is “Infinite” — which means if Service C hangs, Service B hangs, and Service A hangs, and your whole system degrades silently for minutes. This is called a cascading failure. The fix is simple: always set a grpc-timeout on every call and propagate the remaining deadline downstream. When Service A gives Service B 100ms, and B takes 20ms processing, B should give Service C only 80ms (not a fresh 100ms). This is called deadline budget propagation and it’s the difference between a 10-second outage and a 100ms failure. Add this as a code review checklist item on your team.