RPC and gRPC: High-Speed Microservices
Netflix migrated their entire internal microservice communication layer from REST to gRPC in 2019 — and measured a 60% reduction in CPU usage on their API gateway. Google uses gRPC for virtually all internal communication across services that handle billions of requests per day. But why? And what makes raw HTTP+JSON unsuitable when you control both the client and server?
The answer lies in simple arithmetic: parsing text is expensive. At 1 million RPC calls per second, saving 10μs per call on JSON parsing frees up 10 CPU-seconds of work every second. That’s the gRPC advantage.
[!IMPORTANT] In this lesson, you will master:
- Binary Advantage: Why Protobuf’s Varint and Field-Tag encoding beats JSON parsing at the CPU cache level.
- Streaming Architecture: Leveraging HTTP/2 to build Bi-directional, Server-side, and Client-side streams.
- The LB Trap & Deadlines: Why gRPC breaks standard Load Balancers and how context propagation prevents cascading failures.
1. What is gRPC?
gRPC (gRPC Remote Procedure Call) is Google’s open-source framework for high-performance communication.
- Protocol Buffers (Protobuf): Binary serialization (not JSON).
- HTTP/2: Multiplexing and streaming built-in.
- Strict Contracts: You define the API in `.proto` files first.
2. The Power of Protobuf (vs JSON)
Why is gRPC often up to 10x faster?
- Size: JSON repeats keys (`"name": "Alice"`, `"name": "Bob"`). Protobuf uses numbered tags (`1: "Alice"`, `1: "Bob"`).
- Parsing: Parsing text (JSON) is CPU-expensive. Parsing binary (Protobuf) is close to a straight memory copy.
[!NOTE] Hardware-First Intuition: JSON parsing is a “Branch Predictor Nightmare”. The CPU has to keep checking “Is this a quote? Is this a colon?”. Protobuf is a flat stream of tag-value pairs. Modern CPUs can pre-fetch the next field because the structure is predictable, making it significantly more cache-friendly.
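To make the size gap concrete, here is a rough comparison in plain Python. The 3-byte Protobuf encoding of `{user_id: 150}` is written out by hand for illustration; no protobuf library is involved:

```python
import json

# JSON must ship the key name with every single message.
json_bytes = json.dumps({"user_id": 150}, separators=(",", ":")).encode()

# The equivalent Protobuf message: field key 0x08 (field 1, varint wire type),
# then 150 varint-encoded as 0x96 0x01. Three bytes total.
proto_bytes = bytes([0x08, 0x96, 0x01])

print(len(json_bytes), len(proto_bytes))  # 15 3
```

At 1 million messages per second, that 5x size difference compounds into real network and serialization savings.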
Interactive Demo: Serialization Overhead Race
Parsing text (JSON) requires scanning every character for quotes, colons, and brackets. Protobuf just reads bytes directly into memory structs.
Size Comparator: JSON vs Protobuf
Type a value to see how Protobuf strips away the metadata overhead.
A. Elite Deep Dive: Protobuf Varint Encoding
How does Protobuf save so much space? It uses Varints (Variable-length Integers).
- JSON: The number `150` takes 3 bytes (`'1'`, `'5'`, `'0'`).
- Protobuf: Uses the MSB (Most Significant Bit) as a “continuation bit”. `150` is encoded as `0x96 0x01` (2 bytes). Numbers below 128 take only 1 byte.
- Field Tags: Instead of sending the string `"user_id"`, it sends a numeric tag (e.g., `1`). This is why you never change a field ID in a `.proto` file.
[!TIP] Analogy: The Data Train Imagine JSON as shipping individual, self-contained boxes where each box has a giant label spelling out “DESTINATION: NEW YORK”. Protobuf is like a train where the order and ID of the train cars are known beforehand. The server just says “Put the passenger in Car 1, the luggage in Car 2.” This is why changing a field ID breaks the train—the receiving station puts the passenger in the luggage car!
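The varint logic above fits in a few lines of plain Python. This is an illustrative sketch (`encode_varint` and `encode_field_key` are made-up names, not an official API), but the bytes it produces match the real wire format:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf varint.

    Each byte carries 7 bits of payload; the MSB is the
    "continuation bit" (1 = more bytes follow).
    """
    out = bytearray()
    while True:
        byte = n & 0x7F              # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # set continuation bit
        else:
            out.append(byte)         # last byte: MSB = 0
            return bytes(out)

def encode_field_key(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)

# 150 -> 0x96 0x01 (2 bytes); anything below 128 fits in one byte.
assert encode_varint(150) == bytes([0x96, 0x01])
assert encode_varint(127) == bytes([0x7F])
# Field 1, wire type 0 (varint) -> the 0x08 byte you see in real payloads.
assert encode_field_key(1, 0) == b"\x08"
```

Note how the field key bakes the tag number into the stream, which is exactly why renumbering a field silently corrupts old readers.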
3. The “Load Balancing” Nightmare
This is the most common gRPC interview trap. “How do you load balance gRPC?”
The Problem: Sticky Connections
- REST (HTTP/1.1): Client opens connection, sends request, gets response, closes connection. The Load Balancer (LB) can easily round-robin requests.
- gRPC (HTTP/2): Client opens One Persistent Connection and keeps it open for days.
- If you put a standard L4 LB (AWS NLB) in the middle, it just forwards that one TCP connection to one server.
- Result: Server A gets 100% of traffic. Server B gets 0%. (See L4 vs L7 Load Balancing).
[!NOTE] Analogy: The Water Hose vs. The Mail Sorter An L4 Load Balancer is a pipe routing water. Once the persistent gRPC connection (the hose) is connected to Server A, all water goes to Server A. Server B stays bone dry. An L7 Proxy (like Envoy) is a Mail Sorter. It terminates the connection, opens the envelopes (the HTTP/2 frames), and hands each individual letter to Server A, Server B, Server A, etc.
The Solutions
- L7 Load Balancing (Proxy): Use a smart proxy (e.g., Envoy, Nginx). It terminates the HTTP/2 connection, inspects individual requests, and distributes them. (Most common).
- Client-Side Balancing (Lookaside): The Client asks a Service Registry (e.g., Consul) for a list of IPs and connects to all of them, doing its own Round Robin. (Complex client logic).
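A minimal sketch of the lookaside pattern, with the registry lookup stubbed out. The names `get_backends` and `RoundRobinPicker` are illustrative, not a real Consul or gRPC API:

```python
import itertools

def get_backends(service_name: str) -> list[str]:
    """Stand-in for a Service Registry lookup (e.g., Consul)."""
    return ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]

class RoundRobinPicker:
    """Client-side balancer: the client holds all backend addresses
    (one sub-channel per backend) and rotates through them itself."""
    def __init__(self, addresses: list[str]):
        self._cycle = itertools.cycle(addresses)

    def pick(self) -> str:
        return next(self._cycle)

picker = RoundRobinPicker(get_backends("user-service"))
print([picker.pick() for _ in range(4)])
# ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051', '10.0.0.1:50051']
```

The complexity the bullet point warns about lives in what this sketch omits: watching the registry for membership changes, health-checking, and draining connections to dead backends.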
3.1 Observability: The Envoy Edge
Using an L7 Proxy (Envoy) doesn’t just solve load balancing; it gives you Distributed Tracing for free.
- Trace Propagation: Envoy can inject `x-request-id` or Zipkin/Jaeger headers into the gRPC metadata.
- Retries & Circuit Breaking: You can configure retries for `UNAVAILABLE` errors at the proxy level without changing a single line of application code.
4. Interactive Demo: L4 vs L7 Load Balancing
Visualize why L4 fails for gRPC.
- Mode L4: All requests follow the Single Connection to Server 1. Server 2 is idle.
- Mode L7: The Proxy opens connections to both. Requests are distributed evenly.
The gRPC Load Balancing Trap
L4 sees one persistent TCP connection and sticks to it.
B. The 4 Types of gRPC
Unlike REST, which is strictly Request-Response, gRPC supports:
- Unary: One request, one response. (Most common).
- Server Streaming: One request (e.g., “Get all logs”), many responses (Live stream of logs).
- Client Streaming: Many requests (e.g., “Upload 1GB file in chunks”), one response (“Success”).
- Bi-directional Streaming: Chat, real-time gaming, or complex orchestration. Both sides send data whenever they want.
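The four shapes map naturally onto functions and generators. This is a conceptual Python sketch of the call signatures only, not real grpc-generated stub code:

```python
from typing import Iterator

# Unary: one request in, one response out.
def get_user(request: dict) -> dict:
    return {"id": request["id"], "name": "Alice"}

# Server streaming: one request, a stream of responses.
def get_logs(request: dict) -> Iterator[str]:
    for line in ["boot ok", "listening on :50051"]:
        yield line

# Client streaming: a stream of requests, one response.
def upload_file(chunks: Iterator[bytes]) -> dict:
    total = sum(len(c) for c in chunks)
    return {"status": "Success", "bytes": total}

# Bi-directional streaming: both sides send whenever they want.
def chat(messages: Iterator[str]) -> Iterator[str]:
    for msg in messages:
        yield f"echo: {msg}"

assert get_user({"id": 150})["name"] == "Alice"
assert list(get_logs({})) == ["boot ok", "listening on :50051"]
assert upload_file(iter([b"ab", b"cd"]))["bytes"] == 4
assert list(chat(iter(["hi"]))) == ["echo: hi"]
```

HTTP/2 makes all four possible on a single connection: each RPC is a stream, and DATA frames can flow in either direction at any time.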
5. Interactive Demo: Schema Evolution (Protobuf)
See why Protobuf is “Backward Compatible”.
- We start with a simple message.
- Click “Add Email Field”.
- Notice the Hex Output grows, but the original bytes (Tag 1 and 2) stay exactly the same. Old clients can still read the name and ID!
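The same behavior can be demonstrated with a tiny hand-rolled decoder (an illustrative sketch, covering only two wire types) that does what every Protobuf runtime does: skip fields whose tag it doesn't recognize:

```python
def read_varint(data: bytes, i: int):
    """Read one varint starting at offset i; return (value, next_offset)."""
    shift, value = 0, 0
    while True:
        byte = data[i]; i += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i
        shift += 7

def decode(data: bytes, known_fields: dict):
    """Decode tag/value pairs, silently skipping unknown tags."""
    out, i = {}, 0
    while i < len(data):
        key, i = read_varint(data, i)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                 # varint
            value, i = read_varint(data, i)
        elif wire_type == 2:               # length-delimited (string/bytes)
            length, i = read_varint(data, i)
            value, i = data[i:i + length], i + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if tag in known_fields:            # old clients just ignore new tags
            out[known_fields[tag]] = value
    return out

# Message from a NEW server: id=150 (tag 1), name (tag 2), email (tag 3).
msg = (bytes([0x08, 0x96, 0x01])           # tag 1, varint 150
       + bytes([0x12, 0x03]) + b"Bob"      # tag 2, length-delimited "Bob"
       + bytes([0x1A, 0x03]) + b"b@x")     # tag 3: unknown to old clients

old_schema = {1: "id", 2: "name"}          # old client never heard of tag 3
print(decode(msg, old_schema))  # {'id': 150, 'name': b'Bob'} — email skipped
```

Because the length of every unknown field is self-describing, the decoder can step over it without understanding it. That is the whole mechanism behind Protobuf's backward compatibility.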
System Walkthrough: The gRPC Call
When you run client.GetUser({id: 150}), what happens?
- Stub: Code generated from `.proto` takes your object.
- Serialization: Converts `{id: 150}` into `08 96 01` (Protobuf).
- Framing (HTTP/2): Wraps it in a DATA frame, adding a 5-byte prefix: `[Compressed Flag (1 byte)] [Length (4 bytes)]`.
- Network: Sends over persistent TCP connection.
- Server: Decodes frame → Deserializes Protobuf → Calls actual Go/Java function.
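The serialization and framing steps above can be reproduced byte-for-byte in a few lines of Python (`frame_grpc_message` is an illustrative helper, not a real library call):

```python
import struct

def frame_grpc_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the 5-byte gRPC message prefix:
    1 byte compressed flag + 4 bytes big-endian payload length."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload

# Serialization result for {id: 150}: field key 0x08, then varint 0x96 0x01.
payload = bytes([0x08, 0x96, 0x01])
frame = frame_grpc_message(payload)
print(frame.hex())  # 0000000003089601
```

The entire message, prefix included, is 8 bytes. A JSON equivalent like `{"id":150}` would already be larger than that before HTTP/1.1 headers are even added.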
6. Can I use gRPC in the Browser?
No, not directly.
The Problem
gRPC relies heavily on HTTP/2 Trailers (headers sent after the body) to send the Status Code (e.g., grpc-status: 0).
Browser JavaScript APIs (fetch, XHR) generally do not give you access to HTTP/2 Trailers. If the request fails, the browser hides the specific gRPC error.
The Solution: gRPC-Web
gRPC-Web is a protocol that wraps the gRPC data in a way browsers can understand (often base64 encoded text). You need a “Translation Layer” (Proxy) in the middle.
Browser → gRPC-Web over HTTP/1.1 or 2 → Proxy (re-encoding) → gRPC over HTTP/2 → Backend
7. gRPC vs HTTP Status Codes
gRPC doesn’t use 200/404. It uses its own Enum.
| gRPC Status | HTTP Code | Meaning |
|---|---|---|
| OK (0) | 200 | Success. |
| INVALID_ARGUMENT (3) | 400 | Bad Request (Validation failed). |
| NOT_FOUND (5) | 404 | Resource missing. |
| PERMISSION_DENIED (7) | 403 | Auth failed. |
| UNAUTHENTICATED (16) | 401 | Missing Token. |
| RESOURCE_EXHAUSTED (8) | 429 | Rate limit hit. |
| UNAVAILABLE (14) | 503 | Server down / Maintenance. |
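A gateway or gRPC-Web proxy that translates errors for REST clients needs exactly this table. A minimal sketch (names are illustrative, and the fallback choice is a common convention rather than a rule):

```python
# gRPC status code -> HTTP status, per the table above.
GRPC_TO_HTTP = {
    0: 200,   # OK
    3: 400,   # INVALID_ARGUMENT
    5: 404,   # NOT_FOUND
    7: 403,   # PERMISSION_DENIED
    8: 429,   # RESOURCE_EXHAUSTED
    14: 503,  # UNAVAILABLE
    16: 401,  # UNAUTHENTICATED
}

def to_http_status(grpc_code: int) -> int:
    # Codes this sketch doesn't map are surfaced as a generic 500.
    return GRPC_TO_HTTP.get(grpc_code, 500)

assert to_http_status(5) == 404   # NOT_FOUND -> 404
assert to_http_status(14) == 503  # UNAVAILABLE -> 503
```

Note the mapping is lossy in reverse: both `PERMISSION_DENIED`-style and quota-style failures collapse into a handful of HTTP codes, which is one reason gRPC keeps its own enum on the wire.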
7.1 Beyond Codes: The Rich Error Model
Standard gRPC status codes (0-16) are often not enough. What if validation failed and you need to tell the client which field was wrong?
- The Fix: `google.rpc.Status`.
- This is a special Protobuf message that includes the code, a message, and a list of Details (which are also Protobuf messages).
- Example: A `BadRequest` detail can contain a list of `FieldViolation` objects. This is much cleaner than parsing custom JSON error strings in REST.
7.2 The Silent Killer: No Deadlines (Timeouts)
In microservices, if Service A calls B, and B calls C, and C hangs… the whole chain hangs. gRPC solves this with Deadlines (Context Propagation).
- Service A: “I need this done in 100ms.” (Sends the request to B with `grpc-timeout: 100m`, i.e., 100 milliseconds).
- Service B: Takes 20ms to process, then calls Service C, forwarding only the remaining time (80ms).
- Service C: Takes 90ms.
- Result: At the 80ms mark, Service B cancels the request to C and returns `DEADLINE_EXCEEDED` to A. The system fails fast instead of hanging.
[!TIP] Always set Deadlines. The default is “Infinite”, which is a production outage waiting to happen.
[!WARNING] War Story: The 3AM Cascading Failure A major streaming service experienced a massive outage when a tiny, non-critical logging microservice database locked up. Because the core video microservices used gRPC with infinite default timeouts, they waited forever for the logging service to reply. The thread pools exhausted, memory spiked, and the entire system died. A simple 50ms deadline on the logging call would have isolated the failure to lost logs, rather than lost customers.
8. Summary: REST vs gRPC
| Feature | REST (Open API) | gRPC (Internal) |
|---|---|---|
| Payload | JSON (Text) | Protobuf (Binary) |
| Contract | Loose (Swagger) | Strict (.proto) |
| Streaming | Request/Response only | Bi-directional Streaming |
| Best For | Mobile apps, Public APIs | Microservices, High throughput |
[!IMPORTANT] gRPC-Web: Browsers cannot speak raw gRPC because they don’t have access to HTTP/2 trailers. You need a proxy like Envoy to translate between the browser and the gRPC backend.
Staff Engineer Tip: Always Set gRPC Deadlines. The default gRPC timeout is “Infinite” — which means if Service C hangs, Service B hangs, and Service A hangs, and your whole system degrades silently for minutes. This is called a cascading failure. The fix is simple: always set a grpc-timeout on every call and propagate the remaining deadline downstream. When Service A gives Service B 100ms, and B takes 20ms processing, B should give Service C only 80ms (not a fresh 100ms). This is called deadline budget propagation and it’s the difference between a 10-second outage and a 100ms failure. Add this as a code review checklist item on your team.