gRPC Gateway Low-Level Design: Transcoding, Load Balancing, and Service Reflection

gRPC Fundamentals

gRPC uses HTTP/2 as its transport and Protocol Buffers as its serialization format. Compared to REST+JSON, it offers strongly typed contracts enforced by the compiler, a binary encoding that is smaller and faster to parse, HTTP/2 multiplexing of many concurrent streams over one TCP connection, and bidirectional streaming as a first-class feature.

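Services and messages are defined in .proto files. Code generation produces type-safe client and server stubs in each supported language. Contract changes are governed by Protobuf's backward-compatibility rules: adding new fields is safe, whereas renumbering a field, changing its type, or removing a field that clients still rely on breaks them. When a field is removed, mark its number as reserved so that it cannot be reused with a different meaning.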

HTTP/JSON Transcoding

Web browsers cannot speak native gRPC: browser APIs such as Fetch do not expose HTTP/2 trailers, which gRPC uses to carry the call status. grpc-gateway bridges this gap. It reads HTTP mapping annotations in the .proto file and generates a reverse proxy that transcodes HTTP/JSON requests to gRPC calls:

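syntax = "proto3";

// Required for the google.api.http option used below.
import "google/api/annotations.proto";

service OrderService {
  // {order_id} binds the path segment to GetOrderRequest.order_id.
  rpc GetOrder (GetOrderRequest) returns (Order) {
    option (google.api.http) = {
      get: "/api/v1/orders/{order_id}"
    };
  }
  // body: "*" maps the whole JSON request body onto CreateOrderRequest.
  rpc CreateOrder (CreateOrderRequest) returns (Order) {
    option (google.api.http) = {
      post: "/api/v1/orders"
      body: "*"
    };
  }
}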

The generated gateway converts the HTTP GET request and path parameters into a GetOrderRequest Protobuf message, forwards it to the gRPC server, and converts the Protobuf response back to JSON. Clients use standard HTTP; the gateway handles the translation transparently.
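The round trip described above can be sketched with the standard library alone. This is a simplified illustration, not grpc-gateway's generated code: matchTemplate, transcodeGetOrder, and the hand-written GetOrderRequest and Order structs are hypothetical stand-ins, and the sketch uses encoding/json where the real gateway uses Protobuf-aware JSON marshaling with its own field-naming rules.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// GetOrderRequest and Order are hand-written stand-ins for the generated
// Protobuf types; their fields are assumptions for illustration only.
type GetOrderRequest struct {
	OrderID string `json:"order_id"`
}

type Order struct {
	OrderID string `json:"order_id"`
	Status  string `json:"status"`
}

// matchTemplate binds {name} segments of an HTTP path template to the
// corresponding segments of a concrete request path.
func matchTemplate(template, path string) (map[string]string, bool) {
	tParts := strings.Split(strings.Trim(template, "/"), "/")
	pParts := strings.Split(strings.Trim(path, "/"), "/")
	if len(tParts) != len(pParts) {
		return nil, false
	}
	vars := make(map[string]string)
	for i, t := range tParts {
		if strings.HasPrefix(t, "{") && strings.HasSuffix(t, "}") {
			if pParts[i] == "" {
				return nil, false
			}
			vars[t[1:len(t)-1]] = pParts[i]
			continue
		}
		if t != pParts[i] {
			return nil, false
		}
	}
	return vars, true
}

// transcodeGetOrder mimics the gateway's round trip:
// path -> request message -> backend call -> JSON response body.
func transcodeGetOrder(path string, backend func(GetOrderRequest) Order) (string, bool) {
	vars, ok := matchTemplate("/api/v1/orders/{order_id}", path)
	if !ok {
		return "", false
	}
	req := GetOrderRequest{OrderID: vars["order_id"]}
	body, err := json.Marshal(backend(req))
	if err != nil {
		return "", false
	}
	return string(body), true
}

func main() {
	// fakeBackend stands in for the real gRPC server.
	fakeBackend := func(r GetOrderRequest) Order {
		return Order{OrderID: r.OrderID, Status: "SHIPPED"}
	}
	body, ok := transcodeGetOrder("/api/v1/orders/ord_123", fakeBackend)
	fmt.Println(ok, body)
}
```

A path that does not fit the template (for example /api/v1/users/u1) is rejected before any backend call is made, which is roughly where a real gateway would return 404.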

Load Balancing Strategies

gRPC runs over HTTP/2, which multiplexes many RPCs over a single TCP connection. Standard L4 load balancers distribute at the connection level — once a client establishes a connection to one backend, all its RPCs go to that backend. This defeats the purpose of load balancing.

  • Client-side load balancing: The gRPC client resolves the service DNS name to all backend pod IPs, opens a connection to each, and dispatches every RPC independently across them. This requires the round_robin policy to be configured (gRPC's default policy, pick_first, uses a single address) and DNS that returns all pod IPs, which in Kubernetes means a headless service; a standard ClusterIP service returns only one virtual IP. No proxy is needed.
  • Proxy-side load balancing: Envoy or NGINX acts as an L7 proxy, maintaining upstream connections to all backends and distributing RPCs across them. The client connects only to the proxy, so client configuration is simpler and the proxy handles health checking and retries, at the cost of one extra network hop.
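The difference between connection-level and per-RPC distribution can be made concrete with a small simulation. This is a minimal sketch, not gRPC's balancer API: the picker interface, the pinned and roundRobin types, and the backend addresses are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// picker chooses a backend address for one RPC.
type picker interface {
	Pick() string
}

// pinned models an L4 balancer from a single client's point of view: the
// backend is chosen once, at connection time, and never changes.
type pinned struct{ addr string }

func (p *pinned) Pick() string { return p.addr }

// roundRobin models per-RPC balancing across all resolved backends.
// next is the first field so it stays 64-bit aligned for atomic access.
type roundRobin struct {
	next  uint64
	addrs []string
}

func (r *roundRobin) Pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.addrs[n%uint64(len(r.addrs))]
}

// spread sends n RPCs through p and counts how many land on each backend.
func spread(p picker, n int) map[string]int {
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[p.Pick()]++
	}
	return counts
}

func main() {
	backends := []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}
	fmt.Println("L4, one connection: ", spread(&pinned{addr: backends[0]}, 9))
	fmt.Println("per-RPC round-robin:", spread(&roundRobin{addrs: backends}, 9))
}
```

With nine RPCs and three backends, the pinned picker sends all nine to one backend while the round-robin picker sends three to each. Both the client-side and proxy-side strategies exist to get the second behavior.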

Service Discovery Integration

For client-side load balancing, DNS must return all backend pod IPs. A Kubernetes headless service (clusterIP: None) does this: a DNS query returns one A record per pod. As pods scale, DNS reflects the change within the TTL window, and a client picks it up the next time it re-resolves the name.

For dynamic environments, the xDS API (used by Envoy) provides near-real-time service discovery. A control plane (Istio or a custom xDS server) pushes endpoint updates to Envoy without DNS TTL delays. gRPC also has a native xDS resolver that enables client-side load balancing with xDS, with no Envoy sidecar required.

Server Reflection

gRPC server reflection allows clients to query a running server for its available services and method signatures without access to the original .proto files:

# List services on a running gRPC server:
grpcurl -plaintext localhost:50051 list

# Describe a service:
grpcurl -plaintext localhost:50051 describe OrderService

# Make a call without a .proto file:
grpcurl -plaintext -d '{"order_id":"ord_123"}' \
  localhost:50051 OrderService/GetOrder

Reflection enables tooling like grpcurl and Postman gRPC to work without .proto files. Disable reflection in production if you do not want to expose your API surface to unauthenticated callers — or protect the reflection service with auth middleware.

Deadline Propagation

gRPC deadlines are set by the client and propagate through the entire call chain. Every intermediate service must respect the remaining deadline:

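// Client sets a 2-second deadline for the entire operation.
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel() // always release the context's timer

resp, err := orderClient.GetOrder(ctx, &GetOrderRequest{OrderId: "ord_123"})
if status.Code(err) == codes.DeadlineExceeded {
    // The 2-second budget ran out somewhere in the chain; resp is nil.
    // (status and codes are packages under google.golang.org/grpc.)
}
// If order-service calls inventory-service with this ctx, the call carries
// the remaining deadline and is cancelled when that deadline expires.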

When a deadline expires mid-chain, all in-flight RPCs in the chain are cancelled. This prevents partial work from accumulating and resources from being consumed for requests that can no longer succeed. Always propagate the context — never create a fresh context for downstream calls.
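The effect of propagating versus replacing the context can be reproduced without gRPC, since the deadline mechanics come from Go's context package. In this sketch the function names (inventoryLookup, getOrder, getOrderDetached, callWithDeadline) are hypothetical, and a timer stands in for real downstream work.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// inventoryLookup stands in for a downstream RPC that needs `work` to finish.
// It gives up as soon as the caller's context is done.
func inventoryLookup(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// getOrder stands in for order-service. It passes ctx through unchanged, so
// the downstream call inherits whatever deadline the client set.
func getOrder(ctx context.Context, work time.Duration) error {
	return inventoryLookup(ctx, work)
}

// getOrderDetached shows the anti-pattern: a fresh context drops the
// caller's deadline, so the downstream call keeps running regardless.
func getOrderDetached(_ context.Context, work time.Duration) error {
	return inventoryLookup(context.Background(), work)
}

// callWithDeadline runs handler under a client deadline and reports the outcome.
func callWithDeadline(timeoutMs, workMs int, handler func(context.Context, time.Duration) error) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err := handler(ctx, time.Duration(workMs)*time.Millisecond)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.DeadlineExceeded):
		return "deadline exceeded"
	default:
		return err.Error()
	}
}

func main() {
	// Deadline shorter than the work, context propagated: cancelled early.
	fmt.Println(callWithDeadline(20, 200, getOrder))
	// Same budget, but the context is replaced: the work runs to completion.
	fmt.Println(callWithDeadline(20, 200, getOrderDetached))
	// Deadline longer than the work: completes normally.
	fmt.Println(callWithDeadline(200, 20, getOrder))
}
```

The detached variant reports success, but only after the full 200 ms of work, long after the client's 20 ms budget ran out. That is exactly the wasted effort that propagating the context avoids.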

Interceptors, Streaming, and Observability

Interceptors (middleware) wrap every RPC on the client or server side:

  • Auth interceptor: Validate JWT from gRPC metadata on every incoming call.
  • Logging interceptor: Log method name, request size, response status, and latency.
  • Retry interceptor: Retry idempotent calls on transient errors with exponential backoff.
  • Metrics interceptor: Increment Prometheus counters and record latency histograms per method.
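How interceptors wrap a handler can be sketched as plain function composition. The handler and interceptor types below are simplified, hypothetical stand-ins loosely modeled on grpc-go's unary server interceptor shape (the real one also receives a *grpc.UnaryServerInfo and works on typed messages). The token is carried in the context here, whereas real code would read it from gRPC metadata and verify the JWT.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Simplified stand-ins for a unary handler and a unary server interceptor.
type handler func(ctx context.Context, method string, req string) (string, error)
type interceptor func(ctx context.Context, method string, req string, next handler) (string, error)

// chain wraps h so that the interceptors run in the order given:
// the first one is outermost.
func chain(h handler, interceptors ...interceptor) handler {
	for i := len(interceptors) - 1; i >= 0; i-- {
		ic, next := interceptors[i], h
		h = func(ctx context.Context, method, req string) (string, error) {
			return ic(ctx, method, req, next)
		}
	}
	return h
}

type tokenKey struct{}

var errUnauthenticated = errors.New("unauthenticated")

// auth rejects calls that carry no bearer token. It only checks the prefix;
// signature and expiry validation are deliberately left out of the sketch.
func auth(ctx context.Context, method, req string, next handler) (string, error) {
	tok, _ := ctx.Value(tokenKey{}).(string)
	if !strings.HasPrefix(tok, "Bearer ") {
		return "", errUnauthenticated
	}
	return next(ctx, method, req)
}

// logging appends one line per call with the method name and the outcome.
func logging(log *[]string) interceptor {
	return func(ctx context.Context, method, req string, next handler) (string, error) {
		resp, err := next(ctx, method, req)
		*log = append(*log, fmt.Sprintf("%s err=%v", method, err))
		return resp, err
	}
}

// runDemo sends one call through logging -> auth -> business handler.
func runDemo(token string) (string, []string, error) {
	var log []string
	business := func(_ context.Context, _ string, req string) (string, error) {
		return "order:" + req, nil
	}
	h := chain(business, logging(&log), auth)
	ctx := context.WithValue(context.Background(), tokenKey{}, token)
	resp, err := h(ctx, "/OrderService/GetOrder", "ord_123")
	return resp, log, err
}

func main() {
	fmt.Println(runDemo("Bearer abc"))
	fmt.Println(runDemo(""))
}
```

Placing logging outside auth means rejected calls are still logged. Reversing the order would hide unauthenticated traffic from the logs, so the ordering is a deliberate design choice.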

Streaming patterns:

  • Unary: one request, one response.
  • Server-streaming: one request, many responses; useful for large result sets.
  • Client-streaming: many requests, one response; useful for uploads.
  • Bidirectional streaming: many requests, many responses; useful for real-time chat or telemetry.

The gateway must support all four; bidirectional streaming requires the gateway to proxy HTTP/2 frames without buffering the entire stream.
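For the server-streaming case, "without buffering" means each message is written and flushed as it arrives. One common way for an HTTP/JSON gateway to do this is newline-delimited JSON. The sketch below assumes that encoding; streamNDJSON, renderStream, and the Order struct are hypothetical names, and a channel stands in for the gRPC stream.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Order is a hand-written stand-in for a generated Protobuf message.
type Order struct {
	OrderID string `json:"order_id"`
}

// streamNDJSON writes each message as one JSON line as soon as it arrives
// and calls flush after every line, so nothing accumulates in the gateway.
func streamNDJSON(w io.Writer, msgs <-chan Order, flush func()) error {
	enc := json.NewEncoder(w) // Encode appends a newline after each value
	for m := range msgs {
		if err := enc.Encode(m); err != nil {
			return err
		}
		flush()
	}
	return nil
}

// renderStream feeds the given IDs through streamNDJSON and returns the
// bytes written plus the number of flushes.
func renderStream(ids ...string) (string, int, error) {
	msgs := make(chan Order)
	go func() {
		defer close(msgs)
		for _, id := range ids {
			msgs <- Order{OrderID: id}
		}
	}()
	var buf bytes.Buffer
	flushes := 0
	err := streamNDJSON(&buf, msgs, func() { flushes++ })
	return buf.String(), flushes, err
}

func main() {
	out, flushes, err := renderStream("ord_1", "ord_2")
	fmt.Print(out)
	fmt.Println("flushes:", flushes, "err:", err)
}
```

In an HTTP handler, the flush callback would typically call Flush on the response writer's http.Flusher, so each line reaches the client before the next message is read from the stream.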

OpenTelemetry trace context is propagated via gRPC metadata headers (traceparent, tracestate). The OpenTelemetry gRPC instrumentation library injects and extracts these headers automatically, producing a complete distributed trace across all services in the call chain without manual instrumentation in business logic.
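What the instrumentation library extracts and injects can be illustrated by handling the traceparent value by hand. This is a minimal sketch of the W3C Trace Context version 00 layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). parseTraceparent, childTraceparent, and the sample IDs are illustrative, and production code should rely on the OpenTelemetry propagators instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type traceContext struct {
	TraceID string // 32 lowercase hex characters, shared by the whole trace
	SpanID  string // 16 lowercase hex characters, the caller's span
	Sampled bool   // low bit of the trace-flags byte
}

func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// parseTraceparent extracts the trace context from an incoming header value.
// Only version "00" is handled in this sketch.
func parseTraceparent(h string) (traceContext, bool) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" {
		return traceContext{}, false
	}
	if !isLowerHex(p[1], 32) || !isLowerHex(p[2], 16) || !isLowerHex(p[3], 2) {
		return traceContext{}, false
	}
	// All-zero trace or span IDs are invalid.
	if strings.Trim(p[1], "0") == "" || strings.Trim(p[2], "0") == "" {
		return traceContext{}, false
	}
	flags, err := strconv.ParseUint(p[3], 16, 8)
	if err != nil {
		return traceContext{}, false
	}
	return traceContext{TraceID: p[1], SpanID: p[2], Sampled: flags&1 == 1}, true
}

// childTraceparent builds the header for an outgoing call: same trace ID,
// the current service's span ID as the new parent, sampling decision kept.
func childTraceparent(parent traceContext, newSpanID string) string {
	flags := "00"
	if parent.Sampled {
		flags = "01"
	}
	return "00-" + parent.TraceID + "-" + newSpanID + "-" + flags
}

func main() {
	in := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tc, ok := parseTraceparent(in)
	fmt.Println(ok, tc.TraceID, tc.SpanID, tc.Sampled)
	fmt.Println(childTraceparent(tc, "b7ad6b7169203331"))
}
```

Because every hop reuses the trace ID and substitutes its own span ID, a tracing backend can stitch the spans from all services into one tree.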
