gRPC Fundamentals
gRPC uses HTTP/2 as its transport and Protocol Buffers as its serialization format. Compared to REST+JSON, it offers strongly typed contracts enforced by the compiler, a binary encoding that is smaller and faster to parse, HTTP/2 multiplexing of many concurrent streams over one TCP connection, and bidirectional streaming as a first-class feature.
Services and messages are defined in .proto files. Code generation produces type-safe client and server stubs in any supported language. Contract changes are governed by Protobuf's backward-compatibility rules: adding new fields is safe; removing or renumbering existing fields breaks deployed clients.
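These rules can be sketched in a hypothetical message definition (the message and field names here are illustrative, not from any real schema):

```protobuf
syntax = "proto3";

message Order {
  reserved 4;              // number of a removed field, reserved so it is never reused
  reserved "coupon_code";  // reserving the name prevents an accidental re-add
  string order_id = 1;
  int64 total_cents = 2;
  string currency = 3;
  string notes = 5;        // newly added field: older clients simply ignore it
}
```

Reserving a removed field's number and name makes the removal safe: no future edit can reassign that number to a field with a different meaning, which is what would silently corrupt data for old clients.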
HTTP/JSON Transcoding
Web browsers cannot speak native gRPC (HTTP/2 trailers are not accessible via the Fetch API). grpc-gateway bridges this gap: it reads HTTP mapping annotations in the .proto file and generates a reverse proxy that transcodes HTTP/JSON requests to gRPC calls:
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order) {
    option (google.api.http) = {
      get: "/api/v1/orders/{order_id}"
    };
  }
  rpc CreateOrder (CreateOrderRequest) returns (Order) {
    option (google.api.http) = {
      post: "/api/v1/orders"
      body: "*"
    };
  }
}
The generated gateway converts the HTTP GET request and path parameters into a GetOrderRequest Protobuf message, forwards it to the gRPC server, and converts the Protobuf response back to JSON. Clients use standard HTTP; the gateway handles the translation transparently.
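A hypothetical session against such a gateway, assuming it listens on localhost:8080 (the port and payload are illustrative):

```
# GET with a path parameter → GetOrder(GetOrderRequest{order_id: "ord_123"})
curl http://localhost:8080/api/v1/orders/ord_123

# POST with a JSON body → CreateOrder, with the JSON object deserialized
# into CreateOrderRequest because of body: "*"
curl -X POST http://localhost:8080/api/v1/orders \
  -H 'Content-Type: application/json' \
  -d '{"customer_id": "cus_42"}'
```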
Load Balancing Strategies
gRPC runs over HTTP/2, which multiplexes many RPCs over a single TCP connection. Standard L4 load balancers distribute at the connection level — once a client establishes a connection to one backend, all its RPCs go to that backend. This defeats the purpose of load balancing.
- Client-side load balancing: The gRPC client resolves the service DNS name to all backend pod IPs (requires a headless Kubernetes service) and applies round-robin across them, opening a connection to each. Every RPC is dispatched independently. No proxy needed. Requires DNS that returns all pod IPs — standard Kubernetes ClusterIP services return only one virtual IP.
- Proxy-side load balancing: Envoy or NGINX acts as an L7 proxy, maintaining upstream connections to all backends and distributing RPCs. Client connects to the proxy. Simpler client configuration, proxy handles health-checking and retries. Adds one network hop.
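The per-RPC choice a client channel makes under round-robin can be modeled with a minimal picker. This is a sketch of the behavior, not the actual grpc-go balancer API, and the pod addresses are hypothetical:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobinPicker models the per-RPC endpoint choice a gRPC client
// channel makes under the round_robin load-balancing policy.
type roundRobinPicker struct {
	addrs []string
	next  atomic.Uint64
}

// Pick returns the next backend address; each RPC is dispatched
// independently rather than pinned to one connection.
func (p *roundRobinPicker) Pick() string {
	n := p.next.Add(1) - 1
	return p.addrs[n%uint64(len(p.addrs))]
}

func main() {
	// Addresses a headless-service DNS lookup might return (hypothetical pod IPs).
	p := &roundRobinPicker{addrs: []string{
		"10.0.1.4:50051", "10.0.2.7:50051", "10.0.3.9:50051",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.Pick()) // fourth pick wraps back to the first address
	}
}
```

The atomic counter makes Pick safe to call from concurrent goroutines, mirroring how many RPCs share one client channel.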
Service Discovery Integration
For client-side load balancing, DNS must return all backend pod IPs. Kubernetes headless service (clusterIP: None) does this — a DNS query returns an A record per pod. As pods scale, DNS reflects the change within the TTL window.
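A minimal headless Service manifest for this setup (the service name, selector, and port are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  clusterIP: None        # headless: DNS returns one A record per ready pod
  selector:
    app: order-service
  ports:
    - name: grpc
      port: 50051
```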
For dynamic environments, xDS API (used by Envoy) provides real-time service discovery. A control plane (Istio, custom xDS server) pushes endpoint updates to Envoy without DNS TTL delays. gRPC itself has a native xDS resolver that enables client-side load balancing with xDS — no Envoy sidecar required.
Server Reflection
gRPC server reflection (the grpc.reflection.v1alpha.ServerReflection service) allows clients to query a running server for its available services and method signatures without access to the original .proto files:
# List services on a running gRPC server:
grpcurl -plaintext localhost:50051 list
# Describe a service:
grpcurl -plaintext localhost:50051 describe OrderService
# Make a call without a .proto file:
grpcurl -plaintext -d '{"order_id":"ord_123"}' \
  localhost:50051 OrderService/GetOrder
Reflection enables tooling like grpcurl and Postman gRPC to work without .proto files. Disable reflection in production if you do not want to expose your API surface to unauthenticated callers — or protect the reflection service with auth middleware.
Deadline Propagation
gRPC deadlines are set by the client and propagate through the entire call chain: the remaining time is transmitted as a grpc-timeout header on each outbound call, and every intermediate service must respect it:
// Client sets a 2-second deadline for the entire operation
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
resp, err := orderClient.GetOrder(ctx, &GetOrderRequest{OrderId: "ord_123"})
// If order-service calls inventory-service, ctx carries the remaining deadline
// inventory-service call will be cancelled if the deadline expires
When a deadline expires mid-chain, all in-flight RPCs in the chain are cancelled. This prevents partial work from accumulating and resources from being consumed for requests that can no longer succeed. Always propagate the context — never create a fresh context for downstream calls.
Interceptors, Streaming, and Observability
Interceptors (middleware) wrap every RPC on client or server side:
- Auth interceptor: Validate JWT from gRPC metadata on every incoming call.
- Logging interceptor: Log method name, request size, response status, and latency.
- Retry interceptor: Retry idempotent calls on transient errors with exponential backoff.
- Metrics interceptor: Increment Prometheus counters and record latency histograms per method.
Streaming patterns: unary (one request, one response), server-streaming (one request, many responses — useful for large result sets), client-streaming (many requests, one response — useful for uploads), bidirectional streaming (many requests, many responses — useful for real-time chat or telemetry). The gateway must support all four; bidirectional streaming requires the gateway to proxy HTTP/2 frames without buffering the entire stream.
OpenTelemetry trace context is propagated via gRPC metadata headers (traceparent, tracestate). The OpenTelemetry gRPC instrumentation library injects and extracts these headers automatically, producing a complete distributed trace across all services in the call chain without manual instrumentation in business logic.