Low Level Design: gRPC Design Deep Dive

gRPC is Google's open-source RPC framework. It uses HTTP/2 for transport and Protocol Buffers for serialization, and provides strongly-typed service contracts, auto-generated clients in 12+ languages, bidirectional streaming, and built-in flow control. gRPC is the de facto standard for internal microservice communication at companies such as Google, Netflix, Lyft, and Cloudflare. Understanding gRPC internals (the HTTP/2 multiplexing model, streaming modes, interceptors, and deadline propagation) is increasingly expected in senior backend system design interviews.

gRPC Service Definition and Code Generation

Services are defined in .proto files using the Protocol Buffers IDL. The protoc compiler generates a server stub (an interface to implement), a client stub (a strongly-typed client), and message types. This eliminates hand-written HTTP clients, JSON parsing, and API documentation drift: the proto file is the single source of truth. gRPC has four communication modes:

  • Unary RPC: one request, one response (a standard function call)
  • Server streaming: one request, a stream of responses (e.g. subscribing to live events)
  • Client streaming: a stream of requests, one response (e.g. batch upload)
  • Bidirectional streaming: both sides stream simultaneously (e.g. chat, real-time collaboration)

Streaming uses HTTP/2 stream multiplexing: multiple logical streams share one TCP connection with no head-of-line blocking between streams at the HTTP/2 layer (TCP-level packet loss, however, still stalls the whole connection).

// Proto definition
syntax = "proto3";
service OrderService {
    rpc CreateOrder (CreateOrderRequest) returns (Order);               // unary
    rpc WatchOrderStatus (WatchRequest) returns (stream OrderStatus);  // server stream
    rpc BatchCreateOrders (stream CreateOrderRequest) returns (BatchResult); // client stream
    rpc OrderUpdates (stream OrderQuery) returns (stream OrderEvent);  // bidi stream
}

// Generated Go server implementation.
// OrderStore abstracts persistence: *sql.DB has no CreateOrder method,
// so the server depends on a small repository interface instead.
type OrderStore interface {
    CreateOrder(ctx context.Context, req *pb.CreateOrderRequest) (*pb.Order, error)
}

type orderServer struct {
    pb.UnimplementedOrderServiceServer
    store OrderStore
}

func (s *orderServer) CreateOrder(ctx context.Context, req *pb.CreateOrderRequest) (*pb.Order, error) {
    // ctx carries: deadline, cancellation, trace context, metadata (auth token)
    if err := ctx.Err(); err != nil {
        // Map context.DeadlineExceeded / context.Canceled to the matching gRPC code.
        return nil, status.FromContextError(err).Err()
    }

    order, err := s.store.CreateOrder(ctx, req)
    if err != nil {
        return nil, status.Errorf(codes.Internal, "db error: %v", err)
    }
    return order, nil
}

func (s *orderServer) WatchOrderStatus(req *pb.WatchRequest, stream pb.OrderService_WatchOrderStatusServer) error {
    updates := getOrderUpdates(req.OrderId) // subscribe once; a fresh channel per loop iteration would drop events
    for {
        select {
        case <-stream.Context().Done():
            return nil // client disconnected or deadline expired
        case update := <-updates:
            if err := stream.Send(update); err != nil {
                return err
            }
        }
    }
}

Deadlines, Metadata, and Interceptors

gRPC's context carries deadlines and metadata (headers).

Deadline propagation: when client A calls service B with a 500ms deadline, B automatically receives that deadline and propagates it to the calls it makes to service C. If the deadline expires anywhere in the chain, all downstream calls are cancelled. This bounds total request latency across a service chain without complex timeout coordination.

Metadata: key-value pairs sent with requests (like HTTP headers). Typical uses: auth tokens (Authorization: Bearer <jwt>), trace IDs (X-Trace-ID), tenant IDs.

Interceptors (middleware): unary and stream interceptors wrap all RPC calls on the server or client side. Common server interceptor uses:

  • authentication: extract and validate a JWT from metadata
  • logging: log every RPC with duration and status code
  • rate limiting
  • distributed tracing: extract trace context from metadata and start a child span

Chain multiple interceptors for separation of concerns.

gRPC Load Balancing

gRPC uses long-lived HTTP/2 connections, which defeats standard L4 load balancing: a TCP connection is pinned to one backend, so every RPC on it lands on the same server. Options:

  • Client-side load balancing: the gRPC client maintains connections to multiple servers and applies a load-balancing policy (round-robin, pick-first) across them. Service discovery (DNS, Consul, Kubernetes endpoints) provides the server list, and the client distributes RPCs across the connections.
  • Proxy-based (L7) load balancing: Envoy, gRPC-aware Nginx, or Google Cloud Load Balancer understands gRPC's HTTP/2 streams and distributes individual RPCs (not connections) across backends. Better for heterogeneous server pools; Envoy's least-request policy sends each RPC to the server with the fewest in-flight requests.
  • Lookaside load balancing: the client queries a dedicated load-balancer service (the grpclb protocol or xDS) that returns the optimal backend; used by Google Traffic Director.

Key Interview Discussion Points

  • HTTP/2 multiplexing: multiple gRPC calls share one TCP connection via HTTP/2 streams; this eliminates the per-request connection overhead of non-keep-alive HTTP/1.1 REST, and a slow call does not block other streams at the HTTP/2 layer (TCP-level packet loss, however, stalls all streams on the connection)
  • gRPC-Web: browsers cannot speak HTTP/2 directly to gRPC servers (Fetch API does not expose HTTP/2 framing); gRPC-Web uses a proxy (Envoy, grpcwebproxy) that translates between browser HTTP/1.1 and backend HTTP/2 gRPC
  • Status codes: gRPC uses its own status codes (OK, CANCELLED, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, RESOURCE_EXHAUSTED, DEADLINE_EXCEEDED, INTERNAL) — map these semantically correctly so clients can implement correct retry logic
  • Retry policy: gRPC supports client-side retries with exponential backoff for retryable status codes (UNAVAILABLE is the canonical retryable code; NOT_FOUND and INVALID_ARGUMENT indicate caller errors and should not be retried; RESOURCE_EXHAUSTED may be retryable after backoff); configure via the service config JSON (e.g. grpc.WithDefaultServiceConfig in Go)
  • Reflection: gRPC server reflection (grpc.reflection) exposes the service schema at runtime, enabling tools like grpcurl and gRPC UI to discover and invoke methods without the proto file — essential for debugging
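As an illustration of the retry point above, a retry policy lives in the service config JSON. The service name and the numeric values below are assumptions for the sketch; UNAVAILABLE is the usual retryable code:

```json
{
  "methodConfig": [{
    "name": [{ "service": "OrderService" }],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}
```

The client library applies this transparently: failed attempts with a listed status code are retried with exponential backoff until maxAttempts or the deadline is reached.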