gRPC is a high-performance RPC framework built on HTTP/2 and Protocol Buffers. It provides strongly typed service contracts, efficient binary serialization, bidirectional streaming, and generated client/server code across multiple languages. gRPC is the standard protocol for internal microservice communication at companies like Google, Netflix, and Uber, replacing JSON/REST for service-to-service calls where performance and type safety matter.
Protocol Buffers and Code Generation
Define services and messages in .proto files. The protoc compiler generates server stubs and client code in Go, Java, Python, C++, and other languages. The generated code handles serialization/deserialization, HTTP/2 framing, and connection management — engineers implement only the business logic. A .proto service definition becomes a Go interface (server side) and a concrete client struct (client side). Type safety is enforced at compile time: calling a gRPC method with the wrong argument type fails compilation, unlike JSON/REST where type errors surface at runtime. Maintain .proto files in a shared repository and use a schema registry to version and distribute them.
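As a sketch, a hypothetical UserService definition (service and message names are illustrative, not taken from any real project):

```proto
syntax = "proto3";

package user.v1;

// Hypothetical service for illustration.
service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message GetUserResponse {
  string user_id = 1;
  string display_name = 2;
}
```

Running protoc with the Go plugins (protoc-gen-go and protoc-gen-go-grpc) over this file produces a `UserServiceServer` interface to implement and a `UserServiceClient` to call; the field numbers (`= 1`, `= 2`) are the wire-format identifiers that make schema evolution possible.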
HTTP/2 Multiplexing
gRPC runs over HTTP/2, which multiplexes many concurrent RPC calls over a single TCP connection. HTTP/1.1 REST needs one connection per in-flight request (or a connection pool sized for peak concurrency). HTTP/2 eliminates head-of-line blocking at the HTTP layer: a slow RPC does not block other RPCs on the same connection (TCP-level head-of-line blocking from packet loss can still affect all streams on the connection). A single HTTP/2 connection can carry thousands of concurrent streams, though servers typically cap this via the SETTINGS_MAX_CONCURRENT_STREAMS setting. This means gRPC services need fewer connections to upstream services, reducing per-connection overhead. HTTP/2 also uses HPACK header compression: repeated headers such as content-type: application/grpc are encoded once and referenced thereafter, reducing wire overhead.
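The multiplexing behaviour can be observed with Go's standard library alone, no gRPC required. This sketch starts a local HTTP/2 test server and fires concurrent requests through one client; Go's HTTP/2 transport carries them over a single TLS connection, the same transport behaviour gRPC relies on:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// fetchProtocols issues n concurrent requests against a local HTTP/2
// test server and returns the protocol each request was served over.
func fetchProtocols(n int) []string {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, r.Proto) // echo the negotiated protocol
		}))
	srv.EnableHTTP2 = true // offer h2 via ALPN during the TLS handshake
	srv.StartTLS()
	defer srv.Close()

	client := srv.Client() // pre-configured to trust the test certificate
	protos := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ { // n concurrent requests, one connection
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			resp, err := client.Get(srv.URL)
			if err != nil {
				panic(err)
			}
			defer resp.Body.Close()
			b, _ := io.ReadAll(resp.Body)
			protos[i] = string(b)
		}(i)
	}
	wg.Wait()
	return protos
}

func main() {
	fmt.Println(fetchProtocols(8)[0]) // HTTP/2.0
}
```

Every response reports HTTP/2.0: all eight requests were multiplexed as streams rather than opened as separate connections.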
Streaming RPC Patterns
gRPC supports four RPC patterns. Unary (request-response): client sends one message, server sends one message — the standard RPC pattern. Server streaming: client sends one request, server streams many responses — useful for live feeds, progress updates, or large result sets. Client streaming: client sends many messages, server sends one response — useful for uploading large datasets or aggregating many events into one result. Bidirectional streaming: both client and server send streams of messages simultaneously — useful for real-time chat, multiplayer game state sync, or long-lived subscriptions. Server and client streaming avoid the overhead of many separate unary RPCs for iterative protocols.
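In a .proto file the four patterns differ only in where the stream keyword appears. A hypothetical service showing all four (names are illustrative):

```proto
service Telemetry {
  // Unary: one request, one response.
  rpc GetSnapshot(SnapshotRequest) returns (Snapshot);
  // Server streaming: one request, a stream of responses.
  rpc WatchEvents(WatchRequest) returns (stream Event);
  // Client streaming: a stream of requests, one response.
  rpc UploadMetrics(stream Metric) returns (UploadSummary);
  // Bidirectional streaming: both sides stream independently.
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}
```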
Interceptors (Middleware)
gRPC interceptors are middleware that wrap RPC handlers. Server-side interceptors: authentication (validate JWT or mTLS certificate before handling any RPC), authorization (check RBAC permissions for the requested method), logging (structured log for every RPC: method, duration, status, user_id), metrics (record latency and error rate per method), and panic recovery (catch panics in handlers and return a gRPC INTERNAL error instead of crashing). Client-side interceptors: retry (retry on transient errors with backoff), deadline propagation (pass the context deadline to downstream calls), and tracing (inject trace context into outgoing RPC metadata). Interceptors compose: chain multiple interceptors in order.
Deadlines and Cancellation
Every gRPC call should have a deadline — the absolute time by which the call must complete. Deadlines propagate: if the client sets a 500ms deadline, the server receives the deadline in the request context. The server passes the remaining deadline to any downstream gRPC calls it makes. If the client cancels (times out or is cancelled), the server context is cancelled and downstream calls are cancelled transitively. This prevents cascading waits where a cancelled upstream request keeps server resources occupied processing work that no one will consume. Set deadlines at the client call site, not as a global timeout — different calls have different latency requirements.
Error Handling and Status Codes
gRPC defines its own status codes, independent of HTTP status codes: OK (0), CANCELLED (1), INVALID_ARGUMENT (3), DEADLINE_EXCEEDED (4), NOT_FOUND (5), ALREADY_EXISTS (6), PERMISSION_DENIED (7), RESOURCE_EXHAUSTED (8), ABORTED (10), INTERNAL (13), UNAVAILABLE (14), UNAUTHENTICATED (16). Map business logic errors to the correct status code: invalid user input → INVALID_ARGUMENT; missing resource → NOT_FOUND; concurrent modification conflict → ABORTED; database unavailable → UNAVAILABLE. Include error details in the status using google.rpc.Status, with google.rpc.BadRequest for validation errors (field-level error messages). Retry only transient codes such as UNAVAILABLE (and DEADLINE_EXCEEDED when the operation is idempotent); never retry INVALID_ARGUMENT or PERMISSION_DENIED.
gRPC vs REST: When to Choose Each
Choose gRPC for: service-to-service (internal) communication where latency and throughput matter, polyglot environments needing generated type-safe clients, streaming requirements (live data, long-running operations), and cases where protobuf’s explicit schema evolution is preferable to JSON’s flexibility. Choose REST/JSON for: public-facing APIs consumed by browsers and third parties (gRPC-Web adds a proxy layer and tooling complexity for browser clients), simple CRUD APIs where HTTP semantics (GET, POST, PUT, DELETE) map naturally, and teams already proficient in REST with no performance bottleneck. gRPC’s performance advantage (commonly cited as 2-10x throughput with lower latency) matters at high request rates; for low-traffic services, the operational overhead of maintaining .proto files and codegen pipelines can outweigh the benefit.