System Design: Data Serialization — JSON, Protocol Buffers, Avro, Thrift, MessagePack, Schema Evolution

Data serialization determines how structured data is encoded for storage and transmission. The choice between JSON, Protocol Buffers, Avro, and other formats affects performance, bandwidth, schema evolution, and developer experience. This guide covers the major serialization formats, their tradeoffs, and when to use each — essential knowledge for system design interviews and microservice architecture decisions.

JSON: The Universal Format

JSON (JavaScript Object Notation) is the default serialization format for web APIs. It is human-readable, self-describing (field names are included in the payload), and supported by every major programming language. Pros: universal browser support, easy debugging (readable in logs and network inspectors), no schema required (flexible structure), and the de facto standard for REST APIs. Cons: (1) Verbose — field names are repeated in every record. A list of 1000 user records repeats “first_name”, “last_name”, “email” 1000 times. (2) No schema enforcement — the producer can change the structure without the consumer knowing, leading to runtime errors. (3) Limited types — JSON has strings, numbers, booleans, arrays, objects, and null. No native support for dates, binary data, or 64-bit integers (JavaScript numbers are 64-bit floats, losing precision for integers > 2^53). (4) Parsing overhead — text parsing is slower than binary deserialization. For internal service-to-service communication at high throughput, JSON overhead is measurable. Use JSON for: public APIs (universal client support), configuration files (human-readable), and low-to-moderate throughput services where developer experience matters more than efficiency.
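To make the verbosity concrete, here is a quick Python sketch (field values are illustrative) that measures how much of a 1000-record payload is spent purely on the repeated key strings:

```python
import json

# A hypothetical list of user records: every record repeats the same keys.
users = [
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
    for _ in range(1000)
]

payload = json.dumps(users)

# Bytes spent just on the three quoted key strings, repeated once per record.
key_overhead = 1000 * (len('"first_name"') + len('"last_name"') + len('"email"'))
print(len(payload), key_overhead)  # key_overhead is 30000 bytes of repeated names
```

A binary, schema-based format sends those names once (in the schema) instead of once per record.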

Protocol Buffers (Protobuf)

Protocol Buffers, developed by Google, is a binary serialization format with a schema definition language. You define the schema in a .proto file: message User { string name = 1; int32 age = 2; string email = 3; }. The protoc compiler generates code for serialization and deserialization in any supported language (Go, Java, Python, C++, etc.). Pros: (1) Compact — binary encoding is 3-10x smaller than JSON. Field names are replaced by numeric tags (1, 2, 3), and values are encoded in an efficient binary format. (2) Fast — binary serialization/deserialization is 5-20x faster than JSON parsing. (3) Strong typing — the schema defines exact types, preventing type errors. (4) Code generation — generated classes provide type-safe access. (5) Schema evolution — add new fields without breaking existing consumers (see below). Cons: not human-readable (binary), requires schema management (the .proto file must be shared between producer and consumer), and more complex tooling. Use Protobuf for: gRPC services (Protobuf is the default serialization for gRPC), high-throughput internal services, and any scenario where bandwidth or CPU efficiency matters.
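The tag-plus-varint encoding is easy to see in a minimal sketch. The following Python covers only the two wire types used by the User message above (varint for int32, length-delimited for string); real Protobuf handles more wire types, zigzag encoding for signed ints, and so on:

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, MSB set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_field(field_number: int, value) -> bytes:
    """One field = varint tag ((field_number << 3) | wire_type) + payload."""
    if isinstance(value, int):                       # wire type 0: varint
        return encode_varint(field_number << 3 | 0) + encode_varint(value)
    if isinstance(value, str):                       # wire type 2: length-delimited
        data = value.encode("utf-8")
        return encode_varint(field_number << 3 | 2) + encode_varint(len(data)) + data
    raise TypeError(value)

# message User { string name = 1; int32 age = 2; }
msg = encode_field(1, "Ada") + encode_field(2, 36)
print(msg.hex())  # 0a034164611024 — 7 bytes vs 23 for {"name":"Ada","age":36}
```

Note that the field names never appear on the wire, only the numeric tags 1 and 2 — which is also why renaming a field is wire-compatible.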

Schema Evolution and Backward Compatibility

Schema evolution is the ability to change the data schema without breaking existing producers or consumers. This is critical in microservices where services are deployed independently. Backward compatibility: a new consumer (with the new schema) can read data produced by an old producer (with the old schema). Forward compatibility: an old consumer (with the old schema) can read data produced by a new producer (with the new schema). Protobuf schema evolution rules: (1) Adding a new field with a default value is both backward and forward compatible. Old consumers ignore the unknown field. New consumers use the default if the field is missing. (2) Removing a field is backward compatible (new consumers simply ignore it when it appears in old data), and old consumers reading new data see the field fall back to its default — safe at the wire level, but it breaks old consumers that depend on the value being set. Mark removed field numbers as reserved to prevent reuse. (3) Renaming a field is safe on the wire because Protobuf uses numeric tags, not field names (though generated code and JSON mappings change). (4) Changing a field type is generally unsafe (int32 to string breaks deserialization). Avro schema evolution: Avro stores the writer schema with the data. The reader uses schema resolution rules to map between the writer and reader schemas. Adding a field with a default, removing a field, and type promotion (int to long) are all safe. Avro is the preferred format for Kafka (the Confluent Schema Registry stores and validates Avro schemas).
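The ignore-unknown-tags and fill-defaults behavior behind these rules can be simulated in a few lines of Python. Here dicts keyed by numeric tag stand in for decoded messages; the schemas and values are illustrative, not real Protobuf:

```python
# tag -> (field name, default value)
OLD_SCHEMA = {1: ("name", ""), 2: ("age", 0)}
NEW_SCHEMA = {1: ("name", ""), 2: ("age", 0), 3: ("email", "")}  # adds field 3

def read(message: dict, schema: dict) -> dict:
    """Decode known tags; ignore unknown tags; fill defaults for missing tags."""
    return {name: message.get(tag, default) for tag, (name, default) in schema.items()}

new_data = {1: "Ada", 2: 36, 3: "ada@example.com"}  # written by a new producer
old_data = {1: "Ada", 2: 36}                        # written by an old producer

# Forward compatible: the old consumer silently drops the unknown email field.
print(read(new_data, OLD_SCHEMA))   # {'name': 'Ada', 'age': 36}
# Backward compatible: the new consumer falls back to the default for email.
print(read(old_data, NEW_SCHEMA))   # {'name': 'Ada', 'age': 36, 'email': ''}
```

Both directions work without coordinating deployments, which is exactly what independently deployed services need.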

Apache Avro

Avro is a row-oriented binary serialization format designed for data-intensive applications. Key difference from Protobuf: Avro does not use field tags. Instead, the writer schema is stored alongside the data (in the file header for Avro files, or in a schema registry for Kafka messages). The reader resolves differences between the writer schema and its own reader schema. Pros: (1) Schema stored with data — the data is always self-describing. You can decode an Avro file years later without knowing the original schema (it is embedded). (2) Dynamic typing — Avro can be used without code generation. Useful for data pipeline tools (Spark, Flink) that process arbitrary schemas. (3) Compact — binary encoding similar to Protobuf; because records carry no per-field tags, Avro can be slightly smaller. (4) First-class Kafka support — the Confluent Schema Registry manages Avro schemas, enforcing compatibility rules on schema changes. Cons: decoding is slightly slower than Protobuf (schema resolution overhead), and the reader needs the writer schema at read time (fetched from the registry or the file header). Use Avro for: Kafka event streams (schema registry integration), data lake storage (self-describing files), and data pipeline systems (Spark, Hive). Use Protobuf for: gRPC services and performance-critical internal APIs.
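A toy illustration of Avro-style schema resolution in Python may help. Field names and defaults here are made up, and this deliberately skips real Avro's full resolution rules (aliases, type promotion, unions); it only shows matching by name, dropping unknown writer fields, and applying reader defaults:

```python
writer_schema = ["name", "age"]                     # writer's fields, in order
reader_schema = {"name": None, "email": "unknown"}  # field -> default (None = no default)

def resolve(record_values, writer_fields, reader_fields):
    """Match writer fields to reader fields by NAME (no tags), apply reader
    defaults for fields the writer lacks, drop fields the reader lacks."""
    written = dict(zip(writer_fields, record_values))
    out = {}
    for name, default in reader_fields.items():
        if name in written:
            out[name] = written[name]
        elif default is not None:
            out[name] = default
        else:
            raise ValueError(f"no value and no default for {name!r}")
    return out

print(resolve(["Ada", 36], writer_schema, reader_schema))
# {'name': 'Ada', 'email': 'unknown'} — age dropped, email defaulted
```

Because resolution is name-based and driven by the stored writer schema, the reader needs no compiled classes — which is why Spark and Flink can process Avro data with arbitrary schema versions dynamically.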

Choosing the Right Serialization Format

Decision framework: (1) Public REST API consumed by browsers and external developers — JSON. Universal support, human-readable, no tooling requirements for consumers. (2) Internal gRPC service-to-service communication — Protocol Buffers. Compact, fast, type-safe, built into gRPC. (3) Kafka event streaming — Avro with Schema Registry. Schema evolution support, self-describing data, first-class Kafka integration. (4) Configuration files — JSON or YAML. Human-readable and editable. (5) High-performance, low-latency systems (game servers, financial trading) — FlatBuffers (zero-copy deserialization, no parsing step) or Cap'n Proto. (6) Mobile applications with bandwidth constraints — Protobuf or MessagePack (JSON-compatible binary format, smaller than JSON but no schema). Performance comparison (approximate): Protobuf serialization is 5-20x faster than JSON. Protobuf payload is 3-10x smaller than JSON. Avro is similar to Protobuf in size and speed. MessagePack is 1.5-2x smaller than JSON with similar speed. In system design interviews: mention your serialization choice and justify it. “We use Protobuf for gRPC between services for efficiency, and JSON for the public API for developer experience.”
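To see where the roughly 1.5x MessagePack size advantage comes from, here is a deliberately minimal MessagePack encoder in Python, covering only the three type encodings needed for this comparison (fixmap, fixstr, positive fixint, per the MessagePack format spec):

```python
import json

def msgpack_encode(obj) -> bytes:
    """Minimal MessagePack encoder sketch: small maps, short strings,
    small positive ints only. Not a full implementation."""
    if isinstance(obj, dict):
        assert len(obj) < 16
        out = bytes([0x80 | len(obj)])              # fixmap: 0x80 | size
        for k, v in obj.items():
            out += msgpack_encode(k) + msgpack_encode(v)
        return out
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        assert len(data) < 32
        return bytes([0xA0 | len(data)]) + data     # fixstr: 0xA0 | length
    if isinstance(obj, int):
        assert 0 <= obj < 128
        return bytes([obj])                         # positive fixint
    raise TypeError(obj)

user = {"name": "Ada", "age": 36}
json_size = len(json.dumps(user, separators=(",", ":")).encode())
msgpack_size = len(msgpack_encode(user))
print(json_size, msgpack_size)  # 23 15
```

MessagePack drops the quotes, colons, commas, and braces in favor of one-byte type/length headers, but still ships the field names in every record — which is why it stays larger than tag-based Protobuf for the same data.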
