IoT system design questions are increasingly common in interviews at Amazon (AWS IoT), Google (Cloud IoT Core, retired in 2023), Tesla, Nest, and any company building connected hardware. The core challenges are managing millions of persistent device connections, ingesting high-frequency telemetry data, and pushing commands and firmware updates reliably to constrained devices.
What Makes IoT Different
- Device constraints: microcontrollers with 256KB RAM, 2G/3G connectivity, intermittent connections, battery-powered
- Volume: 10 million devices each sending telemetry every 30 seconds ≈ 333,000 messages/second
- Bidirectional: devices send data up and receive commands down (unlike web APIs where clients always initiate)
- Reliability: devices must not lose important commands (firmware updates, safety alerts) even if temporarily offline
- Security: devices in the field cannot be easily patched — security must be robust from day one
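The volume figure above is simple arithmetic, and it is worth being able to redo it quickly for other fleet sizes. The sketch below is only a back-of-envelope calculation; the 500-byte average message size is an assumption (it matches the figure used later for Kafka throughput), and `fleet_load` is a hypothetical helper name.

```python
# Back-of-envelope load estimate for a device fleet.
DEVICES = 10_000_000     # fleet size from the example above
INTERVAL_S = 30          # one telemetry message per device every 30 s
MESSAGE_BYTES = 500      # assumed average message size


def fleet_load(devices: int, interval_s: float, message_bytes: int) -> dict:
    """Return steady-state message rate and ingest bandwidth."""
    msgs_per_s = devices / interval_s
    bytes_per_s = msgs_per_s * message_bytes
    return {
        "messages_per_second": msgs_per_s,
        "megabytes_per_second": bytes_per_s / 1e6,
        "terabytes_per_day": bytes_per_s * 86_400 / 1e12,
    }


load = fleet_load(DEVICES, INTERVAL_S, MESSAGE_BYTES)
print(load)
```

This is an average rate; real fleets are bursty (for example after an outage), so capacity planning usually adds headroom on top of the steady-state number.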
Architecture Overview
[IoT Devices] ←→ MQTT ←→ [MQTT Broker Cluster] → [Message Router / Rules Engine]
The Message Router / Rules Engine fans out to three paths:
- [Telemetry Ingestion] → [Kafka Stream] → [Time-Series DB (InfluxDB/TimescaleDB)] → [Analytics / Dashboard]
- [Command Handler] → [Device DB (Cassandra)]
- [Alert Engine] → [PagerDuty / Slack]
MQTT Protocol
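MQTT (originally "MQ Telemetry Transport", often expanded as Message Queuing Telemetry Transport) is the de facto standard messaging protocol for IoT. It uses a publish-subscribe model over persistent TCP connections, with very low per-message overhead: the fixed header can be as small as 2 bytes.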
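To make the overhead claim concrete, here is a minimal sketch that hand-encodes a QoS 0 PUBLISH packet following the MQTT 3.1.1 wire format (MQTT 5 adds a properties field that is not shown). The topic and payload are illustrative, and in practice a client library does this encoding for you.

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 data bits per byte, high bit = more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            break
    return bytes(out)


def encode_publish_qos0(topic: str, payload: bytes) -> bytes:
    """Build a PUBLISH packet with QoS 0, no DUP, no RETAIN (MQTT 3.1.1)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte topic length + topic. QoS 0 has no packet identifier.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    remaining = variable_header + payload
    # Fixed header: packet type 3 (PUBLISH) in the high nibble, flags 0 -> 0x30.
    fixed_header = bytes([0x30]) + encode_remaining_length(len(remaining))
    return fixed_header + remaining


payload = b'{"t":72.3}'
packet = encode_publish_qos0("devices/dev_12345/telemetry", payload)
overhead = len(packet) - len(payload)
print(len(packet), overhead)
```

For small messages most of the overhead is the topic name rather than the fixed header, which is one reason short topic names are common on constrained links.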
Key MQTT concepts:
Topics: hierarchical namespace for messages
devices/{device_id}/telemetry -- device sends sensor data
devices/{device_id}/status -- device sends heartbeat
devices/{device_id}/commands -- server sends commands to device
devices/{device_id}/firmware -- server pushes OTA updates
Quality of Service (QoS) levels:
QoS 0: at-most-once (fire and forget) — no acknowledgment
Use for telemetry where occasional loss is acceptable
QoS 1: at-least-once — broker acknowledges; client retries until ack
Use for commands and alerts
QoS 2: exactly-once; 4-step handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) so each message is delivered once and only once
Use for billing events, firmware updates
Retained messages:
Broker stores the last retained message on each topic and delivers it to new subscribers
A device subscribing after an outage immediately gets the latest command (only the most recent one per topic is kept)
Last Will and Testament (LWT):
Device registers a "will" message on connect
If device disconnects unexpectedly, broker publishes the will
Use for device_status: {status: "offline", timestamp: ...}
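One practical consequence of QoS 1 is that the same message can arrive more than once, so command handlers should be idempotent. The following is a minimal sketch of one way to do that, assuming each command carries a unique `command_id` field (a hypothetical field, not part of MQTT itself) and that remembering the most recent IDs is sufficient.

```python
from collections import OrderedDict


class CommandDeduplicator:
    """Remembers recently seen command IDs so QoS 1 redeliveries are applied once."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._seen = OrderedDict()  # command_id -> None, oldest first

    def first_time(self, command_id: str) -> bool:
        if command_id in self._seen:
            self._seen.move_to_end(command_id)  # refresh recency
            return False
        self._seen[command_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


applied = []


def handle_command(dedup: CommandDeduplicator, message: dict) -> None:
    """Apply a command only the first time its ID is seen."""
    if dedup.first_time(message["command_id"]):
        applied.append(message["action"])


dedup = CommandDeduplicator(capacity=2)
handle_command(dedup, {"command_id": "c1", "action": "reboot"})
handle_command(dedup, {"command_id": "c1", "action": "reboot"})  # redelivery, ignored
handle_command(dedup, {"command_id": "c2", "action": "set_mode"})
print(applied)
```

The bounded cache is a trade-off for memory-constrained devices: a duplicate that arrives after its ID has been evicted would be applied again, so the capacity should cover the broker's retry window.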
MQTT Broker at Scale
A single MQTT broker (for example, Eclipse Mosquitto) handles on the order of 100,000 connections, depending on hardware and message rate. At 10M devices, you need a clustered broker:
Options:
1. EMQ X (EMQX): open-source, handles 10M connections on a cluster
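2. AWS IoT Core: managed service, scales automatically; pricing is usage-based (connection time and messages), so the ~$0.08/device/month figure is a rough estimate that depends on message rate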
3. HiveMQ: enterprise MQTT broker with clustering
Sharding connections across brokers:
- Route devices by device_id hash to a specific broker node
- Brokers share subscriptions and route cross-broker messages via internal Kafka
- Device always reconnects to same broker node (sticky routing via DNS/LB)
Connection state (session state):
- Which topics device is subscribed to
- Offline messages queued while device was disconnected
- QoS 1/2 message acknowledgment state
- Store in Redis (shared across broker cluster) or Cassandra
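The routing step can be sketched in a few lines. The text above only says "device_id hash"; the example below uses rendezvous (highest-random-weight) hashing as one possible choice, because removing a broker only remaps the devices that were on that broker. The node names and the `pick_broker` helper are hypothetical.

```python
import hashlib


def _score(device_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (node, device) pair."""
    digest = hashlib.sha256(f"{node}|{device_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_broker(device_id: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score owns the device."""
    if not nodes:
        raise ValueError("no broker nodes available")
    return max(nodes, key=lambda node: _score(device_id, node))


nodes = ["broker-1", "broker-2", "broker-3"]
print(pick_broker("dev_12345", nodes))
```

A plain `hash(device_id) % len(nodes)` is simpler but reshuffles most devices whenever the node count changes, which would trigger a large reconnection wave during scaling.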
Telemetry Ingestion Pipeline
MQTT Broker → Kafka "telemetry" topic → [Stream Processors]
↓
[Time-Series Database (InfluxDB)]
measurements:
temperature{device=A} 72.3 t=1234
humidity{device=A} 58.0 t=1234
↓
[Rollup Job (Spark/Flink)]
raw: every 30s
1-min avg: rolling window
1-hr avg: for dashboards
24-hr avg: for long-term storage
Kafka partitioning: by device_id — all messages from one device
land on the same partition, preserving ordering
Throughput: 333,000 messages/s × 500 bytes ≈ 167 MB/s of ingest, which a Kafka cluster with enough partitions and brokers can sustain
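The rollup stage is essentially a grouped average over time windows. The sketch below shows the idea in plain Python with tumbling (non-overlapping) windows, which is a simplification of the rolling windows mentioned above; a production job would run in Spark or Flink and handle late-arriving data. The tuple layout and `rollup_average` name are assumptions for illustration.

```python
from collections import defaultdict


def rollup_average(samples, window_s: int) -> dict:
    """Average values per (device_id, window_start).

    samples: iterable of (device_id, unix_timestamp_seconds, value).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for device_id, ts, value in samples:
        window_start = ts - (ts % window_s)  # align to the window boundary
        acc = sums[(device_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


raw = [
    ("A", 1200, 72.0),
    ("A", 1230, 73.0),
    ("A", 1260, 75.0),
    ("B", 1200, 58.0),
]
one_minute = rollup_average(raw, 60)
print(one_minute)
```

Because Kafka partitions by device_id, each stream-processing task sees all samples for its devices in order, so this kind of per-device aggregation needs no cross-partition coordination.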
Time-Series Database Choice
| Database | Write Throughput | Retention | Use Case |
|---|---|---|---|
| InfluxDB | ~1M points/s | Automatic (configurable) | IoT telemetry, metrics |
| TimescaleDB | High (Postgres extension) | Manual or policies | SQL queries on time series |
| Cassandra | Very high | Manual TTL | Wide column, device state history |
| Amazon Timestream | High (managed) | Automatic tiers | Serverless, AWS integration |
Device Management
-- Device registry (PostgreSQL)
CREATE TABLE devices (
device_id VARCHAR(64) PRIMARY KEY,
owner_user_id BIGINT,
device_type VARCHAR(50), -- sensor, camera, thermostat
firmware_version VARCHAR(20),
last_seen_at TIMESTAMP,
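status VARCHAR(20) CHECK (status IN ('active','offline','deprovisioned')), -- PostgreSQL has no inline ENUM(...); use a CHECK constraint or CREATE TYPE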
metadata JSONB -- flexible per-device attributes
);
-- Device shadow / digital twin (Redis or DynamoDB)
-- Stores desired state + reported state for each device
{
"device_id": "dev_12345",
"desired": {
"target_temperature": 72,
"mode": "cooling"
},
"reported": {
"target_temperature": 68,
"mode": "heating",
"firmware": "2.4.1"
},
"delta": {
"target_temperature": 72,
"mode": "cooling"
},
"last_updated": "2026-01-15T12:00:00Z"
}
-- delta = the desired fields whose values differ from the reported state
-- Device connects: receives delta and applies it
-- After applying: reports new state, delta clears
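The delta computation itself is small. Here is a minimal sketch for flat (non-nested) shadow documents; `compute_delta` is a hypothetical helper, and managed services such as AWS IoT compute the delta for you and also handle nested objects and versioning.

```python
_MISSING = object()  # sentinel so a desired value of None is still compared correctly


def compute_delta(desired: dict, reported: dict) -> dict:
    """Return the desired fields whose value differs from the reported state."""
    return {
        key: value
        for key, value in desired.items()
        if reported.get(key, _MISSING) != value
    }


desired = {"target_temperature": 72, "mode": "cooling"}
reported = {"target_temperature": 68, "mode": "heating", "firmware": "2.4.1"}

delta = compute_delta(desired, reported)
print(delta)

# After the device applies the delta and reports back, the delta clears.
reported.update(delta)
print(compute_delta(desired, reported))
```

Fields that appear only in `reported` (such as `firmware` here) never show up in the delta, since the cloud has expressed no desired value for them.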
Over-the-Air (OTA) Firmware Updates
OTA Update Flow:
1. Engineer uploads new firmware binary to S3
2. Creates update campaign: target all devices running firmware older than the new release (for example, < 2.5.0)
3. Campaign service queries device registry for matching devices
4. Sends firmware update command in batches:
- Batch 1: 1% of fleet (canary) — monitor error rate for 24h
- Batch 2: 10% — monitor
- Batch 3: 100% — full rollout
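Two details of the campaign logic are easy to get wrong: version comparison (string comparison puts "2.10.0" before "2.4.1") and keeping the canary cohort stable as the percentage grows. The sketch below shows one way to handle both, assuming plain dotted numeric versions; the function names and the campaign ID are hypothetical.

```python
import hashlib


def parse_version(version: str) -> tuple[int, ...]:
    """'2.4.1' -> (2, 4, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def rollout_bucket(device_id: str, campaign_id: str) -> float:
    """Stable pseudo-random percentile in [0, 100) with 0.01 resolution."""
    digest = hashlib.sha256(f"{campaign_id}:{device_id}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def is_eligible(device_id: str, current: str, target: str,
                campaign_id: str, rollout_percent: float) -> bool:
    """True if the device is behind the target and inside the current rollout stage."""
    if parse_version(current) >= parse_version(target):
        return False  # already up to date
    return rollout_bucket(device_id, campaign_id) < rollout_percent


# A device selected at 1% stays selected at 10% and 100%, because its bucket is fixed.
for percent in (1, 10, 100):
    print(percent, is_eligible("dev_12345", "2.4.1", "2.5.0", "campaign-42", percent))
```

Hashing the campaign ID together with the device ID means a different set of devices serves as the canary for each campaign, so the same devices are not always the first to take the risk.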
Device-side update flow:
1. Device receives command: {action: "update", url: "s3://firmware/v2.5.0.bin", checksum: "sha256:..."}
2. Device downloads firmware (chunked, resumable)
3. Verifies checksum
4. Writes to secondary flash partition
5. Reboots into new firmware
6. If boot fails 3 times, reverts to previous firmware (fail-safe)
7. Reports new firmware version to broker
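Steps 3 and 6 of the device-side flow can be sketched as follows. Real firmware would implement this in C inside the bootloader and updater; the Python below only illustrates the logic. It assumes the `sha256:<hex>` checksum format from the command above and hypothetical slot names "A" and "B".

```python
import hashlib

MAX_BOOT_ATTEMPTS = 3  # matches the "fails 3 times" rule above


def verify_checksum(image: bytes, expected: str) -> bool:
    """Check an image against a checksum string of the form 'sha256:<hex digest>'."""
    algorithm, _, hex_digest = expected.partition(":")
    if algorithm != "sha256":
        return False  # unknown algorithm: refuse rather than skip verification
    return hashlib.sha256(image).hexdigest() == hex_digest.lower()


def select_boot_slot(new_slot: str, previous_slot: str, failed_boots: int) -> str:
    """Fall back to the previous firmware once the new one has failed too often."""
    return previous_slot if failed_boots >= MAX_BOOT_ATTEMPTS else new_slot


image = b"example firmware bytes"
expected = "sha256:" + hashlib.sha256(image).hexdigest()
print(verify_checksum(image, expected))          # intact image
print(verify_checksum(image + b"x", expected))   # corrupted image
print(select_boot_slot("B", "A", failed_boots=3))
```

A checksum only detects corruption; it does not prove who produced the image. Authenticity requires a signature check against a key the device already trusts.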
Security
- Device authentication: X.509 client certificates (per-device unique certificate) stored in secure element on hardware
- Transport security: MQTT over TLS 1.3 (port 8883)
- Message authorization: device can only publish to its own topic (devices/{its_own_id}/*)
- Command signing: commands signed with server private key; device verifies with embedded server public key
- Certificate rotation: automate certificate renewal before expiry via OTA
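The message authorization rule above amounts to comparing the device ID in the topic with the identity proven by the client certificate. A minimal sketch of such a check is shown below; `may_publish` is a hypothetical hook, and real brokers typically express this as an ACL or policy document rather than code.

```python
def may_publish(authenticated_device_id: str, topic: str) -> bool:
    """Allow publishing only under devices/{own_id}/... with no wildcards."""
    parts = topic.split("/")
    if len(parts) < 3 or parts[0] != "devices":
        return False
    # Wildcards are only valid in subscriptions; empty levels are rejected as malformed.
    if any(part in ("+", "#") or part == "" for part in parts):
        return False
    return parts[1] == authenticated_device_id


print(may_publish("dev_12345", "devices/dev_12345/telemetry"))  # own topic
print(may_publish("dev_12345", "devices/dev_99999/telemetry"))  # someone else's topic
```

The device ID must come from the authenticated session (for example, the certificate's subject), never from the message payload, otherwise a compromised device could impersonate another.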
Interview Discussion Points
- How do you handle millions of devices reconnecting simultaneously after a regional outage? Rate-limit reconnections (exponential backoff with jitter on device side); pre-scale broker cluster before expected reconnection wave; prioritize critical devices
- What is the device shadow pattern? Decouple desired state (what you want) from reported state (what the device says it has) — allows commands to be queued even when device is offline; device syncs on reconnect
- How do you detect anomalous device behavior? ML model on telemetry stream (Spark Streaming or Flink): statistical deviation from device historical baseline, sudden value spikes, absence of expected heartbeats
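For the reconnection-storm question, the device-side backoff is worth being able to write out. The sketch below uses the "full jitter" variant, where the delay is drawn uniformly between zero and an exponentially growing ceiling; the base, cap, and function name are illustrative assumptions.

```python
import random


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 300.0,
                    rng=random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    exponent = min(max(attempt, 0), 32)  # clamp so 2**exponent stays small
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return rng.uniform(0.0, ceiling)


rng = random.Random(0)  # seeded only to make the demo repeatable
for attempt in range(6):
    print(attempt, round(reconnect_delay(attempt, rng=rng), 2))
```

Without jitter, every device that lost its connection at the same moment would retry at the same moments too; randomizing the delay spreads the reconnections across the whole backoff window.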
Frequently Asked Questions
How does MQTT differ from HTTP for IoT device communication?
MQTT is a publish/subscribe protocol designed for constrained devices and unreliable networks. Key differences from HTTP: MQTT maintains a persistent TCP connection (no per-message connection setup), uses lightweight binary framing (a fixed header as small as 2 bytes, versus HTTP headers that can exceed 1KB), and supports QoS levels (0=at-most-once, 1=at-least-once, 2=exactly-once). Plain HTTP is request/response, so the server cannot push data to a device without workarounds such as long-polling, whereas an MQTT broker forwards each message to connected subscribers as soon as it is published. For a device sending a small temperature reading every 30 seconds, per-message overhead dominates, so MQTT typically uses a small fraction of the bandwidth of HTTP (the often-quoted figure of roughly 1/10th depends on payload size, headers, and connection reuse). MQTT also supports Last Will and Testament (LWT): the broker publishes a configured message on behalf of a device when it disconnects unexpectedly, enabling automatic offline detection.
How do you handle OTA firmware updates for millions of IoT devices?
OTA (Over-The-Air) updates require a staged rollout to avoid bricking millions of devices simultaneously. The architecture: (1) Firmware binary is uploaded to object storage (S3) and signed with a private key. (2) An update campaign targets a device group — first 0.1% of devices, then 1%, 10%, 100%, with automatic rollback if error rates exceed a threshold. (3) The device shadow or twin stores desired_firmware_version. When a device connects, it compares its current version to the desired version and downloads the delta patch (not the full binary) from a pre-signed URL. (4) The device verifies the binary signature before applying, then reboots and reports its new version. (5) A watchdog timer detects failed boots and rolls back to the previous firmware automatically.
What is the device shadow pattern and why is it used?
A device shadow (also called a device twin) is a JSON document stored in the cloud that represents the last known and desired state of a physical device. It has two sections: reported (what the device last told the cloud its state is) and desired (what the cloud wants the device to do). The pattern solves the problem of intermittent connectivity: if a device is offline when a command is sent, the command is persisted in the desired state. When the device reconnects, it reads the delta between desired and reported, executes the command, then updates reported. This decouples cloud applications from device connectivity. AWS IoT (Device Shadow) and Azure IoT Hub (device twins) implement this pattern; Google Cloud IoT Core offered comparable device configuration and state documents before its retirement in 2023. The shadow is also the authoritative state for dashboards and automation rules, which read from the shadow rather than polling devices directly.