Requirements and Constraints
A device registry is the source of truth for IoT device identity, credentials, configuration, and metadata. It handles secure device provisioning, stores and rotates certificates, enforces per-device access policies, and supports fleet management queries (find all devices of a type in a region, batch-push configuration updates). Functional requirements: certificate-based device authentication, zero-touch provisioning flow, metadata CRUD, shadow/desired-state storage, and fleet query support. Non-functional: authenticate 200,000 concurrent device connections, serve metadata lookups under 5ms (P99), and support fleets of 10 million devices per tenant.
Core Data Model
- devices(device_id UUID PK, tenant_id FK, device_type, serial_number UNIQUE, status ENUM('provisioning','active','suspended','decommissioned'), registered_at, last_seen_at, firmware_version, hardware_revision)
- device_certificates(cert_id PK, device_id FK, fingerprint UNIQUE, pem TEXT, issued_at, expires_at, revoked_at, is_primary BOOL)
- device_metadata(device_id FK, key VARCHAR, value TEXT, updated_at) — flexible key-value for extensible attributes
- device_shadow(device_id PK, reported JSONB, desired JSONB, version INT, last_updated_at) — desired vs. reported state
- provisioning_claims(claim_token PK, device_type, tenant_id, max_uses, uses_count, expires_at, created_by)
- fleet_groups(group_id PK, tenant_id, name, filter_expr TEXT) — saved fleet queries as filter expressions
Provisioning Flow
Zero-touch provisioning removes the need for manual credential installation. The flow:
- The manufacturer burns a unique device serial number and a factory certificate into device hardware at production time. The factory root CA is pre-registered with the registry.
- On first boot, the device connects to a bootstrap endpoint using its factory certificate and presents its serial number.
- The registry validates the factory certificate chain, checks the serial against a pre-loaded device batch manifest (uploaded by the operator), and creates a device record in 'provisioning' status.
- The registry generates a unique operational certificate (or signs a CSR submitted by the device), stores it in device_certificates, and returns it to the device.
- The device stores the operational certificate and reconnects to the main endpoint. Status transitions to 'active'.
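The steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Registry` class, its `bootstrap` and `activate` methods, and the `factory_chain_ok` flag (a stand-in for real certificate chain validation) are hypothetical names, not part of the design described here.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class Registry:
    # Serial numbers from the operator-uploaded batch manifest.
    manifest: set
    # serial_number -> device record (stands in for the devices table).
    devices: dict = field(default_factory=dict)

    def bootstrap(self, serial: str, factory_chain_ok: bool) -> dict:
        """Handle a first-boot connection on the bootstrap endpoint."""
        # factory_chain_ok is a placeholder for validating the factory
        # certificate chain against the pre-registered factory root CA.
        if not factory_chain_ok:
            raise PermissionError("factory certificate chain not trusted")
        if serial not in self.manifest:
            raise LookupError("serial not in batch manifest")
        if serial not in self.devices:
            self.devices[serial] = {
                "device_id": str(uuid.uuid4()),
                "serial_number": serial,
                "status": "provisioning",
            }
        return self.devices[serial]

    def activate(self, serial: str) -> dict:
        """Called when the device reconnects with its operational certificate."""
        record = self.devices[serial]
        if record["status"] == "provisioning":
            record["status"] = "active"
        return record
```

Making `bootstrap` idempotent per serial number is a design choice in this sketch: a device that loses power mid-provisioning can retry without creating a second record.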
Provisioning claims support a simpler flow for devices without factory certificates: a claim token is shared with the installer; the device presents the token plus a device-generated public key; the registry signs and returns a certificate.
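A minimal sketch of the claim check, using the columns of provisioning_claims. The `ProvisioningClaim` class and `redeem` function are hypothetical names; a real implementation would likely perform the check and increment as one atomic conditional update in the database rather than in application memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProvisioningClaim:
    claim_token: str
    tenant_id: str
    device_type: str
    max_uses: int
    uses_count: int
    expires_at: datetime


def redeem(claim: ProvisioningClaim, now: datetime) -> bool:
    """Consume one use of the claim; return False if expired or exhausted."""
    if now >= claim.expires_at:
        return False
    if claim.uses_count >= claim.max_uses:
        return False
    claim.uses_count += 1
    return True
```

Only after `redeem` succeeds would the registry sign the device-generated public key and return a certificate.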
Certificate-Based Authentication
The authentication path is performance-critical and must not touch the database on every connection. Device certificates are validated in three steps: TLS handshake (mTLS) validates the certificate chain against tenant root CAs stored in memory; certificate fingerprint lookup hits a Redis cache (TTL equal to certificate validity period) to check revocation status; on cache miss, fall through to the database and populate cache. This reduces per-connection database load to near zero for healthy, long-lived certificates.
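The cache-aside lookup in the second and third steps might look roughly like the following. This is a sketch under assumptions: `FingerprintCache` is an in-memory stand-in for Redis (TTL handling omitted), `db_lookup` is a hypothetical callable returning "valid", "revoked", or None for an unknown fingerprint, and SHA-256 over the DER-encoded certificate is assumed as the fingerprint.

```python
import hashlib
from typing import Callable, Optional


class FingerprintCache:
    """In-memory stand-in for the Redis cache; a real cache would set a TTL."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def fingerprint(cert_der: bytes) -> str:
    # Assumption: the fingerprint is the SHA-256 digest of the DER bytes.
    return hashlib.sha256(cert_der).hexdigest()


def is_authorized(
    cert_der: bytes,
    cache: FingerprintCache,
    db_lookup: Callable[[str], Optional[str]],
) -> bool:
    """Check revocation status via the cache, falling back to the database."""
    fp = fingerprint(cert_der)
    status = cache.get(fp)
    if status is None:
        status = db_lookup(fp)  # cache miss: consult the database
        if status is None:
            return False  # unknown certificate: reject, do not cache
        cache.set(fp, status)
    return status == "valid"
```

With a long TTL, revocation has to purge or overwrite the cache entry (the `delete` method here), otherwise a revoked certificate would keep authenticating until the entry expires.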
Certificate rotation is triggered by the registry when certificates approach expiry (configurable, default 30 days before expiry). The registry issues a new certificate and updates is_primary; the old certificate remains valid during an overlap window to allow in-flight reconnections.
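The rotation trigger and overlap window can be expressed as two small predicates. This is a hedged sketch: the 30-day lead comes from the text, but the 7-day `OVERLAP_WINDOW` and the `superseded_at` timestamp (when a certificate stopped being primary) are hypothetical and do not appear in the data model above.

```python
from datetime import datetime, timedelta
from typing import Optional

ROTATION_LEAD = timedelta(days=30)   # default from the text
OVERLAP_WINDOW = timedelta(days=7)   # hypothetical; the text gives no value


def needs_rotation(
    expires_at: datetime, now: datetime, lead: timedelta = ROTATION_LEAD
) -> bool:
    """True once the certificate is within `lead` of its expiry."""
    return now >= expires_at - lead


def is_accepted(
    expires_at: datetime,
    revoked_at: Optional[datetime],
    is_primary: bool,
    superseded_at: Optional[datetime],  # hypothetical field, see lead-in
    now: datetime,
    overlap: timedelta = OVERLAP_WINDOW,
) -> bool:
    """Decide whether a certificate may still authenticate."""
    if revoked_at is not None and now >= revoked_at:
        return False
    if now >= expires_at:
        return False
    if is_primary:
        return True
    # A non-primary certificate is accepted only inside the overlap window.
    return superseded_at is not None and now < superseded_at + overlap
```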
Device Shadow
The shadow pattern separates desired state (what the operator wants) from reported state (what the device last confirmed). The operator writes to the desired JSONB field. When the device connects, it receives the delta (desired minus reported). The device applies changes and writes back to reported. Version numbers enable optimistic concurrency: a device update fails if the version in the request does not match the current version, forcing a re-fetch.
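As a rough illustration of the delta and the version check, assuming flat (non-nested) JSON objects held as Python dicts; `compute_delta`, `write_reported`, and `VersionConflict` are hypothetical names, and nested documents would need a recursive comparison.

```python
_MISSING = object()  # sentinel so that an explicit None value still compares


class VersionConflict(Exception):
    """Raised when the caller's version does not match the stored version."""


def compute_delta(desired: dict, reported: dict) -> dict:
    """Return the desired fields whose values differ from, or are absent in, reported."""
    return {
        key: value
        for key, value in desired.items()
        if reported.get(key, _MISSING) != value
    }


def write_reported(shadow: dict, changes: dict, expected_version: int) -> int:
    """Apply a device's reported changes if its version is current."""
    if expected_version != shadow["version"]:
        raise VersionConflict(
            f"expected {expected_version}, current {shadow['version']}"
        )
    shadow["reported"].update(changes)
    shadow["version"] += 1
    return shadow["version"]
```

For example, with desired `{"fw": "2.1", "interval": 60}` and reported `{"fw": "2.0", "interval": 60}`, the delta is `{"fw": "2.1"}`; once the device reports `fw` as `"2.1"` the delta becomes empty.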
Shadow writes are propagated to connected devices via a pub/sub channel (e.g., Redis pub/sub or an MQTT retained message on a device-specific topic). This avoids polling and delivers configuration changes within seconds of an operator update.
Fleet Management Queries
Fleet queries filter devices by metadata attributes, status, firmware version, last_seen window, and device_type. The device_metadata table has a GIN index on value (on a TEXT column this needs an operator class such as gin_trgm_ops from the pg_trgm extension, assuming PostgreSQL) and a btree index on key. For large tenants (millions of devices), fleet queries use partition pruning on tenant_id (partitioned table) and covering indexes on common filter combinations (tenant_id, device_type, status). Saved fleet groups store a filter expression evaluated at query time; group membership is dynamic, not materialized, so there is no sync lag.
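To make the "dynamic, not materialized" point concrete, here is a toy evaluation of a saved filter. The text does not specify the filter_expr format, so this sketch assumes a simple mapping of field name to required value; `matches` and `group_members` are hypothetical, and a real registry would presumably translate the expression into an indexed database query instead of scanning in memory.

```python
from typing import Iterable, List


def matches(device: dict, filter_expr: dict) -> bool:
    """True if every field named in the filter equals the required value."""
    return all(device.get(name) == wanted for name, wanted in filter_expr.items())


def group_members(devices: Iterable[dict], filter_expr: dict) -> List[str]:
    """Evaluate the saved filter at query time; nothing is stored per group."""
    return [d["device_id"] for d in devices if matches(d, filter_expr)]
```

Because membership is computed on each call, a device provisioned after the group was saved appears in the next result without any group update.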
Bulk operations (firmware push to a fleet group, batch suspend) are implemented as background jobs that page through filtered device IDs and enqueue per-device tasks to a work queue, respecting rate limits to avoid overwhelming device connectivity infrastructure.
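A background job of this shape could be sketched as follows. The names are assumptions for illustration: `fetch_page(cursor, limit)` is a hypothetical callable returning the next sorted batch of device IDs after `cursor`, `enqueue` pushes one per-device task, and the pacing is a simple per-page sleep rather than a production rate limiter.

```python
import time
from typing import Callable, Iterator, List, Optional


def page_ids(
    fetch_page: Callable[[Optional[str], int], List[str]], page_size: int
) -> Iterator[List[str]]:
    """Keyset pagination: each page starts after the last ID of the previous one."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield page
        cursor = page[-1]


def run_bulk_job(
    fetch_page: Callable[[Optional[str], int], List[str]],
    enqueue: Callable[[str], None],
    page_size: int = 500,
    max_per_second: float = 1000.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Enqueue one task per device, pacing pages to respect the rate limit."""
    total = 0
    for page in page_ids(fetch_page, page_size):
        started = time.monotonic()
        for device_id in page:
            enqueue(device_id)
        total += len(page)
        # Sleep so the average enqueue rate stays at or below max_per_second.
        min_duration = len(page) / max_per_second
        elapsed = time.monotonic() - started
        if elapsed < min_duration:
            sleep(min_duration - elapsed)
    return total
```

Keyset pagination is used here instead of OFFSET so that each page costs the same regardless of how deep into a multi-million-device fleet the job has progressed.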
Scalability Considerations
- Authentication hot path: All certificate lookups served from Redis; the database is only consulted for new or recently revoked certificates.
- Metadata scalability: For tenants with millions of devices, the device_metadata key-value design avoids wide-table migration costs as new attribute types are added. Use a composite index on (tenant_id, key, value) to support fleet filter queries; this requires carrying tenant_id on device_metadata rows, which the data model above does not list.
- Cross-region replication: Device registry is globally replicated (active-active or active-passive per region). Authentication must succeed locally; shadow writes propagate async to other regions with conflict resolution via last-write-wins on version number.
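The last-write-wins rule for replicated shadows can be sketched as a comparison on version. Breaking ties on last_updated_at is an assumption added here, since two regions can independently produce the same version number; `resolve` is a hypothetical name.

```python
def resolve(local: dict, remote: dict) -> dict:
    """Keep the shadow copy with the higher version.

    Ties on version fall back to last_updated_at (an assumption);
    if both are equal, the local copy is kept.
    """
    local_key = (local["version"], local["last_updated_at"])
    remote_key = (remote["version"], remote["last_updated_at"])
    return remote if remote_key > local_key else local
```

Note that last-write-wins discards the losing write entirely, so concurrent updates to different fields of the same shadow in two regions are not merged.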
API Design
- POST /devices/provision: bootstrap provisioning; accepts factory cert + serial, returns operational cert
- GET /devices/{id}: device record including status, metadata, firmware version
- PUT /devices/{id}/shadow/desired: update desired state; triggers delta delivery to device
- POST /devices/{id}/shadow/reported: device reports current state; validates version
- POST /certificates/{id}/revoke: immediately invalidates certificate, purges cache entry
- POST /fleet-groups/{id}/jobs: submit a bulk operation (firmware update, config push) to a fleet group
- GET /fleet-groups/{id}/devices: paginated list of devices matching the group filter
Frequently Asked Questions
How does zero-touch provisioning work with X.509 certificates?
At manufacture, each device receives a unique device certificate signed by a manufacturer CA, along with the CA chain burned into firmware. On first boot, the device connects to a provisioning endpoint, presents its certificate, and the registry validates the chain against a trusted root store. Upon success, the registry issues a fleet-level operational certificate, registers the device record (serial, model, firmware version, owner), and returns bootstrap configuration. No human intervention is required; the device transitions from unregistered to operational autonomously.
What is the device shadow pattern and how is it stored?
A device shadow is a JSON document with two top-level keys: desired (the operator's intended state, e.g., firmware version, config) and reported (the device's last known actual state). The registry stores shadows in a document store (DynamoDB or MongoDB) with optimistic locking via a version counter. When a device connects, it receives a delta document containing only the fields where desired differs from reported. The device applies changes and publishes an updated reported state, which the registry merges and clears from the delta.
How does mTLS authentication work in a device registry?
Mutual TLS requires both the server and the device to present certificates during the TLS handshake. The registry's load balancer is configured to request a client certificate and passes the verified certificate's Common Name (device ID) and fingerprint as HTTP headers to the backend. The backend looks up the device record by device ID, verifies the certificate fingerprint matches the registered value, and rejects connections from revoked certificates checked against a CRL or OCSP responder. This eliminates shared-secret credential management at scale.
How are dynamic fleet group queries implemented in a device registry?
Fleet groups are defined as saved queries over device attributes (model, firmware_version, region, tags) rather than static membership lists. At query time, the registry evaluates the predicate against an index (Elasticsearch or a relational DB with JSONB indexing) and returns matching device IDs. For bulk operations (OTA rollout, config push), the query result is paginated and enqueued as a batch job. Dynamic groups automatically include newly provisioned devices that match the predicate without requiring manual group management, enabling percentage-based canary rollouts.