Low-Level Design: Smart Home System — Device Management, Automation Rules, and Real-Time Control

Core Entities

Home: home_id, owner_id, name, address, timezone, created_at.
Room: room_id, home_id, name, floor_number.
Device: device_id, home_id, room_id, name, type (LIGHT, THERMOSTAT, LOCK, CAMERA, SENSOR, SWITCH, PLUG), manufacturer, model, firmware_version, status (ONLINE, OFFLINE, ERROR), last_seen_at, metadata (JSONB: capabilities, config).
DeviceState: device_id, attribute (power, brightness, temperature, lock_state, motion_detected), value (JSON), recorded_at. (Time-series: one row per state change.)
AutomationRule: rule_id, home_id, name, is_enabled, trigger (JSON: type, conditions), actions (JSON array: device_id, command, params), created_by.
AutomationLog: log_id, rule_id, triggered_at, trigger_data (JSON), actions_taken (JSON), status (SUCCESS, PARTIAL, FAILED).
Scene: scene_id, home_id, name, actions (JSON array: device commands to execute together), icon.
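As a rough illustration of how two of these entities might look in application code, here is a minimal sketch using Python dataclasses. The class and enum names mirror the fields listed above. Leaving out manufacturer, model, firmware_version and last_seen_at, and making DeviceState immutable, are simplifying assumptions and not part of the design itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class DeviceType(Enum):
    LIGHT = "LIGHT"
    THERMOSTAT = "THERMOSTAT"
    LOCK = "LOCK"
    CAMERA = "CAMERA"
    SENSOR = "SENSOR"
    SWITCH = "SWITCH"
    PLUG = "PLUG"


class DeviceStatus(Enum):
    ONLINE = "ONLINE"
    OFFLINE = "OFFLINE"
    ERROR = "ERROR"


@dataclass
class Device:
    device_id: str
    home_id: str
    room_id: str
    name: str
    type: DeviceType
    status: DeviceStatus
    # Mirrors the JSONB column: capabilities and config live here.
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class DeviceState:
    """One row per state change (time-series), so instances are immutable."""
    device_id: str
    attribute: str
    value: Any
    recorded_at: datetime


lamp = Device("d1", "h1", "r1", "Desk lamp", DeviceType.LIGHT, DeviceStatus.ONLINE,
              metadata={"capabilities": ["power", "brightness"]})
change = DeviceState(lamp.device_id, "brightness", 80, datetime.now(timezone.utc))
```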

Device Communication and Protocol Gateway

IoT devices use a variety of protocols: Z-Wave, Zigbee, Wi-Fi (MQTT, HTTP), and Matter (a newer interoperability standard). A Protocol Gateway translates between these protocols and the platform’s internal MQTT broker. Architecture: devices connect to the gateway (a local hub or the cloud). The gateway publishes device events to MQTT topics of the form home/{home_id}/device/{device_id}/state. Commands are sent to home/{home_id}/device/{device_id}/command. The backend subscribes to the state topics and publishes to the command topics. MQTT quality of service: QoS 1 (at-least-once delivery) for device state updates, so consumers must tolerate duplicate messages; QoS 2 (exactly-once) for critical commands (lock/unlock). Device shadow: maintain the last known state of each device in Redis even when the device is offline (used to show current state in the UI without querying the device directly).

import json
from datetime import datetime, timezone
from uuid import uuid4

# CommandResult, the DB client, the MQTT client and the ack store are
# assumed to be defined elsewhere in the platform.

class DeviceService:
    def send_command(self, device_id: str, command: str,
                     params: dict, actor_id: int) -> CommandResult:
        device = self.db.get_device(device_id)
        if device.status == "OFFLINE":
            return CommandResult(success=False,
                                 error="Device is offline")

        # Validate command against device capabilities
        if not self._is_supported(device, command, params):
            return CommandResult(success=False,
                                 error=f"Command {command} not supported")

        # Keep the request_id so the acknowledgment can be matched
        # to this specific command rather than to the device as a whole
        request_id = str(uuid4())

        # Publish to MQTT command topic
        payload = json.dumps({
            "command": command,
            "params": params,
            "request_id": request_id,
            "actor_id": actor_id,
            "timestamp": datetime.now(timezone.utc).isoformat()
        })
        self.mqtt.publish(
            topic=f"home/{device.home_id}/device/{device_id}/command",
            payload=payload,
            # Exactly-once delivery for security-critical commands
            qos=2 if command in ("lock", "unlock", "alarm") else 1
        )

        # Wait for acknowledgment (with timeout); assumes the ack store
        # is keyed by request_id
        ack = self.ack_store.wait(request_id, timeout_ms=5000)
        return CommandResult(success=ack is not None,
                             error=None if ack else "Command timeout")
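
The state side of the same flow can be sketched in a similar way. The example below parses a state topic and folds each message into a per-device shadow. It is a minimal sketch under two assumptions: an in-memory dict stands in for Redis, and state payloads are flat JSON objects mapping attribute names to values (the article does not specify the payload format). The names parse_state_topic and InMemoryShadow are hypothetical.

```python
import json
from typing import Any


def parse_state_topic(topic: str) -> tuple[str, str]:
    """Return (home_id, device_id) from home/{home_id}/device/{device_id}/state."""
    parts = topic.split("/")
    if (len(parts) != 5 or parts[0] != "home"
            or parts[2] != "device" or parts[4] != "state"):
        raise ValueError(f"not a device state topic: {topic}")
    return parts[1], parts[3]


class InMemoryShadow:
    """Stand-in for the Redis device shadow: last known value per attribute."""

    def __init__(self) -> None:
        self._shadows: dict[str, dict[str, Any]] = {}

    def apply(self, topic: str, payload: bytes) -> None:
        # Assumed payload shape: {"attribute": value, ...}
        _, device_id = parse_state_topic(topic)
        update = json.loads(payload)
        self._shadows.setdefault(device_id, {}).update(update)

    def get(self, device_id: str) -> dict[str, Any]:
        # Return a copy so callers cannot mutate the stored shadow.
        return dict(self._shadows.get(device_id, {}))


shadow = InMemoryShadow()
shadow.apply("home/h1/device/d1/state", b'{"power": "on", "brightness": 80}')
shadow.apply("home/h1/device/d1/state", b'{"brightness": 40}')
```

Because each update overwrites the stored value, applying the same message twice leaves the shadow unchanged, which fits the at-least-once semantics of QoS 1.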

Automation Rule Engine

Automation rules have the form IF trigger_condition THEN execute_actions. Trigger types: Device state change: “when motion sensor detects motion”. Schedule: “at 7:00 AM on weekdays”. Sunrise/sunset: “30 minutes before sunset”. Geofence: “when owner leaves home”. Threshold: “when temperature drops below 65°F”. Rule evaluation: device state changes are published to Kafka. The Rule Engine consumer reads these events and evaluates the enabled rules for the home that produced the event. For each rule, it checks whether the trigger condition matches the event; if it does, it executes all of the rule’s actions by sending device commands. Scheduling: a cron-based scheduler fires schedule-triggered rules at the configured time. Conflict resolution: if multiple rules try to set conflicting device states (one turns the lights off, another turns them on), last-write-wins applies by default. Users can set rule priorities to control conflict resolution.
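
The matching and conflict-resolution steps can be sketched as follows. This is an illustrative sketch, not the engine itself: the trigger dict shape (device_id, attribute, op, value), the numeric priority field and the function names are assumptions. It also simplifies a conflict to "two actions that target the same device", which a real engine would likely refine per attribute.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Rule:
    rule_id: str
    priority: int             # hypothetical: higher value wins a conflict
    trigger: dict[str, Any]   # assumed keys: device_id, attribute, op, value
    actions: list[dict[str, Any]] = field(default_factory=list)


# Comparison operators supported by this sketch.
OPS = {
    "eq": lambda actual, expected: actual == expected,
    "lt": lambda actual, expected: actual < expected,
    "gt": lambda actual, expected: actual > expected,
}


def matches(rule: Rule, event: dict[str, Any]) -> bool:
    """True if the event is for the rule's device/attribute and satisfies its condition."""
    t = rule.trigger
    if t["device_id"] != event["device_id"] or t["attribute"] != event["attribute"]:
        return False
    return OPS[t["op"]](event["value"], t["value"])


def resolve_actions(rules: list[Rule], event: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect actions from matching rules, keeping one action per target device."""
    chosen: dict[str, dict[str, Any]] = {}
    # Ascending priority: higher-priority rules are applied last and overwrite.
    # sorted() is stable, so equal priorities fall back to last-write-wins.
    for rule in sorted((r for r in rules if matches(r, event)), key=lambda r: r.priority):
        for action in rule.actions:
            chosen[action["device_id"]] = action
    return list(chosen.values())


motion = {"device_id": "motion1", "attribute": "motion_detected", "op": "eq", "value": True}
rules = [
    Rule("lights_off", 1, motion, [{"device_id": "light1", "command": "turn_off"}]),
    Rule("lights_on", 5, motion, [{"device_id": "light1", "command": "turn_on"}]),
    Rule("heat", 1,
         {"device_id": "thermo1", "attribute": "temperature", "op": "lt", "value": 65},
         [{"device_id": "heater1", "command": "turn_on"}]),
]
event = {"device_id": "motion1", "attribute": "motion_detected", "value": True}
winning = resolve_actions(rules, event)
```

Here both lighting rules match the motion event, and only the action from the higher-priority rule survives for light1.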

Real-Time Dashboard and WebSocket Updates

The mobile app and web dashboard show real-time device states. Architecture: device state changes (from MQTT) → Kafka topic (device_states) → State consumer (updates Redis device shadow + DB) AND WebSocket broadcaster. WebSocket broadcaster: for each device state change, find all WebSocket sessions for users who have access to that home and push the update to them. Connection management: WebSocket connections are maintained per session. The broadcaster checks Redis for active sessions per home (SET: ws_sessions:{home_id} = {session_ids}). Session cleanup: on WebSocket disconnect, remove the session from the set. A TTL on the set (extended on each message) expires the whole set once a home goes quiet, so entries missed by disconnect cleanup do not linger indefinitely. For scale: the WebSocket gateway is horizontally scaled; a Redis pub/sub channel (SUBSCRIBE home:{home_id}:updates) broadcasts state changes to all gateway instances, which forward them to their connected sessions for that home.
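
What a single gateway instance does with its own connections can be sketched like this. It is a minimal in-memory sketch: a dict stands in for the ws_sessions:{home_id} set, and each session is represented by a hypothetical send callable instead of a real WebSocket object. Redis pub/sub between gateway instances is out of scope here.

```python
from collections import defaultdict
from typing import Callable

Send = Callable[[str], None]  # hypothetical: pushes one text frame to a socket


class HomeSessionRegistry:
    """Tracks the sessions connected to this gateway instance, grouped by home."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict[str, Send]] = defaultdict(dict)

    def connect(self, home_id: str, session_id: str, send: Send) -> None:
        self._sessions[home_id][session_id] = send

    def disconnect(self, home_id: str, session_id: str) -> None:
        sessions = self._sessions.get(home_id)
        if sessions is None:
            return
        sessions.pop(session_id, None)
        if not sessions:
            # Drop empty homes so the registry does not grow without bound.
            del self._sessions[home_id]

    def broadcast(self, home_id: str, message: str) -> int:
        """Push a state update to every session of the home; return deliveries."""
        delivered = 0
        # Iterate over a copy because failed sessions are removed as we go.
        for session_id, send in list(self._sessions.get(home_id, {}).items()):
            try:
                send(message)
                delivered += 1
            except ConnectionError:
                self.disconnect(home_id, session_id)
        return delivered


registry = HomeSessionRegistry()
inbox_a: list[str] = []
inbox_b: list[str] = []
registry.connect("h1", "s1", inbox_a.append)
registry.connect("h2", "s2", inbox_b.append)
sent = registry.broadcast("h1", '{"device_id": "d1", "power": "on"}')
```

Only the session registered for h1 receives the update; sessions for other homes are untouched.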


Frequently Asked Questions

How does MQTT enable scalable IoT device communication in a smart home?

MQTT (Message Queuing Telemetry Transport) is a lightweight publish-subscribe protocol designed for constrained IoT devices (low bandwidth, unreliable networks). Devices publish state updates to topic channels (home/{home_id}/device/{device_id}/state). Subscribers (backend services, dashboards) receive updates without polling. QoS levels: QoS 0 (at-most-once, fire-and-forget) for non-critical sensor readings. QoS 1 (at-least-once) for device state changes (might deliver twice on reconnect, idempotent handlers needed). QoS 2 (exactly-once) for critical commands like lock/unlock or alarm activation. An MQTT broker (Mosquitto, HiveMQ, AWS IoT) manages connections and message routing. The device shadow in Redis caches the last known state, so the dashboard can display current state without the device being connected.

How do automation rules evaluate triggers at scale across many homes?

Naive approach: each device state change is evaluated against all rules for all homes. At 10M homes with 10 rules each: 100M rule evaluations per state change. Optimized: build a trigger index. For each (home_id, device_id, attribute) combination, maintain a list of rules that trigger on that combination. When a state change arrives for device D in home H: look up only the rules indexed for (H, D, changed_attribute). Evaluate only those rules (typically 1-5 per device per home). The trigger index is built at rule creation time and stored in Redis: SADD trigger:{home_id}:{device_id}:{attribute} rule_id. Rule evaluation is then O(matching_rules) per state change instead of O(all_rules). Schedule-triggered rules are handled separately by a cron-based scheduler that does not go through the device state change path.

How does geofencing work in a smart home system for presence detection?

Geofencing triggers automations when a user enters or leaves a geographic area (typically the home's address radius, e.g., 500m). Mobile app implementation: the app registers a geofence with the OS (iOS Core Location, Android Geofencing API). The OS monitors location in the background and fires an event when the user crosses the fence boundary. On exit event: app sends a webhook to the backend (or via MQTT). Backend processes the PRESENCE_CHANGE event and triggers any rules with "when owner leaves home" condition. Privacy: geofence events only include enter/exit state, not continuous location; the OS handles the location monitoring locally without streaming coordinates to the server. Multiple users per home: each user has their own presence state. Rules can be configured to trigger when "any member" or "all members" leave.

How do you handle device firmware OTA (over-the-air) updates safely?

OTA update flow: (1) Firmware is uploaded to S3 and a new FirmwareRelease record is created (version, checksum, compatible_models, release_notes, is_staged). (2) Staged rollout: first release to 1% of compatible devices. Monitor error rates for 24 hours. Expand to 10%, 50%, 100% if no regressions. (3) Device update notification: publish a firmware_available MQTT message to the device. (4) Device downloads firmware from a pre-signed S3 URL and verifies SHA-256 checksum before applying. (5) Device reports update status (DOWNLOADING, INSTALLING, SUCCESS, FAILED) via MQTT. (6) On failure: device automatically rolls back to previous firmware. Backend sets device firmware_version on confirmed SUCCESS. Abort rollout: if error rate exceeds threshold during staged rollout, set release is_staged = false to stop new devices from receiving it.

How do you prevent unauthorized access to smart home device controls?

Authorization layers: (1) Authentication: all API requests require a valid OAuth JWT token. All MQTT connections authenticate with per-device credentials (X.509 certificates or rotating tokens). (2) Home-level authorization: the backend verifies that the requesting user is a member of the home the device belongs to before processing any command. Membership is cached in Redis (TTL=60s, invalidated on member changes). (3) Role-based access in homes: homeowners can add family members with OWNER or MEMBER roles. MEMBER role cannot: add/remove other members, delete the home, or access security devices (cameras, locks) unless explicitly granted. (4) Device command audit log: every command is logged (who, what device, what command, timestamp) for forensic review. (5) Rate limiting on commands: max N commands per user per minute to prevent bulk API abuse.
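
The trigger index from the rule-evaluation answer above can be sketched in a few lines. This is a minimal sketch that assumes an in-memory dict of sets in place of the Redis sets written with SADD; the class name TriggerIndex and its method names are hypothetical.

```python
from collections import defaultdict

Key = tuple[str, str, str]  # (home_id, device_id, attribute)


class TriggerIndex:
    """Maps (home_id, device_id, attribute) to the rule_ids triggered by it."""

    def __init__(self) -> None:
        self._index: dict[Key, set[str]] = defaultdict(set)

    def add_rule(self, home_id: str, device_id: str, attribute: str, rule_id: str) -> None:
        # Equivalent in spirit to: SADD trigger:{home_id}:{device_id}:{attribute} rule_id
        self._index[(home_id, device_id, attribute)].add(rule_id)

    def remove_rule(self, home_id: str, device_id: str, attribute: str, rule_id: str) -> None:
        key = (home_id, device_id, attribute)
        rule_ids = self._index.get(key)
        if rule_ids is None:
            return
        rule_ids.discard(rule_id)
        if not rule_ids:
            del self._index[key]

    def rules_for(self, home_id: str, device_id: str, attribute: str) -> set[str]:
        # Only the rules indexed for this exact combination are returned,
        # so the lookup cost does not depend on the total number of rules.
        return set(self._index.get((home_id, device_id, attribute), ()))


index = TriggerIndex()
index.add_rule("h1", "motion1", "motion_detected", "rule_hall_light")
index.add_rule("h1", "motion1", "motion_detected", "rule_notify")
index.add_rule("h1", "thermo1", "temperature", "rule_heat")
candidates = index.rules_for("h1", "motion1", "motion_detected")
```

A state change for motion1 then yields only the two rules registered for it, regardless of how many rules exist across other devices and homes.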
