Design a Mobile AR Experience: Tracking, Rendering, and Battery

Mobile AR is one of the most demanding mobile system design topics: it combines computer vision, real-time rendering, sensor fusion, and the constraints of a battery-powered handheld device. Apps like Pokemon Go, IKEA Place, Snap filters, and Apple's Measure all solve the same underlying problems.

Functional requirements

  • Detect the floor / horizontal surfaces
  • Place virtual objects in the real world
  • Maintain object position as the user moves
  • Handle occlusion (objects in front of/behind real items)
  • Render at 60fps over the camera feed

The AR stack

Three layers: tracking, scene understanding, and rendering.

Tracking

The system continuously estimates the device’s 6-DoF pose (position + orientation) in the real world. Sources of data:

  • Camera frames (visual SLAM — Simultaneous Localization and Mapping)
  • IMU (accelerometer + gyroscope) at high frequency
  • LiDAR depth sensor (newer iPhones, iPads)

Visual-inertial fusion produces a stable pose at 60Hz. ARKit (iOS) and ARCore (Android) handle this — you do not need to implement SLAM yourself.
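
A minimal sketch of consuming that fused pose through ARKit's session delegate; ARSession, ARWorldTrackingConfiguration, and the delegate callback are standard ARKit API, while the TrackingController class name and logging are illustrative:

```swift
import ARKit

// Minimal ARKit session that consumes the fused visual-inertial pose.
// ARKit runs SLAM + IMU fusion internally; we only read the result.
final class TrackingController: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()
        session.delegate = self
        session.run(config)
    }

    // Called once per camera frame (typically 60Hz) with the latest 6-DoF pose.
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let pose = frame.camera.transform     // 4x4 matrix: rotation + translation
        let position = pose.columns.3         // device position in world space
        print("Device at x:\(position.x) y:\(position.y) z:\(position.z)")
    }
}
```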

Surface detection

The system identifies horizontal and vertical planes by clustering feature points, and plane estimates improve as the user moves. The first few seconds of an AR session are spent “scanning”: surfaces emerge as the user looks around.
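
A sketch of turning on plane detection in ARKit and observing plane anchors as they appear and are refined; the PlaneObserver class name is illustrative:

```swift
import ARKit

// Plane detection: ARKit reports surfaces as ARPlaneAnchor objects and
// grows/refines them as the user scans the room.
final class PlaneObserver: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        session.delegate = self
        session.run(config)
    }

    // New planes discovered while scanning.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let plane as ARPlaneAnchor in anchors {
            print("Found plane centered at \(plane.center)")
        }
    }

    // Existing plane estimates refined as more of the surface is observed.
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        // Reposition any virtual content attached to these planes here.
    }
}
```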

World map

The accumulated understanding of the environment can be saved and reloaded. ARKit’s ARWorldMap and ARCore’s Cloud Anchors enable persistent AR — same virtual object in the same physical place across sessions.
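
A sketch of saving and restoring an ARWorldMap with ARKit; the helper function names and file URLs are placeholders, and error handling is kept minimal:

```swift
import ARKit

// Persist the world map so a later session can restore anchors in place.
func saveWorldMap(from session: ARSession, to url: URL) {
    session.getCurrentWorldMap { worldMap, error in
        guard let map = worldMap else { return }
        if let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                        requiringSecureCoding: true) {
            try? data.write(to: url)
        }
    }
}

func restoreSession(_ session: ARSession, from url: URL) {
    guard let data = try? Data(contentsOf: url),
          let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                            from: data) else { return }
    let config = ARWorldTrackingConfiguration()
    config.initialWorldMap = map        // relocalize against the saved map
    session.run(config, options: [.resetTracking, .removeExistingAnchors])
}
```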

Rendering

Use Metal (iOS) or OpenGL ES / Vulkan (Android). Render the virtual content over the camera feed. Render-frame budget: 16.67ms total at 60fps, of which:

  • Camera frame fetch: 1–2ms
  • Pose update: 1–2ms
  • Scene graph update: 2–4ms
  • Render: 8–10ms

Use lightweight 3D models (optimized polygon counts, compressed PBR textures).
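
One way to verify the budget in practice is to time the gap between render callbacks. A sketch using SceneKit's renderer delegate; the FrameBudgetMonitor name is illustrative, and in a real app this would be folded into your existing ARSCNViewDelegate:

```swift
import ARKit
import SceneKit

// Rough per-frame budget check: measure the time between render callbacks
// and log when we miss the ~16.67ms target at 60fps.
final class FrameBudgetMonitor: NSObject, SCNSceneRendererDelegate {
    private var lastTime: TimeInterval = 0

    func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
        defer { lastTime = time }
        guard lastTime > 0 else { return }
        let frameMillis = (time - lastTime) * 1000
        if frameMillis > 16.7 {
            print(String(format: "Missed frame budget: %.1f ms", frameMillis))
        }
    }
}
```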

Occlusion

Virtual objects must appear behind real objects when appropriate. Two approaches:

  • Depth from LiDAR: per-pixel depth map; render virtual objects with proper z-ordering
  • Depth estimation from monocular camera: ML model produces a depth estimate; less accurate but works on all devices
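
A sketch of choosing an occlusion source in ARKit: LiDAR scene depth when the device supports it, otherwise ARKit's ML-based people occlusion as a limited fallback (it only handles people, not arbitrary objects). The helper name is illustrative:

```swift
import ARKit

// Pick the best available occlusion source for this device.
func makeOcclusionConfiguration() -> ARWorldTrackingConfiguration {
    let config = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        // LiDAR: per-pixel depth map for the whole scene.
        config.frameSemantics.insert(.sceneDepth)
    } else if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
        // ML-estimated depth, people only; works without LiDAR.
        config.frameSemantics.insert(.personSegmentationWithDepth)
    }
    return config
}

// Per frame, frame.sceneDepth?.depthMap is a CVPixelBuffer the renderer can
// sample to z-test virtual fragments against the real world.
```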

Lighting

Match real-world lighting for realism. ARKit/ARCore expose ambient light estimates. Apply to virtual objects so they look “in” the environment, not pasted on top.
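
A sketch of applying ARKit's per-frame light estimate to a SceneKit light; the function name is illustrative:

```swift
import ARKit
import SceneKit

// Drive the virtual lighting from ARKit's ambient estimate so objects
// brighten and warm/cool with the real room.
func applyLightEstimate(from frame: ARFrame, to light: SCNLight) {
    guard let estimate = frame.lightEstimate else { return }
    light.intensity = estimate.ambientIntensity            // lumens, ~1000 = neutral
    light.temperature = estimate.ambientColorTemperature   // Kelvin, ~6500 = neutral
}
```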

Battery

AR is one of the most power-hungry mobile use cases:

  • Camera continuously on
  • GPU continuously rendering
  • CPU running tracking algorithms
  • As a result, devices can overheat within 15–30 minutes of continuous use

Mitigations:

  • Lower rendering resolution when the battery is low
  • Throttle the camera to 30fps for experiences without fast motion
  • Disable AR when the app is not in the foreground
  • Warn the user during extended sessions
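
Two of these mitigations, sketched with ARKit and Foundation API; the helper names are illustrative:

```swift
import ARKit

// Run the camera at 30fps when the experience tolerates it.
func makeLowPowerConfiguration() -> ARWorldTrackingConfiguration {
    let config = ARWorldTrackingConfiguration()
    if let lowFps = ARWorldTrackingConfiguration.supportedVideoFormats
        .first(where: { $0.framesPerSecond == 30 }) {
        config.videoFormat = lowFps
    }
    return config
}

// Back off when the device heats up: .serious / .critical means the OS is
// already throttling, so drop resolution, pause effects, or warn the user.
func shouldDegradeQuality() -> Bool {
    switch ProcessInfo.processInfo.thermalState {
    case .serious, .critical: return true
    default: return false
    }
}
```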

Frequently Asked Questions

Why does AR drift after I have been using it for a few minutes?

SLAM accumulates error over time. The system anchors itself to features in the environment; if features become scarce or change (for example, someone walks through the frame), pose estimates drift. ARKit's ARAnchor objects help mitigate this for placed objects.
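
A sketch of anchoring placed content so ARKit can correct it as its map is refined; the function name and anchor name are illustrative:

```swift
import ARKit

// Attach placed content to an ARAnchor instead of a raw world transform;
// ARKit adjusts anchors as its map improves, which limits visible drift.
func placeObject(at transform: simd_float4x4, in session: ARSession) -> ARAnchor {
    let anchor = ARAnchor(name: "placedObject", transform: transform)
    session.add(anchor: anchor)
    return anchor    // attach the virtual node to this anchor in the renderer
}
```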

How does Pokemon Go anchor a creature to a specific real-world location?

At a coarse scale, geographically (lat/lon). Placement within a session uses ARKit/ARCore. Persistent anchors across sessions use cloud-anchored shared maps (Niantic's VPS), with GPS-only placement as the fallback.

Is AR ready for all-day use?

Not yet. Battery and thermal constraints limit sustained AR sessions to 30–60 minutes on most phones. Apple Vision Pro and similar headsets shift the constraint to head-worn devices but bring constraints of their own.
