Design Mobile Crash Reporting and Telemetry

Crash reporting (Crashlytics, Sentry, Bugsnag) is a system design topic that touches mobile SDK design, server-side processing of millions of events, symbolication of native code, and the dashboards engineers actually use to debug. The interview tests whether you understand the layers and the engineering tradeoffs.

Functional requirements

  • Capture crashes on the device
  • Capture non-fatal errors and exceptions
  • Capture custom telemetry (breadcrumbs, custom events)
  • Upload reports to the backend reliably
  • Dedupe similar crashes; group by signature
  • Symbolicate (turn raw stack frames into source-level frames)
  • Surface in a dashboard

SDK design

The mobile SDK installs handlers for the following (a minimal Android sketch follows the list):

  • Uncaught exceptions (Java/Kotlin runtime exceptions on Android, NSException on iOS)
  • Native crashes (signal handlers — SIGSEGV, SIGABRT, etc.)
  • ANRs (Application Not Responding) on Android
  • Watchdog terminations on iOS (more difficult to capture)
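
The first of these is the simplest to sketch. A minimal Android version is below, assuming a hypothetical CrashStore that persists reports to disk; real SDKs also install native signal handlers via the NDK and an ANR watcher, which are not shown.

```kotlin
import java.io.File

// Hypothetical on-disk store; a real SDK would use a compact, corruption-tolerant format.
class CrashStore(private val dir: File) {
    fun persist(report: String) {
        dir.mkdirs()
        File(dir, "crash-${System.currentTimeMillis()}.txt").writeText(report)
    }
}

object CrashReporter {
    fun install(store: CrashStore) {
        val previous = Thread.getDefaultUncaughtExceptionHandler()
        Thread.setDefaultUncaughtExceptionHandler { thread, throwable ->
            try {
                // Stay minimal and defensive: the crash reporter must not crash.
                store.persist("thread=${thread.name}\n" + throwable.stackTraceToString())
            } catch (_: Throwable) {
                // Swallow our own failures; never mask the original crash.
            } finally {
                // Chain to the previous handler so other SDKs and the OS default still run.
                previous?.uncaughtException(thread, throwable)
            }
        }
    }
}
```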

SDK constraints:

  • Tiny binary impact (target <500KB)
  • Zero startup time impact
  • Robust to its own crashes (the crash reporter must not crash)

Capture mechanics

When a crash occurs:

  1. SDK signal handler runs in a constrained context (no malloc, limited APIs)
  2. Captures crash metadata: stack frames, registers, threads, current breadcrumbs
  3. Writes the report to local disk (network I/O is not safe from a crash handler)
  4. Process terminates
  5. On next app launch, SDK detects pending crash report and uploads
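
Step 5 is ordinary code on the next launch: scan the pending-report directory, upload, and delete each report only after the server acknowledges it. A sketch, with a hypothetical upload function standing in for the real HTTP client:

```kotlin
import java.io.File

// Hypothetical uploader; a real SDK would batch, compress, and retry with backoff.
fun upload(report: File): Boolean {
    // POST the payload to the ingest endpoint; return true only on a 2xx response.
    return false // placeholder
}

fun flushPendingReports(pendingDir: File) {
    val reports = pendingDir.listFiles() ?: return
    for (report in reports) {
        // Delete only after a successful upload so reports survive flaky networks.
        if (upload(report)) report.delete()
    }
}
```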

Symbolication

Native crash stacks are addresses, not function names:

0x100012345
0x100023456

Symbolication maps addresses to source-level frames using debug symbols (dSYMs on iOS; native debug symbols and ProGuard/R8 mapping files on Android).

Symbolication happens server-side after upload. Symbol files are uploaded by the build pipeline, and the server matches them to incoming crashes (typically by build ID and app version).
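
Conceptually, symbolication is a lookup from an instruction address into a table derived from the uploaded symbol files. A simplified sketch with hypothetical types; real pipelines parse dSYM/DWARF or mapping files and also correct for the ASLR slide and inlined frames:

```kotlin
// One entry per function, derived from the uploaded symbol file for a specific build.
data class Symbol(val startAddress: Long, val name: String, val sourceFile: String)

// `symbols` must be sorted by startAddress; find the last symbol at or below the frame.
fun symbolicate(frameAddress: Long, symbols: List<Symbol>): String {
    var lo = 0
    var hi = symbols.size - 1
    var best: Symbol? = null
    while (lo <= hi) {
        val mid = (lo + hi) / 2
        if (symbols[mid].startAddress <= frameAddress) {
            best = symbols[mid]
            lo = mid + 1
        } else {
            hi = mid - 1
        }
    }
    return best?.let { "${it.name} (${it.sourceFile})" } ?: "0x%x".format(frameAddress)
}
```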

Dedup and grouping

Many crashes are the same bug from many users. Group by:

  • Top 3–5 stack frames (after stripping non-app frames)
  • Exception type
  • App version

Each group appears as one issue in the dashboard; the number of affected users is the primary impact metric.
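
A sketch of the grouping signature built from those keys; which frames count as app frames and how many to keep are exactly where real systems differ, so the package prefix and hash here are only illustrative:

```kotlin
import java.security.MessageDigest

data class Frame(val module: String, val function: String)

fun fingerprint(
    frames: List<Frame>,
    exceptionType: String,
    appVersion: String,
    appPackagePrefix: String = "com.example.app",  // hypothetical app package
    topFrames: Int = 5,
): String {
    val appFrames = frames
        .filter { it.module.startsWith(appPackagePrefix) }  // strip non-app frames
        .take(topFrames)
        .joinToString("|") { "${it.module}.${it.function}" }
    val key = "$exceptionType\n$appVersion\n$appFrames"
    val digest = MessageDigest.getInstance("SHA-256").digest(key.toByteArray())
    return digest.joinToString("") { "%02x".format(it) }
}
```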

Server architecture

Three pipelines:

  1. Ingest: high-volume HTTP endpoint, queue events
  2. Process: symbolicate, group, persist
  3. Serve: dashboard queries

Volume scales quickly: the larger crash reporters ingest on the order of a billion events per day. The architecture resembles the Sentry and Datadog ingest pipelines.
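
At that volume the ingest tier should do as little as possible: authenticate, validate, enqueue, return. A sketch of that stage, with a hypothetical EventQueue interface standing in for Kafka or a similar log:

```kotlin
// Hypothetical queue abstraction; in production this would be a Kafka producer or similar.
interface EventQueue {
    fun publish(topic: String, payload: ByteArray)
}

data class IngestResult(val status: Int, val body: String)

const val MAX_PAYLOAD_BYTES = 1 shl 20  // 1 MiB cap, an assumed limit

fun handleIngest(apiKey: String?, payload: ByteArray, queue: EventQueue): IngestResult {
    // Cheap checks only; symbolication and grouping happen asynchronously downstream.
    if (apiKey.isNullOrBlank()) return IngestResult(401, "missing api key")
    if (payload.isEmpty() || payload.size > MAX_PAYLOAD_BYTES) {
        return IngestResult(400, "invalid payload size")
    }
    queue.publish("crash-events", payload)
    return IngestResult(202, "accepted")
}
```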

Storage

  • Hot store: recent events, dashboard-queryable (ClickHouse, Druid)
  • Cold store: archived events for forensic analysis (S3 + Parquet)
  • Metadata DB: issue metadata, user-issue mapping (Postgres)
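
A rough sketch of what lands where, using illustrative record shapes (the field names are assumptions, not any vendor's schema):

```kotlin
import java.time.Instant

// Hot store (ClickHouse/Druid): one row per occurrence, queryable from the dashboard.
data class CrashEvent(
    val issueId: String,        // grouping fingerprint
    val receivedAt: Instant,
    val appVersion: String,
    val osVersion: String,
    val deviceModel: String,
    val userId: String?,        // pseudonymous, for "users affected" counts
)

// Metadata DB (Postgres): one row per issue, mutated by triage actions.
data class Issue(
    val issueId: String,
    val title: String,          // e.g. exception type plus top app frame
    val firstSeen: Instant,
    val lastSeen: Instant,
    val resolved: Boolean,
)

// Cold store (S3 + Parquet): full raw payloads, partitioned by date for forensic analysis.
```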

The dashboard

Engineers want:

  • Sort issues by impact (users affected, occurrences)
  • Filter by app version, OS, device
  • Drill into a single occurrence with breadcrumbs
  • Mark an issue as resolved; detect regressions when it reappears in a later version

Privacy

  • Strip PII from breadcrumbs and custom data automatically (see the sketch after this list)
  • Allow opt-out per user
  • Honor regional regulations (GDPR, CCPA)
  • Never log auth tokens, passwords, or sensitive customer data
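
A minimal sketch of the automatic scrubbing, applied in the SDK before anything touches disk; the regex and key list are illustrative, not exhaustive:

```kotlin
// Illustrative patterns only; real scrubbers also cover phone numbers, IPs, card numbers, etc.
private val EMAIL = Regex("""[\w.+-]+@[\w-]+\.[\w.]+""")
private val SENSITIVE_KEYS = setOf("password", "token", "auth", "secret", "cookie")

fun scrub(breadcrumbData: Map<String, String>): Map<String, String> =
    breadcrumbData.mapValues { (key, value) ->
        when {
            SENSITIVE_KEYS.any { key.contains(it, ignoreCase = true) } -> "[redacted]"
            EMAIL.containsMatchIn(value) -> EMAIL.replace(value, "[email]")
            else -> value
        }
    }
```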

Frequently Asked Questions

Why do some crashes never appear in the dashboard?

Crashes very early in startup, before the SDK has initialized, may never be captured or written to disk. Watchdog terminations on iOS (the OS killing an app whose main thread is blocked for too long) do not raise a catchable signal. Both leave gaps.

How long does symbolication take?

Typically sub-second to a few seconds per crash. If symbol files are very large, or were never uploaded, symbolication can take longer or fail, leaving frames unsymbolicated.

How does crash reporting differ from APM?

Crash reporting captures fatal and non-fatal errors. APM tools (Datadog, New Relic) capture performance metrics and traces. Increasingly the same tools cover both, but they began as separate disciplines.
