A barcode scanner service must handle multiple symbologies, noisy real-world images, fast mobile frame processing, and integration with product catalogs. The low-level design below covers each of these in turn.
Supported Barcode Formats
The service should support the following symbologies at minimum:
- Code 128 — variable length, full ASCII, widely used in logistics and shipping labels.
- EAN-13 / EAN-8 — fixed length, numeric, standard for retail products in Europe and globally.
- UPC-A / UPC-E — fixed length, numeric, standard for retail products in North America.
- QR Code — 2D matrix, handles URLs, text, vCards. See LLD: QR Code Generation Service for encoding details.
- Data Matrix — 2D matrix, compact, used in pharmaceutical and electronics labeling.
- PDF417 — 2D stacked linear, used in driver licenses, boarding passes, shipping labels.
Image Preprocessing Pipeline
Raw camera images need cleaning before barcode detection can work reliably:
- Grayscale conversion — convert RGB input to grayscale. Reduces memory and compute; color is not needed for barcode detection.
- Noise reduction — apply a small Gaussian blur (kernel 3×3 or 5×5) to suppress sensor noise. Keep the kernel small, because a larger one softens edges and can merge narrow bars.
- Adaptive thresholding — binarize using a local threshold (adaptive mean or adaptive Gaussian), which handles uneven lighting across the image. A single global threshold, including one chosen by Otsu's method, fails on shadows or glare.
- Perspective correction — detect the barcode region, compute a homography matrix, and apply a warp transform to produce a front-facing rectangular view. Critical for skewed or angled captures.
- Contrast enhancement — apply CLAHE (Contrast Limited Adaptive Histogram Equalization) for low-contrast images, e.g., barcodes on matte black surfaces.
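To make the thresholding step concrete, here is a minimal pure-Python sketch that compares a single global threshold with an adaptive mean threshold on a synthetic strip of bars under a left-to-right lighting gradient. The function names, the `radius` and `offset` parameters, and the `shaded_bars` test image are all assumptions for illustration; a production pipeline would normally use an image library rather than nested lists.

```python
def global_threshold(img, t):
    """Binarize with one threshold for the whole image (1 = dark bar)."""
    return [[1 if px < t else 0 for px in row] for row in img]


def adaptive_mean_threshold(img, radius=2, offset=5):
    """Binarize each pixel against the mean of its local window.

    A pixel counts as dark (1) when it is more than `offset` below the
    mean of its (2*radius+1)-wide neighbourhood, clipped at the borders.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out


# Synthetic strip: 1-pixel bars on even columns, with brightness rising
# from left (shadow) to right (glare). Dark bars on the right end up
# brighter than light spaces on the left.
shaded_bars = [
    [(20 if x % 2 == 0 else 60) + 10 * x for x in range(16)]
    for _ in range(4)
]

global_result = global_threshold(shaded_bars, 115)
adaptive_result = adaptive_mean_threshold(shaded_bars)
```

On this strip the global threshold marks light spaces in the shadowed half as bars and misses bars in the bright half, while the local comparison recovers the alternating pattern.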
Barcode Region Detection
Before decoding, locate candidate barcode regions in the image:
- Gradient analysis — compute Scharr or Sobel gradients. 1D barcodes produce strong horizontal gradients with low vertical gradients in the bar region. Threshold and morphologically close the gradient map to find candidate rectangles.
- Contour analysis — find external contours in the binarized image, filter by aspect ratio and area. 1D barcodes are wide and short; QR codes are roughly square.
- Hough line transform — detect parallel line clusters. A group of closely spaced parallel lines is a strong barcode indicator. Use probabilistic Hough for speed.
- ML-based detection — for complex scenes, use a lightweight object detector (MobileNet SSD or YOLO-nano) trained on barcode regions. Higher accuracy, more compute.
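The gradient-analysis approach can be sketched in a few lines. This is a simplified illustration, not a full detector: it uses central differences instead of Sobel or Scharr kernels, skips the morphological closing step, and returns one bounding box rather than a set of candidate rectangles. The names `gradient_response` and `candidate_box` and the threshold of 100 are assumptions.

```python
def gradient_response(img):
    """Per-pixel |d/dx| - |d/dy| from central differences, clamped at 0.

    Vertical bars give a large horizontal gradient and a small vertical
    one, so the response is high inside a horizontally oriented 1D code.
    """
    h, w = len(img), len(img[0])
    resp = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = abs(img[y][x + 1] - img[y][x - 1])
            gy = abs(img[y + 1][x] - img[y - 1][x])
            resp[y][x] = max(0, gx - gy)
    return resp


def candidate_box(img, threshold=100):
    """Bounding box (x0, y0, x1, y1) of strong-response pixels, or None."""
    resp = gradient_response(img)
    hits = [(x, y) for y, row in enumerate(resp)
            for x, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


def make_test_image():
    """White 20x12 image with 2-pixel-wide bars in columns 4-13, rows 3-8."""
    img = [[255] * 20 for _ in range(12)]
    for y in range(3, 9):
        for x in range(4, 14):
            if ((x - 4) // 2) % 2 == 0:
                img[y][x] = 0
    return img


box = candidate_box(make_test_image())
```

The returned box is one pixel wider than the bars on each side because the central difference also responds at the pixel next to an edge. A real implementation would follow this with closing and contour filtering to separate multiple codes.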
Decoding Pipeline
Once a region is isolated, decode the barcode:
- Library decode — pass the region to ZXing (open source, multi-format) or ZBar. These implement format-specific state machines. ZXing handles all of the symbologies listed above; ZBar is generally faster for 1D codes, but check its symbology coverage before relying on it for the 2D formats.
- ML decode fallback — for damaged or partially occluded barcodes, use a CRNN-based model trained to read barcode digit sequences from the image directly, bypassing the traditional decode path.
- Multi-angle retry — if decode fails, rotate the ROI by 90°, 180°, 270° and retry. Useful for handheld scans where orientation is unknown.
- Confidence score — assign a confidence value. For library decodes, derive confidence from checksum validity and quiet zone presence. For ML fallbacks, use softmax output.
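The retry and confidence steps above can be sketched as follows. The `decode` argument is a hypothetical callable standing in for a ZXing or ZBar wrapper, and the scores returned by `library_confidence` are illustrative values, not something the design prescribes. The EAN-13 check uses the standard rule: weights alternate 1 and 3 across the first twelve digits, and the check digit brings the total to a multiple of 10.

```python
def ean13_checksum_ok(code):
    """True if `code` is 13 ASCII digits and its check digit is valid."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3 from the left across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def rotate90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def decode_with_retry(roi, decode):
    """Try `decode` at 0, 90, 180, and 270 degrees.

    `decode` is a hypothetical callable that returns the decoded string
    or None. Returns (value, angle) for the first success, else
    (None, None).
    """
    for angle in (0, 90, 180, 270):
        value = decode(roi)
        if value is not None:
            return value, angle
        roi = rotate90(roi)
    return None, None


def library_confidence(checksum_ok, quiet_zone_ok):
    """Illustrative scoring only; the 0.7 value is an assumption."""
    if not checksum_ok:
        return 0.0
    return 1.0 if quiet_zone_ok else 0.7
```

A caller would typically run `decode_with_retry`, validate the value with `ean13_checksum_ok` (or the equivalent check for the detected symbology), and pass the outcome to `library_confidence` before deciding whether to trust the result.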
Product Lookup Integration
After decoding, resolve the barcode value to product data. The lookup layer should:
- Query the internal product catalog by barcode (EAN, UPC, or Code 128 value as the key).
- Fall back to external databases (Open Food Facts, GS1 GEPIR) when the barcode is not found locally.
- Cache resolved results in Redis with a 24-hour TTL.
- Return a structured product object (name, brand, category, image URL, attributes).
Failures should return a partial result with the raw decoded value so the caller can handle the miss.
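A minimal sketch of that lookup chain follows. It assumes a cache-first order (cache, then internal catalog, then external sources), which the text does not spell out. `TTLCache` is an in-memory stand-in for Redis, and `catalog` and `external_sources` are hypothetical stand-ins for real clients.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the text


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry (illustrative)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def lookup_product(barcode, cache, catalog, external_sources):
    """Resolve a barcode: cache, then internal catalog, then external.

    `catalog` is a dict and `external_sources` a list of callables that
    return a product dict or None. A source that raises is treated as
    a miss so one failing provider does not fail the whole lookup.
    """
    cached = cache.get(barcode)
    if cached is not None:
        return {"barcode": barcode, "found": True, "product": cached}
    product = catalog.get(barcode)
    if product is None:
        for fetch in external_sources:
            try:
                product = fetch(barcode)
            except Exception:
                product = None
            if product is not None:
                break
    if product is None:
        # Partial result: the caller still gets the raw decoded value.
        return {"barcode": barcode, "found": False, "product": None}
    cache.set(barcode, product, CACHE_TTL_SECONDS)
    return {"barcode": barcode, "found": True, "product": product}
```

Returning the same dictionary shape for hits and misses keeps the caller's handling simple: it checks `found` and falls back to showing the raw `barcode` value.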
Bulk Scan Processing
For warehouse or inventory use cases, accept a batch of images via a bulk scan API. Each image is enqueued as a decode job. Workers process jobs in parallel (decode + lookup). Results are aggregated and returned as a structured list. Include per-image status (decoded, failed, low-confidence) so the caller knows which items need manual review. Expose a job status endpoint for polling and a webhook for completion notification.
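The batch flow can be sketched with a thread pool standing in for the job queue and workers. This is an in-process illustration only: the `decode` callable, the function names, and the result fields are assumptions, and a real deployment would use a durable queue plus the status endpoint and webhook described above.

```python
from concurrent.futures import ThreadPoolExecutor

LOW_CONFIDENCE = 0.85  # example threshold; configurable in practice


def process_image(image_id, decode):
    """Run one decode job and classify the outcome.

    `decode` is a hypothetical callable returning (value, confidence)
    or None when no barcode is found.
    """
    try:
        result = decode(image_id)
    except Exception as exc:
        return {"image_id": image_id, "status": "failed", "error": str(exc)}
    if result is None:
        return {"image_id": image_id, "status": "failed",
                "error": "no barcode"}
    value, confidence = result
    status = "decoded" if confidence >= LOW_CONFIDENCE else "low-confidence"
    return {"image_id": image_id, "status": status,
            "value": value, "confidence": confidence}


def run_bulk_scan(image_ids, decode, max_workers=4):
    """Decode a batch in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda i: process_image(i, decode), image_ids))
```

Because each job catches its own exceptions, one corrupt image produces a `failed` entry instead of aborting the batch, and the caller can filter the list for items that need manual review.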
Mobile SDK Design
The mobile SDK runs the preprocessing and detection on-device to minimize latency and network cost:
- Camera frame processing — process frames at 15-30 fps using the camera preview stream. Drop frames when the decode queue is backed up rather than buffering.
- Region of interest (ROI) — display a targeting rectangle in the camera UI. Only process the ROI subregion; the pixel count, and roughly the compute, falls in proportion to the cropped area (60-80% less than the full frame when the rectangle covers 20-40% of it).
- On-device decode — run ZXing or a quantized ML model locally for instant results without a network round-trip. Fall back to server-side API for formats or damage levels the on-device model cannot handle.
- Vibration + audio feedback — trigger a haptic and audio cue immediately on successful decode, before the product lookup completes. This improves perceived responsiveness.
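The frame-dropping and ROI ideas above can be illustrated with a short sketch. It is written in Python for consistency with the other examples; an actual SDK would implement the same logic in Kotlin or Swift. `FrameGate` and `crop_roi` are hypothetical names, and the choice to drop the newest frame (rather than evict the oldest) is an assumption made for simplicity.

```python
import queue


class FrameGate:
    """Bounded hand-off between the camera callback and the decoder.

    When the decoder is busy and the queue is full, new frames are
    dropped instead of buffered, so latency does not grow.
    """

    def __init__(self, max_pending=1):
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def offer(self, frame):
        """Called from the camera callback; never blocks."""
        try:
            self._queue.put_nowait(frame)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self, timeout=None):
        """Called from the decode thread; returns None on timeout."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None


def crop_roi(frame, x0, y0, x1, y1):
    """Return the sub-image for rows y0..y1-1 and columns x0..x1-1."""
    return [row[x0:x1] for row in frame[y0:y1]]


def pixel_savings(frame, roi):
    """Fraction of pixels skipped by processing only the ROI."""
    full = len(frame) * len(frame[0])
    return 1 - (len(roi) * len(roi[0])) / full
```

For example, a 6x5 ROI cut from a 10x10 frame leaves 30 of 100 pixels to process, a 70% reduction. Evicting the oldest queued frame instead of rejecting the newest would give the decoder fresher input at the cost of slightly more bookkeeping.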
Error Handling
Return confidence scores on every decode response. For scores below a configurable threshold (e.g., 0.85), flag the result as low-confidence and surface a manual entry fallback in the client UI. Log all low-confidence and failed decode attempts with the original image hash for model retraining data collection. Never silently return a wrong barcode value — a wrong lookup is worse than a failed one.
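As a rough sketch of this policy, the function below builds a decode response, flags low-confidence and failed results, and logs them with a hash of the original image. The field names, the logger name, and the use of SHA-256 are assumptions; the 0.85 default is the example threshold from the text.

```python
import hashlib
import logging

logger = logging.getLogger("barcode.decode")


def build_response(value, confidence, image_bytes, threshold=0.85):
    """Classify a decode result and log anything that needs review.

    `value` is the decoded string or None on failure. Low-confidence
    values are returned but flagged, so the client can offer manual
    entry instead of trusting them silently.
    """
    image_hash = hashlib.sha256(image_bytes).hexdigest()
    if value is None:
        status, confidence = "failed", 0.0
    elif confidence < threshold:
        status = "low-confidence"
    else:
        status = "decoded"
    if status != "decoded":
        # Logged with the image hash for later retraining data collection.
        logger.warning("decode %s image_sha256=%s confidence=%.2f",
                       status, image_hash, confidence)
    return {
        "status": status,
        "value": value,
        "confidence": confidence,
        "manual_entry_suggested": status != "decoded",
        "image_sha256": image_hash,
    }
```

A client would check `manual_entry_suggested` before running the product lookup, which keeps a doubtful value from turning into a wrong product on screen.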
Frequently Asked Questions
What is a barcode scanner service in system design?
A barcode scanner service decodes one-dimensional (UPC, EAN, Code 128) or two-dimensional (QR, Data Matrix, PDF417) barcodes from images or video frames and returns the encoded payload. It may be deployed as a mobile SDK that runs on-device using the camera, as a server-side API that accepts uploaded images, or as an edge service embedded in warehouse or retail hardware. Key design concerns are decode accuracy, throughput, latency, and support for the relevant symbology set.
What image preprocessing steps improve barcode detection accuracy?
Common preprocessing steps include: (1) grayscale conversion to reduce the problem to intensity values; (2) Gaussian blur to reduce sensor noise before binarization and edge detection; (3) adaptive thresholding to separate dark bars from light spaces under uneven lighting (Otsu binarization is an option when lighting is even); (4) perspective correction using detected quadrilateral contours to straighten skewed or angled codes; and (5) super-resolution upscaling for small or low-resolution codes. Each step is tuned per symbology: 1D codes benefit most from horizontal gradient filters, while 2D codes benefit from full perspective (homography) correction.
How do you design a mobile SDK for real-time barcode scanning?
The camera delivers preview frames at 30 fps or more, and the SDK processes them at 15-30 fps on a dedicated background thread to avoid blocking the UI, dropping frames it cannot keep up with. Each frame is downsampled to a processing resolution (e.g., 720p) and passed through the decode pipeline. The pipeline uses a fast region-of-interest detector (e.g., an edge density heuristic) to skip frames with no barcode-like structure before running the full decoder. Successful decodes trigger a callback on the main thread with the payload and bounding-box coordinates. The SDK exposes configuration for symbology whitelist, scan region, and scan sound/haptic feedback. Battery and thermal impact are managed by throttling the frame rate after sustained decoding activity.
How do you handle bulk barcode scanning for warehouse operations?
Warehouse bulk scanning uses fixed or handheld scanners that emit multiple scans per second. The service ingests scan events via a high-throughput message queue (Kafka or SQS), deduplicates events within a sliding time window using a Redis set keyed by barcode and location, and routes events to downstream inventory or order-management systems. Scan records are written to an append-only event log for audit purposes. The system must tolerate intermittent network connectivity on the warehouse floor (handheld devices buffer scans locally and flush when connectivity is restored) and must handle high concurrency when multiple workers scan the same bin simultaneously.
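The deduplication step can be sketched with an in-memory dictionary standing in for Redis. `ScanDeduplicator` and the 2-second window are assumptions for illustration. The window here starts at the first accepted scan and is not extended by suppressed repeats, which mirrors a key that is written once with an expiry.

```python
class ScanDeduplicator:
    """Suppress repeats of the same (barcode, location) within a window.

    In-memory stand-in for the Redis-backed version; entries are never
    evicted here, whereas Redis keys would expire on their own.
    """

    def __init__(self, window_seconds=2.0):
        self.window = window_seconds
        self._accepted_at = {}

    def is_new(self, barcode, location, timestamp):
        """True if this scan should be forwarded downstream."""
        key = (barcode, location)
        last = self._accepted_at.get(key)
        if last is not None and timestamp - last < self.window:
            return False  # duplicate inside the window
        self._accepted_at[key] = timestamp
        return True
```

Keying on both barcode and location means the same item scanned at two different bins within the window is still treated as two events.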