Translation Service Low-Level Design: String Extraction, TM Lookup, and Human Review Queue


A translation service manages the full lifecycle of localizable strings: extraction from source code and content, lookup against a translation memory (TM), machine translation (MT) fallback, and routing unconfident segments to a human post-edit review queue. It serves both engineering teams shipping new features and localization project managers coordinating translator workflows.

Requirements

Functional

  • Extract translatable strings from source files (JSON, YAML, PO, XLIFF, HTML) via a CI pipeline hook
  • Look up existing approved translations from a TM for exact and fuzzy matches
  • Fall back to an MT provider (DeepL, Google Translate) when TM coverage is below a threshold
  • Route low-confidence MT segments to a human review queue with context (screenshot, surrounding strings)
  • Serve translated strings to client applications via a CDN-backed edge API
  • Export locale bundles in JSON, PO, and XLIFF formats

Non-Functional

  • String serve latency: under 20 ms p99 from CDN edge
  • TM lookup latency: under 50 ms p99 for fuzzy match across 10 million segments
  • Support 100+ target locales per project

Data Model

  • projects: project_id (UUID), name, source_locale, target_locales (ARRAY), created_at
  • strings: string_id (UUID), project_id, key (TEXT), source_text (TEXT), context (TEXT), max_length (INT), screenshot_url (TEXT), fingerprint (SHA256 of source_text), created_at
  • translation_memory: tm_id, source_fingerprint (SHA256), target_locale, translated_text (TEXT), quality_score (FLOAT 0-1), approved (BOOL), translator_id, updated_at
  • review_tasks: task_id, string_id, target_locale, mt_output (TEXT), mt_confidence (FLOAT), assigned_to, status (ENUM: pending, in_review, approved, rejected), due_at
  • locale_bundles: project_id, target_locale, version (INT), bundle_url (S3 pre-signed), generated_at
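The row layouts above can be rendered as typed records. A minimal Python sketch of two of the tables (field types inferred from the column list; the actual storage would be SQL DDL):

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from uuid import UUID, uuid4

class TaskStatus(Enum):
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class TranslationMemoryEntry:
    tm_id: UUID
    source_fingerprint: str      # SHA-256 hex of the source text
    target_locale: str
    translated_text: str
    quality_score: float         # 0-1
    approved: bool
    translator_id: UUID
    updated_at: datetime

@dataclass
class ReviewTask:
    task_id: UUID
    string_id: UUID
    target_locale: str
    mt_output: str
    mt_confidence: float
    assigned_to: UUID
    status: TaskStatus
    due_at: datetime
```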

Core Algorithms

String Extraction

The extractor runs as a CI step. It parses source files using format-specific parsers, computes a SHA256 fingerprint of each source string, and diffs against the current strings table. New strings are inserted; modified strings (fingerprint changed) create a new string_id and deprecate the old one to preserve TM history. Deleted keys are soft-deleted and excluded from bundle generation.
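The diff step above can be sketched as follows. This is a minimal in-memory illustration (the real extractor diffs against the strings table, not a dict):

```python
import hashlib

def fingerprint(text):
    """SHA-256 fingerprint of the source text, as described above."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def diff_strings(extracted, current):
    """Diff extracted {key: source_text} against the current strings table,
    represented here as {key: fingerprint}. Returns the three actions the
    extractor takes: insert new keys, deprecate-and-reinsert changed keys,
    soft-delete removed keys (excluded from bundle generation)."""
    new, changed, deleted = [], [], []
    for key, text in extracted.items():
        fp = fingerprint(text)
        if key not in current:
            new.append(key)
        elif current[key] != fp:
            changed.append(key)   # new string_id; old one deprecated to keep TM history
    for key in current:
        if key not in extracted:
            deleted.append(key)   # soft-deleted
    return new, changed, deleted
```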

Translation Memory Lookup

Exact match: query translation_memory by source_fingerprint and target_locale. If there is no exact match, fuzzy matching uses trigram similarity (pg_trgm in PostgreSQL) against source texts covered by a GIN index. A similarity of 0.75 or higher qualifies as a fuzzy match; the matching TM entry is returned with its quality_score reduced by (1 - similarity) as a penalty. Segments whose penalized quality_score falls below 0.6 are routed to MT.
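In production the similarity computation runs inside PostgreSQL; the sketch below reimplements a simplified pg_trgm-style trigram similarity in Python purely to illustrate the threshold-and-penalty logic (padding and normalization are approximations of pg_trgm's behavior):

```python
def trigrams(s):
    """Approximate pg_trgm trigram extraction: lowercase, pad with spaces."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Jaccard similarity over trigram sets, as pg_trgm computes it."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def fuzzy_lookup(source, tm_entries, threshold=0.75):
    """tm_entries: list of (source_text, translated_text, quality_score).
    Returns (translated_text, penalized_score) for the best match clearing
    the threshold, or None -> route to MT."""
    best = None
    for src, tgt, quality in tm_entries:
        sim = similarity(source, src)
        if sim >= threshold:
            penalized = quality - (1 - sim)   # the (1 - similarity) penalty
            if best is None or penalized > best[1]:
                best = (tgt, penalized)
    return best
```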

MT Fallback and Confidence Scoring

The MT client calls the configured provider, receives a translation, and computes a confidence score by combining the provider's quality estimate (where available) with a length-ratio heuristic: a ratio of translated length to source length outside the 0.5 to 2.0 range signals suspect output. Segments with confidence below 0.7 are inserted into review_tasks; the rest are auto-approved and written to the TM.
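A minimal sketch of the scoring and routing step. The 50/50 weighting between the provider estimate and the heuristic, and the 0.4 penalty value, are illustrative assumptions not specified by the design:

```python
def mt_confidence(source, translated, provider_qe=None):
    """Combine the provider quality estimate (when available) with the
    length-ratio heuristic: a translated/source length ratio outside
    0.5-2.0 signals suspect output."""
    ratio = len(translated) / max(len(source), 1)
    heuristic = 1.0 if 0.5 <= ratio <= 2.0 else 0.4   # penalize suspect lengths
    if provider_qe is None:
        return heuristic
    return 0.5 * provider_qe + 0.5 * heuristic        # assumed equal weighting

def route(confidence, threshold=0.7):
    """Below-threshold segments go to human review; the rest auto-approve."""
    return "auto_approve" if confidence >= threshold else "review_queue"
```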

Scalability and Architecture

The pipeline is event-driven. A CI webhook triggers string extraction, which publishes new/changed string events to a Kafka topic. A TM lookup worker processes each event: hit goes directly to bundle generation, miss routes to the MT worker pool, which calls the external provider with retry and circuit breaker logic. Human review tasks are inserted into Postgres and surfaced via a translator dashboard.

  • Bundle generation runs after all strings in a project reach approved status (or a deadline passes with best-effort output)
  • Bundles are stored in S3 and pushed to a CDN with a cache key of project_id + locale + version
  • Cache invalidation on new bundle: purge CDN edge nodes via API, bump version integer
  • MT provider rate limits are handled with a token bucket per provider; overflow queues to a delayed retry topic
  • Translation memory is replicated to a read replica for fuzzy search queries to avoid write contention
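The per-provider rate limiting mentioned above can be sketched as a token bucket. On overflow the caller does not block; it publishes the segment to the delayed-retry topic:

```python
import time

class TokenBucket:
    """Per-provider token bucket. rate = tokens refilled per second;
    capacity bounds the burst size."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n=1.0):
        """Non-blocking acquire. Returns False on overflow, in which case
        the caller enqueues to the delayed-retry topic."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```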

API Design

String Management

  • POST /v1/projects/{project_id}/strings/extract — accepts a zip of source files, starts async extraction job, returns job_id
  • GET /v1/projects/{project_id}/strings?locale=STRING&status=STRING — paginated list of strings with translation status per locale

Translation Fetch (Client-Facing)

GET /v1/bundles/{project_id}/{locale}/latest.json — served from CDN, returns flat key-value JSON. ETags enable conditional requests for bandwidth efficiency.
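A client-side sketch of the conditional fetch, assuming a placeholder CDN host (`cdn.example.com` is not from the design). The client sends If-None-Match with its stored ETag and treats 304 as "cached bundle still current":

```python
import urllib.error
import urllib.request

CDN_BASE = "https://cdn.example.com"   # placeholder host

def build_request(project_id, locale, etag=None):
    """Build the GET; attach If-None-Match when we hold a prior ETag."""
    url = f"{CDN_BASE}/v1/bundles/{project_id}/{locale}/latest.json"
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    return req

def fetch_bundle(project_id, locale, etag=None):
    """Returns (body, etag); body is None when the server answers
    304 Not Modified and the cached copy can be reused."""
    req = build_request(project_id, locale, etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None, etag
        raise
```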

Review Queue

  • GET /v1/review-tasks?assignee=ME&locale=STRING — fetch pending review tasks with context
  • PATCH /v1/review-tasks/{task_id} — body: {status, edited_text} — approve or reject an MT segment
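The PATCH endpoint above implies a task state machine over the status enum. A minimal sketch of server-side transition validation; the exact transition table (e.g., whether rejected tasks can be reopened) is an assumption, since the design only names the four states:

```python
# Assumed legal transitions for review_tasks.status.
ALLOWED = {
    "pending": {"in_review"},
    "in_review": {"approved", "rejected"},
    "approved": set(),            # terminal
    "rejected": {"in_review"},    # assumption: rejected tasks can be reworked
}

def apply_status_patch(current_status, new_status):
    """Validate a PATCH /v1/review-tasks/{task_id} status change;
    raise on an illegal transition, otherwise return the new status."""
    if new_status not in ALLOWED.get(current_status, set()):
        raise ValueError(f"illegal transition {current_status} -> {new_status}")
    return new_status
```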

Interview Tips

Key discussion points include the tradeoff between the auto-approval confidence threshold and human review workload: raising the threshold (say from 0.7 to 0.8) catches more MT errors but routes substantially more segments to review. Discuss version management when a source string changes mid-sprint: deprecate the old string_id rather than mutating it so in-flight translations remain valid. For the fuzzy TM lookup, interviewers expect you to know that pg_trgm GIN indexes support similarity queries efficiently and can scale to tens of millions of rows without reaching for Elasticsearch.


