Introduction
KYC (Know Your Customer) requires verifying user identity using government-issued documents. It is a regulatory requirement for fintech companies, crypto exchanges, and banks; failure to comply can result in fines and loss of operating licenses. The system must verify that a user is who they claim to be, check them against sanctions lists, and produce an auditable decision.
Verification State Machine
The verification lifecycle follows a defined state machine: PENDING → DOCUMENT_SUBMITTED → UNDER_REVIEW → APPROVED / REJECTED / NEEDS_RESUBMISSION. Transitions are triggered by events such as document upload, OCR result availability, or a manual review decision. The current state is stored as current_status on the verification record. Every state transition is written to a separate events table capturing the event type, actor, timestamp, and metadata. This events table is the audit trail regulators require.
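The lifecycle above can be sketched as a transition table plus an event applier that rejects illegal transitions and appends to the events table. This is a minimal in-memory sketch; the event names, record shape, and storage are assumptions, not the service's actual identifiers.

```python
from datetime import datetime, timezone

# Allowed (current_state, event_type) -> next_state pairs, mirroring the
# lifecycle described above. Event names here are illustrative.
TRANSITIONS = {
    ("PENDING", "document_uploaded"): "DOCUMENT_SUBMITTED",
    ("DOCUMENT_SUBMITTED", "ocr_completed"): "UNDER_REVIEW",
    ("UNDER_REVIEW", "review_approved"): "APPROVED",
    ("UNDER_REVIEW", "review_rejected"): "REJECTED",
    ("UNDER_REVIEW", "resubmission_requested"): "NEEDS_RESUBMISSION",
    ("NEEDS_RESUBMISSION", "document_uploaded"): "DOCUMENT_SUBMITTED",
}

def apply_event(verification, event_type, actor, events, metadata=None):
    """Validate the transition, update current_status, and append an audit event."""
    key = (verification["current_status"], event_type)
    if key not in TRANSITIONS:
        raise ValueError(f"illegal transition: {key}")
    verification["current_status"] = TRANSITIONS[key]
    events.append({
        "verification_id": verification["id"],
        "event_type": event_type,
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata or {},
    })
    return verification["current_status"]
```

Because every write goes through `apply_event`, the events list doubles as the append-only audit trail: an out-of-order event raises instead of silently corrupting `current_status`.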
Document Upload Pipeline
The user uploads a photo of a government-issued ID document — passport, driver’s license, or national ID. The file is stored in S3 with server-side encryption (SSE-S3 or SSE-KMS). A message is placed on an SQS queue containing the S3 object key and verification_id. A worker pool reads from the queue and calls an OCR vendor API such as AWS Textract or Onfido. The OCR service extracts structured fields: full name, date of birth, document number, expiry date, and issuing country. Extracted data is written back to the verification record. The worker also checks document authenticity signals (font consistency, MRZ checksum) if the vendor provides them.
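One authenticity signal the worker can verify itself is the MRZ check digit. The algorithm is defined by ICAO Doc 9303: each character is valued (digits as themselves, A–Z as 10–35, the filler `<` as 0), weighted by the repeating sequence 7, 3, 1, summed, and taken mod 10.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: character values weighted 7,3,1 repeating, mod 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)                       # digits map to their value
        elif ch.isalpha():
            value = ord(ch.upper()) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0                             # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10
```

For the sample passport number `L898902C3` from the ICAO specification, the computed check digit is 6, matching the digit printed in the MRZ.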
Face Match
After document submission, the user submits a selfie. The backend calls a face comparison API such as Amazon Rekognition or a specialized KYC vendor. The API compares the selfie against the ID document photo and returns a confidence score; a score of 95% or above is required to pass. Liveness detection prevents photo spoofing: the user is given a challenge-response prompt (blink, turn head left) and the SDK captures a short video or series of frames. The liveness result is stored alongside the face match score on the verification record.
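The resulting decision rule is simple: both checks must pass. A sketch, with the 95% cutoff from above expressed as a configurable threshold:

```python
def face_check_passes(match_confidence: float, liveness_passed: bool,
                      threshold: float = 95.0) -> bool:
    """Require a liveness pass AND a match confidence at or above threshold.

    match_confidence is the vendor's 0-100 score; the default threshold
    mirrors the 95% cutoff described above and would live in config.
    """
    return liveness_passed and match_confidence >= threshold
```

Requiring both signals means a perfect face match from a replayed photo still fails, and a live user with a poor-quality selfie is sent back to retry rather than approved.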
AML Screening
The extracted name and date of birth are sent to a sanctions screening API such as Dow Jones or Refinitiv, or Chainalysis for crypto-focused platforms. The screening checks against the OFAC SDN list, PEP (politically exposed persons) lists, and adverse media sources. The API returns match candidates with match scores. Results are stored on the verification record along with a screening_id for audit. Existing approved customers are re-screened on a weekly batch job to catch newly added sanctions entries. Any hit above the match threshold triggers a manual review regardless of other signals.
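The shape of the vendor response can be sketched locally with a fuzzy matcher. Real screening engines handle transliteration and phonetics; this sketch stands in stdlib `difflib.SequenceMatcher` as the similarity function and a plain list of entries as the watchlist, both assumptions for illustration.

```python
from difflib import SequenceMatcher

def screen_name(name, dob, watchlist, threshold=0.85):
    """Return watchlist candidates whose name similarity meets the threshold.

    Entries with a recorded DOB must also match on DOB; entries without
    one match on name alone. Returns candidates sorted by score, highest first.
    """
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, name.lower(), entry["name"].lower()).ratio()
        if score >= threshold and (not entry.get("dob") or entry["dob"] == dob):
            hits.append({"entry": entry, "score": round(score, 3)})
    return sorted(hits, key=lambda h: -h["score"])
```

Any non-empty result here would route the case to manual review, per the rule above that a hit overrides other signals.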
Risk Scoring
A composite risk score is computed from multiple signals: document quality score from the OCR vendor, face match confidence percentage, AML screening result (clear / hit / partial hit), IP geolocation versus claimed address country, and device fingerprint risk indicators. Each signal is weighted and summed into a final score. Score buckets determine routing: LOW triggers auto-approve, MEDIUM triggers auto-approve with enhanced monitoring, HIGH routes to the manual review queue, and CRITICAL triggers auto-reject. Thresholds are tunable by the compliance team without a code deploy.
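The weighted-sum-and-bucket routing can be sketched as follows. The weights, thresholds, and signal names are illustrative assumptions; in the real system they would live in configuration so compliance can tune them without a deploy, as noted above.

```python
# Illustrative weights and bucket boundaries; production values would be
# loaded from config, not hardcoded. Signals are normalized to 0-1, 1 = riskiest.
WEIGHTS = {
    "document_quality": 0.25,
    "face_match": 0.30,
    "aml": 0.30,
    "geo_mismatch": 0.10,
    "device_risk": 0.05,
}

# (upper_bound, bucket): score < upper_bound selects the bucket.
BUCKETS = [(0.25, "LOW"), (0.50, "MEDIUM"), (0.75, "HIGH"), (1.01, "CRITICAL")]

def risk_bucket(signals):
    """Weighted sum of normalized risk signals, mapped to a routing bucket."""
    score = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    for upper, bucket in BUCKETS:
        if score < upper:
            return round(score, 3), bucket
    return round(score, 3), "CRITICAL"
```

The bucket then drives routing: LOW auto-approves, MEDIUM auto-approves with enhanced monitoring, HIGH enqueues for manual review, CRITICAL auto-rejects.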
Manual Review Queue
High-risk cases are routed to a manual review queue consumed by compliance agents through an internal dashboard. The reviewer sees document images (served via pre-signed S3 URLs), OCR output with field-level confidence, AML results with match candidates, and the full risk score breakdown. The reviewer can approve, reject, or request resubmission with a reason code. All review actions are written to the events table with reviewer_id and timestamp. Cases can be assigned, locked to prevent double-review, and escalated to a senior reviewer. SLA timers track how long cases sit in the queue.
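The claim-and-lock behavior that prevents double-review can be sketched in memory. This is an assumption-laden stand-in: production would enforce the lock with a row-level lock or an atomic conditional UPDATE in the database, not an in-process mutex.

```python
import threading

class ReviewQueue:
    """In-memory sketch of claim/lock semantics for the manual review queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cases = {}  # case_id -> {"status": ..., "assignee": ...}

    def add(self, case_id):
        with self._lock:
            self._cases[case_id] = {"status": "open", "assignee": None}

    def claim(self, case_id, reviewer_id):
        """Atomically assign a case to a reviewer; False if already claimed."""
        with self._lock:
            case = self._cases[case_id]
            if case["status"] != "open":
                return False
            case["status"] = "claimed"
            case["assignee"] = reviewer_id
            return True

    def release(self, case_id):
        """Return a case to the queue, e.g. on reviewer timeout or escalation."""
        with self._lock:
            self._cases[case_id] = {"status": "open", "assignee": None}
```

The second reviewer's `claim` returning False is what the dashboard surfaces as "case already being reviewed".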
Data Retention
Document images are retained for the regulatory-mandated period, typically 5 to 7 years depending on jurisdiction. After the retention window expires, images are deleted from S3 via lifecycle policies. PII is minimized after approval: the raw document number is replaced with a salted hash, and raw OCR output is purged. GDPR right-to-erasure requests are handled by replacing remaining PII fields with anonymized tokens and recording the erasure event. The verification decision record and audit trail are retained in anonymized form to satisfy anti-money-laundering reporting requirements even after erasure.
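The salted-hash replacement of the document number can be sketched with stdlib `hashlib`. Storing the per-record salt alongside the digest (an assumption about the schema) lets an investigator later confirm whether a given document number matches, without retaining the raw value.

```python
import hashlib
import os

def pseudonymize_document_number(doc_number, salt=None):
    """Replace a raw document number with a per-record salted SHA-256 digest.

    Returns (salt_hex, digest_hex). A fresh random salt is generated unless
    one is supplied (e.g. when re-deriving the hash during an investigation).
    """
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.sha256(salt + doc_number.encode("utf-8")).hexdigest()
    return salt.hex(), digest
```

A per-record salt matters here: without it, identical document numbers would hash identically across records, allowing linkage that defeats the minimization.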
Frequently Asked Questions: KYC Identity Verification Service
How do you model state machine transitions in a KYC verification service?
A KYC verification workflow is naturally represented as a finite state machine with states such as PENDING, DOCUMENT_SUBMITTED, DOCUMENT_VERIFIED, BIOMETRIC_PENDING, BIOMETRIC_VERIFIED, AML_SCREENING, APPROVED, REJECTED, and MANUAL_REVIEW. Each transition is triggered by an event (e.g., document upload, OCR result, liveness check result) and persisted to an append-only audit log. The state machine enforces valid transitions — for example, BIOMETRIC_PENDING can only be entered from DOCUMENT_VERIFIED — preventing partial or out-of-order updates. A durable workflow engine (e.g., Temporal or AWS Step Functions) is commonly used to manage retries, timeouts, and compensation logic across the multi-step process.
How does face match and liveness detection work in a KYC system?
Face matching compares a selfie captured during onboarding against the photo extracted from a government-issued ID using a deep learning embedding model (e.g., FaceNet or ArcFace). Cosine similarity between the two embeddings is computed and compared against a configurable threshold (typically 0.85+). Liveness detection prevents spoofing attacks — a passive liveness model analyzes texture, moiré patterns, and reflections to distinguish a live face from a printed photo or screen replay. Active liveness challenges (blinking, head turns) add a second layer of assurance for higher-risk use cases. Results from both checks feed as signals into the downstream risk score.
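The cosine-similarity comparison described above is straightforward to write down. This sketch uses plain Python lists as embeddings; a real system would use the vendor model's fixed-dimension vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def embeddings_match(a, b, threshold=0.85):
    """Apply the configurable decision threshold from the answer above."""
    return cosine_similarity(a, b) >= threshold
```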
How do you implement AML screening against OFAC and PEP lists?
AML screening matches a user’s name, date of birth, and nationality against sanctions lists (OFAC SDN, UN Consolidated, EU) and Politically Exposed Persons (PEP) databases. The matching algorithm must handle name transliterations, nicknames, and fuzzy spelling variations — a weighted combination of exact match, Levenshtein distance, and phonetic encoding (Soundex/Metaphone) is typical. List data is ingested as a daily batch delta feed and indexed in an inverted index for fast lookup. Matches above a configurable score threshold are flagged for manual review rather than auto-rejected, since false positive rates on common names can be high. All screening decisions and list versions are logged for regulatory audit.
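Of the fuzzy-matching components named above, Levenshtein distance is the easiest to show concretely. The classic dynamic-programming version, using a rolling row to keep memory at O(min(|a|,|b|)):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]
```

In a screening pipeline this raw distance would be normalized by name length and blended with exact-match and phonetic scores, as the answer describes.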
What does a composite risk scoring model look like in a KYC verification service?
A composite risk score aggregates signals from multiple independent checks: document authenticity score (OCR confidence, security feature detection), biometric match score, liveness confidence, AML/PEP screening result, device risk (VPN/proxy/TOR detection, device fingerprint), and behavioral signals (session velocity, IP geolocation consistency). Each signal is normalized to a 0–1 range and combined using a weighted linear model or a gradient-boosted classifier trained on labeled fraud outcomes. The final score maps to a risk tier — LOW, MEDIUM, HIGH — which determines the automation path: auto-approve, step-up verification, or manual review queue. Weights are tuned periodically based on fraud analyst feedback and model monitoring.
How do you handle GDPR right-to-erasure requests in a KYC system?
GDPR Article 17 right-to-erasure conflicts with AML/KYC record retention obligations (typically 5–7 years under FATF guidelines), so the solution must balance both. In practice, PII fields (name, DOB, document images, selfies, biometric embeddings) are stored in a dedicated PII store keyed by a pseudonymous user token. On erasure request, the PII store is purged and the token is invalidated, leaving only non-PII fields (risk tier, screening result code, timestamps, internal audit hashes) in the compliance record. Biometric embeddings must be explicitly destroyed and deletion confirmed via a signed receipt. A data retention policy engine schedules automatic deletion of PII after the retention window expires, with a legal hold override capability for active fraud investigations.
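The pseudonymous-token pattern can be sketched as a dedicated PII store keyed by token. This is an in-memory stand-in with assumed names; production would back it with an encrypted datastore and emit a signed deletion receipt as described above.

```python
import uuid

class PiiStore:
    """Sketch: PII lives only here; the compliance record keeps the token
    plus non-PII fields (risk tier, result codes, timestamps)."""

    def __init__(self):
        self._records = {}

    def put(self, pii):
        """Store a PII record and return the pseudonymous token to keep elsewhere."""
        token = uuid.uuid4().hex
        self._records[token] = pii
        return token

    def get(self, token):
        return self._records.get(token)

    def erase(self, token):
        """Purge PII and invalidate the token; caller records the erasure event."""
        return self._records.pop(token, None) is not None
```

After `erase`, the compliance record still exists and still carries its token, but the token resolves to nothing, which is exactly the anonymized-but-auditable state the retention requirements demand.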