Requirements and Constraints
A risk scoring service aggregates signals from multiple data sources, runs them through an ensemble of models, produces a calibrated composite score, and generates a human-readable explanation of the score's key drivers. It is consumed by credit underwriting, fraud review, and onboarding flows. Functional requirements: multi-signal feature aggregation (bureau data, behavioral data, device signals, transactional history), model ensemble with configurable weights, calibrated probability output, and explainability export. Non-functional: P99 response under 500ms, support 500 concurrent scoring requests, return consistent scores for the same input (deterministic), and maintain a full audit log per score for regulatory lookups.
Core Data Model
- score_requests(request_id PK UUID, entity_type ENUM('user','business'), entity_id, purpose ENUM('credit','fraud','onboarding'), requested_by, created_at)
- score_results(request_id FK PK, composite_score FLOAT, risk_tier ENUM('low','medium','high','very_high'), model_scores JSONB, feature_values JSONB, explanation JSONB, model_ensemble_version, created_at)
- model_registry(model_id PK, name, version, type ENUM('lgbm','logistic','neural'), purpose, onnx_artifact_path, weight FLOAT, calibration_params JSONB, deployed_at, active BOOL)
- feature_sources(source_id PK, name, signal_type, fetch_method ENUM('sync_http','cache','batch'), timeout_ms, required BOOL)
- entity_feature_cache(entity_id, feature_source_id, feature_values JSONB, computed_at, expires_at)
- calibration_models(model_id FK, method ENUM('platt','isotonic'), params JSONB, trained_at)
Multi-Signal Feature Aggregation
Feature signals are fetched in parallel, each source with its own configurable timeout (feature_sources.timeout_ms). Signal types include: credit bureau trade lines and derogatory marks (synchronous HTTP to the bureau API), internal transaction behavioral features (pre-computed, served from the feature cache), device reputation and identity signals (synchronous HTTP to a device intelligence vendor), application data submitted by the user (passed inline in the request), and social/network graph features (pre-computed asynchronously in batch). The aggregation layer uses a scatter-gather pattern: all sources are queried concurrently; the response waits for every required source (up to that source's timeout) and includes whichever optional sources have returned by then.
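The scatter-gather step can be sketched with asyncio. This is a minimal illustration, not the service's actual code: the FeatureSource class, the fetch callables, and RequiredSourceError are hypothetical names, and failing the whole request when a required source errors or times out is an assumption, since the text does not say what happens in that case.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FeatureSource:
    """Mirrors a feature_sources row; `fetch` is a hypothetical async fetcher."""
    name: str
    timeout_ms: int
    required: bool
    fetch: Callable[[str], Awaitable[Dict[str, Any]]]


class RequiredSourceError(RuntimeError):
    """Raised when a required source fails or times out (assumed policy)."""


async def _fetch_one(
    source: FeatureSource, entity_id: str
) -> Tuple[str, Optional[Dict[str, Any]], Optional[Exception]]:
    # Each source gets its own timeout; errors are captured, not raised,
    # so one slow optional source cannot fail the whole gather.
    try:
        values = await asyncio.wait_for(
            source.fetch(entity_id), timeout=source.timeout_ms / 1000
        )
        return source.name, values, None
    except Exception as exc:  # includes asyncio.TimeoutError
        return source.name, None, exc


async def gather_features(
    sources: List[FeatureSource], entity_id: str
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Fetch all sources concurrently; return (features, missing optional sources)."""
    results = await asyncio.gather(*(_fetch_one(s, entity_id) for s in sources))
    by_name = {s.name: s for s in sources}
    features: Dict[str, Dict[str, Any]] = {}
    missing: List[str] = []
    for name, values, error in results:
        if error is None:
            features[name] = values
        elif by_name[name].required:
            raise RequiredSourceError(f"required source {name!r} failed: {error!r}")
        else:
            missing.append(name)
    return features, missing
```

In this sketch the total wait is bounded by the largest per-source timeout. A stricter variant could stop waiting for optional sources as soon as all required ones have returned.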
Fetched features are normalized to a canonical schema defined per model. Feature engineering transformations (log scaling, binning, one-hot encoding) are defined as a versioned transformation pipeline co-deployed with each model, ensuring the feature representation at inference matches what the model was trained on.
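A versioned transformation pipeline of this kind might look like the sketch below. The helper names (log_scale, bin_value, one_hot, Pipeline), the output column naming, and the use of log1p are illustrative assumptions, not the article's implementation. The point is that the version tag and the ordered steps travel together with the model artifact.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Transform = Callable[[Dict[str, Any]], Dict[str, float]]


def log_scale(field: str) -> Transform:
    # log1p keeps zero values finite; negative inputs are clamped to 0.
    def apply(row: Dict[str, Any]) -> Dict[str, float]:
        return {f"{field}_log": math.log1p(max(float(row[field]), 0.0))}
    return apply


def bin_value(field: str, edges: Sequence[float]) -> Transform:
    # The bin index is the number of edges that are <= the value.
    def apply(row: Dict[str, Any]) -> Dict[str, float]:
        return {f"{field}_bin": float(bisect_right(list(edges), row[field]))}
    return apply


def one_hot(field: str, categories: Sequence[str]) -> Transform:
    # Unknown categories map to an all-zero vector.
    def apply(row: Dict[str, Any]) -> Dict[str, float]:
        return {f"{field}={c}": 1.0 if row[field] == c else 0.0 for c in categories}
    return apply


@dataclass(frozen=True)
class Pipeline:
    version: str                   # stored alongside the model version
    steps: Tuple[Transform, ...]   # applied in a fixed order

    def transform(self, row: Dict[str, Any]) -> Tuple[List[str], List[float]]:
        out: Dict[str, float] = {}
        for step in self.steps:
            out.update(step(row))
        # dicts preserve insertion order, so the column order is deterministic
        # and matches the order used at training time.
        names = list(out)
        return names, [out[n] for n in names]
```

For example, Pipeline("v3", (log_scale("income"), bin_value("age", [25, 40, 60]), one_hot("channel", ["web", "app"]))) would turn a raw row into one log-scaled, one binned, and two one-hot columns, always in that order.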
Model Ensemble
Multiple models are combined using a weighted average ensemble. Each active model in model_registry has an assigned weight, and the weights of the active models sum to 1.0. Each model receives the same aggregated features and produces a raw probability, which is then calibrated per model (see the next paragraph). The ensemble output is: composite = SUM(model.weight * model.calibrated_score). Weights are managed in the database and can be adjusted without a code deployment, which enables gradual rollout of new models by increasing their weight incrementally: shadow mode at weight=0 (the model is scored and recorded but does not affect the composite), canary at weight=0.1, then a ramp, with the remaining weights reduced so the total stays at 1.0.
Calibration corrects for systematic over- or under-confidence in raw model outputs. Platt scaling (logistic regression on model output vs. true label) and isotonic regression are the two supported methods, with calibration parameters stored per model. The calibration_models table stores the fitted parameters; calibration is applied in the scoring service after raw inference, before ensemble aggregation.
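The order of operations (raw inference, per-model calibration, then weighted aggregation) can be sketched as follows. The ActiveModel class and the params layout ({"a", "b"} for Platt, {"x", "y"} for isotonic) are assumptions made for illustration; the article does not specify the JSONB shape. Platt scaling is written here as sigmoid(a * raw + b), and isotonic regression is applied as linear interpolation between fitted breakpoints, clamped at both ends.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Dict, List


def platt(raw: float, a: float, b: float) -> float:
    """Platt scaling: a fitted logistic curve applied to the raw model output."""
    return 1.0 / (1.0 + math.exp(-(a * raw + b)))


def isotonic(raw: float, xs: List[float], ys: List[float]) -> float:
    """Apply a fitted isotonic map: interpolate between sorted breakpoints."""
    if raw <= xs[0]:
        return ys[0]
    if raw >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, raw)  # xs[i-1] <= raw < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)


@dataclass(frozen=True)
class ActiveModel:
    name: str
    weight: float
    method: str              # 'platt' or 'isotonic'
    params: Dict[str, Any]   # assumed layout, see lead-in


def calibrate(model: ActiveModel, raw: float) -> float:
    if model.method == "platt":
        return platt(raw, model.params["a"], model.params["b"])
    if model.method == "isotonic":
        return isotonic(raw, model.params["x"], model.params["y"])
    raise ValueError(f"unknown calibration method: {model.method}")


def composite_score(models: List[ActiveModel], raw_scores: Dict[str, float]) -> float:
    """Calibrate each model's raw score, then take the weighted average."""
    total = sum(m.weight for m in models)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"active weights must sum to 1.0, got {total}")
    return sum(m.weight * calibrate(m, raw_scores[m.name]) for m in models)
```

A shadow model at weight 0 can be passed through the same function: it is still calibrated, but contributes nothing to the sum.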
Score Explanation
Explanation is generated using SHAP (SHapley Additive exPlanations) values computed at inference time for tree models, or approximated using LIME for neural models. The explanation JSONB field stores the top-10 features by absolute attribution (SHAP value, or LIME weight for neural models), including the feature name, the entity's value for that feature, the population median for context, and the signed contribution as a float. This structure supports adverse action notices (which features most hurt the score), analyst review panels, and regulatory inquiry responses. Explanation generation adds approximately 10-20ms of overhead for tree-based SHAP, which is acceptable within the 500ms budget.
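Assembling the stored explanation from already-computed attributions could look like the sketch below. It does not compute SHAP or LIME values itself; it assumes they arrive as a feature-name-to-float mapping. The function name and the record keys are hypothetical, chosen to match the four fields described above.

```python
from typing import Any, Dict, List


def build_explanation(
    feature_values: Dict[str, Any],
    attributions: Dict[str, float],
    population_medians: Dict[str, Any],
    top_k: int = 10,
) -> List[Dict[str, Any]]:
    """Return the top_k features ranked by absolute attribution.

    Ties are broken by feature name so the output is deterministic.
    """
    ranked = sorted(attributions, key=lambda name: (-abs(attributions[name]), name))
    return [
        {
            "feature": name,
            "value": feature_values[name],
            "population_median": population_medians.get(name),
            "contribution": attributions[name],  # signed attribution
        }
        for name in ranked[:top_k]
    ]
```

The resulting list serializes directly to JSON for the explanation column. Which sign counts as adverse depends on whether a higher score means higher risk, which the attribution values alone do not settle.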
Calibration and Monitoring
Score calibration quality degrades over time as population distributions shift (drift). A daily calibration job evaluates the Brier score and Expected Calibration Error (ECE) on a rolling 30-day window of labeled outcomes. If ECE exceeds a threshold, an alert fires and the calibration parameters are retrained on the latest data. The model_ensemble_version field in score_results allows post-hoc analysis of score quality by ensemble cohort.
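The two metrics the daily job evaluates can be computed as in the sketch below. It uses the common equal-width binning form of ECE with 10 bins; the bin count and the function names are assumptions, and the alert threshold is left to the caller.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean confidence and observed rate per bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(confidence - observed)
    return ece
```

Both metrics are 0 for perfect predictions. ECE can also be 0 for an uninformative but calibrated model (for example, always predicting the base rate), which is one reason to track the Brier score alongside it.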
Scalability Considerations
- Feature cache: Pre-computing expensive features (bureau queries cost $0.05-$0.50 each) and caching them for 24-48 hours for the same entity reduces both latency and cost. Cache population is triggered on first miss or via a nightly batch refresh for high-activity entities.
- Model artifact loading: ONNX models are loaded into shared memory at process startup. For large ensembles, use process-level model pools with thread-safe ONNX Runtime InferenceSession objects.
- Determinism: Fixed random seeds, deterministic ONNX execution providers, and feature snapshot storage (feature_values in score_results) ensure the score can be exactly reproduced for audit purposes.
- Horizontal scaling: The scoring service is stateless; scale horizontally. Route by entity_id for cache locality if the feature cache is in-process rather than external.
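Two of the points above, the time-limited feature cache and routing by entity_id, can be sketched together. The names FeatureCache and route_replica are hypothetical, the cache is a plain in-process dictionary, and routing uses simple hash-modulo. Python's built-in hash() is salted per process, so a stable digest from hashlib is used to keep routing consistent across replicas and restarts.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


def route_replica(entity_id: str, n_replicas: int) -> int:
    """Map an entity to a replica index using a process-independent hash."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_replicas


class FeatureCache:
    """In-process cache keyed by (entity_id, source_id) with a fixed TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, Dict[str, Any]]] = {}

    def put(self, entity_id: str, source_id: str, values: Dict[str, Any]) -> None:
        expires_at = self._clock() + self._ttl
        self._entries[(entity_id, source_id)] = (expires_at, values)

    def get(self, entity_id: str, source_id: str) -> Optional[Dict[str, Any]]:
        key = (entity_id, source_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, values = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the expired entry on read
            return None
        return values
```

Hash-modulo routing remaps most entities whenever the replica count changes, so a consistent-hashing scheme or an external cache may be a better fit when replicas scale up and down frequently.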
API Design
- POST /scores: primary scoring endpoint; accepts entity_id, entity_type, purpose, and inline features; returns composite score, tier, model breakdown, and explanation
- GET /scores/{request_id}: retrieve stored score result with full feature snapshot for audit
- GET /entities/{id}/score-history: time series of scores for an entity, useful for trend analysis
- POST /models/{id}/activate: add a model to the active ensemble with specified weight
- GET /models/{id}/calibration-report: current Brier score and ECE for a model over rolling window
- POST /features/precompute: batch endpoint to trigger feature pre-computation for a list of entity IDs