Data Governance Platform Low-Level Design: Policy Engine, Access Control, and Compliance Enforcement

Data Governance Platform Overview

A Data Governance Platform enforces organizational data policies — who can access what data, under what conditions, and with what logging obligations — through a combination of a policy-as-code engine, attribute-based access control (ABAC), sensitive data tagging, and automated compliance reporting. It acts as the authoritative policy decision point for data access requests across the data platform.

Requirements

Functional Requirements

  • Define and version data access policies as code (e.g., OPA Rego, Cedar, or a domain-specific policy language).
  • Evaluate access requests in real time using ABAC: subject attributes (role, team, region), resource attributes (classification, owner, PII flag), and environmental attributes (time, access method).
  • Automatically tag sensitive data assets (PII, financial, health, proprietary) using classifier models and manual annotation.
  • Enforce policies at query time by integrating with data warehouses and lake engines as an authorization layer.
  • Generate compliance reports: data access logs, policy coverage, and sensitive data inventory for GDPR, CCPA, and internal audit.

Non-Functional Requirements

  • Policy evaluation latency under 10 ms at p99 on the query authorization path.
  • Support 10,000 concurrent policy evaluation requests.
  • Policy changes take effect within 30 seconds of publication without service restart.

Data Model

The Policy document stores: policy_id UUID, name, version INT, language ENUM (rego, cedar, custom), policy_code TEXT, effective_from TIMESTAMP, status ENUM (draft, active, deprecated), and author_id. Only one version of a policy is active at a time; the policy engine loads all active policies on startup and reloads within 30 seconds of a status change.

The DataAssetClassification record links asset metadata to sensitivity labels: asset_id, column_ref (nullable), classification_tags LIST (PII, financial, health, confidential, public), classifier_source ENUM (model, manual, inherited), confidence FLOAT, and last_reviewed_at TIMESTAMP.

The AccessAuditLog table records every evaluated access request: log_id UUID, subject_id, resource_ref, action, decision ENUM (allow, deny, allow-with-mask), matched_policy_id, evaluated_at TIMESTAMP, and request_context JSON. Append-only; partitioned by date for efficient compliance queries.
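The three records above can be sketched as Python dataclasses. This is a minimal illustration of the schema, not a storage implementation; field types follow the text, and the enum member names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional
from uuid import UUID

class PolicyLanguage(Enum):
    REGO = "rego"
    CEDAR = "cedar"
    CUSTOM = "custom"

class PolicyStatus(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    DEPRECATED = "deprecated"

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ALLOW_WITH_MASK = "allow-with-mask"

@dataclass
class Policy:
    policy_id: UUID
    name: str
    version: int
    language: PolicyLanguage
    policy_code: str
    effective_from: datetime
    status: PolicyStatus          # only one version per policy may be ACTIVE
    author_id: str

@dataclass
class DataAssetClassification:
    asset_id: str
    column_ref: Optional[str]     # null for table-level tags
    classification_tags: List[str]  # PII, financial, health, confidential, public
    classifier_source: str        # "model" | "manual" | "inherited"
    confidence: float
    last_reviewed_at: datetime

@dataclass
class AccessAuditLog:
    log_id: UUID
    subject_id: str
    resource_ref: str
    action: str
    decision: Decision
    matched_policy_id: Optional[UUID]
    evaluated_at: datetime
    request_context: dict         # serialized as JSON in the append-only store
```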

Core Algorithms

Policy-as-Code Engine

Access policies are written in OPA Rego (or an equivalent declarative language) and compiled to bytecode at publish time. The policy engine loads all active policies into an in-process bundle. On each access request, the engine evaluates the relevant policy set using the subject, resource, and environment attributes as input. Evaluation is purely functional and side-effect-free, enabling parallel evaluation across request threads. The compiled bundle is atomically swapped when new policies are published, with no request latency impact during the swap.
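The evaluation and atomic-swap behavior can be sketched in Python, modeling each compiled policy as a pure function from the request input to a decision (or None when the policy does not apply). The bundle shape and policy IDs here are illustrative assumptions, not OPA's actual API.

```python
from typing import Callable, Optional, Tuple

# A compiled policy: request dict -> decision string, or None ("does not apply").
PolicyFn = Callable[[dict], Optional[str]]

class PolicyBundle:
    """Immutable set of compiled policies; evaluation has no side effects,
    so any number of request threads can evaluate the same bundle in parallel."""
    def __init__(self, policies: list):
        self._policies = tuple(policies)  # (policy_id, PolicyFn) pairs

    def evaluate(self, request: dict) -> Tuple[str, Optional[str]]:
        for policy_id, fn in self._policies:
            decision = fn(request)
            if decision is not None:
                return decision, policy_id
        return "deny", None               # default-deny when nothing matches

class PolicyEngine:
    def __init__(self, bundle: PolicyBundle):
        self._bundle = bundle

    def swap(self, new_bundle: PolicyBundle):
        # Single reference assignment: in-flight requests finish on the old
        # bundle, new requests see the new one; no locking on the read path.
        self._bundle = new_bundle

    def evaluate(self, request: dict):
        return self._bundle.evaluate(request)

# Example: one policy that allows data-eng reads and abstains otherwise.
engine = PolicyEngine(PolicyBundle([
    ("p-read-eng",
     lambda r: "allow" if r["subject"]["team"] == "data-eng" else None),
]))
```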

Attribute-Based Access Control

Subject attributes are fetched from an identity provider (cached in Redis with a 60-second TTL). Resource attributes are read from the Data Catalog (asset classification, owner, PII flags). Environmental attributes (time of day, access endpoint) are derived from the request context. The ABAC evaluation produces one of three decisions: allow, deny, or allow-with-mask (column masking for semi-sensitive access). Masking transforms (e.g., partial redaction of email, hash of SSN) are applied by a data-layer plugin at query execution time.
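A toy ABAC rule set illustrating the three-way decision, with one masking transform of the kind a data-layer plugin would apply. The attribute names (team, clearance, owner_team, endpoint) and the rules themselves are invented for the sketch.

```python
def evaluate_abac(subject: dict, resource: dict, env: dict) -> str:
    """Combine subject, resource, and environmental attributes into
    allow / deny / allow-with-mask. Rules are illustrative only."""
    # Environmental check: no sensitive access from public endpoints.
    if env.get("endpoint") == "public":
        return "deny"
    # PII resources require explicit clearance...
    if resource.get("pii") and subject.get("clearance") != "pii":
        # ...but same-team access gets masked columns instead of a hard deny.
        if subject.get("team") == resource.get("owner_team"):
            return "allow-with-mask"
        return "deny"
    return "allow"

def mask_email(value: str) -> str:
    """Partial-redaction transform applied at query execution time."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain
```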

Sensitive Data Classification

A multi-pass classifier pipeline tags assets: (1) a rule-based pass applies regex patterns for known PII formats (SSN, email, phone, credit card); (2) a fine-tuned text classifier on column names and sample values identifies likely sensitive columns with a probability score; (3) a lineage-based inheritance step propagates tags from source to derived columns if the transformation is a pass-through (projection or filter). Manual overrides always take precedence and are preserved across re-classification runs.
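The passes above can be sketched as a single function, with the fine-tuned model stubbed out as a callable that returns a probability score. The pattern set, threshold, and parameter names are assumptions for illustration.

```python
import re

# Pass 1 patterns: well-structured PII with known formats (subset shown).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify_column(name, samples, model_score_fn, manual_tags=None,
                    parent_tags=None, passthrough=False, threshold=0.8):
    """Multi-pass tagging sketch. model_score_fn stands in for the
    fine-tuned classifier over column names and sample values."""
    if manual_tags is not None:                 # manual overrides always win
        return set(manual_tags)
    tags = set()
    for _, pattern in PII_PATTERNS.items():     # pass 1: regex on sample values
        if any(pattern.search(s) for s in samples):
            tags.add("PII")
    if model_score_fn(name, samples) >= threshold:  # pass 2: ML probability
        tags.add("PII")
    if passthrough and parent_tags:             # pass 3: lineage inheritance
        tags |= set(parent_tags)                # (projection/filter only)
    return tags
```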

API Design

  • EvaluateAccess(AccessRequest) → AccessDecision — core authorization endpoint called by data access proxies and query engines; returns allow/deny/mask with the matched policy ID.
  • PublishPolicy(Policy) → PolicyId — creates or updates a policy; transitions it to active status and triggers hot-reload in the policy engine fleet.
  • GetPolicyEvalTrace(AccessRequest) → EvalTrace — debug endpoint that returns the full policy evaluation trace for a given request, useful for policy authoring and troubleshooting.
  • GetClassification(AssetRef) → ClassificationRecord — returns the current sensitivity tags and their sources for an asset or column.
  • GenerateComplianceReport(ReportSpec) → ReportJob — triggers an async compliance report (access log summary, PII inventory, policy coverage) and returns a job ID.
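The request and response shapes for the core EvaluateAccess endpoint might look like the following TypedDicts. The field names beyond those in the text (mask_columns, context) are assumptions.

```python
from typing import List, Optional, TypedDict

class AccessRequest(TypedDict):
    subject_id: str
    resource_ref: str
    action: str           # e.g. "read", "export"
    context: dict         # environmental attributes: time, access endpoint

class AccessDecision(TypedDict):
    decision: str                    # "allow" | "deny" | "allow-with-mask"
    matched_policy_id: Optional[str]
    mask_columns: List[str]          # populated only for allow-with-mask

req: AccessRequest = {
    "subject_id": "u1",
    "resource_ref": "warehouse.sales.customers",
    "action": "read",
    "context": {"endpoint": "internal"},
}
resp: AccessDecision = {
    "decision": "allow-with-mask",
    "matched_policy_id": "p-pii-team",
    "mask_columns": ["email", "ssn"],
}
```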

Scalability and Fault Tolerance

The policy engine fleet is stateless; each instance loads the active policy bundle at startup and polls the policy store every 15 seconds for updates. On bundle update, a background thread compiles and pre-warms the new bundle before atomic promotion. If compilation fails (syntax or semantic error), the new bundle is rejected and the previous bundle remains active with an alert fired to the policy author.
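One iteration of that poll-compile-promote loop can be sketched as follows. fetch_fn, compile_fn, and alert_fn are hypothetical hooks: fetch returns (version, sources), compile pre-warms a candidate and raises on any syntax or semantic error, and alert notifies the policy author.

```python
class BundleLoader:
    """Polls the policy store off the hot path and promotes bundles atomically.
    A failed compile leaves the previous bundle active."""
    def __init__(self, fetch_fn, compile_fn, alert_fn):
        self._fetch, self._compile, self._alert = fetch_fn, compile_fn, alert_fn
        self.bundle = None
        self._version = None

    def refresh_once(self) -> bool:
        """One pass of the 15-second polling loop; returns True on promotion."""
        version, sources = self._fetch()
        if version == self._version:
            return False                        # nothing new published
        try:
            candidate = self._compile(sources)  # compile + pre-warm in background
        except Exception as err:
            self._alert(f"bundle {version} rejected: {err}")
            return False                        # previous bundle stays active
        self.bundle = candidate                 # atomic promotion
        self._version = version
        return True

def bad_compile(sources):
    """Stand-in for a bundle with a syntax error."""
    raise ValueError("rego parse error")
```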

Subject attribute caching in Redis reduces identity provider load to near zero on the hot path. Cache miss latency (identity provider round-trip) is bounded at 20 ms; if exceeded, the engine falls back to the last cached value with a staleness flag in the audit log. Resource attributes (classification tags) are pre-loaded into a local in-process map refreshed every 60 seconds from the Data Catalog, eliminating cross-service calls on the critical path.
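The TTL-plus-stale-fallback behavior can be sketched as a small cache wrapper. Here a TimeoutError from fetch_fn stands in for an identity-provider round-trip exceeding the 20 ms budget; the boolean returned alongside the attributes is the staleness flag recorded in the audit log.

```python
import time

class AttributeCache:
    """60-second TTL cache over the identity provider, with stale fallback."""
    def __init__(self, fetch_fn, ttl=60.0, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch_fn, ttl, clock
        self._entries = {}   # subject_id -> (attrs, fetched_at)

    def get(self, subject_id):
        now = self._clock()
        cached = self._entries.get(subject_id)
        if cached and now - cached[1] < self._ttl:
            return cached[0], False       # fresh hit: attrs, stale=False
        try:
            attrs = self._fetch(subject_id)
        except TimeoutError:
            if cached:                    # bound exceeded: serve last value,
                return cached[0], True    # flagged stale for the audit log
            raise
        self._entries[subject_id] = (attrs, now)
        return attrs, False
```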

The access audit log is written asynchronously to avoid adding latency to the authorization response. Log records are buffered in a local queue and flushed to the append-only store in micro-batches every 100 ms. In the event of a store outage, records are spilled to local disk and replayed on recovery.
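The buffer-flush-spill-replay cycle might look like this sketch, with store_write_fn and spill_path as assumed hooks and the 100 ms timer left out (flush would be driven by a background thread).

```python
import json
import os
import queue

class AuditBuffer:
    """Buffers audit records off the authorization path, flushes in
    micro-batches, and spills to local disk when the store is down."""
    def __init__(self, store_write_fn, spill_path="audit_spill.jsonl"):
        self._write = store_write_fn
        self._spill_path = spill_path
        self._queue = queue.SimpleQueue()

    def record(self, entry: dict):
        self._queue.put(entry)            # never blocks the response

    def flush(self) -> int:
        """Drain the queue into one micro-batch (called every 100 ms)."""
        batch = []
        while not self._queue.empty():
            batch.append(self._queue.get())
        if not batch:
            return 0
        try:
            self._write(batch)
        except OSError:                   # store outage: spill to local disk
            with open(self._spill_path, "a") as f:
                for entry in batch:
                    f.write(json.dumps(entry) + "\n")
        return len(batch)

    def replay(self) -> int:
        """Re-send spilled records once the store recovers."""
        if not os.path.exists(self._spill_path):
            return 0
        with open(self._spill_path) as f:
            batch = [json.loads(line) for line in f]
        self._write(batch)
        os.remove(self._spill_path)
        return len(batch)
```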

Monitoring

  • Track policy evaluation latency at p50, p95, and p99; alert if p99 exceeds 10 ms.
  • Monitor deny rate by team and resource; a sudden spike may indicate a policy misconfiguration or a legitimate access pattern that needs a policy update.
  • Track sensitive data classification coverage: fraction of production assets with at least one classification tag.
  • Publish weekly compliance dashboards: total access events, deny events, PII-tagged asset access by team, and open policy review items.
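The p99 alert in the first bullet reduces to a percentile over a window of latency samples. A nearest-rank sketch (sufficient for coarse alerting; a production system would use a streaming estimator such as t-digest):

```python
def percentile(samples, p):
    """Nearest-rank percentile over a sample window."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

def latency_alert(samples_ms, slo_ms=10.0):
    """Fire when p99 evaluation latency exceeds the 10 ms SLO."""
    return percentile(samples_ms, 99) > slo_ms
```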

Frequently Asked Questions

How is OPA Rego used as policy-as-code in a data governance platform?

Open Policy Agent (OPA) evaluates Rego policies against a JSON input document representing the access request (user attributes, resource metadata, action). Policies are stored in version control and deployed to OPA sidecars or a central OPA cluster. This makes access rules auditable, testable in CI, and decoupled from application code.

How does ABAC with column masking work in data governance?

Attribute-Based Access Control (ABAC) evaluates policies against user attributes (role, team, clearance level) and data attributes (sensitivity classification, PII flag). When a query touches a column the user is not authorized to see in full, the query engine rewrites it to return a masked value (e.g., last four digits of SSN, hashed email). Masking is enforced at the query layer so raw values never leave the warehouse.

How does a PII classifier combining regex and ML work?

Regex patterns handle well-structured PII with known formats (SSN, credit card numbers, email addresses, phone numbers) at low cost. An ML classifier (typically a fine-tuned text or tabular model) handles free-text fields and ambiguous cases where regex alone produces too many false positives. Results from both are combined: a column is flagged as PII if either detector fires above its confidence threshold.

Why is an async append-only log used for compliance auditing in data governance?

An append-only log ensures that every data access and policy decision is recorded immutably — records cannot be edited or deleted, satisfying audit and regulatory requirements (GDPR, HIPAA, SOX). Async writing decouples audit logging from the critical path of the access request, preventing log write latency from affecting query performance. The log is periodically checkpointed and signed for tamper evidence.

