AI ethics and fairness questions appear in interviews at every major tech company — and not just for policy roles. Engineers are expected to identify and mitigate bias, understand fairness trade-offs, and discuss the limits of technical solutions to social problems. This is especially common at Google, Meta, Microsoft, and AI-focused companies.
What Interviewers Are Testing
- Can you identify bias sources in data and models?
- Do you understand that different fairness definitions are mathematically incompatible in general?
- Can you describe technical interventions and their trade-offs?
- Do you understand where technical solutions end and policy decisions begin?
Types of Bias in ML
Historical Bias
The world has historical inequities that appear in data. A resume screening model trained on historical hiring decisions will encode historical discrimination — e.g., lower hiring rates for women in technical roles. Removing protected attributes doesn’t fix this because proxies remain (graduation year, name, university).
Measurement Bias
The feature being measured is a worse proxy for the true construct for some groups. Criminal recidivism prediction: arrest records are a proxy for crime, but arrest rates differ by race due to policing patterns. The model predicts arrests, not actual crime.
Aggregation Bias
A model trained on aggregate data performs worse on subgroups. A diabetes prediction model trained on predominantly white patients may have higher error rates for Black patients due to physiological differences in relevant biomarkers.
Representation Bias
Underrepresentation of groups in training data. Facial recognition systems trained predominantly on lighter-skinned faces have significantly higher error rates on darker-skinned faces (documented in the Gender Shades study: gender classification error rates up to 34.7% for darker-skinned women versus 0.8% for lighter-skinned men).
Fairness Definitions and the Impossibility Theorem
Q: Explain the different definitions of algorithmic fairness.
Demographic Parity (Statistical Parity):
Positive prediction rates are equal across groups.
P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
Equalized Odds:
True positive rates AND false positive rates are equal across groups.
P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1) [TPR]
P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1) [FPR]
Equal Opportunity:
Only true positive rates are equal (relaxation of equalized odds).
P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1)
Calibration:
Among all predicted at 70% probability, 70% actually belong to the positive class — for all groups.
P(Y=1 | Ŷ=p, A=0) = P(Y=1 | Ŷ=p, A=1) = p
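Each definition above compares a small set of group-conditional rates, so the first three can be audited with a few lines of NumPy. A minimal sketch (the helper name and toy arrays are illustrative):

```python
import numpy as np

def fairness_report(y_true, y_pred, group):
    """Per-group rates behind the common fairness definitions."""
    out = {}
    for g in np.unique(group):
        m = group == g
        out[g] = {
            "pos_rate": y_pred[m].mean(),                 # demographic parity
            "tpr": y_pred[m][y_true[m] == 1].mean(),      # equal opportunity
            "fpr": y_pred[m][y_true[m] == 0].mean(),      # + TPR = equalized odds
        }
    return out

# Toy example: identical labels per group, different predictions
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
report = fairness_report(y_true, y_pred, group)
# Group 1 has both a higher positive rate (0.75 vs 0.25) and higher FPR
```

Demographic parity compares `pos_rate` across groups; equal opportunity compares `tpr`; equalized odds compares both `tpr` and `fpr`.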
Q: Why can’t you satisfy all fairness definitions simultaneously?
The Chouldechova (2017) impossibility theorem proves that calibration, equal false positive rates, and equal false negative rates cannot all hold simultaneously when base rates differ across groups (except in the degenerate case of a perfect predictor) — and base rates almost always differ in practice.
The COMPAS recidivism tool case: ProPublica found COMPAS violated equalized odds (Black defendants had higher false positive rates). Northpointe responded that COMPAS was calibrated. Both were correct — because base rates differ, you cannot have both.
The policy implication: Which fairness criterion to optimize is not a technical decision — it reflects a value judgment about which type of error is worse. Engineers must surface this trade-off, not resolve it unilaterally.
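The impossibility is easy to reproduce numerically: construct two groups whose scores are calibrated by construction but whose base rates differ, and a single threshold yields unequal error rates. A sketch (the score buckets and counts are made up for illustration):

```python
def error_rates(counts, threshold=0.5):
    """counts: {score: n_people}. Scores are calibrated by construction,
    so a fraction `score` of each bucket is truly positive."""
    fp = fn = pos = neg = 0.0
    for s, n in counts.items():
        n_pos = n * s          # expected positives in this score bucket
        n_neg = n - n_pos
        pos += n_pos
        neg += n_neg
        if s >= threshold:
            fp += n_neg        # predicted positive, actually negative
        else:
            fn += n_pos        # predicted negative, actually positive
    return fp / neg, fn / pos  # (FPR, FNR)

group_a = {0.2: 50, 0.8: 50}   # base rate 0.50
group_b = {0.2: 80, 0.8: 20}   # base rate 0.32
fpr_a, fnr_a = error_rates(group_a)   # (0.2, 0.2)
fpr_b, fnr_b = error_rates(group_b)   # (~0.059, 0.5)
```

Both groups are perfectly calibrated and share one threshold, yet the lower-base-rate group has a much lower FPR and a much higher FNR — exactly the COMPAS disagreement in miniature.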
Technical Interventions
```python
import numpy as np
from sklearn.metrics import roc_curve

# ── Pre-processing: Reweighting ───────────────────────────────────────────

def compute_reweighting_factors(y_true, sensitive_attr):
    """
    Reweight samples so that each (group, label) combination has equal
    representation in the effective training distribution.
    """
    weights = np.ones(len(y_true), dtype=float)
    groups = np.unique(sensitive_attr)
    labels = np.unique(y_true)
    for group in groups:
        for label in labels:
            mask = (sensitive_attr == group) & (y_true == label)
            if mask.sum() == 0:
                continue
            # Expected proportion if perfectly balanced
            expected = 1.0 / (len(groups) * len(labels))
            # Actual proportion
            actual = mask.sum() / len(y_true)
            weights[mask] = expected / actual
    return weights

# ── Post-processing: Threshold Adjustment ─────────────────────────────────

def equalize_opportunity(y_scores, y_true, sensitive_attr, target_tpr=0.8):
    """
    Find group-specific thresholds that achieve the same target TPR in
    every group (Equal Opportunity).
    """
    thresholds = {}
    for group in np.unique(sensitive_attr):
        mask = sensitive_attr == group
        # ROC curve for this group alone
        fpr, tpr, thresh = roc_curve(y_true[mask], y_scores[mask])
        # Threshold whose TPR is closest to the target
        idx = np.argmin(np.abs(tpr - target_tpr))
        thresholds[group] = thresh[idx]
        print(f"Group {group}: threshold={thresh[idx]:.3f}, TPR={tpr[idx]:.3f}")
    return thresholds

def apply_group_thresholds(y_scores, sensitive_attr, thresholds):
    """Apply group-specific thresholds to get binary predictions."""
    predictions = np.zeros(len(y_scores), dtype=int)
    for group, threshold in thresholds.items():
        mask = sensitive_attr == group
        predictions[mask] = (y_scores[mask] >= threshold).astype(int)
    return predictions
```
Intervention comparison:
| Stage | Method | Pros | Cons |
|---|---|---|---|
| Pre-processing | Reweighting, resampling, disparate impact removal | Model-agnostic; fixes representation | May not fully address proxy variables |
| In-processing | Fairness constraints in objective (adversarial debiasing, fairness regularization) | Directly optimizes fairness during training | Complex; may reduce accuracy |
| Post-processing | Group-specific thresholds, equalized odds post-processing | Simple; doesn’t require retraining | Requires access to sensitive attribute at serving time; may violate anti-discrimination law in some jurisdictions |
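The in-processing row can be made concrete with fairness regularization: add a penalty on the squared demographic-parity gap to the logistic log-loss and descend the combined gradient. A minimal NumPy sketch (the penalty form and hyperparameters are illustrative, not a production method):

```python
import numpy as np

def fair_logreg(x, y, a, lam=1.0, lr=0.1, steps=500):
    """Logistic regression with a demographic-parity penalty:
    loss = log-loss + lam * (mean score | a=0  -  mean score | a=1)^2
    Minimal gradient-descent sketch; assumes both groups are non-empty."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(x @ w + b)))
        # Per-sample gradient of the mean log-loss w.r.t. the logit
        g = (p - y) / len(y)
        # Demographic-parity gap and its gradient through the sigmoid
        gap = p[a == 0].mean() - p[a == 1].mean()
        d = np.where(a == 0, 1 / (a == 0).sum(), -1 / (a == 1).sum())
        g_fair = 2 * lam * gap * d * p * (1 - p)
        grad = g + g_fair
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum()
    return w, b
```

Raising `lam` trades accuracy for a smaller gap in mean predicted scores between groups — the accuracy cost noted in the table above.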
High-Stakes Applications
Q: How would you approach fairness for a hiring screening model?
- Don’t use features that are proxies for protected attributes: zip code correlates with race; name can reveal gender and ethnicity
- Audit output distributions: track selection rates by demographic group; flag adverse impact (4/5ths rule: selection rate for protected group should be at least 80% of highest-selecting group)
- Consider: should you build this at all? If the historical training data reflects discriminatory hiring, the model will replicate it regardless of interventions
- Human in the loop: for high-stakes decisions, require human review; model is a filter, not a decision-maker
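The 4/5ths-rule audit above reduces to a ratio of selection rates; a minimal sketch (the function name and counts are illustrative):

```python
import numpy as np

def adverse_impact_ratio(selected, group):
    """Each group's selection rate divided by the highest-selecting
    group's rate; values below 0.8 flag potential adverse impact
    under the 4/5ths rule of thumb."""
    selected = np.asarray(selected)
    group = np.asarray(group)
    rates = {g: selected[group == g].mean() for g in np.unique(group)}
    top = max(rates.values())
    return {g: r / top for g, r in rates.items()}

selected = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0,   # group A: 5/10 selected
            1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # group B: 3/10 selected
group = ["A"] * 10 + ["B"] * 10
ratios = adverse_impact_ratio(selected, group)
# Group B's ratio is 0.3 / 0.5 = 0.6 < 0.8: flag for review
```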
Q: What are the limitations of purely technical fairness interventions?
- Proxy variables: even after removing race/gender, models can learn equivalent proxies from correlated features
- Feedback loops: biased decisions generate biased future training data (predictive policing example)
- Definition conflicts: satisfying one fairness criterion may worsen another
- The ground truth problem: if the label itself is biased (arrest = crime), no technical fix solves the underlying problem
- Generalization: a fair model on test data may be unfair on out-of-distribution deployment data
Regulatory Context (2026)
- EU AI Act: High-risk AI systems (hiring, credit, law enforcement) require conformity assessments, transparency reports, and human oversight
- US: EEOC guidance extends disparate impact doctrine to algorithmic hiring tools; CFPB applies ECOA to credit scoring models
- Model cards: Standardized documentation of model performance across demographic groups; expected at Google, Meta, Microsoft for all production models
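The disaggregated-performance section of a model card can be generated mechanically from per-group metrics; a sketch (the table schema is illustrative, not any company's required format):

```python
import numpy as np

def model_card_table(y_true, y_pred, group):
    """Render per-group performance as a markdown table, the core
    disaggregated evaluation in a model card (schema illustrative).
    Assumes every group has both positive and negative examples."""
    rows = ["| Group | N | Accuracy | TPR | FPR |",
            "|---|---|---|---|---|"]
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        acc = (yt == yp).mean()
        tpr = yp[yt == 1].mean()
        fpr = yp[yt == 0].mean()
        rows.append(f"| {g} | {m.sum()} | {acc:.2f} | {tpr:.2f} | {fpr:.2f} |")
    return "\n".join(rows)

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0])
group  = np.array([0, 0, 1, 1])
table = model_card_table(y_true, y_pred, group)
```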
Depth Levels
Junior: Name types of bias, explain the fairness trade-off, describe one technical intervention.
Senior: Implement group-specific threshold adjustment, explain impossibility theorem, audit a model for disparate impact.
Staff: Design a fairness monitoring system for production, navigate stakeholder alignment on which fairness criterion to optimize, assess regulatory compliance requirements, advise on whether to build a high-risk application at all.
Related ML Topics
- Classification Metrics — fairness metrics (equal opportunity, equalized odds) are extensions of precision/recall applied across demographic groups
- Handling Imbalanced Datasets — class imbalance and group imbalance interact: a model can be accurate overall while severely underperforming for underrepresented groups
- How to Evaluate an LLM — LLM bias evaluation and red-teaming are part of the broader LLM evaluation framework; TruthfulQA and demographic bias benchmarks
- ML System Design: Build a Fraud Detection System — fraud models face regulatory fairness requirements (ECOA, CFPB); false positive rates must be audited across demographic groups
- How Does RLHF Work? — RLHF and Constitutional AI are technical approaches to aligning LLMs with human values; related to but distinct from bias mitigation