Guide

Playbook: Executing SR 11-7 Model Validation Using Synthetic Data

Published May 8, 2026

SR 11-7 has been the federal banking model-risk framework since 2011 and the de-facto standard well beyond its banking origin — large RIAs, RIAs that custody at OCC-supervised banks, and increasingly any AI-using fintech adopt the SR 11-7 frame to demonstrate model governance to enterprise prospects. The framework's three pillars — conceptual soundness, ongoing monitoring, outcomes analysis — each require evidence the validation team can independently produce. This playbook covers how to execute each pillar using a synthetic-data corpus, with the artifacts the model risk committee or examiner consumes at the end.

Independence: the validation team's data isn't the development team's data

SR 11-7 emphasizes effective challenge, which depends on the validation team being independent of the development team. A common gap: the validation team reuses the development team's training data, so validation can only rediscover the defects that data already exposes and misses the defects it hides.

Using a synthetic-data corpus that the validation team controls — independent from the development data — is one effective-challenge pattern. The corpus structurally tests scenarios the development team may not have considered. For wealth-tech, this often catches edge cases like ITIN filers, multi-state residents, and K-1 income recipients whose absence from training data hides model defects.
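One way the validation team might confirm that its corpus really contains the edge-case segments named above is a simple coverage count. The sketch below is illustrative only: the household fields (`taxIdType`, `statesOfResidence`, `incomeSources`) and the minimum-count cutoff are assumptions, not part of any real schema.

```javascript
// Hypothetical household shape: { id, taxIdType, statesOfResidence, incomeSources }.
// Segment predicates mirror the edge cases named above; adapt them to your own schema.
const segments = {
  itinFiler: h => h.taxIdType === 'ITIN',
  multiState: h => h.statesOfResidence.length > 1,
  k1Income: h => h.incomeSources.includes('K-1'),
};

// Count how many corpus households fall into each segment.
function segmentCoverage(corpus) {
  const counts = {};
  for (const [name, matches] of Object.entries(segments)) {
    counts[name] = corpus.filter(matches).length;
  }
  return counts;
}

// List the segments with fewer than `minCount` households (cutoff is illustrative).
function underrepresented(corpus, minCount) {
  return Object.entries(segmentCoverage(corpus))
    .filter(([, count]) => count < minCount)
    .map(([name]) => name);
}

const sampleCorpus = [
  { id: 'h1', taxIdType: 'SSN', statesOfResidence: ['CA'], incomeSources: ['W-2'] },
  { id: 'h2', taxIdType: 'ITIN', statesOfResidence: ['NY', 'NJ'], incomeSources: ['W-2'] },
  { id: 'h3', taxIdType: 'SSN', statesOfResidence: ['TX'], incomeSources: ['K-1', 'W-2'] },
];

console.log(segmentCoverage(sampleCorpus));
console.log(underrepresented(sampleCorpus, 2));
```

Running the same count against the development data, where the validation team has read access to it, shows which segments the corpus adds.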

Pillar 1 — Conceptual soundness

Conceptual soundness asks: is the model designed to do what it claims to do? The validation must include: documentation of the model's theoretical underpinnings, an evaluation of the assumptions, an evaluation of the data inputs, and an assessment of the modeling methodology.

The synthetic corpus contributes here by providing a counterfactual test bed. For each major assumption in the model, construct a synthetic scenario that violates the assumption and observe model behavior. If the model fails gracefully, the assumption is bounded; if the model produces nonsense, the assumption is load-bearing and the documentation must reflect that.

  • Assumption inventory: list every assumption the model makes
  • For each assumption, identify a synthetic scenario that violates it
  • Run the model on the violating scenario; document behavior
  • Classify each assumption: graceful-degradation vs. load-bearing
  • Document load-bearing assumptions with explicit user-facing constraints
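The steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions: `model.score` is taken to return an object with a `score` in [0, 1] or to throw when it rejects an input, and the toy model and assumption names are hypothetical. An in-range score can still be wrong, so the classification is a first pass for a reviewer, not a verdict.

```javascript
// Hypothetical harness: each assumption carries a scenario that violates it.
function classifyAssumption(model, assumption) {
  try {
    const output = model.score(assumption.violatingScenario);
    const sane = Number.isFinite(output.score) && output.score >= 0 && output.score <= 1;
    // A finite, in-range score suggests graceful degradation;
    // NaN, Infinity, or an out-of-range score suggests the assumption is load-bearing.
    return {
      name: assumption.name,
      classification: sane ? 'graceful-degradation' : 'load-bearing',
      output,
    };
  } catch (err) {
    // An explicit rejection is treated as graceful: the model refused rather than guessed.
    return { name: assumption.name, classification: 'graceful-degradation', rejected: err.message };
  }
}

// Toy model for illustration only.
const toyModel = {
  score(h) {
    if (h.annualIncome < 0) throw new Error('annualIncome must be non-negative');
    return { score: h.liquidAssets / h.annualIncome }; // unguarded division when income is 0
  },
};

const inventory = [
  { name: 'income is non-negative', violatingScenario: { annualIncome: -1, liquidAssets: 10 } },
  { name: 'income is non-zero', violatingScenario: { annualIncome: 0, liquidAssets: 10 } },
];

const assumptionResults = inventory.map(a => classifyAssumption(toyModel, a));
console.log(assumptionResults.map(r => `${r.name}: ${r.classification}`));
```

Here the zero-income scenario yields a non-finite score, so that assumption would be documented as load-bearing with a user-facing constraint.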

Pillar 2 — Ongoing monitoring

Ongoing monitoring tests whether the model continues to perform as expected post-deployment. The validation must include: defined performance metrics, defined thresholds, monitoring frequency, and escalation procedures.

The synthetic corpus contributes by providing the baseline against which production drift is measured. Periodically (typically quarterly) run the model against the same synthetic corpus and compare results to baseline. Drift in the model's outputs against the constant corpus indicates the model itself has drifted or its dependencies (libraries, infrastructure, data pipelines) have changed.

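// Quarterly drift check (sketch). Assumes the caller supplies `corpus` (an array of
// households with an `id` field), `model.score(household)` returning { score },
// and the `flag` and `escalateToModelRiskCommittee` helpers.
const fs = require('fs');

const DRIFT_THRESHOLD = 0.01;      // per-household tolerance; placeholder value, set per model
const POPULATION_THRESHOLD = 0.05; // share of drifted households that triggers escalation

// baseline_outputs.json maps householdId -> output recorded at the baseline run
const baseline = JSON.parse(fs.readFileSync('baseline_outputs.json', 'utf8'));
// Key current outputs by household id so they line up with the baseline
const current = Object.fromEntries(corpus.map(h => [h.id, model.score(h)]));

let driftCount = 0;
for (const [householdId, baselineOutput] of Object.entries(baseline)) {
  const currentOutput = current[householdId];
  // A household missing from the current run counts as drift
  const drift = currentOutput ? Math.abs(currentOutput.score - baselineOutput.score) : Infinity;
  if (drift > DRIFT_THRESHOLD) {
    driftCount += 1;
    flag('drift', householdId, baselineOutput, currentOutput);
  }
}

// If the share of drifted households exceeds the population threshold, escalate
if (driftCount / corpus.length > POPULATION_THRESHOLD) {
  escalateToModelRiskCommittee();
}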
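The drift check is only meaningful if the corpus really is constant between runs. One possible safeguard, sketched below, is to store a fingerprint of the corpus alongside the baseline outputs and refuse to compare when it changes. The function names are hypothetical, and the baseline shape here (outputs nested under an `outputs` key) is an assumption that differs from the flat `baseline_outputs.json` map used above.

```javascript
const { createHash } = require('crypto');

// Fingerprint the corpus so a later run can confirm it is the same input.
// Assumes households serialize deterministically (same key order on every run).
function corpusFingerprint(corpus) {
  return createHash('sha256').update(JSON.stringify(corpus)).digest('hex');
}

// Record baseline outputs keyed by household id, plus the corpus fingerprint.
function captureBaseline(corpus, model) {
  return {
    fingerprint: corpusFingerprint(corpus),
    outputs: Object.fromEntries(corpus.map(h => [h.id, model.score(h)])),
  };
}

// Refuse to compare against a baseline produced from a different corpus.
function assertSameCorpus(baseline, corpus) {
  if (baseline.fingerprint !== corpusFingerprint(corpus)) {
    throw new Error('Corpus changed since baseline; drift results would not be comparable');
  }
}

// Toy corpus and model for illustration only.
const demoCorpus = [{ id: 'h1', income: 50000 }, { id: 'h2', income: 120000 }];
const demoModel = { score: h => ({ score: h.income / 200000 }) };

const demoBaseline = captureBaseline(demoCorpus, demoModel);
assertSameCorpus(demoBaseline, demoCorpus); // passes: corpus unchanged
```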

Pillar 3 — Outcomes analysis

Outcomes analysis evaluates the model's predictions against actual outcomes. The validation must include: backtesting against known outcomes, sensitivity analysis, and benchmarking against alternative models.

For models that lack rich actual-outcome data (a new credit model, a new fraud detection model), the synthetic corpus provides synthetic outcomes against which to backtest. The synthetic outcomes must be plausible and structurally consistent — a corpus that includes a 'fraud' label per household, with the label structurally tied to features that should drive the prediction, allows backtesting before real-world deployment.
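A backtest against synthetic labels can be as simple as a confusion matrix at a chosen cutoff. The sketch below assumes each synthetic household carries a boolean `fraud` label and that `model.score` returns `{ score }` where higher means more likely fraud; the `riskSignal` field, the toy model, and the cutoff are all hypothetical.

```javascript
// Backtest predictions against synthetic ground-truth labels.
function backtest(corpus, model, cutoff) {
  const m = { tp: 0, fp: 0, tn: 0, fn: 0 };
  for (const h of corpus) {
    const predicted = model.score(h).score >= cutoff;
    if (predicted && h.fraud) m.tp += 1;
    else if (predicted && !h.fraud) m.fp += 1;
    else if (!predicted && h.fraud) m.fn += 1;
    else m.tn += 1;
  }
  // Precision and recall are null when their denominator is zero.
  const precision = m.tp + m.fp > 0 ? m.tp / (m.tp + m.fp) : null;
  const recall = m.tp + m.fn > 0 ? m.tp / (m.tp + m.fn) : null;
  return { ...m, precision, recall };
}

// Toy labeled corpus and model for illustration only.
const labeledCorpus = [
  { id: 'h1', fraud: true, riskSignal: 0.9 },
  { id: 'h2', fraud: true, riskSignal: 0.2 },
  { id: 'h3', fraud: false, riskSignal: 0.7 },
  { id: 'h4', fraud: false, riskSignal: 0.1 },
];
const signalModel = { score: h => ({ score: h.riskSignal }) };

const backtestResult = backtest(labeledCorpus, signalModel, 0.5);
console.log(backtestResult);
```

Because the labels are synthetic, these figures say how well the model recovers the structure built into the corpus, not how it will perform on real outcomes.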

Effective challenge: the validation team's adversarial role

SR 11-7's effective-challenge principle says the validation team should attempt to find the model's failure modes, not merely confirm what the development team already showed. Synthetic data makes this tractable.

The pattern: the validation team identifies hypotheses about the model's failure modes and constructs synthetic scenarios to test them. Examples for wealth-tech: 'the model fails on households with K-1 income because the training data underrepresented K-1s,' 'the model fails on multi-state residents because the training data was single-state,' 'the model produces inconsistent recommendations for matched pairs differing only in protected class.'

For each hypothesis, construct synthetic households that exercise it and observe model behavior. Document hypotheses confirmed and refuted. The validation report includes both — confirmed failures (with required remediation) and refuted hypotheses (which strengthen the validation conclusion).
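The matched-pair hypothesis lends itself to a short test harness. This is a sketch under assumptions: `model.score` returns `{ score }`, the attribute name and values are supplied by the validator, and the tolerance and toy models are hypothetical.

```javascript
// Score two households that differ only in one attribute and return the gap.
function matchedPairGap(model, baseHousehold, attribute, valueA, valueB) {
  const a = model.score({ ...baseHousehold, [attribute]: valueA }).score;
  const b = model.score({ ...baseHousehold, [attribute]: valueB }).score;
  return Math.abs(a - b);
}

// The hypothesis "the model is inconsistent across matched pairs" is confirmed
// when any pair's gap exceeds the tolerance, and refuted otherwise.
function testInconsistencyHypothesis(model, households, attribute, valueA, valueB, tolerance) {
  const failures = households
    .map(h => ({ id: h.id, gap: matchedPairGap(model, h, attribute, valueA, valueB) }))
    .filter(r => r.gap > tolerance);
  return { status: failures.length > 0 ? 'confirmed' : 'refuted', failures };
}

// Toy models for illustration only: one leaks the attribute into the score, one does not.
const leakyModel = { score: h => ({ score: h.income / 100000 + (h.group === 'B' ? 0.1 : 0) }) };
const neutralModel = { score: h => ({ score: h.income / 100000 }) };

const pairHouseholds = [{ id: 'h1', income: 50000 }, { id: 'h2', income: 80000 }];

const leakyFinding = testInconsistencyHypothesis(leakyModel, pairHouseholds, 'group', 'A', 'B', 0.01);
const neutralFinding = testInconsistencyHypothesis(neutralModel, pairHouseholds, 'group', 'A', 'B', 0.01);
console.log(leakyFinding.status, neutralFinding.status);
```

Both results belong in the validation report: the confirmed finding with its failing household ids, and the refuted one as evidence the hypothesis was tested.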

Documentation: the validation report structure

The validation report has a standard structure that examiners and model risk committees expect. Following this structure shortens review time.

  1. Executive summary — model purpose, validation scope, key findings, validation conclusion
  2. Model description — methodology, assumptions, data inputs, intended use
  3. Validation methodology — what was tested, how, with what data (including synthetic corpus citation)
  4. Conceptual soundness findings — assumption testing, methodology evaluation
  5. Ongoing monitoring framework — metrics, thresholds, frequency, escalation
  6. Outcomes analysis — backtesting results, sensitivity analysis, benchmark comparisons
  7. Effective challenge findings — hypotheses tested and outcomes
  8. Limitations and constraints — what the validation did NOT cover
  9. Recommendations — required, suggested, observational
  10. Validation conclusion — approved / approved-with-conditions / not-approved

Validation refresh cadence

Annual revalidation is the standard for production models. Trigger events that force earlier revalidation: material model change, material data input change, performance threshold breach in monitoring, regulatory guidance change, business-context change (new product line, new customer segment).

The revalidation can be lighter than the initial validation if no triggers fired: a focused review of monitoring results, a re-run of the synthetic corpus, and a refresh of the documentation. If triggers fired, the depth of the revalidation must match the materiality of the change.
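The cadence and trigger logic can be expressed as a small planning function. This is an illustrative sketch: the trigger identifiers and the depth labels (`light`, `matched-to-change`) are hypothetical names for the concepts described above.

```javascript
// Trigger names follow the list above; the identifiers themselves are hypothetical.
const TRIGGERS = [
  'materialModelChange',
  'materialDataInputChange',
  'monitoringThresholdBreach',
  'regulatoryGuidanceChange',
  'businessContextChange',
];

// Decide whether revalidation is due and how deep it should go.
// `lastValidated` and `today` are Date objects; `firedTriggers` is an array of trigger names.
function planRevalidation(lastValidated, today, firedTriggers) {
  const unknown = firedTriggers.filter(t => !TRIGGERS.includes(t));
  if (unknown.length > 0) throw new Error(`Unknown trigger(s): ${unknown.join(', ')}`);

  // Annual cadence: due one year after the last validation.
  const annualDue = new Date(lastValidated);
  annualDue.setUTCFullYear(annualDue.getUTCFullYear() + 1);

  if (firedTriggers.length > 0) {
    // A fired trigger forces early revalidation scoped to the change.
    return { due: true, depth: 'matched-to-change', reason: firedTriggers };
  }
  if (today >= annualDue) {
    return { due: true, depth: 'light', reason: ['annualCadence'] };
  }
  return { due: false, depth: null, reason: [] };
}

const lastValidated = new Date('2025-01-15T00:00:00Z');
console.log(planRevalidation(lastValidated, new Date('2025-06-01T00:00:00Z'), []));
console.log(planRevalidation(lastValidated, new Date('2025-06-01T00:00:00Z'), ['materialModelChange']));
```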

Key takeaways

  • Validation team independence is operationalized through validation-team-controlled synthetic corpora that the development team didn't see during model development.
  • Each SR 11-7 pillar — conceptual soundness, ongoing monitoring, outcomes analysis — has a synthetic-data testing pattern that produces evidence the model risk committee and examiners consume.
  • Effective challenge is the underrated principle. Construct hypotheses about model failure modes and test each with targeted synthetic scenarios. Document both confirmed and refuted hypotheses.
  • Quarterly drift testing against a constant synthetic baseline is the cheapest signal for whether the model or its dependencies have drifted post-deployment.

FAQ

Is SR 11-7 mandatory for non-bank fintechs?

Not mandatory by federal banking rule, but increasingly expected by enterprise customers, custodian banks, and (in adapted form) by RIA examiners. Adopting the framework even when not strictly required de-risks future regulatory developments and shortens enterprise-prospect security reviews.

How does this interact with the OCC's AI bulletin and CFPB's adverse-action guidance?

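Both add specifics for AI-driven decisions, though from different starting points: OCC guidance sits within the same model-risk frame (the OCC issued the SR 11-7 text as Bulletin 2011-12), while the CFPB's adverse-action guidance rests on ECOA and Regulation B. The validation playbook in SR 11-7 is the foundation; the AI / adverse-action specifics are layers on top. A validation that satisfies SR 11-7 is the prerequisite for satisfying the layered guidance.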

What size validation team is realistic for this?

For mid-stage fintechs, often 1-2 dedicated validators plus access to subject-matter experts on demand. The validation work is bounded — not every model needs full annual validation; tier the models by risk and apply proportional validation rigor.

How do we handle proprietary vendor models that we use but don't develop?

Vendor model validation is explicitly contemplated in SR 11-7. The validation must cover a review of the vendor's documentation, a review of the vendor's validation evidence, your own testing on representative data, and ongoing monitoring of the vendor's outputs. Vendors should provide model documentation that maps to SR 11-7's framing.

What if the synthetic corpus we use doesn't perfectly match the production population?

Document the gaps and qualify the validation conclusion accordingly. The conditions attached to the conclusion should then include production monitoring of input drift, to detect when production data moves into territory the synthetic corpus didn't cover.
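One simple form of input-drift monitoring is to check how much of production falls outside the numeric ranges the corpus covered. The sketch below assumes numeric features and uses hypothetical field names; a real check would likely also cover categorical values and joint distributions.

```javascript
// Build per-feature [min, max] ranges from the synthetic corpus.
// Uses spread with Math.min/Math.max, which suits modest corpus sizes.
function featureRanges(corpus, features) {
  const ranges = {};
  for (const f of features) {
    const values = corpus.map(h => h[f]);
    ranges[f] = { min: Math.min(...values), max: Math.max(...values) };
  }
  return ranges;
}

// Share of production records with at least one feature outside the corpus range.
function outOfCoverageShare(productionRecords, ranges) {
  if (productionRecords.length === 0) return 0;
  const outside = productionRecords.filter(r =>
    Object.entries(ranges).some(([f, { min, max }]) => r[f] < min || r[f] > max)
  );
  return outside.length / productionRecords.length;
}

// Toy data for illustration only.
const coverageCorpus = [
  { age: 30, income: 50000 },
  { age: 60, income: 200000 },
];
const productionSample = [
  { age: 35, income: 60000 },
  { age: 45, income: 150000 },
  { age: 59, income: 199000 },
  { age: 72, income: 90000 }, // age falls outside the corpus range
];

const corpusRanges = featureRanges(coverageCorpus, ['age', 'income']);
console.log(outOfCoverageShare(productionSample, corpusRanges));
```

A rising out-of-coverage share is a signal to extend the corpus or narrow the conditions attached to the validation conclusion.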

How is this different from other model-risk frameworks (FRTB, IFRS 9, etc.)?

Other frameworks are domain-specific (market risk, credit risk reporting). SR 11-7 is general-purpose model risk and applies across domains. The execution patterns translate; the specific tests differ.

Can the synthetic corpus replace internal-data testing entirely?

No — synthetic complements internal data, doesn't replace it. Internal data shows actual production behavior; synthetic shows expected production behavior under designed scenarios. Both are needed for a complete picture.