SR 11-7 model risk management with synthetic data — what bank examiners expect

WealthSchema Staff | Compliance & legal | May 8, 2026 | 6 min read

The Federal Reserve's Supervision and Regulation Letter SR 11-7, issued in 2011 and still the controlling document on model risk management for US banks, was written before "synthetic data" was a category anyone in banking used. The letter doesn't reference it. The OCC's parallel guidance in OCC 2011-12 doesn't either. As of 2026, both documents remain current; neither has been updated for the rise of generative AI or the use of synthetic data in model development and validation.

That regulatory silence is not the same thing as regulatory permission. Bank examiners ask about synthetic data in MRM exams routinely. The burden is on the bank to articulate how synthetic data fits into the model-risk-management framework defined by SR 11-7 — and the institutions that have done the work in advance get through their exams without findings, while the institutions that improvise during the exam tend to draw matters-requiring-attention or worse.

This article lays out a framework for doing that work in advance.

What SR 11-7 actually requires

SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." A bank's MRM program must cover the full model lifecycle: development, validation, implementation, ongoing monitoring, and governance.

The letter identifies three components of effective model risk management:

Robust model development, implementation, and use. Models must be built on sound theory, with attention to data quality, and with documentation that allows independent review.
Sound model validation. Validation must be independent of the model developer and must cover conceptual soundness, ongoing monitoring, and outcomes analysis.
Strong governance, policies, and controls. This includes board oversight, a model inventory, and a model-risk-management policy.

Synthetic data touches each of the three components. Each one has specific examiner concerns when synthetic data is involved.

Where synthetic data fits in the model lifecycle

Synthetic data appears in four distinct places in a bank MRM program. Each has different examiner expectations.

Use of synthetic data	SR 11-7 dimension	Examiner concerns
Training data for model development	Direct input to the model; affects all dimensions	Data quality, bias inheritance, distributional fidelity to production
Test data for model validation	Validation evidence, outcomes analysis	Coverage of edge cases, independence from training data, distributional realism
Stress test scenarios	Outcomes analysis, scenario design	Plausibility of stress scenarios, calibration to real stress events
Sandbox / pre-production environments	Implementation, governance	Production-environment fidelity, controls preventing leakage of synthetic data into production decisions

The first usage — synthetic data as training input — is the highest-risk and gets the most examiner attention. The other three are progressively lower-risk and easier to defend.

The data quality concern

Section IV of SR 11-7 (Model Development, Implementation, and Use) is explicit about data quality: "The data and other information used to develop a model are of critical importance; there should be rigorous assessment of data quality and relevance, and appropriate documentation." A model built on poor-quality data is itself a poor-quality model, regardless of how good the math is.

For synthetic data, examiners will ask three questions about data quality:

Is the synthetic dataset distributionally representative of the population the model will see in production? A consumer-credit model trained on synthetic data calibrated to a 2019 economic environment is likely to underperform on a 2024 production population. The bank must demonstrate alignment between the synthetic distribution and the expected production distribution.
Are edge cases and tail behaviors adequately represented? Stress scenarios, regime changes, and atypical borrower profiles all matter for model robustness. A synthetic dataset that smooths these out has hidden a category of model risk.
What is the provenance of the synthetic data? Who generated it, with what methodology, against what reference distributions, and with what validation? Black-box synthetic data fails this question; archetype-driven generation against documented public sources passes it.
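The first two questions can be made concrete with a binned comparison. The sketch below is illustrative only: it assumes one numeric attribute, hand-picked bin edges, and KL divergence measured from production to synthetic, none of which SR 11-7 prescribes. The function names and sample values are hypothetical.

```python
import bisect
import math

def binned_shares(values, edges):
    """Share of values in each bin [edges[i], edges[i+1]).

    Values outside the edge range are clamped into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(edges) - 2)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) in nats; eps smoothing keeps empty bins finite."""
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_n = [x / sum(p_s) for x in p_s]
    q_n = [x / sum(q_s) for x in q_s]
    return sum(a * math.log(a / b) for a, b in zip(p_n, q_n))

# Hypothetical annual income samples, in thousands of dollars.
production_income = [32, 41, 48, 55, 63, 70, 88, 120]
synthetic_income = [35, 42, 45, 50, 52, 58, 61, 70]  # no high-income tail
edges = [0, 40, 60, 80, 200]

fit = kl_divergence(
    binned_shares(production_income, edges),
    binned_shares(synthetic_income, edges),
)
print(f"KL(production || synthetic) = {fit:.3f}")
```

Because the synthetic sample has no records in the top bin, the divergence is dominated by the missing tail, which is the kind of smoothing the second question is probing for.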

The validation independence concern

SR 11-7 requires that model validation be independent of model development. The standard pattern is that the development team builds the model and the validation team runs an independent battery of tests against it, often using a different dataset.

When synthetic data is involved, examiners ask whether the validation dataset is genuinely independent of the training dataset. If both came from the same generator, run with the same random seeds, calibrated to the same reference distributions, the validation dataset is not independent — it is a sample from the same distribution as the training set, and validation against it tests the model's fit to the distribution rather than its ability to generalize.

The fix is to require synthetic validation datasets to be generated with at least one of: (a) different random seeds, (b) different archetype mixes, (c) different reference distributions, (d) different generation methodologies. The independence dimension matters; documentation of which dimension was varied matters more.

Formula

The independence test

independence(train, validate) = max(seed_independence, mix_independence, reference_independence, methodology_independence)

seed_independence = 1.0 if validation was generated with seeds disjoint from training; 0 if the same seeds were used or seeds were re-used with a simple offset
mix_independence = 1.0 if the validation archetype mix differs materially from training; 0 if identical
reference_independence = 1.0 if validation was calibrated to a different reference distribution (different vintage, different geography, different segment); 0 otherwise
methodology_independence = 1.0 if validation was generated by a different methodology entirely (e.g., training was hybrid LLM, validation is rule-based); 0 otherwise

At least one component must be 1.0 for the validation to count as independent under SR 11-7's standard. Examiners will accept multiple weaker forms of independence; they will not accept zero independence on the grounds that 'both datasets are synthetic.'
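A minimal sketch of the test in code, under stated assumptions: `DatasetSpec` and its fields are hypothetical, "differs materially" is approximated by a total variation distance above a 0.05 tolerance, and the offset check only catches equal-sized seed lists shifted by one constant. A real program would set these thresholds in its MRM policy.

```python
from dataclasses import dataclass

@dataclass
class DatasetSpec:
    seeds: tuple          # random seeds used by the generator
    archetype_mix: dict   # archetype name -> share of records
    reference: str        # label of the reference distribution
    methodology: str      # e.g. "hybrid-llm", "rule-based"

def seed_independence(train, validate):
    a, b = sorted(set(train.seeds)), sorted(set(validate.seeds))
    if not a or not b or set(a) & set(b):
        return 0.0  # unknown or overlapping seeds
    if len(a) == len(b) >= 2 and len({y - x for x, y in zip(a, b)}) == 1:
        return 0.0  # same seeds re-used with a constant offset
    return 1.0

def mix_independence(train, validate, tolerance=0.05):
    keys = set(train.archetype_mix) | set(validate.archetype_mix)
    tvd = 0.5 * sum(
        abs(train.archetype_mix.get(k, 0.0) - validate.archetype_mix.get(k, 0.0))
        for k in keys
    )
    return 1.0 if tvd > tolerance else 0.0

def independence(train, validate):
    """Return (overall score, per-component scores) for documentation."""
    components = {
        "seed": seed_independence(train, validate),
        "mix": mix_independence(train, validate),
        "reference": 1.0 if train.reference != validate.reference else 0.0,
        "methodology": 1.0 if train.methodology != validate.methodology else 0.0,
    }
    return max(components.values()), components

train = DatasetSpec((1, 2, 3), {"prime": 0.7, "subprime": 0.3}, "US-2023", "hybrid-llm")
offset = DatasetSpec((101, 102, 103), {"prime": 0.7, "subprime": 0.3}, "US-2023", "hybrid-llm")
fresh = DatasetSpec((7, 19, 42), {"prime": 0.7, "subprime": 0.3}, "US-2023", "hybrid-llm")

print(independence(train, offset))
print(independence(train, fresh))
```

Returning the per-component scores alongside the maximum gives the validation team a record of which dimension was varied, which is the part the documentation needs.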

The documentation pattern that works at exam

The institutions that get through MRM exams cleanly have prepared a synthetic-data-specific addendum to their model documentation. The pattern:

Section 1: Synthetic data inventory
List every synthetic dataset used in model development or validation, with vendor, version, generation date, archetype mix, and reference distribution.
Section 2: Provenance and methodology
For each dataset, the generation methodology (rule-based, LLM, hybrid, GAN), the source data (public aggregates, licensed real data, internal records), and the validation battery applied to the synthetic dataset itself.
Section 3: Distributional fit assessment
Comparison of synthetic dataset distributions against expected production distributions across at least 5 dimensions (demographic, geographic, financial, temporal, regulatory). KL divergence or equivalent metric, with interpretation.
Section 4: Edge-case coverage
Inventory of edge cases the dataset is intended to cover, with population frequencies and the rationale for each. Examiners increasingly ask for this section explicitly for AI/ML models.
Section 5: Validation independence
Documentation of how validation datasets are independent from training datasets: random seeds, archetype mixes, reference distributions, or generation methodologies, whichever dimensions were used.
Section 6: Ongoing monitoring
Plan for re-validating synthetic datasets when underlying populations drift. Triggers for refresh, comparison metrics, and the model-rebuild conditions.

The addendum runs 15–25 pages for a meaningful model. It is referenced in the model documentation but maintained separately so that synthetic-data updates can be made without re-issuing the full model documentation.
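Keeping the inventory as structured records makes gaps visible before an examiner finds them. The sketch below is one possible shape, not a regulatory template: the class name, the `used_for` field, and the sample values are hypothetical, and the remaining fields simply mirror the Section 1 list.

```python
import json
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Optional

@dataclass
class SyntheticDatasetRecord:
    name: str
    used_for: str                              # "development" or "validation"
    vendor: Optional[str] = None
    version: Optional[str] = None
    generation_date: Optional[date] = None
    archetype_mix: Optional[dict] = None       # archetype name -> share
    reference_distribution: Optional[str] = None

def missing_fields(record):
    """Names of inventory fields that are still empty for this record."""
    return [
        f.name for f in fields(record)
        if getattr(record, f.name) in (None, "", {})
    ]

def to_json(record):
    """Serialize one record; dates are written as ISO strings."""
    return json.dumps(asdict(record), default=str, indent=2)

record = SyntheticDatasetRecord(
    name="consumer-credit-train",
    used_for="development",
    vendor="ExampleVendor",                    # hypothetical vendor name
    version="2.1",
    generation_date=date(2026, 3, 1),
    archetype_mix={"prime": 0.7, "subprime": 0.3},
)
print(missing_fields(record))
print(to_json(record))
```

A record with a non-empty `missing_fields` result is a candidate for follow-up with the vendor before the addendum is finalized.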

What examiners increasingly ask about

The state of the art in MRM exams has been evolving. Examiner questions that are increasingly common in 2025–2026:

"How did you decide between training on real data and training on synthetic data?" The implicit standard is that synthetic should be used when real data has known quality issues (privacy, bias, scale) and real should be used otherwise. Banks that default to synthetic without articulating the trade-off draw scrutiny.
"What's your fallback if the synthetic data vendor goes out of business or changes their methodology?" Operational-risk concern. Banks with single-vendor synthetic data dependencies for material models are increasingly being asked to maintain rebuild procedures.
"How do you detect when the synthetic distribution has drifted from the production distribution?" The monitoring question. The expected answer is a dashboard of population-level statistics with thresholds that trigger re-validation.
"Is the synthetic data used in any production decision path, even indirectly?" The implementation question. Synthetic data should never reach a real customer's decision path. Banks need controls that prevent this and documentation showing the controls work.
"What's the role of synthetic data in your stress-test program?" Increasingly, regulators expect banks to use synthetic stress scenarios alongside or in place of historical scenarios, especially for tail events that are not well-represented in real data. The DFAST and CCAR programs have started accepting synthetic stress data with appropriate documentation.

What can go wrong at exam

The most common findings on synthetic-data-using MRM programs:

Insufficient provenance documentation. "Vendor's pipeline is described as 'proprietary' in the model documentation." This reliably draws a matter requiring attention (MRA). The fix is to require vendors to provide architectural documentation as a procurement condition.
Lack of distributional fit assessment. "Bank does not document the alignment between synthetic training distribution and expected production distribution." This drives a finding even when the alignment is in fact good. The fix is the explicit Section 3 assessment.
No edge-case coverage analysis. "Bank cannot demonstrate that the model has been tested against the edge cases relevant to the production environment." This finding has become increasingly common since 2025. The fix is the explicit Section 4 inventory.
Validation set provenance unclear. "Validation dataset and training dataset both came from the same vendor and the bank cannot demonstrate independence." This drives a finding on validation methodology. The fix is the explicit Section 5 documentation.

The findings are correctable but slow to remediate. Pre-exam preparation is almost always cheaper than post-exam remediation.

Key takeaways

SR 11-7 doesn't mention synthetic data, but the framework applies — and the burden is on the bank to articulate how synthetic data fits each MRM dimension.
Data quality, validation independence, edge-case coverage, and provenance are the four examiner concerns that come up most often.
A 15–25 page synthetic-data addendum to model documentation, organized by the six sections above, holds up at exam better than ad-hoc explanations.
Validation datasets must be genuinely independent from training datasets — same vendor, same seeds is not independent. At least one of seed, mix, reference, or methodology must vary.
Examiners are getting more sophisticated about synthetic data. Banks that prepare in advance get through; banks that improvise during the exam often don't.

Frequently asked questions

Do non-bank financial institutions (RIAs, broker-dealers, insurers) face equivalent requirements?

RIAs face SEC Rule 206(4)-7 compliance program requirements that are less prescriptive than SR 11-7 but functionally similar for material models. Broker-dealers face FINRA supervision rules. Insurers face NAIC model regulations and state insurance department exams. None has the SR 11-7 specificity, but all expect institutions to be able to defend their data and methodology choices for models that materially affect customer outcomes. The SR 11-7 framework is a useful template for non-bank institutions even when it doesn't formally apply.

How does this interact with the EU AI Act for institutions that operate in the EU?

The EU AI Act classifies most credit-decision and risk-scoring models as 'high-risk AI systems' subject to data governance, documentation, transparency, and human-oversight requirements. The data governance requirements in Article 10 specifically address training data quality and bias — which synthetic data with controlled distributions addresses well. The documentation requirements in Article 11 are similar in spirit to SR 11-7's documentation requirements; an institution with strong SR 11-7 documentation usually doesn't have to start from scratch for the AI Act.

What's the right cadence for re-validating synthetic datasets?

We recommend annual re-validation against current production distributions, with triggers for off-cycle re-validation when significant drift is detected. The annual cadence aligns with most banks' annual model review cycles. Off-cycle triggers should be based on population-level statistics (demographic shifts, regulatory changes, product changes) and should be explicit in the monitoring documentation.
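One way to make the triggers explicit is to compute a drift score per population-level statistic and compare it with a documented threshold. The sketch below assumes the population stability index (PSI) as the drift metric and 0.25 as the off-cycle trigger; both are common industry conventions rather than anything SR 11-7 specifies, and the function names and sample shares are hypothetical.

```python
import math
from datetime import date, timedelta

def psi(reference_shares, current_shares, eps=1e-6):
    """Population stability index between two binned share vectors."""
    return sum(
        (c - r) * math.log((c + eps) / (r + eps))
        for r, c in zip(reference_shares, current_shares)
    )

def revalidation_reason(last_validated, today, psi_by_metric,
                        psi_trigger=0.25, cycle_days=365):
    """Return why re-validation is due, or None if it is not."""
    drifted = sorted(m for m, v in psi_by_metric.items() if v >= psi_trigger)
    if drifted:
        return "off-cycle: drift in " + ", ".join(drifted)
    if today - last_validated >= timedelta(days=cycle_days):
        return "annual cycle"
    return None

# Binned shares for two monitored statistics (illustrative values).
synthetic = {"fico_band": [0.2, 0.4, 0.4], "region": [0.25, 0.25, 0.5]}
production = {"fico_band": [0.45, 0.4, 0.15], "region": [0.25, 0.25, 0.5]}

scores = {m: psi(synthetic[m], production[m]) for m in synthetic}
print(revalidation_reason(date(2026, 1, 15), date(2026, 5, 8), scores))
```

Writing the threshold and the cycle length as named parameters keeps them in one place, so the monitoring documentation and the code can cite the same values.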

Can we use synthetic data to satisfy CCAR / DFAST scenario requirements?

Yes, with caveats. The Fed's stress-test scenarios are themselves synthetic in the relevant sense — they describe hypothetical economic conditions. Banks have used synthetic borrower populations to evaluate model performance under those scenarios. The bank still needs to demonstrate that the synthetic populations align with the stress scenario in distributional terms, which adds a layer of validation not present for non-stress modeling. Several large banks now explicitly use synthetic populations in their CCAR submissions; the practice is mainstream as of 2026.