B01 · Compliance & Audit · Best Seller

Reg BI Suitability Audit Pack

When the SEC examines a broker-dealer's Care Obligation under Regulation Best Interest, examiners don't audit your policies; they audit individual recommendations made to individual clients. To prove your supervisory system catches the recommendations that should never have been made, you need a body of test households that look exactly like the ones examiners cite in enforcement actions: a 78-year-old with 60% of assets in a single equity, a recent inheritance recipient steered into illiquid alternatives, a client with deteriorating cognitive markers whose POA wasn't updated. The Reg BI Suitability Audit Pack is 130 such households, fully synthetic, ready to drop into your suitability engine on Monday.

Households: 130
Archetypes: 6
Formats: JSON, CSV
Deviation: Low

Why this Data Set exists

Most compliance teams test their suitability logic with sanitized records pulled from production. Those records are anonymized but distributionally identical to your existing book of business, which means the edge cases rarely appear. The senior with concentrated holdings exists in your real client base in single digits; in a test fixture sampled from prod, you might have zero. So your supervisory system has never been exercised on the cases that actually matter for Reg BI.

The alternative — hand-authored test records — is worse. Each one takes a senior compliance analyst hours to construct, the field coverage is inconsistent, and reviewers can't tell whether the record is realistic or a strawman built to make a specific test pass. That's not what an SEC examiner wants to see when you're walking through your control environment.

This Data Set solves both problems. Every household is generated end-to-end against a documented archetype, then validated for cross-field consistency (age × cognitive markers × POA recency × concentration percentage all align). The 130-household population is calibrated to surface every Reg BI red flag the regulators have actually cited in 2023–2025 enforcement orders — no more, no less.

Use Cases

Reg BI suitability algorithm validation
Senior client supervisory review testing
Concentration & cognitive-decline flag detection
Broker-dealer audit preparation

Who uses this Data Set

Compliance Officer at a Broker-Dealer

Validates that the firm's automated suitability screen flags concentrated-position recommendations to clients age 75+, replicating fact patterns from recent FINRA enforcement actions before an examiner does it for them.

Supervisory Principal

Runs the supervisory review queue against this Data Set quarterly to confirm every category of red flag (cognitive markers, recent inheritance, risk-tolerance mismatch) routes to a Series 24 reviewer with the documentation trail intact.

RegTech Engineer

Uses the JSON corpus as fixture data for unit and integration tests of the firm's suitability rules engine, including regression tests when FINRA Rule 2111 interpretations evolve.

Internal Audit Lead

Provides examiners with a reproducible test bed showing that the firm's controls have been exercised against every Reg BI Care Obligation scenario, with the audit trail captured.

Compliance SaaS Builder

Demos a compliance product to broker-dealer prospects using realistic households without exposing any prospect's real client data — eliminating the demo data problem entirely.

What's inside

Each of the 130 households is drawn from one of six archetypes selected for Reg BI relevance: Widowed HNW Spouse (H-04), Sudden Wealth Recipient (P-06), RMD-Stage Retiree (RL-01), Elderly Widow/Widower (RL-02), Overconfident DIY Investor (B-02), and Millennial Inheritor (E-01). The mix is weighted toward the senior cohorts because that's where Reg BI examination focus actually sits — roughly 60% of the corpus is age 70+, with 25% age 80+.
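
One way to sanity-check the stated age mix after loading the corpus is to compute the share of households at or above an age cutoff. The sketch below relies on the `members[].age` field from the sample schema; counting a household by its oldest member is an assumption, since the page does not say how the 60% and 25% figures are measured, and the sample records are made up for illustration.

```javascript
// Share of households whose oldest member is at least `minAge`.
// Counting by oldest member is an assumption, not the documented method.
function shareAtOrAbove(households, minAge) {
  if (households.length === 0) return 0;
  const hits = households.filter(h =>
    Math.max(...h.members.map(m => m.age)) >= minAge
  ).length;
  return hits / households.length;
}

// Tiny made-up sample, not records from the Data Set.
const cohortSample = [
  { members: [{ age: 82 }] },
  { members: [{ age: 74 }, { age: 69 }] },
  { members: [{ age: 71 }] },
  { members: [{ age: 58 }] },
  { members: [{ age: 34 }] },
];

console.log(shareAtOrAbove(cohortSample, 70)); // 0.6
console.log(shareAtOrAbove(cohortSample, 80)); // 0.2
```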

Every record carries the full WealthSchema canonical schema: household members with ages, cognitive status flags, POA-on-file booleans, and POA-recency dates; account-level holdings with concentration percentages computed against total liquid assets; risk-tolerance scores on a FINRA-aligned 1–10 scale alongside the questionnaire that produced them; recommendation history (last 24 months) with the supervisory disposition for each. Concentration triggers are pre-flagged on records where any single security exceeds 25% of liquid assets, but the underlying field data is present so you can re-run your own threshold logic against it.
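
Re-running your own threshold logic could look something like the sketch below. The record shape is hypothetical: `liquid_assets_total`, `accounts[].positions[]`, `symbol`, and `market_value` are illustrative names, not fields confirmed by this page, so map them to the actual schema in the Methodology PDF before use.

```javascript
// Hypothetical record shape; field names are illustrative assumptions.
// Returns the largest single-security share of total liquid assets (0..1).
function maxConcentration(household) {
  const total = household.liquid_assets_total;
  if (!(total > 0)) return 0;
  // Sum market value per security across all accounts in the household.
  const bySecurity = new Map();
  for (const account of household.accounts) {
    for (const pos of account.positions) {
      bySecurity.set(pos.symbol, (bySecurity.get(pos.symbol) || 0) + pos.market_value);
    }
  }
  let max = 0;
  for (const value of bySecurity.values()) {
    max = Math.max(max, value / total);
  }
  return max;
}

// Keep households whose largest position exceeds the chosen threshold.
function flagConcentration(households, threshold = 0.25) {
  return households.filter(h => maxConcentration(h) > threshold);
}

// Made-up sample data for illustration only.
const sample = [
  {
    id: 'hh-001',
    liquid_assets_total: 1000000,
    accounts: [
      { positions: [{ symbol: 'AAA', market_value: 300000 }, { symbol: 'BBB', market_value: 100000 }] },
      { positions: [{ symbol: 'AAA', market_value: 50000 }] },
    ],
  },
  {
    id: 'hh-002',
    liquid_assets_total: 1000000,
    accounts: [{ positions: [{ symbol: 'CCC', market_value: 200000 }] }],
  },
];

console.log(flagConcentration(sample).map(h => h.id));       // [ 'hh-001' ]
console.log(flagConcentration(sample, 0.15).map(h => h.id)); // [ 'hh-001', 'hh-002' ]
```

Aggregating by security across accounts matters here: in the sample, no single account in hh-001 holds more than 30% in AAA, but the household as a whole holds 35%.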

The Data Set ships as JSON (one file per household, plus a manifest) and CSV (long-format with one row per account-position). The WealthSynth Methodology PDF documents every field's derivation, the population calibration methodology, and the specific Reg BI scenarios each archetype is designed to exercise. Annual refresh keeps the corpus aligned with current FINRA interpretive guidance and the calendar of recent enforcement actions.
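
Because the CSV is long-format, most consumers will want to regroup rows by household and account before running checks. A minimal sketch, assuming the rows have already been parsed into objects and that the columns are named `household_id`, `account_id`, `symbol`, and `market_value` (hypothetical names; the actual headers are documented in the Methodology PDF):

```javascript
// Regroup long-format rows (one per account position) into nested households.
// Column names are hypothetical assumptions, not the documented CSV headers.
function groupRows(rows) {
  const households = new Map();
  for (const row of rows) {
    if (!households.has(row.household_id)) {
      households.set(row.household_id, new Map());
    }
    const accounts = households.get(row.household_id);
    if (!accounts.has(row.account_id)) {
      accounts.set(row.account_id, []);
    }
    accounts.get(row.account_id).push({
      symbol: row.symbol,
      market_value: Number(row.market_value), // CSV values arrive as strings
    });
  }
  // Convert the nested Maps into plain arrays, preserving first-seen order.
  return [...households].map(([household_id, accounts]) => ({
    household_id,
    accounts: [...accounts].map(([account_id, positions]) => ({ account_id, positions })),
  }));
}

// Made-up rows for illustration only.
const csvRows = [
  { household_id: 'hh-001', account_id: 'a1', symbol: 'AAA', market_value: '300000' },
  { household_id: 'hh-001', account_id: 'a1', symbol: 'BBB', market_value: '100000' },
  { household_id: 'hh-001', account_id: 'a2', symbol: 'AAA', market_value: '50000' },
  { household_id: 'hh-002', account_id: 'b1', symbol: 'CCC', market_value: '200000' },
];

const grouped = groupRows(csvRows);
console.log(grouped.length);             // 2
console.log(grouped[0].accounts.length); // 2
```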

Preview a sample household

A redacted summary of one household from this Data Set: names, employers, exact balances, and metro area are stripped. Ages are bucketed, and income and net worth are reported as bands. The full record (and all 130 like it) ships in the ZIP.

H-04 · Widowed HNW Spouse
representative archetype household
Household: Single
State: NY
Gross income (band): $500k–$1M
Net worth (band):
Dependents: 0
Income source types: w2 salary, w2 bonus
Members (1): primary, age 55–59, finance

Technical Highlights

FINRA-aligned risk tolerance taxonomy
Concentration-trigger flags pre-populated
Age & POA-recency markers per Reg BI rules
Audit trail of recommendation history

Sample Schema Fields

sample_record.json
{
  "risk_profile.tolerance_score": <value>,
  "compliance.suitability_flags": <value>,
  "members[].age": <value>,
  "members[].cognitive_status": <value>,
  "assets.concentration_pct": <value>
}

Sample queries

Find concentrated holdings in senior clients

Returns every household where a member is age 75+ AND any single security position exceeds 25% of liquid assets — the canonical Reg BI red flag.

// concentration_pct is compared as a fraction here (0.25 = 25%)
households.filter(h =>
  h.members.some(m => m.age >= 75) &&
  h.assets.concentration_pct > 0.25
)

Detect risk-tolerance mismatches

Identifies households where the recommendation history shows allocation to risk-level-7+ products on accounts whose risk-tolerance questionnaire scored 4 or below.

// Compares the household-level tolerance score against each recommendation's risk level
households.filter(h =>
  h.risk_profile.tolerance_score <= 4 &&
  h.recommendations.some(r => r.risk_level >= 7)
)

Surface cognitive-decline + outdated-POA combinations

Returns every household with any cognitive_status marker beyond 'none' AND a POA either missing or last updated more than 3 years ago — the case examiners ask about most.

// yearsSince is a date helper that is not defined in this snippet
households.filter(h =>
  h.members.some(m => m.cognitive_status !== 'none') &&
  (!h.legal.poa_on_file ||
    yearsSince(h.legal.poa_last_updated) > 3)
)

Recent-inheritance × illiquid-allocation suitability check

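Households with an inheritance event in the past 18 months whose recommendation history includes alternative or illiquid products, approximating the fact pattern in multiple 2024 FINRA actions. As written, the query checks how recent the inheritance is; it does not compare each recommendation's date against the inheritance date.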

// monthsSince is a date helper that is not defined in this snippet
households.filter(h =>
  h.events.some(e =>
    e.type === 'inheritance' &&
    monthsSince(e.date) < 18) &&
  h.recommendations.some(r =>
    r.product_class === 'alternative' ||
    r.liquidity_score <= 2)
)
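
The sample queries call `yearsSince` and `monthsSince` without defining them. One possible implementation is sketched below; it is an assumption rather than the helpers WealthSchema ships. It treats a missing or unparseable date as infinitely old, so a POA with no update date counts as stale and an event with no date never counts as recent. The optional `now` parameter is added only to make the functions testable.

```javascript
// Whole calendar months elapsed between `dateInput` and `now` (UTC).
// Missing or invalid dates return Infinity (an assumption; adjust as needed).
function monthsSince(dateInput, now = new Date()) {
  if (dateInput == null) return Infinity;
  const then = new Date(dateInput);
  if (Number.isNaN(then.getTime())) return Infinity;
  let months =
    (now.getUTCFullYear() - then.getUTCFullYear()) * 12 +
    (now.getUTCMonth() - then.getUTCMonth());
  // Subtract one if the current month has not yet reached the anniversary day.
  if (now.getUTCDate() < then.getUTCDate()) months -= 1;
  return months;
}

// Fractional years elapsed, using an average year length of 365.25 days.
function yearsSince(dateInput, now = new Date()) {
  if (dateInput == null) return Infinity;
  const then = new Date(dateInput);
  if (Number.isNaN(then.getTime())) return Infinity;
  const msPerYear = 365.25 * 24 * 60 * 60 * 1000;
  return (now.getTime() - then.getTime()) / msPerYear;
}

const asOf = new Date('2024-07-14T00:00:00Z');
console.log(monthsSince('2023-01-15', asOf)); // 17
console.log(yearsSince(null, asOf));          // Infinity
```

Pinning `now` to a fixed as-of date in tests keeps fixture-based results stable as the calendar moves forward.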

Methodology

The 130-household population is generated by the WealthSynth pipeline from six archetypes defined in our public archetype registry. Each household begins as a deterministic seed (archetype + sequence number), runs through Claude Sonnet 4.6 for persona elaboration, then through a series of overlay stages that inject the Reg BI-specific fields: cognitive markers, POA records, concentration positions, and historical recommendations. Every record then passes a strict consistency validator (age × cognitive flags × POA × account titling × concentration math all reconcile) and an LLM-as-judge quality gate. Households that fail either gate are auto-retried up to twice, then quarantined; the published corpus contains only validated records. Annual refresh re-runs the pipeline against the current year's tax tables, FINRA interpretive guidance, and recent enforcement-action fact patterns.
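
The retry-then-quarantine behavior described above can be sketched as a small control loop. This is an illustration of the described flow, not the WealthSynth pipeline's code: `generate`, the gate functions, the seed format, and the record fields are all hypothetical stand-ins.

```javascript
// Run a generator through a list of gates, retrying on failure.
// "Auto-retried up to twice" means at most maxRetries + 1 attempts in total.
function generateWithGates(seed, generate, gates, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const record = generate(seed, attempt);
    if (gates.every(gate => gate(record))) {
      return { status: 'published', attempts: attempt + 1, record };
    }
  }
  // Every attempt failed at least one gate: hold the seed back from the corpus.
  return { status: 'quarantined', attempts: maxRetries + 1, seed };
}

// Example gate (hypothetical rule): a POA date without a POA on file is inconsistent.
const poaConsistent = r =>
  r.legal.poa_on_file || r.legal.poa_last_updated == null;

// Stub generator that yields a consistent record only on its third attempt.
const flakyGenerate = (seed, attempt) => ({
  seed,
  legal: { poa_on_file: attempt >= 2, poa_last_updated: '2021-05-01' },
});

const outcome = generateWithGates('H-04:0001', flakyGenerate, [poaConsistent]);
console.log(outcome.status, outcome.attempts); // published 3
```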

Frequently asked questions

Is this data legally usable for compliance testing?

Yes. Every record is fully synthetic — no real individual is referenced, directly or indirectly. There is no PII, no GDPR exposure, no GLBA concern, and no data-use agreement to negotiate. The Data License permits internal compliance testing, examiner demonstrations, and audit walk-throughs without restriction; it prohibits redistribution and resale.

How does this differ from anonymized production data?

Anonymized production data inherits the distribution of your existing book of business, which is exactly the wrong distribution for Reg BI testing — the edge cases the regulators care about are statistically rare in production. This Data Set is calibrated to over-represent those edge cases by design, so your supervisory system gets exercised on the scenarios that drive enforcement.

Can I use this to demo a compliance product to a prospect?

Yes — that's one of the primary use cases. Prospect demos using this corpus avoid the chicken-and-egg problem where you need real data to demo but the prospect won't share data until they buy. The Data License explicitly permits demonstration use, including in sales calls and at industry conferences.

What FINRA rules and SEC releases is the Data Set calibrated against?

Primarily SEC Regulation Best Interest (Release No. 34-86031, Care Obligation), FINRA Rule 2111 (Suitability), FINRA Rule 2090 (Know Your Customer), and FINRA Notice 21-09 on senior clients. We track subsequent interpretive guidance and enforcement-action fact patterns and fold them into refreshed corpus versions.

Do I need the full corpus, or can I buy a subset?

The Data Set is sold as a single 130-household package; it isn't subdivided. The subset most buyers ask for — "just the concentrated-senior cases" — is approximately half the corpus and is included. If you need bespoke composition (e.g. a 500-household concentrated-senior corpus for ML training), reach out about custom generation.

How often is the data refreshed?

We re-run the generation pipeline against current-year tax tables, updated FINRA interpretive guidance, and any enforcement-action fact patterns from the prior 12 months that introduce new red-flag combinations. Existing records are not edited in place; a new corpus version is delivered. Refresh cadence and scope for buyers are still being defined — check back, or reach out if regulatory currency is critical to your evaluation.

What formats are included?

JSON (one file per household plus a manifest) and CSV (long-format, one row per account position). The included WealthSynth Methodology PDF documents every field's derivation, the statistical calibration, and the specific Reg BI scenarios each archetype exercises. Parquet is available on request.

Can I integrate this with my existing supervisory system?

Yes. The schema is designed for direct ingestion: account titling, holdings, recommendation history, and risk-tolerance scores all use field names compatible with major broker-dealer data models. The Methodology PDF includes a field-mapping appendix for the four most common supervisory platforms.

$5,000
one-time purchase
130 households (ZIP)
Methodology PDF
JSON, CSV formats
Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.