When the SEC examines a broker-dealer's Care Obligation under Regulation Best Interest, it doesn't audit your policies; it audits individual recommendations made to individual clients. To prove your supervisory system catches the recommendations that should never have been made, you need a body of test households that look exactly like the ones examiners cite in enforcement actions: a 78-year-old with 60% of assets in a single equity, a recent inheritance recipient steered into illiquid alternatives, a client with deteriorating cognitive markers whose POA wasn't updated. The Reg BI Suitability Audit Pack is 130 such households, fully synthetic, ready to drop into your suitability engine on Monday.
Most compliance teams test their suitability logic with sanitized records pulled from production: anonymized, but distributionally identical to your existing book of business, which means the edge cases almost never appear. The senior with concentrated holdings exists in your real client base in single digits; in a test fixture pulled from prod, you might have zero. So your supervisory system has never been exercised on the cases that actually matter for Reg BI.
The alternative — hand-authored test records — is worse. Each one takes a senior compliance analyst hours to construct, the field coverage is inconsistent, and reviewers can't tell whether the record is realistic or a strawman built to make a specific test pass. That's not what an SEC examiner wants to see when you're walking through your control environment.
This Data Set solves both problems. Every household is generated end-to-end against a documented archetype, then validated for cross-field consistency (age × cognitive markers × POA recency × concentration percentage all align). The 130-household population is calibrated to surface every Reg BI red flag the regulators have actually cited in 2023–2025 enforcement orders — no more, no less.
Validates that the firm's automated suitability screen flags concentrated-position recommendations to clients age 75+, replicating fact patterns from recent FINRA enforcement actions before an examiner does it for them.
Runs the supervisory review queue against this Data Set quarterly to confirm every category of red flag (cognitive markers, recent inheritance, risk-tolerance mismatch) routes to a Series 24 reviewer with the documentation trail intact.
Uses the JSON corpus as fixture data for unit and integration tests of the firm's suitability rules engine, including regression tests when FINRA Rule 2111 interpretations evolve.
Provides examiners with a reproducible test bed showing the firm's controls have been exercised against every Reg BI Care Obligation scenario, with audit trail captured.
Demos a compliance product to broker-dealer prospects using realistic households without exposing any prospect's real client data — eliminating the demo data problem entirely.
Each of the 130 households is drawn from one of six archetypes selected for Reg BI relevance: Widowed HNW Spouse (H-04), Sudden Wealth Recipient (P-06), RMD-Stage Retiree (RL-01), Elderly Widow/Widower (RL-02), Overconfident DIY Investor (B-02), and Millennial Inheritor (E-01). The mix is weighted toward the senior cohorts because that's where Reg BI examination focus actually sits — roughly 60% of the corpus is age 70+, with 25% age 80+.
Every record carries the full WealthSchema canonical schema: household members with ages, cognitive status flags, POA-on-file booleans, and POA-recency dates; account-level holdings with concentration percentages computed against total liquid assets; risk-tolerance scores on a FINRA-aligned 1–10 scale alongside the questionnaire that produced them; recommendation history (last 24 months) with the supervisory disposition for each. Concentration triggers are pre-flagged on records where any single security exceeds 25% of liquid assets, but the underlying field data is present so you can re-run your own threshold logic against it.
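Re-running your own threshold logic could look roughly like the sketch below. It recomputes the largest single-security share of liquid assets from position-level data and applies a configurable cutoff and age floor. The field names `accounts`, `positions`, `symbol`, `market_value`, and `assets.liquid_assets_total` are assumptions made for illustration, not confirmed names from the shipped schema; map them to the actual fields documented in the Methodology PDF.

```javascript
// Assumed field names (accounts, positions, symbol, market_value,
// assets.liquid_assets_total) are illustrative, not the confirmed schema.
function largestPositionShare(household) {
  const total = household.assets.liquid_assets_total;
  if (!(total > 0)) return 0;
  // Sum market value per security across every account in the household.
  const bySymbol = new Map();
  for (const account of household.accounts) {
    for (const p of account.positions) {
      bySymbol.set(p.symbol, (bySymbol.get(p.symbol) || 0) + p.market_value);
    }
  }
  return Math.max(0, ...bySymbol.values()) / total;
}

// Flag households with a member at or above minAge whose largest
// single-security share exceeds the chosen threshold (a fraction, not a percent).
function flagConcentrated(households, threshold = 0.25, minAge = 75) {
  return households.filter(h =>
    h.members.some(m => m.age >= minAge) &&
    largestPositionShare(h) > threshold
  );
}

// Toy data, not taken from the Data Set.
const sample = [
  {
    id: 'A',
    members: [{ age: 78 }],
    assets: { liquid_assets_total: 1000000 },
    accounts: [
      { positions: [{ symbol: 'XYZ', market_value: 300000 }, { symbol: 'BND', market_value: 100000 }] },
      { positions: [{ symbol: 'XYZ', market_value: 100000 }] },
    ],
  },
  {
    id: 'B',
    members: [{ age: 62 }],
    assets: { liquid_assets_total: 500000 },
    accounts: [{ positions: [{ symbol: 'XYZ', market_value: 400000 }] }],
  },
];

console.log(flagConcentrated(sample).map(h => h.id));      // [ 'A' ]
console.log(flagConcentrated(sample, 0.5).map(h => h.id)); // []
```

Aggregating by security across accounts before dividing matters here: household A holds XYZ in two accounts, and neither position alone would show the full 40% exposure.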
The Data Set ships as JSON (one file per household, plus a manifest) and CSV (long-format with one row per account-position). The WealthSynth Methodology PDF documents every field's derivation, the population calibration methodology, and the specific Reg BI scenarios each archetype is designed to exercise. Annual refresh keeps the corpus aligned with current FINRA interpretive guidance and the calendar of recent enforcement actions.
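As a rough sketch of how the JSON delivery might be loaded into a test harness, the Node.js snippet below reads every per-household file from a directory and skips the manifest. The manifest file name (`manifest.json`) and the directory layout are assumptions; check the shipped ZIP for the actual names.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed layout: <dir>/manifest.json plus one JSON file per household.
// The manifest file name is a guess; adjust it to match the delivery.
function loadHouseholds(dir, manifestName = 'manifest.json') {
  return fs.readdirSync(dir)
    .filter(name => name.endsWith('.json') && name !== manifestName)
    .sort() // stable order so test fixtures are reproducible across runs
    .map(name => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}
```

Calling `loadHouseholds('./audit-pack/json')` (a hypothetical path) would return an array that the query examples on this page can filter directly.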
A redacted summary of one household from this Data Set: names, employers, exact balances, and metro area are stripped; ages are bucketed, and income and net worth are reported as bands. The full record (and all 130 like it) ships in the ZIP.
{
  "risk_profile.tolerance_score": <value>,
  "compliance.suitability_flags": <value>,
  "members[].age": <value>,
  "members[].cognitive_status": <value>,
  "assets.concentration_pct": <value>
}
Returns every household where a member is age 75+ AND any single security position exceeds 25% of liquid assets, the canonical Reg BI red flag.
households.filter(h =>
  h.members.some(m => m.age >= 75) &&
  h.assets.concentration_pct > 0.25
)
Identifies households where the recommendation history shows allocation to risk-level-7+ products on accounts whose risk-tolerance questionnaire scored 4 or below.
households.filter(h =>
  h.risk_profile.tolerance_score <= 4 &&
  h.recommendations.some(r => r.risk_level >= 7)
)
Returns every household with any cognitive_status marker beyond 'none' AND a POA either missing or last updated more than 3 years ago — the case examiners ask about most.
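// yearsSince is an assumed helper (not part of the Data Set): elapsed years since an ISO date string.
const yearsSince = d => (Date.now() - new Date(d).getTime()) / (365.25 * 24 * 60 * 60 * 1000);
households.filter(h =>
  h.members.some(m => m.cognitive_status !== 'none') &&
  (!h.legal.poa_on_file ||
    yearsSince(h.legal.poa_last_updated) > 3)
)
Households whose recommendations include alternative or illiquid products within 18 months of an inheritance event, the fact pattern in multiple 2024 FINRA actions.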
// monthsSince is an assumed helper (not part of the Data Set): approximate elapsed months since an ISO date string.
const monthsSince = d => (Date.now() - new Date(d).getTime()) / (30.44 * 24 * 60 * 60 * 1000);
// Note: this tests how recent the inheritance is relative to today, not relative to each recommendation's date.
households.filter(h =>
  h.events.some(e => e.type === 'inheritance' &&
    monthsSince(e.date) < 18) &&
  h.recommendations.some(r =>
    r.product_class === 'alternative' ||
    r.liquidity_score <= 2)
)
The 130-household population is generated by the WealthSynth pipeline from six archetypes defined in our public archetype registry. Each household begins as a deterministic seed (archetype + sequence number), runs through Claude Sonnet 4.6 for persona elaboration, then through a series of overlay stages that inject the Reg BI-specific fields: cognitive markers, POA records, concentration positions, and historical recommendations. Every record then passes a strict consistency validator (age × cognitive flags × POA × account titling × concentration math all reconcile) and an LLM-as-judge quality gate. Households that fail either gate are auto-retried up to twice, then quarantined; the published corpus contains only validated records. Annual refresh re-runs the pipeline against the current year's tax tables, FINRA interpretive guidance, and recent enforcement-action fact patterns.
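The retry-then-quarantine behavior described above can be pictured with a small sketch. This is not the WealthSynth implementation: `runWithGates`, the `generate` callback, the gate objects, and the `'mild'` cognitive status value are all hypothetical stand-ins chosen to illustrate one initial attempt, up to two retries, and quarantine when every attempt fails a gate.

```javascript
// Hypothetical sketch of a retry-then-quarantine gate; names are illustrative,
// not WealthSynth's actual API.
function runWithGates(seed, generate, gates, maxRetries = 2) {
  const failures = [];
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const record = generate(seed, attempt);
    const failed = gates.filter(g => !g.check(record)).map(g => g.name);
    if (failed.length === 0) {
      return { status: 'published', record, attempts: attempt + 1 };
    }
    failures.push(failed);
  }
  return { status: 'quarantined', seed, attempts: maxRetries + 1, failures };
}

// Toy stand-ins for the generator and the two gates.
const seed = { archetype: 'RL-02', sequence: 17 };
const toyGenerate = (s, attempt) => ({
  id: `${s.archetype}-${s.sequence}`,
  members: [{ age: 82, cognitive_status: 'mild' }],
  // Toy behavior: the first attempt omits the POA date, later attempts include it.
  legal: { poa_on_file: true, poa_last_updated: attempt === 0 ? null : '2021-05-01' },
});
const gates = [
  { name: 'consistency', check: r => !r.legal.poa_on_file || r.legal.poa_last_updated !== null },
  { name: 'quality', check: r => r.members.length > 0 },
];

const result = runWithGates(seed, toyGenerate, gates);
console.log(result.status, result.attempts); // published 2
```

Keeping the failed gate names per attempt gives a quarantined record its own audit trail, which is useful when deciding whether an archetype definition or a gate rule needs adjusting.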
Yes. Every record is fully synthetic — no real individual is referenced, directly or indirectly. There is no PII, no GDPR exposure, no GLBA concern, and no data-use agreement to negotiate. The Data License permits internal compliance testing, examiner demonstrations, and audit walk-throughs without restriction; it prohibits redistribution and resale.
Anonymized production data inherits the distribution of your existing book of business, which is exactly the wrong distribution for Reg BI testing — the edge cases the regulators care about are statistically rare in production. This Data Set is calibrated to over-represent those edge cases by design, so your supervisory system gets exercised on the scenarios that drive enforcement.
Yes — that's one of the primary use cases. Prospect demos using this corpus avoid the chicken-and-egg problem where you need real data to demo but the prospect won't share data until they buy. The Data License explicitly permits demonstration use, including in sales calls and at industry conferences.
Primarily SEC Regulation Best Interest (Release No. 34-86031, Care Obligation), FINRA Rule 2111 (Suitability), FINRA Rule 2090 (Know Your Customer), and FINRA Notice 21-09 on senior clients. We track subsequent interpretive guidance and enforcement-action fact patterns and fold them into refreshed corpus versions.
The Data Set is sold as a single 130-household package; it isn't subdivided. The subset most buyers ask for — "just the concentrated-senior cases" — is approximately half the corpus and is included. If you need bespoke composition (e.g. a 500-household concentrated-senior corpus for ML training), reach out about custom generation.
We re-run the generation pipeline against current-year tax tables, updated FINRA interpretive guidance, and any enforcement-action fact patterns from the prior 12 months that introduce new red-flag combinations. Existing records are not edited in place; a new corpus version is delivered. Refresh cadence and scope for buyers are still being defined — check back, or reach out if regulatory currency is critical to your evaluation.
JSON (one file per household plus a manifest) and CSV (long-format, one row per account position). The WealthSynth Methodology PDF is included documenting every field's derivation, statistical calibration, and the specific Reg BI scenarios each archetype exercises. Parquet is available on request.
Yes. The schema is designed for direct ingestion: account titling, holdings, recommendation history, and risk-tolerance scores all use field names compatible with major broker-dealer data models. The Methodology PDF includes a field-mapping appendix for the four most common supervisory platforms.
400 prospect households covering RIA client variety from formation through retirement. KYC-complete records, goal-based planning fields, initial recommendation outputs, and CRM-compatible field naming. The broadest single bundle by archetype coverage.
130 affluent and HNW households with detailed fee structures: AUM-based advisory fees, tiered breakpoints, fund expense ratios, transaction costs, and tax-drag estimates. Includes complex fee arrangements (multi-firm, family-office, performance-based).
270 households with monthly cash-flow models including income shocks, expense spikes, and liquidity-stress scenarios. Variable-income earners, single parents, gig workers, distressed mortgages, and post-divorce rebuild cases. Each household has 96 monthly snapshots.
Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.