RIA Onboarding Stress Test Pack

The friction in RIA onboarding isn't on the happy path. It's on the long tail: the H-1B prospect whose visa status the intake form doesn't capture, the gig worker whose income doesn't fit the W-2 dropdown, the divorcing client whose accounts are in the middle of being retitled, the artist whose royalty income confuses the cash-flow modeler. The RIA Onboarding Stress Test Pack is 400 simulated prospects engineered to exercise every part of your firm's intake process — KYC fields, goal capture, initial recommendation logic, CRM mapping — against the full variety of clients an RIA actually sees.

Households: 400
Archetypes: 12
Formats: JSON, CSV
Deviation: Low

Why this Data Set exists

Most onboarding workflows are designed for the modal client and tested against three or four hand-built test fixtures. The result is friction that surfaces only in production: the form that requires a Social Security Number when an ITIN is what the prospect has; the goal-capture flow that doesn't have a category for caregiver-of-aging-parent; the initial-recommendation engine that breaks when the prospect's income is a six-month-trailing-average rather than a current monthly figure.

These aren't edge cases — they're the modal cases for important RIA growth segments. Cross-border professionals, gig-economy earners, recent immigrants, and life-transition clients are the highest-LTV prospects most firms underserve because the onboarding system rejects or misroutes them. Firms find out only after the fact, when win rates from those segments are quietly half what they should be.

This Data Set surfaces the friction in advance. The 400 prospects span 12 archetypes covering every wealth tier, life stage, employment type, and family structure your intake system needs to handle. That is the broadest archetype coverage of any single bundle in the catalog, by design.

Use Cases

RIA onboarding workflow testing
KYC data validation
Initial financial plan generation
CRM integration testing

Who uses this Data Set

RIA Operations Lead

Runs the firm's onboarding intake against all 400 prospects in a test environment to find every form field, validation rule, and routing decision that breaks on a non-modal prospect. The friction points get prioritised before launch or before a redesign goes live.

Onboarding SaaS Builder (selling to RIAs)

Demos the product's ability to handle complex prospects to RIA buyers using realistic households spanning H-1B tech workers, gig artists, military officers, and dual-residency UHNW couples — without exposing any prospect's real data.

CRM Integration Engineer

Tests the firm's CRM data mapping (Salesforce / Wealthbox / Redtail) by ingesting all 400 prospects and verifying every field round-trips correctly, including custom fields for goal categories and KYC documentation status.
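
A round-trip check of this kind can be sketched as a key-by-key comparison of the record that was sent and the record read back from the CRM. This is a minimal sketch under stated assumptions: the `flatten` and `roundTripDiff` helpers and the `docs_status` and `categories` field names are hypothetical, not part of the Data Set.

```javascript
// Flatten a nested record into { 'a.b.c': value } pairs so two
// versions can be compared key by key. Leaf values (including
// arrays) are compared by their JSON serialization.
function flatten(obj, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flatten(value, path, out);
    } else {
      out[path] = JSON.stringify(value);
    }
  }
  return out;
}

// Return the dotted paths whose values differ, or that exist on
// only one side, after the round trip.
function roundTripDiff(original, returned) {
  const a = flatten(original);
  const b = flatten(returned);
  const paths = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...paths].filter(path => a[path] !== b[path]).sort();
}

// Invented records: the CRM changed the casing of one value.
const sent = { kyc: { docs_status: 'pending' }, goals: { categories: ['college'] } };
const received = { kyc: { docs_status: 'Pending' }, goals: { categories: ['college'] } };
console.log(roundTripDiff(sent, received));
```

An empty result means every field survived the trip unchanged; any listed path points at a mapping or type-coercion problem worth inspecting.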

Compliance Lead at a New RIA

Walks the SEC examiner through the firm's KYC documentation process using realistic prospects covering CIP, beneficial ownership, and identity-verification edge cases — demonstrating controls without using actual client data.

Financial Planning Engine Engineer

Validates the initial-plan generation logic across all archetypes, ensuring the engine produces sensible recommendations for prospects whose financial situation doesn't fit a textbook 401(k)-saver-with-mortgage profile.

What's inside

The 400 prospects span 12 archetypes — the widest archetype coverage of any single Wealth Data Set: from F-01 (New Graduate Tech Worker) and AR-01 (Artist with Royalty Income) on the formation end, through dual-income professionals and small business owners in mid-career, to retirees and complex life-transition cases. Wealth tiers span $50K to $30M+ so every advisory engagement model is represented.

Every prospect has a complete KYC record: identity verification fields (with realistic edge cases — ITIN filers, dual citizens, recent name changes), beneficial ownership for entity-owning prospects, source-of-wealth narrative, and risk-tolerance questionnaire results. Goal capture covers the full goal taxonomy (retirement, college, home purchase, business sale, charitable, transition) with structured priority rankings and time horizons. Initial recommendation outputs include the canonical first-meeting deliverables: asset allocation suggestion, account-type recommendation (which bucket to fund first), insurance gap analysis, and estate planning readiness flags (will/POA/HCD presence, beneficiary completeness).

Field names follow CRM-compatible conventions so direct ingestion into Salesforce, Wealthbox, or Redtail requires minimal transformation. The Data Set ships as JSON (one file per prospect plus a manifest) and CSV (long-format with normalized account/goal/recommendation tables for SQL ingestion), accompanied by the WealthSynth Methodology PDF — covering the field schema, the CRM-mapping appendix, and the calibration source for each archetype's KYC/goal/recommendation defaults.

Preview a sample household

A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 400 like it) ships in the ZIP.

F-01 · New Graduate Tech Worker (representative archetype household)
Household: Single
State: WV
Gross income (band): $50k–$100k
Net worth (band):
Dependents: 0
Income source types: w2 salary, w2 bonus
Members (1): primary, age 25–29, professional services

Technical Highlights

KYC-complete household records
Goal-based planning fields
Wide archetype coverage (12 types)
CRM-compatible field naming

Sample Schema Fields

sample_record.json
{
  "demographics.household_profile": <value>,
  "accounts.summary_balances": <value>,
  "goals.primary_financial_goals": <value>,
  "kyc.identity_verification_fields": <value>,
  "planning.initial_recommendations": <value>
}
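
The sample keys above are written as dotted paths. A minimal sketch of reading such a path from a nested record follows, assuming the shipped JSON nests objects along those dots. The `getPath` helper and the values in the example record are hypothetical.

```javascript
// Hypothetical helper: resolve a dotted path such as
// 'kyc.identity_verification_fields' against a nested record.
// Returns undefined if any segment along the path is missing.
function getPath(record, path) {
  return path.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]),
    record
  );
}

// Illustrative record; the values are invented for this example.
const sampleRecord = {
  kyc: { identity_verification_fields: { itin_filer: true } },
  goals: { primary_financial_goals: ['retirement'] }
};

console.log(getPath(sampleRecord, 'kyc.identity_verification_fields.itin_filer'));
console.log(getPath(sampleRecord, 'accounts.summary_balances'));
```

Returning `undefined` for a missing segment, rather than throwing, keeps a scan over all 400 records from stopping at the first sparse one.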

Sample queries

Find prospects whose income type breaks W-2 assumptions

Returns every prospect whose primary income source is non-W-2 (1099 contractor, K-1 distribution, royalty, business owner draw) — the population most likely to expose friction in income-capture forms.

prospects.filter(p =>
  ['1099', 'K-1', 'royalty', 'business_draw'].includes(
    p.income.primary_source_type
  )
)

Surface KYC edge cases by identity-verification status

Filters prospects whose identity verification path is non-standard: ITIN holders, dual citizens, recent name changes, or thin-file credit history. The intake flow that handles these well wins clients competitors lose.

prospects.filter(p =>
  // Any one non-standard identity flag is enough to include the prospect.
  p.kyc.identity_verification_fields.itin_filer ||
  p.kyc.identity_verification_fields.dual_citizen ||
  p.kyc.identity_verification_fields.recent_name_change ||
  // Optional chaining: some records may have no credit section.
  p.credit?.thin_file_flag
)

Match prospects to advisor specialty

Returns, for each prospect, the advisor specialty its financial profile suggests (equity compensation, business ownership, or estate planning, with a general fallback). Useful for routing logic that pairs the right advisor with the right prospect at intake.

prospects.map(p => ({
  id: p.id,
  // First matching rule wins; optional chaining guards records
  // that lack an equity_comp, business, or estate section.
  routing_specialty: p.equity_comp?.grants?.length > 0 ? 'equity_comp'
    : p.business?.entity_type ? 'business_owner'
    : p.estate?.trust_structures?.length > 0 ? 'estate'
    : 'general'
}))

Generate initial-plan checklist completeness

For each prospect, returns the percentage of the canonical initial-plan checklist that the firm has data to complete from intake alone, which shows where additional data collection is needed before the first meeting.

prospects.map(p => {
  // Top-level sections the initial-plan checklist needs.
  const required = ['risk_tolerance', 'goals',
    'income', 'expenses', 'assets', 'liabilities',
    'beneficiaries', 'insurance'];
  // Count the sections that are present (neither null nor undefined).
  const present = required.filter(k => p[k] != null);
  // Express completeness as a whole-number percentage (0 to 100).
  const completeness = Math.round(100 * present.length / required.length);
  return { id: p.id, completeness };
})

Methodology

Each prospect is generated from a weighted random draw across the 12 archetypes, with the weighting deliberately equalising across wealth tiers and life stages so no single segment dominates the 400-prospect corpus. KYC fields draw from realistic distributions of identity-verification scenarios (about 4% ITIN filers, 6% dual citizens, and 2% recent name changes, calibrated against Pew demographic data on the financial-services-customer population). Goal-capture and risk-tolerance fields use structures compliant with FINRA Rule 2090. Initial recommendation outputs are produced by the same recommendation logic that generates the WealthSynth canonical onboarding outputs, so the recommendations exhibit realistic variation across archetypes. The full corpus passes the WealthSynth consistency validator (KYC fields reconcile, goal priorities sum correctly, recommendation logic is deterministic given inputs) and the LLM-as-judge gate. The annual refresh re-runs the corpus against current FINRA interpretive guidance and CFP Board CIP standards.
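
One way to sanity-check the stated identity-verification rates after loading the corpus is to compute each flag's share of prospects. This is a minimal sketch, assuming the flags live under `kyc.identity_verification_fields` as in the sample queries. The `flagShare` helper and the toy records are hypothetical.

```javascript
// Hypothetical helper: share of prospects (0..1) with a truthy flag
// under kyc.identity_verification_fields.
function flagShare(prospects, flag) {
  if (prospects.length === 0) return 0;
  const hits = prospects.filter(
    p => Boolean(p.kyc?.identity_verification_fields?.[flag])
  ).length;
  return hits / prospects.length;
}

// Toy corpus of four invented records, only to show the call shape.
const toyCorpus = [
  { kyc: { identity_verification_fields: { itin_filer: true } } },
  { kyc: { identity_verification_fields: { dual_citizen: true } } },
  { kyc: { identity_verification_fields: {} } },
  { kyc: {} }
];

for (const flag of ['itin_filer', 'dual_citizen', 'recent_name_change']) {
  console.log(flag, flagShare(toyCorpus, flag));
}
```

Run against the full 400 records, the shares would be expected to land near the rates quoted above (roughly 0.04, 0.06, and 0.02), though small-sample variation is likely.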

Frequently asked questions

Why is this called a 'stress test' if it's about onboarding?

The 'stress test' framing is intentional. Most firms onboard the modal prospect cleanly; the friction emerges with the long-tail prospects whose data doesn't fit the form. Running 400 simulated prospects through the firm's intake stress-tests the workflow in the same way a backend load test stresses an API — surfacing the failures before real clients hit them.
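
The load-test analogy can be sketched as a loop that runs every prospect through an intake validator and tallies failures per archetype. This is a minimal sketch: `validateIntake` stands in for your firm's own intake rules, and `archetype_id` and `tax_id_type` are hypothetical field names.

```javascript
// Stand-in for a firm's intake rules: rejects anything but an SSN,
// the kind of rule the pack is meant to expose.
function validateIntake(prospect) {
  return prospect.tax_id_type === 'SSN'
    ? []
    : [`unsupported tax id type: ${prospect.tax_id_type}`];
}

// Run the validator over the corpus and count failing prospects
// per archetype.
function tallyFailures(prospects, validate) {
  const failures = {};
  for (const p of prospects) {
    if (validate(p).length > 0) {
      failures[p.archetype_id] = (failures[p.archetype_id] ?? 0) + 1;
    }
  }
  return failures;
}

// Invented records to show the call shape.
const harnessDemo = [
  { archetype_id: 'F-01', tax_id_type: 'SSN' },
  { archetype_id: 'AR-01', tax_id_type: 'ITIN' },
  { archetype_id: 'AR-01', tax_id_type: 'ITIN' }
];
console.log(tallyFailures(harnessDemo, validateIntake));
```

Grouping by archetype makes it easier to see which client segments a given rule shuts out, rather than just how many records failed.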

Are the field names compatible with my CRM?

Field names follow conventions compatible with Salesforce Financial Services Cloud, Wealthbox, and Redtail. The Methodology PDF includes an explicit field-mapping appendix for the four most common RIA CRMs. For systems with custom field schemas, the JSON nested structure is straightforward to remap.
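
For a custom schema, remapping can be as simple as a lookup table from dotted source paths to target field names. This is a minimal sketch: the source paths, the target names such as `ITIN_Filer__c`, and the `remapRecord` helper are all hypothetical, and the mappings that actually ship are described in the Methodology PDF appendix.

```javascript
// Hypothetical mapping from dotted source paths to CRM field names.
const fieldMap = {
  'demographics.household_profile.state': 'Mailing_State',
  'kyc.identity_verification_fields.itin_filer': 'ITIN_Filer__c'
};

// Build a flat CRM payload; paths missing from the record are
// collected so the gap is visible rather than silently dropped.
function remapRecord(record, map) {
  const payload = {};
  const missing = [];
  for (const [path, target] of Object.entries(map)) {
    let node = record;
    for (const key of path.split('.')) {
      node = node == null ? undefined : node[key];
    }
    if (node === undefined) missing.push(path);
    else payload[target] = node;
  }
  return { payload, missing };
}

// Invented record with only one of the two mapped fields present.
const remapped = remapRecord(
  { demographics: { household_profile: { state: 'WV' } } },
  fieldMap
);
console.log(remapped);
```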

Does this include onboarding for entity / trust accounts?

Yes. About 18% of prospects in the corpus bring entity-owned accounts (LLC, trust, family LP). These records carry the additional KYC fields (beneficial ownership, entity formation documents, signing-authority structures) that the entity-onboarding path requires. Entity-only test fixtures can be filtered with `prospects.filter(p => p.kyc.entity_owned)`.

Can I use this for a sales pitch to RIA prospects?

Yes. The Data License explicitly permits demonstration use, including in product demos, conference presentations, and sales calls. Many platform vendors use this Data Set as a 'show, don't tell' way of demonstrating onboarding capability to RIA buyers.

How current is the regulatory alignment?

The KYC, goal-capture, and risk-tolerance fields align with FINRA Rule 2090 (KYC), Reg S-P (privacy), CFP Board CIP standards, and the current focus areas of the SEC Division of Examinations (formerly OCIE). The annual refresh updates the fields against any new SEC interpretive guidance from the prior 12 months.

What's the right sample size — do I need all 400?

For workflow QA, the full 400 is recommended, because the long-tail edge cases that surface friction are sparsely distributed. For initial demos or smaller integration tests, a subset of about 50 prospects stratified across archetypes works (filter to about four prospects per archetype). The Data Set isn't subdivided for sale; the full 400 is the only purchase option.
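
A stratified subset can be drawn by grouping on archetype and taking a fixed number of prospects from each group. This is a minimal sketch, assuming each prospect carries an `archetype_id` field (a hypothetical name; check the schema) and taking the first `perGroup` records per archetype rather than a random draw.

```javascript
// Keep at most perGroup prospects from each archetype, preserving
// the original order of the corpus.
function stratifiedSubset(prospects, perGroup) {
  const taken = new Map(); // archetype id -> count selected so far
  return prospects.filter(p => {
    const n = taken.get(p.archetype_id) ?? 0;
    if (n >= perGroup) return false;
    taken.set(p.archetype_id, n + 1);
    return true;
  });
}

// Invented records to show the call shape.
const demoProspects = [
  { id: 1, archetype_id: 'F-01' },
  { id: 2, archetype_id: 'F-01' },
  { id: 3, archetype_id: 'F-01' },
  { id: 4, archetype_id: 'AR-01' }
];
console.log(stratifiedSubset(demoProspects, 2).map(p => p.id));
```

With 12 archetypes, `perGroup = 4` yields up to 48 prospects, close to the 50-prospect subset discussed above.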

Are the initial recommendations realistic enough to use as benchmarks?

The recommendations are generated by deterministic logic against each prospect's structured data, so they exhibit realistic variation (different age + income + goals → different allocation). They aren't intended as 'gold-standard' recommendations for benchmarking; they're a structurally valid set of outputs your platform can use to test recommendation-display logic, audit trail, and client-facing report generation.

How does this compare with the Master Corpus (B31)?

This Data Set (B14) is a curated 400-prospect subset focused specifically on onboarding workflow testing: KYC, goals, and initial recommendations. B31 is the full 1,451-household structured-JSON corpus with all 30 bundle overlays applied where eligible and 96 monthly longitudinal snapshots per household. If you only need onboarding testing, B14 is the right buy. If you're building multi-product platform infrastructure, B31 is more efficient than buying ten bundles individually.

$2,500
one-time purchase
400 households (ZIP)
Methodology PDF
JSON, CSV formats
Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.