RIA Onboarding Stress Test Pack

Name: RIA Onboarding Stress Test Pack
Brand: WealthSchema
SKU: B14
Price: 2500.00 USD
Availability: InStock

The friction in RIA onboarding isn't on the happy path. It's on the long tail: the H-1B prospect whose visa status the intake form doesn't capture, the gig worker whose income doesn't fit the W-2 dropdown, the divorcing client whose accounts are in the middle of being retitled, the artist whose royalty income confuses the cash-flow modeler. The RIA Onboarding Stress Test Pack is 400 simulated prospects engineered to exercise every part of your firm's intake process — KYC fields, goal capture, initial recommendation logic, CRM mapping — against the full variety of clients an RIA actually sees.

Households

400

Archetypes

12

Formats

JSON, CSV

Deviation

Low

Why this Data Set exists

Most onboarding workflows are designed for the modal client and tested against three or four hand-built test fixtures. The result is friction that surfaces only in production: the form that requires a Social Security Number when an ITIN is what the prospect has; the goal-capture flow that doesn't have a category for caregiver-of-aging-parent; the initial-recommendation engine that breaks when the prospect's income is a six-month-trailing-average rather than a current monthly figure.

These aren't edge cases — they're the modal cases for important RIA growth segments. Cross-border professionals, gig-economy earners, recent immigrants, and life-transition clients are the highest-LTV prospects most firms underserve because the onboarding system rejects or misroutes them. Firms find out only after the fact, when win rates from those segments are quietly half what they should be.

This Data Set surfaces the friction in advance. The 400 prospects span 12 archetypes covering every wealth tier, life stage, employment type, and family structure your intake system needs to handle. That is the broadest archetype coverage of any single bundle in the catalog, by design.

Use Cases

RIA onboarding workflow testing

KYC data validation

Initial financial plan generation

CRM integration testing

Who uses this Data Set

RIA Operations Lead

Runs the firm's onboarding intake against all 400 prospects in a test environment to find every form field, validation rule, and routing decision that breaks on a non-modal prospect. The friction points get prioritised before launch or before a redesign goes live.

Onboarding SaaS Builder (selling to RIAs)

Demos the product's ability to handle complex prospects to RIA buyers using realistic households spanning H-1B tech workers, gig artists, military officers, and dual-residency UHNW couples — without exposing any prospect's real data.

CRM Integration Engineer

Tests the firm's CRM data mapping (Salesforce / Wealthbox / Redtail) by ingesting all 400 prospects and verifying every field round-trips correctly, including custom fields for goal categories and KYC documentation status.

Compliance Lead at a New RIA

Walks the SEC examiner through the firm's KYC documentation process using realistic prospects covering CIP, beneficial ownership, and identity-verification edge cases — demonstrating controls without using actual client data.

Financial Planning Engine Engineer

Validates the initial-plan generation logic across all archetype types, ensuring the engine produces sensible recommendations for prospects whose financial situation doesn't fit a textbook 401k-saver-with-mortgage profile.

What's inside

The 400 prospects span 12 archetypes, the widest archetype coverage of any single Wealth Data Set: from F-01 (New Graduate Tech Worker) on the formation end, through AR-01 (Artist / Creative with royalty and irregular income), dual-income professionals, and small business owners in mid-career, to retirees and complex life-transition cases. Wealth tiers span $50K to $30M+ so every advisory engagement model is represented.

Every prospect has a complete KYC record: identity verification fields (with realistic edge cases — ITIN filers, dual citizens, recent name changes), beneficial ownership for entity-owning prospects, source-of-wealth narrative, and risk-tolerance questionnaire results. Goal capture covers the full goal taxonomy (retirement, college, home purchase, business sale, charitable, transition) with structured priority rankings and time horizons. Initial recommendation outputs include the canonical first-meeting deliverables: asset allocation suggestion, account-type recommendation (which bucket to fund first), insurance gap analysis, and estate planning readiness flags (will/POA/HCD presence, beneficiary completeness).

Field names follow CRM-compatible conventions so direct ingestion into Salesforce, Wealthbox, or Redtail requires minimal transformation. The Data Set ships as JSON (one file per prospect plus a manifest) and CSV (long-format with normalized account/goal/recommendation tables for SQL ingestion), accompanied by the Methodology PDF, which covers generation and reproducibility.

Preview a sample household

A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 400 like it) ships in the ZIP.

F-01·New Graduate Tech Worker

representative archetype household

Household

Single

State

Gross income (band)

$50k–$100k

Net worth (band)

—

Dependents

Income source types

w2 salary, w2 bonus

Members (1)

primary

Age 25–29

professional services

Technical Highlights

→KYC-complete household records

→Goal-based planning fields

→Wide archetype coverage (12 types)

→CRM-compatible field naming

Sample Schema Fields

sample_record.json

{
  "demographics.household_profile": <value>,
  "accounts.summary_balances": <value>,
  "goals.primary_financial_goals": <value>,
  "kyc.identity_verification_fields": <value>,
  "planning.initial_recommendations": <value>
}
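
The dotted names in the sample record read like paths into a nested record, which is how the sample queries access them (for example `p.kyc.identity_verification_fields`). The following is a minimal sketch under that assumption. The `getPath` helper and the values in `sampleRecord` are hypothetical and are not part of the shipped Data Set.

```javascript
// Hypothetical helper: resolve a dotted field name such as
// "kyc.identity_verification_fields" against a nested prospect record.
// Returns undefined instead of throwing when any segment is missing.
function getPath(record, dottedName) {
  return dottedName.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]),
    record
  );
}

// Illustrative record only; real values ship in the ZIP.
const sampleRecord = {
  kyc: { identity_verification_fields: { itin_filer: true } },
  goals: { primary_financial_goals: ['retirement'] },
};

// The five field names shown in sample_record.json.
const schemaFields = [
  'demographics.household_profile',
  'accounts.summary_balances',
  'goals.primary_financial_goals',
  'kyc.identity_verification_fields',
  'planning.initial_recommendations',
];

// List the schema fields this record does not populate.
const missingFields = schemaFields.filter(
  (name) => getPath(sampleRecord, name) == null
);
console.log(missingFields);
```

A lookup that tolerates missing segments is convenient during intake testing, because a prospect that lacks a whole section shows up as a gap rather than as an exception.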

Sample queries

Find prospects whose income type breaks W-2 assumptions

Returns every prospect whose primary income source is non-W-2 (1099 contractor, K-1 distribution, royalty, business owner draw) — the population most likely to expose friction in income-capture forms.

prospects.filter(p =>
  ['1099', 'K-1', 'royalty', 'business_draw'].includes(
    p.income.primary_source_type
  )
)

Surface KYC edge cases by identity-verification status

Filters prospects whose identity verification path is non-standard: ITIN holders, dual citizens, recent name changes, or thin-file credit history. The intake flow that handles these well wins clients competitors lose.

prospects.filter(p =>
  p.kyc.identity_verification_fields.itin_filer ||
  p.kyc.identity_verification_fields.dual_citizen ||
  p.kyc.identity_verification_fields.recent_name_change ||
  p.credit.thin_file_flag
)

Match prospects to advisor specialty

Labels every prospect with the advisor specialty its financial profile suggests (equity_comp, business_owner, estate, or general), checked in that order. Useful for routing logic that pairs the right advisor with the right prospect at intake.

prospects.map(p => ({
  id: p.id,
  // Optional chaining guards prospects that lack a section entirely
  routing_specialty: p.equity_comp?.grants?.length > 0 ? 'equity_comp'
    : p.business?.entity_type ? 'business_owner'
    : p.estate?.trust_structures?.length > 0 ? 'estate'
    : 'general'
}))

Generate initial-plan checklist completeness

For each prospect, returns the fraction (0 to 1) of the canonical initial-plan checklist that the firm has data to complete from intake alone, which surfaces where additional data collection is needed before the first meeting.

prospects.map(p => {
  const required = ['risk_tolerance', 'goals',
    'income', 'expenses', 'assets', 'liabilities',
    'beneficiaries', 'insurance'];
  const present = required.filter(k => p[k] != null);
  return { id: p.id, completeness: present.length / required.length };
})
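
A per-prospect score says how incomplete a record is but not which checklist items are missing most often. The sketch below tallies missing keys across a corpus. It reuses the checklist keys from the query above. The `tallyMissing` function and the two demo records are hypothetical, and the flat top-level record shape is an assumption carried over from that query.

```javascript
// Checklist keys copied from the completeness query.
const REQUIRED_KEYS = ['risk_tolerance', 'goals', 'income', 'expenses',
  'assets', 'liabilities', 'beneficiaries', 'insurance'];

// Count, per checklist key, how many prospects are missing it.
// Returns [key, count] pairs, most frequently missing first.
function tallyMissing(prospects) {
  const counts = Object.fromEntries(REQUIRED_KEYS.map((k) => [k, 0]));
  for (const p of prospects) {
    for (const k of REQUIRED_KEYS) {
      if (p[k] == null) counts[k] += 1;
    }
  }
  return Object.entries(counts)
    .filter(([, n]) => n > 0)
    .sort((a, b) => b[1] - a[1]);
}

// Hypothetical demo records, not taken from the Data Set.
const demoProspects = [
  { id: 'demo-1', risk_tolerance: 'moderate', goals: [], income: {},
    expenses: {}, assets: {}, liabilities: {},
    beneficiaries: null, insurance: null },
  { id: 'demo-2', risk_tolerance: 'high', goals: [], income: {},
    expenses: {}, assets: {}, liabilities: {},
    beneficiaries: [], insurance: null },
];

console.log(tallyMissing(demoProspects));
```

Sorting by frequency puts the intake fields most worth adding, or making mandatory, at the top of the list.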

Methodology

Each prospect is generated against a randomly weighted draw from 12 archetypes, with the weighting deliberately equalising across wealth tiers and life stages so no single segment dominates the 400-prospect corpus. KYC fields draw from realistic distributions of identity-verification scenarios (about 4% ITIN filers, 6% dual citizens, 2% recent name changes — calibrated against Pew demographic data on the financial-services-customer population). Goal-capture and risk-tolerance fields use FINRA Rule 2090–compliant structures. Initial recommendation outputs are produced by the same recommendation logic that generates the Synthetic Wealth Data Sets canonical onboarding outputs, so the recommendations exhibit realistic variation across archetypes. The full corpus passes the consistency validator (KYC fields reconcile, goal priorities sum correctly, recommendation logic is deterministic given inputs) and the LLM-as-judge gate. Each refresh re-runs against current FINRA interpretive guidance and CFP Board CIP standards.
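
At the quoted rates, a 400-prospect corpus would hold roughly 16 ITIN filers, 24 dual citizens, and 8 prospects with a recent name change. The sketch below shows one way to compare those targets with what a loaded corpus contains. The field paths follow the sample queries and should be treated as assumptions. The `prevalenceReport` function and the demo corpus are hypothetical.

```javascript
// Approximate target rates quoted in the methodology paragraph.
const TARGET_RATES = {
  itin_filer: 0.04,
  dual_citizen: 0.06,
  recent_name_change: 0.02,
};

// Compare observed flag counts in a corpus with the quoted targets.
function prevalenceReport(prospects) {
  const total = prospects.length;
  return Object.entries(TARGET_RATES).map(([flag, target]) => {
    const observed = prospects.filter(
      (p) => p.kyc?.identity_verification_fields?.[flag] === true
    ).length;
    return {
      flag,
      expected_count: Math.round(target * total),
      observed_count: observed,
      observed_rate: total > 0 ? observed / total : 0,
    };
  });
}

// Hypothetical 400-record corpus built to match the targets exactly.
const demoCorpus = Array.from({ length: 400 }, (_, i) => ({
  id: `demo-${i}`,
  kyc: {
    identity_verification_fields: {
      itin_filer: i < 16,
      dual_citizen: i >= 16 && i < 40,
      recent_name_change: i >= 40 && i < 48,
    },
  },
}));

console.log(prevalenceReport(demoCorpus));
```

Because the quoted rates are approximate, small differences between expected and observed counts in the real corpus are not by themselves a sign of a problem.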

Included Archetypes (12)

F-01·New Graduate Tech Worker

Formation

A-01·Young Family — First Home

Accumulation

A-03·Dual-Income Professional Couple

Accumulation

P-01·Peak Earner — Corporate Executive

Accumulation

P-03·Dual High-Income Professionals

Accumulation

H-01·Affluent Investor ($1M–$3M)

Accumulation

R-01·Corporate Pre-Retiree (5 Years Out)

Preservation

RE-01·Active Early Retiree

Distribution

S-01·Divorce in Progress

Transfer

U-01·Unbanked / Recently Banked

Formation

X-01·Remote Worker / Digital Nomad

Accumulation

AR-01·Artist / Creative (Royalties & Irregular Income)

Accumulation

Frequently asked questions

Why is this called a 'stress test' if it's about onboarding?

The 'stress test' framing is intentional. Most firms onboard the modal prospect cleanly; the friction emerges with the long-tail prospects whose data doesn't fit the form. Running 400 simulated prospects through the firm's intake stress-tests the workflow in the same way a backend load test stresses an API — surfacing the failures before real clients hit them.

Are the field names compatible with my CRM?

Field names follow conventions compatible with Salesforce Financial Services Cloud, Wealthbox, and Redtail. For systems with custom field schemas, the JSON nested structure is straightforward to remap.
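
As a rough illustration of that remapping, the sketch below flattens a nested record into single-level dotted keys and rebuilds it, which also gives a simple round-trip check. The `flatten` and `unflatten` helpers, the `.` separator, and the demo record are all hypothetical. The target field names for any particular CRM are not specified here and would need to be mapped separately.

```javascript
// Flatten a nested record into single-level keys joined by a separator.
// Arrays and null are kept as leaf values rather than expanded.
// Limitation: empty nested objects produce no keys and are dropped.
function flatten(record, sep = '.', prefix = '') {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    const name = prefix ? `${prefix}${sep}${key}` : key;
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);
    if (isPlainObject) {
      Object.assign(out, flatten(value, sep, name));
    } else {
      out[name] = value;
    }
  }
  return out;
}

// Rebuild the nested record from flattened keys.
// Assumes no original key contains the separator.
function unflatten(flat, sep = '.') {
  const out = {};
  for (const [name, value] of Object.entries(flat)) {
    const keys = name.split(sep);
    let node = out;
    for (const key of keys.slice(0, -1)) {
      if (node[key] === undefined) node[key] = {};
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;
  }
  return out;
}

// Hypothetical demo record, not taken from the Data Set.
const nested = {
  id: 'demo-1',
  kyc: {
    entity_owned: false,
    identity_verification_fields: { dual_citizen: true },
  },
  goals: { primary_financial_goals: ['retirement', 'home_purchase'] },
};

const flat = flatten(nested);
const roundTripped = unflatten(flat);
console.log(Object.keys(flat));
console.log(JSON.stringify(roundTripped) === JSON.stringify(nested));
```

Comparing the rebuilt record with the original is a cheap way to confirm that a mapping loses no fields before pointing it at a real CRM sandbox.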

Does this include onboarding for entity / trust accounts?

Yes. About 18% of the prospects in the corpus bring entity-owned accounts (LLC, trust, family LP). These have the additional KYC fields (beneficial ownership, entity formation documents, signing-authority structures) that the entity-onboarding path requires. Entity-only test fixtures can be filtered with `prospects.filter(p => p.kyc.entity_owned)`.

Can I use this for a sales pitch to RIA prospects?

Yes. The Data License explicitly permits demonstration use, including in product demos, conference presentations, and sales calls. Many platform vendors use this Data Set as a 'show, don't tell' way of demonstrating onboarding capability to RIA buyers.

How current is the regulatory alignment?

The KYC, goal-capture, and risk-tolerance fields align with FINRA Rule 2090 (KYC), Reg S-P (privacy), CFP Board CIP standards, and current SEC Division of Examinations (formerly OCIE) focus areas. Each refresh updates against any new SEC interpretive guidance from the prior 12 months.

What's the right sample size — do I need all 400?

For workflow QA, the full 400 is recommended because the long-tail edge cases that surface friction are sparsely distributed. For initial demos or smaller integration tests, a 50-prospect subset stratified across archetypes works (filter to about four prospects per archetype). The Data Set isn't subdivided for sale; the full 400 is the only purchase option.
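
With 12 archetypes, taking four prospects from each yields a 48-prospect subset, close to the 50 suggested for smaller tests. The sketch below shows one deterministic way to build such a subset. The `archetype_id` field name is an assumption and should be replaced with whatever field carries the archetype code in the shipped records. The `stratifiedSubset` function and the demo pool are hypothetical.

```javascript
// Pick up to `perArchetype` prospects from each archetype, in corpus order.
// `archetype_id` is a hypothetical field name; substitute the real one.
function stratifiedSubset(prospects, perArchetype) {
  const taken = new Map();
  return prospects.filter((p) => {
    const n = taken.get(p.archetype_id) ?? 0;
    if (n >= perArchetype) return false;
    taken.set(p.archetype_id, n + 1);
    return true;
  });
}

// Archetype codes from the Included Archetypes list.
const ARCHETYPES = ['F-01', 'A-01', 'A-03', 'P-01', 'P-03', 'H-01',
  'R-01', 'RE-01', 'S-01', 'U-01', 'X-01', 'AR-01'];

// Hypothetical 400-record pool that cycles through the archetypes.
const demoPool = Array.from({ length: 400 }, (_, i) => ({
  id: `demo-${i}`,
  archetype_id: ARCHETYPES[i % ARCHETYPES.length],
}));

const subset = stratifiedSubset(demoPool, 4);
console.log(subset.length); // 48
```

Taking records in corpus order keeps the subset reproducible between runs. A seeded shuffle before the filter would be a reasonable variation if a different sample is wanted for each test cycle.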

Are the initial recommendations realistic enough to use as benchmarks?

The recommendations are generated by deterministic logic against each prospect's structured data, so they exhibit realistic variation (different age + income + goals → different allocation). They aren't intended as 'gold-standard' recommendations for benchmarking; they're a structurally valid set of outputs your platform can use to test recommendation-display logic, audit trail, and client-facing report generation.

How does this compare with the Master Corpus (B31)?

B14 is a curated 400-prospect subset focused specifically on onboarding workflow testing — KYC, goals, initial recommendations. B31 is the full 1,451-household structured-JSON corpus with all 30 bundle overlays applied where eligible and 96 monthly longitudinal snapshots per household. If you only need onboarding testing, B14 is the right buy. If you're building multi-product platform infrastructure, B31 is more efficient than buying ten bundles individually.

Related Wealth Data Sets

B01·$5,000

Reg BI Suitability Audit Pack

130 synthetic households tuned for Reg BI suitability testing — concentrated holdings, age 75+, recent inheritance, cognitive decline markers, and risk-mismatch flags. Each record carries the eligibility triggers required to exercise broker-dealer supervisory workflows end to end.

B07·$4,000

Fiduciary Fee Benchmark Dataset

130 affluent and HNW households with detailed fee structures: AUM-based advisory fees, tiered breakpoints, fund expense ratios, transaction costs, and tax-drag estimates. Includes complex fee arrangements (multi-firm, family-office, performance-based).

B27·$5,500

Life Transitions Pack

220 households mid-transition: divorce in progress, post-bankruptcy recovery, medical-debt crisis, sandwich-generation caregivers, recent widowhood, sudden wealth, distressed mortgages, and blended-family formation. High behavioral-event density and asset reshuffling.

$2,500

one-time purchase

400 households (ZIP)

Methodology PDF

JSON, CSV formats

Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.
