Insurance illustration engines fail interestingly. The textbook household — healthy primary earner, two healthy children, mortgage, single salary — produces clean illustrations every time. The failure modes show up in the long tail: the household with a special-needs dependent whose lifetime cost projection breaks the straight-line need calculator, the recently widowed prospect whose existing policy structure creates illustration artifacts, the divorcing couple whose coverage needs are mid-restructuring, the blended family where prior-marriage child-support obligations affect the death-benefit calculation. The InsurTech Illustration Validator is 240 households built specifically for these edge cases — the ones your illustration engine needs to handle gracefully but probably doesn't.
Insurance product development happens in a quality-versus-volume tension. Carriers and InsurTech firms test illustration engines against a small number of textbook fixtures: the modal client cases that produce clean outputs. They ship. Customers find the edge cases. Each edge case becomes a one-off bug fix, the regression test set grows ad hoc, and the engine accumulates technical debt because it was never validated against structurally diverse cases up front.
The alternative — building a high-deviation, structurally diverse test corpus — is what every illustration engineer knows they need but rarely has. Production data from carriers is heavily customer-controlled and often de-identified in ways that strip the structural diversity. Hand-built fixtures are costly and tend to converge on a small number of edge cases that the team happens to think of.
This Data Set provides the high-deviation corpus. 240 households with intentionally edge-case-rich profiles: special-needs dependents, sub-standard underwriting class assignments, blended families with prior-marriage obligations, recent-claim histories that affect ongoing coverage, disability-claimant households, and the rare scenarios that stress-test annuity illustration logic. The high deviation rating is the design intent — this corpus is for finding the failure modes, not for sampling the modal customer.
Validates the illustration engine across 240 households with structurally diverse insurance-need profiles, surfacing the failure modes (illustration math that breaks on special-needs trust beneficiaries, sub-standard underwriting classes that generate wrong premium projections, recent-claim histories that produce incorrect riders) before customers encounter them.
Tests the platform's illustration capability across underserved segments — special-needs families, neurodiverse households, recent-divorce cases, post-loss households — demonstrating to carrier partners that the platform handles the segments traditional illustration software fumbles.
Tests the firm's needs-analysis tooling against realistic client variety, ensuring the analysis surfaces the right product mix for each client's actual situation rather than defaulting to a one-size-fits-all life + disability recommendation.
Validates the firm's underwriting-class assignment logic against households with realistic risk profiles — including the sub-standard cases (chronic conditions, hazardous occupations, family history flags) where naive underwriting produces incorrect premium projections.
Tests the firm's annuity suitability documentation against realistic prospects, ensuring the suitability analysis correctly handles the cases where an annuity recommendation requires extra scrutiny: cognitive-impairment markers, the recent death of a spouse, or large illiquid commitments relative to net worth.
The 240 households span ten archetypes deliberately weighted toward insurance-edge-case profiles: young families with first-mortgage life-insurance needs (A-01), single parents with sole-income vulnerability (A-02), small business owners with key-person and buy-sell needs (A-04), healthcare professionals with disability-income concerns (A-05), medical-debt-crisis households (S-03), sandwich-generation caregivers (S-04), LGBTQ+ households navigating recent legal-recognition changes (X-03), neurodiverse / disability households (X-04), first-time homebuyers (MB-01), and SSDI claimants (HC-02).
Each household carries a structured insurance-need calculation. Life-insurance need uses the DIME (Debt + Income replacement + Mortgage + Education) methodology with realistic adjustments for special-needs trust beneficiaries (where the lifetime support calculation extends the income-replacement period meaningfully). Disability insurance need is computed against income-replacement target percentages with realistic occupation-class assignments. LTC need projection uses the household's age, health status, and family longevity history. Annuity illustration inputs include the prospect's current asset mix, withdrawal-rate target, and risk tolerance for sequencing-of-returns concerns.
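A minimal sketch of the DIME and income-replacement calculations described above, written in the same JavaScript as this page's queries. The input names are hypothetical (they are not the Data Set's field names), the special-needs case is modeled simply as a longer `replacementYears`, and real needs analyses usually also net out existing coverage and liquid assets, which this sketch omits.

```javascript
// Sketch only: input names are hypothetical, not the Data Set schema.

// DIME = Debt + Income replacement (income x years) + Mortgage + Education.
function dimeLifeNeed({ debts, annualIncome, replacementYears, mortgageBalance, educationCost }) {
  const incomeReplacement = annualIncome * replacementYears;
  return debts + incomeReplacement + mortgageBalance + educationCost;
}

// Monthly disability benefit needed to reach a target replacement ratio,
// net of any existing monthly benefit (e.g. group LTD). Never negative.
function disabilityMonthlyNeed({ annualIncome, targetReplacementPct, existingMonthlyBenefit = 0 }) {
  const targetMonthly = (annualIncome / 12) * targetReplacementPct;
  return Math.max(0, targetMonthly - existingMonthlyBenefit);
}

// Same household, youngest child aged 6: support to 18 vs. support to 75.
const base = { debts: 25000, annualIncome: 90000, mortgageBalance: 310000, educationCost: 120000 };
const standard = dimeLifeNeed({ ...base, replacementYears: 18 - 6 });
const specialNeeds = dimeLifeNeed({ ...base, replacementYears: 75 - 6 });
console.log(standard, specialNeeds);
console.log(disabilityMonthlyNeed({ annualIncome: 90000, targetReplacementPct: 0.6, existingMonthlyBenefit: 1500 }));
```

With these inputs, extending the replacement horizon from the child's age 18 (12 years) to age 75 (69 years) raises the DIME figure from 1,535,000 to 6,665,000, the kind of jump a straight-line need calculator has to handle explicitly.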
The Data Set ships as JSON, CSV, and Parquet. The WealthSynth Methodology PDF documents the needs-analysis methodology (DIME life, income-replacement disability, age-and-health-adjusted LTC, suitability-anchored annuity), the underwriting-class taxonomy (Preferred Plus through Sub-standard), the riders typically applicable to each archetype, and the calibration source for typical product mixes by household type.
A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 240 like it) ships in the ZIP.
```
{
  "insurance.life.need_calculation": <value>,
  "insurance.disability.benefit_amount": <value>,
  "insurance.ltc.daily_benefit": <value>,
  "insurance.annuity.illustration_inputs": <value>,
  "insurance.underwriting.class": <value>
}
```

Returns households whose underwriting class is anything other than Preferred Plus, Preferred, or Standard: the cases where naive illustration logic produces incorrect premium projections because it assumes Standard pricing.

```javascript
// Classes priced at Standard or better; anything else is sub-standard or rated.
const STANDARD_OR_BETTER = ['Preferred Plus', 'Preferred', 'Standard'];

households.filter(h => !STANDARD_OR_BETTER.includes(h.insurance.underwriting.class));
```
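To see why Standard-pricing assumptions break on the rated cases this query returns, here is a minimal sketch of class-based premium adjustment. The class multipliers are hypothetical placeholders, and the 25%-per-table step is an assumed convention for substandard table ratings; actual rate tables vary by carrier, age, product, and rating scheme (flat extras, for example, are not modeled).

```javascript
// Hypothetical multipliers relative to a Standard base premium (illustrative only).
const CLASS_MULTIPLIER = new Map([
  ['Preferred Plus', 0.70],
  ['Preferred', 0.85],
  ['Standard', 1.00],
]);

// Assumed convention: each substandard table adds 25% of the Standard premium.
const TABLE_STEP = 0.25;

function annualPremium(standardPremium, underwritingClass, tableRating = 0) {
  if (CLASS_MULTIPLIER.has(underwritingClass)) {
    return standardPremium * CLASS_MULTIPLIER.get(underwritingClass);
  }
  // Any class outside the three standard-or-better classes is treated as rated.
  if (!Number.isInteger(tableRating) || tableRating < 1) {
    throw new Error(`Rated class "${underwritingClass}" needs a table rating >= 1`);
  }
  return standardPremium * (1 + TABLE_STEP * tableRating);
}

// A naive engine that always prices at Standard vs. the same case rated Table 4.
const naive = annualPremium(1200, 'Standard');
const rated = annualPremium(1200, 'Sub-standard', 4);
console.log(naive, rated);
```

With a 1,200 Standard premium, a Table 4 case prices at 2,400 under these assumptions, so an engine that silently falls back to Standard under-projects the premium by half.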
Returns households with a special-needs dependent whose lifetime-support cost extends the income-replacement period to age 75+ rather than the standard age 18 or college graduation. The DIME methodology breaks on these without explicit handling.
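```javascript
households.filter(h =>
  h.members.some(m =>
    m.dependent_status === 'special_needs' &&
    // Over 50 years of remaining support implies a horizon far beyond age 18 or 22.
    m.expected_support_duration_years > 50
  )
);
```

Returns annuity prospects whose suitability profile triggers heightened review: cognitive-impairment markers, loss of a spouse within the past 12 months, or a proposed premium above 25% of total assets (the query's proxy for an illiquid commitment relative to net worth). Short remaining life expectancy also warrants heightened review, but this filter does not test for it.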
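```javascript
// Hypothetical helper: calendar months between an ISO date and today (ignores day of month).
const monthsSince = iso => {
  const then = new Date(iso), now = new Date();
  return (now.getUTCFullYear() - then.getUTCFullYear()) * 12 + (now.getUTCMonth() - then.getUTCMonth());
};

households.filter(h =>
  h.insurance.annuity.illustration_inputs &&
  (h.members.some(m => m.cognitive_status && m.cognitive_status !== 'none') ||
   (h.events?.life_events ?? []).some(e =>
     e.type === 'death_of_spouse' && monthsSince(e.date) < 12) ||
   h.insurance.annuity.proposed_premium / h.assets.total > 0.25)
);
```

Returns households whose primary member is aged 50 to 64, whose family longevity history and current health status suggest meaningful LTC need probability (the filter uses the family longevity score as that signal), and who have no LTC coverage: no policy in force and a dedicated self-insurance reserve below $200,000.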
```javascript
households.filter(h => {
  const primary = h.members.find(m => m.role === 'primary');
  if (!primary) return false; // skip records without a primary member
  const age = primary.age;
  return age >= 50 && age < 65 &&
    h.health.family_longevity_score > 70 &&
    !h.insurance.ltc.policy_in_force &&
    h.assets.dedicated_ltc_reserve < 200000;
});
```

Each household's insurance need is generated against archetype-specific patterns. Life-insurance need uses DIME with archetype-realistic adjustments: special-needs households extend income replacement to age 75+; blended families add prior-marriage child-support obligations; small business owners include buy-sell agreement funding; healthcare professionals get disability-income scaling for high-risk occupations. Underwriting class is assigned from health and lifestyle factors with a realistic distribution (about 20% Preferred Plus, 30% Preferred, 35% Standard, and 15% Sub-standard or rated). LTC need projections use age, family longevity, and current health markers. Annuity illustrations include the suitability factors that drive the review queue. The corpus has high deviation by design: the goal is structural diversity for finding edge cases, not statistical representativeness of the modal customer. The corpus passes the WealthSynth consistency validator (insurance need calculations reconcile with underlying household data; underwriting-class assignment is consistent with health markers; annuity-suitability flags fire when the underlying conditions are met) and the LLM-as-judge gate. An annual refresh updates the corpus against current NAIC illustration standards and any state-level suitability rule changes.
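A test harness that needs synthetic fixtures with the same underwriting mix can draw classes by inverse-CDF sampling over the stated proportions. This is a sketch under that assumption, not how the Data Set assigns classes (assignment is driven by health and lifestyle factors); the `rng` parameter exists so tests can inject deterministic values, and the label 'Sub-standard' here stands in for all rated cases.

```javascript
// Target class mix from the methodology text (weights sum to 1.0).
const CLASS_WEIGHTS = [
  ['Preferred Plus', 0.20],
  ['Preferred', 0.30],
  ['Standard', 0.35],
  ['Sub-standard', 0.15],
];

// Inverse-CDF sampling: walk the cumulative weights until u falls inside a bucket.
function sampleClass(rng = Math.random) {
  const u = rng();
  let cumulative = 0;
  for (const [cls, weight] of CLASS_WEIGHTS) {
    cumulative += weight;
    if (u < cumulative) return cls;
  }
  // Guard against floating-point shortfall when u is very close to 1.
  return CLASS_WEIGHTS[CLASS_WEIGHTS.length - 1][0];
}

// Tally n draws so a harness can check it roughly matches the target mix.
function tally(n, rng = Math.random) {
  const counts = Object.fromEntries(CLASS_WEIGHTS.map(([cls]) => [cls, 0]));
  for (let i = 0; i < n; i++) counts[sampleClass(rng)] += 1;
  return counts;
}

console.log(tally(240));
```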
By design. The goal is to find illustration-engine failure modes, not to sample the modal customer. Standard fixtures over-represent textbook cases and miss the edge cases. This Data Set deliberately concentrates the edge cases — special-needs households, sub-standard underwriting, recent-claim histories, blended families — so a few hundred test runs against this corpus exercise more failure modes than thousands of runs against a typical-customer fixture.
Yes. The illustration inputs follow NAIC-aligned structures (Sections 1-7 of the Life Insurance Illustrations Model Regulation). The corpus supports testing both the basic illustration and the rarely validated supplemental illustrations. Annuity suitability follows the NAIC Suitability in Annuity Transactions Model Regulation, as amended.
Yes. About 8% of the corpus has a special-needs dependent (autism spectrum, Down syndrome, cerebral palsy, severe developmental delay, or chronic medical conditions). The structured needs analysis correctly extends the lifetime-support calculation; about 60% of these households also have a third-party special-needs trust, with life insurance as the funding mechanism for the trust.
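A sketch of that extension, assuming a simple end-age model: remaining support years run to 18 (or 22 for college) for a typical dependent and to 75 for a special-needs dependent. The function and option names are hypothetical. Under the age-75 horizon, the `expected_support_duration_years > 50` filter shown earlier selects dependents currently younger than 25.

```javascript
// Remaining support years for a dependent under an assumed end-age model.
// End ages follow the text: 18 (or 22 for college) standard, 75 special-needs.
function supportYears(currentAge, isSpecialNeeds, { standardEndAge = 18, specialNeedsEndAge = 75 } = {}) {
  const endAge = isSpecialNeeds ? specialNeedsEndAge : standardEndAge;
  return Math.max(0, endAge - currentAge); // never negative for adult dependents
}

// Same 8-year-old dependent under both horizons.
console.log(supportYears(8, false)); // 10
console.log(supportYears(8, true));  // 67
```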
About 15% of the corpus is sub-standard or rated. The conditions driving the rating are realistic and documented: chronic conditions (Type 2 diabetes, sleep apnea, anxiety/depression history), occupational hazard (commercial pilot, oil rig worker, motorcycle racer), or family history (early heart disease, hereditary cancer). The corpus distribution reflects published reinsurer experience.
Yes. About 12% of the corpus has a recent (last 5 years) insurance claim history that affects ongoing coverage — typically a disability claim resolved by return-to-work, a property claim history, or a denied life-insurance application from a prior carrier. The structured claims data lets your illustration logic surface the carrier-relationship constraints these create.
Yes. About 18% of the corpus has structured LTC planning: traditional LTC insurance, hybrid life-LTC products, or self-insurance via dedicated reserves. The product structures include realistic benefit periods, daily benefits, inflation riders, and elimination periods.
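A sketch of how two of those parameters interact: a compound inflation rider grows the daily benefit each year, and the elimination period delays when payments start. The 3% compound rate, the 90-day elimination period, the function names, and the flat-daily-benefit payout model are illustrative assumptions; real policies may use simple or CPI-linked inflation, monthly benefit pools, or service-day elimination counting.

```javascript
// Daily benefit after `years` under a compound inflation rider.
function inflatedDailyBenefit(initialDaily, annualRate, years) {
  return initialDaily * Math.pow(1 + annualRate, years);
}

// Total paid over a claim of `careDays`: elimination days are unpaid,
// and payable days are capped by the benefit period. Flat daily benefit assumed.
function claimPayout(dailyBenefit, careDays, eliminationDays, benefitPeriodDays) {
  const payableDays = Math.min(Math.max(0, careDays - eliminationDays), benefitPeriodDays);
  return dailyBenefit * payableDays;
}

// A 200/day policy bought at 55 and claimed at 80: two years of care, 90-day
// elimination period, three-year benefit period.
const dailyAt80 = inflatedDailyBenefit(200, 0.03, 25);
const payout = claimPayout(dailyAt80, 2 * 365, 90, 3 * 365);
console.log(dailyAt80.toFixed(2), payout.toFixed(2));
```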
Cyber-insurance is increasingly relevant for HNW households. About 6% of the corpus has structured cyber-insurance: identity theft protection, cyber-extortion coverage, fraud-loss coverage, family-office cyber. The product structures are based on the major personal-cyber carriers' product catalogs (Chubb Masterpiece, AIG Private Client).
B10 focuses on the illustration and underwriting side — pre-issue analysis, premium projection, suitability documentation. B15 focuses on the claims side — fraud detection, claim disputes, account-takeover patterns, AML. Carriers serving the full insurance lifecycle typically buy both. InsurTech builders narrowly focused on illustration buy B10; those building claims or fraud-detection tools buy B15.
50 households with embedded fraud, claims, and cybersecurity scenario flags: medical claims disputes, account takeover patterns, predatory lending exposure, and disability claim documentation. High deviation for adversarial ML training.
90 households spanning the mortgage lifecycle: first-time buyers stretched to DTI limit, young families with new mortgages, distressed homeowners evaluating modification, and underwater scenarios. Includes complete mortgage application data, current LTV, and reserve adequacy.
250 households across the healthcare-benefits lifecycle: HDHP-with-HSA accumulators, COBRA/ACA-marketplace gap-fillers, SSDI/LTD claimants, Medicare-bridge pre-retirees, and IRMAA-exposed retirees. HSA-as-retirement-account strategies fully modeled.
Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.