InsurTech Illustration Validator

Insurance illustration engines fail interestingly. The textbook household — healthy primary earner, two healthy children, mortgage, single salary — produces clean illustrations every time. The failure modes show up in the long tail: the household with a special-needs dependent whose lifetime cost projection breaks the straight-line need calculator, the recently widowed prospect whose existing policy structure creates illustration artifacts, the divorcing couple whose coverage needs are mid-restructuring, the blended family where prior-marriage child-support obligations affect the death-benefit calculation. The InsurTech Illustration Validator is 240 households built specifically for these edge cases — the ones your illustration engine needs to handle gracefully but probably doesn't.

Households: 240
Archetypes: 10
Formats: JSON, CSV, Parquet
Deviation: High

Why this Data Set exists

Insurance product development trades quality against volume. Carriers and InsurTech firms test illustration engines against a small number of textbook fixtures: the modal client cases that produce clean outputs. They ship, and customers find the edge cases. Each edge case becomes a one-off bug fix, the regression test set grows ad hoc, and the engine accumulates technical debt because it was never validated against structurally diverse cases up front.

The alternative — building a high-deviation, structurally diverse test corpus — is what every illustration engineer knows they need but rarely has. Production data from carriers is heavily customer-controlled and often de-identified in ways that strip the structural diversity. Hand-built fixtures are costly and tend to converge on a small number of edge cases that the team happens to think of.

This Data Set provides the high-deviation corpus. 240 households with intentionally edge-case-rich profiles: special-needs dependents, sub-standard underwriting class assignments, blended families with prior-marriage obligations, recent-claim histories that affect ongoing coverage, disability-claimant households, and the rare scenarios that stress-test annuity illustration logic. The high deviation rating is the design intent — this corpus is for finding the failure modes, not for sampling the modal customer.

Use Cases

Illustration engine validation
Needs analysis algorithm testing
Annuity suitability modeling
Underwriting edge-case discovery

Who uses this Data Set

Insurance Carrier Illustration Engine Engineer

Validates the illustration engine across 240 households with structurally diverse insurance-need profiles, surfacing failure modes (illustration math that breaks on special-needs trust beneficiaries, sub-standard underwriting classes that generate wrong premium projections, recent-claim histories that produce incorrect riders) before customers encounter them.

InsurTech Builder Targeting Underserved Segments

Tests the platform's illustration capability across underserved segments — special-needs families, neurodiverse households, recent-divorce cases, post-loss households — demonstrating to carrier partners that the platform handles the segments traditional illustration software fumbles.

Independent Insurance Agent Building Tech-Enabled Practice

Tests the firm's needs-analysis tooling against realistic client variety, ensuring the analysis surfaces the right product mix for each client's actual situation rather than defaulting to a one-size-fits-all life + disability recommendation.

Underwriting Software Engineer

Validates the firm's underwriting-class assignment logic against households with realistic risk profiles — including the sub-standard cases (chronic conditions, hazardous occupations, family history flags) where naive underwriting produces incorrect premium projections.

Annuity Suitability Reviewer

Tests the firm's annuity suitability documentation against realistic prospects, ensuring the suitability analysis correctly handles the cases where annuity recommendation requires extra scrutiny — cognitive-impairment markers, recent-spouse-death, large illiquid commitments relative to net worth.
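A validation run of the kind these roles describe can be sketched as a loop that feeds every household to the engine and records which ones fail. This is a minimal sketch under stated assumptions: `illustrate`, `household_id`, `annualPremium`, and the `'Table D'` class label are hypothetical names chosen for illustration, not part of the documented schema.

```javascript
// Sketch only: `illustrate`, `household_id`, and `annualPremium` are
// hypothetical names, not fields documented for this Data Set.
function validateEngine(households, illustrate) {
  const failures = [];
  for (const h of households) {
    try {
      const { annualPremium } = illustrate(h);
      // A premium that is missing, NaN, infinite, or non-positive is a failure.
      if (!Number.isFinite(annualPremium) || annualPremium <= 0) {
        failures.push({ id: h.household_id, reason: `invalid premium: ${annualPremium}` });
      }
    } catch (err) {
      // An exception on an edge-case household is also recorded, not rethrown.
      failures.push({ id: h.household_id, reason: `threw: ${err.message}` });
    }
  }
  return failures;
}

// Stub engine that only knows Preferred and Standard pricing.
const stubEngine = h => {
  const cls = h.insurance.underwriting.class;
  if (cls === 'Standard') return { annualPremium: 1200 };
  if (cls === 'Preferred') return { annualPremium: 900 };
  throw new Error(`no rate table for class "${cls}"`);
};

const sample = [
  { household_id: 'H1', insurance: { underwriting: { class: 'Standard' } } },
  { household_id: 'H2', insurance: { underwriting: { class: 'Table D' } } }, // hypothetical rated class
];

// One failure is reported, for H2.
console.log(validateEngine(sample, stubEngine));
```

Collecting failures instead of stopping at the first one keeps a single run informative across all 240 households.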

What's inside

The 240 households span ten archetypes deliberately weighted toward insurance-edge-case profiles: young families with first-mortgage life-insurance needs (A-01), single parents with sole-income vulnerability (A-02), small business owners with key-person and buy-sell needs (A-04), healthcare professionals with disability-income concerns (A-05), medical-debt-crisis households (S-03), sandwich-generation caregivers (S-04), LGBTQ+ households navigating recent legal-recognition changes (X-03), neurodiverse / disability households (X-04), first-time homebuyers (MB-01), and SSDI claimants (HC-02).

Each household carries a structured insurance-need calculation. Life-insurance need uses the DIME (Debt + Income replacement + Mortgage + Education) methodology with realistic adjustments for special-needs trust beneficiaries (where the lifetime support calculation extends the income-replacement period meaningfully). Disability insurance need is computed against income-replacement target percentages with realistic occupation-class assignments. LTC need projection uses the household's age, health status, and family longevity history. Annuity illustration inputs include the prospect's current asset mix, withdrawal-rate target, and risk tolerance for sequencing-of-returns concerns.
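The DIME arithmetic, and the way a special-needs dependent stretches the income-replacement term, can be sketched as follows. This is an illustration under stated assumptions, not the Data Set's actual calculation: only `dependent_status` and `expected_support_duration_years` appear in the sample queries, every other field name is hypothetical, and the sum is undiscounted for simplicity.

```javascript
// Sketch only: apart from dependent_status and
// expected_support_duration_years (used in the sample queries),
// every field name here is a hypothetical stand-in.

// Years of income replacement: the longest support horizon among dependents.
function replacementYears(dependents, independenceAge = 18) {
  const horizons = dependents.map(d =>
    d.dependent_status === 'special_needs'
      ? d.expected_support_duration_years     // lifetime-support horizon
      : Math.max(0, independenceAge - d.age)  // support until independence
  );
  return horizons.length > 0 ? Math.max(...horizons) : 0;
}

// DIME = Debt + Income replacement + Mortgage + Education (undiscounted).
function dimeNeed({ debt, annualIncome, mortgage, education, dependents }) {
  const income = annualIncome * replacementYears(dependents);
  return debt + income + mortgage + education;
}

const base = { debt: 20000, annualIncome: 100000, mortgage: 300000, education: 80000 };

const typical = dimeNeed({ ...base, dependents: [{ age: 12 }] });
const specialNeeds = dimeNeed({
  ...base,
  dependents: [
    { age: 10, dependent_status: 'special_needs', expected_support_duration_years: 65 },
  ],
});

console.log(typical);      // 1000000
console.log(specialNeeds); // 6900000
```

With identical debt, mortgage, and education figures, the only change between the two results is the replacement term: 6 years versus 65.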

The Data Set ships as JSON, CSV, and Parquet. The WealthSynth Methodology PDF documents the needs-analysis methodology (DIME life, income-replacement disability, age-and-health-adjusted LTC, suitability-anchored annuity), the underwriting-class taxonomy (Preferred Plus through Sub-standard), the riders typically applicable to each archetype, and the calibration source for typical product mixes by household type.

Preview a sample household

A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 240 like it) ships in the ZIP.

A-01·Young Family — First Home
representative archetype household
Household: Married Joint
State: FL
Gross income (band): $100k–$200k
Net worth (band):
Dependents: 1
Income source types: w2 salary, w2 bonus
Members (3):
primary: age 35–39, healthcare
spouse: age 40–44, healthcare
dependent: age 10–14

Technical Highlights

NAIC illustration standards alignment
High deviation for edge case testing
Multi-product insurance needs
Underwriting class flags

Sample Schema Fields

sample_record.json
{
  "insurance.life.need_calculation": <value>,
  "insurance.disability.benefit_amount": <value>,
  "insurance.ltc.daily_benefit": <value>,
  "insurance.annuity.illustration_inputs": <value>,
  "insurance.underwriting.class": <value>
}
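
The sample record lists fields as dotted paths, while the sample queries read nested properties such as `h.insurance.underwriting.class`. If the flat form you load (for example a CSV row) uses dotted column names, a small helper can expand them. Whether the shipped files need this step is an assumption, and `unflatten` is a hypothetical helper rather than part of the Data Set.

```javascript
// Expand dotted keys ("insurance.underwriting.class") into nested objects.
// Hypothetical helper: the Data Set does not document or ship this function.
function unflatten(flat) {
  const nested = {};
  for (const [path, value] of Object.entries(flat)) {
    const keys = path.split('.');
    let node = nested;
    // Walk or create every intermediate object along the path.
    for (const key of keys.slice(0, -1)) {
      if (typeof node[key] !== 'object' || node[key] === null) {
        node[key] = {};
      }
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;
  }
  return nested;
}

// Example values are made up for illustration.
const row = {
  'insurance.underwriting.class': 'Standard',
  'insurance.ltc.daily_benefit': 200,
};
const record = unflatten(row);

console.log(record.insurance.underwriting.class); // Standard
console.log(record.insurance.ltc.daily_benefit);  // 200
```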

Sample queries

Find sub-standard underwriting cases

Returns households whose underwriting class falls outside Preferred Plus, Preferred, and Standard (the sub-standard or rated cases). These are the cases where naive illustration logic produces incorrect premium projections because it assumes Standard pricing.

households.filter(h =>
  h.insurance.underwriting.class !== 'Preferred Plus' &&
  h.insurance.underwriting.class !== 'Preferred' &&
  h.insurance.underwriting.class !== 'Standard'
)

Identify special-needs lifetime-support calculations

Returns households with a special-needs dependent whose lifetime-support cost extends the income-replacement period to age 75+ rather than the standard age 18 or college graduation. The DIME methodology breaks on these without explicit handling.

households.filter(h =>
  h.members.some(m =>
    m.dependent_status === 'special_needs' &&
    m.expected_support_duration_years > 50
  )
)

Surface annuity suitability flags

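Returns annuity prospects whose suitability profile triggers heightened review: cognitive-impairment markers, loss of a spouse within the last 12 months, or a proposed premium exceeding 25% of total assets (h.assets.total, used here as the net-worth proxy). A short remaining life expectancy is also a review trigger, but this query does not test for it.

// Calendar months elapsed since a date (assumes e.date parses with new Date)
const monthsSince = (date, now = new Date()) => {
  const then = new Date(date);
  return (now.getFullYear() - then.getFullYear()) * 12 + now.getMonth() - then.getMonth();
};

households.filter(h =>
  h.insurance.annuity.illustration_inputs &&
  (h.members.some(m => m.cognitive_status !== 'none') ||
   h.events.life_events.some(e =>
     e.type === 'death_of_spouse' && monthsSince(e.date) < 12) ||
   h.insurance.annuity.proposed_premium / h.assets.total > 0.25)
)

Identify LTC planning candidates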

Returns households whose primary member is age 50 to 64, whose family longevity score exceeds 70, and who have no LTC coverage yet: no policy in force and less than $200,000 in dedicated self-insurance reserves. The query screens on family longevity only; it does not test current health status.

households.filter(h => {
  // Optional chaining guards against a household with no member flagged as primary
  const age = h.members.find(m => m.role === 'primary')?.age;
  return age >= 50 && age < 65 &&
    h.health.family_longevity_score > 70 &&
    !h.insurance.ltc.policy_in_force &&
    h.assets.dedicated_ltc_reserve < 200000;
})

Methodology

Each household's insurance need is generated against archetype-specific patterns. Life-insurance need uses DIME with archetype-realistic adjustments: special-needs households extend income replacement to age 75+; blended families add prior-marriage child-support obligations; small business owners include buy-sell agreement funding; healthcare professionals get disability-income scaling for high-risk occupations.

Underwriting class is assigned based on health and lifestyle factors with realistic distributions (about 20% Preferred Plus, 30% Preferred, 35% Standard, 15% Sub-standard or rated). LTC need projections use age, family longevity, and current health markers. Annuity illustrations include the realistic suitability factors that drive the review queue.

The corpus has high deviation by design: the goal is structural diversity for finding edge cases, not statistical representativeness of the modal customer. The corpus passes the WealthSynth consistency validator (insurance need calculations reconcile with underlying household data; underwriting-class assignment is consistent with health markers; annuity-suitability flags fire when the underlying conditions are met) and the LLM-as-judge gate. An annual refresh updates the corpus against current NAIC illustration standards and any state-level suitability rule changes.
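
The underwriting-class shares quoted in the methodology can be checked against a loaded corpus with a simple tally. This is a sketch under stated assumptions: the three named class labels come from the sample queries, any other label is treated as sub-standard or rated (mirroring the first sample query), and the 5-point tolerance and the `'Table B'` label are arbitrary illustrative choices.

```javascript
// Target shares quoted in the methodology text.
const TARGETS = {
  'Preferred Plus': 0.20,
  'Preferred': 0.30,
  'Standard': 0.35,
  'Sub-standard or rated': 0.15,
};

// Assumption: any label other than the named ones counts as sub-standard
// or rated, mirroring the first sample query.
function classShares(households) {
  const counts = Object.fromEntries(Object.keys(TARGETS).map(k => [k, 0]));
  for (const h of households) {
    const cls = h.insurance.underwriting.class;
    const bucket = Object.hasOwn(TARGETS, cls) ? cls : 'Sub-standard or rated';
    counts[bucket] += 1;
  }
  const total = households.length || 1; // avoid dividing by zero on an empty corpus
  return Object.fromEntries(
    Object.entries(counts).map(([k, n]) => [k, n / total])
  );
}

// Classes whose observed share is more than `tolerance` away from target.
function offTarget(shares, tolerance = 0.05) {
  return Object.keys(TARGETS).filter(
    k => Math.abs(shares[k] - TARGETS[k]) > tolerance
  );
}

// Tiny made-up corpus: 4 Preferred Plus, 6 Preferred, 7 Standard, 3 rated.
const make = (cls, n) =>
  Array.from({ length: n }, () => ({ insurance: { underwriting: { class: cls } } }));
const demo = [
  ...make('Preferred Plus', 4),
  ...make('Preferred', 6),
  ...make('Standard', 7),
  ...make('Table B', 3), // hypothetical rated-class label
];

console.log(offTarget(classShares(demo)).length); // 0
```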

Frequently asked questions

Why is this Data Set rated 'High deviation'?

By design. The goal is to find illustration-engine failure modes, not to sample the modal customer. Standard fixtures over-represent textbook cases and miss the edge cases. This Data Set deliberately concentrates the edge cases — special-needs households, sub-standard underwriting, recent-claim histories, blended families — so a few hundred test runs against this corpus exercise more failure modes than thousands of runs against a typical-customer fixture.

Are NAIC illustration standards reflected?

Yes. The illustration inputs follow NAIC-aligned structures (Sections 1-7 of the Life Insurance Illustrations Model Regulation). The corpus structures support testing both the basic illustration and the rarely-validated supplemental illustrations. Annuity suitability follows the NAIC Suitability in Annuity Transactions Model Regulation as updated.

Are special-needs households realistic?

Yes. About 8% of the corpus has a special-needs dependent (autism spectrum, Down syndrome, cerebral palsy, severe-developmental-delay, or chronic medical conditions). The structured needs-analysis correctly extends the lifetime-support calculation; about 60% of these households also have a third-party special-needs trust structure (where life insurance is the funding mechanism for the trust).

How are sub-standard underwriting cases distributed?

About 15% of the corpus is sub-standard or rated. The conditions driving the rating are realistic and documented: chronic conditions (Type 2 diabetes, sleep apnea, anxiety/depression history), occupational hazard (commercial pilot, oil rig worker, motorcycle racer), or family history (early heart disease, hereditary cancer). The corpus distribution reflects published reinsurer experience.

Are claim histories represented?

Yes. About 12% of the corpus has a recent (last 5 years) insurance claim history that affects ongoing coverage — typically a disability claim resolved by return-to-work, a property claim history, or a denied life-insurance application from a prior carrier. The structured claims data lets your illustration logic surface the carrier-relationship constraints these create.

Does the corpus include long-term-care planning?

Yes. About 18% of the corpus has structured LTC planning: traditional LTC insurance, hybrid life-LTC products, or self-insurance via dedicated reserves. The product structures include realistic benefit periods, daily benefits, inflation riders, and elimination periods.

What about cyber-insurance products?

Cyber-insurance is increasingly relevant for HNW households. About 6% of the corpus has structured cyber-insurance: identity theft protection, cyber-extortion coverage, fraud-loss coverage, family-office cyber. The product structures are based on the major personal-cyber carriers' product catalogs (Chubb Masterpiece, AIG Private Client).

How does this fit alongside B15 (Insurance Claims & Cybersecurity)?

B10 (this Data Set) focuses on the illustration and underwriting side: pre-issue analysis, premium projection, suitability documentation. B15 focuses on the claims side: fraud detection, claim disputes, account-takeover patterns, AML. Carriers serving the full insurance lifecycle typically buy both. InsurTech builders narrowly focused on illustration buy B10; those building claims or fraud-detection tools buy B15.

$8,250
one-time purchase
240 households (ZIP)
Methodology PDF
JSON, CSV, Parquet formats
Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.
