v4 corpus contract

How WealthSynth households are built and validated

WealthSynth is the synthetic-household-data product behind every Wealth Data Set on this site. This page is the public methodology index — written for compliance teams evaluating the data for examiner use, for engineering teams integrating the JSON, and for academic researchers citing the corpus.

The corpus holds itself to a strict internal-consistency contract: if a generated record disagrees with itself in any way — arithmetic, schema, narrative — it is rejected and regenerated. The result is a dataset your data team can trust without spot-checking.

1,451 households validated
0 P0 failures at ship
96 monthly snapshots per household
31 purpose-built bundles

Six commitments behind every record

These commitments are codified in the generation pipeline and tested on every refresh. They are the answer to the only question that matters when buyers evaluate synthetic data: can I trust this?

Schema-first generation

Every household is generated from a single canonical Zod schema. If a record fails the schema, it is discarded and regenerated, so invalid data never touches disk.
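The reject-and-regenerate loop can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the `Household` shape, the `isHousehold` check, and the `generateUntilValid` helper are all hypothetical names, and the hand-written validator stands in for the Zod schema check, which is not published on this page.

```typescript
// Hypothetical record shape; the real canonical schema is not shown on this page.
interface Household {
  id: string;
  assets: number;
  liabilities: number;
  netWorth: number;
}

type Validator<T> = (candidate: unknown) => candidate is T;

// Stand-in for the schema check (assumption: a real pipeline would call the Zod schema here).
const isHousehold: Validator<Household> = (c): c is Household => {
  if (typeof c !== "object" || c === null) return false;
  const r = c as Record<string, unknown>;
  return (
    typeof r.id === "string" &&
    typeof r.assets === "number" && r.assets >= 0 &&
    typeof r.liabilities === "number" && r.liabilities >= 0 &&
    typeof r.netWorth === "number"
  );
};

// Calls the generator until a candidate passes validation or attempts run out.
// Invalid candidates are dropped inside the loop and never returned.
function generateUntilValid<T>(
  generate: (attempt: number) => unknown,
  validate: Validator<T>,
  maxAttempts = 5,
): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = generate(attempt);
    if (validate(candidate)) return candidate;
  }
  throw new Error(`no valid record after ${maxAttempts} attempts`);
}
```

Because the only way out of `generateUntilValid` is a validated value or a thrown error, whatever code writes to disk never sees an invalid record.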

Consistency contract

All downstream artifacts — overlays, longitudinal trajectories, tax calculations — are pure projections of the canonical household JSON. No secondary math, no renderer drift.
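As an illustration of what "pure projection" means here, the sketch below derives a balance-sheet view from a canonical record without storing any derived number separately. The `CanonicalHousehold` and `Account` shapes and the `projectBalanceSheet` function are assumptions made for this example, not the product's actual field names.

```typescript
// Hypothetical canonical shape, used only for this example.
interface Account {
  kind: "asset" | "liability";
  balance: number; // assumed to be whole dollars
}

interface CanonicalHousehold {
  id: string;
  accounts: readonly Account[];
}

interface BalanceSheet {
  totalAssets: number;
  totalLiabilities: number;
  netWorth: number;
}

// Pure function: reads only its argument, mutates nothing, and returns the
// same output for the same input, so every renderer sees identical numbers.
function projectBalanceSheet(h: CanonicalHousehold): BalanceSheet {
  let totalAssets = 0;
  let totalLiabilities = 0;
  for (const account of h.accounts) {
    if (account.kind === "asset") totalAssets += account.balance;
    else totalLiabilities += account.balance;
  }
  return { totalAssets, totalLiabilities, netWorth: totalAssets - totalLiabilities };
}
```

Since every view recomputes from the same canonical input, two artifacts cannot disagree unless the canonical record itself changes.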

Strict validation gate

Two-pass validation: deterministic checks for arithmetic and schema, then LLM-assisted review for narrative coherence. Any warning fails the household — no soft passes.
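A strict gate of this kind might look like the sketch below. Everything named here is hypothetical: the `HouseholdRecord` shape, the individual checks, and the `strictGate` function are illustrative, and the narrative reviewer is injected as a plain synchronous function so the example stays self-contained (an LLM-backed reviewer would likely be asynchronous). Running the second pass only when the first is clean is also an assumption of this sketch.

```typescript
interface Finding {
  check: string;
  severity: "warning" | "error";
  message: string;
}

// Hypothetical record shape for this example; amounts assumed to be whole dollars.
interface HouseholdRecord {
  assets: number;
  liabilities: number;
  netWorth: number;
  narrative: string;
}

// Stand-in for the LLM-assisted narrative review (pass 2).
type Reviewer = (h: HouseholdRecord) => Finding[];

// Pass 1: deterministic arithmetic and presence checks.
function deterministicChecks(h: HouseholdRecord): Finding[] {
  const findings: Finding[] = [];
  const expected = h.assets - h.liabilities;
  if (expected !== h.netWorth) {
    findings.push({
      check: "net-worth-arithmetic",
      severity: "error",
      message: `expected ${expected}, got ${h.netWorth}`,
    });
  }
  if (h.narrative.trim().length === 0) {
    findings.push({ check: "narrative-present", severity: "warning", message: "narrative is empty" });
  }
  return findings;
}

// Strict gate: any finding of any severity fails the record (no soft passes).
function strictGate(h: HouseholdRecord, review: Reviewer): { passed: boolean; findings: Finding[] } {
  const pass1 = deterministicChecks(h);
  if (pass1.length > 0) return { passed: false, findings: pass1 };
  const pass2 = review(h);
  return { passed: pass2.length === 0, findings: pass2 };
}
```

The gate deliberately ignores `severity` when deciding pass or fail; the field is kept only so a rejected record's report stays readable.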

Field-level provenance

Every field carries documented type, range, and derivation logic. Methodology PDFs ship with every Data Set so your data team can audit any number end-to-end.
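One way to picture field-level provenance is as a small machine-readable spec per field. The sketch below is illustrative only: the `FieldSpec` shape, the `assetsSpec` entry, its range, and the `auditField` helper are assumptions made for this example and are not taken from the Methodology PDFs.

```typescript
// Hypothetical provenance entry: documented type, allowed range, and derivation.
interface FieldSpec {
  name: string;
  type: "number" | "string";
  range?: { min: number; max: number };
  derivation: string;
}

// Example entry with made-up bounds, for illustration only.
const assetsSpec: FieldSpec = {
  name: "assets",
  type: "number",
  range: { min: 0, max: 1000000000 },
  derivation: "sum of balances across asset accounts",
};

// Returns a list of problems; an empty list means the value matches its spec.
function auditField(spec: FieldSpec, value: unknown): string[] {
  const problems: string[] = [];
  if (typeof value !== spec.type) {
    problems.push(`${spec.name}: expected ${spec.type}, got ${typeof value}`);
    return problems;
  }
  if (spec.range && typeof value === "number") {
    if (value < spec.range.min || value > spec.range.max) {
      problems.push(`${spec.name}: ${value} outside [${spec.range.min}, ${spec.range.max}]`);
    }
  }
  return problems;
}
```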

Refreshes with transparent diffs

Tax law changes, market shifts, and new archetypes flow through the same pipeline. Refreshed corpus versions ship with a changelog showing exactly what moved and why. Refresh cadence and pricing are still being defined.
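A field-level changelog between two corpus versions could be produced along these lines. This is a sketch under stated assumptions: records are treated as flat key/value maps, and `FlatRecord`, `Change`, and `diffRecords` are hypothetical names rather than the shipped changelog format.

```typescript
type Scalar = string | number | boolean | null;
type FlatRecord = Record<string, Scalar>;

interface Change {
  field: string;
  before: Scalar | undefined; // undefined means the field did not exist in the old version
  after: Scalar | undefined;  // undefined means the field was removed
  reason: string;
}

// Compares two versions of one record and lists every field whose value moved,
// tagging each change with the stated reason (for example, a tax law update).
function diffRecords(prev: FlatRecord, next: FlatRecord, reason: string): Change[] {
  const addedKeys = Object.keys(next).filter((k) => !(k in prev));
  const fields = Object.keys(prev).concat(addedKeys).sort();
  const changes: Change[] = [];
  for (const field of fields) {
    if (prev[field] !== next[field]) {
      changes.push({ field, before: prev[field], after: next[field], reason });
    }
  }
  return changes;
}
```

Sorting the field names keeps the changelog order stable across runs, which makes successive changelogs easier to compare.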

Synthetic by construction

No real individuals, no GDPR exposure, no data use agreements. Sensitive overlays (race/ethnicity, religion) appear only on the bundles that explicitly require them.

Want the per-bundle methodology PDF?

Every Wealth Data Set ships with a Methodology PDF describing the field derivations, eligibility rules, and statistical calibration for that specific bundle. Browse the catalog to see which bundle fits your use case.