The edge cases that break financial test data — a field guide

WealthSchema Staff | Pipeline architecture | May 8, 2026

The thing that breaks production fintech is almost never the case the test corpus was built to demonstrate. It is the case the test corpus does not contain — the multi-state filer in a year of a partial-year move, the household that crossed an IRMAA bracket because of a Roth conversion, the tax-lot wash-sale that linked an HSA purchase to a taxable-account loss. These cases are not exotic. They appear in every real production population at frequencies between 1% and 10%. They are exotic only in test corpora that were built without explicit edge-case coverage as a goal.

This article is the catalog. We organize edge cases by domain, give frequency estimates from the literature where they exist, and call out the bug that typically ships when the case is absent from the test data.

Why this catalog exists

The most common procurement question is "what edge cases should a production-grade synthetic financial corpus contain?" The honest answer is "the ones your engine has to handle correctly," but that's not actionable. The actionable answer is the catalog below, which covers the edge cases that 80%+ of wealth-tech engines need to handle and that are missing from most test corpora.

Edge cases in the catalog: we expect this list to grow.
Population frequency: 1–10% per case. Aggregated, edge cases are 25–40% of households.
Production incidents: ~80% of post-deployment bugs trace to an edge case the test corpus didn't cover.
Coverage gap: 30–50% of the catalog is missing from typical anonymized or generic-synthetic corpora.

Tax edge cases

These are the cases that break tax-aware engines, including TLH, RMD, retirement-income sequencing, and any product that produces a 1099 or 1040 line.

Tax edge cases (by frequency in a representative population)

Multi-state filers — 14% of households (mid-year movers, dual-state earners, students). Apportionment rules vary by state pair, and pairs with a reciprocal agreement are taxed differently from pairs without one.
Capital loss carryforwards — 6–10% of investing households, concentrated at UHNW. Carryforwards interact with current-year gains, AMT, and NIIT in non-obvious ways.
Wash-sale across linked accounts — 8% of accounts that trade. The most common bug: treating wash sales as account-local instead of taxpayer-wide (an IRA purchase can disallow a taxable-account loss).
QSBS Section 1202 exclusion — 0.5–1.5% of accounts, 100% of pre-IPO founder accounts. The 5-year holding rule, the 10x basis cap, and the 50/75/100% exclusion bands all interact.
Section 1042 ESOP rollover — < 0.1% but the engines that handle it must not silently fail when they see one.
AMT exposure — 2–4% of high-income households. Reduced post-TCJA but still material for ISOs and SALT-heavy itemizers.
NIIT (3.8%) trigger — 6–8% of households crossing the MAGI threshold during the year.
QBI Section 199A deduction with W-2 wage limitation binding — 1–2% of pass-through households.
Foreign tax credit — 3–5% of investing households via international funds.
Backdoor Roth and pro-rata rule — 4–6% of high-income households. Engines that miss the pro-rata rule produce silently wrong basis tracking.
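
To make the wash-sale item concrete, here is a minimal sketch of the account-local bug next to the taxpayer-wide check. The `Trade` record, its field names, and `find_wash_sales` are hypothetical, and the sketch makes loud simplifications: "substantially identical" is reduced to "same symbol", share quantities are not matched, and spousal accounts are ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Trade:
    # Hypothetical record shape, for illustration only.
    taxpayer_id: str
    account_id: str
    symbol: str
    side: str                   # "buy" or "sell"
    trade_date: date
    realized_loss: float = 0.0  # positive number when a sell realizes a loss


WINDOW = timedelta(days=30)     # 30 days before or after the loss sale


def find_wash_sales(trades, taxpayer_wide=True):
    """Return (loss_sale, replacement_buy) pairs inside the 30-day window.

    With taxpayer_wide=False the search is restricted to the selling
    account, which reproduces the account-local bug.
    """
    matches = []
    for sale in trades:
        if sale.side != "sell" or sale.realized_loss <= 0:
            continue
        for buy in trades:
            if buy.side != "buy" or buy.symbol != sale.symbol:
                continue
            if buy.taxpayer_id != sale.taxpayer_id:
                continue
            if not taxpayer_wide and buy.account_id != sale.account_id:
                continue
            if abs(buy.trade_date - sale.trade_date) <= WINDOW:
                matches.append((sale, buy))
    return matches


trades = [
    Trade("T1", "taxable", "VTI", "sell", date(2025, 3, 10), realized_loss=1200.0),
    Trade("T1", "ira", "VTI", "buy", date(2025, 3, 20)),
]
cross_account = find_wash_sales(trades, taxpayer_wide=True)
account_local = find_wash_sales(trades, taxpayer_wide=False)
```

A corpus that never places a replacement purchase in a different account of the same taxpayer cannot tell these two modes apart, which is why the bug survives testing.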

The case that ships the most surprises is the multi-state mid-year filer. Almost every wealth-tech engine handles same-state-all-year correctly. About half handle multi-state-all-year correctly. About a quarter handle the move-year case correctly — and the move-year case is 4–6% of households in any given year.

Retirement edge cases

These are the cases that break retirement-income, RMD, Social Security, and Medicare-coordinated engines.

Retirement edge cases

RMD year-of-first-RMD timing — 100% of households turning the RMD age in the corpus. The first RMD may be deferred to April 1 of the following year, which then puts two RMDs in one tax year; many engines get this deferral silently wrong.
IRMAA bracket transitions — 8–12% of high-income retirees. Premium increases lag the income event by two years and depend on MFS vs MFJ filing status.
Roth conversion in the gap years (post-retirement, pre-RMD) — 4–6% of households in the relevant age band. The interaction with IRMAA brackets and ACA premium subsidies is the hard part.
Survivor-benefit timing for Social Security — 3–5% of married households where one spouse is significantly older. Optimal claiming strategies depend on relative ages and PIAs.
Required Beginning Date for inherited IRAs (10-year rule post-SECURE) — 1–2% per year. Whether the original owner had reached RMD age changes the rule.
QCDs (qualified charitable distributions) — 2–4% of charitable retirees over 70.5. The QCD lowers AGI in a way that interacts with IRMAA, NIIT, and SS taxability.
401(k) loan during job change — 2–3% of households per year. The 60-day rollover window vs. the extended deadline for qualified plan loan offsets (the tax-filing due date, including extensions, introduced by the TCJA) is a common source of bugs.
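
The first-RMD timing item in the list above can be sketched as a small date calculation. The function name `first_rmd_deadlines` is hypothetical, and the RMD age is passed as a parameter because it depends on birth year; the default of 73 is an assumption that fits only some cohorts. The still-working exception for employer plans is ignored.

```python
from datetime import date


def first_rmd_deadlines(birth_date: date, rmd_age: int = 73) -> dict:
    """Deadlines around the first RMD (simplified sketch).

    rmd_age is an assumption supplied by the caller; it varies by birth year.
    """
    # Calendar year in which the owner reaches the RMD age.
    first_distribution_year = birth_date.year + rmd_age
    return {
        "first_distribution_year": first_distribution_year,
        # The first RMD may be deferred to April 1 of the following year ...
        "first_rmd_due": date(first_distribution_year + 1, 4, 1),
        # ... but the second RMD is still due by December 31 of that same year.
        "second_rmd_due": date(first_distribution_year + 1, 12, 31),
    }


deadlines = first_rmd_deadlines(date(1952, 6, 15))
```

A test household that defers should therefore show two distributions in the year after the first distribution year, which is the case linearized engines tend to miss.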

Insurance and risk edge cases

These are the cases that break insurance illustration validation, suitability, and claims engines.

Insurance edge cases

MEC (modified endowment contract) reclassification — 1–2% of permanent-life policies. Once a policy is classified as a MEC, withdrawals and loans are taxed gain-first, and a 10% penalty can apply before age 59½. Engines that don't track 7-pay test status silently misreport tax basis.
Section 1035 exchange between policies — 1–3% of policy-holding households per year. Basis carries over to the new policy, and the gain is deferred rather than recognized at the exchange.
ILIT-owned policies — 0.5–1% of life-insurance holders, 5–10% in HNW. The trust, not the insured, is the owner and usually the beneficiary, so ownership and beneficiary structure differ from an individually owned policy; estate-tax treatment differs further.
Long-term-care hybrid policies — 2–4%. Treatment of the LTC rider's premium for tax purposes is non-obvious.
Annuitization vs. systematic withdrawal — 3–5% of annuity holders. Tax treatment is fundamentally different (exclusion ratio vs. LIFO).
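
As an illustration of why 7-pay status has to be tracked, the sketch below compares the taxable part of the same withdrawal under gain-first ordering (which applies to MEC distributions) and basis-first ordering (the usual treatment of withdrawals from a non-MEC life policy). Function names are hypothetical, and penalties, loans, and policy-year special rules are left out.

```python
def taxable_gain_first(withdrawal: float, cash_value: float, basis: float) -> float:
    """Gain comes out first: taxable up to the gain in the contract."""
    gain = max(cash_value - basis, 0.0)
    return min(withdrawal, gain)


def taxable_basis_first(withdrawal: float, basis: float) -> float:
    """Basis comes out first: only the amount above basis is taxable."""
    return max(withdrawal - basis, 0.0)


# Same policy, same withdrawal, different classification.
cash_value, basis, withdrawal = 100_000.0, 60_000.0, 30_000.0
mec_taxable = taxable_gain_first(withdrawal, cash_value, basis)   # 30000.0
non_mec_taxable = taxable_basis_first(withdrawal, basis)          # 0.0
```

An engine that stores only cash value and basis, without the MEC flag, has no way to choose between these two results.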

Equity compensation edge cases

These break equity-comp engines, founder tools, and pre-IPO platforms.

Equity-comp edge cases

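ISO disqualifying disposition — 0.5–1% of ISO-holding households per year. The dispositional event converts the bargain element (the spread at exercise, capped at the actual gain) from LTCG to ordinary income; only gain above that amount remains capital.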
Section 83(b) election filed within the 30-day window — virtually all early-stage equity grants. Engines that don't track filing date can't reproduce the basis correctly.
RSU vest with mandatory share-sell-to-cover — 80%+ of public-company RSU populations. Federal withholding uses the flat 22% supplemental-wage rate (37% above $1 million of supplemental wages), which under-withholds for most recipients.
Cliff vesting at the one-year anniversary — every standard four-year grant with a one-year cliff. Engines that linearize vesting silently misreport basis in year 1.
Post-termination exercise window — 3–5% of equity-holders per year (job changes). The 90-day default expires before most ISO holders can fund the exercise and the AMT exposure it creates.
Net exercise of options — 2–4% of exercise events. Treated differently from cash exercise for AMT purposes.
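
The cliff-vesting item can be shown with two small functions: one that respects the cliff and one that linearizes. This is a sketch under assumptions not stated in the article: monthly vesting after the cliff, 48 months in total, and whole-share rounding down. Both function names are hypothetical.

```python
def vested_shares(total_shares: int, months_elapsed: int,
                  cliff_months: int = 12, total_months: int = 48) -> int:
    """Shares vested under a cliff schedule (monthly vesting after the cliff)."""
    if months_elapsed < cliff_months:
        return 0  # nothing vests before the cliff
    months = min(months_elapsed, total_months)
    return total_shares * months // total_months


def linearized_shares(total_shares: int, months_elapsed: int,
                      total_months: int = 48) -> int:
    """The buggy variant: vesting accrues from month one with no cliff."""
    months = min(max(months_elapsed, 0), total_months)
    return total_shares * months // total_months


# Month 11 of a 4,800-share grant: the cliff schedule has vested nothing,
# while the linearized schedule reports 1,100 shares.
month_11_cliff = vested_shares(4800, 11)       # 0
month_11_linear = linearized_shares(4800, 11)  # 1100
```

The two schedules agree from month 12 onward, so only test households with a termination or valuation event inside the first year expose the difference.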

Crypto and DeFi edge cases

A category that didn't exist five years ago and now appears in 4–8% of households.

Crypto / DeFi edge cases

Hard fork and airdrop receipts — taxable as ordinary income at receipt FMV. 1–3% of crypto-holding households per year.
Staking rewards — taxable as ordinary income at receipt; the FMV at receipt becomes the basis, and gain on a later sale is long-term only if the units were held more than 12 months. 2–4% of crypto holders.
DeFi liquidity-pool token receipts — disposition vs. non-disposition treatment is unsettled; engines have to track both interpretations.
NFT transactions — the collectibles rate (up to 28%) can apply to long-term gains when the NFT is treated as a collectible; short-term gains are ordinary. 1–2% of crypto holders.
Cross-chain bridge transfers — gas costs, token wrapping, and the question of whether the bridge is itself a taxable event. Defensible policy varies by chain.
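
For the staking-rewards item, a minimal lot-tracking sketch looks like the following. The `RewardLot` class and `gain_on_sale` function are hypothetical names, and the sketch assumes the whole lot is sold at once and ignores fees.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RewardLot:
    # Hypothetical record for one staking-reward receipt.
    received: date
    units: float
    fmv_per_unit: float  # fair market value per unit on the receipt date

    @property
    def ordinary_income(self) -> float:
        """Income recognized at receipt."""
        return self.units * self.fmv_per_unit

    @property
    def basis(self) -> float:
        """Basis equals the income recognized at receipt."""
        return self.ordinary_income


def _one_year_after(d: date) -> date:
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # Feb 29 receipt, next year is not a leap year
        return d.replace(year=d.year + 1, day=28)


def gain_on_sale(lot: RewardLot, sale_date: date, price_per_unit: float):
    """Return (gain, term) for selling the whole lot."""
    gain = lot.units * price_per_unit - lot.basis
    # Long-term requires holding for MORE than one year.
    term = "long" if sale_date > _one_year_after(lot.received) else "short"
    return gain, term


lot = RewardLot(date(2024, 1, 15), units=10.0, fmv_per_unit=20.0)
on_anniversary = gain_on_sale(lot, date(2025, 1, 15), 30.0)  # (100.0, "short")
day_after = gain_on_sale(lot, date(2025, 1, 16), 30.0)       # (100.0, "long")
```

Because every reward receipt opens its own lot, a realistic test household with daily rewards carries hundreds of lots, each with its own receipt date and basis.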

Compliance and lending edge cases

The cases that break Reg B / ECOA fair-lending audits, Reg BI suitability scoring, and underwriting models.

The fair-lending edge case

P(disparate_impact | model_output, demographic_overlay) ≠ 0

model_output = lending decision, suitability score, or rate quote
demographic_overlay = race, ethnicity, sex, marital status; explicitly NOT in the input but inferable from ZIP, surname, household structure
P = probability that adding the demographic overlay materially changes the model's output distribution

The audit asks whether your model produces statistically different outcomes across protected classes given the same observable inputs. The edge cases that surface disparate impact are the ones where ZIP-coded census demographics (highly correlated with protected classes) drive the model's output more than the explicit financial inputs do. Synthetic corpora without explicit demographic-distribution control cannot exercise this audit code path.
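
One simple way to exercise this code path is an approval-rate comparison across overlay groups. The sketch below computes each group's approval rate relative to a reference group and flags ratios under 0.8. That threshold is the four-fifths heuristic borrowed from employment-selection guidance and is an assumption here, not a statement of what a fair-lending audit requires. The sketch also does not condition on the observable financial inputs, which a real audit would. All names are hypothetical.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs from the overlay."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][1] += 1
        if approved:
            counts[group][0] += 1
    return {g: a / n for g, (a, n) in counts.items()}


def adverse_impact_ratios(decisions, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(decisions)
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group has no approvals")
    return {g: r / ref for g, r in rates.items()}


THRESHOLD = 0.8  # assumed heuristic, see lead-in

decisions = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 50 + [("B", False)] * 50
)
ratios = adverse_impact_ratios(decisions, reference_group="A")
flagged = [g for g, r in ratios.items() if r < THRESHOLD]  # ["B"]
```

The check is only meaningful if the corpus carries the demographic overlay at controlled frequencies; without it, the group column does not exist to compare.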

Compliance / lending edge cases

Thin-file applicants — 8–12% of credit-seeking population. Models trained on full-file populations fail audits when run against thin-file applicants.
ITIN filers — 2–4% in some markets. Engines that validate IDs against SSN issuance rules ship validation failures the first time they see an ITIN, which has the same nine-digit shape but always begins with 9.
Borrowers in active forbearance or modification — 1–3% in any given year, much higher in stress years. Underwriting models that don't see this in test data extrapolate badly.
First-time borrowers in CRA-assessment areas — 4–6%. Distributional skew of training data toward established borrowers produces fair-lending exposure.
Self-employed / 1099 applicants with two-year averaging — 8–10% of mortgage applicants. The two-year average can hide a sharp recent decline that should change underwriting.
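
For the ITIN item, a validator that distinguishes identifier types instead of rejecting everything that is not an SSN might look like this sketch. `classify_tin` is a hypothetical function. It relies on two facts: SSNs are never issued with an area number of 000, 666, or 900 to 999, and ITINs begin with 9. It deliberately does not check the ITIN group-digit ranges, so the result is labeled a candidate rather than a confirmed ITIN.

```python
import re

# Nine digits with optional hyphens, e.g. 123-45-6789 or 123456789.
TIN_PATTERN = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def classify_tin(raw: str) -> str:
    """Classify a taxpayer ID as 'ssn_candidate', 'itin_candidate', or 'invalid'."""
    match = TIN_PATTERN.match(raw.strip())
    if not match:
        return "invalid"
    area, group, serial = match.groups()
    if area.startswith("9"):
        # Not a valid SSN area; route to ITIN handling instead of rejecting.
        return "itin_candidate"
    if area in ("000", "666") or group == "00" or serial == "0000":
        return "invalid"
    return "ssn_candidate"


examples = {tin: classify_tin(tin) for tin in ("123-45-6789", "912-78-1234", "666-12-3456")}
```

A corpus with no identifiers in the 9xx range never reaches the second branch, so the rejection bug stays invisible until the first real ITIN filer arrives.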

What to do with the catalog

We use the catalog three ways internally. First, as a coverage rubric for the corpora we generate — every archetype × edge-case combination is either represented (by archetype invariant or overlay) or explicitly out of scope for that corpus. Second, as a procurement audit tool — given a candidate vendor's corpus, we run a query for each edge case and check the population frequency against the catalog's expected range. Third, as a regression tracker — when a new bug ships in production, we add the case to the catalog and the regression test suite. For one of the harder edge-case clusters — high-balance federal student debt where IDR plan, PSLF eligibility, and refi-vs-forgiveness interact — see student-loan IDR, PSLF, and the refi-vs-forgiveness decision.
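
The procurement audit described above reduces to a frequency check per edge case. The sketch below assumes each record is a dict of boolean edge-case flags and that the catalog maps a case name to its expected frequency range. The flag names are hypothetical; the ranges are taken from the catalog entries for the move-year case, AMT exposure, and the foreign tax credit.

```python
def audit_corpus(records, catalog):
    """Compare observed edge-case frequencies with expected ranges.

    records: list of dicts with boolean flags (hypothetical schema).
    catalog: {case_name: (low, high)} expected frequency range.
    Returns {case_name: (observed_frequency, status)}.
    """
    n = len(records)
    report = {}
    for case, (low, high) in catalog.items():
        hits = sum(1 for r in records if r.get(case, False))
        freq = hits / n if n else 0.0
        if hits == 0:
            status = "missing"
        elif freq < low:
            status = "under"
        elif freq > high:
            status = "over"
        else:
            status = "in_range"
        report[case] = (freq, status)
    return report


catalog = {
    "multi_state_move_year": (0.04, 0.06),
    "amt_exposure": (0.02, 0.04),
    "foreign_tax_credit": (0.03, 0.05),
}
records = (
    [{"multi_state_move_year": True}] * 5
    + [{"foreign_tax_credit": True}] * 20
    + [{}] * 75
)
report = audit_corpus(records, catalog)
```

In this toy corpus the move-year case lands in range, AMT exposure is missing entirely, and the foreign tax credit is over-represented. Treating "missing" as a separate status from "under" keeps absent cases from hiding among low-frequency ones.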

The catalog is not exhaustive. New edge cases appear every year as regulations change and product types evolve. The version we ship internally is on its fourteenth revision since 2023. If you have edge cases we haven't covered that have bitten you in production, send them — the catalog gets stronger with every contribution.

Key takeaways

Production fintech bugs are usually edge cases the test corpus didn't contain. Frequencies are 1–10% per case but 25–40% in aggregate.
Tax edge cases (multi-state filers, wash-sale across accounts, QSBS, NIIT) are the most commonly missed and the most expensive when they ship.
Retirement edge cases (RMD timing, IRMAA brackets, Roth conversion windows) interact with each other in ways that single-case test data can't exercise.
Crypto / DeFi edge cases didn't exist five years ago and now appear in 4–8% of households — corpora that pre-date 2023 are increasingly stale on this dimension.
Compliance edge cases (thin-file, ITIN, forbearance, CRA areas) are the most common source of fair-lending audit failures and the least common in commercial test corpora.
Run a five-case audit against any candidate corpus before procurement. The hour spent is the best return on time in synthetic-data evaluation when three of the five come up empty.

Frequently asked questions

How often should we refresh the catalog and the test corpus?

Annually at minimum, more often if your engine touches actively-changing regulations (SECURE Act updates, new state tax rules, crypto guidance). The catalog should track regulatory and product-type changes; the test corpus should be regenerated against the updated catalog at refresh.

How do we exercise edge cases that appear in <0.1% of records — do we just oversample?

Two approaches. Oversampling: explicit overlay in the synthetic corpus that produces, say, 5% Section 1042 ESOP rollovers even though the population frequency is < 0.1%. This works for engineering testing but distorts any population-level analysis you want to run on the same corpus. Stratified corpus: ship two corpora, one population-realistic and one engineering-test-oversampled. Most production teams end up with the stratified approach once they realize the oversampled corpus is dangerous for backtesting.
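
A minimal sketch of the stratified approach follows: the population-realistic corpus is left untouched, and a second engineering corpus is derived from it by resampling flagged records until they reach a target share. The `oversample` function, the flag name, and the resample-with-replacement strategy are all assumptions for illustration. A real generator would synthesize distinct records rather than repeat existing ones.

```python
import math
import random


def oversample(records, flag, target_share, seed=0):
    """Return a new corpus in which `flag` records make up >= target_share."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be between 0 and 1")
    flagged = [r for r in records if r.get(flag)]
    others = [r for r in records if not r.get(flag)]
    if not flagged:
        raise ValueError(f"no records with flag {flag!r} to oversample")
    # Smallest k with k / (k + len(others)) >= target_share.
    needed = math.ceil(target_share * len(others) / (1 - target_share))
    rng = random.Random(seed)  # seeded so the engineering corpus is reproducible
    extra = [rng.choice(flagged) for _ in range(max(needed - len(flagged), 0))]
    return others + flagged + extra


# Population-realistic corpus: 1 ESOP rollover in 1,000 households.
population = [{"esop_1042": True}] + [{"esop_1042": False} for _ in range(999)]
engineering = oversample(population, "esop_1042", target_share=0.05)
share = sum(1 for r in engineering if r["esop_1042"]) / len(engineering)
```

Keeping the two corpora as separate objects makes it harder to run a backtest on the oversampled one by accident.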

Can we generate edge-case-only corpora rather than coverage corpora?

Yes, and we ship some. The risk is that an engine that passes an edge-case-only test fails on the cases the corpus didn't include — a corpus of 100 multi-state filers tells you the engine handles multi-state, not whether it handles multi-state-with-RSUs-vesting-from-pre-move-grants. Edge-case-only corpora are useful as targeted regression suites and dangerous as primary test data.

What's the right population frequency for a representative wealth-tech corpus?

Match the actual population you serve where possible. If you serve mass-affluent broadly, the catalog's frequency ranges are roughly right. If you serve UHNW, multiply many of the tax-edge-case frequencies by 5–10x — UHNW households accumulate edge cases. If you serve a specific segment (military households, faith-based investors, ESOP companies), the relevant overlays should be effectively 100%.