QA leads in wealth-tech work in a domain where the cost of a missed bug is denominated in regulatory penalty and customer trust, where test data is the structural bottleneck for almost every meaningful test, and where the test-data conversation rarely makes it onto the engineering organization's roadmap until the third or fourth sprint of a release crisis. This guide is for QA leads who want to elevate that conversation to where it belongs.
The walkthrough is structured in four parts: what wealth-tech testing actually has to verify (functional, regulatory, performance, edge), why test data is the layer most teams under-invest in, the four anti-patterns we see in production QA programs (the stale fixtures file, masked production data, Faker-generated mock data, the hand-curated edge-case file), and a six-tier architecture with a multi-year roadmap to move whatever the team has now into something that holds up under audit.
What wealth-tech testing actually requires
A wealth-tech application (a planning tool, an RIA platform, a robo-advisor, a tax engine, a brokerage interface) is a system that makes high-stakes decisions about people's money. The QA program supporting it has to verify, at minimum, functional correctness, regulatory compliance, performance, and edge-case handling.
A single test corpus can't exercise all of these. The structural insight that distinguishes mature QA programs from less-mature ones is: test data is a tier of test infrastructure, not a one-time fixture.
The four common test-data anti-patterns
In QA programs we've reviewed, four anti-patterns recur:
Anti-pattern 1: The fixtures-file-from-2019
A tests/fixtures/households.json with twenty households checked into the repo. Originally built for a feature that shipped three years ago. Subsequently amended by every engineer who needed test data for their specific feature. Now contains forty-five households, each subtly load-bearing for some test.
Why it persists. Frictionless to add to. The cost of replacing it is concentrated; the cost of leaving it is diffuse.
Why it fails. No statistical realism. No coverage guarantee. Updates break tests written years ago. The file grows monotonically until it's unmaintainable.
The transition out. Inventory which tests depend on which fixture. Group tests by what they're testing. Replace the fixture file with purpose-specific corpora per test category.
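The inventory step can be sketched as a simple scan of test sources for fixture household IDs. This is a minimal illustration, not a prescribed tool: the function name, the `HH-…` ID format, and the assumption that tests reference fixtures by a literal ID string are all hypothetical.

```python
import re
from collections import defaultdict


def inventory_fixture_usage(test_sources, fixture_ids):
    """Map each fixture household ID to the test files that mention it.

    test_sources: dict of {test file name: source text}
    fixture_ids:  list of household IDs present in the fixture file
    (Both shapes are assumptions made for this sketch.)
    """
    usage = defaultdict(set)
    for name, source in test_sources.items():
        for hid in fixture_ids:
            # Whole-token match so "HH-1" is not found inside "HH-10".
            pattern = rf"(?<![\w-]){re.escape(hid)}(?![\w-])"
            if re.search(pattern, source):
                usage[hid].add(name)
    # Include IDs with no references: those are candidates for removal.
    return {hid: sorted(usage.get(hid, ())) for hid in fixture_ids}


# Hypothetical test sources and IDs, for illustration only.
sources = {
    "test_tax.py": 'household = load("HH-1")',
    "test_rmd.py": 'a = load("HH-10"); b = load("HH-1")',
}
report = inventory_fixture_usage(sources, ["HH-1", "HH-10", "HH-2"])
```

Households that no test references can be dropped first; households referenced by many unrelated tests are the ones most worth splitting into purpose-specific corpora.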
Anti-pattern 2: Production data with masking
The team copies production to staging and "masks" it — usually replacing names and SSNs, sometimes addresses, rarely doing more. Test cases are written against this masked dataset.
Why it persists. Feels realistic. No engineering investment. Doesn't require thinking about what test cases are needed.
Why it fails. Privacy compliance is shaky (the 2023 GLBA Safeguards Rule amendments effectively require any environment with customer information to apply production-grade controls). Coverage is whatever production happens to contain. Tests are non-deterministic (production data changes on refresh). And examiners notice — "show me how you tested against the rare-pattern cases" doesn't have a good answer.
The transition out. Migrate staging off production data. Use synthetic data calibrated to production-realistic distributions. Augment with archetype-driven synthetic data for the cases production under-represents. Apply the migration playbook.
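One way to address the non-determinism problem is to generate the synthetic corpus from a fixed seed, so every test run sees identical records. The sketch below uses only the standard library; the calibration constants, field names, and the log-normal balance assumption are hypothetical placeholders, not measured production values.

```python
import math
import random

# Hypothetical calibration targets. In practice these would come from an
# aggregate (not customer-level) profile of production.
AGE_RANGE = (22, 90)
MEDIAN_BALANCE = 150_000
BALANCE_SIGMA = 1.1


def generate_households(n, seed):
    """Generate n synthetic households reproducibly from a fixed seed."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    households = []
    for i in range(n):
        age = rng.randint(*AGE_RANGE)
        # Log-normal is an assumed shape for right-skewed account balances.
        balance = rng.lognormvariate(math.log(MEDIAN_BALANCE), BALANCE_SIGMA)
        households.append(
            {"id": f"SYN-{i:05d}", "age": age, "balance": round(balance, 2)}
        )
    return households
```

Because the seed fully determines the output, a failing test can be reproduced by anyone who knows the seed and the generator version, which a refreshed masked-production copy cannot offer.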
Anti-pattern 3: Faker-generated mock data
A script using Faker, Mockaroo, or similar generates 5,000 households for testing. Names look like names. Addresses look like addresses. Numbers fall within reasonable-looking ranges.
Why it persists. Free or near-free. Generates volume.
Why it fails. No internal coherence. A 28-year-old with $5M in a Roth IRA. A retiree filing Form 8615. The records exist as collections of individually-plausible fields without relationships between them. Models, algorithms, and rule engines that depend on the relationships will perform unpredictably.
The transition out. Identify which tests depend on field-level relationships and household coherence (most do, in wealth-tech). Use Faker for the load tests and smoke tests that don't; use archetype-driven synthetic data for the rest. See the maturity curve.
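The coherence failures described above can be detected with cross-field rules. The following is a rough sketch covering only the two examples from the text. The field names are hypothetical, the Roth IRA ceiling is a deliberately crude heuristic (an assumption, not a legal limit), and the Form 8615 rule reflects our understanding that the form applies only to filers under age 24.

```python
# Assumed generous ceiling on Roth IRA balance per adult year.
# This is a plausibility heuristic for test data, not a statutory limit.
ASSUMED_MAX_ROTH_PER_ADULT_YEAR = 25_000


def coherence_violations(household):
    """Return a list of cross-field rule violations for one household dict."""
    violations = []
    age = household["age"]

    # Form 8615 (tax on a child's unearned income) is for filers under 24.
    if "8615" in household.get("forms", []) and age >= 24:
        violations.append("Form 8615 filed by someone aged 24 or older")

    # Flag Roth balances far beyond what the filer's adult years could support.
    roth_ceiling = max(age - 18, 0) * ASSUMED_MAX_ROTH_PER_ADULT_YEAR
    if household.get("roth_ira_balance", 0) > roth_ceiling:
        violations.append("Roth IRA balance implausible for age")

    return violations


# The two incoherent records from the text, as hypothetical dicts.
young_millionaire = {"age": 28, "roth_ira_balance": 5_000_000, "forms": ["1040"]}
retiree_with_8615 = {"age": 68, "roth_ira_balance": 0, "forms": ["1040", "8615"]}
```

Running a check like this over a Faker-generated corpus gives a quick count of how many records would mislead a rule engine that depends on field relationships.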
Anti-pattern 4: The hand-curated edge case file
The team has, in addition to the main fixture file, an edge_cases.json with twenty manually-constructed households representing specific edge cases. Built by a senior engineer over a weekend during a release crisis.
Why it persists. Catches the cases the main fixture file misses. Owned by a specific person.
Why it fails. Capacity-limited by whoever owns it. Coverage is whatever specific cases that person has thought of. Doesn't generalize — the edge cases each test is targeting aren't documented as a typology, just as individual records.
The transition out. Convert the edge cases from records-in-a-file to specifications-in-a-document. Each edge case is a documented test pattern with what it's testing, why it's important, and what production failure would surface if it weren't covered. Then generate records that satisfy each specification. Our Wealth-App Edge-Case Test Coverage Audit Checklist is a starting template.
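A specification of this kind can be represented as a small data structure with a machine-checkable predicate, so the suite can report which documented edge cases have no matching record. This is a sketch under assumptions: the class, its field names, and the two example specifications are hypothetical and are not taken from the checklist.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EdgeCaseSpec:
    spec_id: str
    what_it_tests: str
    why_it_matters: str
    production_failure: str  # what would surface if this were not covered
    is_satisfied_by: Callable[[dict], bool]


def uncovered_specs(specs, corpus):
    """Return IDs of specs that no record in the corpus satisfies."""
    return [
        spec.spec_id
        for spec in specs
        if not any(spec.is_satisfied_by(record) for record in corpus)
    ]


# Hypothetical specifications, for illustration only.
specs = [
    EdgeCaseSpec(
        spec_id="EC-001",
        what_it_tests="Household with no accounts",
        why_it_matters="Aggregation code may assume at least one account",
        production_failure="Division by zero in allocation percentages",
        is_satisfied_by=lambda r: len(r.get("accounts", [])) == 0,
    ),
    EdgeCaseSpec(
        spec_id="EC-002",
        what_it_tests="Primary account holder is a minor",
        why_it_matters="Custodial rules differ from adult-owner rules",
        production_failure="Adult-only recommendations shown for a minor",
        is_satisfied_by=lambda r: r.get("primary_age", 99) < 18,
    ),
]
corpus = [
    {"primary_age": 40, "accounts": []},
    {"primary_age": 55, "accounts": ["IRA"]},
]
missing = uncovered_specs(specs, corpus)  # ["EC-002"]
```

Keeping the predicate next to the prose fields means the document and the coverage check cannot drift apart silently.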
A defensible test-data architecture
Here's the architecture mature QA programs converge to:
| Tier | Purpose | Typical size | Source |
|---|---|---|---|
| Tier 1 — Smoke + unit | Verify a known-good record on a code path | 5–50 records | Hand-curated, in repo. Faker OK for non-logic fields. |
| Tier 2 — Functional | Exercise actual business logic across realistic inputs | 500–5,000 | Archetype-driven, statistically realistic. |
| Tier 3 — Edge case | Regression test that the engine handles known-hard cases | 100–500 distinct cases | Each record satisfies a documented edge-case specification. |
| Tier 4 — Regulatory | Exercise specific regulatory requirements | Smaller, carefully calibrated | Reg BI households, fair-lending households, AML households, AG-49-A policies. |
| Tier 5 — Load + performance | Capacity planning, performance regression | 50K–500K | Statistically realistic, doesn't need specific business-logic exercise. |
| Tier 6 — Production-validation | Validate that synthetic-trained behavior generalizes | Held-out production sample, smaller | Real production, careful governance. |
A QA program operating across all six tiers has the structural answer to the auditor question "show me how you tested this." A program operating with only Tiers 1 and 2 has gaps that will surface in production.
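The size ranges in the table can be encoded as a lightweight guard that runs when a corpus is built or loaded. This is only a sketch: the function and constant names are hypothetical, and Tiers 4 and 6 are left unbounded because the table gives no numeric range for them.

```python
# Size bounds copied from the table. Tier 3 is counted in distinct cases
# rather than raw records. Tiers 4 and 6 have no numeric range in the table.
TIER_SIZE_BOUNDS = {
    1: (5, 50),
    2: (500, 5_000),
    3: (100, 500),
    5: (50_000, 500_000),
}


def corpus_size_ok(tier, count):
    """Return True if count falls in the tier's range, or the tier is unbounded."""
    if tier not in range(1, 7):
        raise ValueError(f"unknown tier: {tier}")
    bounds = TIER_SIZE_BOUNDS.get(tier)
    if bounds is None:
        return True
    low, high = bounds
    return low <= count <= high
```

A check like this is mainly useful for catching drift, for example a Tier 1 smoke corpus that has quietly grown into the hundreds.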
How to make the case to engineering and product
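A common QA-lead frustration is knowing what's needed and not being able to get the engineering investment to build it. The framing that has worked most cheaply: map the cost of the current state in dollars, then propose a small Tier 3 pilot rather than a full architectural overhaul.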
What we'd give a QA lead starting fresh
If you've inherited a QA program with one of the four anti-patterns above and you're trying to elevate it, here's the sequence we'd recommend:
- Q1: Inventory + categorize. Map every test in the suite to what it's actually testing: functional, edge case, regulatory, performance, integration. Inventory which tests depend on which test data.
- Q2: Build the edge-case specification document. Document each edge case the engine has to handle. Input pattern, expected output, regulatory or business requirement satisfied.
- Q3: Stand up the Tier 3 edge-case corpus. Generate records satisfying each specification. Replace the hand-curated edge case file. Wire into the regression suite.
- Q4: Migrate Tiers 1 + 2 to coherent synthetic data. Replace the fixtures-file-from-2019 with archetype-driven generation. Keep Faker only for simple-smoke cases that don't depend on coherence.
- Year 2: Expand to Tiers 4 + 5. Build regulatory-test corpora for each compliance requirement. Build the load-test corpus.
- Year 3: Establish production-validation cadence. Final-stage production validation as a regular release step. Document the cadence. Quantify the synthetic-to-production calibration.
This is a multi-year transition. Each quarter produces a concrete improvement that justifies the next.
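For the Year 3 step, one common way to quantify how closely a synthetic field tracks its production counterpart is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions. The guide does not name a specific metric, so treat this as one possible choice rather than the recommended method. The sketch uses only the standard library.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute gap between empirical CDFs.

    Returns a value in [0, 1]; 0 means the empirical distributions coincide.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap can only change at observed values, so checking each one suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

Tracking this number per field at each release gives the documented cadence a concrete trend to report, for example account balances in the synthetic corpus against the held-out production sample.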
The procurement question
QA leads often face the build-versus-buy question for synthetic test data. The honest answer:
| | Build | Buy |
|---|---|---|
| When to choose | Unusual domain requirements no off-the-shelf corpus addresses; meaningful in-house data engineering capacity; 3+ year maintenance burden acceptable. | Standard wealth-tech testing needs; no funded synthetic-data engineering function; time-to-coverage matters more than per-record cost. |
For most wealth-tech QA programs, "buy the standard cases, build the firm-specific cases" is the right answer. WealthSchema's 30 purpose-built Data Sets cover the most common edge-case categories — Reg BI, tax-loss harvesting, retirement sequencing, equity comp, fair lending, AML, illustration validation. The firm-specific cases that require your particular domain knowledge are the ones worth building in-house.
Our comparison on WealthSynth vs. Building Synthetic Data In-House walks through the math in more detail.
Closing
The QA function in wealth-tech is structurally about test data: the algorithms can be tested if the test data exists, and the tests don't run if it doesn't. Programs that treat test data as a tier of test infrastructure, with its own architecture, lifecycle, and investment, catch the bugs that matter. Programs that treat test data as a fixture file or an afterthought catch the bugs that don't matter and miss the ones that do.
If your team is at one of the anti-patterns described above and you're trying to plot the transition out, we'd recommend starting with the edge-case specification work — that's the part that surfaces what the test corpus actually needs to cover and produces the document that justifies subsequent investment.
Key takeaways
- Test data is a tier of test infrastructure, not a fixture file. Six tiers (smoke, functional, edge case, regulatory, load, production-validation) each have distinct lifecycle requirements.
- Four anti-patterns recur (stale fixtures, masked production, Faker-volume, hand-curated edge cases). Each has a specific exit path; none is appropriate at scale on its own.
- Convert edge cases from records-in-a-file into specifications-in-a-document. Records can be regenerated from a specification; a specification cannot be recovered from records.
- The cheapest way to win the investment argument is to map the cost of the current state in dollars, then propose a small Tier 3 pilot — not a full architectural overhaul.
- For most wealth-tech QA programs, buy the standard cases and build the firm-specific cases. The math on building everything in-house rarely works.