A robo-advisor looks, from the outside, like a single product: deposit money, get an asset allocation, watch the dashboard. Inside, it is six entangled modeling problems (onboarding suitability scoring, asset allocation, lot-level tax-loss harvesting, retirement projection, withdrawal sequencing, and behavioral-finance nudging), each with its own correctness criteria. The robo-advisors that ship correct answers across all six are the ones whose test corpus exercises all six. The robo-advisors that ship surprises in production are usually the ones whose corpus exercised only the easy three.
This article is the working note we hand to fintech engineering teams building robo-advisors: the six modeling problems, the synthetic-data shape each one needs, and the validation gates that catch bugs before customers find them.
What a robo-advisor actually has to model
The minimum-viable robo-advisor (the kind that is registered with the SEC, holds custody, and can take real customers) has six modeling problems. Each is independently solvable; the engineering complexity comes from making them work together.
| Module | Common bug class | Synthetic-data resolution required |
|---|---|---|
| Onboarding suitability | Scoring drift across edge-case demographics; Reg BI documentation gaps | Demographic spread, household-composition variations |
| Asset allocation | Drift detection; rebalancing thresholds; tax-aware vs naive choice | Multi-account households with realistic balance ratios |
| Tax-loss harvesting | Lot-level wash sales across accounts; specific-ID vs FIFO; QSBS handling | Lot-level resolution with 30–300 lots/position |
| Retirement projection | Stationary IID-normal returns; missed sequence risk; withdrawal seasonality | Monthly longitudinal with 96 snapshots per household |
| Withdrawal sequencing | Bracket-fill recommendations that miss IRMAA / NIIT / ACA constraints | Constraint-binding edge cases at every threshold |
| Behavioral nudging | Recommendations that ignore the household's actual cash position | Within-month cash-flow data, not just monthly aggregates |
Module 1: onboarding suitability
The first thing a robo does at signup is collect the inputs that drive a Reg BI suitability assessment: investment objectives, risk tolerance, time horizon, financial situation, and tax status. The output is a recommended account structure (taxable / IRA / Roth / 529) and a target asset allocation.
The bugs ship in the edge cases. A 28-year-old with a $40K salary and a $2M inheritance is plausible but rare; a 65-year-old planning to retire in six months while still rolling over a 401(k) is exactly the customer the marketing funnel sends, so onboarding has to handle that case correctly. Test corpora that omit these edge cases produce engines that throw "out of risk band" exceptions at real customers.
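A minimal sketch of what an onboarding edge-case fixture set can look like, assuming a hypothetical point-based scorer. The band names, horizon cutoffs, the 20x assets-to-salary "windfall" rule, and the `risk_points` / `risk_band` helpers are illustrative only, not a Reg BI standard. The fixtures are the point: the young-heir case pushes the raw score past the top band, which is precisely the failure a demographically narrow corpus never exercises.

```python
from dataclasses import dataclass

# Hypothetical band names; a real engine's bands come from its own suitability policy.
RISK_BANDS = ("conservative", "moderate", "growth", "aggressive")


@dataclass(frozen=True)
class Applicant:
    label: str
    age: int
    salary: float
    investable_assets: float
    years_to_retirement: float


def risk_points(a: Applicant) -> int:
    """Hypothetical scheme: horizon sets a base score, a large windfall adds one point."""
    if a.years_to_retirement < 2:
        horizon_pts = 0
    elif a.years_to_retirement < 10:
        horizon_pts = 1
    elif a.years_to_retirement < 25:
        horizon_pts = 2
    else:
        horizon_pts = 3
    # Assets far out of proportion to salary (e.g. an inheritance) add capacity.
    # The salary > 0 guard keeps the zero-income applicant from dividing by zero.
    windfall = a.salary > 0 and a.investable_assets / a.salary > 20
    return horizon_pts + (1 if windfall else 0)


def risk_band(a: Applicant) -> str:
    # Unclamped, 3 + 1 = 4 points would index past the last band: the
    # "out of risk band" exception that only an edge-case fixture surfaces.
    idx = min(max(risk_points(a), 0), len(RISK_BANDS) - 1)
    return RISK_BANDS[idx]


EDGE_CASES = [
    Applicant("young heir", 28, 40_000, 2_000_000, 37),
    Applicant("near-retiree mid-rollover", 65, 90_000, 600_000, 0.5),
    Applicant("no earned income", 45, 0, 250_000, 20),
]

for applicant in EDGE_CASES:
    print(applicant.label, risk_points(applicant), risk_band(applicant))
```

The clamp is one design choice; an engine could equally route an over-range score to manual review. Either way, the corpus needs a fixture that reaches the boundary.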
Module 2: asset allocation
Asset-allocation engines look simple on paper — risk profile in, allocation out — and produce real bugs at scale. The bug classes:
- Drift detection. Once the household holds the allocation, it drifts as markets move. The engine has to detect when drift exceeds threshold and rebalance. Engines with poor synthetic test data routinely fail to detect drift in unusual market regimes.
- Tax-aware rebalancing vs naive. A naive engine sells whatever's overweight; a tax-aware engine sells the lots with the smallest tax impact, prefers losses over gains, and respects long-term holding-period thresholds. See tax-aware portfolio rebalancer for the lot-selection logic. The two engines produce dramatically different after-tax outcomes.
- Multi-account allocation. A household has $300K split across taxable, Roth, and traditional IRA. The engine has to allocate by household totals while placing assets in the most tax-efficient account class — bonds in tax-deferred, equities in taxable, etc. The asset-location decision is the cleanest robo win, and it's only solvable with multi-account data.
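A minimal drift check, assuming the common "5/25" rule of thumb (flag a sleeve when it drifts more than 5 percentage points absolute, or more than 25% relative to its target). Both thresholds are parameters here, not recommendations, and `drift_breaches` / `current_weights` are hypothetical helper names. The relative band matters for small sleeves: a 5% REIT target can nearly double before the absolute band fires.

```python
def current_weights(market_values):
    """Convert asset-class market values into portfolio weights."""
    total = sum(market_values.values())
    return {asset: value / total for asset, value in market_values.items()}


def drift_breaches(targets, weights, abs_band=0.05, rel_band=0.25):
    """Asset classes whose drift breaches either tolerance band, with signed drift.

    abs_band: absolute tolerance in weight (0.05 = 5 percentage points).
    rel_band: tolerance relative to the target (0.25 = 25% of the target weight).
    """
    breaches = {}
    for asset, target in targets.items():
        drift = weights.get(asset, 0.0) - target
        too_far_abs = abs(drift) > abs_band
        too_far_rel = target > 0 and abs(drift) / target > rel_band
        if too_far_abs or too_far_rel:
            breaches[asset] = round(drift, 4)
    return breaches


# Hypothetical targets and post-move market values (in $K).
targets = {"us_equity": 0.50, "intl_equity": 0.20, "bonds": 0.25, "reits": 0.05}
weights = current_weights({"us_equity": 57.0, "intl_equity": 17.0, "bonds": 23.0, "reits": 3.0})
print(drift_breaches(targets, weights))  # -> {'us_equity': 0.07, 'reits': -0.02}
```

US equity breaches the absolute band; REITs breach only the relative one. A test corpus should include regimes that trip each band separately.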
The synthetic data this needs: multi-account households with realistic balance ratios across taxable / tax-deferred / Roth, lot-level resolution within each account, and explicit current-value vs cost-basis tracking.
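A minimal sketch of that household shape, with hypothetical class and function names: every lot carries both cost basis and current value, every account carries its tax treatment, and allocation is computed on household totals, so an equity-heavy taxable account can be offset by a bond-heavy IRA.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Lot:
    asset_class: str
    cost_basis: float
    market_value: float

    @property
    def unrealized_gain(self) -> float:
        return self.market_value - self.cost_basis


@dataclass
class Account:
    name: str
    tax_treatment: str  # "taxable", "tax_deferred", or "roth"
    lots: list[Lot] = field(default_factory=list)


def household_allocation(accounts):
    """Asset-class weights across the whole household, not per account."""
    totals = defaultdict(float)
    for account in accounts:
        for lot in account.lots:
            totals[lot.asset_class] += lot.market_value
    grand_total = sum(totals.values())
    return {asset: value / grand_total for asset, value in totals.items()}


def taxable_unrealized_gains(accounts):
    """Net unrealized gain in taxable accounts; sheltered lots have no realizable gain."""
    return sum(
        lot.unrealized_gain
        for account in accounts
        if account.tax_treatment == "taxable"
        for lot in account.lots
    )


# Hypothetical $300K household split across three account types.
household = [
    Account("brokerage", "taxable", [Lot("equity", 80_000, 90_000), Lot("equity", 35_000, 30_000)]),
    Account("traditional_ira", "tax_deferred", [Lot("bonds", 100_000, 100_000)]),
    Account("roth_ira", "roth", [Lot("equity", 60_000, 80_000)]),
]
print(household_allocation(household))      # equity 2/3, bonds 1/3 of household total
print(taxable_unrealized_gains(household))  # -> 5000
```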
Module 3: tax-loss harvesting
TLH is one of the highest-value robo features and one of the most error-prone in production. The bugs we see most often:
TLH bug inventory
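- Cross-account wash sale: harvesting a loss in taxable while buying a substantially identical security in an IRA the next day. The IRS rule is taxpayer-wide, not account-local, and a loss washed by an IRA purchase is disallowed outright rather than deferred into the replacement shares' basis (Rev. Rul. 2008-5).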
- Specific-identification not honored: the engine sells lots in FIFO order despite specific-ID being more tax-efficient.
- Holding-period miscalculation: long-term treatment requires holding for more than one year, so the first long-term sale date is one year and one day after the trade date; engines that are off by a day produce short-term gains the customer didn't expect.
- QSBS lots accidentally harvested: Section 1202 stock has a 5-year hold to qualify; harvesting it for a small loss can vaporize a much larger future tax exclusion.
- Rebalancing-induced wash sales: rebalancing during the 30-day window can trigger a wash sale on a recent harvest, silently undoing the tax benefit.
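A minimal sketch of two of these checks: a household-wide wash-sale scan and a holding-period test. Assumptions, stated loudly: `IDENTITY_GROUP` is a hypothetical map with placeholder tickers (deciding what is "substantially identical" is a compliance judgment this code does not make), share quantities and partial washes are ignored, and a Feb 29 acquisition is treated as reaching its one-year mark on Feb 28 as a simplification to confirm against IRS guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical grouping of placeholder tickers the engine treats as one security.
IDENTITY_GROUP = {"XYZ": "xyz", "XYZ_ETF": "xyz"}


def identity(ticker: str) -> str:
    return IDENTITY_GROUP.get(ticker, ticker)


@dataclass(frozen=True)
class Trade:
    account: str      # any account the taxpayer owns: taxable, IRA, HSA, ...
    ticker: str
    side: str         # "buy" or "sell"
    trade_date: date
    lot_id: str       # the lot opened (buy) or closed (sell)
    realized_pnl: float = 0.0  # meaningful for sells only


def find_wash_sales(trades, window_days=30):
    """Pair each loss sale with replacement buys within +/- window_days in ANY account."""
    pairs = []
    for sale in trades:
        if sale.side != "sell" or sale.realized_pnl >= 0:
            continue
        start = sale.trade_date - timedelta(days=window_days)
        end = sale.trade_date + timedelta(days=window_days)
        for buy in trades:
            if (
                buy.side == "buy"
                and buy.lot_id != sale.lot_id  # the lot being sold is not its own replacement
                and identity(buy.ticker) == identity(sale.ticker)
                and start <= buy.trade_date <= end
            ):
                pairs.append((sale.lot_id, buy.lot_id, buy.account))
    return pairs


def one_year_after(d: date) -> date:
    # Simplifying assumption: a Feb 29 acquisition anniversaries on Feb 28.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def is_long_term(acquired: date, sold: date) -> bool:
    # Long-term requires holding MORE than one year: selling on the
    # anniversary itself is still short-term.
    return sold > one_year_after(acquired)


trades = [
    Trade("taxable", "XYZ", "buy", date(2023, 1, 10), "lot-1"),
    Trade("taxable", "XYZ", "sell", date(2024, 3, 15), "lot-1", realized_pnl=-4_200.0),
    Trade("ira", "XYZ_ETF", "buy", date(2024, 3, 16), "lot-2"),  # washes the harvest
    Trade("taxable", "XYZ", "buy", date(2024, 5, 1), "lot-3"),   # outside the window
]
print(find_wash_sales(trades))  # -> [('lot-1', 'lot-2', 'ira')]
```

The IRA buy is caught only because the scan runs over every account in the household; an account-local scan would miss it.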
The synthetic data needed: lot-level resolution at 30–300 lots/position, multiple linked accounts (including non-trading IRAs and HSAs that can still trigger cross-account wash sales), special-status flags including QSBS, and time series that span the full 61-day wash-sale window (30 days before and 30 days after each sale).
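A minimal generator sketch for one position's lots, assuming Python's `random` module and a crude random-walk price. The lot-count range follows the 30-to-300 figure above; the date span, starting price, volatility, and `qsbs_rate` knob are hypothetical. Passing a seeded `random.Random` instance keeps fixtures reproducible across CI runs.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SyntheticLot:
    lot_id: str
    account: str
    acquired: date
    shares: float
    cost_per_share: float
    qsbs: bool


def generate_lots(ticker, account, rng, start=date(2018, 1, 1), end=date(2024, 12, 31),
                  min_lots=30, max_lots=300, qsbs_rate=0.0):
    """One position as many dated lots (think recurring buys plus dividend reinvestment)."""
    n = rng.randint(min_lots, max_lots)
    span = (end - start).days
    price = 50.0  # hypothetical starting price
    lots = []
    for i, offset in enumerate(sorted(rng.randrange(span + 1) for _ in range(n))):
        # Crude random walk between purchase dates, floored at $1.
        price = max(price * (1 + rng.gauss(0.0, 0.03)), 1.0)
        lots.append(SyntheticLot(
            lot_id=f"{ticker}-{account}-{i:03d}",
            account=account,
            acquired=start + timedelta(days=offset),
            shares=round(rng.uniform(1, 50), 3),
            cost_per_share=round(price, 2),
            qsbs=rng.random() < qsbs_rate,  # special-status flag for fixture coverage
        ))
    return lots


rng = random.Random(7)
taxable_lots = generate_lots("XYZ", "taxable", rng)
ira_lots = generate_lots("XYZ", "ira", rng)  # same security in a linked account
print(len(taxable_lots), len(ira_lots))
```

Generating the same ticker in a linked IRA is deliberate: it gives the wash-sale scanner something to find.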
Module 4: retirement projection
The retirement projection module produces the headline output most robo customers actually look at — the "how am I tracking" dashboard. The math behind it is, in honest implementations, a regime-switching Monte Carlo simulation with adaptive withdrawals; in many implementations, it's IID-normal returns with constant inflation-adjusted withdrawals.
The bugs that ship from the simpler version:
- Confidence bands that are too narrow (don't capture regime risk)
- Sequence-of-returns scenarios that aren't worst-case enough
- Withdrawal projections that ignore within-year cash-flow seasonality
- "Probability of success" headlines that hide the path-dependent failures
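A minimal sketch contrasting the two approaches, assuming a hypothetical two-state Markov regime model (the regime means, volatilities, and switching probabilities are illustrative, not calibrated). The IID-normal baseline is matched to the regime mixture's long-run mean and volatility, so any gap in success rate comes from regime persistence: bad years cluster, which is the sequence risk an IID model spreads thin. Withdrawals are constant in real terms in both arms; adaptive withdrawal rules are not modeled.

```python
import random

# Hypothetical annual real-return regimes: (mean, standard deviation).
REGIMES = {"calm": (0.07, 0.12), "stressed": (-0.02, 0.25)}
# Hypothetical probability of leaving each regime in a given year.
LEAVE_PROB = {"calm": 0.10, "stressed": 0.35}


def stationary_mix():
    """Long-run mean and stdev of the regime mixture, for a comparable IID baseline."""
    w_stressed = LEAVE_PROB["calm"] / (LEAVE_PROB["calm"] + LEAVE_PROB["stressed"])
    weights = {"calm": 1.0 - w_stressed, "stressed": w_stressed}
    mean = sum(w * REGIMES[r][0] for r, w in weights.items())
    var = sum(w * (REGIMES[r][1] ** 2 + (REGIMES[r][0] - mean) ** 2) for r, w in weights.items())
    return mean, var ** 0.5


def survives(rng, balance, withdrawal, years, regime_switching):
    """One path: withdraw at the start of each year, then apply that year's return."""
    iid_mu, iid_sigma = stationary_mix()
    regime = "calm"  # assumption: every path starts in the calm regime
    for _ in range(years):
        if regime_switching:
            if rng.random() < LEAVE_PROB[regime]:
                regime = "stressed" if regime == "calm" else "calm"
            mu, sigma = REGIMES[regime]
        else:
            mu, sigma = iid_mu, iid_sigma
        balance -= withdrawal
        if balance <= 0:
            return False
        # Floor the gross return at zero: a portfolio cannot lose more than 100%.
        balance *= max(1.0 + rng.gauss(mu, sigma), 0.0)
    return balance > 0


def success_rate(n_paths, years, balance, withdrawal, regime_switching, seed=0):
    rng = random.Random(seed)
    wins = sum(survives(rng, balance, withdrawal, years, regime_switching) for _ in range(n_paths))
    return wins / n_paths


for switching in (False, True):
    rate = success_rate(5_000, 30, 1_000_000, 45_000, switching, seed=42)
    print("regime-switching" if switching else "IID normal", f"{rate:.1%}")
```

Reporting the two success rates side by side, rather than a single headline number, is one way to surface how much of the "probability of success" depends on the return model.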
A robo's test corpus needs households with monthly cash-flow patterns, realistic income seasonality (SS month, RMD month, K-1 timing), and projection windows that span multiple market regimes. The corpus has to support backtest replays of the engine's recommendations against real historical paths.
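A minimal sketch of a 96-month cash-flow series with seasonal events, assuming a hypothetical household: every amount, the Social Security start index, the RMD start index and month, and the K-1 distribution months are illustrative defaults. The estimated-tax months are the federal quarterly due months (January, April, June, September).

```python
def monthly_cash_flows(months=96, start_year=2025, monthly_spend=6_000.0,
                       ss_start_index=24, ss_monthly=2_800.0,
                       rmd_start_index=36, rmd_month=12, rmd_amount=30_000.0,
                       k1_months=(4, 9), k1_amount=5_000.0,
                       est_tax_months=(1, 4, 6, 9), est_tax_payment=4_000.0):
    """Monthly inflows/outflows for one hypothetical household.

    Month indices are 0-based from January of start_year.
    """
    rows = []
    for i in range(months):
        year, month = start_year + i // 12, i % 12 + 1
        inflow = 0.0
        if i >= ss_start_index:                          # Social Security from the claim month on
            inflow += ss_monthly
        if i >= rmd_start_index and month == rmd_month:  # annual RMD once RMD age is reached
            inflow += rmd_amount
        if month in k1_months:                           # lumpy pass-through distributions
            inflow += k1_amount
        outflow = monthly_spend
        if month in est_tax_months:                      # quarterly estimated-tax payments
            outflow += est_tax_payment
        rows.append({"year": year, "month": month, "inflow": inflow,
                     "outflow": outflow, "net": inflow - outflow})
    return rows


crunch_months = [(r["year"], r["month"]) for r in monthly_cash_flows() if r["net"] < 0]
print(len(crunch_months), crunch_months[:3])
```

Annual aggregates of this series look smooth; the negative-net months are the cash crunches that a projection or nudge engine has to see.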
Module 5: withdrawal sequencing
Once the household enters retirement, the robo's withdrawal-sequencing module decides which accounts to draw from in which order. Related: annuity modeling retirement income. The textbook rule (taxable first, then tax-deferred, then Roth) is wrong for the households who benefit most from optimization — pre-Medicare retirees with ACA subsidies, post-Medicare retirees near IRMAA bracket boundaries, households with significant capital-loss carryforwards.
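A minimal sketch of one constraint-aware heuristic next to the textbook order, assuming a single hypothetical `magi_ceiling` (set with a safety margin below whichever cliff the household is nearest). This is a fill-to-threshold heuristic, not an optimizer, and it simplifies heavily: every traditional-IRA dollar is treated as ordinary income, taxable-account withdrawals are assumed to add no MAGI, and tax on the withdrawal itself is ignored.

```python
def plan_withdrawal(need, other_magi, magi_ceiling, balances):
    """Fill pre-tax withdrawals up to a MAGI ceiling, then taxable, then Roth."""
    plan = {"traditional": 0.0, "taxable": 0.0, "roth": 0.0}
    headroom = max(magi_ceiling - other_magi, 0.0)
    remaining = need
    for source, cap in (("traditional", headroom), ("taxable", None), ("roth", None)):
        limit = balances[source] if cap is None else min(cap, balances[source])
        take = min(remaining, limit)
        plan[source] = take
        remaining -= take
    if remaining > 1e-6:
        raise ValueError(f"shortfall of {remaining:.2f}")
    return plan


def textbook_plan(need, balances):
    """Textbook order: taxable, then tax-deferred, then Roth, ignoring MAGI entirely."""
    plan, remaining = {}, need
    for source in ("taxable", "traditional", "roth"):
        take = min(remaining, balances[source])
        plan[source] = take
        remaining -= take
    return plan


# Hypothetical household: $80K spending need, $40K of other MAGI, $100K ceiling.
balances = {"traditional": 900_000, "taxable": 300_000, "roth": 200_000}
print(plan_withdrawal(80_000, 40_000, 100_000, balances))
print(textbook_plan(80_000, balances))
```

The textbook plan leaves the low-bracket headroom unused this year and lets the traditional balance (and future RMDs) keep growing; the heuristic spends that headroom but never crosses the ceiling.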
The synthetic data this needs: households at every constraint-binding scenario. A test corpus where every retiree is post-Medicare with no NIIT exposure tests one branch of the optimizer. The corpus that catches optimizer bugs has households at IRMAA bracket boundaries, at ACA premium-subsidy cliffs, at NIIT thresholds, and with capital gains stacking.
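A minimal sketch of boundary fixtures, assuming MAGI is the variable that crosses each threshold. The NIIT thresholds are statutory and not inflation-indexed, and NIIT is 3.8% of the lesser of net investment income or MAGI above the threshold. The IRMAA and ACA numbers below are explicit placeholders, not real figures; they change by year and household, so load them from current CMS and marketplace tables. Treating every threshold as triggering strictly above the line is also an assumption to confirm per rule.

```python
# NIIT thresholds (IRC section 1411): set by statute, not indexed for inflation.
NIIT_THRESHOLD = {"single": 200_000, "mfj": 250_000, "mfs": 125_000}

# PLACEHOLDERS ONLY: IRMAA tiers and ACA subsidy cutoffs change by year and
# household; replace with current published figures before use.
THRESHOLDS_MFJ = {
    "niit": NIIT_THRESHOLD["mfj"],
    "irmaa_tier1": 200_000,       # placeholder, not the real figure
    "aca_subsidy_cliff": 80_000,  # placeholder, not the real figure
}


def boundary_fixtures(thresholds, offsets=(-1_000, -1, 0, 1, 1_000)):
    """(constraint, MAGI) pairs straddling every threshold, including the exact value."""
    return [(name, t + off) for name, t in sorted(thresholds.items()) for off in offsets]


def crossed(magi, thresholds):
    """Constraints whose threshold this MAGI exceeds (assumed strictly-above trigger)."""
    return {name for name, t in thresholds.items() if magi > t}


def niit(magi, net_investment_income, threshold):
    """3.8% of the lesser of net investment income or MAGI above the threshold."""
    return 0.038 * max(0.0, min(net_investment_income, magi - threshold))


for name, magi in boundary_fixtures(THRESHOLDS_MFJ):
    print(name, magi, sorted(crossed(magi, THRESHOLDS_MFJ)))
```

The off-by-one fixtures (threshold minus 1, exactly at, plus 1) are the ones that catch `>` versus `>=` bugs in the optimizer's constraint checks.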
Module 6: behavioral nudging
The robo's behavioral-finance module is what differentiates a "calculator" from a "coach." It nudges the customer toward higher savings, away from market-timing reactions during downturns, and toward contributing up to the IRA limit before the deadline. The nudges can be wrong in two directions:
- Tone-deaf to actual cash position. Nudging a customer to increase contributions while they're in a Q1 cash crunch produces churn, not behavior change.
- Blind to tax interactions. A "contribute to your IRA" nudge sent to a customer whose MAGI already sits inside the Roth contribution phase-out (or past the traditional-IRA deduction phase-out) is wrong advice.
The synthetic data this needs: within-month cash-flow tracking, AGI projection, and contribution-deadline calendar awareness.
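A minimal sketch of a nudge gate that consumes all three inputs, with hypothetical names throughout (`NudgeContext`, `roth_nudge_decision`, the $2,000 cash buffer). The Roth phase-out start is a required argument with no default on purpose: it depends on filing status and year, and should come from current IRS figures rather than a constant baked into code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NudgeContext:
    today: date
    checking_balance: float
    outflows_next_30d: float     # from within-month cash-flow tracking
    projected_magi: float        # from the AGI/MAGI projection
    contribution_deadline: date  # from the contribution-deadline calendar


def roth_nudge_decision(ctx, roth_phaseout_start, cash_buffer=2_000.0):
    """Hypothetical gate for a 'contribute to your Roth IRA' nudge; returns (send, reason)."""
    free_cash = ctx.checking_balance - ctx.outflows_next_30d
    if free_cash < cash_buffer:
        return False, "cash crunch"
    if ctx.projected_magi > roth_phaseout_start:
        return False, "MAGI inside or past the Roth phase-out"
    if ctx.today > ctx.contribution_deadline:
        return False, "deadline passed"
    return True, "send"


ctx = NudgeContext(date(2025, 3, 1), 12_000.0, 4_000.0, 120_000.0, date(2025, 4, 15))
print(roth_nudge_decision(ctx, roth_phaseout_start=150_000))  # placeholder phase-out figure
```

Returning a reason string alongside the decision makes every suppressed nudge auditable, which is useful when a test corpus asserts that nudges stay quiet during cash crunches.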
The cross-module integration tests
The harder bugs aren't in any single module — they're in module interactions. The integration tests we run on every robo we audit:
- Test 1: Onboarding → Allocation consistency. The allocation produced by the engine matches the suitability profile from onboarding. A 'conservative' profile that gets a 70/30 allocation is a bug.
- Test 2: Rebalancing → TLH coordination. Rebalancing trades don't trigger wash sales on recently harvested losses. Engines that schedule rebalancing without considering harvest history fail this routinely.
- Test 3: Projection → Withdrawal sequencing alignment. The retirement projection assumes a withdrawal sequence; the actual sequencing module has to match. Mismatches produce projections the engine cannot achieve.
- Test 4: Cash-flow → Nudging alignment. Nudges are conditioned on the household's actual within-month cash position, not the year-end total. Test that nudges don't fire during cash crunches.
- Test 5: Multi-year regression. Run the full module stack across a 96-month projection and validate that no module's output contradicts another's. The most common cross-module bug surfaces in this test.
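A minimal sketch of the first two integration checks as plain functions a test runner could call. The equity bands per profile are hypothetical (the real bands come from the firm's suitability policy), tickers are compared exactly rather than grouped by "substantially identical", and both sides of the 30-day window are checked because a rebalance buy shortly before a harvest washes it just as surely as one shortly after.

```python
from datetime import date

# Hypothetical equity-weight bands per suitability profile.
EQUITY_BANDS = {
    "conservative": (0.20, 0.40),
    "moderate": (0.40, 0.65),
    "growth": (0.65, 0.85),
    "aggressive": (0.85, 1.00),
}


def allocation_matches_profile(profile, equity_weight):
    """Test 1: the engine's equity weight must sit inside the profile's band."""
    low, high = EQUITY_BANDS[profile]
    return low <= equity_weight <= high


def rebalance_wash_sale_violations(harvests, rebalance_buys, window_days=30):
    """Test 2: rebalance buys of a harvested security within +/- window_days of the harvest.

    Both arguments are lists of (ticker, trade_date).
    """
    violations = []
    for h_ticker, h_date in harvests:
        for b_ticker, b_date in rebalance_buys:
            if b_ticker == h_ticker and abs((b_date - h_date).days) <= window_days:
                violations.append((b_ticker, b_date))
    return violations


harvests = [("XYZ", date(2024, 3, 15))]
rebalance_buys = [("XYZ", date(2024, 3, 20)), ("ABC", date(2024, 3, 20)), ("XYZ", date(2024, 5, 1))]
print(allocation_matches_profile("conservative", 0.70))  # False: the 70/30 case from Test 1
print(rebalance_wash_sale_violations(harvests, rebalance_buys))
```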
What a production-grade test corpus looks like
A robo-advisor's test corpus, at minimum:
Robo test corpus essentials
- 200+ households spanning age, income, net worth, and life-stage diversity
- Multi-account structure: every household has at least taxable + IRA; many have additional HSA, 401(k), 529, Roth
- Lot-level resolution within each account (30–300 lots typical)
- 96-month longitudinal with monthly cash-flow seasonality
- Edge cases: pre-Medicare ACA, post-Medicare IRMAA, NIIT, QSBS, multi-state
- Regime-spanning historical-return paths for backtesting
- Behavioral-event scenarios: market drops, customer panic events, deadline-driven decisions
- Reg BI documentation requirements: every onboarding produces a structured suitability record
A test corpus missing any of these exercises only a fraction of the engine. Robos that ship to production from incomplete corpora are the ones that ship surprises. The engineering investment in a complete corpus is real (typically $20K–$50K of synthetic data spend for a corpus of meaningful coverage), but a single production incident affecting a real customer eclipses that cost in regulatory exposure alone.
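A minimal coverage checker for the essentials list, assuming each fixture household carries a few summary fields and free-form edge-case tags. The `HouseholdFixture` shape, the tag names, and treating 30 lots as a hard floor (the list says 30 to 300 is typical) are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical tag names for the edge cases in the essentials list.
REQUIRED_EDGE_TAGS = {"aca_pre_medicare", "irmaa_boundary", "niit", "qsbs", "multi_state"}


@dataclass
class HouseholdFixture:
    household_id: str
    account_types: set[str]
    lots_per_position: list[int]
    months_of_history: int
    edge_tags: set[str] = field(default_factory=set)
    has_suitability_record: bool = True


def corpus_gaps(households, min_households=200, min_lots=30, months=96):
    """List every way the corpus falls short of the essentials; empty means covered."""
    gaps = []
    if len(households) < min_households:
        gaps.append(f"only {len(households)} household(s) (< {min_households})")
    for h in households:
        if not {"taxable", "ira"} <= h.account_types:
            gaps.append(f"{h.household_id}: missing taxable or IRA account")
        if any(n < min_lots for n in h.lots_per_position):
            gaps.append(f"{h.household_id}: a position has fewer than {min_lots} lots")
        if h.months_of_history < months:
            gaps.append(f"{h.household_id}: {h.months_of_history} months (< {months})")
        if not h.has_suitability_record:
            gaps.append(f"{h.household_id}: no structured suitability record")
    covered = set().union(*(h.edge_tags for h in households))
    for tag in sorted(REQUIRED_EDGE_TAGS - covered):
        gaps.append(f"no household tagged {tag}")
    return gaps


tiny = [HouseholdFixture("hh-001", {"taxable", "ira", "roth"}, [45, 120], 96, {"niit"})]
for gap in corpus_gaps(tiny):
    print(gap)
```

Running a check like this in CI turns "is the corpus complete?" from a review-time judgment into a failing build.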
Key takeaways
- A robo-advisor is six modeling problems entangled. The test corpus has to exercise all six, with their interactions, not just the prettiest module. Companion piece: [HNW family office platforms](/articles/building-hnw-family-office-platform).
- Onboarding, allocation, TLH, retirement projection, withdrawal sequencing, and behavioral nudging are the six. Each has its own bug class and its own synthetic-data resolution requirement.
- Module integration is where the hardest bugs live. Most of the production incidents we see come from rebalancing-TLH coordination failures and projection-vs-actual sequencing mismatches.
- Test corpus essentials: multi-account households at lot-level resolution, 96-month monthly longitudinal, regime-spanning return paths, edge cases at every constraint-binding scenario.
- Synthetic data spend for a production-grade robo corpus runs $20K–$50K. A single production incident affecting real customers eclipses that cost in regulatory exposure alone.