96 monthly snapshots of VTI, BND, AGG, a handful of sector ETFs, and the household's individual equity positions: every one of them has to mark to a price the rest of the engine can reason against. Three methods produce those prices in production — random-walk parametric (fast, fat-tail-blind), historical replay (faithful to what happened, mute on what didn't), and regime-switching with cross-asset correlation calibrated to multi-regime history (the most expensive, the only one defensible for a 30-year projection).
This article is the working note for engineering teams whose product has any time-series-aware feature: retirement Monte Carlo, sequence-of-returns testing, drawdown analysis, factor exposure, performance attribution. It covers each method, its failure mode, and the use cases that justify migrating from one to the next.
Three methods
The three methods cover essentially every approach in production. Hybrids exist (block bootstrap, regime-switching with parametric tails), but they tend to be one of the three with refinements rather than a fourth approach.
| Method | What it samples | Strength | Weakness |
|---|---|---|---|
| Random walk (GBM) | IID draws from a normal (or fat-tailed) distribution per asset | Trivially cheap, deterministic given seed, works in closed form for many statistics | No volatility clustering, no regime persistence, cross-asset correlation is constant |
| Historical replay | Actual historical returns, sampled or replayed in order | Correct by definition for the period it saw: every statistical fingerprint matches | Silent on regimes that didn't happen yet; sampling errors on short windows; cannot extrapolate |
| Regime-switching | Returns drawn from a regime-conditional distribution; regime sampled from a Markov chain | Captures persistence, fat tails, regime-dependent correlation; flexible for forward simulation | More parameters to calibrate; subjective regime definitions; computationally heavier |
Method 1: Random walk / Geometric Brownian Motion
This is what a bare numpy.random.normal call gives you, and it is the default in every quant library that pre-dates the 2008 crisis. Returns are independent and identically distributed: under GBM proper the log returns are normal (so prices are log-normal), and many implementations simply draw arithmetic returns from a normal. For a single asset, the path is a random walk with drift; for a portfolio, the joint distribution is a multivariate normal with a constant correlation matrix.
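A minimal sketch of that constant-correlation random walk, assuming two assets and purely illustrative drift, volatility, and correlation inputs (none of these numbers come from a calibration). The function name `simulate_gbm_paths` is hypothetical.

```python
import numpy as np

def simulate_gbm_paths(mu, sigma, corr, n_months, n_paths, seed=0):
    """IID multivariate-normal monthly log returns with one constant correlation matrix."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Covariance is fixed for the whole horizon: no clustering, no regimes.
    cov = np.outer(sigma, sigma) * np.asarray(corr, dtype=float)
    rng = np.random.default_rng(seed)  # same seed, same paths
    # Shape: (n_paths, n_months, n_assets)
    log_returns = rng.multivariate_normal(mu, cov, size=(n_paths, n_months))
    # Wealth relative to a starting value of 1.0 per asset
    wealth = np.exp(np.cumsum(log_returns, axis=1))
    return log_returns, wealth

# Illustrative inputs only (assumed, not calibrated): equities then bonds.
log_returns, wealth = simulate_gbm_paths(
    mu=[0.006, 0.003],
    sigma=[0.045, 0.015],
    corr=[[1.0, 0.1], [0.1, 1.0]],
    n_months=360,
    n_paths=1000,
)
```

Every month is drawn from the same distribution, which is exactly why the method is cheap and why it cannot produce volatility clustering or a correlation that moves with market conditions.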
This is the method most generic synthetic-data tools use, when they generate returns at all. It is the method that produced the retirement Monte Carlo failures we covered earlier: IID-normal returns underestimate fat-tail risk by an order of magnitude, ignore volatility clustering entirely, and treat the 2008 financial crisis as a 5-sigma event that should not have happened in any 30-year sample.
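To put a number on what "5-sigma" means under the normal assumption, the following sketch computes the one-sided tail probability of a single month landing five standard deviations below the mean, and the expected count of such months in a 360-month sample. It illustrates the arithmetic for a generic 5-sigma month, not a reconstruction of any particular 2008 calculation.

```python
import math

def normal_lower_tail(k_sigma):
    """P(Z <= -k_sigma) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

p_month = normal_lower_tail(5.0)      # chance that any one month is a -5 sigma month
expected_in_30y = 360 * p_month       # expected count across 360 monthly draws

print(f"per-month probability: {p_month:.3e}")
print(f"expected count in 30 years: {expected_in_30y:.3e}")
```

Under IID-normal returns the expected count over 30 years is far below one, which is the sense in which the model says such a month should essentially never appear.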
For wealth-data use cases, GBM has one defensible application: education and demonstration. A planning UI showing what a retirement might look like under "smooth" market conditions can use GBM and label it as such. Anything that drives a real decision — a real Monte Carlo, a real risk metric, a real sequence-of-returns test — needs more.
Method 2: Historical replay (with bootstrap variants)
Historical replay samples actual return realizations from history. The workhorse sampling variant, the block bootstrap, draws contiguous blocks (typically 1–12 months) from the historical record and concatenates them to produce a synthetic path of the desired length. More elaborate variants handle block boundaries differently: the stationary block bootstrap draws random block lengths, and the circular block bootstrap wraps the end of the record back to its start so that every observation is equally likely to be sampled.
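The plain block bootstrap described above can be sketched as follows. This is a minimal version under stated assumptions: `history` is a months-by-assets array, blocks have a fixed length, and the random `history` in the demo is a stand-in for a real return record. Whole rows are sampled together so that the cross-asset relationship within each month is preserved.

```python
import numpy as np

def block_bootstrap(history, n_months, block_len, rng):
    """Concatenate randomly chosen contiguous blocks of rows from `history`."""
    history = np.asarray(history)
    n_obs = history.shape[0]
    if block_len > n_obs:
        raise ValueError("block_len exceeds the length of the history")
    n_blocks = -(-n_months // block_len)  # ceiling division
    # Valid start indices are 0 .. n_obs - block_len (upper bound is exclusive).
    starts = rng.integers(0, n_obs - block_len + 1, size=n_blocks)
    blocks = [history[s:s + block_len] for s in starts]
    # Trim the final block if n_months is not a multiple of block_len.
    return np.concatenate(blocks, axis=0)[:n_months]

rng = np.random.default_rng(42)
# Stand-in for 75 years of monthly returns on two assets (placeholder data).
history = rng.normal(0.005, 0.04, size=(900, 2))
path = block_bootstrap(history, n_months=360, block_len=12, rng=rng)
```

Longer blocks keep more of the serial structure (including volatility clustering) at the cost of fewer distinct blocks to draw from.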
The strength is the reason replay exists at all: every statistical fingerprint matches by construction. Volatility clustering is preserved if the blocks are long enough. Cross-asset correlation matches the historical realization. Fat tails are present because the historical record has them.
The weaknesses are subtle and matter. Replay can only sample regimes that are already in the record. A backtest using replay over the 2009–2021 period samples a low-inflation, low-rate, high-equity-return regime — and only that regime. The 2022 inflation surge was outside the sample. The 1970s stagflation was outside the sample. Any forward simulation over a multi-decade horizon assumes the future regime will be drawn from past regimes, weighted by their historical frequency. That assumption breaks down whenever the future regime is genuinely new (which is, empirically, often).
Replay also has a sample-size problem on uncommon statistics. Consider the bootstrap distribution of 30-year terminal wealth computed from 75 years of monthly data: the record contains only two full non-overlapping 30-year windows, and every bootstrap path is a recombination of the same 900 monthly observations. The effective sample size for tail statistics is much smaller than the apparent sample of "millions of paths."
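The counting behind that sample-size argument is simple enough to write down. The sketch below assumes a 75-year monthly record, a 30-year horizon, and 12-month bootstrap blocks; the variable names are illustrative.

```python
years_of_history = 75
horizon_years = 30
block_months = 12

n_obs = years_of_history * 12            # 900 monthly observations
horizon = horizon_years * 12             # 360 months per simulated path

# Independent 30-year histories available in the record.
non_overlapping_windows = n_obs // horizon
# Windows with a distinct start month; neighbours share 359 of 360 months.
overlapping_windows = n_obs - horizon + 1
# Distinct contiguous 12-month blocks a block bootstrap can draw from.
distinct_blocks = n_obs - block_months + 1
# Blocks needed to build one 30-year path.
blocks_per_path = horizon // block_months

print(non_overlapping_windows, overlapping_windows, distinct_blocks, blocks_per_path)
# prints: 2 541 889 30
```

However many paths are generated, each one is assembled from the same few hundred heavily overlapping blocks, so the number of genuinely independent tail episodes does not grow with the path count.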
Method 3: Regime-switching simulation
The production answer for any forward simulation that has to produce defensible tail statistics. Returns are drawn from a regime-conditional distribution; the regime itself follows a Markov process whose transition matrix is calibrated to the empirical persistence of historical regimes.
A typical regime structure has 2–4 states: high-volatility crisis, recovery, stable expansion, late-cycle. Each regime has its own per-asset return distribution, its own per-asset volatility, and its own cross-asset correlation matrix. The Markov transition matrix specifies the probability of switching from each regime to each other regime in any given period.
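A sketch of the regime layer on its own: sampling a monthly regime path from a Markov chain and reading off the expected time spent in each regime. The regime names and transition matrix are copied from this article's four-state example parameters; the function name `sample_regime_path` and the choice to start in expansion are assumptions for illustration.

```python
import numpy as np

REGIMES = ["expansion", "late_cycle", "crisis", "recovery"]
# Row i holds the probabilities of moving from regime i to each regime next month.
T = np.array([
    [0.95, 0.04, 0.005, 0.005],
    [0.10, 0.85, 0.04, 0.01],
    [0.00, 0.15, 0.70, 0.15],
    [0.20, 0.05, 0.05, 0.70],
])

def sample_regime_path(T, n_months, start, rng):
    """Simulate regime indices month by month from transition matrix T."""
    path = np.empty(n_months, dtype=int)
    state = start
    for t in range(n_months):
        path[t] = state
        state = rng.choice(len(T), p=T[state])  # next regime given current one
    return path

# Expected consecutive months in a regime with self-transition probability p
# is 1 / (1 - p), the mean of a geometric distribution.
expected_duration = 1.0 / (1.0 - np.diag(T))

rng = np.random.default_rng(7)
path = sample_regime_path(T, n_months=360, start=0, rng=rng)
labels = [REGIMES[i] for i in path]
```

With these numbers the expected run lengths are about 20 months for expansion, 6.7 for late-cycle, and 3.3 each for crisis and recovery.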
The output is dramatically wider than GBM and structurally different from replay. Within each regime path, returns have the correct conditional moments. Across regime paths, the simulation samples regime sequences that include both historically-observed sequences (1973-style stagflation, 2008-style crisis) and historically-novel sequences (a 1970s-style inflation regime followed by a 2008-style banking crisis, which has not happened in the record but is not ruled out by the calibration).
// Regime model parameters (4-state example)
{
  "regimes": ["expansion", "late_cycle", "crisis", "recovery"],
  "transition_matrix": [
    [0.95, 0.04, 0.005, 0.005],
    [0.10, 0.85, 0.04, 0.01],
    [0.0, 0.15, 0.70, 0.15],
    [0.20, 0.05, 0.05, 0.70]
  ],
  "regime_returns": {
    "expansion": {
      "equities": { "mean_monthly": 0.011, "std_monthly": 0.035, "skew": -0.4, "kurt": 4.5 },
      "bonds": { "mean_monthly": 0.003, "std_monthly": 0.012, "skew": 0.0, "kurt": 3.2 }
    },
    "crisis": {
      "equities": { "mean_monthly": -0.025, "std_monthly": 0.080, "skew": -1.5, "kurt": 9.5 },
      "bonds": { "mean_monthly": 0.005, "std_monthly": 0.020, "skew": 0.5, "kurt": 5.0 }
    }
    /* ... */
  },
  "regime_correlations": {
    "expansion": [[1.0, -0.10], [-0.10, 1.0]],
    "crisis": [[1.0, 0.55], [0.55, 1.0]]
  }
}
The most important calibration detail is in the last block: cross-asset correlations are regime-conditional. Bonds and equities are slightly negatively correlated in expansion regimes and strongly positively correlated in crisis regimes. A simulator that uses a single average correlation matrix produces tail diversification benefits that do not exist in real markets.
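A small worked example of why the correlation choice matters, using the crisis-regime volatilities and the two correlations from the parameter block above. The 60/40 equity/bond weights and the helper name `portfolio_vol` are assumptions for illustration; the calculation is the standard two-asset portfolio variance, not a description of any particular engine.

```python
import numpy as np

def portfolio_vol(weights, vols, rho):
    """Monthly volatility of a two-asset portfolio: sqrt(w' * Cov * w)."""
    w = np.asarray(weights, dtype=float)
    v = np.asarray(vols, dtype=float)
    corr = np.array([[1.0, rho], [rho, 1.0]])
    cov = np.outer(v, v) * corr
    return float(np.sqrt(w @ cov @ w))

weights = [0.6, 0.4]             # assumed 60/40 equity/bond mix
crisis_vols = [0.080, 0.020]     # crisis std_monthly for equities and bonds

# Crisis volatilities with the crisis correlation (+0.55).
vol_with_crisis_corr = portfolio_vol(weights, crisis_vols, rho=0.55)
# Same volatilities, but with the expansion correlation (-0.10) wrongly carried over.
vol_with_expansion_corr = portfolio_vol(weights, crisis_vols, rho=-0.10)

print(f"{vol_with_crisis_corr:.4f} vs {vol_with_expansion_corr:.4f}")
```

The first figure comes out near 5.3% per month and the second near 4.8%, so carrying a benign correlation into the crisis state understates portfolio volatility exactly where the tail statistics are computed.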
The three calibration mistakes
Regime-switching is the answer, but a poorly-calibrated regime model is barely better than GBM. The three mistakes we see most often:
Overfitting on the most recent regime
If the regime parameters are estimated only from the most recent decade, the simulation is no better than GBM-with-recent-data. The regime structure has to be calibrated against a window that contains multiple regime examples — typically 50+ years of data, or specific historical episodes (1929 crash, 1973–75 oil shock, 1987 crash, 2000 dot-com, 2008 financial crisis, 2020 COVID, 2022 inflation surge) treated as anchor calibration points.
Ignoring cross-asset correlation breakdown in tail regimes
The single most-repeated finding in the post-2008 literature is "diversification fails when you need it most." Equity-bond correlation flipped from -0.2 in the 2003–2007 expansion to +0.5 in the 2008 crisis to -0.3 again in the 2010–2019 expansion to +0.6 in the 2022 inflation regime. A simulator that uses a single 75-year average correlation systematically overstates the diversification benefits a 60/40 portfolio enjoys in crisis regimes.
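The averaging effect can be seen on synthetic data. The sketch below draws returns from the expansion and crisis parameters used in this article's example, then compares the correlation estimated per regime with the one estimated on the pooled sample. The sample sizes and the 80/20 expansion/crisis split are arbitrary assumptions chosen to keep the estimates stable, and the draws are plain multivariate normal (the skew and kurtosis targets are ignored here).

```python
import numpy as np

def draw(mean, vol, rho, n, rng):
    """n bivariate-normal (equity, bond) monthly returns for one regime."""
    cov = np.outer(vol, vol) * np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(mean, cov, size=n)

def correlation_by_regime(returns, labels):
    """Equity-bond correlation estimated separately within each regime label."""
    out = {}
    for name in np.unique(labels):
        block = returns[labels == name]
        out[str(name)] = float(np.corrcoef(block[:, 0], block[:, 1])[0, 1])
    return out

rng = np.random.default_rng(11)
expansion = draw([0.011, 0.003], [0.035, 0.012], -0.10, 8000, rng)
crisis = draw([-0.025, 0.005], [0.080, 0.020], 0.55, 2000, rng)

returns = np.vstack([expansion, crisis])
labels = np.array(["expansion"] * 8000 + ["crisis"] * 2000)

per_regime = correlation_by_regime(returns, labels)
pooled = float(np.corrcoef(returns[:, 0], returns[:, 1])[0, 1])
print(per_regime, round(pooled, 3))
```

The pooled estimate lands between the two regime values, so a simulator fed the pooled number sees neither the mild hedge of expansions nor the co-movement of crises.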
Confusing volatility regimes with return regimes
Volatility regimes (low-vol vs high-vol) and return regimes (expansion vs contraction) are correlated but not identical. The 2009–2010 period had high realized volatility and high returns; the 2017 period had low realized volatility and high returns. A model that conflates the two produces a regime structure where high vol always means low returns, which contradicts the historical record.
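One way to keep the two dimensions separate is to label each period on both axes and let the regime definition use the pair. The sketch below is a deliberately crude illustration, assuming a trailing 12-month window, a median split on realized volatility, and the sign of the trailing mean return as the return state. The thresholds and the function name `label_vol_and_return` are hypothetical choices, not a calibration method described in this article.

```python
import numpy as np

def label_vol_and_return(returns, window=12, vol_cut=None):
    """Label each trailing window with a (volatility state, return state) pair."""
    returns = np.asarray(returns, dtype=float)
    # Rows are overlapping trailing windows of length `window`.
    windows = np.lib.stride_tricks.sliding_window_view(returns, window)
    trailing_vol = windows.std(axis=1, ddof=1)
    trailing_ret = windows.mean(axis=1)
    if vol_cut is None:
        vol_cut = np.median(trailing_vol)  # assumed split point
    vol_state = np.where(trailing_vol > vol_cut, "high_vol", "low_vol")
    ret_state = np.where(trailing_ret > 0, "up", "down")
    return list(zip(vol_state.tolist(), ret_state.tolist()))

rng = np.random.default_rng(5)
# Placeholder series standing in for a real monthly return history.
demo_returns = rng.normal(0.006, 0.04, size=240)
demo_labels = label_vol_and_return(demo_returns)
```

Because the two labels are assigned independently, a window can be ("high_vol", "up") or ("low_vol", "up"), which is the combination a single conflated axis cannot represent.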
$$r_t \mid s_t = s \;\sim\; \mathcal{N}(\mu_s, \Sigma_s), \qquad P(s_{t+1} = j \mid s_t = i) = T_{ij}$$
- $r_t$ = vector of asset returns at time $t$
- $s_t$ = regime state at time $t$ (e.g. expansion, crisis)
- $\mu_s$ = regime-conditional mean return vector
- $\Sigma_s$ = regime-conditional covariance matrix
- $T_{ij}$ = Markov transition probability from regime $i$ to $j$
In an expansion regime, equity returns ~ N(0.011, 0.035²) per month; bond returns ~ N(0.003, 0.012²); ρ(eq,bd) = -0.1. In a crisis regime, equity returns ~ N(-0.025, 0.080²); bond returns ~ N(0.005, 0.020²); ρ(eq,bd) = +0.55. The transition matrix is calibrated so that expansion is sticky (95% monthly persistence) and crises are short: each month a crisis has a 70% probability of persisting, a 15% probability of moving to recovery, and a 15% probability of falling back to late-cycle, with no direct path to expansion. That implies an expected crisis length of roughly 3.3 months.
What we use in the WealthSynth corpus
The 96-month longitudinal data attached to every household in our synthetic wealth dataset catalog is generated using a four-state regime-switching model calibrated against the 1928–2025 US equity record, the 1953–2025 US Treasury record, and the 1990–2025 international equity record. Cross-asset correlations are regime-conditional. The regime sequence for each household is sampled independently, so the corpus contains households whose retirement horizon lands in expansions, crises, and (rarely) regime sequences that are historically novel.
The methodology document attached to each bundle includes the regime-conditional moments, the transition matrix, and the validation suite we use to check that the generated returns reproduce the empirical moments of the historical record on the regimes we calibrated against. Buyers running their own Monte Carlo retirement simulator can either consume our pre-generated returns directly or use the household's regime-sequence labels to drive their own model.
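A moment check of the kind described can be sketched in a few lines. This is not the validation suite shipped with the bundles; it is an assumed minimal version that compares sample mean, standard deviation, skewness, and (non-excess) kurtosis against target values. The target below is the expansion-regime equity entry from this article's example parameters, and the tolerances are hypothetical.

```python
import numpy as np

def moment_report(x):
    """Sample mean, std, skewness, and Pearson kurtosis (normal = 3) of a return series."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=0)
    return {
        "mean_monthly": float(x.mean()),
        "std_monthly": float(x.std(ddof=1)),
        "skew": float((z ** 3).mean()),
        "kurt": float((z ** 4).mean()),
    }

def matches_target(report, target, tol):
    """True only if every moment is within its tolerance of the target."""
    return all(abs(report[k] - target[k]) <= tol[k] for k in target)

target = {"mean_monthly": 0.011, "std_monthly": 0.035, "skew": -0.4, "kurt": 4.5}
tol = {"mean_monthly": 0.001, "std_monthly": 0.002, "skew": 0.15, "kurt": 0.5}  # assumed

rng = np.random.default_rng(3)
# Plain normal draws with the right mean and std but no skew or excess kurtosis.
normal_draws = rng.normal(0.011, 0.035, size=200_000)
report = moment_report(normal_draws)
passes = matches_target(report, target, tol)
```

Here `passes` comes out false: the normal draws match the first two moments but miss the skew and kurtosis targets, which is the kind of gap such a check exists to catch.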
When to use which
The three methods are not interchangeable. The decision depends on the use case.
| Use case | Recommended method | Why |
|---|---|---|
| UI demo / education | Random walk (labeled) | Speed and clarity matter; users aren't making decisions |
| Backtest of a past period | Historical replay | The period is real; replay reproduces it exactly |
| Forward retirement Monte Carlo | Regime-switching | Tail risks and regime sequences matter for the answer |
| Stress test against named scenario | Replay (anchored) | Replay the named historical episode; treat as anchor not sample |
| Sequence-of-returns sensitivity | Regime-switching | Sequence-of-returns risk is regime-driven; replay sample is too narrow |
| Factor backtest | Replay or regime-switching | Replay if factor model is the question; regime-switching if portfolio robustness is |
Where this connects
The returns generation choice flows downstream into every time-series-aware system. For the retirement-projection failure modes, see Monte Carlo for retirement: where standard libraries break. For the methodology comparison framed for buyers (synthetic vs. replay vs. bootstrap), see Synthetic time series vs. historical replay. For the QA tells that diagnose poor returns generation in any synthetic dataset, see Detecting unrealistic patterns in synthetic time-series wealth data. For the umbrella view, see Time-Series Fidelity in Synthetic Wealth Data.
The single deepest interaction is with drawdown sequencing and tax-aware withdrawal: the right answer for which account to draw from in a retirement is regime-conditional, and a withdrawal engine tested against GBM returns will recommend strategies that fail the moment a real regime shift hits.