Generating synthetic historical returns — random walk, regime-based, replay

Random-walk lognormal returns — fast, and wrong about the fat tails that drive sequence-of-returns risk. Historical replay — right for 1973-1974 and 2008, silent on the 1999-style melt-up that 2032 may or may not bring. Regime-switching with calibrated cross-asset correlation is the production answer.

WealthSchema Staff · Pipeline architecture · May 9, 2026 · 7 min read

96 monthly snapshots of VTI, BND, AGG, a handful of sector ETFs, and the household's individual equity positions: every one of them has to mark to a price the rest of the engine can reason against. Three methods produce those prices in production — random-walk parametric (fast, fat-tail-blind), historical replay (faithful to what happened, mute on what didn't), and regime-switching with cross-asset correlation calibrated to multi-regime history (the most expensive, the only one defensible for a 30-year projection).

This article is the working note for engineering teams whose product has any time-series-aware feature: retirement Monte Carlo, sequence-of-returns testing, drawdown analysis, factor exposure, performance attribution. It covers each method, its failure mode, and the use cases that justify migrating from one to the next.

Three methods

The three methods cover essentially every approach in production. Hybrids exist (block bootstrap, regime-switching with parametric tails) and they tend to be one of the three with refinements rather than a fourth approach.

| Method | What it samples | Strength | Weakness |
| --- | --- | --- | --- |
| Random walk (GBM) | IID draws from a normal (or fat-tailed) distribution per asset | Trivially cheap, deterministic given seed, works in closed form for many statistics | No volatility clustering, no regime persistence, cross-asset correlation is constant |
| Historical replay | Actual historical returns, sampled or replayed in order | Correct by definition for the period it saw: every statistical fingerprint matches | Silent on regimes that didn't happen yet; sampling errors on short windows; cannot extrapolate |
| Regime-switching | Returns drawn from a regime-conditional distribution; regime sampled from a Markov chain | Captures persistence, fat tails, regime-dependent correlation; flexible for forward simulation | More parameters to calibrate; subjective regime definitions; computationally heavier |

Method 1: Random walk / Geometric Brownian Motion

This is what a bare numpy.random.normal call gives you, and the default in every quant library that pre-dates the 2008 crisis. Log returns are independent, identically distributed normals, which makes prices lognormal. For a single asset, the path is a random walk with drift; for a portfolio, the joint distribution is a multivariate normal with a constant correlation matrix.

This is the method most generic synthetic-data tools use, when they generate returns at all. It is the method that produced the retirement Monte Carlo failures we covered earlier: IID-normal returns underestimate fat-tail risk by an order of magnitude, ignore volatility clustering entirely, and treat the 2008 financial crisis as a 5-sigma event that should not have happened in any 30-year sample.

For wealth-data use cases, GBM has one defensible application: education and demonstration. A planning UI showing what a retirement might look like under "smooth" market conditions can use GBM and label it as such. Anything that drives a real decision — a real Monte Carlo, a real risk metric, a real sequence-of-returns test — needs more.
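As a rough illustration of how little machinery GBM needs, here is a minimal sketch of a monthly path generator using only the Python standard library. The function name, the drift and volatility values, and the monthly step are assumptions chosen for the example, not parameters from any production engine.

```python
import math
import random

def simulate_gbm_path(start_price, annual_drift, annual_vol, n_months, seed=None):
    """Simulate monthly prices under geometric Brownian motion.

    Log returns are IID normal, so every month is drawn from the same
    distribution regardless of what happened the month before.
    """
    rng = random.Random(seed)
    dt = 1.0 / 12.0
    # Mean of the monthly log return, including the Ito correction term.
    mu_log = (annual_drift - 0.5 * annual_vol ** 2) * dt
    sigma_log = annual_vol * math.sqrt(dt)

    prices = [start_price]
    for _ in range(n_months):
        log_return = rng.gauss(mu_log, sigma_log)
        prices.append(prices[-1] * math.exp(log_return))
    return prices

# Hypothetical inputs: 7% drift, 16% volatility, 96 monthly steps.
example_path = simulate_gbm_path(100.0, 0.07, 0.16, n_months=96, seed=42)
```

Nothing in the loop looks at earlier months, which is exactly why the method cannot produce volatility clustering or regime persistence.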

Method 2: Historical replay (with bootstrap variants)

Historical replay samples actual return realizations from history. The simplest variant — block bootstrap — samples contiguous blocks (typically 1–12 months) from the historical record and concatenates them to produce a synthetic path of the desired length. More elaborate variants (stationary block bootstrap, circular block bootstrap) handle the boundary conditions differently.
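The plain block bootstrap described above can be sketched in a few lines. This is an illustrative version, assuming the history is a simple list of monthly returns; the function name and arguments are hypothetical, and the stationary and circular variants are not covered.

```python
import random

def block_bootstrap(returns, path_length, block_size, seed=None):
    """Build a synthetic return path from contiguous blocks of history.

    Plain (non-circular) variant: block start points are chosen so that
    every block fits entirely inside the historical record.
    """
    if not 1 <= block_size <= len(returns):
        raise ValueError("block_size must be between 1 and len(returns)")
    rng = random.Random(seed)
    last_start = len(returns) - block_size
    path = []
    while len(path) < path_length:
        start = rng.randint(0, last_start)  # inclusive on both ends
        path.extend(returns[start:start + block_size])
    return path[:path_length]  # trim the final block to the requested length
```

If each element of `returns` is a tuple of same-month returns across assets, the same function keeps the cross-asset relationship within every block, because whole rows are copied together.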

The strength is the reason replay exists at all: every statistical fingerprint matches by construction. Volatility clustering is preserved if the blocks are long enough. Cross-asset correlation matches the historical realization. Fat tails are present because the historical record has them.

The weaknesses are subtle and matter. Replay can only sample regimes that are already in the record. A backtest using replay over the 2009–2021 period samples a low-inflation, low-rate, high-equity-return regime — and only that regime. The 2022 inflation surge was outside the sample. The 1970s stagflation was outside the sample. Any forward simulation over a multi-decade horizon assumes the future regime will be drawn from past regimes, weighted by their historical frequency. That assumption breaks down whenever the future regime is genuinely new (which is, empirically, often).

Replay also has a sample-size problem on uncommon statistics. Seventy-five years of monthly data contain only two complete non-overlapping 30-year windows, and every bootstrapped 30-year path is stitched together from the same 75 calendar years of returns. However many paths are resampled, the effective sample size for tail statistics is much smaller than the apparent sample of "millions of paths."
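The window arithmetic behind that sample-size problem is easy to check. The helper below is a hypothetical illustration that counts how many horizon-length windows a monthly record contains, both overlapping (one per start month) and fully independent.

```python
def window_counts(history_years, horizon_years):
    """Count horizon-length windows available in a monthly historical record."""
    months = history_years * 12
    horizon = horizon_years * 12
    overlapping = months - horizon + 1   # one window per possible start month
    non_overlapping = months // horizon  # windows that share no data at all
    return overlapping, non_overlapping

print(window_counts(75, 30))  # (541, 2)
```

The 541 overlapping windows look like a large sample, but neighbouring windows share all but one month of data, so they carry far less independent information than the count suggests.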

Method 3: Regime-switching simulation

The production answer for any forward simulation that has to produce defensible tail statistics. Returns are drawn from a regime-conditional distribution; the regime itself follows a Markov process whose transition matrix is calibrated to the empirical persistence of historical regimes.

A typical regime structure has 2–4 states: high-volatility crisis, recovery, stable expansion, late-cycle. Each regime has its own per-asset return distribution, its own per-asset volatility, and its own cross-asset correlation matrix. The Markov transition matrix specifies the probability of switching from each regime to each other regime in any given period.

The output is dramatically wider than GBM and structurally different from replay. Within each regime path, returns have the correct conditional moments. Across regime paths, the simulation samples regime sequences that include both historically-observed sequences (1973-style stagflation, 2008-style crisis) and historically-novel sequences (a 1970s-style inflation regime followed by a 2008-style banking crisis, which has not happened in the record but is not ruled out by the calibration).
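The regime layer on its own is a small Markov chain sampler. The sketch below assumes the four regime names and the transition probabilities from this article's 4-state example; the function name and default starting regime are illustrative choices.

```python
import random

REGIMES = ["expansion", "late_cycle", "crisis", "recovery"]
# Row i holds the probabilities of moving from regime i to each regime.
TRANSITIONS = [
    [0.95, 0.04, 0.005, 0.005],
    [0.10, 0.85, 0.04, 0.01],
    [0.0, 0.15, 0.70, 0.15],
    [0.20, 0.05, 0.05, 0.70],
]

def sample_regime_path(n_periods, start="expansion", seed=None):
    """Sample a monthly regime sequence from the Markov chain."""
    rng = random.Random(seed)
    state = REGIMES.index(start)
    path = []
    for _ in range(n_periods):
        path.append(REGIMES[state])
        # Next state depends only on the current state's row.
        state = rng.choices(range(len(REGIMES)), weights=TRANSITIONS[state])[0]
    return path

regime_path = sample_regime_path(96, seed=7)
```

Because each path is sampled independently, two paths can order the same regimes differently, which is how sequences absent from the historical record can appear in the simulation.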

// Regime model parameters (4-state example)
{
  "regimes": ["expansion", "late_cycle", "crisis", "recovery"],
  "transition_matrix": [
    [0.95, 0.04, 0.005, 0.005],
    [0.10, 0.85, 0.04, 0.01],
    [0.0,  0.15, 0.70, 0.15],
    [0.20, 0.05, 0.05, 0.70]
  ],
  "regime_returns": {
    "expansion": {
      "equities": { "mean_monthly": 0.011, "std_monthly": 0.035, "skew": -0.4, "kurt": 4.5 },
      "bonds":    { "mean_monthly": 0.003, "std_monthly": 0.012, "skew": 0.0, "kurt": 3.2 }
    },
    "crisis": {
      "equities": { "mean_monthly": -0.025, "std_monthly": 0.080, "skew": -1.5, "kurt": 9.5 },
      "bonds":    { "mean_monthly": 0.005, "std_monthly": 0.020, "skew": 0.5, "kurt": 5.0 }
    }
    /* ... */
  },
  "regime_correlations": {
    "expansion": [[1.0, -0.10], [-0.10, 1.0]],
    "crisis":    [[1.0,  0.55], [ 0.55, 1.0]]
  }
}

The most important calibration detail is in the last block, regime_correlations: cross-asset correlations are regime-conditional. Bonds and equities are slightly negatively correlated in expansion regimes and strongly positively correlated in crisis regimes. A simulator that uses a single average correlation matrix produces tail diversification benefits that do not exist in real markets.
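To make the regime-conditional correlation concrete, here is a minimal sketch that draws correlated equity and bond returns for a given regime. It uses the means, standard deviations, and correlations from the example parameters above, but assumes plain bivariate normal draws: the skew and kurtosis fields are ignored, and the function names are hypothetical.

```python
import math
import random

# (mean_monthly, std_monthly) per asset and equity-bond correlation per regime,
# copied from the example parameters. Skew and kurtosis are ignored here.
REGIME_PARAMS = {
    "expansion": {"equities": (0.011, 0.035), "bonds": (0.003, 0.012), "rho": -0.10},
    "crisis": {"equities": (-0.025, 0.080), "bonds": (0.005, 0.020), "rho": 0.55},
}

def draw_monthly_returns(regime, rng):
    """Draw one correlated (equity, bond) return pair for the given regime."""
    params = REGIME_PARAMS[regime]
    mu_eq, sd_eq = params["equities"]
    mu_bd, sd_bd = params["bonds"]
    rho = params["rho"]
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # 2x2 Cholesky factor of the correlation matrix [[1, rho], [rho, 1]].
    equity = mu_eq + sd_eq * z1
    bond = mu_bd + sd_bd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return equity, bond

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)
```

Drawing a few thousand pairs per regime and passing them to `sample_correlation` is a quick sanity check that the generator reproduces the sign flip between expansion and crisis.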

The three calibration mistakes

Regime-switching is the answer, but a poorly-calibrated regime model is barely better than GBM. The three mistakes we see most often:

Overfitting on the most recent regime

If the regime parameters are estimated only from the most recent decade, the simulation is no better than GBM-with-recent-data. The regime structure has to be calibrated against a window that contains multiple regime examples — typically 50+ years of data, or specific historical episodes (1929 crash, 1973–75 oil shock, 1987 crash, 2000 dot-com, 2008 financial crisis, 2020 COVID, 2022 inflation surge) treated as anchor calibration points.
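One cheap guard against this mistake is to check which anchor episodes a calibration window actually covers. The sketch below is a hypothetical helper; the episode list comes from the paragraph above, and reducing each episode to whole calendar years is a simplifying assumption.

```python
# Anchor episodes named above, reduced to (first_year, last_year).
# Single-year entries are a simplification for illustration.
ANCHOR_EPISODES = {
    "1929 crash": (1929, 1929),
    "1973-75 oil shock": (1973, 1975),
    "1987 crash": (1987, 1987),
    "2000 dot-com": (2000, 2000),
    "2008 financial crisis": (2008, 2008),
    "2020 COVID": (2020, 2020),
    "2022 inflation surge": (2022, 2022),
}

def missing_anchors(window_start, window_end):
    """Return anchor episodes not fully inside the calibration window."""
    return [
        name
        for name, (first, last) in ANCHOR_EPISODES.items()
        if first < window_start or last > window_end
    ]

# A 2009-2021 window covers only one of the seven anchors.
print(missing_anchors(2009, 2021))
```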

Ignoring cross-asset correlation breakdown in tail regimes

The most-cited maxim in the post-2008 literature is "diversification fails when you need it most." Equity-bond correlation flipped from -0.2 in the 2003–2007 expansion to +0.5 in the 2008 crisis, back to -0.3 in the 2010–2019 expansion, and then to +0.6 in the 2022 inflation regime. A simulator that uses a single 75-year average correlation systematically overstates the diversification benefit a 60/40 portfolio enjoys in crisis regimes.
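The size of the overstatement can be worked out from the two-asset portfolio volatility formula. The sketch below uses the crisis-regime volatilities from the example parameters (8.0% equities, 2.0% bonds, monthly) and compares the crisis correlation of +0.55 with -0.10, which stands in here as an assumed "single average" correlation purely for illustration.

```python
import math

def portfolio_vol(w_eq, sd_eq, sd_bd, rho):
    """Volatility of a two-asset portfolio with equity weight w_eq."""
    w_bd = 1.0 - w_eq
    variance = (
        (w_eq * sd_eq) ** 2
        + (w_bd * sd_bd) ** 2
        + 2.0 * w_eq * w_bd * rho * sd_eq * sd_bd
    )
    return math.sqrt(variance)

# 60/40 portfolio in a crisis month, with crisis-regime volatilities.
crisis_vol = portfolio_vol(0.6, 0.080, 0.020, rho=0.55)     # regime-conditional
averaged_vol = portfolio_vol(0.6, 0.080, 0.020, rho=-0.10)  # assumed average
```

Under these inputs the monthly portfolio volatility is about 5.3% with the crisis correlation versus about 4.8% with the averaged one, so the single-matrix simulator understates crisis-month risk before any fat-tail effect is considered.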

Confusing volatility regimes with return regimes

Volatility regimes (low-vol vs high-vol) and return regimes (expansion vs contraction) are correlated but not identical. The 2009–2010 period had high realized volatility and high returns; the 2017 period had low realized volatility and high returns. A model that conflates the two produces a regime structure where high vol always means low returns, which contradicts the historical record.

Formula: Regime-conditional cross-asset return

$$
r_t \mid s_t \sim \mathcal{N}(\mu_{s_t}, \Sigma_{s_t}), \qquad P(s_{t+1} = j \mid s_t = i) = T_{ij}
$$

where
r_t = vector of asset returns at time t
s_t = regime state at time t (e.g. expansion, crisis)
μ_{s_t} = regime-conditional mean return vector
Σ_{s_t} = regime-conditional covariance matrix
T_{ij} = Markov transition probability from regime i to j
Example
In an expansion regime, equity returns ~ N(0.011, 0.035²) per month; bond returns ~ N(0.003, 0.012²); ρ(eq,bd) = -0.1. In a crisis regime, equity returns ~ N(-0.025, 0.080²); bond returns ~ N(0.005, 0.020²); ρ(eq,bd) = +0.55. The transition matrix is calibrated so that expansion is sticky (95% monthly persistence) and crises are short but contagious: each month a crisis has a 70% probability of persisting, a 15% probability of moving to recovery, and a 15% probability of moving to late cycle, for an expected crisis length of just over three months.
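Two quantities follow directly from the transition matrix and are useful sanity checks on a calibration: the expected time spent in each regime once entered, and the long-run share of months in each regime. The sketch below computes both for the 4-state example matrix; the function names are hypothetical, and the stationary distribution is approximated by simple repeated multiplication.

```python
REGIMES = ["expansion", "late_cycle", "crisis", "recovery"]
TRANSITIONS = [
    [0.95, 0.04, 0.005, 0.005],
    [0.10, 0.85, 0.04, 0.01],
    [0.0, 0.15, 0.70, 0.15],
    [0.20, 0.05, 0.05, 0.70],
]

def expected_durations(transitions):
    """Expected months spent in a regime once entered: 1 / (1 - T_ii)."""
    return [1.0 / (1.0 - row[i]) for i, row in enumerate(transitions)]

def stationary_distribution(transitions, n_iter=5000):
    """Long-run share of time in each regime, by iterating pi <- pi T."""
    n = len(transitions)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * transitions[i][j] for i in range(n)) for j in range(n)]
    return pi

durations = dict(zip(REGIMES, expected_durations(TRANSITIONS)))
long_run = dict(zip(REGIMES, stationary_distribution(TRANSITIONS)))
```

For the example matrix this gives an expected expansion of 20 months and an expected crisis of a little over 3 months, with roughly two thirds of all months spent in expansion and about 5% in crisis.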

What we use in the WealthSynth corpus

The 96-month longitudinal data attached to every household in our synthetic wealth dataset catalog is generated using a four-state regime-switching model calibrated against the 1928–2025 US equity record, the 1953–2025 US Treasury record, and the 1990–2025 international equity record. Cross-asset correlations are regime-conditional. The regime sequence for each household is sampled independently, so the corpus contains households whose retirement horizon lands in expansions, crises, and (rarely) regime sequences that are historically novel.

The methodology document attached to each bundle includes the regime-conditional moments, the transition matrix, and the validation suite we use to check that the generated returns reproduce the empirical moments of the historical record on the regimes we calibrated against. Buyers running their own Monte Carlo retirement simulator can either consume our pre-generated returns directly or use the household's regime-sequence labels to drive their own model.
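A moment-matching check of the kind described can be sketched as follows. This is not the validation suite shipped with the bundles, only a hypothetical illustration; it assumes "kurt" means raw (non-excess) kurtosis, so a normal distribution scores 3, and the tolerance values are left to the caller.

```python
import math

def sample_moments(xs):
    """Mean, standard deviation, skewness, and raw kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "kurt": m4 / m2 ** 2,  # raw kurtosis: 3.0 for a normal distribution
    }

def moments_match(generated, target, tolerances):
    """True if every target moment is reproduced within its tolerance."""
    got = sample_moments(generated)
    return all(abs(got[name] - target[name]) <= tolerances[name] for name in target)
```

A buyer could run `moments_match` per regime and per asset, filtering the generated returns by the household's regime-sequence labels before comparing them with the documented regime-conditional moments.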

When to use which

The three methods are not interchangeable. The decision depends on the use case.

| Use case | Recommended method | Why |
| --- | --- | --- |
| UI demo / education | Random walk (labeled) | Speed and clarity matter; users aren't making decisions |
| Backtest of a past period | Historical replay | The period is real; replay reproduces it exactly |
| Forward retirement Monte Carlo | Regime-switching | Tail risks and regime sequences matter for the answer |
| Stress test against named scenario | Replay (anchored) | Replay the named historical episode; treat as anchor not sample |
| Sequence-of-returns sensitivity | Regime-switching | Sequence-of-returns risk is regime-driven; replay sample is too narrow |
| Factor backtest | Replay or regime-switching | Replay if factor model is the question; regime-switching if portfolio robustness is |

Where this connects

The returns generation choice flows downstream into every time-series-aware system. For the retirement-projection failure modes, see Monte Carlo for retirement: where standard libraries break. For the methodology comparison framed for buyers (synthetic vs. replay vs. bootstrap), see Synthetic time series vs. historical replay. For the QA tells that diagnose poor returns generation in any synthetic dataset, see Detecting unrealistic patterns in synthetic time-series wealth data. For the umbrella view, see Time-Series Fidelity in Synthetic Wealth Data.

The single deepest interaction is with drawdown sequencing and tax-aware withdrawal: the right answer for which account to draw from in a retirement is regime-conditional, and a withdrawal engine tested against GBM returns will recommend strategies that fail the moment a real regime shift hits.