Monte Carlo has been the standard answer to "project a 30-year retirement plan" for thirty years. The libraries — numpy.random.normal, the GBM simulators in every quant package — make it cheap. Ten thousand paths, a 90% success probability, a clean fan chart of percentile bands. The output looks rigorous.
Retirement Monte Carlo as routinely implemented produces overconfident plans. Four assumption failures drive most of the overconfidence, and the standard libraries flag none of them. Below: which assumptions fail, by how much, and what a production simulator has to do instead.
What retirement Monte Carlo is supposed to model
The use case is straightforward in concept. Given a starting balance, a withdrawal plan, an asset allocation, and a return distribution, project the balance forward 30+ years and report the probability that the household runs out of money. The output is consumed by financial planners, by robo-advisors, and increasingly by AI agents that give planning advice directly to consumers.
The model has high stakes. A 95% success probability that should have been 75% sells households a retirement they can't afford. A 75% probability that should have been 90% pushes households into unnecessary frugality during years they could have enjoyed. The error bars on the model output have direct welfare consequences for real people, and the model is the load-bearing piece of every retirement-planning product on the market.
Failure 1: independent normal returns
The default assumption in numpy.random.normal and its cousins is independent and identically distributed (IID) returns drawn from a normal distribution. Real equity returns are neither independent nor normal.
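For reference, here is a minimal sketch of the textbook simulator the rest of this article argues against. It assumes real annual returns drawn IID from one normal distribution, a flat real withdrawal taken at the start of each year, and 10,000 paths summarized as a success probability plus 10th/50th/90th percentile fan-chart bands. It uses NumPy's `Generator.normal`, the modern equivalent of `numpy.random.normal`. The function name `iid_normal_success` is hypothetical.

```python
import numpy as np


def iid_normal_success(
    start_balance: float,
    withdrawal: float,
    mu: float,
    sigma: float,
    years: int = 30,
    n_paths: int = 10_000,
    seed: int = 0,
) -> tuple[float, np.ndarray]:
    """Textbook retirement Monte Carlo with IID normal real annual returns.

    Returns the share of paths that never deplete and the 10th/50th/90th
    percentile balance at each year (the fan-chart bands).
    """
    rng = np.random.default_rng(seed)
    # Every year on every path is an independent draw from one fixed normal distribution.
    returns = rng.normal(mu, sigma, size=(n_paths, years))
    balances = np.empty((n_paths, years + 1))
    balances[:, 0] = start_balance
    for t in range(years):
        # Flat real withdrawal at the start of the year, then that year's return.
        after_draw = np.maximum(balances[:, t] - withdrawal, 0.0)
        balances[:, t + 1] = after_draw * np.maximum(1.0 + returns[:, t], 0.0)
    depleted = (balances[:, 1:] <= 0.0).any(axis=1)
    bands = np.percentile(balances, [10, 50, 90], axis=0)
    return float(1.0 - depleted.mean()), bands
```

A call such as `iid_normal_success(1_000_000, 40_000, 0.05, 0.15)` yields the single success probability and smooth bands described above. Nothing in it can produce clustered bad years, fat tails, or a regime change.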
Equity returns exhibit volatility clustering: high-volatility periods tend to be followed by high-volatility periods, and low by low. The unconditional distribution looks roughly bell-shaped, but the distribution conditional on high recent realized volatility is much wider. A simulator that ignores volatility clustering systematically underestimates the probability of several bad years in a row, and several bad years in a row is exactly the failure mode that breaks retirement plans.
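One standard way to generate volatility clustering is a GARCH(1,1) variance process. The article does not name a specific model, so treat this as one illustrative choice. In this sketch each month's variance depends on the previous month's squared shock. The parameters `alpha=0.1` and `beta=0.8` and the 4.5% monthly long-run volatility are hypothetical, and the shocks are zero-mean, with any expected return added separately. The diagnostic is the lag-1 autocorrelation of squared returns: IID draws show roughly zero, while a clustered series shows a clearly positive value.

```python
import numpy as np


def simulate_garch11(n_paths, n_months, omega, alpha, beta, seed=0):
    """Zero-mean monthly return shocks with GARCH(1,1) conditional variance.

    sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1.0:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    rng = np.random.default_rng(seed)
    # Start every path at the long-run (unconditional) variance.
    sigma2 = np.full(n_paths, omega / (1.0 - alpha - beta))
    out = np.empty((n_paths, n_months))
    for t in range(n_months):
        r = np.sqrt(sigma2) * rng.standard_normal(n_paths)
        out[:, t] = r
        # A large shock this month raises next month's variance: volatility clustering.
        sigma2 = omega + alpha * r**2 + beta * sigma2
    return out


def lag1_autocorr_of_squares(returns):
    """Correlation between squared returns in consecutive months, pooled across paths."""
    sq = returns**2
    return float(np.corrcoef(sq[:, :-1].ravel(), sq[:, 1:].ravel())[0, 1])
```

Setting `omega = 0.045**2 * (1 - alpha - beta)` keeps the long-run monthly volatility at 4.5%. That makes the GARCH series and an IID normal series with `sigma=0.045` comparable on unconditional variance while differing sharply in how bad months bunch together.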
Real equity returns also have fat tails. The empirical distribution of monthly S&P 500 returns over the last 80 years has kurtosis around 8–10, versus 3 for a normal distribution. Moves of five standard deviations or more, which a normal model says should essentially never happen, recur throughout the daily return history. The 1987 crash, the 2008 financial crisis, the 2020 COVID crash, and the 2022 tech selloff are all real events to which the IID-normal model would assign essentially zero probability.
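A Student-t distribution offers a simple way to hit a target kurtosis. For ν > 4 degrees of freedom its kurtosis is 3 + 6/(ν − 4), so solving for ν gives ν = 4 + 6/(κ − 3). A kurtosis of 9 maps to ν = 5, and the 8–10 range maps to roughly ν ≈ 4.9 to 5.2. The sketch below assumes monthly returns with a hypothetical 4.5% volatility. It rescales the t draws so `sigma` keeps its normal-model meaning, then compares how often a move beyond five standard deviations occurs against the normal-theory probability. The helper names are illustrative.

```python
import math

import numpy as np


def t_dof_for_kurtosis(kurtosis: float) -> float:
    """Student-t degrees of freedom nu whose kurtosis 3 + 6 / (nu - 4) equals the target."""
    if kurtosis <= 3.0:
        raise ValueError("a Student-t always has kurtosis above 3")
    return 4.0 + 6.0 / (kurtosis - 3.0)


def scaled_t_returns(mu, sigma, nu, size, rng):
    """Student-t draws shifted and rescaled to mean mu and standard deviation sigma (nu > 2)."""
    # A standard t with nu degrees of freedom has variance nu / (nu - 2); divide that out
    # so sigma keeps the same meaning it has in the normal model.
    return mu + sigma * math.sqrt((nu - 2.0) / nu) * rng.standard_t(nu, size=size)


def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))
```

For example, `t_dof_for_kurtosis(9.0)` returns `5.0`. Drawing two million months with `scaled_t_returns(0.0, 0.045, 5.0, 2_000_000, np.random.default_rng(0))` produces five-sigma exceedances orders of magnitude more often than `normal_two_sided_tail(5.0)`, which is below one in a million.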
Failure 2: stationary distributions
The standard simulator assumes return distributions are stationary: the parameters of the return process do not change over the 30-year projection horizon. This is wildly inconsistent with both economic theory and historical experience. Inflation regimes change. Interest-rate regimes change. Equity-risk-premium regimes change. The 1970s were not the 1990s and were not the 2010s. See estate planning 2026 sunset modeling for a related regime-shift problem.
A stationary simulator picking parameters from one regime will project that regime forward for 30 years. If you calibrated to 2009–2021 (a low-inflation, low-rate, high-equity-return regime), you produced retirement plans that look spectacular and break in any other regime. If you calibrated to 1970–1985 (a high-inflation, high-rate, mediocre-equity regime), you produced retirement plans that are absurdly conservative for any subsequent regime.
The fix is regime-switching simulation: sample regime states (low/high inflation, low/high rates, low/high equity premium) from a Markov process whose transition matrix is calibrated to historical regime persistence. Within each regime, sample returns from the regime-conditional distribution. The simulator output is no longer a single fan chart — it is a fan chart over regime paths, which is dramatically wider than the stationary version.
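The regime-switching approach can be sketched in a few dozen lines. This version assumes two regimes: calm with low inflation, and stressed with high inflation in a 1970s-style pattern. Every number in the calibration below, including the transition matrix, the regime means, the volatilities, and `T_DOF`, is hypothetical and not fitted to any data. Each path walks a Markov chain, and equity returns and inflation are then drawn from that year's regime. Equity shocks are unit-variance Student-t. In this simplified sketch, inflation and equity co-vary only through the shared regime; a fuller model would also correlate the within-regime shocks.

```python
import numpy as np

# Hypothetical two-regime calibration, for illustration only (annual, nominal).
# Regime 0: calm, low inflation. Regime 1: stressed, high inflation.
TRANSITION = np.array([
    [0.90, 0.10],  # from regime 0: stay, switch
    [0.25, 0.75],  # from regime 1: switch back, stay
])
EQUITY_MU = np.array([0.09, 0.05])
EQUITY_SIGMA = np.array([0.14, 0.24])
INFLATION_MU = np.array([0.02, 0.06])
INFLATION_SIGMA = np.array([0.01, 0.025])
T_DOF = 5.0  # fat-tailed within-regime equity shocks

if not np.allclose(TRANSITION.sum(axis=1), 1.0):
    raise ValueError("each transition-matrix row must sum to 1")


def simulate_regimes(n_paths, years, start_regime, rng):
    """Sample Markov-chain regime paths from TRANSITION."""
    n_states = TRANSITION.shape[0]
    regimes = np.empty((n_paths, years), dtype=int)
    current = np.full(n_paths, start_regime, dtype=int)
    for t in range(years):
        regimes[:, t] = current
        # Inverse-CDF draw of the next state from each path's row of the transition matrix.
        cum = np.cumsum(TRANSITION[current], axis=1)
        u = rng.random((n_paths, 1))
        current = np.minimum((u > cum).sum(axis=1), n_states - 1)
    return regimes


def simulate_regime_returns(n_paths, years, start_regime=0, seed=0):
    """Regime paths plus regime-conditional equity returns and inflation."""
    rng = np.random.default_rng(seed)
    regimes = simulate_regimes(n_paths, years, start_regime, rng)
    # Unit-variance Student-t shocks, so EQUITY_SIGMA is the within-regime standard deviation.
    shocks = rng.standard_t(T_DOF, size=(n_paths, years)) * np.sqrt((T_DOF - 2.0) / T_DOF)
    equity = EQUITY_MU[regimes] + EQUITY_SIGMA[regimes] * shocks
    inflation = INFLATION_MU[regimes] + INFLATION_SIGMA[regimes] * rng.standard_normal((n_paths, years))
    return regimes, equity, inflation
```

With this matrix, the chain spends about 0.10 / (0.10 + 0.25) ≈ 29% of years in the stressed regime in the long run, and stressed spells last four years on average (1 / 0.25). That persistence is what a stationary simulator cannot express. Passing `start_regime=1` is one way to force a "retire into a bad regime" scenario.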
| | Stationary IID-normal | Regime-switching with fat tails |
|---|---|---|
| Width of 30-year terminal distribution | Narrow — confidence is fictional | Wide — confidence reflects real ambiguity |
| Sequence-of-returns sensitivity | Underweights early bad years | Captures the real impact of early-retirement crashes |
| Inflation modeling | Ignored or constant | Co-varies with rate and equity regimes |
| Computational cost | Trivial | 10–50× more expensive but fits in a single API call |
| Validation difficulty | Low — math is closed-form | Higher — requires regime-classification validation |
Failure 3: clean withdrawals
The standard Monte Carlo treats withdrawals as a constant inflation-adjusted amount drawn at the start of each year. Real retirement withdrawals are nothing like this.
Real retirees have lumpy expense profiles: property tax in Q1, estimated tax payments in April, June, September, and January, RMDs due by year-end, healthcare bills concentrated in late life, and long-term-care expenses concentrated in the last 1–3 years. Real retirees also adjust withdrawals to market conditions: guard-rails strategies, dynamic spending rules, and the well-documented behavioral pattern of households reflexively cutting spending after a market drop. A simulator that ignores within-year timing produces plans that look fine on paper but fail under within-year cash crunches; a simulator that ignores adaptive spending overstates portfolio depletion in bad scenarios.
$$
W_t = \text{base}_t \times \text{inflation\_adj}_t \times \text{guard\_rail}\left(\frac{\text{portfolio}_t}{\text{target}_t}\right) + \text{lumpy}_t + \text{tax}_t
$$

where:

- base_t = initial withdrawal rate × starting balance (the 4%-rule baseline)
- inflation_adj_t = cumulative inflation since retirement, regime-conditional
- guard_rail = spending adjustment based on the portfolio-vs-target ratio (0.85–1.15 typical band)
- lumpy_t = property tax, healthcare deductible, LTC expense, concentrated by month
- tax_t = federal + state taxes on RMDs, conversions, and withdrawals; varies by year
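The withdrawal formula translates directly into code. The article gives only a 0.85–1.15 band for the guard rail, so the specific rule here is an assumption: it scales spending by the portfolio-to-target ratio and clamps the result to that band. The `min_cash_balance` helper, also hypothetical, illustrates the within-year timing problem. It walks month by month and reports the lowest cash balance, so a year that balances on an annual basis can still show a negative month.

```python
def guard_rail(ratio: float, floor: float = 0.85, cap: float = 1.15) -> float:
    """Hypothetical guard-rail rule: scale spending with portfolio/target, clamped to the band."""
    return min(max(ratio, floor), cap)


def annual_withdrawal(base: float, inflation_adj: float, portfolio: float,
                      target: float, lumpy: float, tax: float) -> float:
    """W_t = base_t * inflation_adj_t * guard_rail(portfolio_t / target_t) + lumpy_t + tax_t."""
    return base * inflation_adj * guard_rail(portfolio / target) + lumpy + tax


def min_cash_balance(opening_cash: float, monthly_inflows, monthly_outflows) -> float:
    """Lowest month-end cash balance over the year; below zero means a within-year crunch."""
    if len(monthly_inflows) != len(monthly_outflows):
        raise ValueError("inflow and outflow schedules must have the same length")
    cash = lowest = opening_cash
    for inflow, outflow in zip(monthly_inflows, monthly_outflows):
        cash += inflow - outflow
        lowest = min(lowest, cash)
    return lowest
```

As an illustration with made-up numbers, a household with $10,000 of opening cash, $4,000 of monthly inflows, and $3,000 of regular monthly spending breaks even over the year even with a $12,000 January property-tax bill. Yet `min_cash_balance(10_000, [4_000] * 12, [15_000] + [3_000] * 11)` returns `-1000`: January is a cash crunch that an annual model never sees.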
Failure 4: probability of success as the wrong metric
The headline output of retirement Monte Carlo is usually "probability of success" — the percentage of paths in which the household never runs out of money. This is the wrong metric for almost every real decision.
A 90% success probability means 10% of paths end in failure, but it does not say how they fail. A path that runs out at age 92 is a different failure than one that runs out at age 75; a path where the household ate cat food for the last decade is a different failure than one where they spent down to a normal level and then needed minor lifestyle adjustments. The probability metric collapses all these into a single binary.
The metrics that actually drive sensible decisions are conditional ones: median real consumption in the worst-decile path; probability of a >20% real spending cut at any age; expected age at portfolio depletion conditional on depletion occurring. These metrics are not harder to compute — they are computed from the same simulation paths — but they are not the default in standard libraries, so most products don't show them.
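The conditional metrics can be computed from the same path arrays that produce the success probability. The article does not pin down exact definitions, so this sketch makes three labeled assumptions:

- "Worst-decile real consumption" is the 10th percentile, across paths, of each path's median annual real spending.
- A "large real cut" is spending falling more than 20% below its own running peak at any age.
- Depletion age is the age in the first year the end-of-year balance reaches zero.

The function name and dictionary keys are hypothetical.

```python
import numpy as np


def conditional_metrics(real_consumption, end_balances, start_age, cut_threshold=0.20):
    """Decision-relevant summaries computed from the same simulated paths.

    real_consumption, end_balances: arrays of shape (n_paths, years); year t is age start_age + t.
    """
    consumption = np.asarray(real_consumption, dtype=float)
    balances = np.asarray(end_balances, dtype=float)

    depleted_mask = balances <= 0.0
    depleted = depleted_mask.any(axis=1)
    first_depleted_year = depleted_mask.argmax(axis=1)  # only meaningful where depleted

    # Worst-decile consumption: 10th percentile across paths of each path's median real spending.
    worst_decile = float(np.percentile(np.median(consumption, axis=1), 10))

    # Large real cut: spending falls more than cut_threshold below its running peak at any age.
    peak = np.maximum.accumulate(consumption, axis=1)
    ratio = np.divide(consumption, peak, out=np.ones_like(consumption), where=peak > 0)
    prob_large_cut = float(((1.0 - ratio) > cut_threshold).any(axis=1).mean())

    depletion_age = (
        float((start_age + first_depleted_year[depleted]).mean()) if depleted.any() else float("nan")
    )
    return {
        "prob_success": float(1.0 - depleted.mean()),
        "worst_decile_median_consumption": worst_decile,
        "prob_large_real_cut": prob_large_cut,
        "expected_depletion_age_if_depleted": depletion_age,
    }
```

Because every metric is a reduction over arrays the simulator already holds, reporting them alongside the success probability costs almost nothing computationally.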
What a production-grade simulator does
A production-grade retirement Monte Carlo, in our view as of 2026, has the following non-negotiable properties:
- Property 1: Regime-switching return model. Markov-chain regime states with a transition matrix calibrated to long-horizon historical regime persistence. At minimum, a low-vol vs. high-vol equity regime plus a low-rate vs. high-rate regime.
- Property 2: Fat-tailed conditional distributions. Within-regime returns from a Student-t or skew-t distribution, not Gaussian. Tail parameters calibrated to historical empirical kurtosis.
- Property 3: Co-varying inflation. Inflation drawn jointly with returns and rates, not independently. The 1970s pattern (high inflation + mediocre real equity returns) must be reachable in simulation.
- Property 4: Adaptive withdrawal model. Guard-rail spending rule + lumpy expenses + explicit tax modeling. Pure-flat withdrawals are for spec demos, not for real product output.
- Property 5: Decision-relevant output. Conditional metrics on top of probability of success: worst-decile real consumption, probability of large real cuts, expected age at depletion. The dashboard tells the planner what they need to know.
- Property 6: Sensitivity transparency. User-controllable inputs explicitly call out their effect on the output. A 10 bp change in the equity premium assumption shifts success probability by N points; the user sees this.
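For Property 6, one inexpensive way to report "a 10 bp change shifts success probability by N points" is a finite difference with common random numbers. The idea is to rerun the simulator on the same shock array with the input bumped, so that the difference reflects the assumption change rather than fresh sampling noise. The sketch uses a deliberately simple IID simulator to stay short; the same pattern applies to a regime-switching one. The defaults (risk-free rate, volatility, balance, withdrawal) and function names are hypothetical.

```python
import numpy as np


def success_probability(equity_premium, shocks, start_balance=1_000_000.0,
                        withdrawal=40_000.0, risk_free=0.01, sigma=0.16):
    """Share of paths that never run out, for a fixed (n_paths, years) array of standard shocks."""
    returns = risk_free + equity_premium + sigma * shocks
    balance = np.full(shocks.shape[0], start_balance)
    alive = np.ones(shocks.shape[0], dtype=bool)
    for t in range(shocks.shape[1]):
        balance = balance - withdrawal
        alive &= balance > 0.0
        balance = np.maximum(balance, 0.0) * np.maximum(1.0 + returns[:, t], 0.0)
    return float(alive.mean())


def premium_sensitivity(base_premium, bump=0.001, n_paths=20_000, years=30, seed=0):
    """Success-probability change, in percentage points, for a `bump` change in the premium.

    Both runs reuse the same shocks (common random numbers), so the difference reflects the
    assumption change rather than fresh sampling noise.
    """
    shocks = np.random.default_rng(seed).standard_normal((n_paths, years))
    return 100.0 * (success_probability(base_premium + bump, shocks)
                    - success_probability(base_premium, shocks))
```

With shared shocks, raising the premium raises every path's return in every year, so the reported sensitivity can never come out negative from noise alone. Two independent runs offer no such guarantee.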
The cost of doing this right is roughly an order of magnitude more engineering than the IID-normal version — but the IID-normal version is producing wrong answers. The right comparison is not "simple vs complex" but "calibrated vs uncalibrated."
Why this connects to test data
A wealth-tech product running a retirement Monte Carlo needs test data that exercises the simulator's edge cases. A test corpus of mid-career households who all retire at 65 with conventional asset allocations will pass any simulator's smoke test, and the simulator that breaks in production is the one tested only against this corpus. See longitudinal synthetic financial data design for the time-series structure that exposes these bugs.
The corpus that catches simulator bugs has explicit edge cases: households retiring into a crash year, households with concentrated equity positions, households with significant Roth conversion windows, households with lumpy late-life expenses, households facing IRMAA bracket transitions during their projection. Add HSA-as-stealth-retirement households to that list — see HSA investment and triple-tax-advantage modeling for the account class most simulators routinely fold into "checking" and miss. A simulator validated only on a 65-year-old retiring with 60/40 in two accounts has approximately zero coverage of the cases that actually break in production; the corpus has to carry the regime states, the withdrawal rules, and the edge demographics that exercise every branch the simulator can take.
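One lightweight way to keep such a corpus honest is to tag each synthetic household with the edge cases it exercises and to fail the test suite when a required tag has no coverage. The tag names, the `HouseholdCase` fields (including `start_regime`), and the sample households below are all hypothetical. They mirror the edge cases listed above and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HouseholdCase:
    """One synthetic test household; `tags` name the simulator branches it should exercise."""
    name: str
    retirement_age: int
    start_regime: int = 0  # hypothetical: 1 = retire into a stressed/crash regime
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical tag names for the edge cases listed in the text.
REQUIRED_TAGS = frozenset({
    "retire_into_crash",
    "concentrated_equity",
    "roth_conversion_window",
    "lumpy_late_life_expense",
    "irmaa_bracket_transition",
    "hsa_as_retirement_account",
})

CORPUS = [
    HouseholdCase("baseline_65_60_40", 65),
    HouseholdCase("crash_year_retiree", 63, start_regime=1, tags=frozenset({"retire_into_crash"})),
    HouseholdCase("single_stock_exec", 60, tags=frozenset({"concentrated_equity"})),
    HouseholdCase("early_retiree_roth_ladder", 55,
                  tags=frozenset({"roth_conversion_window", "irmaa_bracket_transition"})),
    HouseholdCase("ltc_late_life", 67, tags=frozenset({"lumpy_late_life_expense"})),
    HouseholdCase("hsa_saver", 64, tags=frozenset({"hsa_as_retirement_account"})),
]


def missing_coverage(corpus, required=REQUIRED_TAGS):
    """Required edge-case tags that no household in the corpus exercises."""
    covered = frozenset().union(*(case.tags for case in corpus))
    return required - covered
```

A corpus made only of the 65-year-old 60/40 baseline would report every required tag as missing. That is the "approximately zero coverage" problem made visible in CI.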
Key takeaways
- Standard Monte Carlo libraries assume IID-normal returns. Real returns are neither independent nor normal — fat tails and volatility clustering matter for retirement projection.
- Stationary distributions assume the next 30 years look like the last 12. They will not. Regime-switching is mandatory for any honest 30-year projection.
- Withdrawals are lumpy, tax-aware, and adaptive. Constant inflation-adjusted withdrawals are a simulation convenience, not a model of real households.
- Probability of success is the wrong headline metric. Conditional metrics — worst-decile real consumption, probability of large real cuts — drive better decisions.
- A production-grade simulator costs roughly 10× the engineering of the textbook version. The textbook version is producing wrong answers; the cost is real and worth it.
Frequently asked questions
Doesn't bootstrapping historical returns avoid the fat-tail problem?
How important is correlation between asset classes?
What about Social Security and annuity income? Are those handled the same way? See [SECURE 2.0 RMD engineering](/articles/rmd-secure-act-2-engineering-rebuild) and [Social Security claiming optimization](/articles/social-security-claiming-optimization) for the underlying mechanics.
How do we test our retirement Monte Carlo against synthetic data?
For the methodology of the regime-switching returns generation referenced above, see Generating synthetic historical returns: random walk, regime-based, replay — and the methodology comparison Synthetic time series vs. historical replay for when each approach belongs in a Monte Carlo workflow. The umbrella view of the time-series concerns this article touches is at Time-Series Fidelity in Synthetic Wealth Data.