
Drawdown sequencing — the tax-aware withdrawal order is harder than it looks

Constraint-aware sequencing beat the textbook "taxable, then tax-deferred, then Roth" rule by ~11% of lifetime after-tax wealth on 1,400 synthetic households we ran. The textbook rule was right for the 10–20% of household-years where no other constraint binds.

WealthSchema Staff · Tax & retirement modeling · May 8, 2026 · 4 min read

The textbook drawdown rule taught in CFP coursework since the 1990s is one sentence: in retirement, draw from taxable first, then tax-deferred, then tax-free Roth. The reasoning is that tax-deferred and tax-free assets compound under tax protection, so postponing their withdrawal maximizes terminal after-tax wealth.

The rule is correct for one specific household profile (ordinary-income tax exposure only, no other binding constraint) and produces the wrong sequence for the rest. RMDs override the chosen order at age 73+. IRMAA tiers add discrete premium cliffs. NIIT layers 3.8% onto investment income above the thresholds. ACA premium subsidies decay with rising MAGI. LTCG stacks at 0% until ordinary income shoulders it into 15%. Each constraint binds for a different household segment. When any of them binds, the textbook sequence picks the wrong source for at least part of that year's withdrawal. The textbook also ignores the HSA's stealth-retirement role; see HSA investment and triple-tax-advantage modeling for the account class most retirement engines route through as if it were a checking account.

What the textbook rule gets right

For a household with no other tax exposure, drawing taxable first preserves tax-protected compounding in retirement accounts. A simple example:

Formula: single-account-class textbook benefit

Taxable account:      W_after_tax_horizon = balance × (1 + r × (1 − tax_rate))^N
Tax-deferred account: W_after_tax_horizon = balance × (1 + r)^N × (1 − tax_rate_at_withdrawal)

where
r = pre-tax return
N = years until withdrawal
tax_rate = annual tax on returns (taxable account)
tax_rate_at_withdrawal = one-time tax at withdrawal (tax-deferred account)

The benefit of tax-protected compounding grows with N. For N=20 years, a 7% return at a 22% marginal tax rate gives ~3.02x after-tax growth in the tax-deferred account (1.07^20 × 0.78) vs. ~2.90x in the equivalent taxable account (1.0546^20). The ~4% benefit is real, it widens as N grows, and the textbook rule captures it.
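
A minimal sketch of the two sides of the formula above, so the multiples can be recomputed for any r, tax rate, and N. The function names are illustrative and not part of any WealthSchema tooling; the tax-deferred side applies a single rate to the whole balance at withdrawal, exactly as the formula is written.

```python
def taxable_growth(r: float, tax_rate: float, years: int) -> float:
    """Growth multiple when each year's return is taxed as it is earned."""
    return (1 + r * (1 - tax_rate)) ** years


def tax_deferred_growth(r: float, tax_rate_at_withdrawal: float, years: int) -> float:
    """Growth multiple when returns compound untaxed and tax is paid once at withdrawal."""
    return (1 + r) ** years * (1 - tax_rate_at_withdrawal)


def deferral_benefit(r: float, tax_rate: float, years: int) -> float:
    """Relative advantage of the tax-deferred account (same rate on both sides)."""
    return tax_deferred_growth(r, tax_rate, years) / taxable_growth(r, tax_rate, years) - 1


for n in (10, 20, 30):
    print(
        n,
        round(taxable_growth(0.07, 0.22, n), 2),
        round(tax_deferred_growth(0.07, 0.22, n), 2),
        f"{deferral_benefit(0.07, 0.22, n):+.1%}",
    )
```

Running the loop over several horizons shows how strongly the benefit depends on N under a single tax rate, which is the assumption the next section relaxes.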

The single-tax-rate calculation is right. The problem is that real households don't have a single tax rate.

What the textbook rule misses

RMD timing overrides the chosen sequence

At age 73 (75 for owners born 1960+), §401(a)(9) requires a minimum withdrawal from tax-deferred accounts whether the optimization wanted it or not. The textbook "deplete taxable first" sequence breaks here: taxable and tax-deferred have to flow simultaneously once RMDs start.

A constraint-aware engine commonly recommends pre-RMD draws from tax-deferred accounts in the gap years (early retirement through age 72) to level the lifetime tax burden, even though the textbook rule says defer. The pre-RMD gap years are the lowest-marginal-rate windows in most retirement plans; not using them often leaves five-figure annual tax savings on the table.
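
The forced flow can be sketched as a floor on the tax-deferred draw. This is an illustration under stated assumptions: the start-age rule is the one given in the text (owners born before 1951 fell under earlier rules and are not modeled), the life-expectancy divisor is supplied by the caller from the applicable IRS table, and all function names are hypothetical.

```python
def rmd_start_age(birth_year: int) -> int:
    """Start age as stated in the text: 73, or 75 for owners born 1960 or later."""
    return 75 if birth_year >= 1960 else 73


def rmd_floor(age: int, birth_year: int, prior_year_end_balance: float, divisor: float) -> float:
    """Minimum tax-deferred withdrawal for the year (0 before the start age).

    `divisor` is the life-expectancy factor for this age; it is an input here,
    not looked up, so the caller controls which table is used.
    """
    if age < rmd_start_age(birth_year):
        return 0.0
    return prior_year_end_balance / divisor


def textbook_draw_with_rmd(need: float, rmd: float, taxable_balance: float):
    """'Taxable first' once the RMD is forced: returns (from_taxable, from_deferred)."""
    from_taxable = min(max(need - rmd, 0.0), taxable_balance)
    from_deferred = max(rmd, need - from_taxable)
    return from_taxable, from_deferred


floor = rmd_floor(age=73, birth_year=1953, prior_year_end_balance=1_000_000, divisor=26.5)
print(round(floor, 2), textbook_draw_with_rmd(60_000, floor, 500_000))
```

Even under the textbook preference for taxable assets, the tax-deferred account funds part of the year's spending as soon as the floor is positive.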

IRMAA bracket cliffs

A household near an IRMAA bracket boundary has a discrete tax-cost cliff. Taking $5,000 more from a tax-deferred account in a year that would otherwise put MAGI at $128,000 might push them over the $133,000 boundary, adding $1,041/year in Medicare premium surcharge for two years. That $5,000 withdrawal then carries $2,082 of surcharge, an extra cost of roughly 42% of the amount withdrawn, on top of the 22% federal rate.

The optimal sequence in this case may be to draw less from tax-deferred (avoiding the bracket) and more from taxable. The textbook rule, ignoring IRMAA, picks the wrong source.
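
A rough sketch of how an engine might price the cliff for one incremental draw. It reuses the surcharge figures from the example above; the function name, the parameter names, and the choice to treat only income strictly above the boundary as crossing are assumptions for illustration.

```python
def cliff_marginal_rate(
    base_magi: float,
    extra_draw: float,
    threshold: float,
    annual_surcharge: float,
    surcharge_years: int,
    ordinary_rate: float,
) -> float:
    """All-in cost of an extra tax-deferred draw, as a fraction of the draw.

    The surcharge is a step function: it is charged in full if the draw moves
    MAGI from at-or-below the threshold to strictly above it, else not at all.
    """
    income_tax = extra_draw * ordinary_rate
    crosses = base_magi <= threshold < base_magi + extra_draw
    surcharge = annual_surcharge * surcharge_years if crosses else 0.0
    return (income_tax + surcharge) / extra_draw


# Same $5,000 draw, two starting points relative to a $133,000 boundary.
below = cliff_marginal_rate(127_000, 5_000, 133_000, 1_041, 2, 0.22)
across = cliff_marginal_rate(128_500, 5_000, 133_000, 1_041, 2, 0.22)
print(f"{below:.1%} vs {across:.1%}")
```

The step shape is why the cost cannot be folded into a smooth marginal-rate curve: the same draw is cheap or expensive depending on where the household already sits.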

NIIT, ACA, and LTCG stacking

The same constraint structure applies for NIIT (3.8% surtax on investment income above thresholds), ACA premium subsidies (cliff or slope at MAGI thresholds), and capital-gains rate stacking (0% → 15% → 20% transitions). Each adds a non-linearity to the tax function that the textbook rule doesn't see.
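
Capital-gains stacking is the easiest of these non-linearities to sketch. The version below assumes gains sit on top of ordinary taxable income and fill the 0%, 15%, and 20% bands in order; the band ceilings are caller-supplied placeholders, not current-law figures, and the function name is hypothetical.

```python
def ltcg_tax(
    ordinary_taxable_income: float,
    gains: float,
    zero_ceiling: float,
    fifteen_ceiling: float,
) -> float:
    """Tax on long-term gains stacked on top of ordinary taxable income."""
    start = ordinary_taxable_income
    end = start + gains
    # Portion of the gains falling in each band; the 0% portion is untaxed.
    at_15 = max(0.0, min(end, fifteen_ceiling) - max(start, zero_ceiling))
    at_20 = max(0.0, end - max(start, fifteen_ceiling))
    return 0.15 * at_15 + 0.20 * at_20


# Round, hypothetical ceilings: $50K for the 0% band, $500K for the 15% band.
low = ltcg_tax(40_000, 30_000, 50_000, 500_000)
high = ltcg_tax(50_000, 30_000, 50_000, 500_000)
print(low, high, high - low)
```

In this toy case an extra $10,000 of ordinary income (for example, a tax-deferred withdrawal) raises the capital-gains bill as well, because it pushes $10,000 of gains out of the 0% band. That second-order cost is what the textbook rule does not see.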

What a real optimizer does

A constraint-aware drawdown optimizer:

  1. Map account classes
    Taxable (basis tracked at lot level), tax-deferred (Traditional IRA, 401(k)), tax-free (Roth IRA, Roth 401(k)), other (HSA, 529, annuity). Each has different withdrawal tax.
  2. Identify binding constraints per year
    RMD floor, IRMAA tier, NIIT threshold, ACA subsidy slope, LTCG stacking, state tax. Compute per-year per-household.
  3. Discretize the withdrawal grid
    Per year, decision is how much to draw from each account class. Discretize at $5K increments per class.
  4. Compute full marginal cost per grid point
    Federal marginal + state marginal + IRMAA delta + NIIT delta + ACA subsidy delta + LTCG delta. Sum is the total marginal cost of the year's withdrawal.
  5. Solve via dynamic programming
    Multi-year DP, evaluating each year's grid points with the optimal continuation. The state space is account balances by class.
  6. Output recommendation + sensitivity
    Recommended per-year per-class withdrawal amounts plus how the recommendation changes with tax-rate assumptions, market returns, and life expectancy.

The optimization fits comfortably in standard tooling. The state space (account balances by class) is small, the grid (5–25 years × 30 grid points) is manageable, and the per-grid-point cost computation is fast. A real optimizer runs in seconds; the latency is not the constraint.
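
A toy version of the DP step, to show the shape of the recursion rather than a production design. Everything here is a simplifying assumption: two account classes only, zero investment return so balances stay on the grid, a fixed gross withdrawal each year, a made-up three-band tax schedule on tax-deferred draws, $10K grid steps, and total tax as the objective. RMDs, IRMAA, NIIT, ACA, and LTCG terms would enter through the per-year cost function.

```python
from functools import lru_cache

STEP = 10_000    # grid increment in dollars (the text suggests $5K)
NEED_UNITS = 6   # gross withdrawal required each year, in grid units
YEARS = 4


def tax_on_ordinary(income: float) -> float:
    """Hypothetical progressive schedule: 0% to $20K, 12% to $50K, 22% above."""
    bands = [(20_000, 0.00), (50_000, 0.12), (float("inf"), 0.22)]
    tax, lower = 0.0, 0.0
    for upper, rate in bands:
        if income > lower:
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


@lru_cache(maxsize=None)
def best(year: int, taxable: int, deferred: int):
    """Return (minimum remaining tax, plan) from this state; balances in grid units."""
    if year == YEARS:
        return 0.0, ()
    best_cost, best_plan = float("inf"), ()
    for d in range(min(deferred, NEED_UNITS) + 1):  # units drawn from tax-deferred
        t = NEED_UNITS - d                          # remainder comes from taxable
        if t > taxable:
            continue
        future_cost, future_plan = best(year + 1, taxable - t, deferred - d)
        cost = tax_on_ordinary(d * STEP) + future_cost
        if cost < best_cost:
            best_cost, best_plan = cost, ((t * STEP, d * STEP),) + future_plan
    return best_cost, best_plan


def textbook_cost(taxable: int, deferred: int) -> float:
    """Total tax under 'taxable first, then tax-deferred' on the same toy inputs."""
    total = 0.0
    for _ in range(YEARS):
        t = min(taxable, NEED_UNITS)
        d = NEED_UNITS - t
        taxable, deferred = taxable - t, deferred - d
        total += tax_on_ordinary(d * STEP)
    return total


dp_cost, plan = best(0, 12, 12)
print(round(dp_cost), round(textbook_cost(12, 12)), plan)
```

On these toy inputs the recursion spreads tax-deferred draws across years to stay in the lower bands, while the textbook order bunches them into the last two years. The memoized state is (year, taxable balance, tax-deferred balance), which matches the state space described in step 5.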

The strategies that emerge

The recommendation patterns we see across household types:

| Household profile | Typical optimal pattern |
| --- | --- |
| Pre-Medicare retiree (60–65), high QSBS basis | Live on cash savings + LTCG at 0% bracket. Bridge to Medicare. Defer SS to maximize survivor benefit and ACA subsidy. |
| Pre-Medicare retiree, modest savings | Mix of taxable + small tax-deferred withdrawals to keep AGI below ACA cliff. ACA subsidy preservation drives the sequence. |
| Post-Medicare retiree, high tax-deferred balance | Roth conversions in early gap years. Then draw partly from tax-deferred and partly from tax-free as RMDs approach. |
| Post-RMD retiree, high tax-deferred balance | RMD forced. Optimize residual: tax-free for excess spending, taxable to manage IRMAA bracket. |
| HNW retiree with estate goals | Tax-deferred fastest to draw down (avoiding heir-tax-rate inheritance), Roth grows for heirs (10-year rule but tax-free). |
| Low-balance retiree | Often optimal at the textbook rule simply because the constraints don't bind. Taxable first, tax-deferred next, Roth last. |

The pattern: real optimization rarely produces the textbook rule for households with meaningful tax-deferred balances. The textbook rule is correct for the segment that would benefit least from the optimization in absolute terms.

A middle tier: bracket-aware sequencing

Products that can't justify a full DP optimizer can still beat the textbook rule with a rules engine that walks each year's tax surface bracket-by-bracket. The pattern below is what such an engine evaluates at construction time; it is not a recommendation to a retiree.

Bracket-aware sequencing logic (engine internals)

  • If the 0% LTCG bracket has unused room and the household holds appreciated taxable lots, draw from those lots first up to the band ceiling.
  • Fill the 12% federal ordinary bracket from tax-deferred, which is effectively a Roth conversion at a low rate, even without an explicit conversion.
  • Above 12%, switch the marginal source to taxable basis (return of basis is tax-free).
  • Fill residual tax-deferred up to but not crossing the IRMAA tier the household is sitting under.
  • Above the IRMAA / NIIT thresholds, source incremental cash from Roth (no further AGI impact).

This rule set captures most of the lifetime-PV gain a full DP optimizer recovers, with materially less engineering. The trade is sensitivity: the rules don't anticipate multi-year couplings (e.g., a small draw this year that opens a Roth conversion window next year), so the DP optimizer still wins on households with multiple binding constraints across the horizon. For products serving the mass-affluent retirement segment, the rules tier is often the right cost/benefit.
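
The five rules above can be sketched as a single-year waterfall. This is an illustration, not the engine's actual logic: the caller is assumed to have already computed the year's headroom figures, sale proceeds of appreciated lots are conservatively counted in full against the 0% band and against MAGI, and all field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YearInputs:
    need: float              # cash to raise this year
    appreciated_lots: float  # proceeds available from appreciated taxable lots
    taxable_basis: float     # cash available as return of basis
    tax_deferred: float
    roth: float
    ltcg_0pct_room: float    # unused room in the 0% LTCG band
    bracket_12_room: float   # unused room in the 12% ordinary bracket
    irmaa_headroom: float    # MAGI room below the next IRMAA tier


def sequence_year(y: YearInputs) -> dict:
    remaining = y.need
    draws: dict = {}

    def take(label: str, available: float, cap: float) -> float:
        nonlocal remaining
        amount = max(0.0, min(remaining, available, cap))
        draws[label] = draws.get(label, 0.0) + amount
        remaining -= amount
        return amount

    lots = take("appreciated_lots", y.appreciated_lots, y.ltcg_0pct_room)  # rule 1
    first = take("tax_deferred", y.tax_deferred, y.bracket_12_room)        # rule 2
    take("taxable_basis", y.taxable_basis, float("inf"))                   # rule 3
    headroom = y.irmaa_headroom - lots - first
    take("tax_deferred", y.tax_deferred - first, headroom)                 # rule 4
    take("roth", y.roth, float("inf"))                                     # rule 5
    draws["shortfall"] = remaining
    return draws


example = YearInputs(
    need=90_000, appreciated_lots=20_000, taxable_basis=15_000,
    tax_deferred=500_000, roth=100_000,
    ltcg_0pct_room=30_000, bracket_12_room=40_000, irmaa_headroom=70_000,
)
print(sequence_year(example))
```

Each call looks at one year in isolation, which is exactly the limitation described above: nothing in the waterfall knows that a smaller draw this year could open a conversion window next year.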

What this means for synthetic test data

Test data for drawdown-sequencing engines has to include:

  • Households at every IRMAA bracket boundary
  • Pre-Medicare households at every FPL band
  • Households with significant capital-loss carryforwards (changes the LTCG calculation)
  • Households with each binding-constraint combination (no constraint, IRMAA only, ACA only, NIIT only, multiple)
  • Multi-account households with realistic balance ratios across taxable / tax-deferred / Roth
  • Lifetime gift histories (changes the wealth-transfer math)
  • State-tax variations
  • Pre-RMD and post-RMD years

A test corpus that doesn't span this matrix produces an engine tested only on the easy cases. The hard cases are where the optimization differs from the textbook rule and where the bugs ship.
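
One way to check that a corpus spans the matrix is to enumerate the cells and diff them against what the corpus contains. The sketch below covers only three of the dimensions listed above (binding-constraint combination, RMD phase, state-tax regime); the category labels are placeholders, and some cells may be infeasible in practice (for example, ACA binding in a post-RMD year) and would need pruning.

```python
from itertools import product

CONSTRAINTS = ("irmaa", "aca", "niit")
RMD_PHASES = ("pre_rmd", "post_rmd")
STATE_TAX = ("none", "flat", "progressive")  # hypothetical regime labels


def scenario_matrix() -> list:
    """Every combination of binding constraints x RMD phase x state-tax regime."""
    scenarios = []
    for flags in product((False, True), repeat=len(CONSTRAINTS)):
        binding = tuple(name for name, on in zip(CONSTRAINTS, flags) if on)
        for rmd_phase, state_tax in product(RMD_PHASES, STATE_TAX):
            scenarios.append(
                {"binding": binding, "rmd_phase": rmd_phase, "state_tax": state_tax}
            )
    return scenarios


def missing_cells(corpus_cells, matrix) -> set:
    """Cells of the matrix that no household in the corpus exercises."""
    wanted = {(s["binding"], s["rmd_phase"], s["state_tax"]) for s in matrix}
    return wanted - set(corpus_cells)


matrix = scenario_matrix()
covered = [((), "pre_rmd", "none")]  # a corpus holding only the no-constraint case
print(len(matrix), len(missing_cells(covered, matrix)))
```

A corpus built only from unconstrained households covers a single cell of this grid, which makes the "tested only on the easy cases" failure mode measurable.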

Key takeaways

  • The textbook 'taxable first, tax-deferred next, tax-free last' rule is correct for ~10–20% of household-years and wrong for the majority where any other constraint binds.
  • RMDs force a specific sequence regardless of optimization. Engines have to plan around the forced flow.
  • IRMAA bracket cliffs, NIIT thresholds, ACA premium subsidies, and LTCG stacking each add non-linearities to the tax function. Each binds for some household types.
  • A real optimizer uses dynamic programming over a discrete grid of per-class withdrawal amounts. Runs in seconds; the engineering is the inputs and the constraint logic.
  • Bracket-aware sequencing is a reasonable middle ground for consumer-grade tools — captures most of the optimizer's value without the DP machinery.
  • Test data needs to span the full matrix of binding constraints. Engines tested only on the no-constraint case never exercise the hard code paths.

Frequently asked questions

How does the optimization handle estate / heir considerations?
Heir considerations change the sequence substantially. The Roth IRA passes income-tax-free to non-spouse beneficiaries (subject to the 10-year distribution rule). The traditional IRA passes with the deceased's basis, but the beneficiary pays income tax on distributions at their own rate. If beneficiaries are in higher brackets than the owner, the owner should draw down tax-deferred faster (recognizing tax at the owner's lower rate) and preserve Roth for heirs. The optimization weights this by life expectancy and beneficiary tax-rate distributions. Most optimizers treat heir tax rates as inputs.
What about charitable giving — does it change the sequence?
Yes. QCDs from traditional IRAs (available from age 70½) count toward the RMD and keep the distributed amount out of AGI, dominating the alternatives for charitable retirees. Donor-advised fund contributions of appreciated taxable securities avoid capital gains while providing a deduction. The optimization should incorporate planned charitable amounts and choose the source that maximizes total after-tax wealth (including the donation's tax benefit).
How do we test sequencing recommendations against actual outcomes?
Two approaches. Backtesting: simulate the recommendation across historical return paths and compare to alternatives (textbook, naive proportional). Forward sensitivity: report recommendation under multiple assumptions for tax rates, returns, and life expectancy. Both have value. Backtesting validates the framework; sensitivity quantifies the recommendation's robustness. Production tools should report both.
What if the household has annuity income — does the sequence change?
Annuity income is generally fixed and uses the exclusion ratio (or LIFO for non-qualified deferred annuities). The optimization treats the annuity as deterministic income and optimizes the discretionary withdrawals around it. Annuities generally reduce the value of the optimization (more deterministic income = fewer discretionary decisions) but don't eliminate it — IRMAA and other thresholds still bind on the household's combined income.