Tax-loss harvesting (TLH) is the practice of selling securities at a loss to offset realized gains, then re-establishing exposure with a non-substantially-identical security. It sounds simple. The reality at production scale — across thousands of accounts, hundreds of thousands of lots, with wash-sale rules that span 30 days before and after the sale — is anything but.
This theme covers what your TLH engine actually needs to model, and why aggregate position data is insufficient.
Why aggregate positions are not enough
A position-level snapshot tells you the holder owns 1,000 shares of VTI at a $215 cost basis. It does not tell you when each lot was acquired, which lots were transferred in from another account, which lots are short-term vs long-term, or whether any portion is subject to a wash-sale disallowance from a sale 22 days ago in a different account.
A working TLH engine needs lot-level data, not aggregates. Every lot carries:
- `acquisition_date` and `cost_basis` per share
- `holding_period` derivation (short vs long)
- `wash_sale_disallowed` carrying flag and adjusted basis
- Linked-account purchase history for cross-account wash-sale detection
- Special-status flags: QSBS (§1202), Section 1256 contracts, restricted stock
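The `holding_period` derivation in the list above can be sketched as a pure function of the acquisition and sale dates. This is a minimal sketch, assuming the common reading that the holding period starts the day after acquisition and that a lot is long-term only once it has been held for more than one year. The function name `holding_period` and the Feb 29 fallback are illustrative choices, and the sketch ignores tacked holding periods (gifts, wash-sale replacements, tax-free reorganizations).

```python
from datetime import date, timedelta


def holding_period(acquisition_date: date, sale_date: date) -> str:
    """Classify a lot as "short" or "long" as of sale_date (illustrative helper)."""
    # Assumption: the holding period begins the day after acquisition.
    start = acquisition_date + timedelta(days=1)
    try:
        # First day on which a sale counts as held for more than one year.
        long_term_from = start.replace(year=start.year + 1)
    except ValueError:
        # start is Feb 29 and the following year has no Feb 29; use Mar 1.
        long_term_from = date(start.year + 1, 3, 1)
    return "long" if sale_date >= long_term_from else "short"


if __name__ == "__main__":
    print(holding_period(date(2024, 3, 14), date(2025, 3, 14)))
    print(holding_period(date(2024, 3, 14), date(2025, 3, 15)))
```

Deriving the value from dates at query time, rather than storing it once, avoids lots silently staying "short" after they cross the one-year mark.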
What the data shape looks like
A correct TLH lot record is roughly:
{
  "lot_id": "L-2024-03-14-VTI-001",
  "symbol": "VTI",
  "shares": 50,
  "acquisition_date": "2024-03-14",
  "acquisition_method": "purchase",
  "cost_basis_per_share": 218.42,
  "holding_period": "short",
  "wash_sale_disallowed": 0.00,
  "wash_sale_adjustment_basis": null,
  "qsbs_qualified": false,
  "section_1256": false
}
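One way to load a record of this shape is sketched below. The `Lot` dataclass and `parse_lot` helper are hypothetical names, not part of any existing library. The field names mirror the JSON above, and monetary values are parsed as `Decimal` on the assumption that binary floats are not acceptable for basis arithmetic.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Lot:
    lot_id: str
    symbol: str
    shares: Decimal
    acquisition_date: date
    acquisition_method: str
    cost_basis_per_share: Decimal
    holding_period: str
    wash_sale_disallowed: Decimal
    wash_sale_adjustment_basis: Optional[Decimal]
    qsbs_qualified: bool
    section_1256: bool

    @property
    def total_cost_basis(self) -> Decimal:
        return self.shares * self.cost_basis_per_share


def parse_lot(raw: str) -> Lot:
    """Parse one JSON lot record; unknown or missing keys raise TypeError."""
    # Read every JSON number as Decimal so basis math stays exact.
    data = json.loads(raw, parse_float=Decimal, parse_int=Decimal)
    data["acquisition_date"] = date.fromisoformat(data["acquisition_date"])
    if data["holding_period"] not in ("short", "long"):
        raise ValueError(f"unexpected holding_period: {data['holding_period']!r}")
    return Lot(**data)


RAW_LOT = """{
  "lot_id": "L-2024-03-14-VTI-001", "symbol": "VTI", "shares": 50,
  "acquisition_date": "2024-03-14", "acquisition_method": "purchase",
  "cost_basis_per_share": 218.42, "holding_period": "short",
  "wash_sale_disallowed": 0.00, "wash_sale_adjustment_basis": null,
  "qsbs_qualified": false, "section_1256": false
}"""

if __name__ == "__main__":
    print(parse_lot(RAW_LOT).total_cost_basis)
```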
Realistic synthetic data carries 50–300 lots per mass-affluent household, distributed across 3–8 accounts (taxable brokerage, IRA, Roth, HSA, spouse mirror). Equity-comp grants add another layer — see the equity comp theme.
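A seeded generator is one way to produce households in those ranges. The sketch below is illustrative only: `synth_household`, the account-type labels, the symbol list, and the price and share ranges are all assumptions, and it emits only a subset of the lot fields shown earlier.

```python
import random
from datetime import date, timedelta

# Hypothetical labels and tickers, used only for illustration.
ACCOUNT_TYPES = ["taxable", "ira", "roth", "hsa", "spouse_taxable"]
SYMBOLS = ["VTI", "VXUS", "BND"]


def synth_household(seed: int, as_of: date = date(2025, 1, 2)) -> list[dict]:
    """Generate 50-300 synthetic lots spread over 3-8 accounts (reproducible per seed)."""
    rng = random.Random(seed)
    n_accounts = rng.randint(3, 8)
    accounts = [f"acct-{i}-{rng.choice(ACCOUNT_TYPES)}" for i in range(n_accounts)]
    lots = []
    for i in range(rng.randint(50, 300)):
        # Give every account at least one lot, then assign the rest at random.
        account = accounts[i] if i < n_accounts else rng.choice(accounts)
        acquired = as_of - timedelta(days=rng.randint(1, 5 * 365))
        symbol = rng.choice(SYMBOLS)
        lots.append({
            "lot_id": f"L-{acquired.isoformat()}-{symbol}-{i:03d}",
            "account_id": account,
            "symbol": symbol,
            "shares": rng.randint(1, 200),
            "acquisition_date": acquired.isoformat(),
            "cost_basis_per_share": round(rng.uniform(20.0, 400.0), 2),
        })
    return lots


if __name__ == "__main__":
    household = synth_household(seed=7)
    print(len(household), len({lot["account_id"] for lot in household}))
```

Seeding the generator keeps a failing test case reproducible, which matters once the data is used for backtesting.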
Comparing TLH approaches
| | Position-level | Lot-level |
|---|---|---|
| Wash-sale detection | Impossible to compute correctly | Native — runs against the lot ledger |
| Holding period mix | Aggregate only | Per-lot, accurate to the day |
| Cross-account merge | Manual reconciliation | Single ledger keyed by tax ID |
| QSBS handling | Not modeled | Per-lot flag with §1202 5-year clock |
| Realistic for backtesting | No — masks real outcomes | Yes |
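To make the wash-sale row concrete, here is a minimal sketch of the window check run against a merged purchase ledger. `Purchase` and `replacement_purchases` are hypothetical names. Matching on an identical ticker is a deliberate simplification of "substantially identical", and the caller is assumed to have already excluded the lots that were themselves sold.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Purchase:
    account_id: str
    symbol: str
    shares: int
    trade_date: date


def replacement_purchases(loss_symbol: str, loss_date: date,
                          purchases: list[Purchase],
                          window_days: int = 30) -> list[Purchase]:
    """Return purchases in any linked account that fall inside the wash-sale window."""
    window_start = loss_date - timedelta(days=window_days)
    window_end = loss_date + timedelta(days=window_days)
    return [
        p for p in purchases
        # Simplification: same ticker stands in for "substantially identical".
        if p.symbol == loss_symbol and window_start <= p.trade_date <= window_end
    ]


if __name__ == "__main__":
    ledger = [
        Purchase("taxable-1", "VTI", 10, date(2024, 6, 1)),
        Purchase("ira-1", "VTI", 5, date(2024, 6, 20)),
        Purchase("spouse-1", "VTI", 5, date(2024, 8, 15)),
    ]
    for hit in replacement_purchases("VTI", date(2024, 6, 10), ledger):
        print(hit.account_id, hit.trade_date)
```

The function takes the whole ledger rather than one account's trades, so the IRA purchase is caught even though the loss was realized in the taxable account.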
Edge cases your test data must cover
A serious TLH engine has to handle, and your test data has to exercise:
- Pre-existing wash-sale disallowances carried into the testing window from prior periods
- Spouse-account triggers where the wash-sale comes from a separate filing entity
- IRA-buyback disallowances (the loss is permanently disallowed, not deferred)
- Same-day buy-and-sell with intra-day price swings
- Corporate actions (splits, mergers, spin-offs) that re-base per-share cost basis and share counts, and in taxable reorganizations can reset lot acquisition dates
- §1256 60/40 contracts that bypass the standard short/long-term rules
- QSBS lots approaching the 5-year holding requirement under §1202
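The IRA-buyback case is worth a dedicated fixture because its outcome differs in kind from an ordinary wash sale. The sketch below contrasts the two. `WashSaleOutcome` and `apply_wash_sale` are hypothetical names, and the sketch assumes a single loss sale matched against a single replacement purchase, with the loss expressed as a positive per-share amount.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class WashSaleOutcome:
    disallowed_loss: Decimal
    allowed_loss: Decimal
    # Basis of the matched replacement shares; None when the loss is gone for good.
    adjusted_basis_per_share: Optional[Decimal]


def apply_wash_sale(loss_per_share: Decimal, shares_sold: int,
                    replacement_shares: int,
                    replacement_cost_per_share: Decimal,
                    replacement_in_ira: bool) -> WashSaleOutcome:
    """Split a loss into allowed and disallowed parts for one replacement purchase."""
    matched = min(shares_sold, replacement_shares)
    disallowed = loss_per_share * matched
    allowed = loss_per_share * (shares_sold - matched)
    if replacement_in_ira:
        # Permanently disallowed: no basis step-up carries the loss forward.
        adjusted = None
    else:
        # Deferred: the disallowed loss is added to the replacement shares' basis.
        adjusted = replacement_cost_per_share + loss_per_share
    return WashSaleOutcome(disallowed, allowed, adjusted)


if __name__ == "__main__":
    print(apply_wash_sale(Decimal("10"), 50, 30, Decimal("200"), False))
    print(apply_wash_sale(Decimal("10"), 50, 30, Decimal("200"), True))
```

A test suite can assert that the two calls produce the same `disallowed_loss` but different `adjusted_basis_per_share`, which is exactly the distinction between deferred and permanent disallowance.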
Key takeaways
- Aggregate position data cannot drive a correct TLH engine — you must store lot-level acquisitions.
- Wash-sale rules apply taxpayer-wide, not account-local; merge all linked accounts before flagging.
- Cross-account replay (taxable → IRA → spouse) is the most common production bug.
- QSBS, §1256 contracts, and corporate actions are not edge cases — your test data has to include them or your engine will ship a regression.