The single most common architectural mistake we see in wealth-tech engines is treating tax basis as a position-level field. A holder owns 1,000 shares of VTI at an average cost basis of $215 per share. The engine stores (account_id, security_id, shares, cost_basis, acquisition_date). The engine then makes decisions (TLH, wash-sale flagging, QSBS qualification, foreign-tax-credit allocation, charitable-gifting basis selection) that are wrong for any holder whose 1,000 shares were not all acquired in a single lot.
Real holders have 30–300 lots per position. The lots have different acquisition dates, different acquisition costs, different short-term vs long-term status, different wash-sale-disallowed adjustments, and sometimes different special-status flags (QSBS, Section 1042, ESPP qualifying disposition). A tax-aware engine has to operate on the lot, not the position. This article is the working note on the data model that makes that possible.
What's in a lot
The minimal lot record has more fields than most engines initially budget for. Our reference shape:
```
lot = {
  lot_id, account_id, security_id, owner_taxpayer_id,
  shares, acquisition_date, cost_basis_per_share,
  acquisition_method, holding_period_basis,
  wash_sale_disallowed_amount,
  special_status: [{type, params}],
  parent_lot_ids
}
```

- lot_id: Globally unique identifier; survives transfers and broker changes.
- owner_taxpayer_id: Required for cross-account wash-sale aggregation; not the same as account_id.
- acquisition_method: Purchase, gift, inheritance, RSU vest, ISO exercise, ESOP allocation, or transfer-in.
- holding_period_basis: The date that drives short- vs long-term classification. It is not always the acquisition_date: a gifted lot generally carries over the donor's holding period, and a wash-sale replacement lot includes the holding period of the lot it replaced.
- special_status: An array, because a single lot can carry multiple statuses (e.g., QSBS plus an ISO disqualifying disposition).
- parent_lot_ids: For lots created by partial sales, transfers, or wash-sale adjustments, the lots they descend from.
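For engines written in a typed language, the reference shape maps naturally onto a record type. The sketch below is one possible Python rendering, not a prescribed schema: the AcquisitionMethod enum values, the SpecialStatus type labels, and the total_basis helper are hypothetical, and it assumes cost_basis_per_share already includes any wash-sale adjustment, with wash_sale_disallowed_amount kept alongside for audit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from enum import Enum


class AcquisitionMethod(Enum):
    PURCHASE = "purchase"
    GIFT = "gift"
    INHERITANCE = "inheritance"
    RSU_VEST = "rsu_vest"
    ISO_EXERCISE = "iso_exercise"
    ESOP_ALLOCATION = "esop_allocation"
    TRANSFER_IN = "transfer_in"


@dataclass
class SpecialStatus:
    type: str  # hypothetical labels, e.g. "qsbs", "iso_disqualifying"
    params: dict = field(default_factory=dict)


@dataclass
class Lot:
    lot_id: str                    # globally unique; survives transfers
    account_id: str
    security_id: str
    owner_taxpayer_id: str         # wash-sale aggregation key, not account_id
    shares: Decimal
    acquisition_date: date
    cost_basis_per_share: Decimal  # assumed to already include wash-sale adjustments
    acquisition_method: AcquisitionMethod
    holding_period_basis: date     # drives short/long-term; may predate acquisition_date
    wash_sale_disallowed_amount: Decimal = Decimal("0")
    special_status: list[SpecialStatus] = field(default_factory=list)
    parent_lot_ids: list[str] = field(default_factory=list)

    @property
    def total_basis(self) -> Decimal:
        return self.shares * self.cost_basis_per_share
```

A gifted lot is the simplest case where holding_period_basis and acquisition_date diverge: the lot arrives in the holder's account on the gift date but may already be long-term.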
Why position-level data fails
A tax-aware engine running on position-level data makes decisions on aggregates and ships errors that are invisible at the position level. The four canonical failure modes:
TLH selection. A position-level engine sees that the holder has a net $40 unrealized loss on 1,000 shares of VTI and selects the position for harvest. The position's actual lots are 600 shares purchased above the current price (a loss) and 400 shares purchased below it (a gain). The position-level figure is the net of the two, but capturing the loss requires selling the loss lots specifically. A position-level engine that "harvests the position" realizes only the $40 net loss and forfeits most of the harvestable loss; if it sells only part of the position and the lots relieved happen to be the gain lots (as under a FIFO default when those are the oldest), it realizes a gain instead.
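The arithmetic is easy to reproduce. In this sketch the two lots are hypothetical, with bases chosen so the position nets to the $40 loss in the example; OpenLot, net_unrealized, and loss_lots are illustrative names, and the selection assumes the broker honors lot-specific sale instructions.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass
class OpenLot:
    lot_id: str
    shares: Decimal
    cost_basis_per_share: Decimal


def unrealized(lot: OpenLot, price: Decimal) -> Decimal:
    """Unrealized gain (positive) or loss (negative) on one lot at `price`."""
    return (price - lot.cost_basis_per_share) * lot.shares


def net_unrealized(lots: list[OpenLot], price: Decimal) -> Decimal:
    """What a position-level engine sees: the net across all lots."""
    return sum((unrealized(lot, price) for lot in lots), Decimal("0"))


def loss_lots(lots: list[OpenLot], price: Decimal) -> list[OpenLot]:
    """Lot-level harvest candidates: only lots trading below their own basis."""
    return [lot for lot in lots if unrealized(lot, price) < 0]


# Hypothetical lots: 600 shares bought above the current price, 400 below it.
lots = [
    OpenLot("A", Decimal("600"), Decimal("210.00")),
    OpenLot("B", Decimal("400"), Decimal("185.10")),
]
price = Decimal("200.00")
print(net_unrealized(lots, price))                    # -40.00
print(net_unrealized(loss_lots(lots, price), price))  # -6000.00
```

The position-level view reports a $40 loss; the lot-level view finds $6,000 of harvestable loss in lot A, capturable only if the engine can direct the sale by lot.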
Wash-sale detection. The IRS rule is that a loss is disallowed if substantially identical securities are purchased within 30 days before or after the sale, anywhere in the taxpayer's accounts, including IRAs and HSAs. A position-level engine evaluates wash sales only at the position level and only within one account, so it misses cross-account triggers as well as lot-level triggers within an account.
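A minimal sketch of the taxpayer-wide window check, assuming purchases are already tagged with owner_taxpayer_id. The Purchase type and wash_sale_triggers function are hypothetical; "substantially identical" is simplified to matching security_id, and share-count matching for partial disallowance is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

WINDOW = timedelta(days=30)


@dataclass
class Purchase:
    account_id: str
    owner_taxpayer_id: str
    security_id: str
    trade_date: date
    shares: int


def wash_sale_triggers(
    taxpayer_id: str, security_id: str, sale_date: date, purchases: list[Purchase]
) -> list[Purchase]:
    """Purchases that can disallow a loss sale made on `sale_date`.

    Scans every account owned by the taxpayer (not just the selling account)
    for buys of the same security from 30 days before to 30 days after the sale.
    Assumptions: "substantially identical" is reduced to equal security_id, and
    `purchases` excludes the buy that created the lot being sold.
    """
    start, end = sale_date - WINDOW, sale_date + WINDOW
    return [
        p
        for p in purchases
        if p.owner_taxpayer_id == taxpayer_id
        and p.security_id == security_id
        and start <= p.trade_date <= end
    ]
```

Filtering on the selling account instead of owner_taxpayer_id would miss an IRA or HSA purchase entirely, which is exactly the cross-account failure described above.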
QSBS qualification. Section 1202 stock has to be held for more than 5 years to qualify for the exclusion. Per-lot holding period matters; the position-level "average holding period" is meaningless. An engine making 1042 / 1202 decisions on a position-level model is producing wrong answers for any holder with mixed-date lots.
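Per-lot evaluation is one date comparison per lot, and the averaging shortcut can flip the answer. This sketch applies the more-than-5-year rule stated above and treats the anniversary of a February 29 acquisition as February 28 (an assumption, not a ruling). The helper names and lot data are hypothetical.

```python
from __future__ import annotations

from datetime import date


def add_years(d: date, years: int) -> date:
    """Same month/day `years` later. Assumption: Feb 29 maps to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def qsbs_holding_met(acquired: date, sold: date) -> bool:
    """Section 1202 requires holding MORE than 5 years, so a sale on the
    5-year anniversary itself does not qualify."""
    return sold > add_years(acquired, 5)


def weighted_average_date(lots: list[tuple[int, date]]) -> date:
    """The position-level shortcut: share-weighted average acquisition date."""
    total = sum(shares for shares, _ in lots)
    return date.fromordinal(sum(s * d.toordinal() for s, d in lots) // total)


lots = [(400, date(2018, 3, 1)), (600, date(2022, 9, 1))]
sale = date(2024, 6, 1)
print([qsbs_holding_met(acq, sale) for _, acq in lots])     # [True, False]
print(qsbs_holding_met(weighted_average_date(lots), sale))  # False
```

The 400-share lot qualifies; the averaged position says nothing does.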
FIFO vs specific-identification accounting. Brokers generally default to FIFO for cost-basis reporting; specific identification lets the holder choose which lots to sell (for example, the highest-basis lots) and usually produces lower tax. Engines that don't track lots can't implement specific identification.
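Lot relief order is what separates the two methods. The sketch below compares FIFO with one hypothetical specific-identification policy (highest basis first) on two made-up lots; a production policy would also weigh short- vs long-term rates rather than minimizing realized gain alone.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class OpenLot:
    lot_id: str
    acquired: date
    shares: Decimal
    basis_per_share: Decimal


def fifo(lots: list[OpenLot]) -> list[OpenLot]:
    return sorted(lots, key=lambda lot: lot.acquired)


def highest_basis_first(lots: list[OpenLot]) -> list[OpenLot]:
    # One simple specific-identification policy (hypothetical):
    # relieve the highest-cost lots first to minimize realized gain.
    return sorted(lots, key=lambda lot: lot.basis_per_share, reverse=True)


def realized_gain(lots, shares_to_sell: Decimal, price: Decimal, order) -> Decimal:
    """Relieve lots in `order` until `shares_to_sell` is covered; return the gain."""
    remaining, basis = shares_to_sell, Decimal("0")
    for lot in order(lots):
        if remaining <= 0:
            break
        take = min(lot.shares, remaining)
        basis += take * lot.basis_per_share
        remaining -= take
    if remaining > 0:
        raise ValueError("sale exceeds shares held")
    return shares_to_sell * price - basis


lots = [
    OpenLot("A", date(2015, 1, 5), Decimal("100"), Decimal("60")),
    OpenLot("B", date(2021, 6, 1), Decimal("100"), Decimal("220")),
]
print(realized_gain(lots, Decimal("100"), Decimal("230"), fifo))                 # 17000
print(realized_gain(lots, Decimal("100"), Decimal("230"), highest_basis_first))  # 1000
```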
The events that mutate basis
A working lot-level engine has to handle the events that change basis. The inventory:
Lot-mutating events
- Acquisition (purchase, gift, inheritance, vest, exercise, transfer-in): creates a new lot.
- Sale (full or partial): closes the lot, or closes the sold portion and creates a child lot for the remainder.
- Wash-sale adjustment: increases the basis of the replacement lot by the disallowed loss and adds the sold lot's holding period to the replacement lot's.
- Stock split / reverse split: adjusts shares and basis per share; preserves total basis.
- Spin-off: creates a new lot whose basis is allocated from the parent lot, reducing the parent lot's basis by the same amount.
- Cash dividend reinvestment: creates a new lot at the reinvestment-date price.
- Return of capital: reduces each lot's basis by the per-share distribution times its share count; any amount beyond a lot's remaining basis is recognized as capital gain.
- Section 1031 / 1033 exchange: basis carries from the old asset to the new one.
- Section 351 / 368 reorganization: basis carries through the corporate event.
- Estate step-up: at the owner's death, basis resets to fair market value (or the value on the alternate valuation date).
- Charitable gift: removes the lot from the holder's books without gain recognition; the donor's deduction is generally FMV for long-term lots (basis for short-term lots), subject to AGI limits.
- Constructive sale (short against the box, etc.): triggers gain recognition as if the lot were sold at fair market value; the lot's basis is adjusted to that value and a new holding period begins.
Each event is a code path in the engine. Engines that only handle "purchase" and "sale" produce wrong results for every other event. The most commonly under-handled events are wash-sale adjustments, return of capital, and estate step-up.
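A sketch of four of these code paths as pure functions over an immutable lot. It assumes a hypothetical Lot type that stores total basis rather than basis per share, which keeps splits exact. The wash-sale handler assumes the replacement lot matches the sold lot share for share, and approximates holding-period tacking by moving the replacement lot's start date back by the number of days the sold lot was held.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Lot:
    lot_id: str
    shares: Decimal
    total_basis: Decimal          # stored in total to avoid per-share rounding drift
    holding_period_basis: date
    parent_lot_ids: tuple[str, ...] = ()


def apply_split(lot: Lot, ratio: Decimal) -> Lot:
    """Forward (ratio > 1) or reverse (ratio < 1) split: shares change, total basis does not."""
    return replace(lot, shares=lot.shares * ratio)


def apply_partial_sale(lot: Lot, shares_sold: Decimal, child_id: str) -> tuple[Lot, Decimal]:
    """Returns the child lot for the remainder and the basis relieved by the sale."""
    relieved = lot.total_basis * shares_sold / lot.shares
    child = replace(
        lot,
        lot_id=child_id,
        shares=lot.shares - shares_sold,
        total_basis=lot.total_basis - relieved,
        parent_lot_ids=(lot.lot_id,),
    )
    return child, relieved


def apply_return_of_capital(lot: Lot, per_share: Decimal) -> tuple[Lot, Decimal]:
    """Reduces basis by the distribution; any excess over basis is capital gain."""
    distribution = per_share * lot.shares
    new_basis = max(lot.total_basis - distribution, Decimal("0"))
    gain = max(distribution - lot.total_basis, Decimal("0"))
    return replace(lot, total_basis=new_basis), gain


def apply_wash_sale(replacement: Lot, sold: Lot, sale_date: date, disallowed: Decimal) -> Lot:
    """Adds the disallowed loss to the replacement lot's basis and tacks on the
    sold lot's holding period (approximated by shifting the start date back)."""
    days_held = sale_date - sold.holding_period_basis
    return replace(
        replacement,
        total_basis=replacement.total_basis + disallowed,
        holding_period_basis=replacement.holding_period_basis - days_held,
        parent_lot_ids=replacement.parent_lot_ids + (sold.lot_id,),
    )
```

Note which handlers record lineage in parent_lot_ids (partial sale, wash-sale adjustment) and which adjust a lot without touching lineage (split, return of capital).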
The linked-account structure
The IRS rules apply at the taxpayer level, not the account level. A loss in a taxable brokerage account can be disallowed by a purchase in an IRA, an HSA, a 401(k), a spouse's account, or even a controlled corporation's account. The lot-level data model needs to support cross-account aggregation.
The data model that supports cross-account aggregation has a taxpayer or household_unit entity that owns multiple accounts. Each lot's owner_taxpayer_id points to the unit, and wash-sale detection runs queries across all lots in all accounts of the same unit within the relevant 61-day window (30 days before through 30 days after the sale).
- Layer 1 (Account): Brokerage, IRA, 401(k), HSA, joint, trust, custodial; each with its own custodian relationship and tax form.
- Layer 2 (Taxpayer): An individual who owns one or more accounts. Has an SSN/ITIN; this is the IRS's unit for reporting.
- Layer 3 (Household unit): A family with shared tax filing (MFJ). For wash-sale purposes, married filers' accounts are aggregated; for other purposes (Roth IRA, HSA), each spouse is independent.
- Layer 4 (Controlled-entity ring): S-corps, partnerships, and trusts the household controls. Losses can still be disallowed across this boundary: a purchase by a corporation the taxpayer controls can create a wash sale, and a loss on a sale directly to a related entity is disallowed under IRC §267.
A real engine's data model makes all four layers explicit. Many engines model only the first two and produce subtle bugs at the third and fourth layers.
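The layers resolve into a wash-sale scope with a few set operations. This is a sketch under stated simplifications: Account and HouseholdUnit are hypothetical types, the household unit is assumed to hold exactly the spouses whose purchases are aggregated, and whether an entity counts as controlled or related is treated as a precomputed membership flag rather than the legal determination it actually is.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Account:                 # layer 1
    account_id: str
    kind: str                  # "brokerage", "ira", "401k", "hsa", "corporate", ...
    owner_id: str              # a taxpayer id (layer 2) or a controlled-entity id


@dataclass
class HouseholdUnit:
    household_id: str
    taxpayer_ids: set[str] = field(default_factory=set)           # layer 3: spouses
    controlled_entity_ids: set[str] = field(default_factory=set)  # layer 4


def wash_sale_scope(
    selling_account_id: str, accounts: list[Account], households: list[HouseholdUnit]
) -> set[str]:
    """Account ids whose purchases must be checked against a loss sale."""
    owner = next(a.owner_id for a in accounts if a.account_id == selling_account_id)
    owners = {owner}  # layer 2: every account the seller owns
    for unit in households:
        if owner in unit.taxpayer_ids:
            # layers 3 and 4: spouse accounts and controlled entities
            owners |= unit.taxpayer_ids | unit.controlled_entity_ids
    return {a.account_id for a in accounts if a.owner_id in owners}
```

An engine that stops at layer 2 returns only the seller's own accounts, which is where the subtle bugs above come from.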
Special-status flags
Some lots carry special tax status that changes their treatment. The flags that matter most:
| Flag | Trigger | Effect on engine logic |
|---|---|---|
| QSBS (Section 1202) | Original-issue stock of a qualified small-business C-corp, held more than 5 years | Gain excluded up to the greater of $10M or 10x basis. Engine has to track the 5-year clock and acquisition method. |
| Section 1042 ESOP rollover | Sale of C-corp stock to an ESOP + reinvestment in qualified replacement property | Defers gain indefinitely. Engine has to track replacement property and disposition events. |
| ISO statutory holding period | ISO shares sold more than 2 years after grant and more than 1 year after exercise | Qualifying disposition treats the entire gain as LTCG. Engine has to track grant date, not just exercise date. |
| ISO disqualifying disposition | Either holding period violated | Spread at exercise becomes ordinary income; further appreciation is capital. Engine has to bifurcate the basis. |
| ESPP qualifying disposition | §423 plan + holding periods satisfied | Lower tax treatment; bifurcation between ordinary and capital. Engine has to track grant date and discount. |
| Section 1244 small-business stock | Original-issue stock from a qualifying small corp (≤ $1M capital received) | Loss treated as ordinary up to $50K ($100K MFJ) per year. Engine has to track issuance status. |
| Section 1256 contracts | Regulated futures, foreign currency contracts, etc. | Marked to market at year-end with a 60/40 long/short-term split. Engine has to track the contract type. |
Each flag is a multi-year tracking commitment. An engine that supports only a subset of the flags can't claim coverage for households holding the others; an engine that supports all of them needs systematic test data with each flag exercised.
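As an example of what one flag's tracking involves, here is a sketch of ISO disposition handling driven by the parameters a special_status entry would need to carry (grant date, exercise date, strike, FMV at exercise). The function and type names are hypothetical; the sketch ignores AMT basis and sales to related parties, and caps ordinary income at the actual gain on a disqualifying disposition.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 -> Feb 28 (assumption)
        return d.replace(year=d.year + years, day=28)


@dataclass
class IsoOutcome:
    qualifying: bool
    ordinary: Decimal
    capital: Decimal


def iso_disposition(grant: date, exercise: date, sold: date, strike: Decimal,
                    fmv_at_exercise: Decimal, sale_price: Decimal,
                    shares: Decimal) -> IsoOutcome:
    """Bifurcates an ISO sale. Simplified: ignores AMT basis and related-party sales."""
    gain = (sale_price - strike) * shares
    # Qualifying needs BOTH: > 2 years from grant AND > 1 year from exercise.
    if sold > add_years(grant, 2) and sold > add_years(exercise, 1):
        return IsoOutcome(True, Decimal("0"), gain)
    spread = (fmv_at_exercise - strike) * shares
    # Ordinary income is the exercise spread, capped at the actual gain, never negative.
    ordinary = max(min(spread, gain), Decimal("0"))
    return IsoOutcome(False, ordinary, gain - ordinary)
```

The grant date is an input even though no shares exist at grant, which is why the lot record cannot store only the exercise date.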
What the data model means for synthetic test data
Test data for a lot-level engine has to include:
- Households with 30–300 lots per major position (matching real-world distribution)
- Lots with different acquisition methods (purchase, gift, inheritance, vest, exercise)
- Wash-sale-adjusted lots with adjusted basis and adjusted holding period
- Cross-account scenarios (IRA + taxable + HSA + 401(k)) with potential cross-account triggers
- Special-status lots: QSBS pre-5-year, QSBS post-5-year, ISO pre-qualifying, ISO disqualifying, ESPP, Section 1244
- Estate step-up scenarios (lots that were inherited, with stepped-up basis)
- Multi-broker scenarios (lots transferred between brokers, with original acquisition data preserved)
A general-purpose synthetic financial corpus that lacks lot-level resolution is unusable for tax-aware engine testing. The engineering investment to produce lot-level synthetic data is meaningfully larger than position-level — but that investment is exactly the difference between a corpus that exercises the engine's full code path and one that doesn't.
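A seeded generator makes lot-level fixtures reproducible. The sketch below is a hypothetical skeleton covering only part of the checklist above (30 to 300 lots per position, mixed acquisition methods, linked account kinds, carried-over gift holding periods, a few special statuses); inherited-lot step-ups, wash-sale chains, and multi-broker transfers would be layered on in the same style. A real generator would also constrain which acquisition methods can land in which account kinds.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from datetime import date, timedelta

METHODS = ["purchase", "gift", "inheritance", "rsu_vest", "iso_exercise", "espp_purchase"]
ACCOUNT_KINDS = ["brokerage", "ira", "401k", "hsa"]
EQUITY_COMP_STATUS = {"iso_exercise": "iso", "espp_purchase": "espp"}


@dataclass
class SynthLot:
    lot_id: str
    account_id: str
    security_id: str
    acquisition_method: str
    acquisition_date: date
    holding_period_basis: date
    shares: int
    cost_basis_per_share: float
    special_status: list[str] = field(default_factory=list)


def synth_household(seed: int, securities: list[str],
                    start: date = date(2005, 1, 1), end: date = date(2024, 12, 31),
                    status_rate: float = 0.05) -> list[SynthLot]:
    """Deterministic (seeded) household with 30-300 lots per position."""
    rng = random.Random(seed)
    accounts = [f"{kind}-{seed}" for kind in ACCOUNT_KINDS]
    span = (end - start).days
    lots: list[SynthLot] = []
    for security in securities:
        for i in range(rng.randint(30, 300)):
            acquired = start + timedelta(days=rng.randrange(span + 1))
            method = rng.choice(METHODS)
            # Gifted lots carry an earlier donor holding period (assumed < 10 years).
            if method == "gift":
                held_from = acquired - timedelta(days=rng.randrange(1, 3650))
            else:
                held_from = acquired
            # Equity-comp lots always get their status; others get QSBS/1244 rarely.
            if method in EQUITY_COMP_STATUS:
                status = [EQUITY_COMP_STATUS[method]]
            elif rng.random() < status_rate:
                status = [rng.choice(["qsbs", "sec1244"])]
            else:
                status = []
            lots.append(SynthLot(
                lot_id=f"{seed}-{security}-{i}",
                account_id=rng.choice(accounts),
                security_id=security,
                acquisition_method=method,
                acquisition_date=acquired,
                holding_period_basis=held_from,
                shares=rng.randint(1, 500),
                cost_basis_per_share=round(rng.uniform(5, 500), 2),
                special_status=status,
            ))
    return lots
```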
Key takeaways
- Position-level data is insufficient for any tax-aware engine. Real holders have 30–300 lots per position with different acquisition dates, costs, and special statuses.
- The minimal lot record has a dozen+ fields including parent_lot_ids for tracking lineage through wash-sale adjustments, splits, and corporate events.
- A working engine has to handle 12+ lot-mutating events. Most engines handle only purchases and sales, producing wrong results for everything else.
- Wash-sale rules apply taxpayer-wide, not account-locally. A loss in a taxable account can be disallowed by an IRA purchase. Engines without cross-account aggregation silently ship wrong harvest numbers.
- Special-status flags (QSBS, Section 1042, ISO, ESPP, Section 1244, Section 1256) each require multi-year tracking. Each unsupported flag is a category of household the engine cannot serve correctly.
- Synthetic test data for lot-level engines needs lot-level resolution with realistic distributions of lot counts, acquisition methods, special statuses, and cross-account scenarios.