Lot-level basis tracking across linked accounts — the data model

WealthSchema Staff · Tax modeling · May 8, 2026 · 4 min read

The single most common architectural mistake we see in wealth-tech engines is treating tax basis as a position-level field. A holder owns 1,000 shares of VTI at a cost basis of $215 per share. The engine stores (account_id, security_id, shares, cost_basis, acquisition_date). The engine then makes decisions about tax-loss harvesting (TLH), wash-sale flagging, QSBS qualification, foreign-tax-credit allocation, and charitable-gifting basis selection that are wrong for any holder whose 1,000 shares were not all acquired as a single lot.

Real holders have 30–300 lots per position. The lots have different acquisition dates, different acquisition costs, different short-term vs long-term status, different wash-sale-disallowed adjustments, and sometimes different special-status flags (QSBS, Section 1042, ESPP qualifying disposition). A tax-aware engine has to operate on the lot, not the position. This article is the working note on the data model that makes that possible.

What's in a lot

The minimal lot record has more fields than most engines initially budget for. Our reference shape:

Minimal lot record

lot = {
  lot_id, account_id, security_id, owner_taxpayer_id,
  shares, acquisition_date, cost_basis_per_share,
  acquisition_method, holding_period_basis,
  wash_sale_disallowed_amount,
  special_status: [{type, params}],
  parent_lot_ids
}

lot_id: Globally unique identifier; survives transfers and broker changes
owner_taxpayer_id: Required for cross-account wash-sale aggregation; not the same as account_id
acquisition_method: Purchase, gift, inheritance, RSU vest, ISO exercise, ESOP allocation, transfer-in
holding_period_basis: Date that drives short/long-term classification; not always the acquisition_date (gifts inherit the donor's basis date)
special_status: Array; a single lot can have multiple statuses (QSBS + ISO disqualifying, e.g.)
parent_lot_ids: For lots created by partial sales, transfers, or wash-sale adjustments, the lots they descend from

The parent_lot_ids field is what most engines miss. Lots are not immutable — they can be split, merged, and adjusted. The lineage is important for audit (proving basis to the IRS) and for downstream calculations (e.g., long-term holding period of a basis-adjusted replacement lot).
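As a rough illustration of the record and its lineage field, here is a minimal Python sketch. The field names follow the reference shape above; the `sell_partial` helper, the counter-based ID source, and the choice to model a mutation as a new immutable record are our assumptions for the example, not a prescribed implementation.

```python
from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal
from itertools import count

_ids = count(1)  # hypothetical ID source; a real engine would likely use UUIDs


@dataclass(frozen=True)
class Lot:
    lot_id: str
    account_id: str
    security_id: str
    owner_taxpayer_id: str
    shares: Decimal
    acquisition_date: date
    cost_basis_per_share: Decimal
    acquisition_method: str
    holding_period_basis: date
    wash_sale_disallowed_amount: Decimal = Decimal("0")
    special_status: tuple = ()
    parent_lot_ids: tuple = ()


def sell_partial(lot: Lot, shares_sold: Decimal) -> Lot:
    """Return the unsold remainder as a child lot that points back to its parent."""
    if not Decimal("0") < shares_sold < lot.shares:
        raise ValueError("a partial sale must leave a remainder")
    return replace(
        lot,
        lot_id=f"lot-{next(_ids)}",
        shares=lot.shares - shares_sold,
        parent_lot_ids=(lot.lot_id,),
    )


parent = Lot("lot-0", "acct-1", "VTI", "tp-1", Decimal("100"),
             date(2021, 3, 1), Decimal("215"), "purchase", date(2021, 3, 1))
child = sell_partial(parent, Decimal("40"))
```

The child keeps the parent's acquisition date, per-share basis, and holding-period date, so an auditor can walk `parent_lot_ids` back to the original purchase.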

Why position-level data fails

A tax-aware engine running on position-level data makes decisions on aggregates and ships errors that are invisible at the position level. The four canonical failure modes:

TLH selection. A position-level engine sees that the holder is at a net unrealized loss of $40 on 1,000 shares of VTI. It selects the position for harvest. The position's actual lots are 600 shares purchased above the current price (loss) and 400 shares purchased below it (gain). The position-level loss is the sum of the two, but harvesting requires selling the loss lots specifically. A position-level engine that "harvests the position" sells the gain lots too, so the gains offset the losses and it realizes roughly $0 net, or even a gain if it sells only part of the position and reaches the gain lots first, instead of the much larger loss available in the 600 loss shares.
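The gap is easy to see with numbers. The sketch below uses hypothetical prices and bases (not the figures from the example above) to compare what a sell-everything harvest realizes with what a lot-level harvest realizes.

```python
from decimal import Decimal

price = Decimal("200")  # hypothetical current price
lots = [
    {"lot_id": "A", "shares": Decimal("600"), "cost_basis_per_share": Decimal("230")},
    {"lot_id": "B", "shares": Decimal("400"), "cost_basis_per_share": Decimal("160")},
]


def unrealized(lot, price):
    """Unrealized gain (positive) or loss (negative) on one lot."""
    return (price - lot["cost_basis_per_share"]) * lot["shares"]


# Position-level view: one aggregate number for all 1,000 shares.
position_level = sum(unrealized(lot, price) for lot in lots)

# Lot-level view: harvest only the lots that are actually at a loss.
loss_lots = [lot for lot in lots if unrealized(lot, price) < 0]
lot_level = sum(unrealized(lot, price) for lot in loss_lots)
```

With these inputs the aggregate shows a loss of 2,000, while the loss available in lot A alone is 18,000.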

Wash-sale detection. The IRS rule is that a loss is disallowed if substantially identical securities are purchased within 30 days before or after the sale, anywhere in the taxpayer's accounts including IRAs and HSAs. A position-level engine sees the wash-sale only at the position level — and only within an account. It misses cross-account triggers and cross-lot triggers within an account.

QSBS qualification. Section 1202 stock has to be held for more than 5 years to qualify for the exclusion. Per-lot holding period matters; the position-level "average holding period" is meaningless. An engine making 1042 / 1202 decisions on a position-level model is producing wrong answers for any holder with mixed-date lots.
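A minimal sketch of the per-lot check, assuming a simple "sale date is after the fifth anniversary" test. The helper names and the February 29 handling are our simplifications, and the lot dates are invented.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day N years later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def held_more_than_five_years(holding_start: date, sale_date: date) -> bool:
    return sale_date > add_years(holding_start, 5)


# Hypothetical lots of the same stock, keyed by name, valued by holding-period start.
lots = {"early": date(2019, 1, 15), "late": date(2023, 6, 1)}
sale = date(2025, 3, 1)

qualifying = {name: held_more_than_five_years(start, sale) for name, start in lots.items()}

# The position-level "average holding period" hides the split entirely.
average_days = sum((sale - start).days for start in lots.values()) / len(lots)
```

Here the early lot passes and the late lot does not, while the average holding period (under five years) would suggest that nothing qualifies.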

FIFO vs specific-identification accounting. Brokers default to FIFO for cost basis reporting; specific identification produces lower tax in most cases. Engines that don't track lots can't implement specific identification.
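To make the difference concrete, the following sketch sells 150 shares out of three hypothetical lots under FIFO and under a highest-cost-first specific-identification rule. It ignores short-term versus long-term rates, which a real selector would also weigh.

```python
from datetime import date
from decimal import Decimal

# (acquisition_date, shares, cost_basis_per_share); all values are invented
lots = [
    (date(2020, 1, 10), Decimal("100"), Decimal("150")),
    (date(2022, 5, 2), Decimal("100"), Decimal("240")),
    (date(2023, 9, 18), Decimal("100"), Decimal("190")),
]


def realized_gain(ordered_lots, shares_to_sell, price):
    """Consume lots in the given order and total the realized gain."""
    gain, remaining = Decimal("0"), shares_to_sell
    for _, shares, basis in ordered_lots:
        take = min(shares, remaining)
        gain += take * (price - basis)
        remaining -= take
        if remaining == 0:
            break
    return gain


price, qty = Decimal("200"), Decimal("150")
fifo = realized_gain(sorted(lots, key=lambda lot: lot[0]), qty, price)
highest_cost = realized_gain(sorted(lots, key=lambda lot: lot[2], reverse=True), qty, price)
```

FIFO realizes a gain of 3,000 on this sale; selling the highest-cost lots first realizes a loss of 3,500 on the same 150 shares.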

The events that mutate basis

A working lot-level engine has to handle the events that change basis. The inventory:

Lot-mutating events

Acquisition (purchase, gift, inheritance, vest, exercise, transfer-in): creates a new lot.
Sale (full or partial): closes the lot, or creates a child lot from the remainder.
Wash-sale adjustment: increases the basis of the replacement lot by the disallowed loss and tacks on the holding period of the lot sold.
Stock split / reverse split: adjusts shares and basis per share; preserves total basis.
Spin-off: creates a new lot with basis allocated from the parent lot.
Cash dividend reinvestment: creates a new lot at the reinvestment-date price.
Return of capital: reduces each lot's basis by the per-share distribution; any amount beyond basis is capital gain.
Section 1031 / 1033 exchange: basis carries from the old asset to the new.
Section 351 / 368 reorganization: basis carries through the corporate event.
Estate step-up: at the owner's death, basis resets to fair market value on the date of death (or the alternate valuation date).
Charitable gift: removes the lot from the donor's holdings; the donor's deduction is FMV up to the applicable limit.
Constructive sale (short against the box, etc.): triggers immediate gain recognition with adjusted basis.

Each event is a code path in the engine. Engines that only handle "purchase" and "sale" produce wrong results for every other event. The most commonly under-handled events are wash-sale adjustments, return of capital, and estate step-up.
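As an illustration of what "a code path per event" looks like, here is a small sketch of three of the events above: a split, a wash-sale basis adjustment, and a return of capital. The stripped-down `Lot` and the function names are hypothetical, and holding-period effects are left out to keep the example short.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class Lot:
    shares: Decimal
    cost_basis_per_share: Decimal

    @property
    def total_basis(self) -> Decimal:
        return self.shares * self.cost_basis_per_share


def apply_split(lot: Lot, ratio: Decimal) -> Lot:
    """N-for-1 split: more shares, lower per-share basis, same total basis."""
    return replace(lot, shares=lot.shares * ratio,
                   cost_basis_per_share=lot.cost_basis_per_share / ratio)


def apply_wash_sale(replacement: Lot, disallowed_loss: Decimal) -> Lot:
    """Add the disallowed loss to the replacement lot's basis."""
    bump = disallowed_loss / replacement.shares
    return replace(replacement,
                   cost_basis_per_share=replacement.cost_basis_per_share + bump)


def apply_return_of_capital(lot: Lot, per_share: Decimal):
    """Reduce per-share basis, floored at zero; return (lot, excess treated as gain)."""
    new_basis = max(lot.cost_basis_per_share - per_share, Decimal("0"))
    excess = max(per_share - lot.cost_basis_per_share, Decimal("0")) * lot.shares
    return replace(lot, cost_basis_per_share=new_basis), excess


lot = Lot(Decimal("100"), Decimal("200"))
split = apply_split(lot, Decimal("2"))
washed = apply_wash_sale(split, Decimal("500"))
after_roc, excess = apply_return_of_capital(washed, Decimal("3"))
```

Each function returns a new lot rather than editing in place, which pairs naturally with recording the prior lot in parent_lot_ids.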

The linked-account structure

The IRS rules apply at the taxpayer level, not the account level. A loss in a taxable brokerage can be disallowed by a purchase in an IRA, an HSA, a 401(k), a spouse's account, or even a controlled corporation's account. The lot-level data model needs to support cross-account aggregation.

The data model that supports cross-account aggregation has a taxpayer or household_unit entity that owns multiple accounts. Each lot's owner_taxpayer_id points to the unit, and wash-sale detection runs queries across all lots in all accounts of the same unit within the relevant 61-day window (30 days before the sale through 30 days after).
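A minimal sketch of that query, assuming sales and purchases are plain dicts keyed by owner_taxpayer_id. It matches on security_id only, which is narrower than "substantially identical"; a real engine would need a broader equivalence check. All records are invented.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)


def wash_sale_triggers(sale, purchases):
    """Purchases by the same tax unit, in any account, within 30 days of a loss sale."""
    lo, hi = sale["date"] - WINDOW, sale["date"] + WINDOW
    return [
        p for p in purchases
        if p["owner_taxpayer_id"] == sale["owner_taxpayer_id"]
        and p["security_id"] == sale["security_id"]
        and lo <= p["date"] <= hi
    ]


loss_sale = {"owner_taxpayer_id": "tp-1", "account_id": "taxable-1",
             "security_id": "VTI", "date": date(2025, 3, 10)}
purchases = [
    {"owner_taxpayer_id": "tp-1", "account_id": "ira-1",
     "security_id": "VTI", "date": date(2025, 3, 25)},      # different account, in window
    {"owner_taxpayer_id": "tp-1", "account_id": "taxable-1",
     "security_id": "VTI", "date": date(2025, 5, 1)},       # same account, outside window
    {"owner_taxpayer_id": "tp-2", "account_id": "taxable-9",
     "security_id": "VTI", "date": date(2025, 3, 12)},      # unrelated taxpayer
]
hits = wash_sale_triggers(loss_sale, purchases)
```

The only trigger is the IRA purchase, which an account-scoped check would never see.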

Layer 1: Account. Brokerage, IRA, 401(k), HSA, joint trust, custodial; each with its own custodian relationship and tax form.
Layer 2: Taxpayer. An individual who owns one or more accounts. Has SSN/ITIN; this is the IRS's unit for reporting.
Layer 3: Household unit. Family with shared tax filing (MFJ). For wash-sale purposes, married filers' accounts are aggregated; for other purposes (Roth IRA, HSA), each spouse is independent.
Layer 4: Controlled-entity ring. S-corps, partnerships, and trusts the household controls. Wash-sales between household and these entities are still disallowed if the relationship triggers IRC §267 attribution.

A real engine's data model has all four layers explicit. Many engines have only the first two and produce subtle bugs at the third and fourth layers.
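One way to picture the four layers is as a lookup from an account to every other account that shares its wash-sale scope. The sketch below is deliberately small: the mapping tables, the identifiers, and the assumption that every controlled entity attributes to exactly one household are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:            # Layer 1
    account_id: str
    owner_id: str         # a taxpayer (Layer 2) or a controlled entity (Layer 4)


TAXPAYER_TO_HOUSEHOLD = {"tp-alice": "hh-1", "tp-bob": "hh-1", "tp-carol": "hh-2"}  # Layer 3
ENTITY_TO_HOUSEHOLD = {"ent-scorp": "hh-1"}                                          # Layer 4

ACCOUNTS = [
    Account("brokerage-a", "tp-alice"),
    Account("ira-b", "tp-bob"),
    Account("corp-1", "ent-scorp"),
    Account("brokerage-c", "tp-carol"),
]


def household_of(owner_id: str):
    return TAXPAYER_TO_HOUSEHOLD.get(owner_id) or ENTITY_TO_HOUSEHOLD.get(owner_id)


def wash_sale_scope(account_id: str):
    """All accounts whose purchases could disallow a loss realized in account_id."""
    owner = next(a.owner_id for a in ACCOUNTS if a.account_id == account_id)
    household = household_of(owner)
    return sorted(a.account_id for a in ACCOUNTS if household_of(a.owner_id) == household)


scope = wash_sale_scope("brokerage-a")
```

An engine that stops at Layer 2 would return only Alice's brokerage account here and miss both the spouse's IRA and the controlled corporation.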

Special-status flags

Some lots carry special tax status that changes their treatment. The flags that matter most:

Flag	Trigger	Effect on engine logic
QSBS (Section 1202)	Pre-IPO C-corp founder stock + 5-year hold	Up to $10M (or 10x basis) excluded from gain. Engine has to track 5-year clock + acquisition method.
Section 1042 ESOP rollover	Sale of C-corp stock to ESOP + reinvestment in qualified replacement property	Defers gain indefinitely. Engine has to track replacement property and disposition events.
ISO statutory holding period	ISO exercised + held 2 years from grant + 1 year from exercise	Qualifying disposition treats entire gain as LTCG. Engine has to track grant date, not just exercise date.
ISO disqualifying disposition	Either ISO holding period above not met	Spread at exercise becomes ordinary income; further appreciation is capital. Engine has to bifurcate the basis.
ESPP qualifying disposition	§423 plan + holding period satisfied	Lower tax treatment; bifurcation between ordinary and capital. Engine has to track grant date and discount.
Section 1244 small-business stock	Original-issue stock from qualifying small corp + < $1M corporate cap	Loss treated as ordinary up to $50K/$100K. Engine has to track issuance status.
Section 1256 contracts	Regulated futures, foreign currency contracts, etc.	Mark-to-market at year-end with 60/40 split. Engine has to track the contract type.

Each flag is a multi-year tracking commitment. An engine that supports only the first two flags can claim QSBS and Section 1042 coverage but not equity-compensation coverage (ISO, ESPP); an engine that supports all of them needs systematic test data with each flag exercised.
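The ISO rows show why the engine needs more than one date per lot. A minimal sketch of the classification, using the two holding tests from the table; the function names and sample dates are hypothetical.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day N years later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def iso_disposition(grant: date, exercise: date, sale: date) -> str:
    """Qualifying only if held 2 years from grant AND 1 year from exercise."""
    qualifying = sale > add_years(grant, 2) and sale > add_years(exercise, 1)
    return "qualifying" if qualifying else "disqualifying"


grant, exercise = date(2022, 1, 10), date(2023, 6, 1)
late_sale = iso_disposition(grant, exercise, date(2024, 7, 1))
early_sale = iso_disposition(grant, exercise, date(2024, 3, 1))
```

The early sale clears the two-year grant test but fails the one-year exercise test, so an engine that stores only one of the two dates cannot classify it.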

What the data model means for synthetic test data

Test data for a lot-level engine has to include:

Households with 30–300 lots per major position (matching real-world distribution)
Lots with different acquisition methods (purchase, gift, inheritance, vest, exercise)
Wash-sale-adjusted lots with adjusted basis and adjusted holding period
Cross-account scenarios (IRA + taxable + HSA + 401(k)) with potential cross-account triggers
Special-status lots: QSBS pre-5-year, QSBS post-5-year, ISO pre-qualifying, ISO disqualifying, ESPP, Section 1244
Estate step-up scenarios (lots that were inherited, with stepped-up basis)
Multi-broker scenarios (lots transferred between brokers, with original acquisition data preserved)

A general-purpose synthetic financial corpus that lacks lot-level resolution is unusable for tax-aware engine testing. The engineering investment to produce lot-level synthetic data is meaningfully larger than position-level — but that investment is exactly the difference between a corpus that exercises the engine's full code path and one that doesn't.
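As a starting point, a generator for one position might look like the sketch below. The uniform distributions, price range, and ten-year lookback are placeholders we chose for the example; matching real-world distributions, special statuses, and wash-sale adjustments takes considerably more work.

```python
import random
from datetime import date, timedelta

METHODS = ["purchase", "gift", "inheritance", "vest", "exercise"]
ACCOUNTS = ["taxable", "ira", "hsa", "401k"]


def synth_position(security_id, rng, as_of=date(2026, 1, 1)):
    """Generate 30-300 lots for one security, spread across account types."""
    lots = []
    for i in range(rng.randint(30, 300)):
        acquired = as_of - timedelta(days=rng.randint(1, 3650))
        lots.append({
            "lot_id": f"{security_id}-{i}",
            "account_id": rng.choice(ACCOUNTS),
            "security_id": security_id,
            "shares": rng.randint(1, 50),
            "acquisition_date": acquired,
            "cost_basis_per_share": round(rng.uniform(100, 300), 2),
            "acquisition_method": rng.choice(METHODS),
        })
    return lots


lots = synth_position("VTI", random.Random(42))  # seeded for reproducible fixtures
```

Passing in a seeded `random.Random` keeps fixtures reproducible across test runs.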

Key takeaways

Position-level data is insufficient for any tax-aware engine. Real holders have 30–300 lots per position with different acquisition dates, costs, and special statuses.
The minimal lot record has a dozen+ fields including parent_lot_ids for tracking lineage through wash-sale adjustments, splits, and corporate events.
A working engine has to handle 12+ lot-mutating events. Most engines handle only purchases and sales, producing wrong results for everything else.
Wash-sale rules apply taxpayer-wide, not account-locally. A loss in a taxable account can be disallowed by an IRA purchase. Engines without cross-account aggregation silently ship wrong harvest numbers.
Special-status flags (QSBS, Section 1042, ISO, ESPP, Section 1244, Section 1256) each require multi-year tracking. Each unsupported flag is a category of household the engine cannot serve correctly.
Synthetic test data for lot-level engines needs lot-level resolution with realistic distributions of lot counts, acquisition methods, special statuses, and cross-account scenarios.

Frequently asked questions

What about lot-tracking for crypto and DeFi?

Crypto requires the same lot-level discipline plus additional events (hard forks, airdrops, staking rewards, liquidity-pool token receipts). The IRS treats each receipt as a basis event. Engines that worked for traditional securities have to be extended with the crypto-specific events; they typically cannot just be re-pointed at the new asset class. Crypto lot tracking is its own engineering project, not a small extension of equity lot tracking.

How does the data model interact with broker-supplied 1099-B data?

Brokers supply per-disposition data with cost basis and holding period. The engine should reconcile broker data with internal lot-level data and flag discrepancies — broker basis is sometimes wrong (e.g., on transferred-in lots where the original acquisition data was lost), and the taxpayer is allowed to use correct basis with appropriate documentation. Engines that just trust 1099-B data inherit broker errors.
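A reconciliation pass can be as simple as the sketch below. It assumes broker rows have already been matched to internal lots by a shared lot_id, which is our simplification: 1099-B data does not carry the engine's identifiers, so a real matcher would key on security, dates, and share counts.

```python
from decimal import Decimal


def reconcile(broker_rows, internal_lots, tolerance=Decimal("0.01")):
    """Return (lot_id, reason) pairs where broker data disagrees with internal lots."""
    flagged = []
    for row in broker_rows:
        lot = internal_lots.get(row["lot_id"])
        if lot is None:
            flagged.append((row["lot_id"], "missing internal lot"))
        elif abs(row["cost_basis"] - lot["cost_basis"]) > tolerance:
            flagged.append((row["lot_id"], "basis mismatch"))
    return flagged


internal = {
    "lot-1": {"cost_basis": Decimal("21500.00")},
    "lot-2": {"cost_basis": Decimal("9800.00")},
}
broker = [
    {"lot_id": "lot-1", "cost_basis": Decimal("21500.00")},
    {"lot_id": "lot-2", "cost_basis": Decimal("0.00")},   # e.g. basis lost on transfer-in
    {"lot_id": "lot-3", "cost_basis": Decimal("500.00")},
]
discrepancies = reconcile(broker, internal)
```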

What's the right approach for engines that historically operated on position-level data?

Migration is meaningful work. The best pattern we've seen is a parallel lot-level data store with bridge logic that produces position-level outputs for compatibility. New code paths use lot-level; old code paths can be migrated incrementally. Big-bang rewrites of basis logic are risky because the test surface is huge and the failure modes are silent (wrong tax, not crashed engines).
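The bridge logic can start as a pure aggregation over the lot store. The sketch below assumes lots are dicts with the field names from the reference record and produces the legacy (account_id, security_id) view with an average cost basis; the function name is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def position_view(lots):
    """Roll lots up into the legacy position-level shape for old code paths."""
    totals = defaultdict(lambda: {"shares": Decimal("0"), "total_basis": Decimal("0")})
    for lot in lots:
        key = (lot["account_id"], lot["security_id"])
        totals[key]["shares"] += lot["shares"]
        totals[key]["total_basis"] += lot["shares"] * lot["cost_basis_per_share"]
    return {
        key: {**t, "cost_basis": t["total_basis"] / t["shares"]}
        for key, t in totals.items()
    }


lots = [
    {"account_id": "acct-1", "security_id": "VTI",
     "shares": Decimal("600"), "cost_basis_per_share": Decimal("230")},
    {"account_id": "acct-1", "security_id": "VTI",
     "shares": Decimal("400"), "cost_basis_per_share": Decimal("160")},
]
view = position_view(lots)
```

Because the view is derived rather than stored, old code keeps reading the shape it expects while the lot store remains the single source of truth.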

Are there simpler alternatives for engines that only need approximate basis?

Sometimes. Engines that only need approximate basis for portfolio analytics (not tax decisions) can sometimes use position-average basis with explicit error bars. Engines that drive tax decisions cannot — there is no defensible 'approximate basis' for IRS reporting. The decision of whether to invest in lot-level data is largely the decision of whether the engine drives tax outcomes.