Modeling corporate actions in synthetic portfolios

A 2-for-1 split, an ABC→XYZ merger, a partial-share spinoff with cash-in-lieu. Each propagates basis, holding-period, and tax-lot changes through every system touching the position. Mock-data tools generate static positions; none of them generate the propagation.

WealthSchema Staff | Pipeline architecture | May 9, 2026 | 8 min read

A corporate action is any action a company takes that changes the holder's position without a buy or sell on the open market. Stock splits, cash dividends, return-of-capital distributions, mergers (cash, stock, and mixed), spinoffs, rights offerings, tender offers, and reverse splits are all corporate actions. They are also the events that mock-data tools (Faker, Mockaroo, generic SDV configurations) cannot generate, because a corporate action does not change a single record so much as propagate changes across every system of record that touches the position.

A synthetic dataset that does not model corporate actions can be useful for one-shot validation. It cannot be used for any backtest longer than a few months, any tax-aware sell engine, or any reporting platform that has to produce a Form 1099 at year-end. This article describes the data shape your engine actually needs.

Why a single position can't carry a corporate-action history

The intuitive model — store the position, record the corporate action as a property of the position — fails on the first multi-event security. A single ETF position will routinely see four to six distinct events in a year: ex-dates for quarterly distributions (some ordinary income, some return-of-capital, some long-term capital-gain distribution), an annual rebalance that's reported to holders as a partial-redemption notice, and possibly a special distribution if the fund family had an unusual year. Storing each of these on the position itself produces a mutable record that both your tax engine and your performance engine fight over.

The working pattern is event-sourced corporate actions: a separate table of corporate-action events keyed by symbol and date, and a derivation step that projects each event onto the lots of the position holding that symbol. The lots store the consequences of corporate actions (adjusted basis, share-count changes); the corporate-action log stores the events themselves.

// Corporate-action event (one record per symbol-event)
{
  "ca_id": "CA-2025-VTI-2025-09-29",
  "symbol": "VTI",
  "ca_type": "qualified_dividend",
  "ex_date": "2025-09-29",
  "record_date": "2025-09-30",
  "pay_date": "2025-10-15",
  "amount_per_share": 0.7325,
  "tax_treatment": "qualified",
  "currency": "USD"
}

// Corporate-action event (a 2-for-1 split)
{
  "ca_id": "CA-2025-NVDA-SPLIT-1",
  "symbol": "NVDA",
  "ca_type": "stock_split",
  "ex_date": "2025-06-10",
  "split_ratio": "2:1",
  "fractional_share_treatment": "cash_in_lieu"
}
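The derivation step can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the `Lot` class, `parse_ratio`, and `project` are hypothetical names, only `stock_split` events are projected, and fractional shares are left fractional here for simplicity.

```python
from dataclasses import dataclass, replace
from datetime import date
from fractions import Fraction


@dataclass(frozen=True)
class Lot:
    # Hypothetical lot shape for this sketch; a real lot ledger carries more fields.
    lot_id: str
    symbol: str
    acquired: date
    shares: Fraction
    basis_per_share: Fraction


def parse_ratio(split_ratio: str) -> Fraction:
    """Turn "2:1" into Fraction(2, 1): new shares per old share."""
    new, old = split_ratio.split(":")
    return Fraction(int(new), int(old))


def project(events: list[dict], lots: list[Lot]) -> list[Lot]:
    """Replay events in ex-date order and return adjusted copies of the lots.

    The event log is never modified, and the input lots are not mutated.
    """
    current = list(lots)
    # ISO-8601 date strings sort chronologically as plain strings.
    for ev in sorted(events, key=lambda e: e["ex_date"]):
        if ev["ca_type"] != "stock_split":
            continue  # cash events touch the cash account, not the lots, in this sketch
        ratio = parse_ratio(ev["split_ratio"])
        ex = date.fromisoformat(ev["ex_date"])
        current = [
            replace(
                lot,
                shares=lot.shares * ratio,
                basis_per_share=lot.basis_per_share / ratio,
            )
            if lot.symbol == ev["symbol"] and lot.acquired < ex
            else lot
            for lot in current
        ]
    return current
```

Because the lots are derived by replay, a corrected event can be fixed in the log and the lot state regenerated, instead of patching mutable position records by hand.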

The action types every wealth dataset has to model

A minimum-viable synthetic dataset for any time-series-aware downstream system has to model the events below. The table is the schema-level summary; the rest of the article is the per-action detail.

| Action | Effect on shares | Effect on basis | Effect on cash | Tax event |
|---|---|---|---|---|
| Stock split (forward) | × ratio (2:1, 3:2) | ÷ ratio per share, total unchanged | Possible cash-in-lieu for fractional | None at split itself |
| Reverse split | ÷ ratio (1:10) | × ratio per share, total unchanged | Cash-in-lieu for fractional remainder | On cash-in-lieu portion |
| Cash merger | Position closes | Realized gain/loss on full position | Cash equal to deal price × shares | Yes: capital gain on closure |
| Stock merger | Replaced by acquirer shares × exchange ratio | Carries over with allocation | Possible cash-in-lieu for fractional | Generally no, with exceptions |
| Spinoff | New ticker, allocated shares | Allocated by FMV ratio | Possible cash-in-lieu | Generally no, basis is allocated |
| Special / RoC distribution | No share change | Reduced by RoC portion | Cash distribution to holder | Deferred until basis = 0 |
| Rights offering | No immediate change; rights vest | Possible basis allocation if rights traded | None until exercised or sold | On sale of rights or exercise |

Stock splits and reverse splits

The straightforward case. A 2-for-1 split doubles share count and halves per-share basis; the holder is in the same position economically. The complication is fractional shares: most brokerages issue cash in lieu of a fractional residual, which is a taxable event on the residual. A 3-for-2 split applied to a 100-share position produces 150 shares cleanly. A 3-for-2 applied to 17 shares produces 25.5 shares, which most brokers settle as 25 shares plus cash equal to the half-share's market value — and that cash is a sale of a fractional share at the post-split price.

Reverse splits create the same fractional-share residual but in larger quantities. A 1-for-10 reverse split on a 47-share position produces 4 shares plus cash for 0.7 shares. Engines that ignore fractional cash-in-lieu silently lose the basis allocation — and on positions that participate in multiple reverse splits over several years (common in distressed equities), the basis discrepancies compound until the position's reported gain on a future sale is wrong by 10–20%.
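The fractional-share arithmetic for both forward and reverse splits can be sketched as follows. The function name and return shape are hypothetical, and the sketch assumes the fractional remainder is settled at position level at a single post-split price; real brokers differ on rounding and on how the sold fraction's basis is assigned to lots.

```python
from fractions import Fraction
from math import floor


def apply_split_with_cash_in_lieu(shares, total_basis, ratio, post_split_price):
    """Apply a split and settle the fractional remainder in cash.

    ratio is new shares per old share: Fraction(3, 2) for a 3-for-2 split,
    Fraction(1, 10) for a 1-for-10 reverse split.
    """
    raw = Fraction(shares) * ratio           # post-split shares, possibly fractional
    whole = floor(raw)                       # shares the holder actually keeps
    fractional = raw - whole                 # remainder settled in cash
    basis_per_share = Fraction(total_basis) / raw
    cil_basis = basis_per_share * fractional             # basis leaving with the fraction
    cil_proceeds = Fraction(post_split_price) * fractional
    return {
        "shares": whole,
        "remaining_basis": Fraction(total_basis) - cil_basis,
        "cash_in_lieu": cil_proceeds,
        "realized_gain": cil_proceeds - cil_basis,       # the taxable piece
    }


# The 17-share, 3-for-2 case from the text, with an assumed $510 total basis
# and an assumed $24 post-split price.
result = apply_split_with_cash_in_lieu(17, 510, Fraction(3, 2), 24)
```

With those assumed inputs the holder keeps 25 shares with $500 of basis, receives $12 cash in lieu, and realizes a $2 gain. The $10 of basis that leaves with the half share is exactly what an engine that ignores cash-in-lieu fails to remove.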

Cash mergers — the orphaned-position trap

A cash merger is the cleanest action to model in principle: the position is sold at the deal price on the close date, the holder receives cash, and there is a realized gain or loss against the lots' basis. The trap is what happens to the position record afterward.

A naive model leaves the position in the household's account record with quantity zero and a "closed" flag. This works until the next monthly snapshot, when a downstream report tries to fetch current price for "the holder's positions" and either errors out (because the symbol is no longer tradable) or pulls a stale price (because the price feed retained the symbol after delisting). Either failure mode appears in production systems that worked fine against mock data and broke on the first live merger.

The correct model is to remove the position from the active record on the merger close date and write an immutable event row to a separate position_history table. The lots involved in the realized gain are written to the realized-gain ledger. The household's net worth on the next snapshot reflects the cash credit, not a phantom position.

// Position lifecycle on a cash merger
// State at T-1 (before merger):
{ "account_id": "A-001", "symbol": "ACQUIREE", "shares": 100, "lots": [...], "status": "active" }

// State at T+0 (merger close):
// 1. Lots realized at deal price; entry written to realized_gains ledger
// 2. Position record updated to status: "closed_merger"; written to position_history
// 3. Cash account credited with deal price × shares
// 4. Active position record removed

// State at T+1 (next monthly snapshot):
// Active positions: [...] (no ACQUIREE entry, not even one with shares: 0)
// Position history: [..., { closed_merger event }]
// Cash: increased by exactly merger proceeds
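The four steps at T+0 can be sketched as one function. The `Account` container, the dict-shaped lots, and `close_cash_merger` are hypothetical names chosen for illustration; a production system would wrap the same steps in a single transaction.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Account:
    # Hypothetical in-memory stand-in for the account's tables.
    cash: Decimal
    active: dict = field(default_factory=dict)            # symbol -> list of lot dicts
    position_history: list = field(default_factory=list)  # immutable event rows
    realized_gains: list = field(default_factory=list)    # realized-gain ledger


def close_cash_merger(acct: Account, symbol: str, deal_price: Decimal, close_date: str) -> Decimal:
    """Close a position at the deal price and return the total cash proceeds."""
    lots = acct.active.pop(symbol)  # the active record is removed, not zeroed
    total_proceeds = Decimal("0")
    total_shares = Decimal("0")
    for lot in lots:
        proceeds = deal_price * lot["shares"]
        # Each lot is realized against its own basis and acquisition date.
        acct.realized_gains.append({
            "lot_id": lot["lot_id"],
            "symbol": symbol,
            "acquired": lot["acquired"],
            "closed": close_date,
            "proceeds": proceeds,
            "gain": proceeds - lot["basis"],
        })
        total_proceeds += proceeds
        total_shares += lot["shares"]
    acct.position_history.append({
        "symbol": symbol,
        "status": "closed_merger",
        "close_date": close_date,
        "shares": total_shares,
    })
    acct.cash += total_proceeds  # cash rises by exactly the merger proceeds
    return total_proceeds
```

Popping the symbol out of `active` is the design choice that matters: a snapshot job that iterates active positions never sees the delisted ticker, so it never requests a price for it.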

Stock and mixed mergers — exchange ratios and cash-in-lieu

A stock merger replaces shares of the acquired company with shares of the acquirer at the exchange ratio. A mixed merger adds a cash component. Both produce one trap that all mock-data tools share: the cost basis carries over to the new position with allocation rules, and the holding period carries over from the original lots. Engines that re-clock the holding period at the merger close date silently convert long-term lots into short-term ones, which can move the tax on a gain from the 15% long-term capital-gains rate to as much as the 37% ordinary-income rate on a planned sale six months later.

The allocation rule for a mixed cash-and-stock merger is the source of the most subtle bug. Under §354/§356, gain is recognized only up to the cash received (the lesser of the realized gain and the cash), and under §358 the stock portion's basis is the original basis less the cash received plus the gain recognized. Working that out correctly across thousands of lots is the job of the lot engine, but the data model has to provide the lot engine with the original basis, the deal terms, and the holding-period start date, not just the post-merger position.
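A per-lot sketch of that rule follows, under simplifying assumptions: no dividend-equivalent treatment, no cash-in-lieu, and no recognition of losses on the exchange. `ExchangedLot` and `exchange_lot` are hypothetical names; the point is which inputs the data model has to carry.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ExchangedLot:
    recognized_gain: Decimal      # taxable now
    new_stock_basis: Decimal      # basis of the acquirer shares received
    holding_period_start: date    # carried over, never re-clocked


def exchange_lot(basis: Decimal, cash_received: Decimal,
                 stock_fmv_received: Decimal, acquired: date) -> ExchangedLot:
    """Apply the mixed-merger rule to one lot (totals, not per-share amounts)."""
    realized = cash_received + stock_fmv_received - basis
    # Gain is recognized only up to the cash received; a loss is not recognized.
    recognized = max(Decimal("0"), min(realized, cash_received))
    # Original basis, less cash received, plus gain recognized.
    new_basis = basis - cash_received + recognized
    return ExchangedLot(recognized, new_basis, acquired)


# Assumed example: $1,000 basis, $300 cash and $1,200 of acquirer stock received.
lot = exchange_lot(Decimal("1000"), Decimal("300"), Decimal("1200"), date(2021, 3, 1))
```

In the assumed example the realized gain is $500, of which $300 is recognized, and the new stock basis stays at $1,000, so the remaining $200 of gain is deferred into the acquirer shares. The holding-period start is still the original acquisition date.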

Spinoffs — basis allocation by FMV ratio

A spinoff distributes shares of a subsidiary to existing shareholders, typically in a fixed ratio. The parent's basis is allocated between the parent and the spinoff in the ratio of fair-market values on the spinoff date. The lot's holding period carries to the spinoff shares.

The synthetic-data requirement is the FMV-ratio history. Your dataset has to include both the parent's price and the spinoff's price on the distribution date, which means both price histories have to start on or before the spinoff date. Mock-data tools that generate independent price series for each ticker can produce a spinoff dataset where the basis allocation is mathematically consistent on day one and drifts by the next snapshot because the two price series are uncorrelated. Real spinoff price action is highly correlated for the first several months as arbitrageurs unwind the parent-spinoff combined trade.

Formula: Spinoff cost-basis allocation

B_parent_new = B_parent_old × (FMV_parent ÷ (FMV_parent + r × FMV_spinoff))

where:
B_parent_new = post-spinoff parent basis per share
B_parent_old = pre-spinoff parent basis per share
FMV_parent = fair-market value of parent on distribution date
FMV_spinoff = fair-market value of spinoff on distribution date
r = spinoff ratio (spinoff shares per parent share)

Example: Pre-spinoff basis $50/share. Distribution: 1 spinoff per 4 parent. Parent FMV $80 post-spinoff, spinoff FMV $40. New parent basis: 50 × (80 / (80 + 0.25 × 40)) = $44.44. Spinoff basis per share: 50 × ((0.25 × 40) / (80 + 0.25 × 40)) ÷ 0.25 = $22.22.
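The allocation is easy to check in code. The sketch below uses exact fractions so the conservation property holds without rounding noise; `allocate_spinoff_basis` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction


def allocate_spinoff_basis(basis_parent_old, fmv_parent, fmv_spinoff, ratio):
    """Split pre-spinoff per-share basis between parent and spinoff by FMV.

    ratio is spinoff shares received per parent share (1-for-4 -> Fraction(1, 4)).
    Returns (parent basis per share, spinoff basis per spinoff share).
    """
    basis_parent_old = Fraction(basis_parent_old)
    fmv_parent = Fraction(fmv_parent)
    ratio = Fraction(ratio)
    spin_value = ratio * Fraction(fmv_spinoff)   # spinoff value per parent share
    combined = fmv_parent + spin_value
    parent_new = basis_parent_old * fmv_parent / combined
    spinoff_per_share = basis_parent_old * spin_value / combined / ratio
    return parent_new, spinoff_per_share


# The worked example: $50 basis, parent FMV $80, spinoff FMV $40, 1 spinoff per 4 parent.
parent_new, spinoff_basis = allocate_spinoff_basis(50, 80, 40, Fraction(1, 4))
```

Here `parent_new` is 400/9 (about $44.44) and `spinoff_basis` is 200/9 (about $22.22). The useful invariant for a validation gate is that parent basis plus `ratio` times spinoff basis equals the pre-spinoff basis exactly.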

Return-of-capital distributions and the basis-zero edge

A return-of-capital (RoC) distribution is cash paid to a shareholder that the fund or company designates as a return of the holder's investment, not income. RoC reduces the lot's basis without producing immediate taxable income. When the basis hits zero, further RoC becomes capital gain (long- or short-term per holding period). REITs, MLPs, and BDCs all routinely distribute RoC; ordinary corporate stocks rarely do.

The synthetic-data requirement is twofold. First: the dataset has to flag distribution character correctly — the same $0.50 quarterly distribution can be qualified dividend, ordinary income, or RoC depending on the issuer's accounting. Second: the dataset has to age positions long enough that the basis-to-zero edge actually triggers in some lots. A REIT held for 15 years with full RoC reinvestment will have lots with basis below the original purchase price by the 7th or 8th year; lots reaching zero basis are a real production data shape that mock data essentially never produces.

// Quarterly distribution with mixed character (post-year-end reclassification)
{
  "ca_id": "CA-2025-O-DIV-Q3",
  "symbol": "O",
  "ca_type": "distribution",
  "pay_date": "2025-09-15",
  "amount_per_share": 0.788,
  "character_breakdown": {
    "qualified_dividend": 0.412,
    "ordinary_dividend": 0.105,
    "return_of_capital": 0.241,
    "long_term_capital_gain": 0.030,
    "section_199A_dividend": 0.000
  }
}
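Two checks follow from this record shape, sketched here with hypothetical function names: the character breakdown should sum to the per-share amount, and the return-of-capital portion should reduce lot basis only until it reaches zero, with any excess surfacing as capital gain.

```python
from decimal import Decimal


def breakdown_reconciles(event: dict, tolerance: Decimal = Decimal("0.0005")) -> bool:
    """True when the character components sum to amount_per_share."""
    total = sum(Decimal(str(v)) for v in event["character_breakdown"].values())
    return abs(total - Decimal(str(event["amount_per_share"]))) <= tolerance


def apply_roc(lot_basis: Decimal, shares: Decimal, roc_per_share: Decimal):
    """Apply return of capital to one lot.

    Returns (new_basis, capital_gain). Basis never goes below zero; RoC beyond
    the remaining basis is reported as gain.
    """
    roc_total = roc_per_share * shares
    basis_reduction = min(lot_basis, roc_total)
    return lot_basis - basis_reduction, roc_total - basis_reduction


# The distribution event from the example above.
event = {
    "amount_per_share": 0.788,
    "character_breakdown": {
        "qualified_dividend": 0.412,
        "ordinary_dividend": 0.105,
        "return_of_capital": 0.241,
        "long_term_capital_gain": 0.030,
        "section_199A_dividend": 0.000,
    },
}
roc = Decimal(str(event["character_breakdown"]["return_of_capital"]))
```

For a 100-share lot, this distribution carries $24.10 of RoC. Against $100 of remaining basis it leaves $75.90 of basis and no gain; against $10 of remaining basis it drives basis to zero and produces $14.10 of capital gain, which is the edge an aged synthetic position has to be able to reach.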

The reconciliation contract

The single most important property of a corporate-action-aware synthetic dataset is what we call the reconciliation contract: at every monthly snapshot, the value of each holding has to equal the sum of the lot-level basis-plus-unrealized-gain calculations, the cash account has to reflect every distribution and cash-in-lieu event between snapshots, and the position-history table has to account for every closed position.

Every monthly snapshot in our 96-month longitudinal generation runs this reconciliation as a hard validation gate. If a household's position-snapshot value drifts from the lot-level reconstruction by more than $1, the household fails validation and is regenerated with a corrected corporate-action sequence. The same reconciliation is what your downstream engine has to run against any synthetic data you load — and the absence of that reconciliation is the single best diagnostic for whether a synthetic dataset has been built corporate-action-aware or not.

  1. T-1 mo
    Pre-event snapshot
    Position carries N shares at basis B; lot ledger sum equals position carrying value.
  2. Day 0
    Ex-date / event date
    Corporate-action event row written; lot adjustments computed and applied; position-history row written for closures.
  3. T+0
    Pay-date / settlement
    Cash account credited (distributions, cash-in-lieu, merger consideration). Position carrying value may differ from pre-event by exactly the action's economic value.
  4. T+1 mo
    Post-event snapshot
    Reconciliation gate: position value = sum of post-event lot-level values. Failure here triggers regeneration.
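The position-level half of the gate can be sketched as below. The lot dict shape and function names are assumptions made for illustration; the $1 tolerance is the threshold described above. A full gate would also check the cash account and the position-history table.

```python
from decimal import Decimal

TOLERANCE = Decimal("1.00")  # the $1 drift threshold


def lot_value(lot: dict, price: Decimal) -> Decimal:
    """Basis plus unrealized gain for one lot at the snapshot price."""
    basis = lot["shares"] * lot["basis_per_share"]
    unrealized = lot["shares"] * (price - lot["basis_per_share"])
    return basis + unrealized


def reconcile(snapshot_value: Decimal, lots: list, price: Decimal):
    """Return (passed, drift) for one holding at one snapshot."""
    reconstructed = sum((lot_value(lot, price) for lot in lots), Decimal("0"))
    drift = abs(snapshot_value - reconstructed)
    return drift <= TOLERANCE, drift
```

A holding that fails this check after an event date usually points to a lot adjustment that was never applied, for example a split that changed the position's share count but not the lots'.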

What this looks like in our catalog

Every household in the WealthSynth corpus carries a corporate-action history of its holdings across the 96-month longitudinal window, calibrated to issuer-specific real events: Apple's 2020 4-for-1 split, AT&T's 2022 WBD spinoff, Berkshire's perpetual non-distribution policy. A backtest run against the data sees the same corporate-action sequence the production system would have processed over the same window — and the lot-level reconciliation contract above runs as a hard validation gate at generation time, so a household whose lot ledger doesn't reconcile to its position ledger never makes it into the corpus.

For a deeper look at the lot ledger that consumes these events, see Lot-level basis tracking data model. For the wash-sale interaction (which corporate-action-mapped same-stock substitutions can trigger), see Wash-sale tracking algorithms across accounts. For the umbrella view of why time-series fidelity matters, see Time-Series Fidelity in Synthetic Wealth Data.