Building a crypto / DeFi tax engine — every receipt is a basis event

Hard forks. Airdrops. Staking rewards. Liquidity-pool tokens. Cross-chain bridges. NFTs. Each is a tax event with its own basis treatment, and the engine has to handle every one.

WealthSchema Staff | Tax modeling | May 9, 2026 | 2 min read

A traditional brokerage tax engine has a clear taxonomy of events: buy, sell, dividend, split, return-of-capital. Crypto tax engines start with that vocabulary and discover a half-dozen new event types within the first month of real customer data — and a dozen more within the first year. The engine that handles all of them is a substantially more complex product than a securities tax engine; the engine that doesn't ships customers wrong tax filings.

This article is a working note for engineering teams building crypto / DeFi tax engines. It covers the events that matter, the basis-tracking complications DeFi introduces, and the shape of the synthetic data needed to test the engine's coverage of the long tail.

What "every receipt is a basis event" means

In traditional securities, basis is established at acquisition and adjusted at a few defined events (splits, returns of capital, wash-sale adjustments). In crypto, basis is potentially affected by every receipt — and "receipt" includes far more than purchases.

| Event | Tax treatment at receipt | Basis at receipt |
| --- | --- | --- |
| Purchase (fiat → crypto) | Non-taxable acquisition | USD spent + fees |
| Trade (crypto → crypto) | Taxable disposition of source asset | FMV at trade for received asset |
| Hard fork | Ordinary income at FMV (per Rev. Rul. 2019-24) | FMV at receipt |
| Airdrop | Ordinary income at FMV | FMV at receipt |
| Staking rewards | Ordinary income at FMV (per Rev. Rul. 2023-14) | FMV at receipt |
| Mining rewards | Ordinary income at FMV (per IRS Notice 2014-21; self-employment income if a business) | FMV at receipt |
| Liquidity-pool token receipt | Disposition vs. non-disposition is unsettled | Multiple defensible treatments |
| Wrapped token (e.g. WETH ↔ ETH) | Most likely non-taxable; the conservative treatment is taxable | Carry-over basis (most common interpretation) |
| Cross-chain bridge | Taxable if bridge is custodial; non-taxable if non-custodial | Carry-over basis if non-taxable |
| NFT mint | Generally non-taxable (cost-of-goods) | Cost of minting + gas |
| NFT receipt as gift | Non-taxable to recipient; donor's gift-tax considerations | Carry-over basis from donor |
| Disposition at a loss (possible wash sale) | Capital loss; Section 1091 wash-sale rules cover stock and securities and have generally not been applied to crypto held as property, so wash-sale handling should be configurable and checked against current law | Loss disallowed only if a wash-sale rule is applied |

Each row is a code path. The engine that handles only purchase + trade has gaps for every other row.
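
The "each row is a code path" idea can be sketched as a small dispatch over event types. This is a minimal illustration, not a reference implementation: the names `EventType`, `ReceiptResult`, and `handle_receipt` are hypothetical, only a few rows of the table are covered, and any event without a handler is rejected explicitly rather than silently treated as a purchase.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class EventType(Enum):
    # Hypothetical subset of the event taxonomy in the table.
    PURCHASE = "purchase"
    HARD_FORK = "hard_fork"
    AIRDROP = "airdrop"
    STAKING_REWARD = "staking_reward"
    MINING_REWARD = "mining_reward"
    LP_TOKEN_RECEIPT = "lp_token_receipt"


@dataclass(frozen=True)
class ReceiptResult:
    ordinary_income: Decimal  # USD income recognized at receipt
    basis: Decimal            # total USD basis of the newly created lot


# Rows of the table whose receipt is ordinary income at FMV.
INCOME_AT_FMV = {
    EventType.HARD_FORK,
    EventType.AIRDROP,
    EventType.STAKING_REWARD,
    EventType.MINING_REWARD,
}


def handle_receipt(event_type, quantity, fmv_per_unit,
                   usd_spent=Decimal("0"), fees=Decimal("0")):
    """Return income and basis for one receipt event."""
    if event_type is EventType.PURCHASE:
        # Non-taxable acquisition: basis is USD spent plus fees.
        return ReceiptResult(Decimal("0"), usd_spent + fees)
    if event_type in INCOME_AT_FMV:
        # Income equals FMV at receipt, and that FMV becomes the basis.
        fmv = quantity * fmv_per_unit
        return ReceiptResult(fmv, fmv)
    # Unsettled or unsupported rows are surfaced, not guessed at.
    raise NotImplementedError(
        f"No handler for {event_type.value}; route to manual review"
    )
```

Raising on unhandled rows keeps a missing code path visible in testing instead of letting it produce a plausible but wrong basis.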

The basis-tracking model

A crypto basis-tracking model has to extend the traditional lot-level model with crypto-specific fields:

Crypto lot record

crypto_lot = {
  lot_id, wallet_id, asset_id, network_id, shares,
  acquisition_date, basis_per_share, acquisition_method,
  fmv_at_acquisition, parent_lot_ids,
  network_specific: { transaction_hash, block_number, on_chain_metadata },
  special_status: { is_staking_reward, is_airdrop, is_fork, associated_position (LP token), nft_metadata }
}

wallet_id = Identifier for the holding wallet. A user can have multiple wallets, each tracked separately for custody and movement events.
network_id = Blockchain (Ethereum, Bitcoin, Solana, etc.). Cross-chain movement is a transition that can be tax-relevant.
fmv_at_acquisition = Fair market value at the moment of receipt. Required for ordinary-income events (forks, airdrops, staking).
transaction_hash = On-chain identifier for the specific transaction creating this lot.
associated_position = For LP tokens, the underlying position they represent.

The on-chain metadata is more than an audit trail: it is the source of truth for the events that affect basis. Engines that don't capture transaction hashes can't reconcile their basis calculations against blockchain history if those calculations are disputed.
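
One way to express the lot record in code is a set of dataclasses that mirror the field names above. This is a sketch under assumptions: the field types, the class names `NetworkInfo`, `SpecialStatus`, and `CryptoLot`, and the two helper methods are illustrative choices, not part of the record definition itself.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class NetworkInfo:
    transaction_hash: str
    block_number: int
    on_chain_metadata: dict = field(default_factory=dict)


@dataclass
class SpecialStatus:
    is_staking_reward: bool = False
    is_airdrop: bool = False
    is_fork: bool = False
    associated_position: Optional[str] = None  # LP tokens only
    nft_metadata: Optional[dict] = None


@dataclass
class CryptoLot:
    lot_id: str
    wallet_id: str
    asset_id: str
    network_id: str
    shares: Decimal
    acquisition_date: date
    basis_per_share: Decimal
    acquisition_method: str
    fmv_at_acquisition: Decimal
    network_specific: NetworkInfo
    parent_lot_ids: list = field(default_factory=list)
    special_status: SpecialStatus = field(default_factory=SpecialStatus)

    def total_basis(self) -> Decimal:
        """Total USD basis of the lot (assumed helper)."""
        return self.shares * self.basis_per_share

    def is_reconcilable(self) -> bool:
        """True when a transaction hash is present to check against chain history."""
        return bool(self.network_specific.transaction_hash)
```

Using `Decimal` rather than `float` for quantities and prices avoids rounding drift when lots are split across many partial disposals.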

The DeFi-specific complications

DeFi protocols introduce events that don't have securities analogs. The hardest:

DeFi complication inventory

  • Liquidity pool deposits: the depositor receives LP tokens representing the pool position. Whether the deposit is a taxable disposition is unsettled; the conservative treatment is disposition and the aggressive treatment is non-disposition. The engine has to support both interpretations.
  • Liquidity pool withdrawals: symmetric to deposits. The holder returns LP tokens in exchange for the underlying assets, in amounts that reflect protocol fees and impermanent loss.
  • Yield-farming compound rewards: periodic claims of rewards (yield) from staking LP tokens. Each claim is an ordinary-income event at receipt FMV.
  • Flash loan interactions: borrow, transact, and repay within a single block. Whether this creates basis events on the borrow side is contested; most engines treat flash loans as non-events.
  • Governance token receipts: protocol governance distributions, similar to airdrops. Some are delivered automatically; others require active claiming, which itself may be a tax event.
  • Synthetic / wrapped derivatives: exposure to underlying assets without ownership. Generally treated as constructive ownership for tax purposes, but the line is unclear.
  • Bridges and rollups: cross-chain or layer-2 transfers. Custodial bridges are typically taxable and non-custodial bridges typically non-taxable; verifying which is which is non-trivial.
  • Token migrations: V1-to-V2 token swaps mandated by the protocol. Generally non-taxable with carryover basis, depending on whether the swap is mandatory and the underlying value is preserved.
  • Slashing: proof-of-stake validators lose staked tokens for misbehavior. Capital loss event.
  • Maximum extractable value (MEV): searcher rewards from arbitrage. Ordinary income for the searcher; affects basis for affected pool participants.

A real engine has to support each of these — or explicitly disclaim coverage and require the user to handle the event manually with a dedicated tax preparer.
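
Supporting both interpretations of an LP deposit can be as simple as making the treatment an explicit policy input. The sketch below assumes a single deposit with known basis and FMV; `LpPolicy` and `lp_deposit` are hypothetical names, and the arithmetic covers only the two treatments named in the list above.

```python
from decimal import Decimal
from enum import Enum


class LpPolicy(Enum):
    DISPOSITION = "disposition"          # conservative treatment
    NON_DISPOSITION = "non_disposition"  # aggressive treatment


def lp_deposit(deposited_basis, deposited_fmv, policy):
    """Return (recognized_gain, lp_token_basis) in USD for one deposit."""
    if policy is LpPolicy.DISPOSITION:
        # Deposit is treated as a sale at FMV: gain or loss is recognized
        # now and the LP tokens take a fresh FMV basis.
        return deposited_fmv - deposited_basis, deposited_fmv
    if policy is LpPolicy.NON_DISPOSITION:
        # Deposit is treated as a non-event: nothing is recognized and
        # the LP tokens carry over the basis of the deposited assets.
        return Decimal("0"), deposited_basis
    raise ValueError(f"Unknown policy: {policy!r}")


# Same deposit, two outcomes: assets with a $1,000 basis worth $1,500.
gain_c, basis_c = lp_deposit(Decimal("1000"), Decimal("1500"), LpPolicy.DISPOSITION)
gain_a, basis_a = lp_deposit(Decimal("1000"), Decimal("1500"), LpPolicy.NON_DISPOSITION)
```

Keeping the policy as a parameter, rather than a global setting buried in the engine, makes it possible to run the same transaction history under both treatments and show the user the difference.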

What synthetic test data needs to include

A synthetic crypto-tax test corpus has to span:

  1. Spread 1: Asset coverage
    Bitcoin, Ethereum, top 50 by market cap, plus stablecoins. NFTs. Wrapped tokens. LP tokens. Staking-derivative tokens. Each has different tax treatment.
  2. Spread 2: Network coverage
    Multi-chain holders are common: Ethereum + Solana + Bitcoin + L2s. Cross-chain bridge events are a category of edge case.
  3. Spread 3: Activity types
    Pure HODL, active trader, DeFi yield farmer, NFT collector, staker, validator. Each profile has a different event distribution.
  4. Spread 4: Lifetime patterns
    Year-1 portfolio (basis cleanly tracked), year-3 portfolio (multiple migrations / forks), legacy portfolio (pre-2018 acquisitions with incomplete records).
  5. Spread 5: Edge cases
    Hard fork events. Airdrop receipts. NFT minting. Staking with re-staking compounded rewards. LP tokens through impermanent loss. Cross-chain bridge transitions. Slashing events.
  6. Spread 6: Tax-jurisdiction variations
    US federal + state (varies). UK, Germany, Singapore, India each have distinct crypto tax regimes. Multi-jurisdiction users are increasingly common.

A test corpus missing any spread is a corpus where the engine has untested branches. Crypto-tax engines that ship to production with incomplete corpora typically discover the gaps when the first crypto-experienced user files a return.
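
A coverage check over the spreads can be automated. The sketch below assumes each synthetic scenario is tagged with one value per dimension; the dimension names and values in `REQUIRED_SPREADS` are an illustrative subset drawn from the list above, and `missing_coverage` is a hypothetical helper.

```python
# Illustrative subset of required values per spread (an assumption,
# not an exhaustive list).
REQUIRED_SPREADS = {
    "asset": {"BTC", "ETH", "stablecoin", "NFT", "wrapped", "LP"},
    "network": {"ethereum", "solana", "bitcoin", "l2"},
    "activity": {"hodl", "active_trader", "yield_farmer", "staker"},
}


def missing_coverage(corpus, required):
    """Return {dimension: sorted missing values} for a tagged corpus.

    corpus   -- list of dicts mapping dimension name to a single value
    required -- dict mapping dimension name to the set of required values
    """
    gaps = {}
    for dimension, values in required.items():
        seen = {scenario.get(dimension) for scenario in corpus}
        missing = sorted(values - seen)
        if missing:
            gaps[dimension] = missing
    return gaps


# Two scenarios cover only a small corner of the required space.
corpus = [
    {"asset": "ETH", "network": "ethereum", "activity": "staker"},
    {"asset": "BTC", "network": "bitcoin", "activity": "hodl"},
]
gaps = missing_coverage(corpus, REQUIRED_SPREADS)
```

Running a check like this in CI turns "missing any spread" from a review-time judgment into a failing build.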

Key takeaways

  • Crypto tax engines have a substantially larger event taxonomy than equity engines — every receipt can be a basis event, and DeFi adds protocol-specific events that don't have securities analogs.
  • The basis-tracking model extends traditional lots with wallet, network, transaction-hash, and special-status fields. Engines that don't capture on-chain metadata can't reconcile their basis against blockchain history.
  • DeFi complications include LP deposits, yield farming, governance tokens, synthetic derivatives, bridges, token migrations, slashing, and MEV. Each is a distinct code path.
  • IRS Form 1099-DA reporting starts with 2025 transactions. Tax engines have to recompute basis from on-chain data because broker-side basis reporting will be incomplete in early years.
  • Test corpus has to span asset, network, activity-type, lifetime-pattern, edge-case, and jurisdiction dimensions. Missing any leaves untested branches that real users will eventually exercise.

Frequently asked questions

How do we handle pre-2018 records where on-chain data may be incomplete?
The IRS allows reasonable reconstruction. Engines should support manual entry of pre-broker-reporting acquisitions with documentation requirements (exchange CSVs, wallet exports, third-party tax tools' historical data). The reconstruction quality affects audit defensibility — engines should flag low-confidence basis entries so the user knows their downstream filing carries audit risk.
What about holders who use centralized exchanges and DeFi simultaneously?
Common — most active crypto users have multiple custody relationships. The engine has to ingest from multiple sources (exchange APIs / CSVs, wallet addresses, manual entries) and consolidate into a single per-user view. The consolidation involves reconciling acquisition events across sources and de-duplicating. This is engineering-heavy and where many engines have correctness issues; explicit user review of consolidated data is the standard mitigation.
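
The de-duplication step can be sketched as follows, assuming every ingested record is a dict and that network, transaction hash, and asset together identify one acquisition. That key is an assumption made for illustration; records without a hash (for example, exchange-internal trades) are set aside for user review rather than merged. The function name and the sample hash are made up.

```python
def consolidate(records):
    """Merge acquisition records from several sources.

    Returns (unique_records, needs_review). The first record seen for a
    given (network_id, transaction_hash, asset_id) key wins.
    """
    merged = {}
    needs_review = []
    for record in records:
        tx_hash = record.get("transaction_hash")
        if not tx_hash:
            # No on-chain identifier: cannot be matched automatically.
            needs_review.append(record)
            continue
        key = (record["network_id"], tx_hash, record["asset_id"])
        merged.setdefault(key, record)
    return list(merged.values()), needs_review


# The same on-chain transfer reported by an exchange CSV and a wallet scan,
# plus one manual entry with no hash.
records = [
    {"source": "exchange_csv", "network_id": "ethereum",
     "transaction_hash": "0xabc", "asset_id": "ETH"},
    {"source": "wallet_scan", "network_id": "ethereum",
     "transaction_hash": "0xabc", "asset_id": "ETH"},
    {"source": "manual", "network_id": "ethereum",
     "transaction_hash": None, "asset_id": "ETH"},
]
unique, review = consolidate(records)
```
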
Does the engine need to support institutional crypto custodians?
If serving institutional users, yes. Coinbase Custody, Anchorage, BitGo, Fidelity Digital Assets all have specific data formats and event types. Institutional staking, yield products, and wrapped-token derivatives have different implementations than retail. The engine architecture has to support multiple custodian-specific adapters.
How important is supporting multi-jurisdiction tax rules?
Increasingly important. The US, UK, Germany, Switzerland, Singapore, Australia, India, and Japan all have distinct crypto tax regimes; the EU is moving toward harmonization but member states vary. Engines serving global users need jurisdiction-specific tax modules layered on the universal event-tracking core. The synthetic test corpus has to include jurisdiction-specific scenarios for each supported jurisdiction.