
Crypto & DeFi Tax Pack

Crypto tax compliance has gone from a niche to a mandate. The 1099-DA form is rolling out, the IRS has both the data feed from exchanges and the staffing to enforce reporting, and every wealth platform serving HNW clients now needs to handle digital-asset positions alongside traditional brokerage. The complexity is genuine: cost basis spans self-custody and exchange-custody wallets, DeFi positions generate income that doesn't fit a 1099-INT or 1099-DIV mold, NFT sales create inventory-vs-investment ambiguities, and airdrops are taxable at receipt at fair market value. The Crypto & DeFi Tax Pack is 50 crypto-heavy households built specifically for the tax-tech tools that need to handle this without falling back to manual reconstruction.

Households: 50
Archetypes: 3
Formats: JSON, CSV, Parquet
Deviation: High

Why this Data Set exists

Most crypto tax tools were built when 'crypto tax' meant 'compute capital gains on Coinbase trades.' That world is over. Today's crypto-heavy household has positions across five exchanges, three self-custody wallets, two DeFi lending protocols, an LP position on a DEX, staked ETH, and a smattering of NFTs that may or may not be inventory depending on facts and circumstances. Reconciling cost basis across these is genuinely hard — and the IRS reports an audit backlog of crypto-related returns that the new 1099-DA matching is going to dramatically expand.

For builders of tax tools, the problem is structural: production tax-engine fixtures are dominated by the W-2 + brokerage modal customer. Crypto-heavy households are sparsely represented or absent. You can't validate 1099-DA matching logic against a corpus that doesn't include realistic 1099-DA mismatches; you can't validate DeFi yield-income classification against a corpus that doesn't include realistic LP-position structures.

This Data Set fills the gap. 50 crypto-heavy households with the structural depth real crypto tax work requires: wallet inventories with self-custody flags, lot-level cost basis with FIFO and specific-ID methods, DeFi position structures (liquidity provision, staking, lending), NFT inventories with disposition history, airdrop and hard-fork events with FMV at receipt, and the 1099-DA reconciliation fields needed to match exchange reporting.

Use Cases

Crypto tax reporting validation
1099-DA reconciliation
Cost basis tracking across wallets
DeFi yield income modeling

Who uses this Data Set

Crypto Tax Software Builder

Validates the platform's cross-wallet cost-basis tracking, DeFi income classification, and 1099-DA reconciliation against 50 households whose structures span the realistic range — from self-custody-only Bitcoin maximalists to DeFi-heavy yield farmers with LP positions across multiple protocols.

Tax Engine Engineer at a Wealth Platform

Tests the platform's ability to integrate digital-asset cost basis alongside traditional brokerage holdings for HNW clients, ensuring TLH and rebalancing logic doesn't silently mishandle crypto positions.

Crypto-Aware CPA Building Internal Tools

Validates the firm's crypto tax workflow against realistic clients before tax season hits. Surfaces where the firm's reconciliation tooling falls short — particularly for DeFi LP positions and the staking-rewards-as-income classification.

Audit-Defense Consultant

Walks through 1099-DA reconciliation cases — where exchange-reported amounts differ from taxpayer-reported amounts due to cost-basis-method choice or transfer history — using realistic households where the discrepancy is pre-structured for analysis.

DeFi Protocol Tax Reporting Builder

Tests the protocol-level reporting logic (yield distributions, LP-share basis tracking, governance token distributions) against households whose interactions with the protocol exhibit the realistic complexity of multi-protocol participants.

What's inside

The 50 households cluster around three crypto-heavy archetypes: B-02 Overconfident DIY Investors (with concentrated crypto positions that mirror the equity-concentration risk the archetype name suggests), N-01 Crypto-Heavy Portfolios (long-time crypto holders with diversified digital-asset allocations), and CR-01 Crypto-Heavy / DeFi Investors (specifically the DeFi-active subset). The mix is deliberately diverse: about 30% are spot-only (Bitcoin and ETH dominant), 35% are diversified across major-cap altcoins and stablecoins, and 35% are DeFi-active with at least one LP position, staking position, or lending-protocol deposit.

Each household carries a structured wallet inventory: every wallet is identified as self-custody (hardware wallet, software wallet, custody platform) or exchange-custody (named exchange or generic exchange custody), with the address or account identifier hashed for privacy. Lots are tracked at the position level: acquisition date, cost basis, cost-basis method (FIFO dominant for older holdings, specific-ID for tax-aware traders), and current market value. DeFi positions include the protocol name, the position type (LP, staking, lending, borrowing), the deposit history, the accrued yield, and the current value. NFT positions are structured separately with the inventory-vs-investment classification flagged. Airdrops and hard-fork events appear in the events log with FMV at receipt.
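As a rough illustration of the structure described above, the sketch below builds one hypothetical household and totals lot market value by custody type. Only the `assets.crypto.wallets`, `assets.crypto.lots`, and `assets.crypto.defi_positions` paths come from the published sample schema; every nested field name (`wallet_id`, `custody`, `cost_basis_method`, `market_value`, and so on) and every value is an assumption made for illustration and may not match the shipped files.

```javascript
// Hypothetical household record. Nested field names and values are
// illustrative assumptions, not the shipped schema.
const sampleHousehold = {
  id: 'hh-example',
  assets: {
    crypto: {
      wallets: [
        { wallet_id: 'w1', custody: 'self', wallet_type: 'hardware', id_hash: 'a1b2' },
        { wallet_id: 'w2', custody: 'exchange', wallet_type: 'generic_exchange', id_hash: 'c3d4' },
      ],
      lots: [
        { wallet_id: 'w1', asset: 'BTC', acquired: '2019-03-01', cost_basis: 4000,
          cost_basis_method: 'fifo', market_value: 60000 },
        { wallet_id: 'w2', asset: 'ETH', acquired: '2022-06-15', cost_basis: 1200,
          cost_basis_method: 'specific_id', market_value: 3000 },
        { wallet_id: 'w2', asset: 'ETH', acquired: '2023-01-10', cost_basis: 1500,
          cost_basis_method: 'specific_id', market_value: 3000 },
      ],
      defi_positions: [],
    },
  },
};

// Sum current market value of lots, grouped by the custody type of the wallet
// that holds each lot. Lots pointing at an unknown wallet land in 'unknown'.
function marketValueByCustody(household) {
  const { wallets, lots } = household.assets.crypto;
  const custodyOf = new Map(wallets.map(w => [w.wallet_id, w.custody]));
  const totals = {};
  for (const lot of lots) {
    const custody = custodyOf.get(lot.wallet_id) ?? 'unknown';
    totals[custody] = (totals[custody] ?? 0) + lot.market_value;
  }
  return totals;
}

console.log(marketValueByCustody(sampleHousehold)); // { self: 60000, exchange: 6000 }
```

Joining lots to wallets through an identifier keeps the custody flag in one place, which is one plausible way to model the inventory; the actual files may embed it differently.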

The Data Set ships as JSON, CSV, and Parquet (Parquet is recommended for analytical work over the lot-level data — there are roughly 1,200 lots across the 50 households, so the columnar format makes ad-hoc queries fast). The WealthSynth Methodology PDF documents the wallet taxonomy, the DeFi position structures, the 1099-DA reconciliation methodology, and the calibration sources for typical crypto-heavy household profiles (a blend of Chainalysis, Glassnode, and IRS Cyber Crime Unit research).

Preview a sample household

A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 50 like it) ships in the ZIP.

B-02 · Overconfident DIY Investor (representative archetype household)
Household: Domestic Partnership
State: TX
Gross income (band): $100k–$200k
Net worth (band):
Dependents: 2
Income source types: w2 salary, w2 bonus
Members (4):
primary: age 40–44, retail
spouse: age 40–44, professional services
dependent: age 15–19
dependent: age 5–9

Technical Highlights

Wallet inventory with self-custody flag
Lot-level cost basis (FIFO + specific-ID)
DeFi position modeling (LP, staking, lending)
1099-DA reconciliation fields

Sample Schema Fields

sample_record.json
{
  "assets.crypto.wallets[]": <value>,
  "assets.crypto.lots[]": <value>,
  "assets.crypto.defi_positions[]": <value>,
  "income.staking_income": <value>,
  "taxes.form_8949_records": <value>
}

Sample queries

Find DeFi LP positions with impermanent-loss exposure

Returns LP positions where the constituent token prices have diverged by more than 25% since the deposit, indicating significant impermanent loss that the tax tool needs to handle correctly when the position is closed.

// Price-change fields are assumed to be fractions since deposit (0.25 = 25%).
households.flatMap(h =>
  h.assets.crypto.defi_positions.filter(p =>
    p.position_type === 'liquidity_provision' &&
    Math.abs(p.token_a_price_change_pct -
             p.token_b_price_change_pct) > 0.25
  )
)

Surface 1099-DA discrepancies

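Returns tax-reporting events where the exchange-reported gross proceeds (1099-DA) differ from the taxpayer's calculated gross proceeds, which forms the audit-defense queue. Proceeds mismatches typically trace to wallet-to-wallet transfer history or to gross-versus-net treatment of the sale; differences in cost-basis method (exchange uses FIFO; taxpayer used specific-ID) change the reported basis and gain rather than the proceeds themselves.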

// Flag records where the two proceeds figures differ by more than 1 (a rounding tolerance).
households.flatMap(h =>
  h.taxes.form_8949_records.filter(r =>
    Math.abs(r.exchange_reported_proceeds -
             r.taxpayer_calculated_proceeds) > 1
  )
)

Identify staking-income recognition events

Returns staking-yield events where rewards were received during the year — taxable as ordinary income at FMV at receipt under current IRS guidance. Useful for end-of-year income-recognition reconciliation.

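// currentTaxYear is assumed to be a year such as '2025' and
// received_date an ISO-style date string ('YYYY-MM-DD').
households.flatMap(h =>
  h.assets.crypto.defi_positions
    .filter(p => p.position_type === 'staking')
    .flatMap(p => p.rewards_history.filter(r =>
      r.received_date.startsWith(String(currentTaxYear))))
)

Track airdrop and hard-fork income events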

Returns airdrop or hard-fork token receipts during the year, with the FMV at receipt — the income-recognition events most often missed by taxpayers and most frequently flagged by IRS letters.

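// currentTaxYear is assumed to be a year such as '2025' and
// e.date an ISO-style date string ('YYYY-MM-DD').
households.flatMap(h =>
  h.events.life_events.filter(e =>
    ['airdrop', 'hard_fork'].includes(e.type) &&
    e.date.startsWith(String(currentTaxYear))
  ).map(e => ({
    household: h.id,
    event: e,
    income_recognition_amount: e.fmv_at_receipt
  }))
)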

Methodology

Each household's crypto profile is generated against archetype-specific portfolio profiles: spot-only Bitcoin maximalists vs. diversified altcoin holders vs. DeFi-active yield farmers. Wallet inventories are sampled to produce realistic distributions of self-custody vs. exchange-custody, with wallet-to-wallet transfer histories that produce the realistic 1099-DA matching complexity (where moving coins between wallets you control creates cost-basis tracking challenges that exchanges often fail to handle). DeFi position structures use realistic protocol mechanics: LP positions on Uniswap-style AMMs with constant-product math, staking positions on validator-delegate structures, lending positions on Aave-style protocols. NFT positions include a mix of investment-classified (held for appreciation) and inventory-classified (active trading) positions; the corpus distribution roughly matches IRS-published taxpayer-categorisation patterns. Airdrops and hard-forks are seeded at realistic per-year frequencies (about 1.5 events per crypto-active household per year on average). The corpus passes the WealthSynth consistency validator and LLM-as-judge gate. Annual refresh tracks IRS interpretive guidance updates and the rolling 1099-DA implementation status.

Included Archetypes (3)

B-02 Overconfident DIY Investors
N-01 Crypto-Heavy Portfolios
CR-01 Crypto-Heavy / DeFi Investors

Frequently asked questions

How is the 1099-DA reconciliation methodology calibrated?

The 1099-DA matching logic in the corpus reflects the IRS's published methodology for the 2025 phased rollout. The structured `taxpayer_calculated_proceeds` vs. `exchange_reported_proceeds` fields let your tools test the most common discrepancy sources: cost-basis method choice (FIFO vs. specific-ID), wallet-to-wallet transfer cost-basis tracking, and the exchange-versus-taxpayer view of gross-vs-net proceeds.
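The cost-basis method choice mentioned above can be illustrated with a small worked example. This is a minimal sketch, not the corpus's own logic: the lot fields (`lot_id`, `acquired`, `quantity`, `cost_basis`) and all numbers are hypothetical. It shows one sale producing a different basis, and therefore a different gain, under FIFO and under specific-ID.

```javascript
// Lots for one asset. All numbers are made up for illustration.
const exampleLots = [
  { lot_id: 'A', acquired: '2021-01-05', quantity: 1, cost_basis: 1000 },
  { lot_id: 'B', acquired: '2022-05-20', quantity: 1, cost_basis: 3000 },
];

// FIFO: consume the oldest lots first until the sold quantity is covered.
function fifoBasis(lots, quantitySold) {
  const ordered = [...lots].sort((a, b) => a.acquired.localeCompare(b.acquired));
  let remaining = quantitySold;
  let basis = 0;
  for (const lot of ordered) {
    if (remaining <= 0) break;
    const used = Math.min(lot.quantity, remaining);
    basis += lot.cost_basis * (used / lot.quantity); // pro-rata share of the lot's basis
    remaining -= used;
  }
  if (remaining > 0) throw new Error('sold more than held');
  return basis;
}

// Specific-ID: the taxpayer names the lot(s) being sold.
function specificIdBasis(lots, picks) {
  return picks.reduce((basis, { lot_id, quantity }) => {
    const lot = lots.find(l => l.lot_id === lot_id);
    if (!lot || quantity > lot.quantity) throw new Error(`bad pick: ${lot_id}`);
    return basis + lot.cost_basis * (quantity / lot.quantity);
  }, 0);
}

const saleProceeds = 3500; // one unit sold; the proceeds are the same under both methods
const fifoGain = saleProceeds - fifoBasis(exampleLots, 1);
const specGain = saleProceeds - specificIdBasis(exampleLots, [{ lot_id: 'B', quantity: 1 }]);
console.log({ fifoGain, specGain }); // { fifoGain: 2500, specGain: 500 }
```

A reconciliation tool would typically store which method each side used next to the amounts, so a mismatch of this kind can be explained rather than merely flagged.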

Are stablecoin positions treated as cash or property?

Per current IRS guidance, all crypto positions including stablecoins are treated as property. The corpus structures stablecoin holdings with cost basis tracking (relevant where a stablecoin temporarily depegs and triggers a sale at non-$1 price). Households holding only stablecoins for working-capital purposes still have lot-level structure for the rare events where the cost-basis matters.
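To make the depeg case concrete, here is a minimal sketch of the gain or loss on a stablecoin lot sold at and away from $1. The lot shape (`quantity`, `cost_basis`) and the prices are hypothetical, and fees are ignored.

```javascript
// Round to whole cents to keep floating-point noise out of the example.
const toCents = x => Math.round(x * 100) / 100;

// Gain or loss on disposing of part or all of a stablecoin lot,
// treating the stablecoin as property with its own cost basis.
function stablecoinGain(lot, quantitySold, salePricePerUnit) {
  if (quantitySold > lot.quantity) throw new Error('sold more than held');
  const proceeds = toCents(quantitySold * salePricePerUnit);
  const basis = toCents(lot.cost_basis * (quantitySold / lot.quantity));
  return { proceeds, basis, gain: toCents(proceeds - basis) };
}

// Hypothetical lot: 10,000 units bought at $1.00 each.
const usdcLot = { asset: 'USDC', quantity: 10000, cost_basis: 10000 };

console.log(stablecoinGain(usdcLot, 10000, 1.00)); // at peg: gain is 0
console.log(stablecoinGain(usdcLot, 10000, 0.97)); // during a depeg: gain is -300
```

At peg the disposition nets to zero, which is why the lot-level structure only matters in the rare off-peg sale.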

Does the wash-sale rule apply to crypto in this Data Set?

No. Per current IRS guidance, the wash-sale rule does not apply to crypto (since crypto is not a 'security' for §1091 purposes). The corpus reflects this — there are no wash-sale flags on crypto positions. If Congress changes this (proposed legislation has been floated multiple times), a future corpus refresh will incorporate the change.

How are DeFi LP positions structured?

LP positions carry the protocol name, the constituent tokens, the deposit history (initial deposit and any subsequent additions), the accrued LP-fee yield, the current value of the LP position, and the constant-product math indicating impermanent loss. When the LP is closed, the structured data supports the gain/loss calculation correctly under either the dispose-of-LP-tokens-as-asset approach or the underlying-token approach (which currently has interpretive ambiguity).
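For readers unfamiliar with the constant-product math referred to above, the sketch below uses the standard impermanent-loss formula for a two-token, 50/50 constant-product pool, ignoring fees. How the corpus actually stores this figure is not stated here, so the function names and the fraction-based inputs are assumptions.

```javascript
// Impermanent loss for a two-token, 50/50 constant-product pool (x * y = k), ignoring fees.
// priceRatio = (price of token A in terms of token B now) / (the same price at deposit).
// Returns LP value relative to simply holding the tokens, minus 1 (zero or negative).
function impermanentLoss(priceRatio) {
  if (!(priceRatio > 0)) throw new RangeError('priceRatio must be positive');
  return (2 * Math.sqrt(priceRatio)) / (1 + priceRatio) - 1;
}

// Derive the ratio from per-token price changes expressed as fractions (0.25 = +25%).
function priceRatioFromChanges(tokenAChange, tokenBChange) {
  return (1 + tokenAChange) / (1 + tokenBChange);
}

console.log(impermanentLoss(1));                               // no divergence: 0
console.log(impermanentLoss(priceRatioFromChanges(3, 0)));     // A quadruples vs. B: about -0.2
console.log(impermanentLoss(priceRatioFromChanges(-0.5, 0)));  // A halves vs. B: about -0.057
```

The loss depends only on how far the two prices have diverged from each other, not on their absolute level: both tokens doubling together gives a ratio of 1 and no impermanent loss.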

Are NFTs treated as investment or inventory?

Each NFT position has a `classification` field with values 'investment' (held for long-term appreciation, taxed at LTCG rates), 'inventory' (active trader, ordinary-income treatment), or 'collectible' (the 28% LTCG rate for collectibles applies). The classification mirrors the IRS's increasingly clarified position on NFT taxation. About 60% of NFT positions in the corpus are classified as collectibles, 30% as investment, 10% as inventory.

What about Section 1031 like-kind exchange treatment?

Section 1031 like-kind exchange treatment for crypto-to-crypto trades was disallowed effective 2018 (only real estate qualifies). The corpus reflects this — no crypto-to-crypto trades in the corpus claim §1031 deferral, regardless of when the original position was acquired.
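In practice this means a crypto-to-crypto swap is handled as a sale of the asset given up followed by a purchase of the asset received. The sketch below shows that bookkeeping under simplifying assumptions (fees ignored, whole lot swapped); the function name, field names, and amounts are all hypothetical.

```javascript
// Treat a crypto-to-crypto swap as a taxable disposition (no like-kind deferral).
function recordSwap(lotGivenUp, received) {
  const amountRealized = received.fmv_usd;              // FMV of what was received
  const gain = amountRealized - lotGivenUp.cost_basis;  // recognised at the time of the swap
  const newLot = {
    asset: received.asset,
    quantity: received.quantity,
    acquired: received.date,          // holding period restarts on the swap date
    cost_basis: received.fmv_usd,     // basis of the new lot is its FMV at receipt
  };
  return { gain, newLot };
}

// Hypothetical example: 10 ETH with a $2,500 basis swapped for 0.5 BTC worth $30,000.
const ethLot = { asset: 'ETH', quantity: 10, acquired: '2020-02-01', cost_basis: 2500 };
const swap = recordSwap(ethLot, { asset: 'BTC', quantity: 0.5, date: '2024-08-01', fmv_usd: 30000 });

console.log(swap.gain);               // 27500
console.log(swap.newLot.cost_basis);  // 30000
```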

Does the corpus include staking in proof-of-stake networks?

Yes. About 40% of the corpus has at least one staking position (typically ETH 2.0 staking, Solana, Polygon, or Cosmos). The staking rewards are recognised as ordinary income at FMV at receipt, per the IRS Rev. Rul. 2023-14 position. The structured `rewards_history` field on each staking position lets your tools enumerate and aggregate the income-recognition events.
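A minimal sketch of that aggregation follows. `defi_positions`, `position_type`, `rewards_history`, and `received_date` appear elsewhere in this page's sample queries; the per-reward `fmv_at_receipt` field, the `protocol` values, and the numbers are assumptions for illustration.

```javascript
// Sum staking rewards received in a given tax year, valued at FMV at receipt.
// `fmv_at_receipt` on each reward entry is an assumed field name.
function stakingIncomeForYear(household, taxYear) {
  const year = String(taxYear);
  return household.assets.crypto.defi_positions
    .filter(p => p.position_type === 'staking')
    .flatMap(p => p.rewards_history)
    .filter(r => r.received_date.startsWith(year))
    .reduce((total, r) => total + r.fmv_at_receipt, 0);
}

// Hypothetical household with one staking and one lending position.
const stakingHousehold = {
  id: 'hh-example',
  assets: { crypto: { defi_positions: [
    { position_type: 'staking', protocol: 'example-validator', rewards_history: [
      { received_date: '2024-03-31', fmv_at_receipt: 120 },
      { received_date: '2024-09-30', fmv_at_receipt: 95 },
      { received_date: '2025-01-15', fmv_at_receipt: 40 },
    ] },
    { position_type: 'lending', protocol: 'example-lender', rewards_history: [
      { received_date: '2024-06-30', fmv_at_receipt: 500 },
    ] },
  ] } },
};

console.log(stakingIncomeForYear(stakingHousehold, 2024)); // 215
```

Only the two 2024 staking rewards are counted; the 2025 reward and the lending yield are excluded by the year and position-type filters.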

How does this fit alongside B02 (TLH)?

B02 focuses on traditional taxable-account TLH (lot-level cost basis on equities). B21, this Data Set, focuses on the digital-asset side. They're complementary: an HNW client with both brokerage and crypto positions benefits from both bundles, and tax-tech tools serving that client typically need both. Buying B02 and B21 together is the common pattern.


$6,000
one-time purchase
50 households (ZIP)
Methodology PDF
JSON, CSV, Parquet formats
Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.
