Modeling Plaid, Yodlee, Akoya, and MX outputs in synthetic households

WealthSchema Staff · Integration patterns · May 9, 2026 · 6 min read

A wealth-tech platform that integrates with one or more aggregators sees the world through that aggregator's normalization layer. The underlying institution might be Schwab, Fidelity, Vanguard, a small RIA on Pershing's clearing platform, or a credit union with a Symitar core — but the data your platform receives is normalized to the aggregator's own schema, with all the loss-of-fidelity and edge-case-papering-over that implies. This article walks through what the four major US aggregators (Plaid, Yodlee, Akoya, MX) each return for investment and brokerage data, where their schemas diverge, and what your test data has to look like to exercise the integration honestly.

The four aggregators in one paragraph each

Plaid. The largest US aggregator by consumer-app integrations. Originally banking-focused; the Investments product launched in 2018 and has been the most-adopted aggregator endpoint for fintechs that need brokerage data without going custodian-direct. Output schema is documented; FDX-conformant for most institutions. Position-level data is provided, but lot-level basis is not consistently available even when the underlying institution supports it.

Yodlee. The original US aggregator (now an Envestnet subsidiary), with the deepest institutional coverage and the longest history with wealth-management buyers. Output is richer than Plaid's at the wealth end (some lot-level basis, more granular account-type taxonomy, better support for alternative-investment fields), at the cost of higher integration complexity. Yodlee's schema predates FDX and remains its own thing.

Akoya. A bank-consortium-backed aggregator (Fidelity, US Bank, Wells Fargo, others) that emerged from the post-2018 push for token-based access without screen scraping. Akoya is FDX-native and routes data through institutions' own published APIs; the result is high-fidelity data on a smaller institution footprint than Plaid or Yodlee. The wealth-management piece is recent and is the area most actively expanding.

MX. Originally a credit-union-focused aggregator; broader investment-data product launched more recently. MX's distinguishing claim is data-quality enrichment (categorization, transaction labeling, normalization) layered on top of the raw aggregator feed. Investment data is FDX-aligned; the depth varies by institution.

Field / capability	Plaid	Yodlee	Akoya	MX
FDX conformance	Mostly	Hybrid	Native	Aligned
Position-level data	Yes	Yes	Yes	Yes
Lot-level cost basis	Inconsistent	Often	Institution-dependent	Inconsistent
Pending transactions	Yes	Yes	Institution-dependent	Yes
Corporate-action history	Limited	Yes	Native via institution API	Limited
Tokenized access (no screen-scraping)	Mostly migrated	Mixed	Native	Native
Account-type taxonomy depth	~30 types	~80 types	FDX-defined	~50 types
Distribution-character (qualified vs RoC)	Generally absent	Sometimes present	Institution-dependent	Generally absent

The four traps your test data has to surface

1. Account-type taxonomy is not portable

Plaid's account-type enum has roughly 30 values; Yodlee's has roughly 80; FDX defines its own. A "Roth IRA" is 'roth' in Plaid, 'ROTH_IRA' in Yodlee, 'IRA_ROTH' in some FDX contexts, and a separate field-pair (account_type='IRA', subtype='ROTH') in Akoya. A wealth-tech platform that only tested against Plaid's enum and then added Yodlee will discover all of these mappings the hard way — and worse, will discover the mappings the underlying institutions sometimes get wrong (a Fidelity Inherited Roth IRA reported as a regular Roth IRA, a Schwab self-directed IRA reported as a generic IRA).

Realistic synthetic data has to include households with multiple aggregators feeding the same household — and the test corpus has to include the same logical account reported under different aggregator taxonomies, so the platform's normalization layer can be exercised against the actual mapping problem.
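
The mapping problem can be sketched as a lookup keyed by aggregator, type, and subtype. This is a minimal illustration, not any vendor's actual enum: the raw values are only the Roth IRA examples quoted above, and `CanonicalAccountType`, `ACCOUNT_TYPE_MAP`, and `normalize_account_type` are hypothetical names introduced here.

```python
from enum import Enum
from typing import Optional


class CanonicalAccountType(Enum):
    ROTH_IRA = "roth_ira"
    UNKNOWN = "unknown"


# Hypothetical mapping table keyed by (aggregator, type, subtype).
# The raw values mirror the examples in the text, not a full vendor enum.
ACCOUNT_TYPE_MAP = {
    ("plaid", "roth", None): CanonicalAccountType.ROTH_IRA,
    ("yodlee", "ROTH_IRA", None): CanonicalAccountType.ROTH_IRA,
    ("fdx", "IRA_ROTH", None): CanonicalAccountType.ROTH_IRA,
    ("akoya", "IRA", "ROTH"): CanonicalAccountType.ROTH_IRA,
}


def normalize_account_type(
    aggregator: str, raw_type: str, raw_subtype: Optional[str] = None
) -> CanonicalAccountType:
    """Map an aggregator-specific account type to the canonical enum.

    Unmapped combinations return UNKNOWN instead of raising, so a
    deliberately wrong fixture surfaces as data rather than as a crash.
    """
    key = (aggregator.lower(), raw_type, raw_subtype)
    return ACCOUNT_TYPE_MAP.get(key, CanonicalAccountType.UNKNOWN)
```

Returning `UNKNOWN` is a design choice for test corpora: a mis-reported account (for example an Akoya `IRA` with no subtype) then shows up in assertions and dashboards instead of aborting the ingest run.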

2. Lot-level basis is the missing layer

The single largest fidelity gap between aggregator output and custodian-direct data is lot-level cost basis. All four aggregators provide position-level data (you own 137 shares of VTI). None of them consistently provide lot-level basis (you own 50 shares acquired 2022-04-12 at $215, 75 shares acquired 2023-09-05 at $238, 12 shares from DRIP at varying dates and prices).

For any tax-aware feature, such as tax-loss harvesting (TLH), tax-gain harvesting, Roth conversion modeling, or charitable gifting of appreciated shares, lot-level data is not optional. Aggregator-only platforms ship with a known correctness gap on these features, and the gap manifests as the platform showing aggregate basis that disagrees with the custodian's tax-lot statement at year-end.

The synthetic-data implication: realistic test data has to include both the aggregator-shape view (position-level only) and the custodian-shape view (lot-level), so the platform's tax-aware code paths can be exercised against the data shape they'll actually see in production.

// Plaid investments holdings response (simplified)
{
  "account_id": "P_abc123",
  "security_id": "S_VTI",
  "quantity": 137,
  "institution_price": 245.18,
  "cost_basis": 31498.00,           // aggregate, not lot-level
  "iso_currency_code": "USD"
}

// Yodlee holdings response (simplified)
{
  "id": 11838291,
  "accountId": 12001838,
  "securityId": 8211,
  "symbol": "VTI",
  "quantity": 137,
  "value": { "amount": 33589.66, "currency": "USD" },
  "costBasis": { "amount": 31498.00, "currency": "USD" },   // equals the sum of the lots below
  "lotInfo": [    // present for some institutions, absent for many
    { "quantity": 50, "acquisitionDate": "2022-04-12", "costBasis": 10750.00 },
    { "quantity": 75, "acquisitionDate": "2023-09-05", "costBasis": 17850.00 },
    { "quantity": 12, "acquisitionDate": "2024-12-15", "costBasis": 2898.00 }
  ]
}

// FDX investments accountHoldings (simplified)
{
  "holdingId": "h-7821",
  "securityId": "VTI",
  "units": 137,
  "marketValue": 33589.66,
  "costBasis": 31498.00,
  "lots": [
    /* FDX 5.x defines a Lot resource; institution support varies */
  ]
}
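
To exercise both data shapes in one code path, the payloads above can be normalized into a common record where lots are explicitly optional. The following is a rough sketch under stated assumptions: it reads only the simplified field names shown in the samples (not the vendors' full schemas), omits the FDX shape for brevity, and `Holding`, `Lot`, `from_plaid`, `from_yodlee`, and `lots_reconcile` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Lot:
    quantity: Decimal
    acquired: str  # ISO date string, kept as reported
    cost_basis: Decimal


@dataclass(frozen=True)
class Holding:
    source: str
    quantity: Decimal
    cost_basis: Optional[Decimal]
    lots: Optional[Tuple[Lot, ...]]  # None means "not reported", not "no lots"


def _dec(value) -> Decimal:
    # Go through str() so float inputs such as 245.18 do not pick up binary noise.
    return Decimal(str(value))


def from_plaid(raw: dict) -> Holding:
    """Position-level only: the simplified Plaid shape carries no lots."""
    basis = raw.get("cost_basis")
    return Holding("plaid", _dec(raw["quantity"]),
                   None if basis is None else _dec(basis), None)


def from_yodlee(raw: dict) -> Holding:
    """Lots are kept when lotInfo is present and non-empty."""
    basis = raw.get("costBasis")
    lot_info = raw.get("lotInfo")
    lots = None
    if lot_info:
        lots = tuple(
            Lot(_dec(lot["quantity"]), lot["acquisitionDate"], _dec(lot["costBasis"]))
            for lot in lot_info
        )
    return Holding("yodlee", _dec(raw["quantity"]),
                   None if basis is None else _dec(basis["amount"]), lots)


def lots_reconcile(h: Holding, tolerance: Decimal = Decimal("0.01")) -> Optional[bool]:
    """True/False when lots can be compared to the aggregate; None when they cannot."""
    if h.lots is None or h.cost_basis is None:
        return None
    qty_ok = sum((lot.quantity for lot in h.lots), Decimal(0)) == h.quantity
    basis_total = sum((lot.cost_basis for lot in h.lots), Decimal(0))
    return qty_ok and abs(basis_total - h.cost_basis) <= tolerance
```

The three-valued result of `lots_reconcile` is deliberate: tax-aware code needs to distinguish "lots disagree with the aggregate" from "lots were never reported", and a plain boolean would collapse the two.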

3. Distribution character is institution-dependent

Aggregators normalize transaction streams to their own categorization (income.dividends, income.interest, transfer.deposit). What gets lost downstream is distribution character: the aggregator typically reports a $112.50 distribution on O (Realty Income) as income.dividends, even when the year-end 1099-DIV splits it across several characters (illustratively, ~50% qualified, 30% return-of-capital, 15% ordinary income, 5% Section 199A).

For tax-engine testing, this is a real problem. A platform that ingests aggregator data for tax preparation features will produce wrong 1040 numbers unless it either (a) overrides aggregator classifications with year-end 1099-DIV data, or (b) handles RoC and 199A as a separate post-processing layer. Test data has to include distributions where the aggregator's income.dividends label hides a multi-character underlying breakdown — and the platform has to be tested for handling the year-end reclassification correctly.
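
The year-end reclassification can be sketched as a split of one aggregator-labeled payment into per-character amounts that still sum to the original to the cent. This is an illustrative sketch only: `split_distribution` and the character labels are hypothetical, the fractions would come from the issuer's year-end reporting, and the function assumes a positive amount expressed in whole cents.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def split_distribution(amount: Decimal, fractions: dict) -> dict:
    """Split a distribution into tax characters, keeping the total exact to the cent.

    `fractions` maps a character label to its share of the payment and must sum to 1.
    Assumes `amount` is positive and already rounded to cents.
    """
    if amount < 0:
        raise ValueError("amount must not be negative")
    if sum(fractions.values(), Decimal(0)) != Decimal(1):
        raise ValueError("fractions must sum to 1")

    exact = {label: amount * share for label, share in fractions.items()}
    parts = {label: value.quantize(CENT, rounding=ROUND_DOWN)
             for label, value in exact.items()}

    # Rounding down can strand a few cents; count them.
    leftover = int((amount - sum(parts.values(), Decimal(0))) / CENT)

    # Hand leftover cents to the buckets that lost the most to rounding down.
    by_loss = sorted(exact, key=lambda label: exact[label] - parts[label], reverse=True)
    for label in by_loss[:leftover]:
        parts[label] += CENT
    return parts


if __name__ == "__main__":
    breakdown = split_distribution(
        Decimal("112.50"),
        {
            "qualified": Decimal("0.50"),
            "return_of_capital": Decimal("0.30"),
            "ordinary": Decimal("0.15"),
            "section_199a": Decimal("0.05"),
        },
    )
    for label, value in breakdown.items():
        print(label, value)
```

A largest-remainder allocation is used here so that the per-character amounts never drift from the cash actually received; naive per-bucket rounding of this example would overshoot the payment by one cent.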

4. Pending vs. settled transactions

Each aggregator handles pending transactions differently. Plaid surfaces a pending: true flag and re-emits the transaction with pending: false when it settles (with a different transaction_id). Yodlee uses a separate "running" balance vs. "available" balance pair. Akoya inherits the institution's own pending semantics. MX has its own conventions.

The test-data trap: most mock-data tools generate only settled transactions. Real production streams have 5–15% pending at any given moment, those pending transactions often have estimated amounts that change on settlement, and the same logical transaction can have different identifiers across pending and settled states. A platform tested only against settled-only mock data will have idempotency bugs on the pending-to-settled transition.
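
One way to make the pending-to-settled transition idempotent is to key the ledger by transaction identifier and let a settled record retire the pending record it replaces. The sketch below assumes the settled record carries a linking field, named `pending_transaction_id` here; whether a given feed exposes such a link is an assumption to verify per aggregator. `Ledger` and `amount_cents` are hypothetical names for this illustration.

```python
from typing import Dict, Set


class Ledger:
    """Toy in-memory ledger that tolerates replays and out-of-order delivery."""

    def __init__(self) -> None:
        self.transactions: Dict[str, dict] = {}
        self._superseded: Set[str] = set()

    def apply(self, txn: dict) -> None:
        txn_id = txn["transaction_id"]
        if txn_id in self._superseded:
            # Late replay of a pending record whose settled twin already arrived.
            return

        replaced = txn.get("pending_transaction_id")  # assumed linking field
        if replaced is not None:
            self._superseded.add(replaced)
            self.transactions.pop(replaced, None)

        # Plain upsert: replaying the same record overwrites itself.
        self.transactions[txn_id] = txn

    def total_cents(self) -> int:
        return sum(t["amount_cents"] for t in self.transactions.values())


if __name__ == "__main__":
    pending = {"transaction_id": "p1", "pending": True, "amount_cents": 10000}
    settled = {"transaction_id": "s1", "pending": False, "amount_cents": 10050,
               "pending_transaction_id": "p1"}

    ledger = Ledger()
    for event in (pending, settled, settled, pending):
        ledger.apply(event)
    print(sorted(ledger.transactions), ledger.total_cents())
```

Feeds without a linking field need a fuzzier match (account, date window, approximate amount), which is exactly the case a settled-only mock corpus never exercises.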

What FDX changes (and what it doesn't)

The Financial Data Exchange (FDX) standard, now at version 6.x, is the industry's attempt to converge aggregator and institution data exchange on a single open spec. FDX defines an OAuth-based tokenized access model, a JSON resource taxonomy (Account, Holding, Transaction, Customer, Statement, etc.), and a conformance certification.

What FDX changes: the access model. Tokenized access via FDX-conformant institutions eliminates screen-scraping and replaces credential storage with refresh-token-based authorization. This is a security improvement, a reliability improvement (institutions can issue/revoke tokens without breaking aggregator integration), and a compliance improvement (auditable access trails).

What FDX doesn't change: the schema-fidelity floor. FDX defines a Holding resource with a costBasis field; it doesn't mandate that the institution populate lot-level information. FDX defines a Transaction resource with categorical taxonomy; it doesn't mandate that the institution populate per-distribution character. Two FDX-conformant institutions can return data of dramatically different fidelity for the same underlying account.

Realistic synthetic test data has to include both FDX-conformant and non-FDX paths, and the FDX-conformant data has to span the conformance levels (Phase 1 baseline through Phase 4 advanced). Platforms that test only against FDX Phase 4 data will under-test the migration scenarios where most production integrations live.

A working test-data shape

Pulling the four traps together: a realistic test corpus for an aggregator-integrated platform has to cover the following layers:

Layer	Required test cases
Aggregator view	At least one household with each of: Plaid-shape, Yodlee-shape, Akoya/FDX-shape, MX-shape
Account-type mapping	Same logical account (e.g. Roth IRA) reported under each aggregator's taxonomy; including a deliberately-wrong one to test mapping failure handling
Lot-level depth	Households with lot-level data present (some aggregators, some institutions) and households without (others); platform's tax-aware code path tested under both
Pending transactions	At least 5% of transactions in pending state at snapshot time, with the corresponding settlement transactions present in subsequent snapshots
Distribution character	Households holding REIT/MLP/BDC where the aggregator's `income.dividends` classification hides a multi-character year-end breakdown
FDX conformance levels	Households spanning Phase 1 through Phase 4 conformance to test migration scenarios
Token-state edge cases	Token expiry, token revocation, institution-side credential reset; the platform's re-auth flow exercised
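
A few of the rows above are mechanical enough to check automatically before a corpus is accepted. The sketch below covers three of them (aggregator shapes, pending share, lot-level depth) and assumes a hypothetical household layout with `aggregator`, `transactions`, and `positions` keys; none of these names come from a real catalog format.

```python
from typing import Iterable, List

REQUIRED_SHAPES = {"plaid", "yodlee", "akoya", "mx"}
MIN_PENDING_SHARE = 0.05  # the 5% floor from the table above


def corpus_gaps(households: Iterable[dict]) -> List[str]:
    """Return human-readable gaps; an empty list means these checks passed."""
    households = list(households)
    gaps: List[str] = []

    # Aggregator view: every shape must appear at least once.
    shapes = {h["aggregator"] for h in households}
    missing = sorted(REQUIRED_SHAPES - shapes)
    if missing:
        gaps.append("missing aggregator shapes: " + ", ".join(missing))

    # Pending transactions: share of pending records across the whole snapshot.
    txns = [t for h in households for t in h.get("transactions", [])]
    pending = sum(1 for t in txns if t.get("pending"))
    if not txns or pending / len(txns) < MIN_PENDING_SHARE:
        gaps.append("fewer than 5% of transactions are pending")

    # Lot-level depth: need positions both with and without lots.
    positions = [p for h in households for p in h.get("positions", [])]
    has_lots = any(p.get("lots") for p in positions)
    no_lots = any(not p.get("lots") for p in positions)
    if not (has_lots and no_lots):
        gaps.append("need positions both with and without lot-level data")

    return gaps
```

The remaining rows (distribution character, conformance levels, token-state edge cases) depend on behavior over time rather than on a single snapshot, so they are better covered by scenario tests than by a static check like this one.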

How this shows up in our catalog

The institutional bundles in the catalog ship with an aggregator_view overlay per household for each major aggregator's shape — the same canonical household JSON projected through Plaid-, Yodlee-, Akoya-, and MX-style schemas. The lot-level data is in the canonical view; the aggregator views project (or hide) lot data per the actual institution-by-aggregator availability matrix. Test corpora can request specific aggregator-view combinations as part of the bundle configuration.
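
As a rough illustration of the projection idea (not the catalog's actual format, which is not shown here), a canonical position with lots can be projected into an aggregator-style view that keeps or hides the lots according to an availability matrix. Everything named below, including `LOT_AVAILABILITY`, `project_position`, and the `institution_a` key, is hypothetical.

```python
import copy
from decimal import Decimal

# Hypothetical availability matrix: does this aggregator pass lots through
# for this institution? Real entries have to come from observed behavior.
LOT_AVAILABILITY = {
    ("yodlee", "institution_a"): True,
    ("plaid", "institution_a"): False,
}


def project_position(canonical: dict, aggregator: str) -> dict:
    """Project a canonical position into an aggregator-style view.

    The aggregate cost basis is always derived from the canonical lots;
    the lots themselves are included only where the matrix says they are visible.
    """
    view = copy.deepcopy(canonical)  # never mutate the canonical record
    lots = view.pop("lots", [])
    view["cost_basis"] = sum((lot["cost_basis"] for lot in lots), Decimal(0))
    if LOT_AVAILABILITY.get((aggregator, canonical["institution"]), False):
        view["lots"] = lots
    view["aggregator"] = aggregator
    return view
```

Defaulting unknown aggregator and institution pairs to "lots hidden" matches the conservative case: the tax-aware code path then has to prove it degrades sensibly when only the aggregate is available.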

For the deeper view of how aggregator data interacts with custodian-direct feeds in a single household, see Reconciling aggregator output with custodian source-of-truth. For the procurement-side framing, see Aggregator API vs. direct custodian feed. For the umbrella view, see Aggregator & Custodian Integration.