Aggregator & Custodian Integration — Test Data That Survives the Round Trip

Updated May 9, 2026 · 4 min read

The single largest hidden surface in any wealth-tech platform is the integration layer. The product UI shows positions, balances, and performance. The integration layer is what populates them — pulling from one or more aggregators (Plaid, Yodlee, Akoya, MX), one or more direct custodian feeds (Schwab, Fidelity, Pershing, BNY Mellon), or both, then reconciling the streams into a single household record. Every line of integration code is a place where realistic test data can find a bug that mock data hides.

This theme covers what good integration test data looks like: the aggregator-output shapes, the custodian-direct quirks, the ACATS transfer flow, and the reconciliation contract that the two streams have to satisfy when both feed the same account.

Metric	Value	Note
US households with an aggregator link	60%+	Plaid claims ~12,000 institutions and 1 in 4 US adults; Yodlee/Akoya/MX cover the long tail
ACATS transfers per year	~9.5M	DTCC reported volume; partial transfers are the majority and the harder test case
Reconciliation breaks per 1,000 accounts	5–20	Typical aggregator-vs-custodian discrepancy rate on actively traded brokerage accounts
FDX-conformant institutions	~95%	By account count, though the spec has optional fields and conformance levels that aren't always met

Why integration is its own data problem

A naive view: an aggregator returns positions and balances; the platform stores them; that's the data layer. The view falls apart on the second case. Aggregators normalize across institutions, but the normalization is lossy — Plaid's investments product has a different schema from Yodlee's wealth product, both lose information present in the underlying custodian feed, and neither captures the lot-level data needed for any tax-aware feature. Direct custodian feeds (FIX, FpML, OFX, NACHA, custodian-specific REST/SFTP) are richer but heterogeneous; reconciling Pershing's daily file with Schwab's intraday API requires hand-written translation logic per source.
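A minimal sketch of that lossy normalization, assuming two made-up payload shapes. The field names (account_id, ticker, units, costBasis) and the source labels aggregator_a and aggregator_b are hypothetical and are not the real Plaid or Yodlee schemas. The point it illustrates is that both sources collapse into one internal record, types have to be coerced explicitly, and the record has to say out loud that lot-level data is absent.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Position:
    source: str
    account_ref: str
    symbol: str
    quantity: Decimal
    cost_basis: Optional[Decimal]  # None when the source omits it
    has_lots: bool                 # False: neither hypothetical source carries lots


def from_aggregator_a(p: dict) -> Position:
    # Hypothetical shape: numeric fields arrive as JSON floats.
    basis = p.get("cost_basis")
    return Position(
        source="aggregator_a",
        account_ref=p["account_id"],
        symbol=p["ticker"],
        quantity=Decimal(str(p["quantity"])),  # go via str to avoid float noise
        cost_basis=Decimal(str(basis)) if basis is not None else None,
        has_lots=False,
    )


def from_aggregator_b(p: dict) -> Position:
    # Hypothetical shape: numeric fields arrive as strings, basis is nested.
    basis = p.get("costBasis")
    return Position(
        source="aggregator_b",
        account_ref=str(p["accountId"]),  # integer id in this made-up shape
        symbol=p["symbol"],
        quantity=Decimal(p["units"]),
        cost_basis=Decimal(basis["amount"]) if basis else None,
        has_lots=False,
    )


a = from_aggregator_a(
    {"account_id": "acc-1", "ticker": "VTI", "quantity": 10.5, "cost_basis": 2100.0}
)
b = from_aggregator_b(
    {"accountId": 991, "symbol": "VTI", "units": "10.5", "costBasis": None}
)
```

Test data that only ever uses one of these shapes never exercises the coercion paths or the missing-basis branch, which is where the silent type mismatches tend to live.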

The integration layer is where the platform decides how much of this complexity to surface to its own engine. Most platforms decide too late, then have to refactor when the first major customer asks for cross-custodian tax-loss harvesting or the first audit reveals that aggregator-fed positions don't reconcile with custodian-direct feeds for the same account.

Realistic synthetic test data is how you find this before production. The four pieces below cover the major dimensions.

Failure class	What goes wrong	What breaks
Aggregator-output drift	Schema differences across Plaid/Yodlee/Akoya/MX surface as silent type mismatches; account-type taxonomy is not portable	Onboarding flows, position aggregation, multi-aggregator deployments
ACATS edge cases	Partial transfers, in-kind lots, fractional shares, and transfers that overlap a corporate action all produce inconsistent state during the 5–7 business-day settlement window	New-account funding, account migration, broker-to-broker transfers
Custodian-specific quirks	Account-number formats, statement frequency, lot-relief defaults, intra-month vs end-of-month snapshots all differ; mock data treats them as uniform	Multi-custodian platforms, RIA aggregation, family-office consolidation
Reconciliation breaks	Aggregator and custodian disagree on the same account; duplicate-account detection fails; the institution_id taxonomy fragments	Any platform that uses both aggregator and direct feeds, which is most institutional ones

The four pieces under this theme

Modeling aggregator outputs

"Modeling Plaid, Yodlee, Akoya, and MX outputs in synthetic households" is the schema-level walkthrough of the four major aggregators' output shapes: what they share, where they diverge, and what your mock data has to look like to exercise an aggregator-driven onboarding flow honestly. It also covers the FDX-conformance question and what changes when you migrate from a non-FDX aggregator to an FDX one.
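One way to keep a non-portable account-type taxonomy from failing silently is to map every (source, raw label) pair explicitly and raise on anything unmapped. The sketch below assumes invented labels; the strings "brokerage", "INVESTMENT_TAXABLE", and the source names aggregator_a and aggregator_b are hypothetical and do not come from any vendor's documentation.

```python
from enum import Enum


class AccountType(Enum):
    TAXABLE_BROKERAGE = "taxable_brokerage"
    TRADITIONAL_IRA = "traditional_ira"
    ROTH_IRA = "roth_ira"


# Hypothetical raw labels per source; real aggregators use their own vocabularies.
TAXONOMY = {
    ("aggregator_a", "brokerage"): AccountType.TAXABLE_BROKERAGE,
    ("aggregator_a", "ira"): AccountType.TRADITIONAL_IRA,
    ("aggregator_a", "roth"): AccountType.ROTH_IRA,
    ("aggregator_b", "INVESTMENT_TAXABLE"): AccountType.TAXABLE_BROKERAGE,
    ("aggregator_b", "IRA_TRADITIONAL"): AccountType.TRADITIONAL_IRA,
}


class UnmappedAccountType(KeyError):
    """Raised instead of defaulting, so a new label cannot slip through quietly."""


def map_account_type(source: str, raw: str) -> AccountType:
    try:
        return TAXONOMY[(source, raw)]
    except KeyError:
        raise UnmappedAccountType(f"{source}:{raw}") from None
```

Synthetic households that include a label missing from the table (here, a Roth account from aggregator_b) check that onboarding stops with a clear error rather than filing the account under a default type.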

ACATS modeling

"ACATS modeling: partial transfers, in-kind lots, settlement-window traps" is the deep dive on the Automated Customer Account Transfer Service flow. It covers full vs. partial transfers, how in-kind lots preserve their cost basis and acquisition date through the transfer, what happens when a corporate action lands during the settlement window, and the rejection codes your engine has to handle.
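The basis-and-date preservation can be sketched as a lot split. This is an illustration under stated assumptions, not the ACATS protocol itself: the Lot type, the oldest-lots-first selection, and the cent-level pro rata basis allocation are all choices made for the example, and the sketch does not model how a given firm treats fractional shares or rejections.

```python
from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Lot:
    symbol: str
    quantity: Decimal
    cost_basis: Decimal  # total basis for the lot, not per share
    acquired: date


def split_lot(lot: Lot, qty: Decimal) -> tuple[Lot, Lot]:
    """Split qty off a lot. Basis is allocated pro rata; the date is unchanged."""
    if not (Decimal(0) < qty <= lot.quantity):
        raise ValueError("quantity out of range for lot")
    moved_basis = (lot.cost_basis * qty / lot.quantity).quantize(Decimal("0.01"))
    moved = replace(lot, quantity=qty, cost_basis=moved_basis)
    kept = replace(
        lot, quantity=lot.quantity - qty, cost_basis=lot.cost_basis - moved_basis
    )
    return moved, kept


def partial_transfer(lots: list[Lot], symbol: str, qty: Decimal):
    """Move qty of symbol in kind, oldest lots first (an assumption of this sketch).

    Returns (transferred_lots, remaining_lots).
    """
    transferred, remaining = [], []
    to_move = qty
    for lot in sorted(lots, key=lambda l: l.acquired):
        if lot.symbol != symbol or to_move == 0:
            remaining.append(lot)
            continue
        take = min(lot.quantity, to_move)
        moved, kept = split_lot(lot, take)
        transferred.append(moved)
        if kept.quantity > 0:
            remaining.append(kept)
        to_move -= take
    if to_move > 0:
        raise ValueError("not enough shares to transfer")
    return transferred, remaining


lots = [
    Lot("VTI", Decimal("10"), Decimal("2000.00"), date(2021, 3, 1)),
    Lot("VTI", Decimal("5.5"), Decimal("1375.00"), date(2023, 6, 15)),
]
moved, kept = partial_transfer(lots, "VTI", Decimal("12.25"))
```

The invariants worth asserting in test data are conservation ones: total quantity and total basis across the delivering and receiving sides equal the pre-transfer totals, and every transferred lot keeps its original acquisition date.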

Custodian quirks

"Custodian-specific data quirks: Schwab, Fidelity, Pershing, BNY Mellon" covers the four custodians most wealth-tech platforms have to integrate with directly: account-number formats, lot-relief defaults, statement-cycle gotchas, the post-Schwab/TDA reconciliation problem, and what realistic test data has to model per custodian.
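To see why a lot-relief default matters in test data, the sketch below runs the same sale under two relief methods. FIFO and HIFO are used only as examples; no claim is made here about which method any named custodian defaults to, and TaxLot and realized_gain are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class TaxLot:
    quantity: Decimal
    unit_cost: Decimal
    acquired: date


# Sort keys that decide which lots are relieved first.
RELIEF_ORDER = {
    "FIFO": lambda lot: lot.acquired,    # oldest first
    "HIFO": lambda lot: -lot.unit_cost,  # highest cost first
}


def realized_gain(lots, sell_qty: Decimal, price: Decimal, method: str) -> Decimal:
    """Realized gain from selling sell_qty at price under the given relief method."""
    remaining = sell_qty
    gain = Decimal(0)
    for lot in sorted(lots, key=RELIEF_ORDER[method]):
        if remaining == 0:
            break
        take = min(lot.quantity, remaining)
        gain += take * (price - lot.unit_cost)
        remaining -= take
    if remaining > 0:
        raise ValueError("sale exceeds held quantity")
    return gain


lots = [
    TaxLot(Decimal("10"), Decimal("100"), date(2020, 1, 15)),
    TaxLot(Decimal("10"), Decimal("150"), date(2022, 5, 2)),
]
fifo = realized_gain(lots, Decimal("10"), Decimal("160"), "FIFO")  # 10 * (160 - 100)
hifo = realized_gain(lots, Decimal("10"), Decimal("160"), "HIFO")  # 10 * (160 - 150)
```

With these two lots the identical sale realizes 600 under FIFO and 100 under HIFO, so mock data that assumes one uniform method across custodians will not surface the difference.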

Reconciliation

"Reconciling aggregator output with custodian source-of-truth" is the cross-stream piece. When both an aggregator and a direct custodian feed populate the same account, the two will disagree on edge cases such as fractional-share rounding, intraday vs. end-of-day balances, and distribution-character classification. The article covers the reconciliation contract and the test-data requirements for exercising it.
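A reconciliation check of this kind can be sketched as a tolerance comparison over two symbol-to-quantity maps. The tolerance value, the break reason strings, and the function name are assumptions for illustration; a real contract would also have to cover balances, timing, and classification differences.

```python
from decimal import Decimal

# Hypothetical tolerance: absorbs fractional-share rounding, nothing larger.
QTY_TOLERANCE = Decimal("0.0001")


def reconcile(aggregator: dict, custodian: dict, tol: Decimal = QTY_TOLERANCE):
    """Compare symbol -> quantity maps for one account.

    Returns a list of breaks as (symbol, aggregator_qty, custodian_qty, reason).
    """
    breaks = []
    for symbol in sorted(set(aggregator) | set(custodian)):
        agg = aggregator.get(symbol)
        cust = custodian.get(symbol)
        if agg is None:
            breaks.append((symbol, None, cust, "missing_in_aggregator"))
        elif cust is None:
            breaks.append((symbol, agg, None, "missing_in_custodian"))
        elif abs(agg - cust) > tol:
            breaks.append((symbol, agg, cust, "quantity_mismatch"))
    return breaks


aggregator_view = {
    "VTI": Decimal("10.5"),
    "AAPL": Decimal("3.3333"),  # rounded to 4 places by the aggregator
    "BND": Decimal("20"),
}
custodian_view = {
    "VTI": Decimal("10.5"),
    "AAPL": Decimal("3.33333"),
    "BND": Decimal("25"),
    "CASH": Decimal("150.00"),
}
breaks = reconcile(aggregator_view, custodian_view)
```

Here the AAPL rounding difference falls inside the tolerance, while BND (a real quantity mismatch) and CASH (absent from the aggregator view) are reported. Useful test data seeds each break type deliberately so the expected break list is known in advance.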

The methodology comparison

"Aggregator API vs. direct custodian feed" is the procurement-side comparison: when each approach belongs in a wealth-tech stack, where the cost-and-fidelity tradeoff lands for different use cases, and the hybrid pattern most institutional platforms converge on.

Supporting glossary terms

ACATS — the DTCC-operated transfer service that moves brokerage accounts between firms in 5–7 business days.
DTC — the Depository Trust Company, the central securities depository that holds the actual share certificates and processes corporate actions in aggregate.
ACH — the Automated Clearing House network that moves cash between bank accounts and is the backbone of most non-wire money movement.
FDX — the Financial Data Exchange, the open standard that aggregators and institutions use to exchange consumer financial data via tokenized access.
Aggregator API — the API category covering Plaid, Yodlee, Akoya, MX, and others — third-party services that normalize data across thousands of financial institutions.
Tokenized account access — the OAuth-style authorization model that's replacing screen-scraping for aggregator-to-institution data flow.

Where this connects

Integration testing intersects with several other content threads:

Time-Series Fidelity in Synthetic Wealth Data — the time-series properties that aggregator and custodian feeds carry differently.
Onboarding fintech clients with no PII exposure — the playbook for using synthetic data in pre-production integration testing.
Migrate prod to synthetic — the migration pattern most teams adopt when their integration stack outgrows production-data testing.
Lot-level basis tracking data model — the lot-level fidelity that aggregator outputs typically lack and direct custodian feeds carry.
Modeling corporate actions in synthetic portfolios — the corporate-action handling that interacts with ACATS settlement windows.