A wealth-tech platform that uses both aggregator data (Plaid/Yodlee/Akoya/MX) and direct custodian feeds (Schwab/Fidelity/Pershing/BNY) for the same household — which is the typical case at institutional scale — has to handle a fundamental data-quality question: which feed is the source of truth, and what happens when the two disagree?
The two will disagree, and not occasionally but routinely. The disagreements are mostly small, mostly recoverable, and mostly invisible until they surface as a reconciliation break in a year-end report or a wrong number on a tax form. This article lays out the reconciliation contract: where the disagreements come from, how to design the platform to handle them, and what synthetic test data has to look like to exercise the reconciliation logic.
Why the two feeds disagree
The disagreement isn't a bug in either feed. It's structural — the two are answering slightly different questions:
- Aggregator data is normalized. Plaid receives raw data from the institution and projects it through Plaid's normalized schema. The projection is lossy by design — it has to span 12,000+ institutions with very different underlying systems. Some fidelity is sacrificed for portability.
- Custodian-direct data is institution-shaped. A direct feed from Schwab carries Schwab's exact data shape, with Schwab's conventions for timing, rounding, and classification. Higher fidelity, but only for that one institution.
- The two feeds run on different cadences. Aggregators typically refresh hourly to daily; custodian-direct feeds can be intraday, end-of-day, or batch. The same account at the same wall-clock moment can show different values from each feed depending on when each last refreshed.
- The two feeds can have different reconciliation states. An aggregator may surface a pending transaction that the custodian feed has already settled, or vice versa. The settlement-state convention differs across feeds.
| Source of disagreement | Typical magnitude | Frequency |
|---|---|---|
| Refresh-cycle lag | Hours to a day | Always present, especially mid-day |
| Pending vs. settled state | Per-transaction | 5–15% of transactions on active accounts |
| Fractional-share rounding | Sub-cent to a few cents per holding | On every fractional-share position |
| Distribution character classification | Categorical (qualified vs. RoC) | On every multi-character distribution |
| Account-type taxonomy mismatch | Categorical | On accounts the aggregator misclassifies (~1–3% of IRA/Roth distinctions) |
| Corporate-action timing | Days | Around every ex-date for affected holdings |
| Cost-basis depth | Lot-level vs. aggregate | On every position with multiple lots |
The reconciliation contract
The platform has to make a few core decisions to handle dual-source data correctly:
1. Source-of-truth assignment per field
Different fields have different sources of truth. The contract for a typical institutional platform:
| Field | Aggregator wins? | Custodian wins? | Rationale |
|---|---|---|---|
| Account existence | Yes (broad) | No | Aggregator surfaces accounts the user authorized; custodian is single-institution |
| Position quantities | No | Yes | Custodian is canonical; aggregator typically derives from custodian feed anyway |
| Lot-level basis | No | Yes (when present) | Aggregator typically lacks lot data; custodian carries CBRS-format lots |
| Pending transactions | Yes | No | Aggregator surfaces pending faster; custodian publishes after settlement |
| Settled transactions | No | Yes | Custodian's settlement is canonical; aggregator inherits |
| Distribution character (year-end) | No | Yes (1099-DIV) | Year-end 1099 is the only correct classification source |
| Account-type / registration | No | Yes | Custodian holds the legal account-type record |
| Real-time intraday balances | Maybe | Maybe | Depends on which feed is fresher; usually a max(timestamp) decision |
The contract is best implemented as a per-field source-of-truth registry rather than a per-feed precedence rule. Some fields prefer aggregator data; some prefer custodian data; some are time-conditional.
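A minimal sketch of such a registry, assuming each feed's value arrives with the timestamp of that feed's last refresh. The field names, the `Observation` type, and the fall-back-to-the-other-feed behavior are illustrative assumptions, not part of any aggregator or custodian API; the assignments mirror the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Observation:
    """One feed's value for one field, plus when that feed last refreshed."""
    source: str        # "aggregator" or "custodian"
    value: Any
    as_of: datetime


Resolver = Callable[[Optional[Observation], Optional[Observation]], Optional[Observation]]


def prefer(primary: str) -> Resolver:
    """Preferred source wins when present; otherwise use the other feed (assumption)."""
    def resolve(agg, cust):
        first, second = (agg, cust) if primary == "aggregator" else (cust, agg)
        return first if first is not None else second
    return resolve


def freshest(agg, cust):
    """Time-conditional rule: the most recently refreshed feed wins."""
    candidates = [o for o in (agg, cust) if o is not None]
    return max(candidates, key=lambda o: o.as_of) if candidates else None


# Hypothetical field names; one resolver per field, not one precedence per feed.
REGISTRY: dict[str, Resolver] = {
    "account_existence": prefer("aggregator"),
    "position_quantity": prefer("custodian"),
    "lot_basis": prefer("custodian"),
    "pending_transactions": prefer("aggregator"),
    "settled_transactions": prefer("custodian"),
    "distribution_character": prefer("custodian"),
    "account_type": prefer("custodian"),
    "intraday_balance": freshest,
}


def resolve_field(field, agg, cust):
    """Look up the field's rule and apply it to the two observations."""
    return REGISTRY[field](agg, cust)
```

Keeping the rules in one table makes the contract reviewable in a single place, and adding a new time-conditional field means adding one resolver rather than touching feed-level precedence.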
2. Reconciliation cadence
The platform has to run a reconciliation pass on a regular cadence — typically nightly batch, with intraday spot-checks for high-stakes flows (trade execution, withdrawal initiation). The pass compares the two feeds field-by-field, applies the source-of-truth rules, and surfaces unresolved disagreements as exceptions.
Realistic exception rates per 1,000 active accounts: 5–20 exceptions per nightly run on routine fields, dropping to single digits once the platform reaches steady state. Accounts in their first 90 days run higher exception rates because aggregator linking is most fragile early on.
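The pass itself can be sketched roughly as follows. This is a simplified illustration under stated assumptions: each feed is reduced to a flat dict of field values per account, the tolerance and winner tables are hypothetical, and a disagreement counts as an exception only when no source-of-truth rule covers the field.

```python
from decimal import Decimal

# Assumed per-field tolerances; fields not listed must match exactly.
TOLERANCES = {"market_value": Decimal("0.05")}

# Assumed winners for fields where the two feeds disagree.
WINNER = {"quantity": "custodian", "market_value": "custodian", "account_type": "custodian"}


def agrees(field, a, b):
    """True when two values match exactly or within the field's tolerance."""
    tol = TOLERANCES.get(field)
    if tol is not None:
        return abs(Decimal(str(a)) - Decimal(str(b))) <= tol
    return a == b


def reconcile(account_id, aggregator, custodian):
    """Compare two per-account field dicts; return (resolved, exceptions)."""
    resolved, exceptions = {}, []
    for field in sorted(set(aggregator) | set(custodian)):
        a, c = aggregator.get(field), custodian.get(field)
        if a is None or c is None:
            # Only one feed carries this field, so there is nothing to compare.
            resolved[field] = a if c is None else c
            continue
        if agrees(field, a, c):
            resolved[field] = c
            continue
        winner = WINNER.get(field)
        if winner is None:
            # No rule covers this field: surface it for review.
            exceptions.append(
                {"account": account_id, "field": field, "aggregator": a, "custodian": c}
            )
        else:
            resolved[field] = c if winner == "custodian" else a
    return resolved, exceptions
```

A production pass would likely also record the disagreements it resolved automatically, since their volume is a useful health signal for the aggregator link.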
3. The duplicate-account problem
A specific reconciliation failure mode that deserves its own treatment: the same logical account is sometimes surfaced by an aggregator under a different identifier than the custodian feed uses. The customer's Schwab account #82145678 might be linked through Plaid as Plaid-account-id P_abc123xyz with a Plaid-internal account number 82145678, but the custodian-direct feed reports it with Schwab's institutional ID SCH-INST-82145678. The platform's account-matching logic has to recognize these as the same account or it will produce a household with double the actual position count.
The matching key can't be the account number alone, because different institutions use overlapping number spaces. The reliable matching key is the (institution_id, account_number) tuple, with a fallback to (institution_id, last_4_account, registration_match, position_overlap) when the full account number is not available from both sources. Test data has to include at least some same-account-different-source cases so the matching logic gets exercised.
```jsonc
// Same account, two views

// Plaid view
{
  "plaid_account_id": "P_abc123xyz",
  "institution": { "name": "Charles Schwab", "institution_id": "ins_109511" },
  "mask": "5678",
  "official_name": "INDIVIDUAL Account 82145678",
  "type": "investment",
  "subtype": "brokerage"
}

// Schwab direct view
{
  "schwab_account_number": "82145678",
  "account_type": "INDIVIDUAL_TAXABLE",
  "registration": "JOHN Q SMITH"
}

// Platform's reconciled record
{
  "platform_account_id": "ACC-2025-9821",
  "institution_id": "schwab",
  "account_number": "82145678",
  "match_keys": [
    { "source": "plaid", "key": "P_abc123xyz" },
    { "source": "schwab_direct", "key": "82145678" }
  ],
  "primary_data_source": "schwab_direct",
  "secondary_data_source": "plaid"
}
```
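The matching rule described above can be sketched as a two-stage comparison. The `AccountView` type, the exact-match registration check, the Jaccard overlap measure, and the 0.8 threshold are all assumptions chosen for illustration; the sketch also assumes each source's institution identifier has already been mapped to the platform's own `institution_id`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccountView:
    institution_id: str              # already normalized to the platform's own ID
    account_number: Optional[str]    # full number, if the feed exposes it
    last4: str
    registration: str
    symbols: set = field(default_factory=set)


def _norm(name: str) -> str:
    """Uppercase and collapse whitespace so 'John Q  Smith' equals 'JOHN Q SMITH'."""
    return " ".join(name.upper().split())


def position_overlap(a: set, b: set) -> float:
    """Jaccard overlap of held symbols; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_account(x: AccountView, y: AccountView, min_overlap: float = 0.8) -> bool:
    """Primary key first, then the fallback tuple when a full number is missing."""
    if x.institution_id != y.institution_id:
        return False
    if x.account_number and y.account_number:
        return x.account_number == y.account_number
    return (
        x.last4 == y.last4
        and _norm(x.registration) == _norm(y.registration)
        and position_overlap(x.symbols, y.symbols) >= min_overlap
    )
```

Because the two feeds refresh on different cadences, a strict overlap of 1.0 would reject true matches around trade dates, which is why the sketch uses a threshold rather than set equality.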
What synthetic test data has to look like
The minimum-viable corpus for testing dual-source reconciliation:
| Test scenario | What it exercises |
|---|---|
| Same account from both sources | Account-matching logic; per-field source-of-truth resolution |
| Aggregator-only account | Coverage of accounts the platform doesn't have direct custodian feed for |
| Custodian-only account | Coverage of accounts the user hasn't linked via aggregator |
| Disagreement on pending transactions | Pending-vs-settled reconciliation; aggregator-surfaced pending that custodian shows as already-settled |
| Disagreement on fractional shares | Sub-cent rounding logic; tolerance thresholds for 'agreement' |
| Distribution character override | Year-end 1099-DIV reclassification overriding aggregator's preliminary classification |
| Account-type mismatch | Aggregator misclassifies a Roth IRA as a Traditional IRA; custodian is correct; platform's mismatch-resolution logic |
| Aggregator-link broken | Plaid token expired, Yodlee re-auth required; platform falls back to custodian-direct only and resumes when aggregator returns |
A test corpus that includes only single-source households cannot exercise the reconciliation code paths. Realistic dual-source data — same household, same account, two views — is the test shape that matters.
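As one illustration of that shape, the sketch below generates paired views of the same transactions with a controlled pending-vs-settled disagreement rate. The function name, record layout, and amount range are hypothetical; it covers only one row of the scenario table and is not a description of how any particular corpus is built.

```python
import random


def paired_transactions(n, disagree_rate, seed=0):
    """Return (aggregator_view, custodian_view) for the same n transactions.

    In roughly `disagree_rate` of them the aggregator still reports "pending"
    while the custodian already reports "settled".
    """
    rng = random.Random(seed)  # seeded so fixtures are reproducible
    agg, cust = [], []
    for i in range(n):
        txn_id = f"TXN-{i:05d}"
        amount = round(rng.uniform(10, 5000), 2)
        lagging = rng.random() < disagree_rate
        agg.append(
            {"id": txn_id, "amount": amount, "status": "pending" if lagging else "settled"}
        )
        cust.append({"id": txn_id, "amount": amount, "status": "settled"})
    return agg, cust
```

Seeding the generator matters: a reconciliation test that fails on one run and passes on the next is hard to debug, so the same seed should always yield the same pair of views.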
The year-end reconciliation cycle
The single highest-stakes reconciliation moment is the year-end 1099-DIV / 1099-B reconciliation. Aggregator feeds may have classified distributions, basis, and gains throughout the year based on preliminary information. The year-end 1099 is the canonical correction — and the gap between aggregator-classified data and 1099-corrected data is sometimes substantial.
Common year-end corrections:
- Distribution character reclassification. A REIT distribution carried as "qualified dividend" through the year may be reclassified at year-end into several characters, for example ~50% qualified, ~30% return-of-capital, ~15% ordinary, ~5% Section 199A. The 1099-DIV is the only correct source.
- Wash-sale adjustments. Wash sales identified by the custodian's tax-lot software (FIFO-applied across the year) may differ from what the aggregator reported in real-time. The 1099-B is canonical.
- Cost-basis corrections. Non-covered lots may have updated cost basis at year-end (customer-supplied documentation, custodian back-research). The 1099-B carries the corrected basis.
Test data has to include the year-end correction step — both the pre-correction state (what the aggregator showed throughout the year) and the post-correction state (what the 1099 ultimately said). Platforms tested only against post-correction data will fail the in-flight reconciliation logic that produces tax-aware insights mid-year.
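To make the pre-correction and post-correction states concrete, here is a small worked sketch of a distribution reclassification using the illustrative REIT split above. The dollar amount, the bucket names, and the choice to push the rounding remainder into the largest bucket are assumptions for the example, not a rule taken from any 1099 specification.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def reclassify(total, shares):
    """Split a distribution total across characters; shares must sum to 1.

    Each bucket is rounded to the cent, and the rounding remainder goes to
    the largest bucket so the parts always sum back to the total.
    """
    if sum(shares.values()) != Decimal("1"):
        raise ValueError("character shares must sum to 1")
    parts = {
        name: (total * share).quantize(CENT, rounding=ROUND_HALF_UP)
        for name, share in shares.items()
    }
    largest = max(shares, key=shares.get)
    parts[largest] += total - sum(parts.values())
    return parts


# Pre-correction: what the aggregator showed through the year.
pre = {"qualified": Decimal("1234.57")}

# Post-correction: what the year-end 1099-DIV ultimately said.
post = reclassify(
    Decimal("1234.57"),
    {
        "qualified": Decimal("0.50"),
        "return_of_capital": Decimal("0.30"),
        "ordinary": Decimal("0.15"),
        "section_199a": Decimal("0.05"),
    },
)
```

A test fixture would carry both `pre` and `post` for the same distribution, so the platform's mid-year logic and its year-end override path can each be checked against a known answer.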
How this shows up in our catalog
The institutional bundles in the WealthSynth catalog ship with paired aggregator-and-custodian views per account where both sources are configured. The disagreements are deliberately calibrated: 5–10% of transactions in pending-vs-settled disagreement at any snapshot, sub-cent fractional-share rounding gaps on every fractional position, multi-character distribution classification on REIT/MLP/BDC holdings with year-end 1099-DIV correction events. The matching keys are intentionally non-trivial — test households include at least some accounts where naive account-number matching would fail.
For the broader integration context, see Aggregator & Custodian Integration and the procurement-side Aggregator API vs. direct custodian feed comparison. For the per-source data shapes, see Modeling Plaid, Yodlee, Akoya, and MX outputs and Custodian-specific data quirks. For the time-series-fidelity properties that the dual-source reconciliation touches, see Time-Series Fidelity in Synthetic Wealth Data.