Performance attribution test data for reporting platforms

A reporting platform that ships with single-period Brinson and untested multi-period linking is a reporting platform with a known correctness gap. Synthetic test data is how you find it before the auditor does.

WealthSchema Staff · Pipeline architecture · May 9, 2026 · 5 min read

A wealth-tech reporting platform that produces performance numbers without an attribution decomposition is a platform whose users stop trusting the numbers the first time they're asked "why did we underperform the benchmark?" Performance attribution is how a reporting product earns its place in an institutional workflow — and it is also the part of the product that's hardest to test, because the attribution algebra interacts with corporate actions, regime changes, and multi-period linking in ways that single-snapshot mock data cannot exercise.

This article is the schema view: what a reporting platform's attribution engine consumes, what the realistic test inputs look like, and where the bugs hide. It's intended for engineers and product managers building the kind of reporting that has to certify against GIPS standards or comparable institutional benchmarks.

The three attribution methods you have to test

Most reporting platforms ship with one of three attribution decompositions — sometimes all three. Each decomposes the same total return differently, each consumes a different test-data shape, and each has its own most-common implementation bug.

| Method | Decomposes return into | Required inputs | Most common bug |
| --- | --- | --- | --- |
| Brinson (BHB / BF) | Allocation effect, selection effect, interaction | Portfolio weights & returns by sector, benchmark weights & returns by sector | Treating allocation effect as if benchmark sector return is zero |
| Factor attribution | Per-factor contribution + residual | Factor loadings per holding, factor returns, residual return | Cross-product term in factor decomposition is dropped |
| Multi-period linking | Period-by-period contributions linked over time | Each period's attribution + a linking algorithm (Carino, Menchero, GRAP) | Drift between sum of period contributions and total period return |

Brinson — the test that any platform should pass

Brinson-Hood-Beebower (BHB) decomposes the active return — portfolio return minus benchmark return — into allocation, selection, and an interaction term. The sector-level allocation effect captures how the portfolio's sector weights differed from the benchmark's; the selection effect captures how the portfolio's within-sector returns differed from the benchmark's; the interaction is the cross-product.

The standard formula:

Formula
Brinson-Hood-Beebower attribution

$$A_i = (w_i^p - w_i^b)\, r_i^b, \qquad S_i = w_i^b\,(r_i^p - r_i^b), \qquad I_i = (w_i^p - w_i^b)\,(r_i^p - r_i^b)$$

$A_i$ = allocation effect for sector $i$
$S_i$ = selection effect for sector $i$
$I_i$ = interaction effect for sector $i$
$w_i^p$ = portfolio weight in sector $i$
$w_i^b$ = benchmark weight in sector $i$
$r_i^p$ = portfolio return in sector $i$
$r_i^b$ = benchmark return in sector $i$
Example
Portfolio is 25% tech, 15% energy. Benchmark is 20% tech, 20% energy. Tech returns: portfolio +12%, benchmark +10%. Energy returns: portfolio -5%, benchmark -8%. Allocation effect (tech): (0.25 − 0.20) × 0.10 = +50bps. Selection effect (tech): 0.20 × (0.12 − 0.10) = +40bps. Interaction (tech): (0.05) × (0.02) = +10bps. Total tech effect: +100bps. Same algebra applies sector by sector; sum yields total active return.
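The worked example translates directly into a reconciliation test. The sketch below is a minimal illustration, not a reference implementation: `SectorInput`, `bhb_effects`, and `reconciles` are hypothetical names introduced here, and the inputs are the tech and energy figures from the example. The property worth asserting in a test suite is that the three effects for a sector sum to that sector's contribution to active return, w_p × r_p − w_b × r_b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorInput:
    """Hypothetical container for one sector's inputs (names are illustrative)."""
    w_p: float  # portfolio weight in the sector
    w_b: float  # benchmark weight in the sector
    r_p: float  # portfolio return within the sector
    r_b: float  # benchmark return within the sector


def bhb_effects(s: SectorInput) -> tuple[float, float, float]:
    """Return (allocation, selection, interaction) for one sector, per BHB."""
    allocation = (s.w_p - s.w_b) * s.r_b
    selection = s.w_b * (s.r_p - s.r_b)
    interaction = (s.w_p - s.w_b) * (s.r_p - s.r_b)
    return allocation, selection, interaction


def reconciles(s: SectorInput, tol: float = 1e-12) -> bool:
    """The three effects must sum to the sector's contribution to active return."""
    contribution = s.w_p * s.r_p - s.w_b * s.r_b
    return abs(sum(bhb_effects(s)) - contribution) <= tol


# Inputs taken from the worked example above.
tech = SectorInput(w_p=0.25, w_b=0.20, r_p=0.12, r_b=0.10)
energy = SectorInput(w_p=0.15, w_b=0.20, r_p=-0.05, r_b=-0.08)

for name, sector in {"tech": tech, "energy": energy}.items():
    a, s, i = bhb_effects(sector)
    print(
        f"{name}: allocation={a * 1e4:+.0f}bps selection={s * 1e4:+.0f}bps "
        f"interaction={i * 1e4:+.0f}bps reconciles={reconciles(sector)}"
    )
```

For energy the same algebra gives +40bps allocation, +60bps selection, and −15bps interaction: being underweight a falling sector helps allocation, while the negative interaction reflects holding less of a sector in which the portfolio outperformed.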

The test corpus needs portfolio-versus-benchmark divergence at the sector level. That means synthetic households need positions classified into sectors, the sectors need to follow a real-world taxonomy (GICS, NAICS, ICB), and the corpus needs households whose sector weights deliberately differ from any plausible benchmark. Households that all hold 100% VTI are useless for attribution testing: against a broad-market benchmark every active weight and every active sector return is approximately zero, so all three BHB terms vanish and the decomposition is degenerate.
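One way to screen a corpus for degenerate households is to measure how far each household's sector weights sit from the benchmark's. The sketch below uses sector-level active share (half the sum of absolute weight differences) as that measure. The function names, the sample weights, and the 5% threshold are all assumptions made for illustration; the check looks only at weights, so it says nothing about within-sector selection.

```python
def sector_active_share(portfolio: dict[str, float], benchmark: dict[str, float]) -> float:
    """Half the sum of absolute sector weight differences (0 = identical weights)."""
    sectors = set(portfolio) | set(benchmark)
    return 0.5 * sum(
        abs(portfolio.get(s, 0.0) - benchmark.get(s, 0.0)) for s in sectors
    )


def is_degenerate(
    portfolio: dict[str, float],
    benchmark: dict[str, float],
    min_active_share: float = 0.05,  # illustrative threshold, not a standard
) -> bool:
    """Flag households whose sector weights are too close to the benchmark."""
    return sector_active_share(portfolio, benchmark) < min_active_share


# Hypothetical sector weights, for illustration only.
benchmark = {"tech": 0.30, "energy": 0.05, "other": 0.65}
index_hugger = {"tech": 0.30, "energy": 0.05, "other": 0.65}
tilted = {"tech": 0.45, "energy": 0.00, "other": 0.55}

print(is_degenerate(index_hugger, benchmark))  # weights match the benchmark
print(is_degenerate(tilted, benchmark))        # deliberate sector tilt
```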

Factor attribution — the schema gets richer

Factor attribution decomposes return into the contributions of named factors (size, value, momentum, quality, low-vol) plus a residual idiosyncratic component. The output is the equity analyst's tool: "why did the portfolio outperform" decomposes into "tilted toward value, against momentum, with positive selection within both."

The synthetic-data requirement for factor attribution is materially heavier than for Brinson. Each holding needs factor loadings (typically 5–10 factors), and the factor returns over the period need to be available. Factor loadings change over time as company fundamentals change; mock data that assigns a single static factor exposure per ticker can't exercise the time-varying-loading code path that real factor engines depend on.

// Holding-level factor exposure (one snapshot)
{
  "holding_id": "H-VTI-2025-09-30",
  "symbol": "VTI",
  "snapshot_date": "2025-09-30",
  "factor_exposures": {
    "size": -0.42,           // negative = large-cap
    "value": -0.05,          // slight growth tilt
    "momentum": 0.12,        // positive momentum
    "quality": 0.18,         // high-quality
    "low_volatility": -0.08, // slight high-vol
    "yield": 0.05            // dividend yield positive
  },
  "factor_model_version": "MSCI-USE5-2024.04",
  "residual_volatility_pct": 0.5
}

The factor model version field is not optional. Factor attribution outputs are model-dependent — the same portfolio analyzed under MSCI USE5 and Axioma WW21 will produce different per-factor contributions because the factor definitions differ. Test data has to record which factor model produced the loadings, and the reporting platform's attribution code path has to handle different factor models. A test corpus that uses only one factor model can't exercise the model-switching code, which is a source of subtle bugs in multi-currency portfolios where different regions use different factor models.
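A minimal sketch of how the exposure record feeds the decomposition, assuming a simple linear model in which each factor's contribution is exposure × factor return and the residual is whatever the factors leave unexplained. The factor returns, the 2% holding return, and the names `FACTOR_RETURNS_BY_MODEL` and `factor_attribution` are hypothetical. A production model typically also carries market, industry, or country terms, which this sketch folds into the residual.

```python
# Hypothetical one-period factor returns, keyed by factor model version.
FACTOR_RETURNS_BY_MODEL = {
    "MSCI-USE5-2024.04": {
        "size": 0.010,
        "value": -0.004,
        "momentum": 0.006,
        "quality": 0.003,
        "low_volatility": -0.002,
        "yield": 0.001,
    },
}


def factor_attribution(holding: dict, holding_return: float) -> dict:
    """Split a holding's return into per-factor contributions plus a residual."""
    model = holding["factor_model_version"]
    if model not in FACTOR_RETURNS_BY_MODEL:
        # Refuse to apply one model's factor returns to another model's loadings.
        raise KeyError(f"no factor returns loaded for model {model!r}")
    factor_returns = FACTOR_RETURNS_BY_MODEL[model]
    contributions = {
        factor: exposure * factor_returns[factor]
        for factor, exposure in holding["factor_exposures"].items()
    }
    residual = holding_return - sum(contributions.values())
    return {"contributions": contributions, "residual": residual}


# Exposures copied from the snapshot above; the 2% return is made up.
vti = {
    "symbol": "VTI",
    "factor_model_version": "MSCI-USE5-2024.04",
    "factor_exposures": {
        "size": -0.42, "value": -0.05, "momentum": 0.12,
        "quality": 0.18, "low_volatility": -0.08, "yield": 0.05,
    },
}
result = factor_attribution(vti, holding_return=0.02)
print(result["contributions"], result["residual"])
```

Keying the factor returns by model version makes the failure mode explicit: a holding labeled with a model the engine has not loaded raises an error instead of silently borrowing another model's factor returns.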

Multi-period linking — where most platforms fail

Single-period attribution is straightforward. Multi-period attribution — linking month-by-month attribution to produce a year-to-date or since-inception attribution — is the hard part, and the part where most reporting platforms have known correctness gaps.

The naïve approach is to sum up monthly attribution numbers. That's almost always wrong, because total period return is not the sum of monthly returns (it's the geometric link), and the sum of monthly attribution effects therefore doesn't equal the total period active return. The math discrepancy is small over short windows and grows over longer ones; the discrepancy is what makes the attribution numbers stop reconciling with the headline performance number that the same platform reports elsewhere.
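The size of the drift is easy to demonstrate. The sketch below uses made-up inputs (twelve months of +3% portfolio return against +1% benchmark return) and compares the summed monthly active returns with the active return implied by geometrically linked totals. `geometric_link` is a hypothetical helper name.

```python
from math import prod


def geometric_link(returns: list[float]) -> float:
    """Compound a series of period returns into one total return."""
    return prod(1.0 + r for r in returns) - 1.0


# Illustrative inputs: 2% active return in each of 12 months.
portfolio = [0.03] * 12
benchmark = [0.01] * 12

# Naive approach: add up the monthly active returns.
naive_active = sum(p - b for p, b in zip(portfolio, benchmark))

# What the headline numbers imply: difference of geometrically linked totals.
true_active = geometric_link(portfolio) - geometric_link(benchmark)

drift = true_active - naive_active
print(f"naive={naive_active:.4f} linked={true_active:.4f} drift={drift:.4f}")
```

With these inputs the summed monthly effects total 24 percentage points while the linked active return is close to 30, a gap of roughly 5.9 percentage points that a linking algorithm has to distribute across the attribution effects.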

The three most common linking algorithms are Carino smoothing, Menchero smoothing, and GRAP (named for the Groupe de Recherche en Attribution de Performance, the French working group that proposed it). Each redistributes the linking residual differently. GIPS-compliant platforms typically certify against one of them and have to be tested against cases where the residual is meaningful; multiple periods with more than 2% active return each are sufficient to make the linking residual visible.

What a realistic test corpus needs

Pulling the requirements together, here's the minimum viable synthetic-data shape for an attribution-testing engagement:

| Requirement | What it exercises |
| --- | --- |
| Sector-classified holdings (GICS or equivalent) | Brinson allocation/selection algebra; sector-reclassification handling |
| Sector divergence vs. benchmark | The Brinson decomposition itself; households holding 100% VTI are degenerate |
| Factor loadings per holding per snapshot | Time-varying-exposure code path; same-ticker loading drift over time |
| Factor model version label | Multi-model / multi-region attribution code path |
| Benchmark return series at sector level | The 'compared to what' side; total-return (dividend-reinvested) benchmarks required |
| Multi-period test cases (12+ periods, 2%+ active each) | Linking-residual code (Carino / Menchero / GRAP) |
| Corporate-action-aware holdings | Sector and factor transitions across spinoffs and mergers |
| Currency layer for international holdings | Currency attribution decomposition; FX-consistency check |

Currency attribution — the often-skipped layer

Any reporting platform that supports international holdings has to handle currency attribution. The portfolio's USD return decomposes into a local-currency return component, a currency-translation component, and an interaction term. The same allocation/selection algebra applies, but with a currency overlay that has its own factor exposures and its own benchmark.

Synthetic data for currency attribution testing needs both the local-currency price history and the FX rate history for every non-USD holding. The two have to be consistent — if the EUR/USD rate appreciates 5% over a month and a Eurostoxx position gains 3% in EUR, the USD return is approximately 8% (and exactly: (1.05)(1.03) - 1 = 8.15%). Mock-data tools that generate independent USD returns and EUR returns can't exercise the FX consistency check, which is where the next class of bugs lives.

// International holding return decomposition
{
  "holding_id": "H-EWG-2025-09-30",
  "symbol": "EWG",
  "period": "2025-09-01 to 2025-09-30",
  "local_currency_return": 0.0312,      // EUR return
  "fx_translation_return": 0.0148,      // EUR/USD appreciation
  "interaction_return": 0.0005,         // (1.0312)(1.0148) - 1 - 0.0312 - 0.0148
  "base_currency_return": 0.0465,       // USD return
  "benchmark_local_return": 0.0289,
  "benchmark_fx_return": 0.0148,
  "benchmark_base_return": 0.0441
}
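The FX consistency check can be sketched as a small validator over a record of this shape. The field names come from the record above. The function name and the 1e-4 tolerance are assumptions, with the tolerance chosen because the sample values are rounded to four decimal places.

```python
def check_fx_consistency(rec: dict, tol: float = 1e-4) -> list[str]:
    """Return a list of inconsistencies; an empty list means the record passes."""
    problems = []
    local = rec["local_currency_return"]
    fx = rec["fx_translation_return"]

    # Base-currency return must be the compounded local and FX returns.
    expected_base = (1.0 + local) * (1.0 + fx) - 1.0
    if abs(rec["base_currency_return"] - expected_base) > tol:
        problems.append("base_currency_return != (1+local)(1+fx)-1")

    # Interaction is the cross-product of the two components.
    if abs(rec["interaction_return"] - local * fx) > tol:
        problems.append("interaction_return != local * fx")

    # The same compounding identity has to hold on the benchmark side.
    expected_bench = (
        (1.0 + rec["benchmark_local_return"]) * (1.0 + rec["benchmark_fx_return"]) - 1.0
    )
    if abs(rec["benchmark_base_return"] - expected_bench) > tol:
        problems.append("benchmark_base_return != (1+local)(1+fx)-1")
    return problems


# Values copied from the EWG record above.
ewg = {
    "local_currency_return": 0.0312,
    "fx_translation_return": 0.0148,
    "interaction_return": 0.0005,
    "base_currency_return": 0.0465,
    "benchmark_local_return": 0.0289,
    "benchmark_fx_return": 0.0148,
    "benchmark_base_return": 0.0441,
}
print(check_fx_consistency(ewg))

# A record whose USD return was generated independently of its components.
independent = dict(ewg, base_currency_return=0.0500)
print(check_fx_consistency(independent))
```

The second record is the failure this section describes: a USD return generated independently of the local and FX components trips the first check even though every individual field looks plausible.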

How this shows up in the WealthSynth corpus

The institutional and high-net-worth bundles in our synthetic data catalog include factor loadings on every holding, sector classifications updated for any reclassifications during the longitudinal window, and a benchmark series (Russell 3000 for equities, Bloomberg US Agg for fixed income; international benchmarks for international holdings). The test households are deliberately constructed with sector and factor tilts away from the benchmarks so that the resulting attribution decompositions exercise the full arithmetic — including the cases where the linking residual is large enough to distinguish Carino, Menchero, and GRAP.

For more on the underlying time-series-fidelity properties any reporting platform's tests depend on, see the Time-Series Fidelity theme, and especially Modeling corporate actions in synthetic portfolios (which covers the sector-reclassification and merger cases that attribution engines have to handle correctly). For the buyer-side QA view, see Detecting unrealistic patterns in synthetic time-series wealth data.