Detecting unrealistic patterns in synthetic time-series wealth data

WealthSchema Staff · Pipeline architecture · May 9, 2026 · 6 min read

This article is the buyer-side counterpart to the rest of the Time-Series Fidelity theme. The other pieces describe what good synthetic time-series wealth data looks like in production. This one is the diagnostic checklist: twelve specific patterns that show up in synthetic data when the producer skipped one of the realism dimensions, paired with a single SQL- or pandas-style query you can run against any synthetic corpus to detect each one.

The framing is procurement and QA. If you're evaluating a synthetic-data vendor, or auditing a corpus you already bought, or stress-testing in-house mock data your team built, run these twelve checks before you commit to a backtest. Each tell on its own is recoverable; a corpus that fails five or more is not fit for the use cases time-series fidelity matters in.

The twelve tells

The order is rough — early tells are easier to detect, later tells require more domain knowledge to interpret.

#	Tell	What it indicates
1	Cost basis is suspiciously round	Lots were generated without a price-history-aware purchase simulator
2	Cash balances never go negative or near-zero	No overdraft / margin-call code path; bill-pay scenarios untested
3	No failed trades or settlement breaks	T+2 settlement edge cases and corporate-action-during-settlement traps unmodeled
4	Returns are too smooth (kurtosis ≈ 3)	Returns generation is IID-normal, not regime-aware
5	Equity-bond correlation is constant across the panel	Cross-asset correlation is single-regime; crisis diversification overstated
6	No corporate actions in the longitudinal window	Engine cannot be tested for split / merger / spinoff handling
7	Same-ticker delisted-issuer absent from history	Survivorship bias is structural, not just statistical
8	Distribution character is uniform (all qualified)	REIT / MLP / BDC tax handling untested; RoC-to-zero edge unreachable
9	Wash-sale window never triggers across accounts	Cross-account taxpayer-wide wash-sale logic unexercised
10	Account balances reconcile to the penny across all months	Missing rounding errors, missing dividend reinvestment fractional drift
11	Sector classifications never change for a holding	Reclassifications, spinoffs, M&A reclassifications all skipped
12	Beneficiary / household composition is static across 96 months	Life-event triggers (marriage, divorce, death, birth) absent from the longitudinal logic

1. Cost basis is suspiciously round

The check: histogram the per-share cost basis across the corpus. Real cost-basis distributions have a long tail of awkward values — $148.7245 from a DRIP fractional purchase, $73.92 from a market-on-close fill, $1142.50 from a private-fund initial subscription. Synthetic corpora generated without a price-history-aware purchase simulator typically produce basis values clustered at integer multiples of $5 or $10.

SELECT
  ROUND(cost_basis_per_share, 2) AS basis,
  COUNT(*) AS lots
FROM lots
GROUP BY ROUND(cost_basis_per_share, 2)
ORDER BY lots DESC
LIMIT 50;

If the top 20 values account for >25% of the corpus's lots, you're looking at synthesis without price-history grounding.
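The same threshold can be checked in pandas. This is a minimal sketch, assuming the lots table is loaded into a DataFrame with the `cost_basis_per_share` column used in the query above; the function name `round_basis_report` and the toy data are hypothetical.

```python
import pandas as pd

def round_basis_report(lots: pd.DataFrame, top_n: int = 20) -> dict:
    """Summarize how concentrated and how round per-share cost basis is."""
    basis = lots["cost_basis_per_share"].round(2)
    counts = basis.value_counts()
    # Share of all lots covered by the top_n most common basis values.
    top_share = counts.head(top_n).sum() / len(basis)
    # Work in integer cents to avoid floating-point modulo surprises.
    cents = (basis * 100).round().astype("int64")
    multiple_of_5_share = (cents % 500 == 0).mean()
    return {
        "top_share": float(top_share),
        "multiple_of_5_share": float(multiple_of_5_share),
    }

# Toy data (hypothetical): six round-number lots and two realistic ones.
lots = pd.DataFrame(
    {"cost_basis_per_share": [50.0, 50.0, 100.0, 25.0, 75.0, 10.0, 148.7245, 73.92]}
)
report = round_basis_report(lots, top_n=3)
print(report)
```

On a real corpus, keep `top_n=20` and compare `top_share` against the 25% threshold; `multiple_of_5_share` is an extra signal for the $5 clustering described above.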

2. Cash balances never go negative or near-zero

The check: minimum cash balance across the longitudinal window per account. Real households routinely have months where the checking-account balance dips below $500, occasionally goes negative pending a deposit, and very occasionally triggers an overdraft fee. A synthetic corpus where the minimum cash balance for every account is comfortably above $1,000 has not modeled the within-month cash-flow dynamics that the within-year cash-flow seasonality article covers.

SELECT
  account_id,
  MIN(cash_balance) AS min_cash,
  AVG(cash_balance) AS avg_cash
FROM monthly_snapshots
WHERE account_type = 'checking'
GROUP BY account_id
HAVING MIN(cash_balance) > 1000;

If 95%+ of accounts pass that filter (i.e. always have >$1,000 in cash), the cash-flow model is not exercising the bill-pay / overdraft / margin-call code paths.
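The query lists the accounts that stay above the floor; to get the share directly, a pandas sketch along these lines may help. It assumes the `monthly_snapshots` table is loaded into a DataFrame with the same column names; `share_always_flush` and the toy rows are hypothetical.

```python
import pandas as pd

def share_always_flush(snapshots: pd.DataFrame, floor: float = 1000.0) -> float:
    """Fraction of checking accounts whose minimum cash balance stays above `floor`."""
    checking = snapshots[snapshots["account_type"] == "checking"]
    min_cash = checking.groupby("account_id")["cash_balance"].min()
    return float((min_cash > floor).mean())

# Toy data (hypothetical): three checking accounts and one brokerage account.
snapshots = pd.DataFrame({
    "account_id": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "account_type": ["checking"] * 6 + ["brokerage"] * 2,
    "cash_balance": [5200.0, 4100.0, 310.0, 2500.0, 1800.0, 1250.0, 0.0, 0.0],
})
share = share_always_flush(snapshots)
print(f"{share:.0%} of checking accounts never drop below $1,000")
```

A result at or above 0.95 corresponds to the warning threshold stated above.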

3. No failed trades or settlement breaks

Real brokerage data has failed trades — symbol mismatches, insufficient buying power, regulatory restrictions (Reg T violations on margin), corporate-action-during-settlement issues where a buy on the day before a record date cannot deliver dividend rights. Synthetic data that produces 100% successful trades has skipped a class of compliance and reporting edge cases.

The check is a join of the trade ledger against settlement events: every trade should produce one settlement record on the standard cycle for its trade date (T+2 for US equities until May 28, 2024, T+1 since), except for failed-trade events. A trade table that produces a settlement record for every trade has unrealistically clean execution.
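One way to express that join in pandas is sketched below. The table and column names (`trades`, `settlements`, `trade_id`) are assumptions about the schema, not something the corpus is known to use, and `unsettled_trade_rate` is a hypothetical helper.

```python
import pandas as pd

def unsettled_trade_rate(trades: pd.DataFrame, settlements: pd.DataFrame) -> float:
    """Fraction of trades with no matching settlement record."""
    settled_ids = settlements[["trade_id"]].drop_duplicates()
    # A left join with indicator=True marks trades missing from settlements as "left_only".
    joined = trades.merge(settled_ids, on="trade_id", how="left", indicator=True)
    return float((joined["_merge"] == "left_only").mean())

# Toy data (hypothetical): trade 103 never settles.
trades = pd.DataFrame({
    "trade_id": [101, 102, 103, 104],
    "symbol": ["AAA", "BBB", "AAA", "CCC"],
})
settlements = pd.DataFrame({"trade_id": [101, 102, 104]})
rate = unsettled_trade_rate(trades, settlements)
print(f"Unsettled trades: {rate:.1%}")
```

A rate of exactly zero across a multi-year ledger is the tell; any small positive rate at least shows the failure path exists.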

4. Returns are too smooth (kurtosis ≈ 3)

The most diagnostic single number for a synthetic returns series. Real monthly equity returns have kurtosis 8–10. Daily returns have kurtosis 15+. A synthetic series with kurtosis close to 3 is sampling from a normal distribution; it has no fat tails; any backtest that depends on tail behavior is going to underestimate risk substantially.

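import scipy.stats as stats

# synthetic_data: DataFrame with one row per month and a 'monthly_return' column
returns = synthetic_data['monthly_return'].dropna()
# fisher=False returns raw (non-excess) kurtosis, so a normal distribution scores 3
print(f"Kurtosis: {stats.kurtosis(returns, fisher=False):.2f}")
print(f"Skewness: {stats.skew(returns):.2f}")
# Share of months more than three standard deviations from the mean
print(f"Pct |z| > 3: {(abs(stats.zscore(returns)) > 3).mean():.2%}")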

Expected for real US equity monthly returns: kurtosis 8–10, skew negative (–0.4 to –0.7), |z| > 3 frequency around 1.5%. If your synthetic corpus shows kurtosis 2.5–3.5 and |z| > 3 at 0.27% (the normal-distribution rate), you've found the problem.

For the methodology behind getting these numbers right, see Generating synthetic historical returns.

5. Constant cross-asset correlation

The check: rolling correlation between equity and bond returns over time. Real equity-bond correlation flips between regimes (–0.3 in expansions, +0.5 in crises like 2008 and the 2022 inflation surge). Synthetic data with a constant correlation matrix produces a flat rolling correlation; you'll see this immediately in a 12-month-rolling correlation chart.

If the rolling correlation is approximately constant within ±0.05 over the entire 96-month window, the dataset is using a single covariance matrix and overstating diversification benefits in tail regimes.
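A rough pandas sketch of the rolling-correlation check follows. The series names, the helper `rolling_corr_spread`, and the toy data are all hypothetical; the two bond series are deliberately extreme (one perfectly tied to equities, one that flips sign halfway) so the contrast is easy to see.

```python
import numpy as np
import pandas as pd

def rolling_corr_spread(equity: pd.Series, bond: pd.Series, window: int = 12) -> float:
    """Max minus min of the rolling equity-bond correlation over the panel."""
    rolling = equity.rolling(window).corr(bond).dropna()
    return float(rolling.max() - rolling.min())

# Toy data (hypothetical): 96 months of equity returns.
rng = np.random.default_rng(0)
equity = pd.Series(rng.normal(0.007, 0.04, 96))

# Single-regime bond series: one fixed relationship for the whole window.
bond_single = 0.3 * equity + 0.002

# Two-regime bond series: the relationship changes sign at month 48.
first_half = np.arange(96) < 48
bond_regime = pd.Series(np.where(first_half, -0.3 * equity, 0.5 * equity))

spread_single = rolling_corr_spread(equity, bond_single)
spread_regime = rolling_corr_spread(equity, bond_regime)
print(f"single-regime spread: {spread_single:.3f}")
print(f"two-regime spread:    {spread_regime:.3f}")
```

A spread of 0.10 or less corresponds to the ±0.05 band described above. On real samples a 12-month window adds sampling noise of its own, so treat the number as a screen rather than a verdict.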

6. No corporate actions in the longitudinal window

The check: count of corporate-action events per holding per year. A typical ETF position should see 3–8 events per year (quarterly distributions plus occasional special events). A typical individual-equity position should see 1–4 per year. A corpus with zero corporate-action events on most holdings is missing the corporate-actions schema entirely.

The auxiliary check: do positions in the corpus include any of the well-known corporate-action events of recent years? AT&T's 2022 WBD spinoff, Apple's 2020 4-for-1 split, NVIDIA's 2024 10-for-1 split, the various COVID-era special distributions. If a household holds the relevant tickers but the corporate-action history doesn't reflect the events, the data has been generated without issuer-anchored event sequences.
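The event-rate check can be sketched in pandas as below. The `holdings` and `actions` frames, their columns (`account_id`, `symbol`, `months_held`), and the helper name are assumptions made for illustration; map them to whatever the corpus actually exposes.

```python
import pandas as pd

def events_per_holding_year(holdings: pd.DataFrame, actions: pd.DataFrame) -> pd.DataFrame:
    """Attach an annualized corporate-action count to each holding."""
    counts = (
        actions.groupby(["account_id", "symbol"]).size().rename("events").reset_index()
    )
    merged = holdings.merge(counts, on=["account_id", "symbol"], how="left")
    # Holdings with no matching events get an explicit zero.
    merged["events"] = merged["events"].fillna(0)
    merged["events_per_year"] = merged["events"] / (merged["months_held"] / 12)
    return merged

# Toy data (hypothetical tickers): one fund-like holding, one silent holding, one equity.
holdings = pd.DataFrame({
    "account_id": ["A", "A", "B"],
    "symbol": ["FUND1", "STOCK1", "STOCK2"],
    "months_held": [24, 12, 36],
})
actions = pd.DataFrame({
    "account_id": ["A"] * 8 + ["B"] * 3,
    "symbol": ["FUND1"] * 8 + ["STOCK2"] * 3,
})
result = events_per_holding_year(holdings, actions)
zero_share = float((result["events"] == 0).mean())
print(result[["symbol", "events_per_year"]])
print(f"Holdings with no corporate actions: {zero_share:.0%}")
```

If `zero_share` covers most holdings, the corpus matches the tell described above.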

7. Survivorship bias is structural

The check: look for tickers in the corpus's longitudinal history that no longer trade. Real backtest data over a 5+ year window includes issuers who failed, went private, or were acquired during the window: Lehman, Bear Stearns, GE Capital paper, Toys R Us bonds. A synthetic corpus where every ticker in the history is still tradable today has scrubbed out the non-survivors, and any backtest run against it will overstate returns by 1–2% per year for equities, more for fixed income.
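A minimal sketch of this check, assuming you can supply your own reference set of currently tradable symbols (the corpus will not provide one). The column names, the helper `non_surviving_tickers`, and the tickers are hypothetical.

```python
import pandas as pd

def non_surviving_tickers(history: pd.DataFrame, active_symbols: set) -> list:
    """Symbols that appear in the corpus history but are not tradable today."""
    seen = set(history["symbol"].unique())
    return sorted(seen - set(active_symbols))

# Toy data (hypothetical tickers): GONE1 disappears after the first snapshot.
history = pd.DataFrame({
    "snapshot_date": pd.to_datetime(
        ["2019-01-31", "2019-01-31", "2019-01-31", "2026-01-31", "2026-01-31"]
    ),
    "symbol": ["LIVE1", "LIVE2", "GONE1", "LIVE1", "LIVE2"],
})
active_symbols = {"LIVE1", "LIVE2"}
dead = non_surviving_tickers(history, active_symbols)
print(f"Non-surviving tickers in history: {dead}")
```

An empty list over a multi-year window is the warning sign; a realistic corpus should return at least a handful of symbols.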

For the deeper context on why survivorship-aware data is hard to produce and easy to skip, see the Survivorship Bias glossary entry.

8. Distribution character is uniform

Real fund distributions have mixed character: a single quarterly REIT distribution might be 50% qualified dividend, 30% return-of-capital, 15% ordinary income, 5% Section 199A. Synthetic data that classifies all distributions as "qualified" simplifies the test problem in a way that makes the tax-engine's RoC-handling and 199A-tracking code paths unreachable.

SELECT
  ca_type,
  COUNT(*) AS n,
  AVG(qualified_pct) AS avg_qual,
  AVG(roc_pct) AS avg_roc
FROM corporate_actions
WHERE ca_type = 'distribution'
GROUP BY ca_type;

If avg_qual is 0.95+ and avg_roc is 0, the distribution-character generator is uniform.

9. Wash-sale window never triggers across accounts

A taxpayer-wide wash sale (a loss in a taxable account followed by a purchase in the same household's IRA within 30 days) is the most subtle and most-often-skipped wash-sale rule. The check: scan loss-realizing trades and look for any same-ticker purchase in any other account in the same household within ±30 days.

WITH losses AS (
  SELECT account_id, household_id, symbol, trade_date, lot_id, gain
  FROM realized_gains WHERE gain < 0
)
SELECT COUNT(*) AS cross_account_wash_count
FROM losses l
JOIN trades t ON
  t.symbol = l.symbol
  AND t.side = 'buy'  -- assumed column: count purchases only, adjust to your schema
  AND t.account_id != l.account_id
  AND t.household_id = l.household_id
  AND t.trade_date BETWEEN l.trade_date - INTERVAL '30 days' AND l.trade_date + INTERVAL '30 days';

A corpus designed to exercise wash-sale logic should have hundreds to thousands of such cross-account events. Zero or near-zero is a warning.

10. Penny-perfect reconciliation across all months

Counterintuitively, a corpus where every account balance reconciles to the penny across every monthly snapshot is suspect. Real data has small reconciliation breaks — DRIP fractional rounding, dividend timing differences between settlement and snapshot, currency rounding on FX-translated holdings. A corpus that produces zero reconciliation breaks has either rounded everything aggressively (which loses information) or has skipped the fractional-share / FX edge cases entirely.
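A simple way to measure this is to roll each snapshot forward and compare against the reported closing balance. The sketch below assumes hypothetical columns (`opening_balance`, `net_flows`, `market_change`, `closing_balance`); the actual decomposition depends on the corpus schema.

```python
import pandas as pd

def reconciliation_break_rate(snapshots: pd.DataFrame, tol: float = 0.005) -> float:
    """Fraction of account-months where the roll-forward misses by more than `tol` dollars."""
    expected = (
        snapshots["opening_balance"] + snapshots["net_flows"] + snapshots["market_change"]
    )
    residual = (snapshots["closing_balance"] - expected).abs()
    return float((residual > tol).mean())

# Toy data (hypothetical): the second row is off by three cents.
snapshots = pd.DataFrame({
    "opening_balance": [1000.00, 2000.00, 500.00, 750.00],
    "net_flows": [50.00, 0.00, 25.00, 0.00],
    "market_change": [12.34, -20.10, 3.33, 7.50],
    "closing_balance": [1062.34, 1979.93, 528.33, 757.50],
})
break_rate = reconciliation_break_rate(snapshots)
print(f"Account-months with a reconciliation break: {break_rate:.1%}")
```

A break rate of exactly zero across every account and month is the suspicious outcome here.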

11. Sector classifications never change

Sector taxonomies (GICS, NAICS) are revised every few years; specific issuers get reclassified more often. In the 2018 GICS revision, NVIDIA remained in "Information Technology" but with sub-industry changes. Berkshire Hathaway's classification has shifted multiple times. A synthetic corpus where every holding has the same sector across all 96 monthly snapshots has not modeled the reclassification events that performance attribution engines have to handle.
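Counting distinct sector labels per symbol is enough to surface this. The sketch assumes a holdings snapshot frame with hypothetical `symbol` and `sector` columns; the helper name and tickers are made up for illustration.

```python
import pandas as pd

def share_with_sector_change(snapshots: pd.DataFrame) -> float:
    """Fraction of symbols that carry more than one sector label across snapshots."""
    labels_per_symbol = snapshots.groupby("symbol")["sector"].nunique()
    return float((labels_per_symbol > 1).mean())

# Toy data (hypothetical tickers): BBB is reclassified between snapshots.
snapshots = pd.DataFrame({
    "month": [1, 96, 1, 96],
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "sector": [
        "Financials",
        "Financials",
        "Information Technology",
        "Communication Services",
    ],
})
changed = share_with_sector_change(snapshots)
print(f"Symbols with at least one reclassification: {changed:.0%}")
```

A result of zero across a 96-month panel indicates the reclassification events were never generated.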

12. Static beneficiary and household composition

A 96-month longitudinal window for a household with members in their 40s and 50s should occasionally produce life events: a child reaching majority, a parent's death triggering inheritance, a marriage or divorce changing beneficiary designations. A corpus where every household's composition is identical at month 1 and month 96 has skipped life-event modeling entirely, which leaves beneficiary-update workflows, inheritance handling, and QDRO processing untestable.
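Comparing each household's roster at the first and last snapshot gives a quick read on this. The sketch assumes a membership table with hypothetical `household_id`, `member_id`, and `month` columns; `share_static_households` is an illustrative helper, not part of any schema.

```python
import pandas as pd

def share_static_households(members: pd.DataFrame) -> float:
    """Fraction of households whose member roster is identical at the first and last month."""
    first, last = members["month"].min(), members["month"].max()

    def roster(month: int) -> pd.Series:
        rows = members[members["month"] == month]
        # Encode each household's roster as a sorted, comma-joined string for comparison.
        return rows.groupby("household_id")["member_id"].agg(lambda s: ",".join(sorted(s)))

    # Households missing at either end become NaN and count as changed.
    start, end = roster(first).align(roster(last))
    return float((start == end).mean())

# Toy data (hypothetical): household H2 gains a member by month 96.
members = pd.DataFrame({
    "household_id": ["H1", "H1", "H2", "H2", "H1", "H1", "H2", "H2", "H2"],
    "member_id": ["m1", "m2", "m3", "m4", "m1", "m2", "m3", "m4", "m5"],
    "month": [1, 1, 1, 1, 96, 96, 96, 96, 96],
})
static_share = share_static_households(members)
print(f"Households with unchanged composition: {static_share:.0%}")
```

A static share of 100% is the tell; the same comparison can be repeated on beneficiary designations if the corpus stores them.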

How to use this checklist

We use this same diagnostic suite as part of the validation gate that runs at corpus close-out for our own catalog. The 1,451-household v4 corpus passes on all twelve. If you're evaluating a vendor (us included), ask the vendor to run their corpus through these specific queries and share the numbers. A vendor that can't answer "what's the kurtosis of monthly returns in your dataset?" is not the vendor whose data you want to backtest against.

For the deeper methodology view, see the Time-Series Fidelity theme and the four articles it gathers: corporate actions, returns generation, performance attribution, and the synthetic-vs-replay comparison. For the procurement framing, see also Synthetic data procurement & vendor evaluation and the Synthetic data quality rubric.