Cash Flow Stress Test Dataset

Name: Cash Flow Stress Test Dataset
Brand: WealthSchema
SKU: B04
Price: 4000.00 USD
Availability: InStock

Most cash-flow planning tools demo well on a textbook salaried family with stable monthly income and predictable expenses. They fall over the moment they encounter the household types that actually need cash-flow planning: the freelancer with 6× income volatility, the single parent with a quarterly childcare bill, the gig worker whose Q1 income is half of Q3's. The Cash Flow Stress Test Dataset is 270 synthetic households built specifically for those messy cases, with 96 monthly snapshots each, so your engine sees a full eight-year cash-flow trajectory through realistic shocks: job loss, medical emergency, expense spikes, divorce, distressed mortgages, and recovery.

Households: 270
Archetypes: 10
Formats: JSON, CSV, Parquet
Deviation: Minimal

Why this Data Set exists

If you're scoring liquidity adequacy or modeling income shocks, the populations that produce the most useful test signals are also the ones underrepresented in any sanitized production dataset. Salaried W-2 workers dominate book-of-business data; variable-income earners, distressed borrowers, and post-divorce rebuilders are the long tail, so your model rarely gets exercised on them.

The second problem is temporal granularity. A point-in-time household snapshot can't stress-test a cash-flow engine — you need months of history with realistic seasonal patterns, irregular bills, and shock events that propagate forward through savings depletion, credit draw-down, and expense compression. Most synthetic data products give you the snapshot and stop there.

This Data Set solves both. The 270-household population is weighted toward variable-income, low-liquidity, and shock-exposed cohorts. Every household carries 96 monthly snapshots — the same longitudinal contract used by the Master Corpus — so your engine sees full trajectories, not just balance-sheet stills.

Use Cases

Cash-flow planning engine testing

Emergency fund adequacy scoring

Income shock scenario modeling

Liquidity & default risk assessment

Who uses this Data Set

Cash-Flow Planning Engine Engineer

Validates that the planning engine's monthly projection logic handles seasonal income, irregular expenses, and shock events without silently smoothing them away. The 96-month trajectories let regression tests catch numerical drift across long horizons.

Emergency Fund Adequacy Modeler

Trains and validates 'months-of-expenses' liquidity scoring against households whose liquidity buffer was actually tested by a labeled shock event within their 96-month series, instead of deriving expected behaviour from a steady-state assumption.

Default Risk / Underwriting Data Scientist

Uses pre-labeled income-shock and liquidity-stress events as supervised training data for default-risk models, including the rare combinations (variable income + distressed mortgage + thin emergency fund) that drive most actual defaults.

Fintech PM building a budgeting product

Demos the product's value proposition to investors and prospects using households that look like the target market (gig workers, single parents, freelancers) without using real customer data, eliminating the 'we need to wait for production data' chicken-and-egg problem.

Compliance Analyst at a Lender

Tests the firm's ability-to-repay assessment process against household types regulators have flagged in fair-lending reviews, ensuring variable-income borrowers aren't systematically scored worse than their actual repayment capacity warrants.

What's inside

Each of the 270 households is drawn from one of ten cash-flow-relevant archetypes spanning gig workers, single parents, distressed mortgages, post-divorce rebuilders, and artists with royalty income. The mix is intentional — 40% variable-income, 25% currently in financial stress (stress flags pre-set), 20% underwater on a major liability, and 15% transitioning out of a recent shock. The blended population is calibrated to surface the cash-flow patterns that drive real liquidity events.

Every household carries the full 96-month longitudinal track that's standard across the corpus: monthly net cash flow, savings rate, account balances, credit utilization, and an event log of shocks (job loss, medical, expense spike, income drop). Stress scenarios are pre-labeled — you can filter for the post-divorce rebuild trajectories, the variable-income recovery curves after gig-platform deactivation, or the medical-debt crisis populations specifically. Income volatility percentage is computed against the household's own 96-month series so it's calibrated, not assumed.

The Data Set ships as JSON (one file per household with embedded longitudinal array, plus a manifest), CSV (long-format with one row per household-month so it's join-friendly with your warehouse), and Parquet (columnar; recommended for analytical queries over the 25,920-row month grid). The Methodology PDF documents the longitudinal generation methodology; the shock-event taxonomy and the field-by-field stress-scenario structure are built into the Data Set.
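The relationship between the nested JSON layout and the long CSV/Parquet layout can be sketched as below. This is a minimal illustration, not the vendor's export code: the `household_id` field and the zero-based `month` index are assumed names, and the demo records are made up.

```javascript
// Flatten nested household records into one row per household-month.
// `household_id` and `month` are assumed names used only for illustration.
function toLongFormat(households) {
  return households.flatMap(h =>
    h.longitudinal.monthly.map((m, i) => ({
      household_id: h.household_id,
      month: i,
      ...m, // fields on the monthly entry take precedence if names collide
    }))
  );
}

// Tiny made-up example: 2 households x 3 months = 6 rows.
const demo = [
  { household_id: "H1", longitudinal: { monthly: [
    { net_cash_flow: 100 }, { net_cash_flow: -50 }, { net_cash_flow: 20 } ] } },
  { household_id: "H2", longitudinal: { monthly: [
    { net_cash_flow: 0 }, { net_cash_flow: 10 }, { net_cash_flow: 5 } ] } },
];
const rows = toLongFormat(demo);
console.log(rows.length); // 6
```

Applied to the full Data Set, the same flattening would yield 270 × 96 = 25,920 rows, which matches the month grid described above.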

Preview a sample household

A redacted summary of one household from this Data Set — names, employers, exact balances, and metro area are stripped. Ages are bucketed, income and net worth are reported as bands. The full record (and all 270 like it) ships in the ZIP.

F-02·Gig Economy Starter

representative archetype household

Household: Single
State:
Gross income (band): $50k–$100k
Net worth (band): —
Dependents:
Income source types: self employment, w2 bonus
Members (1): primary, age 20–24, professional services

Technical Highlights

→96 monthly snapshots per household

→Pre-built income shock scenarios

→Seasonal pattern realism

→Includes 1099 / variable-income households

Sample Schema Fields

sample_record.json

{
  "longitudinal.monthly[].net_cash_flow": <value>,
  "longitudinal.monthly[].savings_rate": <value>,
  "stress.scenarios[]": <value>,
  "liquidity.months_of_expenses": <value>,
  "cash_flow.income_volatility_pct": <value>
}

Sample queries

Find variable-income households with thin emergency funds

Surfaces households whose income volatility is above 30% AND who have less than two months of expenses in liquid reserves — the canonical 'one bad month from default' profile.

households.filter(h =>
  h.cash_flow.income_volatility_pct > 0.30 &&
  h.liquidity.months_of_expenses < 2
)

Identify post-shock recovery trajectories

Returns households where a labeled shock event (job loss, medical, divorce) occurred in the first 24 months of the longitudinal series, so you can see the recovery curve in months 25–96.

households.filter(h =>
  h.events.life_events.some(e =>
    e.month <= 24 &&
    ['job_loss', 'medical', 'divorce'].includes(e.type))
)

Surface seasonal income patterns

Computes the coefficient of variation of monthly income for each household, then returns the top decile — useful for testing your engine's handling of artists, royalty earners, and seasonal-business owners.

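// mean and population standard deviation helpers
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const stddev = xs => {
  const mu = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - mu) ** 2)));
};

households
  .map(h => {
    const income = h.longitudinal.monthly.map(m => m.income);
    return { h, cv: stddev(income) / mean(income) };
  })
  .sort((a, b) => b.cv - a.cv)                  // most volatile first
  .slice(0, Math.ceil(households.length / 10))  // keep the top decile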

Detect savings-depletion → credit-draw transitions

Returns months where liquid savings dropped below one month of expenses AND credit-card balance increased — the inflection point where a household pivots from saving to borrowing.

households.flatMap(h =>
  h.longitudinal.monthly.filter((m, i, arr) =>
    m.savings_balance < m.monthly_expenses &&
    i > 0 &&
    m.credit_card_balance > arr[i-1].credit_card_balance)
)

Methodology

The 270-household population is produced by the generation pipeline from ten cash-flow-stress-relevant archetypes. Each household begins with a baseline financial snapshot, then runs through a longitudinal generator that produces 96 monthly entries respecting income seasonality, expense irregularity, and shock-event triggers. Income volatility is sampled from archetype-specific distributions (gig workers ~40% CV, salaried W-2 ~5% CV), so the realism is calibrated rather than uniform. Shock events are seeded at probabilistic intervals consistent with each archetype's empirically observed life-event base rates. Every record must pass two gates: the consistency validator (monthly cash flow reconciles with balance-sheet deltas; shock events propagate forward through savings, credit, and behavioral fields) and the LLM-as-judge quality gate. Each refresh re-runs generation against current minimum wage, gig-platform fee schedules, and unemployment benefit levels.
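One way to picture the "cash flow reconciles with balance-sheet deltas" check is the sketch below. It is not the vendor's validator. It assumes a deliberately simplified identity in which a month's `net_cash_flow` equals the change in `savings_balance` minus the change in `credit_card_balance`; the real validator presumably covers more accounts. The function name and tolerance are hypothetical.

```javascript
// Hypothetical reconciliation check (not the vendor's validator).
// Assumption: net_cash_flow for month i equals the change in savings
// minus the change in credit-card debt since month i-1.
function findReconciliationBreaks(monthly, tolerance = 0.01) {
  const breaks = [];
  for (let i = 1; i < monthly.length; i++) {
    const prev = monthly[i - 1];
    const cur = monthly[i];
    const savingsDelta = cur.savings_balance - prev.savings_balance;
    const debtDelta = cur.credit_card_balance - prev.credit_card_balance;
    const implied = savingsDelta - debtDelta;
    // Record months where the reported flow disagrees with the implied flow.
    if (Math.abs(implied - cur.net_cash_flow) > tolerance) {
      breaks.push({ month: i, implied, reported: cur.net_cash_flow });
    }
  }
  return breaks;
}
```

Running a check like this on your side after ingestion can help confirm that a format conversion or warehouse load has not shifted or dropped months.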

Included Archetypes (10)

F-02·Gig Economy Starter (Formation)
A-01·Young Family — First Home (Accumulation)
A-02·Single Parent (Accumulation)
A-04·Small Business Owner (Early Stage) (Accumulation)
S-01·Divorce in Progress (Transfer)
S-02·Bankruptcy Recovery (Formation)
S-03·Medical Debt Crisis (Accumulation)
MB-01·First-Time Homebuyer (Accumulation)
MB-02·Distressed Mortgage / Underwater Homeowner (Accumulation)
AR-01·Artist / Creative (Royalties & Irregular Income) (Accumulation)

Frequently asked questions

How is income volatility calculated in this Data Set?

Each household's `cash_flow.income_volatility_pct` is computed as the coefficient of variation (standard deviation ÷ mean) of monthly gross income over the household's 96-month longitudinal series, so it is calibrated against the household's own history rather than assumed from archetype defaults. Because the coefficient of variation is scale-free, values are comparable across households with different income levels.

What counts as a 'shock event' in the labeled events?

Six categories are pre-labeled: job_loss, medical (defined as a single-month medical expense exceeding 50% of monthly income), divorce, expense_spike (a single-month expense exceeding 200% of trailing 12-month average), income_drop (a 6-month moving average drop greater than 20%), and family_caregiver_onset. Other life events (relocation, marriage, child) are also tracked but not classified as 'shocks' — they're in `events.life_events` with their own taxonomy.
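Two of the rule-based definitions above can be sketched as detectors over a monthly series. This is a minimal illustration under stated assumptions, not the labeling code used to build the Data Set: the source does not say how months with less than a full trailing window are handled (they are skipped here), and for `income_drop` the comparison window is assumed to be the six months immediately preceding the current six-month window. Function names are hypothetical.

```javascript
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// expense_spike: month i's expenses exceed 200% of the average of the
// preceding 12 months. Months without 12 months of history are skipped
// (an assumption; the source does not specify this).
function expenseSpikeMonths(expenses) {
  const hits = [];
  for (let i = 12; i < expenses.length; i++) {
    const trailingAvg = mean(expenses.slice(i - 12, i));
    if (expenses[i] > 2 * trailingAvg) hits.push(i);
  }
  return hits;
}

// income_drop: the 6-month average ending at month i is more than 20% below
// the average of the six months before that window (assumed comparison).
function incomeDropMonths(income) {
  const hits = [];
  for (let i = 11; i < income.length; i++) {
    const recent = mean(income.slice(i - 5, i + 1)); // months i-5 .. i
    const prior = mean(income.slice(i - 11, i - 5)); // months i-11 .. i-6
    if (prior > 0 && recent < 0.8 * prior) hits.push(i);
  }
  return hits;
}
```

Both functions return zero-based month indices, so results line up with array positions in `longitudinal.monthly`. Comparing their output with the pre-labeled events is one way to sanity-check your own reading of the taxonomy.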

Can I use this for a credit-decisioning model?

Yes — that's a primary use case. The Data License explicitly permits training and validation of credit and underwriting models. Note that the synthetic population is designed to over-represent variable-income and shock-exposed households, so model performance metrics computed on this Data Set should not be extrapolated to a general lending portfolio without re-weighting.
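The re-weighting mentioned above can be done in several ways. A simple one is post-stratification, where each record is weighted by the ratio of its cohort's share in your target portfolio to its share in this Data Set. The sketch below is illustrative only: the cohort labels, the `value` field, and the target shares are hypothetical and would come from your own portfolio analysis.

```javascript
// Post-stratified mean of a per-household metric.
// weight = (cohort share in target portfolio) / (cohort share in this Data Set)
// `cohort` and `value` are hypothetical field names for illustration.
function reweightedMean(records, datasetShare, targetShare) {
  let num = 0;
  let den = 0;
  for (const r of records) {
    const w = targetShare[r.cohort] / datasetShare[r.cohort];
    num += w * r.value;
    den += w;
  }
  return num / den;
}

// Made-up example: 40% of records are variable-income and all of them are
// misclassified (value 1); the target portfolio is only 10% variable-income.
const sample = [
  ...Array(4).fill({ cohort: "variable_income", value: 1 }),
  ...Array(6).fill({ cohort: "other", value: 0 }),
];
const rate = reweightedMean(
  sample,
  { variable_income: 0.4, other: 0.6 },
  { variable_income: 0.1, other: 0.9 }
);
console.log(rate.toFixed(2)); // "0.10" (the unweighted rate would be 0.40)
```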

Does the longitudinal data start from a fixed calendar date?

No. Each household's 96-month series is anchored to a relative month_0 rather than a calendar date, so the corpus doesn't bake in COVID-era assumptions or any specific economic regime. Refreshed corpus versions retain this design — calendar-anchored data is available on request for backtesting against specific historical periods.
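If you only need calendar labels for display or joins, a relative month index can be mapped onto a calendar month locally. The sketch below assumes a zero-based index with month_0 = 0 and a caller-chosen anchor; both are assumptions, and the function name is hypothetical. It only relabels months and does not recalibrate the series to any historical economic regime.

```javascript
// Map a relative month index (month_0 = 0, assumed) onto a calendar month.
// anchorYear / anchorMonth (1-12) are chosen by the caller, not by the Data Set.
function toCalendarMonth(monthIndex, anchorYear, anchorMonth) {
  const zeroBased = (anchorMonth - 1) + monthIndex;
  const year = anchorYear + Math.floor(zeroBased / 12);
  const month = (zeroBased % 12) + 1;
  return `${year}-${String(month).padStart(2, "0")}`;
}

console.log(toCalendarMonth(0, 2015, 1));  // "2015-01"
console.log(toCalendarMonth(95, 2015, 1)); // "2022-12"
```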

How does this differ from B13 (Mortgage Stress Test)?

B13 focuses on mortgage-specific stress (DTI, LTV, modification eligibility, forbearance) on a smaller 90-household corpus. B04 is broader — household-level cash flow across multiple liability types — and uses the longitudinal contract to expose engines to multi-month dynamics. Many buyers purchase both; B04 first to validate cash-flow logic, B13 to focus on mortgage decisioning specifically.

What's the right format for warehouse ingestion?

For analytical queries spanning the full 25,920-row month grid (270 households × 96 months), Parquet is recommended — it's columnar and compresses the longitudinal time series efficiently. CSV is also long-format (one row per household-month) and works well in Snowflake or BigQuery. JSON nests the longitudinal array inside each household record, which is best for record-by-record processing but worse for analytical scans.

How often do shock events occur in the corpus?

Approximately 70% of households experience at least one labeled shock event in the 96-month window; about 30% experience two or more. This is intentionally elevated relative to a general population (where multi-shock households are rarer) because the Data Set is designed to stress-test handling of these events. The shock-frequency calibration is reflected in the labeled event data itself.

Are there households with NO shocks?

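Yes, about 30% of the corpus. These provide the negative-class examples needed for shock-detection models and serve as a control group for behavioural comparisons. To retrieve them, filter for households whose `events.life_events` contains no entry of a shock type. Testing `events.life_events.length === 0` alone would also drop no-shock households that carry non-shock life events such as relocation or marriage.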
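The shock-frequency figures quoted in these answers can be checked against the delivered files with a small tally. This sketch assumes, as the sample queries do, that shocks appear as entries in `events.life_events` whose `type` is one of the six shock categories; the function names are hypothetical.

```javascript
// The six pre-labeled shock categories listed in the FAQ.
const SHOCK_TYPES = new Set([
  "job_loss", "medical", "divorce",
  "expense_spike", "income_drop", "family_caregiver_onset",
]);

// Count only shock-type entries; other life events are ignored.
function shockCount(h) {
  return h.events.life_events.filter(e => SHOCK_TYPES.has(e.type)).length;
}

// Share of households with no shocks, at least one, and two or more.
function shockShares(households) {
  const counts = households.map(shockCount);
  const share = pred => counts.filter(pred).length / counts.length;
  return {
    none: share(c => c === 0),
    atLeastOne: share(c => c >= 1),
    twoOrMore: share(c => c >= 2),
  };
}
```

The no-shock control group is then `households.filter(h => shockCount(h) === 0)`, which keeps households whose only events are non-shock ones.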

Related Wealth Data Sets

B13·$5,000

Mortgage Stress Test Pack

90 households spanning the mortgage lifecycle: first-time buyers stretched to DTI limit, young families with new mortgages, distressed homeowners evaluating modification, and underwater scenarios. Includes complete mortgage application data, current LTV, and reserve adequacy.

B17·$5,000

Small Business / K-1 Tax Pack

140 small business owner households across LLC, S-Corp, and partnership structures. Each carries reasonable-salary calculations, K-1 distributions, QBI deduction modeling (with SSTB classification and phaseout), guaranteed payments, and capital account tracking.

B28·$3,500

Gig & Variable Income Pack

100 households with non-W-2 income patterns: gig workers, creators, freelancers, royalty-income artists, military with variable allowances, and remote-worker digital nomads. Variable cash flow modeling, quarterly estimated tax payments, self-employed retirement plans.

$4,000

one-time purchase

270 households (ZIP)

Methodology PDF

JSON, CSV, Parquet formats

Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.
