CDFI / Underbanked Pack

Roughly 1 in 20 US households is unbanked. Another 1 in 7 is underbanked: they have a checking account but still rely on prepaid cards, check cashing, money orders, or payday lenders for some part of their financial life. These customers are not failing to bank; they're banking through a different system, often because the mainstream system has failed them through high overdraft fees, minimum balance requirements, identity-document barriers for ITIN-only filers, and limited language access. The CDFI / Underbanked Pack is 80 synthetic households built for the institutions designing the products that bring this population into mainstream banking, and for the fintech builders who increasingly serve them better than legacy banks do.

Households: 80
Archetypes: 6
Formats: JSON, CSV
Deviation: Moderate

Why this Data Set exists

Building a banking product for underbanked customers is a category-design problem, not a feature-set problem. The product can't assume a Social Security Number, a FICO score, employer direct deposit, or an English-language interface. It has to work for someone whose payment history is on a prepaid card and whose income is irregular cash. Most product teams design for the modal customer first and 'extend' the design to underbanked segments later, which produces products that work for everyone except the segment they were ostensibly designed for.

The data problem compounds the design problem. Production banking data captures the customers who already passed the existing-product filters. The customers you're trying to design FOR are statistically absent from your training set. You need synthetic data calibrated against external sources (FDIC unbanked surveys, CFPB consumer studies, World Bank remittance corridors) that captures the patterns the production data omits.

This Data Set is that calibration. 80 households across the canonical underbanked profiles, with structured banking-relationship history (including the prepaid-card-only or cash-only past), remittance corridor patterns, cash-economy income proxies, and the inclusion-trajectory data — what does the path from prepaid to checking actually look like for someone who's been outside the banking system for a decade? The Data Set provides the realistic answer.

Use Cases

CDFI product design
Prepaid card → checking transition
Remittance corridor analytics
Cash-economy income estimation

Who uses this Data Set

CDFI Product Designer

Designs a checking-account product for underbanked customers using realistic households whose actual financial behaviour (cash deposits, money order use, remittance sending) determines what the product needs to support — feature-by-feature.

Fintech Builder Targeting Underbanked Customers

Validates the onboarding flow against ITIN-only prospects, language-preference variants, and customers whose US address history is too short for a standard Customer Identification Program (CIP) check, so the product works for the actual segment rather than for an idealised version of it.

Remittance-Corridor Analyst

Models remittance behaviour by destination corridor (US→Mexico, US→Philippines, US→India, US→Vietnam) using households whose sending patterns reflect the regional differences in destination-country banking infrastructure.

Bank Inclusion-Strategy Lead

Tests the bank's 'second-chance' checking product against post-ChexSystems profiles (households with prior account closures) to find friction in the re-banking process and demonstrate inclusion programs to regulators.

Compliance / BSA Officer at a CDFI

Tests AML / suspicious-activity logic against the kinds of cash-deposit and remittance-sending patterns that legitimately occur in underbanked customer populations, ensuring the firm's controls don't generate false-positive SAR filings on lawful behaviour.

What's inside

The 80 households are drawn from six archetypes: F-04 First-Generation Wealth Builders, S-02 Bankruptcy Recovery, U-01 Unbanked / Recently Banked, U-02 Low-Income Working Families, U-03 Recent Immigrants, and N-04 Cannabis-Industry Workers. The mix is deliberate — about 30% currently unbanked or recently banked, 35% in the prepaid-card-dependent category, 20% post-ChexSystems looking for re-entry to mainstream banking, and 15% cash-economy participants whose income flows mostly outside formal banking.

Every household has structured banking-relationship history: account types currently held (checking, savings, prepaid, money-order purchases), account-closure history with reason codes, prior ChexSystems flags where applicable, and the 24-month trajectory showing whether banking-relationship age is increasing or decreasing. Remittance flows are structured by corridor (origin–destination country pair), frequency, channel (bank wire; a money transfer operator, or MTO, such as Western Union or Remitly; and, increasingly, crypto via stablecoin), and typical amount. Cash-income proxies use a structured methodology with confidence intervals: documented income from W-2 / 1099, plus an estimated cash-income component, plus an explicit uncertainty band.
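As a rough illustration of the structure described above, one household record might look like the sketch below. Field names that also appear in the sample queries on this page (`banking.account_type`, `income.documented_annual`, `income.cash_income_estimate`, `income.remittance_outflows.primary_corridor`) are used as written. Everything else, including `closure_history`, `reason_code`, `channel`, `frequency_per_month`, `typical_amount_usd`, the helper `totalIncomeBand`, and all values, is hypothetical and should be checked against the shipped schema.

```javascript
// Hypothetical household record; all values are invented for illustration.
const exampleHousehold = {
  id: 'HH-EXAMPLE',
  banking: {
    account_type: 'prepaid',
    closure_history: [{ year: 2019, reason_code: 'overdraft_unpaid' }], // assumed shape
    chexsystems_flagged: true,
    relationship_trajectory_24mo: 'improving'
  },
  income: {
    documented_annual: 24000, // documented W-2 / 1099 income
    cash_income_estimate: { value: 6000, ci_low: 4000, ci_high: 9000 },
    remittance_outflows: {
      primary_corridor: 'US-MX',
      channel: 'mto', // assumed values: 'bank_wire' | 'mto' | 'stablecoin'
      frequency_per_month: 2,
      typical_amount_usd: 150,
      monthly: 300
    }
  }
};

// Total income as a band: documented income plus the cash estimate
// at the low, point, and high ends of its uncertainty band.
function totalIncomeBand(h) {
  const doc = h.income.documented_annual;
  const cash = h.income.cash_income_estimate;
  return {
    low: doc + cash.ci_low,
    point: doc + cash.value,
    high: doc + cash.ci_high
  };
}

console.log(totalIncomeBand(exampleHousehold));
```

Keeping the uncertainty band attached to the total, rather than collapsing it to a single number, lets downstream logic decide how conservative to be.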

The Data Set ships as JSON and CSV. The WealthSynth Methodology PDF documents the banking-access taxonomy (FDIC's six-category framework: fully banked, underbanked, unbanked, recently banked, recently unbanked, never banked), the remittance-corridor data sources (World Bank, FDIC, CFPB), and the cash-income estimation methodology with the IRS tax-gap calibration.

Preview a sample household

A redacted summary of one household from this Data Set. Names, employers, exact balances, and metro area are stripped. Ages are bucketed; income and net worth are reported as bands. The full record (and all 80 like it) ships in the ZIP.

F-04 · First-Generation Wealth Builder (representative archetype household)
Household: Single
State: WI
Gross income (band): <$50k
Net worth (band):
Dependents: 0
Income source types: w2 salary, w2 bonus
Members (1): primary, age 25–29, technology

Technical Highlights

Banking access taxonomy
Remittance corridor tracking
Cash income proxies
Inclusion-trajectory modeling

Sample Schema Fields

sample_record.json
{
  "banking.account_type": <value>,
  "banking.prepaid_card_history": <value>,
  "banking.check_cashing_use": <value>,
  "income.remittance_outflows": <value>,
  "credit.thin_file_flag": <value>
}
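The sample above shows flat, dotted keys, while the sample queries on this page read nested properties such as `h.banking.account_type`. If the CSV export uses dotted column names (an assumption; check the shipped files), a small helper like the hypothetical `unflatten` below could turn a parsed row into the nested shape the queries expect.

```javascript
// Convert a flat record with dotted keys (for example, a parsed CSV row)
// into nested objects: { "a.b": 1 } becomes { a: { b: 1 } }.
function unflatten(flat) {
  const nested = {};
  for (const [path, value] of Object.entries(flat)) {
    const keys = path.split('.');
    let node = nested;
    // Walk (and create) every intermediate object on the path.
    for (const key of keys.slice(0, -1)) {
      if (typeof node[key] !== 'object' || node[key] === null) {
        node[key] = {};
      }
      node = node[key];
    }
    node[keys[keys.length - 1]] = value;
  }
  return nested;
}

// Invented row values for illustration.
const row = {
  'banking.account_type': 'prepaid',
  'banking.check_cashing_use': true,
  'credit.thin_file_flag': true
};
console.log(unflatten(row));
```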

Sample queries

Surface re-banking opportunities (post-ChexSystems)

Returns households flagged in ChexSystems whose flag is more than 5 years old (often the cooldown for re-entry to mainstream banking) and whose 24-month banking trajectory is improving — prime candidates for a second-chance checking offer.

households.filter(h =>
  h.banking.chexsystems_flagged &&
  h.banking.chexsystems_age_years > 5 &&
  h.banking.relationship_trajectory_24mo === 'improving'
)

Identify remittance-heavy customers by corridor

Returns households whose monthly remittance outflow exceeds 10% of documented monthly household income, counted by destination corridor. Useful for sizing remittance-product opportunities and for partnering with destination-country banks.

households.filter(h =>
  h.income.remittance_outflows.monthly > h.income.documented_annual / 12 * 0.10
).reduce((acc, h) => {
  const corridor = h.income.remittance_outflows.primary_corridor;
  acc[corridor] = (acc[corridor] || 0) + 1;
  return acc;
}, {})

Find prepaid-card → checking transition candidates

Returns households whose primary banking account is currently a prepaid card but whose load and direct-deposit pattern (here, an average monthly load of at least $1,000 with direct deposit present) would qualify them for a no-minimum checking product.

households.filter(h =>
  h.banking.account_type === 'prepaid' &&
  h.banking.prepaid_card_history.monthly_load_avg >= 1000 &&
  h.banking.prepaid_card_history.direct_deposit_present
)

Compute cash-economy share of household income

For each household, returns the ratio of estimated cash income to total income, with its confidence interval, and keeps only households where the cash share exceeds 20%. This surfaces the populations whose financial profile is mostly invisible to traditional banking systems.

households.map(h => {
  const total = h.income.documented_annual +
                h.income.cash_income_estimate.value;
  return {
    id: h.id,
    cash_share: h.income.cash_income_estimate.value / total,
    ci_band: [
      h.income.cash_income_estimate.ci_low / total,
      h.income.cash_income_estimate.ci_high / total
    ]
  };
}).filter(x => x.cash_share > 0.20)
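
The query above divides both ends of the confidence interval by a total built from the point estimate. One alternative, sketched here as an assumption rather than as the Methodology PDF's approach, lets the denominator move with the numerator, so each end of the band is a share of its own total. The helper name `cashShareBand` and the example values are hypothetical; field names follow the query above.

```javascript
// Cash share of income at the low, point, and high ends of the estimate.
// Each share uses its own total (documented income + that cash figure).
function cashShareBand(h) {
  const doc = h.income.documented_annual;
  const { value, ci_low, ci_high } = h.income.cash_income_estimate;
  // Guard against a zero total for households with no recorded income.
  const share = cash => (doc + cash > 0 ? cash / (doc + cash) : 0);
  return {
    id: h.id,
    low: share(ci_low),
    point: share(value),
    high: share(ci_high)
  };
}

// Invented example values.
const sampleHousehold = {
  id: 'HH-EXAMPLE',
  income: {
    documented_annual: 30000,
    cash_income_estimate: { value: 10000, ci_low: 5000, ci_high: 15000 }
  }
};

console.log(cashShareBand(sampleHousehold));
```

Which form is appropriate depends on whether the band is meant to describe uncertainty in the cash figure alone or in the share itself.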

Methodology

The 80 households are generated against FDIC-aligned banking-access categorisations, with archetype-specific distributions calibrated from the FDIC's 2023 Survey of Unbanked and Underbanked Households, the CFPB's consumer studies on cash-economy participation, and the World Bank's bilateral remittance flow data. Cash-income estimates use the same structured methodology as B23 (CRA / Underserved Lending): a sampling-based approach with confidence intervals calibrated against the IRS's tax-gap analysis. Banking-relationship trajectories are generated as 24-month sequences with realistic transition probabilities; for example, a household moves from prepaid to checking with a small but non-zero monthly probability, conditional on the right preconditions. The corpus passes the WealthSynth consistency validator and LLM-as-judge gate, with additional review by an inclusive-finance specialist to verify that the structures align with current FDIC, OCC, and CFPB guidance on underbanked-customer segmentation. The Data Set is refreshed annually and incorporates new results from the FDIC's biennial unbanked survey as they are published.
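
To make the trajectory idea concrete, here is a minimal sketch of a two-state model with a constant monthly prepaid-to-checking probability. This is a simplification for illustration, not the generator described in the Methodology PDF: the function names, the fixed probability of 0.02, and the absence of preconditions or reverse transitions are all assumptions.

```javascript
// Probability of at least one prepaid -> checking transition within `months`,
// assuming a constant, independent monthly probability p.
function cumulativeTransitionProbability(p, months) {
  return 1 - Math.pow(1 - p, months);
}

// Simulate one trajectory as a list of monthly states.
// `rng` is any function returning a number in [0, 1); pass a seeded
// generator for reproducible output.
function simulateTrajectory(p, months = 24, rng = Math.random) {
  const states = [];
  let state = 'prepaid';
  for (let m = 0; m < months; m++) {
    if (state === 'prepaid' && rng() < p) {
      state = 'checking'; // absorbing in this sketch: no move back to prepaid
    }
    states.push(state);
  }
  return states;
}

// With a hypothetical 2% monthly probability, roughly 38% of eligible
// households would have transitioned by month 24.
console.log(cumulativeTransitionProbability(0.02, 24).toFixed(3));
console.log(simulateTrajectory(0.02).join(' '));
```

Even a small monthly probability compounds over a 24-month window, which is why a low per-month rate still produces a visible share of transitions across the corpus.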

Frequently asked questions

Why is this Data Set smaller than B23 (80 households vs. 100)?

This Data Set (B29) is more focused. B23 covers the full underserved-lending population, including thin-credit-file mainstream borrowers; B29 zeroes in on banking access specifically: unbanked, underbanked, recently banked, and ChexSystems-flagged households. The smaller size reflects the narrower targeting, not lower fidelity.

Are remittance corridors realistic by destination country?

Yes. Remittance corridor distributions are calibrated from World Bank bilateral flow data and CFPB studies on US-side sender behaviour. The dominant corridors in the corpus reflect the main origin countries of US immigrant communities (Mexico, Philippines, India, China, Vietnam, Dominican Republic, Guatemala), with realistic frequency and channel distributions.

Does the corpus include crypto-based remittances?

Yes. About 12% of the remittance-sending households use stablecoin-based remittance channels (USDC via apps like MoneyGram's Stellar integration, or peer-to-peer through exchanges). The share is meant to reflect emerging corridors where stablecoin rails have become cost-competitive with traditional channels. The structured remittance data includes the channel field, so you can filter for crypto-rail flows specifically.
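
As a minimal sketch of that filter, assuming the channel lives at `income.remittance_outflows.channel` and that stablecoin flows are labelled `'stablecoin'` (both are assumptions; check the shipped schema for the actual path and value), the share of remittance senders on crypto rails could be computed like this:

```javascript
// Share of remittance-sending households whose channel is stablecoin-based.
// A household counts as a sender if its monthly remittance outflow is > 0.
function stablecoinShare(households) {
  const senders = households.filter(
    h => h.income?.remittance_outflows?.monthly > 0
  );
  if (senders.length === 0) return 0;
  const cryptoSenders = senders.filter(
    h => h.income.remittance_outflows.channel === 'stablecoin'
  );
  return cryptoSenders.length / senders.length;
}

// Invented fixtures: three senders (one on stablecoin) and one non-sender.
const exampleHouseholds = [
  { id: 'A', income: { remittance_outflows: { monthly: 200, channel: 'mto' } } },
  { id: 'B', income: { remittance_outflows: { monthly: 350, channel: 'stablecoin' } } },
  { id: 'C', income: { remittance_outflows: { monthly: 120, channel: 'bank_wire' } } },
  { id: 'D', income: {} }
];

console.log(stablecoinShare(exampleHouseholds));
```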

How do you avoid stereotyping in the underserved-segment archetypes?

The archetypes are designed around financial-behaviour profiles, not demographic identity. Recent immigrants in the corpus span multiple regions and education levels; cash-economy participants include licensed contractors, agricultural workers, and small-shop owners; post-bankruptcy households cover a wide income range. The Methodology PDF documents the calibration sources so you can verify the populations are evidence-based rather than caricatured.

Is the unbanked share of the corpus realistic?

The corpus over-represents unbanked and underbanked households relative to the general population (where unbanked is ~5%, underbanked ~14%) because the Data Set is designed specifically for inclusive-finance product development. Anyone using the corpus for population-level statistics should re-weight by the FDIC base rates.
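
One common way to do that re-weighting is post-stratification: weight each household by its segment's population share divided by its corpus share. The sketch below assumes each household carries a hypothetical `segment` label; the corpus shares shown are placeholders to be replaced with shares computed from the shipped records, and the 81% fully banked figure is simply the remainder after the ~5% and ~14% base rates quoted above.

```javascript
// Post-stratification weight per segment = population share / corpus share.
function segmentWeights(populationShare, corpusShare) {
  const weights = {};
  for (const segment of Object.keys(populationShare)) {
    weights[segment] = populationShare[segment] / corpusShare[segment];
  }
  return weights;
}

// Weighted mean of a numeric value, using each household's segment weight.
function weightedMean(households, weights, getValue) {
  let numerator = 0;
  let denominator = 0;
  for (const h of households) {
    const w = weights[h.segment];
    numerator += w * getValue(h);
    denominator += w;
  }
  return denominator === 0 ? NaN : numerator / denominator;
}

const populationShare = { unbanked: 0.05, underbanked: 0.14, fully_banked: 0.81 };
// Placeholder corpus shares (assumption): compute the real ones from the data.
const corpusShare = { unbanked: 0.30, underbanked: 0.55, fully_banked: 0.15 };

console.log(segmentWeights(populationShare, corpusShare));
```

Segments that are heavily over-sampled in the corpus end up with weights well below 1, so they contribute proportionally less to any population-level estimate.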

Are minority-language preferences captured?

Each household has a `demographics.language_preference` field with the household's preferred language for financial communications. The distribution reflects FDIC and CFPB data on US underbanked-population language preferences — Spanish is the largest secondary language, with smaller representations for Vietnamese, Mandarin, Tagalog, Korean, and others. This is intended for product-localisation planning, not for routing decisions that could create disparate-impact concerns.

Can I use this Data Set for AML / SAR rule design?

Yes — that's a primary use case. The remittance-flow patterns, cash-deposit cycles, and account-relationship histories provide the realistic test fixtures for tuning AML rules so they don't false-positive on lawful underbanked behaviour. The Methodology PDF includes a section specifically on AML rule calibration against this corpus.
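
A minimal sketch of that kind of tuning loop follows. It assumes a hypothetical field `banking.monthly_cash_deposits_usd` and a deliberately simple threshold rule; neither is taken from the Methodology PDF, and real AML logic would combine many more signals.

```javascript
// Fraction of households a rule flags. Because the corpus is meant to
// represent lawful behaviour, every flag here counts as a false positive.
function falsePositiveRate(households, rule) {
  if (households.length === 0) return 0;
  return households.filter(rule).length / households.length;
}

// Hypothetical rule: flag when monthly cash deposits exceed a threshold.
const cashDepositRule = thresholdUsd => h =>
  h.banking.monthly_cash_deposits_usd > thresholdUsd;

// Invented fixtures standing in for corpus households.
const fixtures = [
  { id: 'A', banking: { monthly_cash_deposits_usd: 1200 } },
  { id: 'B', banking: { monthly_cash_deposits_usd: 4800 } },
  { id: 'C', banking: { monthly_cash_deposits_usd: 9500 } },
  { id: 'D', banking: { monthly_cash_deposits_usd: 300 } }
];

// Sweep candidate thresholds to see how the false-positive rate responds.
for (const threshold of [1000, 5000, 10000]) {
  console.log(threshold, falsePositiveRate(fixtures, cashDepositRule(threshold)));
}
```

Sweeping thresholds against lawful fixtures shows only one side of the trade-off; detection rates still have to be measured against known-suspicious patterns from another source.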

How does this fit alongside B23 (CRA / Underserved Lending)?

B23 is for lending: credit decisions, alternative data, second-chance loans. B29 is for banking access: checking products, prepaid migration, remittance, AML. CDFIs and inclusive-finance fintechs typically buy both. If you're choosing one, pick B23 if your product is lending and B29 if your product is transactional or centred on the banking relationship.

$3,500
one-time purchase
80 households (ZIP)
Methodology PDF
JSON, CSV formats
Account required to purchase

Purchases are for internal use only. Redistribution or resale of data is prohibited under the WealthSchema Data License.
